.. highlight:: rst .. highlight:: python :linenothreshold: 5 Input / Output of numerical data ================================ In this section we would like to focus on a last but very important point once you will do real world numerical simulations. It's all about how to store numerical simulation data into files in an efficient and portable way. We will give a short introduction to a general purpose file format called HDF, the `Hierarchical Data Format`, which is carefully designed to store and organize large amounts of numerical data [#1]_. Introduction to HDF ------------------- When we said that HDF is a file format this is not the whole truth. HDF is rather some kind of scheme for creating file formats. Maybe we can say that in some aspects it's similar to e.g. html which is not a file format but a language for describing data. A HDF file has the following file structure and includes only two major types of objects: * Datasets, which are multidimensional arrays of a homogenous type * Groups, which are container structures which can hold datasets and other groups This results in a truly hierarchical, filesystem-like data format. In fact, resources (for now read: data sets) in an HDF file are even accessed using the POSIX-like syntax "/path/to/resource". Metadata is stored in the form of user-defined, named attributes attached to groups and datasets. In addition HDF includes an type system for specifying data types (are the numbers here 32 bit real floats or 64 bit complex floats or ...), and dataspace objects which represent selections over dataset regions. The API is object-oriented with respect to datasets, groups, attributes, types, dataspaces and property lists. By the way, currently there exist two major versions of HDF, HDF4 and HDF5, which differ significantly in design and API. We will only deal with HDF5 here. Handling HDF files with python ------------------------------ Until now we just talked about the ideas of HDF. Let's now look how to work with HDF files. The library that handles HDF files has interfaces to many programing languages and there are python bindings too. Here we will present the ``h5py`` python package. Let's just cite the official description from the ``h5py`` developers: "The HDF5 library is a versatile, mature library designed for the storage of numerical data. The h5py package provides a simple, Pythonic interface to HDF5. A straightforward high-level interface allows the manipulation of HDF5 files, groups and datasets using established Python and NumPy metaphors." The ``h5py`` project is located at http://www.h5py.org/ where you can find the python package as well as a very detailed user reference manual and many examples of usage. A first example --------------- Ok, enough theory by now, let's go to a simple hands-on example. Hopefully the code is self-explanatory with all these comments in place. .. literalinclude:: TutCodes/hdf_example.py :language: python HDFView ------- The program HDFView is a tool for browsing and editing HDF files. Using HDFView, you can view the internal file hierarchy in a tree structure, create new files, add or delete groups and datasets, view and modify the content of a dataset and much more. There are even some basic plotting facilities which can serve for a first glance at the numerical data. You can download ``hdfview`` from the URL http://www.hdfgroup.org/hdf-java-html/hdfview/. The installation should pose no problems, the software is written in the Java language and available for various operating systems. If we open the file created by the above example with the program ``hdfview`` then we will see something like the following image. On the left there is the structural tree shown with its two groups ``grid`` and `values`` and the five leaves containing the data. On the right the actual values of the data node ``valsx`` is shown as a two-dimensional table (remember that the data array was two-dimensional). And on the bottom we see some information about the data currently examined node, for example the data type (64 bit floats in this case) and the size of the array (here 100 times 100 entries). .. image:: images/hdfview_example.png :scale: 100 % Of course you can do much more with the HDF data format and we can only show the tip of this iceberg. For example, HDF supports transparent compression of the data, further descriptive attributes for nodes, special sorting and indexing of data for fast retrieval and many things more. For more information you should read the ``h5py`` user guide http://docs.h5py.org/en/2.3/. .. rubric:: Footnotes .. [#1] The people responsible for taking care of the HDF standard have their home at http://www.hdfgroup.org/.