Input / Output of numerical data

In this section we focus on a final but very important point for real-world numerical simulations: how to store simulation data in files in an efficient and portable way. We give a short introduction to a general-purpose file format called HDF, the Hierarchical Data Format, which is designed to store and organize large amounts of numerical data [2].

Introduction to HDF

Calling HDF a file format is not the whole truth. HDF is rather a scheme for creating file formats. In some respects it is similar to, e.g., HTML, which is not a file format but a language for describing data.

An HDF file is built from only two major types of objects:

  • Datasets, which are multidimensional arrays of a homogeneous type
  • Groups, which are container structures that can hold datasets and other groups

This results in a truly hierarchical, filesystem-like data format. In fact, resources (for now read: datasets) in an HDF file are even accessed using the POSIX-like syntax “/path/to/resource”. Metadata is stored in the form of user-defined, named attributes attached to groups and datasets.
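The hierarchy, the path syntax and the attributes can be sketched with h5py, the Python package presented later in this section. This is only a minimal illustration: the file name, the group and dataset names and the attribute values are made up, and the file is kept in memory (h5py's "core" driver with backing_store=False) so that nothing is written to disk.

```python
import numpy as np
import h5py

# In-memory file: the "core" driver with backing_store=False writes nothing to disk.
# All names below (simulation, run1, energy, ...) are illustrative only.
with h5py.File("structure_demo.h5", "w", driver="core", backing_store=False) as f:
    # Groups behave like directories; missing intermediate groups are created
    run = f.create_group("simulation/run1")

    # Datasets behave like files inside a group
    run.create_dataset("energy", data=np.arange(5, dtype=np.float64))

    # Metadata is stored as named attributes on groups and datasets
    run.attrs["timestep"] = 0.01
    run["energy"].attrs["unit"] = "joule"

    # Resources are addressed with a POSIX-like absolute path
    energy = f["/simulation/run1/energy"]
    energy_name = energy.name
    energy_values = energy[...]
    energy_unit = energy.attrs["unit"]
    timestep = f["/simulation/run1"].attrs["timestep"]

    # A group lists its members much like a directory listing
    members = list(f["/simulation"].keys())

print(energy_name, energy_values, energy_unit, timestep, members)
```

Note that the values are copied out of the file inside the with block, because datasets can no longer be read once the file is closed.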

In addition, HDF includes a type system for specifying data types (are the numbers 32-bit real floats, 64-bit complex floats, ...?) and dataspace objects, which represent selections over dataset regions. The API is object-oriented with respect to datasets, groups, attributes, types, dataspaces and property lists.
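In the h5py Python package, which is presented later in this section, the type system shows up as NumPy dtypes and selections are written as ordinary slices. The following sketch assumes an in-memory file and made-up dataset names; it creates one dataset of 32-bit real floats and one of 64-bit complex floats, then writes and reads only a region of each.

```python
import numpy as np
import h5py

# In-memory file with illustrative dataset names; nothing is written to disk.
with h5py.File("types_demo.h5", "w", driver="core", backing_store=False) as f:
    # The dtype fixes the element type stored in each dataset
    single = f.create_dataset("single", shape=(4, 6), dtype=np.float32)
    cplx = f.create_dataset("cplx", shape=(4, 6), dtype=np.complex64)

    # Write to a selection (a region) of each dataset only
    single[1, :] = np.ones(6, dtype=np.float32)
    cplx[0:2, 0:3] = np.full((2, 3), 1.0 + 2.0j, dtype=np.complex64)

    # Read back selections; only the requested region is returned
    row = single[1, :]
    block = cplx[0:2, 0:3]
    untouched = single[0, 0]   # never written, so it holds the default fill value

    dtypes = (single.dtype, cplx.dtype)

print(dtypes, row.shape, block.shape, untouched)
```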

By the way, currently there exist two major versions of HDF, HDF4 and HDF5, which differ significantly in design and API. We will only deal with HDF5 here.

Handling HDF files with Python

So far we have only talked about the ideas behind HDF. Let’s now look at how to work with HDF files. The library that handles HDF files has interfaces to many programming languages, and there are Python bindings too. Here we present the h5py Python package. Let’s just cite the official description from the h5py developers:

“The HDF5 library is a versatile, mature library designed for the storage of numerical data. The h5py package provides a simple, Pythonic interface to HDF5. A straightforward high-level interface allows the manipulation of HDF5 files, groups and datasets using established Python and NumPy metaphors.”

The h5py project is located at http://www.h5py.org/, where you can find the Python package as well as a very detailed user reference manual and many usage examples.

A first example

OK, enough theory for now; let’s turn to a simple hands-on example. Hopefully the code is self-explanatory with all the comments in place.

import numpy as np
import matplotlib.pyplot as plt
import h5py as hdf

# A grid
x = np.linspace(-np.pi, np.pi, 100)
X, Y = np.meshgrid(x, x)

# Some values
U = np.sin(X)
V = np.cos(X)


# Save the data into an HDF5 file
filename = "data.hdf5"

# Create the file and open it for writing ("w" overwrites an existing file)
outfile = hdf.File(filename, "w")

# Store some data under /axisgridx with direct assignment of the data x
outfile.create_dataset("axisgridx", data=x)

# Create a group for storing further data under /grid/*
grp_grid = outfile.create_group("grid")

# Store some data under /grid/grid* with direct assignment of the data X and Y
grp_grid.create_dataset("gridx", data=X)
grp_grid.create_dataset("gridy", data=Y)

# Store the vector field values under another group called /values
grp_vals = outfile.create_group("values")

# Prepare space for storing the values (64 bit floats) but do not assign values yet
grp_vals.create_dataset("valsx", U.shape, dtype=np.float64)
grp_vals.create_dataset("valsy", V.shape, dtype=np.float64)

# Now assign the data for U and V and show slicing via ":" and ellipsis "..."
outfile["/values/valsx"][:] = U
outfile["/values/valsy"][...] = V

# And close the file
outfile.close()


# Open the file again, now in read only mode (we don't want to write data anymore!)
infile = hdf.File(filename, "r")

# Get the data, but omit every second value, just to show off some slicing
a = infile["/grid/gridx"][::2, ::2]
b = infile["/grid/gridy"][::2, ::2]
c = infile["/values/valsx"][::2, ::2]
d = infile["/values/valsy"][::2, ::2]

# The slices are now NumPy arrays in memory, so the file can be closed
infile.close()

# Plot the data as usual
plt.figure()
plt.quiver(a, b, c, d)
plt.savefig("hdf_example.png")

HDFView

The program HDFView is a tool for browsing and editing HDF files. Using HDFView, you can view the internal file hierarchy in a tree structure, create new files, add or delete groups and datasets, view and modify the content of a dataset and much more. There are even some basic plotting facilities which can serve for a first glance at the numerical data.

You can download HDFView from the URL http://www.hdfgroup.org/hdf-java-html/hdfview/. The installation should pose no problems: the software is written in Java and is available for various operating systems.

If we open the file created by the above example with HDFView, we will see something like the following image. On the left, the structural tree is shown with its two groups, grid and values, and the five leaves containing the data. On the right, the actual values of the data node valsx are shown as a two-dimensional table (remember that the data array was two-dimensional). At the bottom we see some information about the node currently being examined, for example the data type (64 bit floats in this case) and the size of the array (here 100 times 100 entries).

_images/hdfview_example.png
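The same tree can also be listed from Python without a graphical tool. The sketch below is one possible way, not part of the original example: it builds a small in-memory file with the same layout as data.hdf5 (but a coarser grid) and walks it with h5py's visititems method. The helper name record and the file name are made up for illustration.

```python
import numpy as np
import h5py

entries = []

def record(name, obj):
    """Collect the path of each node and, for datasets, its shape and dtype."""
    if isinstance(obj, h5py.Dataset):
        entries.append((name, "dataset", obj.shape, str(obj.dtype)))
    else:
        entries.append((name, "group", None, None))

# In-memory stand-in for data.hdf5 with the same groups and datasets
with h5py.File("tree_demo.h5", "w", driver="core", backing_store=False) as f:
    x = np.linspace(-np.pi, np.pi, 10)
    X, Y = np.meshgrid(x, x)
    f.create_dataset("axisgridx", data=x)
    f.create_dataset("grid/gridx", data=X)
    f.create_dataset("grid/gridy", data=Y)
    f.create_dataset("values/valsx", data=np.sin(X))
    f.create_dataset("values/valsy", data=np.cos(X))

    # Call record(name, obj) once for every group and dataset below the root
    f.visititems(record)

for name, kind, shape, dtype in entries:
    print(name, kind, shape, dtype)
```

The names passed to the callback are relative to the group on which visititems is called, so they carry no leading slash here.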

Of course you can do much more with the HDF data format, and we can only show the tip of the iceberg. For example, HDF supports transparent compression of the data, further descriptive attributes for nodes, special sorting and indexing of data for fast retrieval, and much more. For more information you should read the h5py user guide at http://docs.h5py.org/en/2.3/.
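As a small taste of two of these features, the following sketch stores a dataset with gzip compression and attaches a descriptive attribute to it. The file name, dataset name, attribute text and compression level are illustrative choices, and the file is again kept in memory.

```python
import numpy as np
import h5py

data = np.zeros((200, 200))

# In-memory file with illustrative names; nothing is written to disk.
with h5py.File("compression_demo.h5", "w", driver="core", backing_store=False) as f:
    # Request gzip compression; level 4 is an arbitrary choice between 0 and 9
    dset = f.create_dataset("field", data=data,
                            compression="gzip", compression_opts=4)

    # A descriptive attribute attached to the dataset
    dset.attrs["description"] = "all-zero test field"

    # Compression is transparent: reading looks exactly like the uncompressed case
    restored = dset[...]
    compression = dset.compression
    chunks = dset.chunks          # compressed datasets are stored in chunks
    description = dset.attrs["description"]

print(compression, chunks, description, restored.shape)
```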

Footnotes

[2] The people responsible for taking care of the HDF standard have their home at http://www.hdfgroup.org/.