h5cube File Specification¶
Gaussian \(\small \textbf{CUBE}\) files are a common format for storing molecular geometric and
volumetric field data from quantum/computational chemical calculations. The data
in these files is stored as plain text, and thus their compressibility by
standard tools such as gzip
and bzip2
is limited, even in cases where
the numerical structure of the dataset itself might be well suited for
greater compression by other means.
It is the purpose of this document to define a file specification for storing
the data contained in Gaussian \(\small \textbf{CUBE}\) files in the binary
HDF5 format. HDF5
was chosen due to its acceptance across numerous disciplines as a standardized,
cross-platform data storage format, and for the free, cross-platform compression
algorithms integrated into it
(see here ).
In circumstances where ‘semi-lossy’ compression (e.g., truncation of precision
and/or data thresholding) is
acceptable, particularly large reductions in file size are feasible.
Documentation at the companion Python project h5cube
(ReadTheDocs |
GitHub ) will eventually
illustrate representative compression factors achievable in the \(\small \textbf{h5cube}\)
file format; preliminary data can be found at this
Google Spreadsheet
.
Definition of an explicit specification for the \(\small \textbf{h5cube}\) format is anticipated to facilitate direct, cross-platform binary read and write of \(\small \textbf{CUBE}\) data. Particular advantages should be observed by applications aware of the HDF5 format and of this specification when reading data, gained from rapid retrieval of individual data points or subsets directly from \(\small \textbf{h5cube}\) files without the need to load the full dataset. Even in instances where the full dataset must be loaded into memory, the disk usage will still be significantly reduced in most cases, since it should be unnecessary to recreate an intermediate uncompressed \(\small \textbf{CUBE}\) file.
The Gaussian \(\small \textbf{CUBE}\) file format itself is also delineated here, using a “field”-style syntax for convenient cross-reference from the various version(s) of the \(\small \textbf{h5cube}\) specification.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119. The specification versioning follows in the spirit of Semantic Versioning; see the root specifications page for more information.