Gaussian CUBE File Format

Disclaimer

The \(\small \textbf{CUBE}\) file format as described here is NOT an official specification sanctioned by Gaussian, Inc. Instead, it is a best-effort attempt to define the contents of a representative subset of the \(\small \textbf{CUBE}\) files in circulation. FILES FORMATTED TO THIS SPECIFICATION MAY NOT BE COMPATIBLE WITH ALL SOFTWARE SUPPORTING \(\small \textbf{CUBE}\) FILE INPUT.

Overview

The \(\small \textbf{CUBE}\) file format is described on the Gaussian webpage as part of the documentation of the cubegen utility [Gau16]. As noted there, all data in \(\small \textbf{CUBE}\) files MUST be stored in atomic units (electrons and Bohrs, and units derived from these).

The format specification on the webpage of the VMD visualization program [UIUC16] provides a cleaner layout of one possible arrangement of \(\small \textbf{CUBE}\) file contents. In particular, the Gaussian specification is ambiguous about whitespace requirements, so parsing of \(\small \textbf{CUBE}\) files SHOULD accommodate some variation in the format, including (i) variable amounts/types of whitespace between the values on a given line, and (ii) the presence of leading and/or trailing whitespace on a given line.
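As a minimal sketch of the whitespace tolerance recommended above, the hypothetical Python helpers below rely on `str.split()` with no argument. It splits on any run of spaces or tabs and discards leading and trailing whitespace, including the newline. The names `split_fields` and `parse_floats` are illustrative only.

```python
# Hypothetical helpers (illustrative names) for whitespace-tolerant parsing.
from typing import List


def split_fields(line: str) -> List[str]:
    """Split a CUBE line into fields, ignoring leading/trailing whitespace
    and any run of spaces or tabs between values."""
    # str.split() with no argument splits on runs of arbitrary whitespace
    # and never returns empty strings, so padding and the newline vanish.
    return line.split()


def parse_floats(line: str) -> List[float]:
    """Convert every whitespace-separated field on a line to a float."""
    return [float(tok) for tok in split_fields(line)]


print(parse_floats("   -5.0  \t 0.25   1.5E+00 \n"))  # [-5.0, 0.25, 1.5]
```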

The \(\small \textbf{CUBE}\) file format as laid out below uses tagged fields ({FIELD (type)}) to indicate the types of the various data elements and where they are located in the file. Descriptions of the fields follow the field layout. Lowercase algebraic symbols \((x, y, z)\) indicate coordinates in the frame of the molecular geometry, whereas uppercase algebraic symbols \((X, Y, Z)\) indicate coordinates in the voxel grid defined by {XAXIS}, {YAXIS}, and {ZAXIS}.

All fields except for {DSET_IDS} and {NVAL} MUST be present in all files.

{DSET_IDS} MUST be present if {NATOMS} is negative; it MUST NOT be present if {NATOMS} is positive.

{NVAL} MAY be omitted if its value would be equal to one; it MUST be absent or have a value of one if {NATOMS} is negative.
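The three presence rules above can be checked mechanically. The sketch below is a hypothetical validator (its name and signature are assumptions, not part of any existing library) that returns the violated rules as strings; `nval` is `None` when {NVAL} was omitted from the file.

```python
# Hypothetical validator (illustrative name) for the {DSET_IDS}/{NVAL} rules.
from typing import List, Optional


def optional_field_errors(natoms: int, has_dset_ids: bool,
                          nval: Optional[int]) -> List[str]:
    """Return a list of rule violations (empty if the combination is valid).

    nval is None when {NVAL} is omitted from the third line of the file.
    """
    errors = []
    if natoms < 0 and not has_dset_ids:
        errors.append("DSET_IDS must be present when NATOMS is negative")
    if natoms > 0 and has_dset_ids:
        errors.append("DSET_IDS must not be present when NATOMS is positive")
    if natoms < 0 and nval not in (None, 1):
        errors.append("NVAL must be absent or 1 when NATOMS is negative")
    return errors
```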

Field Layout

{COMMENT1 (str)}
{COMMENT2 (str)}
{NATOMS (int)} {ORIGIN (3x float)} {NVAL (int)}
{XAXIS (int) (3x float)}
{YAXIS (int) (3x float)}
{ZAXIS (int) (3x float)}
{GEOM (int) (float) (3x float)}
      .
      .
{DSET_IDS (#x int)}
      .
      .
{DATA (#x scinot)}
      .
      .

Field Descriptions

{COMMENT1 (str)} and {COMMENT2 (str)}

Two lines of text at the head of the file. Per VMD [UIUC16], by convention {COMMENT1} is typically the title of the system and {COMMENT2} is a description of the property/content stored in the file, but they MAY contain any text. For robustness, both of these fields SHOULD NOT be zero-length. Similarly, although no maximum length is defined for either field, both SHOULD NOT exceed 80 characters.

{NATOMS (int)}

The absolute value of this first field on the third line indicates the number of atoms \(N_A\) present in the system. A negative value indicates the \(\small \textbf{CUBE}\) file MUST contain the {DSET_IDS} line(s); a positive value indicates the file MUST NOT contain this/these lines.

The value of \(N_A\) also specifies the number of rows of molecular geometry data that MUST be present in {GEOM}.

The \(\small \textbf{CUBE}\) specification is silent as to whether a zero value is permitted for {NATOMS}; regardless, it is probable that many applications do not support \(\small \textbf{CUBE}\) files with no atoms. Accordingly, this specification hereby declares that {NATOMS} MUST be nonzero.
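A hypothetical helper that turns the raw {NATOMS} value into the atom count \(N_A\) and a flag for whether {DSET_IDS} follows, rejecting zero as required above, might look like this (the name `interpret_natoms` is illustrative only):

```python
# Hypothetical helper (illustrative name) for interpreting {NATOMS}.
from typing import Tuple


def interpret_natoms(natoms: int) -> Tuple[int, bool]:
    """Return (N_A, dset_ids_present) for a raw {NATOMS} value."""
    if natoms == 0:
        # This specification requires NATOMS to be nonzero.
        raise ValueError("NATOMS must be nonzero")
    # |NATOMS| is the atom count (and the number of {GEOM} rows);
    # a negative sign flags the presence of {DSET_IDS}.
    return abs(natoms), natoms < 0
```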

{ORIGIN (3x float)}

This set of three fields defines the displacement vector from the geometric origin of the system \(\left(0,0,0\right)\) to the reference point \(\left(x_0, y_0, z_0\right)\) for the spanning vectors defined in {XAXIS}, {YAXIS}, and {ZAXIS}.
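Under the usual reading of this field, in which the three axis vectors are per-voxel step vectors and voxel \((0, 0, 0)\) sits at \((x_0, y_0, z_0)\), the geometric-frame position of voxel \((i, j, k)\) is \(\vec r_0 + i\vec X + j\vec Y + k\vec Z\). The sketch below evaluates this expression; `voxel_position` is a hypothetical name, and the step-vector interpretation is an assumption rather than something this section states outright.

```python
# Hypothetical helper (illustrative name). Assumes voxel (0, 0, 0) sits at
# {ORIGIN} and the axis vectors are per-voxel step vectors.
from typing import Sequence, Tuple


def voxel_position(origin: Sequence[float], xaxis: Sequence[float],
                   yaxis: Sequence[float], zaxis: Sequence[float],
                   i: int, j: int, k: int) -> Tuple[float, ...]:
    """Position (in Bohr) of voxel (i, j, k) in the geometric frame."""
    # r = r0 + i*X + j*Y + k*Z, evaluated component by component
    return tuple(o + i * xa + j * ya + k * za
                 for o, xa, ya, za in zip(origin, xaxis, yaxis, zaxis))


print(voxel_position((-1.0, 0.0, 0.0), (0.5, 0.0, 0.0),
                     (0.0, 0.5, 0.0), (0.0, 0.0, 0.5), 2, 1, 0))
# (0.0, 0.5, 0.0)
```

Because the axis vectors are added as full vectors, the same expression also covers non-orthogonal grids.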

{NVAL (int)}

If {NATOMS} is positive, this field indicates the number of data values \(N_V\) that are recorded at each point in the voxel grid; it MAY be omitted, in which case a value of one is assumed.

If {NATOMS} is negative, this field MUST be either absent or have a value of one.
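Putting {NATOMS}, {ORIGIN}, and the optional {NVAL} together, a minimal parser for the third line might read as follows. `parse_third_line` is a hypothetical name, and the sketch assumes the line has already been read as a string.

```python
# Hypothetical parser (illustrative name) for the third line of a CUBE file.
from typing import Tuple


def parse_third_line(line: str) -> Tuple[int, Tuple[float, float, float], int]:
    """Parse '{NATOMS} {ORIGIN} [{NVAL}]' into (natoms, origin, nval)."""
    fields = line.split()  # tolerant of extra/leading/trailing whitespace
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 fields, found {len(fields)}")
    natoms = int(fields[0])
    origin = (float(fields[1]), float(fields[2]), float(fields[3]))
    # An omitted NVAL is taken to be one
    nval = int(fields[4]) if len(fields) == 5 else 1
    if natoms < 0 and nval != 1:
        raise ValueError("NVAL must be absent or 1 when NATOMS is negative")
    return natoms, origin, nval
```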

{XAXIS (int) (3x float)}

The first field on this line is an integer indicating the number of voxels \(N_X\) present along the \(X\)-axis of the volumetric region represented by the \(\small \textbf{CUBE}\) file. This value SHOULD always be positive. Although the input to the cubegen [Gau16] utility allows a negative value here as a flag for the units of the axis dimensions, distance units in a \(\small \textbf{CUBE}\) file MUST always be Bohrs, so a negative sign serves no purpose as a ‘units flag’. Nevertheless, it is prudent to design applications to handle a negative value here gracefully, viz., by disregarding its sign.

The second through fourth values on this line are the components of the vector \(\vec X\) defining the voxel \(X\)-axis. As noted in the Gaussian documentation [Gau16], the voxel axes need be neither orthogonal nor aligned with the geometry axes. However, many tools only support voxel axes that are aligned with the geometry axes (and thus are also mutually orthogonal). In this case, the first float value \(X_x\) will be positive and the other two, \(X_y\) and \(X_z\), will be identically zero.
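The sketch below parses one axis line. Its hypothetical `allow_negative` flag is meant for {XAXIS} only: it disregards a negative sign, as suggested above, whereas the {YAXIS} and {ZAXIS} counts are required to be positive. The function name is illustrative, and treating a zero count as an error is an additional assumption of this sketch.

```python
# Hypothetical parser (illustrative name) for an {XAXIS}/{YAXIS}/{ZAXIS} line.
import warnings
from typing import Tuple


def parse_axis_line(line: str, allow_negative: bool = False
                    ) -> Tuple[int, Tuple[float, float, float]]:
    """Return (voxel count, axis vector) for one axis line.

    Pass allow_negative=True for {XAXIS}: a negative count is then accepted
    and its sign discarded, since CUBE distances are always in Bohr.
    """
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, found {len(fields)}")
    count = int(fields[0])
    if count < 0 and allow_negative:
        warnings.warn("negative voxel count on XAXIS line; sign ignored")
        count = -count
    if count <= 0:
        raise ValueError("voxel count must be positive")
    vector = (float(fields[1]), float(fields[2]), float(fields[3]))
    return count, vector
```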

{YAXIS (int) (3x float)}

This line defines the \(Y\)-axis of the volumetric region of the \(\small \textbf{CUBE}\) file, in nearly identical fashion to {XAXIS}. The key differences are: (1) the first integer field \(N_Y\) MUST always be positive; and (2) when the voxel axes are aligned with the geometry axes, the second float field \(Y_y\) will be positive and the first and third float fields, \(Y_x\) and \(Y_z\), will be identically zero.

{ZAXIS (int) (3x float)}

This line defines the \(Z\)-axis of the volumetric region of the \(\small \textbf{CUBE}\) file, in nearly identical fashion to {YAXIS}. The key difference is that when the voxel axes are aligned with the geometry axes, the third float field \(Z_z\) will be positive and the first and second float fields, \(Z_x\) and \(Z_y\), will be identically zero.
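Because many tools support only grids whose voxel axes follow the geometry axes, a reader may want to detect that case. The hypothetical check below tests for positive diagonal components and identically zero off-diagonal components, as described for {XAXIS}, {YAXIS}, and {ZAXIS}. Comparing against exact zero rather than a tolerance is a deliberate simplification in this sketch.

```python
# Hypothetical check (illustrative name) for geometry-aligned voxel axes.
from typing import Sequence


def axes_are_aligned(xaxis: Sequence[float], yaxis: Sequence[float],
                     zaxis: Sequence[float]) -> bool:
    """True if X = (X_x, 0, 0), Y = (0, Y_y, 0), Z = (0, 0, Z_z) with
    X_x, Y_y, Z_z > 0, i.e. the voxel axes follow the geometry axes."""
    for row, vector in enumerate((xaxis, yaxis, zaxis)):
        for col, component in enumerate(vector):
            if row == col and not component > 0:
                return False  # diagonal component must be positive
            if row != col and component != 0:
                return False  # off-diagonal component must be exactly zero
    return True
```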

{GEOM (int) (float) (3x float)}

This field MUST have \(N_A\) rows with the composition described below.

Each row of this field provides atom identity and position information for an atom in the molecular system of the \(\small \textbf{CUBE}\) file:

  • (int) - Atomic number of atom \(a\)
  • (float) - Nuclear charge of atom \(a\) (this differs from the atomic number when an effective core potential (ECP) is used)
  • (3x float) - Position of the atom in the geometric frame of reference \(\left(x_a, y_a, z_a\right)\)
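A minimal sketch of reading the \(N_A\) rows of {GEOM} follows; the `Atom` record and the `parse_geom` function are hypothetical names chosen for illustration.

```python
# Hypothetical record and parser (illustrative names) for {GEOM} rows.
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    atomic_number: int
    nuclear_charge: float
    position: Tuple[float, float, float]  # (x_a, y_a, z_a), in Bohr


def parse_geom(lines: List[str], n_atoms: int) -> List[Atom]:
    """Parse the first n_atoms lines of `lines` as {GEOM} rows."""
    if len(lines) < n_atoms:
        raise ValueError(f"expected {n_atoms} GEOM rows, found {len(lines)}")
    atoms = []
    for line in lines[:n_atoms]:
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"GEOM row needs 5 fields: {line!r}")
        x, y, z = (float(f) for f in fields[2:])
        atoms.append(Atom(int(fields[0]), float(fields[1]), (x, y, z)))
    return atoms
```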

{DSET_IDS (#x int)}

This field is only present if {NATOMS} is negative.

This field comprises one or more rows of integers, with a total of \(m+1\) values present, representing identifiers associated with the multiple {DATA} values stored at each voxel. In \(\small \textbf{CUBE}\) files containing wavefunction data, these identifiers are most commonly orbital indices. The first value MUST be positive and equal to \(m\), indicating the number of identifiers that follow. Each of the remaining \(m\) values may be any integer, although all of them SHOULD be unique. Further, all \(m\) values SHOULD be non-negative, as some applications may behave unpredictably if negative integers are provided.
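Because {DSET_IDS} may wrap across several rows, a reader has to keep collecting integers until \(m+1\) have been seen. The hypothetical parser below does this from a line iterator, leaving the iterator at the first {DATA} line. It warns, rather than fails, on the SHOULD-level uniqueness and non-negativity recommendations; that policy is a choice of this sketch.

```python
# Hypothetical parser (illustrative name) for the {DSET_IDS} field.
import warnings
from typing import Iterator, List


def parse_dset_ids(lines: Iterator[str]) -> List[int]:
    """Read integers across as many lines as needed until m + 1 are found.

    Returns the m identifiers (the leading count m is dropped). Lines are
    pulled from the iterator one at a time, so it is left positioned at
    the first {DATA} line.
    """
    tokens: List[int] = []
    for line in lines:
        tokens.extend(int(tok) for tok in line.split())
        if tokens and len(tokens) >= tokens[0] + 1:
            break
    if not tokens or tokens[0] <= 0:
        raise ValueError("first DSET_IDS value must be a positive count m")
    m = tokens[0]
    if len(tokens) != m + 1:
        raise ValueError(f"expected {m + 1} DSET_IDS values, found {len(tokens)}")
    ids = tokens[1:]
    if len(set(ids)) != len(ids):
        warnings.warn("DSET_IDS identifiers are not unique")
    if any(ident < 0 for ident in ids):
        warnings.warn("negative DSET_IDS identifiers may confuse some readers")
    return ids
```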

{DATA (#x scinot)}

This field encompasses the remainder of the \(\small \textbf{CUBE}\) file. Typical formatted \(\small \textbf{CUBE}\) output has up to six values on each line, in whitespace-separated scientific notation. Non-numeric data values are not supported and MUST NOT be present.

If {NATOMS} is positive, a total of \(N_X N_Y N_Z N_V\) values should be present, flattened as follows (in the Python sketch below, all loop indices start from zero):

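def write_data(f, data_array):
    # f: writable text stream; data_array: NumPy array of shape (NX, NY, NZ, NV)
    NX, NY, NZ, NV = data_array.shape
    for i in range(NX):
        for j in range(NY):
            for k in range(NZ):
                for l in range(NV):
                    # Each value in scientific notation, preceded by a space
                    f.write(f" {data_array[i, j, k, l]:.5E}")
                    # Break the line after every sixth value of the block
                    if (k * NV + l) % 6 == 5:
                        f.write("\n")
            # End the (X_i, Y_j) block unless a newline was just written
            if (NZ * NV) % 6 != 0:
                f.write("\n")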
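A reader can undo this flattening without depending on the line layout: collect every value in file order and reshape. Because the innermost loop index varies fastest, NumPy's default (C-order) `reshape` recovers `data_array[i, j, k, l]` directly. The sketch below assumes NumPy is available; `read_data` is a hypothetical name, and for negative {NATOMS} its `nv` argument should be set to \(m\).

```python
# Hypothetical reader (illustrative name); assumes NumPy is installed.
import numpy as np


def read_data(lines, nx, ny, nz, nv):
    """Read the {DATA} field into an array of shape (nx, ny, nz, nv).

    Values are gathered without regard to line breaks, so neither the
    six-per-line convention nor the per-(X_i, Y_j) newline is relied upon.
    """
    values = np.array([float(tok) for line in lines for tok in line.split()])
    expected = nx * ny * nz * nv
    if values.size != expected:
        raise ValueError(f"expected {expected} values, found {values.size}")
    # The last index (l) varies fastest, then k, j, i: NumPy's default C order.
    return values.reshape((nx, ny, nz, nv))
```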

If {NATOMS} is negative and \(m\) datasets are present (see {DSET_IDS} above), a total of \(N_X N_Y N_Z m\) values should be present, flattened as follows:

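def write_dset_data(f, data_array):
    # f: writable text stream; data_array: shape (NX, NY, NZ, m), m from {DSET_IDS}
    NX, NY, NZ, m = data_array.shape
    for i in range(NX):
        for j in range(NY):
            for k in range(NZ):
                for l in range(m):
                    # Each value in scientific notation, preceded by a space
                    f.write(f" {data_array[i, j, k, l]:.5E}")
                    # Break the line after every sixth value of the block
                    if (k * m + l) % 6 == 5:
                        f.write("\n")
            # End the (X_i, Y_j) block unless a newline was just written
            if (NZ * m) % 6 != 0:
                f.write("\n")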

The sequence of the data values along the last (l) dimension of the data array for each i, j, k MUST match the sequence of the identifiers provided in {DSET_IDS} in order for the dataset to be interpreted properly.
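Once the data are held in an array indexed as above, pairing each identifier with its slice along the last axis is straightforward. The sketch below (hypothetical name `split_datasets`, NumPy assumed) returns a dictionary keyed by identifier. Note that duplicate identifiers, which the {DSET_IDS} description discourages, would silently overwrite one another here.

```python
# Hypothetical helper (illustrative name); assumes NumPy is installed.
import numpy as np


def split_datasets(data: np.ndarray, dset_ids):
    """Map each {DSET_IDS} identifier to its (NX, NY, NZ) sub-array.

    data has shape (NX, NY, NZ, m); position l along the last axis is
    paired with the l-th identifier, in file order.
    """
    if data.shape[-1] != len(dset_ids):
        raise ValueError("last data dimension must equal the number of identifiers")
    return {ident: data[..., l] for l, ident in enumerate(dset_ids)}
```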

Regardless of the sign of {NATOMS}, as illustrated above, a newline is typically inserted after the block of data corresponding to each \(\left(X_i, Y_j\right)\) pair, so that each such block begins on a new line.