Simulation base section
In NOMAD, all the simulation metadata is defined in the Simulation section. You can find its Python schema definition in src/nomad_simulations/general.py. This section appears under the data section in the archive metadata structure of each entry.
The Simulation section inherits from a base section, BaseSimulation. NOMAD defines a set of base sections derived from the Basic Formal Ontology (BFO), and we use them to define BaseSimulation as an Activity. The UML diagram is:
BaseSimulation contains general information about the Program used, as well as the general timing of the simulation, e.g., the datetime at which it started (datetime) and ended (datetime_end). Simulation contains further information about the specific input and output sections (see below). The detailed UML diagram of the quantities and functions defined for Simulation is thus:
Notation for the section attributes in the UML diagram
We include the information about each attribute / quantity after its name. The notation is:
<name-of-quantity>: <type-of-quantity>, <units-of-quantity>
Thus, cpu1_start: np.float64, s means that there is a quantity named 'cpu1_start' of type numpy.float64 and whose units are 's' (seconds).
We also indicate the existence of sub-sections by bolding the name, i.e.:
<name-of-sub-section>: <sub-section-definition>
E.g., there is a sub-section under Simulation named 'model_method' whose section definition can be found in the ModelMethod section. In the future, we will represent this sub-section containment in more complex UML diagrams using the containment arrow (see below for an example using Program).
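As a compact illustration of this notation, the hypothetical helper below (it is not part of NOMAD) splits an entry such as cpu1_start: np.float64, s into its name, type and units. It is only a sketch and assumes that the type itself contains no comma; entries without units, such as the sub-section lines, get units=None.

```python
from typing import NamedTuple, Optional


class QuantityNotation(NamedTuple):
    name: str
    dtype: str
    units: Optional[str]  # None when no units are given (e.g. sub-section entries)


def parse_notation(entry: str) -> QuantityNotation:
    """Split '<name>: <type>, <units>' into its parts (hypothetical helper)."""
    name, _, rest = entry.partition(":")
    dtype, _, units = rest.partition(",")
    return QuantityNotation(name.strip(), dtype.strip(), units.strip() or None)


print(parse_notation("cpu1_start: np.float64, s"))
# QuantityNotation(name='cpu1_start', dtype='np.float64', units='s')
```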
We use double inheritance from EntryData in order to populate the data section in the NOMAD archive. All of the base sections discussed here are subject to the public normalize function in NOMAD. The private function set_system_branch_depth() is related to the ModelSystem base section.
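The effect of the double inheritance on normalize can be sketched with plain Python stand-ins. The classes below only borrow the NOMAD names; they are simplified (e.g., the real Activity derives from further base sections) and are not the NOMAD metainfo classes. The sketch assumes that every normalize override calls super().normalize(archive, logger), and it records the order in which each class's normalize runs.

```python
# Plain-Python stand-ins that only borrow the NOMAD section names
class ArchiveSection:
    def normalize(self, archive, logger):
        # Record which class's normalize ran, to expose the call order
        archive.setdefault("normalized", []).append("ArchiveSection")


class Activity(ArchiveSection):
    def normalize(self, archive, logger):
        super().normalize(archive, logger)
        archive["normalized"].append("Activity")


class BaseSimulation(Activity):
    def normalize(self, archive, logger):
        super().normalize(archive, logger)
        archive["normalized"].append("BaseSimulation")


class EntryData(ArchiveSection):
    def normalize(self, archive, logger):
        super().normalize(archive, logger)
        archive["normalized"].append("EntryData")


class Simulation(BaseSimulation, EntryData):
    # Double inheritance: Activity branch plus EntryData
    def normalize(self, archive, logger):
        super().normalize(archive, logger)
        archive["normalized"].append("Simulation")


archive = {}
Simulation().normalize(archive, logger=None)
print(archive["normalized"])
# ['ArchiveSection', 'EntryData', 'Activity', 'BaseSimulation', 'Simulation']
```

Because each override delegates to super(), Python's method resolution order (Simulation, BaseSimulation, Activity, EntryData, ArchiveSection) visits both branches, and the shared base runs only once.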
Main sub-sections in Simulation
The Simulation base section is composed of 4 main sub-sections:
- Program: contains all the program information, e.g., name of the program, version, etc.
- ModelSystem: contains all the system information about geometrical positions of atoms, their states, simulation cells, symmetry information, etc.
- ModelMethod: contains all the methodological information, and it is divided into two main aspects: the mathematical model or approximation used in the simulation (e.g., DFT, GW, ForceFields, etc.) and the numerical settings used to compute the properties (e.g., meshes, self-consistent parameters, basis set settings, etc.).
- Outputs: contains all the output properties, as well as references to the ModelSystem used to obtain such properties. It might also contain information that will populate ModelSystem (e.g., atomic occupations, atomic moments, crystal field energies, etc.).
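The containment of these four sub-sections, and the way Outputs points back to a ModelSystem rather than copying it, can be sketched with plain dataclasses. This is an illustrative stand-in for the metainfo schema, not the NOMAD classes: the repeating sub-sections are modelled as lists (an assumption of this sketch), and the field name model_system_ref is used for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Program:
    name: Optional[str] = None
    version: Optional[str] = None


@dataclass
class ModelSystem:
    # The real section holds positions, cells, symmetry, ...; a label suffices here
    label: str = ""


@dataclass
class ModelMethod:
    # e.g. "DFT", "GW", "ForceFields"
    name: str = ""


@dataclass
class Outputs:
    # Reference to (not a copy of) the ModelSystem used to obtain these outputs
    model_system_ref: Optional[ModelSystem] = None


@dataclass
class Simulation:
    program: Optional[Program] = None
    model_system: List[ModelSystem] = field(default_factory=list)
    model_method: List[ModelMethod] = field(default_factory=list)
    outputs: List[Outputs] = field(default_factory=list)


simulation = Simulation(program=Program(name="SUPERCODE", version="1.0"))
system = ModelSystem(label="bulk Si")
simulation.model_system.append(system)
simulation.model_method.append(ModelMethod(name="DFT"))
simulation.outputs.append(Outputs(model_system_ref=system))
```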
Self-consistent steps, SinglePoint entries, and more complex workflows.
The minimal unit for storing data in the NOMAD archive is an entry. In the context of simulation data, an entry may contain data from a calculation on an individual system configuration (e.g., a single-point DFT calculation) using only the above-mentioned sub-sections of the Simulation section. Information from the self-consistent iterations used to converge properties for this configuration is also contained within these sections.
More complex calculations that involve multiple configurations require the definition of a workflow section within the archive. Depending on the situation, the information from individual workflow steps may be stored within a single entry or across multiple entries. For example, for efficiency, the data from workflows involving a large number of configurations, e.g., molecular dynamics trajectories, are stored within a single entry. Other standard workflows store the single-point data in separate entries, e.g., a GW calculation is composed of a DFT SinglePoint entry and a GW SinglePoint entry. Higher-level workflows, which simply connect a series of standard or custom workflows, are typically stored as a separate entry. You can check the NOMAD simulations workflow schema for more information.
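The storage layout described above can be sketched with a toy data structure. Everything here is hypothetical (the Entry class, its kind labels, the entry ids and the resolve helper are not NOMAD objects); it only shows single-point entries, a trajectory kept in one entry, and a workflow entry that connects other entries by reference.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    entry_id: str
    kind: str  # "single_point", "trajectory" or "workflow" (labels for this sketch)
    references: List[str] = field(default_factory=list)  # ids of connected entries


def resolve(entry: Entry, index: Dict[str, Entry]) -> List[Entry]:
    """Return the entries that a workflow entry points to."""
    return [index[ref] for ref in entry.references]


entries = [
    Entry("dft-sp", "single_point"),   # one configuration, SCF steps kept inside
    Entry("gw-sp", "single_point"),
    Entry("md-run", "trajectory"),     # all MD frames stored in a single entry
    Entry("dft-gw", "workflow", references=["dft-sp", "gw-sp"]),
]
index = {e.entry_id: e for e in entries}
print([e.entry_id for e in resolve(index["dft-gw"], index)])  # ['dft-sp', 'gw-sp']
```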
The following schematic shows a simplified representation of the Simulation section (note that the arrows here are a simple way of visually defining inputs and outputs):
Program
The Program base section contains all the information about the program / software / code used to perform the simulation. We consider it to be a (Continuant) Entity, contained within BaseSimulation as a sub-section. The detailed UML diagram is:
When writing a parser, we recommend starting by instantiating the Program section and populating its quantities, in order to get acquainted with the NOMAD parsing infrastructure.
For example, imagine we have a file which we want to parse with the following information:
We can set the program name and parse its version by matching the text (see, e.g., the Wikipedia page for Regular expressions, also called regex):
from nomad.parsing.file_parser import TextParser, Quantity
from nomad_simulations import Simulation, Program
class SUPERCODEParser:
    """
    Class responsible for populating the NOMAD `archive` from the files given by a
    SUPERCODE simulation.
    """

    def parse(self, filepath, archive, logger):
        # Define the quantities to match in the mainfile using regular expressions
        output_parser = TextParser(
            quantities=[
                Quantity('program_version', r'version *([\d\.]+) *', repeats=False)
            ]
        )
        output_parser.mainfile = filepath

        # Instantiate `Simulation` and populate its `Program` sub-section
        simulation = Simulation()
        simulation.program = Program(
            name='SUPERCODE',
            version=output_parser.get('program_version'),
        )
        # `archive.data` is a single (non-repeating) sub-section, so assign it directly
        archive.data = simulation
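The regular expression used in the program_version quantity can be tried out on its own with Python's standard re module, independently of TextParser. The sample text below is hypothetical, since the example SUPERCODE output file is not reproduced here.

```python
import re

# Same pattern as in the `program_version` Quantity of the parser
VERSION_PATTERN = re.compile(r'version *([\d\.]+) *')


def find_program_version(text):
    """Return the first captured version string, or None if there is no match."""
    match = VERSION_PATTERN.search(text)
    return match.group(1) if match else None


sample = "SUPERCODE\nversion 1.2.3\n"  # hypothetical output snippet
print(find_program_version(sample))  # 1.2.3
```

Note that the pattern is not anchored, so a line such as "unit conversion 2.0" would also match; prefixing it with a word boundary (r'\bversion *([\d\.]+)') restricts matches to the whole word.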