Simulation base section
In NOMAD, all the simulation metadata is defined in the Simulation section. You can find its Python schema definition in src/nomad_simulations/general.py. This section appears under the data section in the archive metadata structure of each entry.
The Simulation section inherits from a base section, BaseSimulation. NOMAD defines a set of base sections derived from the Basic Formal Ontology (BFO), and we used them to define BaseSimulation as an Activity. The UML diagram is:
BaseSimulation contains the general information about the Program used, as well as general times of the simulation, e.g., the datetime at which it started (datetime) and ended (datetime_end). Simulation contains further information about the specific input and output sections (see below). The detailed UML diagram of quantities and functions defined for Simulation is thus:
Notation for the section attributes in the UML diagram
We include the information for each attribute / quantity after its definition. The notation is:
<name-of-quantity>: <type-of-quantity>, <units-of-quantity>
Thus, cpu1_start: np.float64, s means that there is a quantity named 'cpu1_start' of type numpy.float64 whose units are 's' (seconds).
We also indicate the existence of sub-sections by bolding the name, i.e.:
<name-of-sub-section>: <sub-section-definition>
E.g., there is a sub-section under Simulation named 'model_method' whose section definition can be found in the ModelMethod section. We will represent this sub-section containment in more complex UML diagrams in the future using the containment arrow (see below for an example using Program).
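To make the notation concrete, here is a minimal sketch that splits a label written in this notation into its parts. It uses only the Python standard library. The names parse_uml_label and AttributeLabel are hypothetical helpers introduced for illustration and are not part of NOMAD or nomad_simulations. Treating the units as optional, so that sub-section labels also parse, is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class AttributeLabel(NamedTuple):
    """One attribute label from the UML diagram (hypothetical helper type)."""

    name: str
    type_name: str
    units: Optional[str]  # None when the label carries no units


def parse_uml_label(label: str) -> AttributeLabel:
    """Split '<name>: <type>, <units>' into its parts; units are optional."""
    name, _, rest = label.partition(':')
    type_name, _, units = rest.partition(',')
    return AttributeLabel(name.strip(), type_name.strip(), units.strip() or None)


print(parse_uml_label('cpu1_start: np.float64, s'))
print(parse_uml_label('model_method: ModelMethod'))
```

The first call yields the name, type, and units of the quantity; the second shows a sub-section label, for which no units are given.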
We use double inheritance from EntryData in order to populate the data section in the NOMAD archive. All of the base sections discussed here are subject to the public normalize function in NOMAD. The private function set_system_branch_depth() is related to the ModelSystem base section.
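The inheritance pattern described here can be sketched with plain Python classes. This is only an illustration: Activity and EntryData below are empty stand-ins rather than the real NOMAD classes, and the order of the base classes and the body of normalize are assumptions.

```python
class Activity:
    """Stand-in for the BFO-derived NOMAD base section (not the real class)."""


class EntryData:
    """Stand-in for the NOMAD class used to populate the `data` section."""


class BaseSimulation(Activity):
    """General information: program used, start and end datetimes."""

    def normalize(self, archive, logger) -> None:
        # Public hook; in this sketch it only records that it ran.
        self.normalized = True


class Simulation(BaseSimulation, EntryData):
    """Inherits from both BaseSimulation and EntryData (double inheritance)."""

    def normalize(self, archive, logger) -> None:
        # Run the base-class normalization before any Simulation-specific step.
        super().normalize(archive, logger)


simulation = Simulation()
simulation.normalize(archive=None, logger=None)
print([cls.__name__ for cls in Simulation.__mro__])
```

Printing the method resolution order shows how Python linearizes the two parent classes, which determines which normalize implementation super() reaches first.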
Main sub-sections in Simulation
The Simulation base section is composed of 4 main sub-sections:
Program: contains all the program information, e.g., name of the program, version, etc.
ModelSystem: contains all the system information about geometrical positions of atoms, their states, simulation cells, symmetry information, etc.
ModelMethod: contains all the methodological information, and it is divided into two main aspects: the mathematical model or approximation used in the simulation (e.g., DFT, GW, ForceFields, etc.) and the numerical settings used to compute the properties (e.g., meshes, self-consistent parameters, basis sets settings, etc.).
Outputs: contains all the output properties, as well as references to the ModelSystem used to obtain such properties. It might also contain information which will populate ModelSystem (e.g., atomic occupations, atomic moments, crystal field energies, etc.).
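The containment of these four sub-sections can be pictured with a small standard-library sketch. The dataclasses below are illustrative stand-ins, not the NOMAD metainfo definitions. The attribute names model_system, outputs, and model_system_ref, the label fields, the use of lists, and the example values ('bulk Si', '1.0') are all assumptions made for this example; only 'model_method', name, and version appear in the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Program:
    name: Optional[str] = None
    version: Optional[str] = None


@dataclass
class ModelSystem:
    # Placeholder for positions, atomic states, cells, symmetry, ...
    label: str = ''


@dataclass
class ModelMethod:
    # Placeholder for the model (e.g. DFT, GW) and the numerical settings
    label: str = ''


@dataclass
class Outputs:
    # Reference to the system the properties were obtained for
    model_system_ref: Optional[ModelSystem] = None


@dataclass
class Simulation:
    program: Optional[Program] = None
    model_system: List[ModelSystem] = field(default_factory=list)
    model_method: List[ModelMethod] = field(default_factory=list)
    outputs: List[Outputs] = field(default_factory=list)


system = ModelSystem(label='bulk Si')  # made-up example value
simulation = Simulation(
    program=Program(name='SUPERCODE', version='1.0'),
    model_system=[system],
    model_method=[ModelMethod(label='DFT')],
    outputs=[Outputs(model_system_ref=system)],
)
print(simulation.outputs[0].model_system_ref is simulation.model_system[0])
```

The Outputs instance holds a reference to the same ModelSystem object stored under the simulation instead of a copy, mirroring the idea that outputs point back to the system they were computed for.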
Self-consistent steps, SinglePoint entries, and more complex workflows.
The minimal unit for storing data in the NOMAD archive is an entry. In the context of simulation data, an entry may contain data from a calculation on an individual system configuration (e.g., a single-point DFT calculation) using only the above-mentioned sub-sections of the Simulation section. Information from the self-consistent iterations used to converge properties for this configuration is also contained within these sections.
More complex calculations that involve multiple configurations require the definition of a workflow section within the archive. Depending on the situation, the information from individual workflow steps may be stored within a single entry or across multiple entries. For example, for efficiency, the data from workflows involving a large number of configurations, e.g., molecular dynamics trajectories, are stored within a single entry. Other standard workflows store the single-point data in separate entries, e.g., a GW calculation is composed of a DFT SinglePoint entry and a GW SinglePoint entry. Higher-level workflows, which simply connect a series of standard or custom workflows, are typically stored as a separate entry. You can check the NOMAD simulations workflow schema for more information.
The following schematic is a simplified representation of the Simulation section (note that the arrows here are a simple way of visually defining inputs and outputs):
Program
The Program base section contains all the information about the program / software / code used to perform the simulation. We consider it to be a (Continuant) Entity and contained within BaseSimulation as a sub-section. The detailed UML diagram is:
When writing a parser, we recommend starting by instantiating the Program section and populating its quantities, in order to get acquainted with the NOMAD parsing infrastructure.
For example, imagine we want to parse a file containing the following information:
We can parse the program name and version by matching the texts (see, e.g., the Wikipedia page for Regular expressions, also called regex):
from nomad.parsing.file_parser import TextParser, Quantity
from nomad_simulations import Simulation, Program


class SUPERCODEParser:
    """
    Class responsible to populate the NOMAD `archive` from the files given by a
    SUPERCODE simulation.
    """

    def parse(self, filepath, archive, logger):
        output_parser = TextParser(
            quantities=[
                Quantity('program_version', r'version *([\d\.]+) *', repeats=False)
            ]
        )
        output_parser.mainfile = filepath

        simulation = Simulation()
        simulation.program = Program(
            name='SUPERCODE',
            version=output_parser.get('program_version'),
        )
        # append `Simulation` as an `archive.data` section
        archive.data.append(simulation)
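To see what the regular expression in the parser captures, it can be tried in isolation with Python's built-in re module. This is a sketch of the matching step only: it does not use the NOMAD TextParser, the helper extract_version is hypothetical, and the sample text is invented for illustration and is not the actual SUPERCODE output format.

```python
import re
from typing import Optional

# Same pattern as the one passed to `Quantity` in the parser above.
VERSION_PATTERN = re.compile(r'version *([\d\.]+) *')

# Hypothetical output text; NOT the real SUPERCODE file format.
SAMPLE_OUTPUT = """\
Running SUPERCODE
version 1.2.3
"""


def extract_version(text: str) -> Optional[str]:
    """Return the first captured version string, or None if nothing matches."""
    match = VERSION_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_version(SAMPLE_OUTPUT))
print(extract_version('no version information here'))
```

The capture group accepts any run of digits and dots after the word "version", so a file that lacks such a line yields None and the parser should be prepared for a missing value.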