/*
 * This file is part of the GROMACS molecular simulation package.
 *
 * Copyright (c) 2018, by the GROMACS development team, led by
 * Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
 * and including many others, as listed in the AUTHORS file in the
 * top-level source directory and at http://www.gromacs.org.
 *
 * GROMACS is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public License
 * as published by the Free Software Foundation; either version 2.1
 * of the License, or (at your option) any later version.
 *
 * GROMACS is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with GROMACS; if not, see
 * http://www.gnu.org/licenses, or write to the Free Software Foundation,
 * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * If you want to redistribute modifications to GROMACS, please
 * consider that scientific software is very special. Version
 * control is crucial - bugs must be traceable. We will be happy to
 * consider code for inclusion in the official distribution, but
 * derived work must not be called official GROMACS. Details are found
 * in the README & COPYING files - if they are missing, get the
 * official version at http://www.gromacs.org.
 *
 * To help us fund GROMACS development, we humbly ask that you cite
 * the research papers on the package. Check out http://www.gromacs.org.
 */
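#ifndef GMXAPI_WORKFLOW_H
#define GMXAPI_WORKFLOW_H

/*!
 * \brief Declare public interface for Workflow and related infrastructure.
 */

#include <forward_list>
#include <map>    // std::map, used by Workflow::Impl
#include <memory> // std::unique_ptr
#include <string> // std::string, used by NodeKey

namespace gmxapi
{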
/*!
 * \brief Uniquely identify a workflow node in the graph.
 *
 * The key probably needs a human-readable aspect, some machine-decipherable encoding of the roles taken by the node,
 * and a hash to uniquely identify the output of the node (i.e. its deterministic input parameters). It is probably not
 * necessary for nodes to refer to the consumers of their output by key, but they should abstractly refer to their
 * inputs by a key that does not depend on a currently running workflow.
 *
 * Requirements and roles:
 *
 * * serve as a key for use by other nodes to name their inputs
 * * encode workflow scheduling hints (TBD)
 * * provide robust assurance of reproducible results and restartability
 * * allow nodes to specify only their immediately dependent nodes (inward-directed edges)
 *
 * Workflow specifications need to be serializable and portable across job restarts and across computing
 * resources. The data graph manager and/or work scheduler need to be able to look at the inputs specified for a node
 * and determine whether the required node or its output is available. If a node is used as the input for
 * multiple other nodes, it should be clear how to avoid wasting resources when meeting the data requirement. If
 * similar-looking nodes have different inputs or parameters, they must not be mistaken for equivalent nodes.
 *
 * Context-dependent aspects of the workflow specification therefore cannot be included in a hash, but
 * context-independent aspects that affect the output of a node must be reflected.
 *
 * For example, an input filename should be included as identifying information, but the absolute path should not,
 * though path hints or conventions should be clear in the context. The filename is sufficient as a parameter with which
 * to construct the workflow node in an execution context, but is insufficient to uniquely identify the file, since
 * many file names are reused frequently. Some sort of checksum of the file should also be included so that the inputs
 * of the workflow at execution time can be checked against the inputs when the workflow was specified.
 *
 * Uniqueness of inputs could be more elaborate. For instance, a node may require the trajectory of a specific
 * simulation as input, but flexibly handle starting from an arbitrary step in that trajectory to allow check-pointed
 *
 * The workflow object could keep a list of keys that can be instantiated with no input dependencies, the scheduler
 * could scan for keys that represent source nodes, or workflow containers could be turned into graphs through an
 * additional preprocessing or clustering phase. It will be easiest, though, to assert a protocol, such as: a node is
 * not instantiated or activated until its inputs are ready.
 *
 * This is just a type alias until a more elaborate implementation is needed.
 */
using NodeKey = std::string;
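
// Illustrative sketch only, not part of gmxapi. One way to meet the key requirements
// described above is to join a human-readable label, a role tag, and a hash of the
// context-independent parameters (for instance a file name plus a content checksum).
// The helper names fnv1a64 and makeNodeKey, the "label:role:hash" layout, and the
// choice of FNV-1a are all assumptions made for this example. FNV-1a is used instead
// of std::hash because its result is fixed by the algorithm rather than by the
// standard library implementation, so keys stay stable across restarts and machines.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: 64-bit FNV-1a hash of a byte string.
inline std::uint64_t fnv1a64(const std::string &data)
{
    std::uint64_t hash = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
    for (unsigned char c : data)
    {
        hash ^= c;
        hash *= 1099511628211ULL; // FNV-1a 64-bit prime
    }
    return hash;
}

// Hypothetical key layout: "<label>:<role>:<16 hex digits of the parameter hash>".
// Only context-independent parameters should be passed in `params`
// (a file name and checksum, not an absolute path).
inline std::string makeNodeKey(const std::string &label,
                               const std::string &role,
                               const std::string &params)
{
    std::ostringstream key;
    key << label << ':' << role << ':'
        << std::hex << std::setw(16) << std::setfill('0') << fnv1a64(params);
    return key.str();
}
```

// With this layout, two nodes that look alike but differ in any parameter get
// different keys, while re-specifying the same node reproduces the same key.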
// Forward declarations for definitions below.
class NodeSpecification;
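
// Illustrative sketch only, not part of gmxapi. The NodeKey notes above suggest the
// protocol that a node is not instantiated or activated until its inputs are ready.
// The example below shows one simple reading of that protocol on a stand-in data
// structure. The names Key, InputMap and activationOrder are hypothetical, and the
// real scheduler may work quite differently.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Key = std::string;

// Hypothetical stand-in for a workflow: each key maps to the keys of its inputs.
using InputMap = std::map<Key, std::vector<Key>>;

// Return keys in an order in which every node appears only after all of its inputs.
// Nodes whose inputs can never become ready (missing or cyclic) are left out.
inline std::vector<Key> activationOrder(const InputMap &inputs)
{
    std::vector<Key> order;
    std::set<Key>    ready;
    bool             progressed = true;
    while (progressed)
    {
        progressed = false;
        for (const auto &node : inputs)
        {
            if (ready.count(node.first) != 0)
            {
                continue; // already activated
            }
            bool inputsReady = true;
            for (const auto &input : node.second)
            {
                if (ready.count(input) == 0)
                {
                    inputsReady = false;
                    break;
                }
            }
            if (inputsReady)
            {
                ready.insert(node.first);
                order.push_back(node.first);
                progressed = true;
            }
        }
    }
    return order;
}
```

// Source nodes (those with no inputs) are activated first without any separate
// list of sources, which is the convenience the protocol is meant to provide.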
/*!
 * \brief Recipe for a computational workflow.
 *
 * Provides a lightweight and portable container defining the nodes and edges in a workflow with enough information for
 * the workflow to be instantiated and run.
 */
//! In the initial version, the implementation class is just a type alias.
using Impl = typename std::map<NodeKey, std::unique_ptr<NodeSpecification> >;
/*! \brief Use create() to get Workflow objects.
 *
 * An empty workflow is not meaningful except to a builder, which does not
 * yet exist. Even a builder, though, will probably create the implementation
 * object directly and the Workflow object from that.
 */
/*!
 * \brief Construct by transferring ownership of an implementation object.
 *
 * \param impl Implementation object to wrap.
 *
 * Usage:
 *
 *     gmxapi::Workflow::Impl newGraph;
 *     // configure graph...
 *     // Create workflow container
 *     gmxapi::Workflow work {std::move(newGraph)};
 *     gmxapi::launchSession(&context, work);
 */
explicit Workflow(Impl &&impl);
/*!
 * \brief Add a node to the workflow graph.
 *
 * The work specification must already have its inputs assigned to existing
 * nodes. This operation should only be permitted if it does not render a
 * valid workflow invalid.
 *
 * \param spec Operational node to add to the Workflow.
 *
 * \return Key for the new node in the Workflow container.
 *
 * \todo Not yet implemented.
 */
NodeKey addNode(std::unique_ptr<NodeSpecification> spec);
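
// Illustrative sketch only. addNode() is documented above as not yet implemented,
// so this is not the library's behavior. It shows one way the stated precondition
// (inputs already assigned to existing nodes) could be checked before insertion.
// SketchNode, SketchGraph and tryAddNode are hypothetical names, and storing input
// keys directly on the node is an assumption of this example.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical node: only records the keys of the nodes it takes input from.
struct SketchNode
{
    std::vector<std::string> inputs;
};

using SketchGraph = std::map<std::string, SketchNode>;

// Insert `node` under `key` only if the key is unused and every input already exists.
// Returns true on success and leaves the graph untouched on failure.
inline bool tryAddNode(SketchGraph *graph, const std::string &key, const SketchNode &node)
{
    if (graph->count(key) != 0)
    {
        return false; // would silently replace an existing node
    }
    for (const auto &input : node.inputs)
    {
        if (graph->count(input) == 0)
        {
            return false; // dangling edge would make a valid workflow invalid
        }
    }
    graph->emplace(key, node);
    return true;
}
```

// Because inputs must exist before a node is added, a graph built only through such
// a function cannot contain cycles or dangling edges.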
/*!
 * \brief Get the node specification for a provided key.
 *
 * \param key Unique identifier for a node in the graph.
 * \return Copy of the node specification.
 */
std::unique_ptr<NodeSpecification> getNode(const gmxapi::NodeKey &key) const noexcept;
/*!
 * \brief Get an iterator to the node key--value pairs.
 *
 * \return Iterator across the nodes in the container.
 *
 * The order in which the nodes are returned is unspecified. Only a forward iterator is provided.
 */
Impl::const_iterator cbegin() const;
Impl::const_iterator cend() const;
// Allow range-based for loops to work before C++17.
Impl::const_iterator begin() const;
Impl::const_iterator end() const;
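
// Illustrative sketch only. The begin()/end() overloads above exist so that a
// Workflow can be used in a range-based for loop. The stand-in types below
// (SketchSpec, SketchWorkflow, listKeys) are hypothetical and only mirror the
// shape of the declarations in this header. They are not the gmxapi classes.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node specification holding only a parameter string.
struct SketchSpec
{
    std::string params;
};

// Hypothetical container exposing the same iterator interface as declared above.
class SketchWorkflow
{
    public:
        using Impl = std::map<std::string, std::unique_ptr<SketchSpec>>;

        explicit SketchWorkflow(Impl &&impl) : graph_(std::move(impl)) {}

        Impl::const_iterator begin() const { return graph_.cbegin(); }
        Impl::const_iterator end() const { return graph_.cend(); }

    private:
        Impl graph_;
};

// Collect every node key by iterating over the key--value pairs.
inline std::vector<std::string> listKeys(const SketchWorkflow &work)
{
    std::vector<std::string> keys;
    for (const auto &entry : work)
    {
        keys.push_back(entry.first); // entry.second is the owned specification
    }
    return keys;
}
```

// The stand-in happens to iterate in sorted key order because it wraps std::map,
// but callers should not rely on that, since the documented order is unspecified.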
/*!
 * \brief Create a new workflow.
 *
 * \param filename TPR filename accessible both to the client and library.
 * \return Ownership of a new Workflow instance.
 */
static std::unique_ptr<Workflow> create(const std::string &filename);
/*!
 * \brief Storage structure.
 */
/*!
 * \brief Portable specification to define work and inform instantiation by the library.
 *
 * The GROMACS library creates the objects it needs to run as late as possible while
 * optimizing parallel resources at run time. The specifications provide a way for
 * client code to interact with the definition of the work to be performed while carrying
 * enough information for GROMACS to launch.
 *
 * Client input is translated into serializable parameters sufficient to instantiate
 * the node at runtime.
 *
 * On the library side, the spec should have a pointer to a factory function for
 * the library object(s) it represents that is valid in the current Context. Thus,
 * when a workflow specification (and thus its node specifications) is cloned to new
 * Contexts, the Contexts must resolve an appropriate function pointer or raise an
 * appropriate exception indicating the specified work is not possible on the targeted
 *
 * Different node types will take different kinds of parameters.
 * \todo Clarify chain of responsibility for defining param type.
 */
class NodeSpecification
//! Base class is heritable.
virtual ~NodeSpecification();

//! Nodes can use an arbitrary param type, but string is the default.
using paramsType = std::string;
/*!
 * \brief Get an equivalent node for a new graph.
 *
 * \return Ownership of a new node specification.
 *
 * Allows a derived class to define its own copy behavior when accessed
 * through a base class pointer.
 *
 * Future versions may use this function to translate a node spec from one
 * context to another, in which case the context would likely be passed
 * as an argument, e.g. clone(&context) or cloneTo(&workspec). It may
 * be confusing for developers to manage the distinction between replicating
 * a node in a graph and using helper methods to copy the node-specific
 * parameters to a node in a new graph, so it is probably better to
 * reserve copy/move construction/assignment for internal code and use
 * well-named, well-documented free functions for such higher-level operations.
 * Furthermore, it is not universally intuitive what is meant by copying
 * a node without specifying what happens to edges and connected nodes.
 */
virtual std::unique_ptr<NodeSpecification> clone() = 0;
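
// Illustrative sketch only. The clone() declaration above follows the common
// "virtual copy" idiom, in which a derived class decides how it is copied when
// accessed through a base class pointer. SketchSpecBase and SketchMDSpec are
// hypothetical stand-ins, not gmxapi classes, and clone() is made const here as
// a choice of this example (the declaration above is non-const).

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class mirroring the clone()/params() interface.
class SketchSpecBase
{
    public:
        virtual ~SketchSpecBase() = default;
        virtual std::unique_ptr<SketchSpecBase> clone() const = 0;
        virtual std::string params() const = 0;
};

// Hypothetical derived node whose only parameter is an input file name.
class SketchMDSpec : public SketchSpecBase
{
    public:
        explicit SketchMDSpec(std::string filename) : filename_(std::move(filename)) {}

        // Copy-construct a new object of the derived type and hand ownership to the caller.
        std::unique_ptr<SketchSpecBase> clone() const override
        {
            return std::unique_ptr<SketchSpecBase>(new SketchMDSpec(*this));
        }

        std::string params() const override { return filename_; }

    private:
        std::string filename_;
};
```

// The caller receives an independent object with the same parameters. What happens
// to edges and connected nodes is deliberately outside the scope of this sketch.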
/*!
 * \brief Fetch current params value.
 *
 * \return Copy of internal params value.
 */
virtual paramsType params() const noexcept = 0;

//! Parameters for the operation represented by this node.
paramsType params_ {};
}      // end namespace gmxapi

#endif // GMXAPI_WORKFLOW_H