A pipeline is a sequence of operations that leads to a required result.
What exactly “result” means, and what kind of “operations” are used, depends heavily on the application domain. The expected application domain, in turn, influences the design of a tool for building pipelines.
The famous Unix utility make (known since 1976) was probably the first tool for building pipelines (although the word “pipeline” is rarely used in conjunction with make). Virtually all tools for building pipelines borrow from make, and MXP is no exception. What makes these tools different is the elementary units the pipeline operates on and the rules used to decide whether to re-execute a step or to reuse its existing results. These differences in turn shape the language used to describe the pipelines (e.g., Makefile syntax and semantics).
The units which MXP operates on are called targets, just as in make. A target is represented by a directory containing an arbitrary set of files (and possibly subdirectories). We often use the word “target” instead of the more exact term “target directory”.
As in the case of make, an MXP run consists of obtaining the target specified on the command line. In order to obtain a target, other targets may be needed. MXP checks whether the required targets have already been obtained and whether they are up-to-date; if not, MXP automatically rebuilds them, which may in turn require other targets. In other words, obtaining the required targets is a recursive process. Which targets are required for a given target, and how that target should be obtained from them, is specified in the Makefile (again, the term is borrowed from make).
What is Makefile and how to use it
A Makefile consists of rules. In MXP, the Makefile is a Bash script. Here is an example of a rule:
MXP_MAKEFILE[d01_pdata]="(idata_DIR = d00_idata) pdata_0 : pdata"
This rule states that:
- method pdata with parameter set pdata_0 should be used to obtain target d01_pdata
- during execution of method pdata, idata_DIR will be set to the full path to target directory d00_idata
Also, it implicitly states that:
- there is an analysis directory (the current directory or a directory explicitly specified in the MXP command-line arguments) that contains a subdirectory mxp with a file Makefile.sh inside it
- the target directory named d01_pdata will be created within the analysis directory as a result of obtaining target d01_pdata (or, if this directory already exists, MXP will check whether it is up-to-date and rebuild it if it is not)
- there is a file pdata.sh containing a Bash script that will be executed in order to obtain the target
- there is a file pdata_0.params.sh containing a Bash script (defining parameters) that will be executed in order to obtain the target
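To make the reading of the rule concrete, here is a minimal sketch that splits the example rule into its parts. It is not MXP's own parser: the helper describe_rule and its regular expression are hypothetical, and the sketch assumes that MXP_MAKEFILE is a Bash associative array (Bash 4 or later) and that every rule has exactly the shape of the example above.

```bash
#!/usr/bin/env bash
# Illustration only: this is NOT how MXP itself reads the Makefile.
# Assumption: MXP_MAKEFILE is an associative array (requires Bash 4+).
declare -A MXP_MAKEFILE
MXP_MAKEFILE[d01_pdata]="(idata_DIR = d00_idata) pdata_0 : pdata"

# describe_rule TARGET (hypothetical helper)
# Splits a rule of the form "(VAR = REQUIRED_TARGET) PARAMS : METHOD"
# and prints what each part means.
describe_rule() {
    local target=$1
    local rule=${MXP_MAKEFILE[$target]}
    local re='^\(([A-Za-z_][A-Za-z0-9_]*) *= *([^ )]+)\) *([^ :]+) *: *([^ ]+)$'

    if [[ $rule =~ $re ]]; then
        local var=${BASH_REMATCH[1]} required=${BASH_REMATCH[2]}
        local params=${BASH_REMATCH[3]} method=${BASH_REMATCH[4]}
        printf 'target: %s\n' "$target"
        printf 'method script: %s.sh\n' "$method"
        printf 'parameter script: %s.params.sh\n' "$params"
        printf '%s -> %s\n' "$var" "$required"
    else
        printf 'unrecognized rule for %s: %s\n' "$target" "$rule" >&2
        return 1
    fi
}

describe_rule d01_pdata
```

The sketch handles only a single required target per rule, because that is the only form shown here; the actual rule syntax may allow more.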
Strictly speaking, there is no difference between a parameter script and a method script. MXP introduces this distinction to encourage pipeline developers to clearly separate parameters from methods. Parameters may be changed by the pipeline user (for example, the user may want to use their own parameters for quality control), while methods are much more stable and are not expected to change from one pipeline application to another.
To determine if the target is up-to-date, MXP will check if:
- the target directory exists
- the last attempt to build the target was completed successfully
- all required targets are up-to-date
- the rule used to obtain the target has not been updated
- the method and parameter scripts used to obtain the target have not been updated
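The five checks above can be sketched as a recursive Bash function. Everything about the bookkeeping here is an assumption made for illustration: the marker file .built_ok (written after a successful build and holding the rule text used), the RULE, REQUIRES and SCRIPTS tables, and the script paths are hypothetical and do not describe how MXP actually stores this information.

```bash
#!/usr/bin/env bash
# Sketch only; requires Bash 4+ for associative arrays.
# Hypothetical bookkeeping (not MXP's real layout):
#   RULE[target]      rule text currently in the Makefile
#   REQUIRES[target]  space-separated list of required targets
#   SCRIPTS[target]   space-separated list of method/parameter scripts
#   target/.built_ok  written after a successful build, contains the rule used
declare -A RULE REQUIRES SCRIPTS

# is_up_to_date TARGET: returns 0 only if all five checks pass.
is_up_to_date() {
    local target=$1 marker="$1/.built_ok" dep script

    [[ -d $target ]] || return 1                 # 1. target directory exists
    [[ -f $marker ]] || return 1                 # 2. last build succeeded

    # 3. all required targets are up-to-date (recursive check);
    #    the expansion is unquoted on purpose to split the list on spaces
    for dep in ${REQUIRES[$target]}; do
        is_up_to_date "$dep" || return 1
    done

    # 4. the rule is the same as the one recorded at build time
    [[ $(<"$marker") == "${RULE[$target]}" ]] || return 1

    # 5. no script was modified after the successful build
    for script in ${SCRIPTS[$target]}; do
        [[ $script -nt $marker ]] && return 1
    done
    return 0
}

RULE[d01_pdata]="(idata_DIR = d00_idata) pdata_0 : pdata"
REQUIRES[d01_pdata]="d00_idata"
SCRIPTS[d01_pdata]="pdata.sh pdata_0.params.sh"   # hypothetical locations

if is_up_to_date d01_pdata; then
    echo "d01_pdata is up to date"
else
    echo "d01_pdata must be rebuilt"
fi
```

Comparing file modification times with -nt is one simple way to detect an updated script; a real tool might compare content checksums instead, which would not be fooled by a file that was touched without being changed.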
An important feature of MXP is that it allows new pipelines to be created by reusing pieces of existing pipelines. Each pipeline has a parent; only the root pipeline (which is part of the MXP base) has no parent. The Makefile, methods and parameter sets defined in the parent pipeline are available in the child pipeline, and the child pipeline may override exactly those pieces of the parent pipeline that need to be changed. How this works is described in detail in the section “Chaining Pipelines”.
Another important feature of MXP is logging. When a target is built, a full log is written to the target directory. This log can be examined later to learn exactly how the target was built (in the case of a successful build) or to find out why the build failed (in the case of failure).
It is also possible to save a log of a full MXP run, which may involve building multiple targets.
Looking at the example of a Makefile rule above, one may wonder why targets have names like d01_pdata rather than a plain name without the numeric prefix. The answer is simple. Targets are represented by directories, and you will often use the ls command (or ls -l) to examine which targets you already have. The ls command lists directories in alphabetical order, so prefixing directory names with a number ensures a convenient ordering.
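The effect of the numeric prefix can be seen in a small experiment. The directory d02_example below is a made-up name used only for illustration, and LC_ALL=C is set so that the sort order does not depend on the local language settings.

```bash
#!/usr/bin/env bash
# Create three target-like directories in a scratch location, deliberately
# out of order, and list them. d02_example is a hypothetical target name.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/d01_pdata" "$demo_dir/d02_example" "$demo_dir/d00_idata"

# ls sorts by name, so the numeric prefixes determine the listing order.
listing=$(LC_ALL=C ls "$demo_dir")
printf '%s\n' "$listing"

rm -r "$demo_dir"
```

The zero padding matters: with unpadded prefixes, a name starting with d10 would be listed before one starting with d2.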