
MXP Overview and Concepts

A pipeline is a sequence of operations that leads to a required result.

What exactly “result” means, and what kind of “operations” are used, depends heavily on the application domain. The expected application domain influences the design of a tool for building pipelines.

The famous Unix utility make (around since 1976) was probably the first tool for building pipelines, although the word “pipeline” is rarely used in connection with make. Virtually all tools for building pipelines borrow from make, and MXP is no exception. What distinguishes these tools is the elementary units that the pipeline operates on and the rules used to decide whether to re-execute a step or reuse its existing results. This difference ultimately shapes the language used to describe pipelines (for make, the syntax and semantics of the Makefile).

The units that MXP operates on are called targets (just as in make). A target is represented by a directory containing an arbitrary set of files (and possibly subdirectories). We often use the word “target” instead of the more exact term “target directory”.

As with make, running MXP means obtaining the target specified on the command line. Obtaining a target may require other targets. MXP checks whether the required targets have already been obtained and whether they are up-to-date; if not, MXP automatically rebuilds them. Rebuilding them may in turn require further targets, so obtaining the required targets is a recursive process. Which targets a given target requires, and how it is obtained from them, is specified in the Makefile (again, a term borrowed from make).
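The recursive process just described can be sketched in a few lines of Bash. This is a minimal illustration, not MXP's implementation: the dependency table DEPS, the function obtain, and the extra target d02_qc are hypothetical, "already obtained during this run" stands in for MXP's real up-to-date check, and dependency cycles are not detected.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of make-style recursive target resolution.
# DEPS, DONE, obtain and BUILD_ORDER are illustrative names, not MXP internals.
declare -A DEPS=(
  [d00_idata]=""
  [d01_pdata]="d00_idata"
  [d02_qc]="d01_pdata d00_idata"   # hypothetical extra target
)
declare -A DONE=()
BUILD_ORDER=()

obtain() {
  local target=$1 dep
  if [[ -n ${DONE[$target]:-} ]]; then
    return 0                       # already obtained in this run: reuse it
  fi
  for dep in ${DEPS[$target]:-}; do
    obtain "$dep"                  # obtain every required target first
  done
  BUILD_ORDER+=("$target")         # placeholder for running the target's method
  DONE[$target]=1
}

obtain d02_qc
echo "${BUILD_ORDER[*]}"           # d00_idata d01_pdata d02_qc
```

Each required target is obtained before the target that needs it, and a target required twice (d00_idata here) is obtained only once.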

What is a Makefile and how to use it

A Makefile consists of rules. In MXP, the Makefile is a Bash script, and each rule is an entry of the Bash associative array MXP_MAKEFILE, keyed by target name. Here is an example of a rule:

MXP_MAKEFILE[d01_pdata]="(idata_DIR = d00_idata) pdata_0 : pdata"

This rule states that:

target d01_pdata requires target d00_idata
method pdata with parameter set pdata_0 should be used to obtain target d01_pdata from target d00_idata
during execution of method pdata, the environment variable idata_DIR will be set to the full path of target directory d00_idata
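To make the rule syntax concrete, the following sketch splits the example rule into its parts and derives the environment assignment described above. The regular expressions are inferred from this single example and are an assumption; the hypothetical helpers parse_rule and parse_input handle exactly one "VAR = target" mapping, and MXP may accept forms they do not.

```bash
#!/usr/bin/env bash
# Hypothetical parser for the rule format shown above; the grammar is
# inferred from one example and handles a single "VAR = target" mapping only.
declare -A MXP_MAKEFILE
MXP_MAKEFILE[d01_pdata]="(idata_DIR = d00_idata) pdata_0 : pdata"

parse_rule() {  # "(inputs) params : method" -> RULE_INPUTS, RULE_PARAMS, RULE_METHOD
  local re='^\(([^)]*)\)[[:space:]]*([^[:space:]:]+)[[:space:]]*:[[:space:]]*([^[:space:]]+)$'
  [[ $1 =~ $re ]] || return 1
  RULE_INPUTS=${BASH_REMATCH[1]}
  RULE_PARAMS=${BASH_REMATCH[2]}
  RULE_METHOD=${BASH_REMATCH[3]}
}

parse_input() {  # "idata_DIR = d00_idata" -> RULE_VAR, RULE_DEP
  local re='^[[:space:]]*([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*=[[:space:]]*([^[:space:]]+)[[:space:]]*$'
  [[ $1 =~ $re ]] || return 1
  RULE_VAR=${BASH_REMATCH[1]}
  RULE_DEP=${BASH_REMATCH[2]}
}

ANALYSIS_DIR=$PWD   # assumption: the analysis directory is the current directory
parse_rule "${MXP_MAKEFILE[d01_pdata]}"
parse_input "$RULE_INPUTS"
echo "method script:    $RULE_METHOD.sh"          # pdata.sh
echo "parameter script: $RULE_PARAMS.params.sh"   # pdata_0.params.sh
echo "environment:      $RULE_VAR=$ANALYSIS_DIR/$RULE_DEP"
```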

Also, the rule implicitly states that:

there is an analysis directory (the current directory, or a directory explicitly specified in the MXP command-line arguments) that contains a subdirectory mxp with a file Makefile.sh inside it
the target directory d01_pdata will be created within the analysis directory as a result of obtaining target d01_pdata (or, if this directory already exists, MXP will check whether it is up-to-date and rebuild it if it is not)
there is a file pdata.sh containing a Bash script that will be executed to obtain target d01_pdata
there is a file pdata_0.params.sh containing a Bash script that defines parameters and will also be executed to obtain target d01_pdata
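A sketch of a minimal analysis directory that satisfies these implicit requirements. The text only fixes the location of mxp/Makefile.sh; placing pdata.sh and pdata_0.params.sh in mxp/ as well, and the contents of both scripts, are assumptions made for illustration. An existing d00_idata directory stands in for the required target.

```bash
#!/usr/bin/env bash
# Hypothetical layout sketch. Only mxp/Makefile.sh is placed by the text;
# storing pdata.sh and pdata_0.params.sh in mxp/ is an assumption.
set -euo pipefail

ANALYSIS_DIR=$(mktemp -d)            # stands in for the analysis directory
mkdir -p "$ANALYSIS_DIR/mxp"

# The Makefile: one rule, exactly as in the example above.
cat > "$ANALYSIS_DIR/mxp/Makefile.sh" <<'EOF'
MXP_MAKEFILE[d01_pdata]="(idata_DIR = d00_idata) pdata_0 : pdata"
EOF

# Placeholder method and parameter scripts (contents are made up).
echo 'ls "$idata_DIR" > listing.txt' > "$ANALYSIS_DIR/mxp/pdata.sh"
echo 'PDATA_EXAMPLE_PARAM=1'         > "$ANALYSIS_DIR/mxp/pdata_0.params.sh"

# The required target d00_idata; d01_pdata would be created next to it.
mkdir -p "$ANALYSIS_DIR/d00_idata"

(cd "$ANALYSIS_DIR" && find . -mindepth 1 | LC_ALL=C sort)
# ./d00_idata
# ./mxp
# ./mxp/Makefile.sh
# ./mxp/pdata.sh
# ./mxp/pdata_0.params.sh
```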

Strictly speaking, there is no difference between a parameter script and a method script. MXP introduces this distinction to encourage pipeline developers to clearly separate parameters from methods. Parameters may be changed by the pipeline user (for example, a user may want to apply their own quality-control parameters), while methods are much more stable and are not expected to change from one application of the pipeline to another.
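The separation can be illustrated with one method script and two parameter sets. In this sketch the runner run_rule, which sources the parameter script and then the method script inside the target directory, is a hypothetical stand-in for MXP, as are the parameter MIN_LEN, the file words.txt, the extra parameter set pdata_1, and the target d01_pdata_strict.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: one method, two parameter sets. run_rule, MIN_LEN,
# words.txt, pdata_1 and d01_pdata_strict are illustrative, not part of MXP.
set -euo pipefail
work=$(mktemp -d)                       # stands in for the analysis directory

# Parameter scripts: they only assign variables.
echo 'MIN_LEN=3' > "$work/pdata_0.params.sh"
echo 'MIN_LEN=6' > "$work/pdata_1.params.sh"

# Method script: uses the parameters, never hard-codes them.
cat > "$work/pdata.sh" <<'EOF'
# keep the lines of $idata_DIR/words.txt that are at least MIN_LEN characters long
awk -v n="$MIN_LEN" 'length($0) >= n' "$idata_DIR/words.txt" > kept.txt
EOF

# Input target.
mkdir -p "$work/d00_idata"
printf '%s\n' ab abcd abcdefg > "$work/d00_idata/words.txt"

# Assumed runner: in the target directory, source the parameters, then the method.
run_rule() {
  local target=$1 params=$2 method=$3
  mkdir -p "$work/$target"
  (
    cd "$work/$target"
    export idata_DIR="$work/d00_idata"
    source "$work/$params.params.sh"
    source "$work/$method.sh"
  )
}

run_rule d01_pdata        pdata_0 pdata   # keeps abcd and abcdefg
run_rule d01_pdata_strict pdata_1 pdata   # keeps only abcdefg
```

Switching from pdata_0 to pdata_1 changes the result without touching pdata.sh, which is the point of keeping the two kinds of script apart.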

To determine whether a target is up-to-date, MXP checks that:

the target directory exists
the last attempt to build the target completed successfully
all required targets are up-to-date
the rule used to obtain the target has not changed since the target was built
the method and parameter scripts used to obtain the target have not changed since the target was built
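The five conditions can be expressed as a short Bash function. This sketch records build status and a checksum of the rule and its scripts in two marker files, .status and .fingerprint; those files, the DEPS/RULE/SCRIPTS tables, and all function names are assumptions for illustration, since how MXP actually stores this information is not described here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the up-to-date test. The .status/.fingerprint marker
# files and the DEPS/RULE/SCRIPTS tables are invented for illustration.
declare -A DEPS RULE SCRIPTS
ROOT=$(mktemp -d)                       # stands in for the analysis directory
mkdir -p "$ROOT/mxp"

fingerprint() {  # checksum of the rule text plus its method and parameter scripts
  local t=$1 f
  {
    printf '%s\n' "${RULE[$t]}"
    for f in ${SCRIPTS[$t]:-}; do cat "$ROOT/mxp/$f"; done
  } | cksum
}

is_up_to_date() {
  local t=$1 dep
  [[ -d $ROOT/$t ]] || return 1                                    # directory exists
  [[ $(cat "$ROOT/$t/.status" 2>/dev/null) == ok ]] || return 1    # last build succeeded
  for dep in ${DEPS[$t]:-}; do
    is_up_to_date "$dep" || return 1                              # requirements up-to-date
  done
  # rule, method and parameter scripts unchanged since the last build
  [[ $(cat "$ROOT/$t/.fingerprint" 2>/dev/null) == "$(fingerprint "$t")" ]]
}

record_build() {  # what a successful build leaves behind in this sketch
  local t=$1
  mkdir -p "$ROOT/$t"
  echo ok > "$ROOT/$t/.status"
  fingerprint "$t" > "$ROOT/$t/.fingerprint"
}

# Example: d01_pdata depends on d00_idata, as in the rule above.
echo 'ls "$idata_DIR"' > "$ROOT/mxp/pdata.sh"
echo 'MIN_LEN=3'       > "$ROOT/mxp/pdata_0.params.sh"
RULE[d00_idata]="input data"; DEPS[d00_idata]=""
RULE[d01_pdata]="(idata_DIR = d00_idata) pdata_0 : pdata"
DEPS[d01_pdata]="d00_idata"
SCRIPTS[d01_pdata]="pdata.sh pdata_0.params.sh"

record_build d00_idata
record_build d01_pdata
is_up_to_date d01_pdata && echo "d01_pdata is up-to-date"
echo 'MIN_LEN=5' > "$ROOT/mxp/pdata_0.params.sh"   # edit the parameter script
is_up_to_date d01_pdata || echo "d01_pdata must be rebuilt"
```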

Chaining pipelines

An important feature of MXP is that it lets you create new pipelines by reusing pieces of existing pipelines. Each pipeline has a parent; only the root pipeline (which is part of MXP base) has no parent. The Makefile, methods, and parameter sets defined in the parent pipeline are available in the child pipeline, and the child pipeline may override exactly those pieces of the parent pipeline that need to change. How this works is described in detail in the section “Chaining Pipelines”.
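One plausible way to picture the override mechanism: search the pipelines from child to root and take the first matching script, and load Makefiles from root to child so that a child rule replaces the parent rule for the same target. Everything in this sketch (the directory layout, PIPELINE_CHAIN, resolve, load_makefiles, and the parameter set pdata_1) is hypothetical; the section “Chaining Pipelines” describes the real mechanism.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of child-overrides-parent lookup. The layout,
# PIPELINE_CHAIN, resolve and load_makefiles are illustrative assumptions.
set -euo pipefail
base=$(mktemp -d)
mkdir -p "$base/root" "$base/child"

# Parent (root) pipeline: a rule, a method and a parameter set.
echo 'MXP_MAKEFILE[d01_pdata]="(idata_DIR = d00_idata) pdata_0 : pdata"' > "$base/root/Makefile.sh"
echo 'echo "root method"' > "$base/root/pdata.sh"
echo 'MIN_LEN=3'          > "$base/root/pdata_0.params.sh"

# Child pipeline: overrides the rule to use its own parameter set.
echo 'MXP_MAKEFILE[d01_pdata]="(idata_DIR = d00_idata) pdata_1 : pdata"' > "$base/child/Makefile.sh"
echo 'MIN_LEN=6' > "$base/child/pdata_1.params.sh"

PIPELINE_CHAIN=("$base/child" "$base/root")   # searched child first, root last

resolve() {  # print the path of the first file named $1 found along the chain
  local dir
  for dir in "${PIPELINE_CHAIN[@]}"; do
    if [[ -f $dir/$1 ]]; then
      printf '%s\n' "$dir/$1"
      return 0
    fi
  done
  return 1
}

declare -A MXP_MAKEFILE=()
load_makefiles() {  # source root first so a child rule replaces the parent rule
  local i
  for (( i = ${#PIPELINE_CHAIN[@]} - 1; i >= 0; i-- )); do
    if [[ -f ${PIPELINE_CHAIN[i]}/Makefile.sh ]]; then
      source "${PIPELINE_CHAIN[i]}/Makefile.sh"
    fi
  done
}

load_makefiles
echo "${MXP_MAKEFILE[d01_pdata]}"   # (idata_DIR = d00_idata) pdata_1 : pdata
resolve pdata.sh                    # inherited from root
resolve pdata_1.params.sh           # defined by the child
```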

Logging

Another important feature of MXP is logging. When a target is built, a full log is written to the target directory. This log can be examined later to learn exactly how the target was built (after a successful build) or to find out why the build failed (after a failure).

It is also possible to save a log of a full MXP run, which may involve building multiple targets.
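A per-target build log can be sketched by running the method in a subshell whose output, including a command trace from set -x, is redirected into the target directory. The helper build_target and the log name build.log are hypothetical; the text does not say how MXP names its log files or how it saves the log of a full run.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of per-target logging; build_target and build.log are
# illustrative names, not MXP's.
build_target() {
  local target_dir=$1 method=$2 status
  mkdir -p "$target_dir"
  (
    cd "$target_dir" || exit 1
    set -x                 # trace each command of the method into the log
    source "$method"
  ) > "$target_dir/build.log" 2>&1
  status=$?
  echo "exit status: $status" >> "$target_dir/build.log"
  return "$status"
}

work=$(mktemp -d)
printf '%s\n' 'echo "processing input"' 'ls /nonexistent-input' > "$work/failing_method.sh"
if ! build_target "$work/d01_pdata" "$work/failing_method.sh"; then
  echo "build failed; the log explains why:"
  cat "$work/d01_pdata/build.log"
fi
```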

Useful trick

Looking at the example Makefile rule above, one may wonder why the targets are named d00_idata and d01_pdata rather than simply idata and pdata.

The answer is simple. Targets are represented by directories, and you will often use the command ls (or ls -l) to see which targets you already have. ls lists directories in alphabetical order, so prefixing directory names with a number (d00_, d01_, and so on) makes them appear in pipeline order.
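The effect of the prefix can be checked with sort in the C locale, which orders names byte by byte, as ls does when LC_COLLATE is C (in other locales the order may differ slightly). The helper ls_order and the target names other than idata and pdata are made up for this illustration. The last line shows why the numbers in d00_idata and d01_pdata are zero-padded: without padding, d10 sorts before d2.

```bash
#!/usr/bin/env bash
# Illustration only: ls_order and the names qc, align, report are hypothetical.
# LC_ALL=C sort orders names byte by byte, like ls in the C locale.
ls_order() { printf '%s\n' "$@" | LC_ALL=C sort | paste -sd' ' -; }

ls_order qc idata pdata align                  # align idata pdata qc
ls_order d02_qc d00_idata d01_pdata d03_align  # d00_idata d01_pdata d02_qc d03_align
ls_order d2_qc d0_idata d1_pdata d10_report    # d0_idata d10_report d1_pdata d2_qc
```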