MXP Overview and Concepts

A pipeline is a sequence of operations that leads to a required result.

What exactly “result” means, and what kind of “operations” are used, depends heavily on the application domain. The expected application domain in turn influences the design of a tool for building pipelines.

The famous Unix utility make (known since 1976) was probably the first tool for building pipelines (although the word “pipeline” is rarely used in connection with make). Virtually all tools for building pipelines borrow from make, and MXP is no exception. What distinguishes these tools is the elementary units that the pipeline operates on and the rules used to decide whether to re-execute a step or to reuse its existing results. These differences in turn shape the language used to describe pipelines (e.g., Makefile syntax and semantics).

The units that MXP operates on are called targets, just as in make. A target is represented by a directory containing an arbitrary set of files (and possibly subdirectories). We often use the word “target” instead of the more exact term “target directory”.

As with make, running MXP consists of obtaining the target specified on the command line. Obtaining a target may require other targets. MXP checks whether the required targets have already been obtained and whether they are up-to-date; if not, MXP automatically rebuilds them, which may in turn require further targets. In other words, obtaining the required targets is a recursive process. Which targets a given target requires, and how it is obtained from them, is specified in the Makefile (again, a term borrowed from make).
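The recursive nature of this process can be illustrated with a minimal Bash sketch. It is not MXP's implementation: the REQUIRES map, the obtain function, and the target d02_report are hypothetical names introduced only for this illustration, and the sketch records the order in which targets would be obtained without building anything or checking whether targets are up-to-date.

```bash
#!/usr/bin/env bash
# Hypothetical dependency map (not MXP's real data structure):
# each target maps to a space-separated list of the targets it requires.
declare -A REQUIRES=(
  [d00_idata]=""
  [d01_pdata]="d00_idata"
  [d02_report]="d01_pdata"   # made-up third target
)

BUILD_ORDER=()       # targets in the order they would be obtained
declare -A DONE=()   # targets already handled during this run

obtain() {
  local target=$1 dep
  # Skip targets that were already handled earlier in this run.
  [[ -n ${DONE[$target]:-} ]] && return 0
  # First obtain everything this target requires (the recursive step).
  for dep in ${REQUIRES[$target]:-}; do
    obtain "$dep"
  done
  BUILD_ORDER+=("$target")
  DONE[$target]=1
}

obtain d02_report
echo "${BUILD_ORDER[*]}"   # prints: d00_idata d01_pdata d02_report
```

Requesting the last target pulls in everything it depends on, with the required targets always handled first.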

What a Makefile is and how to use it

A Makefile consists of rules. In MXP, the Makefile is a Bash script. Here is an example of a rule:

MXP_MAKEFILE[d01_pdata]="(idata_DIR = d00_idata) pdata_0 : pdata"

This rule states that:

  • target  d01_pdata  requires target  d00_idata 
  • method  pdata  with parameters  pdata_0  should be used to obtain target  d01_pdata  from target  d00_idata 
  • during execution of method  pdata,  the environment variable  idata_DIR  will be set to the full path of the target directory  d00_idata 
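To make the reading of the rule concrete, the sketch below splits the example rule string into its four parts with a Bash regular expression. This is only an illustration and is not how MXP itself parses rules: it assumes exactly the single-dependency form shown above, and the variable names dep_var, dep_target, params and method are invented for the example.

```bash
#!/usr/bin/env bash
rule="(idata_DIR = d00_idata) pdata_0 : pdata"

# Pattern for the single-dependency form only: (VAR = TARGET) PARAMS : METHOD
re='^\(([A-Za-z_][A-Za-z0-9_]*) *= *([^ )]+)\) *([^ :]+) *: *([^ ]+)$'

if [[ $rule =~ $re ]]; then
  dep_var=${BASH_REMATCH[1]}      # environment variable set for the method
  dep_target=${BASH_REMATCH[2]}   # required target
  params=${BASH_REMATCH[3]}       # parameter set
  method=${BASH_REMATCH[4]}       # method
  echo "requires $dep_target via \$$dep_var; method=$method params=$params"
else
  echo "rule does not match the form assumed by this sketch" >&2
fi
```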

Also, it implicitly states that:

  • there is an analysis directory (the current directory, or a directory explicitly specified in the MXP command-line arguments) that contains a subdirectory  mxp  with a file  Makefile.sh  inside it
  • the target directory named  d01_pdata  will be created within the analysis directory as a result of obtaining target  d01_pdata  (or, if this directory already exists, MXP will check whether this directory is up-to-date and rebuild it if it is not)
  • there is a file  pdata.sh  containing a Bash script that will be executed in order to obtain target  d01_pdata 
  • there is a file  pdata_0.params.sh  containing a Bash script that defines parameters and that will be executed in order to obtain target  d01_pdata 

Strictly speaking, there is no difference between a parameter script and a method script. MXP introduces this distinction to encourage pipeline developers to clearly separate parameters from methods. Parameters may be changed by the pipeline user (for example, a user may want to use their own parameters for quality control), while methods are much more stable and are not expected to change from one pipeline application to another.
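The following self-contained sketch shows what such a separation might look like. Everything in it is an assumption made for illustration: the contents of both scripts, the MIN_VALUE parameter, the out_DIR variable, the input file values.txt, and the way the two scripts are run one after the other in a subshell. Only the file names and the idata_DIR variable come from the example rule; how MXP actually invokes the scripts is not shown here.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)   # scratch area standing in for an analysis directory
mkdir -p "$work/d00_idata" "$work/d01_pdata"
printf '5\n12\n7\n20\n' > "$work/d00_idata/values.txt"   # made-up input data

# Hypothetical parameter script: only settings a user might want to tune.
cat > "$work/pdata_0.params.sh" <<'EOF'
MIN_VALUE=10
EOF

# Hypothetical method script: stable logic that reads the parameters.
cat > "$work/pdata.sh" <<'EOF'
awk -v min="$MIN_VALUE" '$1 >= min' "$idata_DIR/values.txt" > "$out_DIR/filtered.txt"
EOF

# Run the parameter script, then the method script, in one subshell.
(
  export idata_DIR="$work/d00_idata"   # set as described by the rule
  out_DIR="$work/d01_pdata"            # assumed name, not an MXP variable
  . "$work/pdata_0.params.sh"
  . "$work/pdata.sh"
)

result=$(cat "$work/d01_pdata/filtered.txt")
echo "$result"   # prints 12 and 20, one per line
rm -rf "$work"
```

A user who wants a different threshold edits only the parameter script; the method script stays untouched.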

To determine whether a target is up-to-date, MXP checks that:

  • the target directory exists
  • the last attempt to build the target completed successfully
  • all required targets are up-to-date
  • the rule used to obtain the target has not been updated
  • the method and parameter scripts used to obtain the target have not been updated
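The checks above can be sketched in Bash as follows. This is a simplified illustration under stated assumptions, not MXP's actual bookkeeping: the marker files .ok and .signature, the use of cksum, and the function names are all hypothetical. The check that all required targets are up-to-date is left out for brevity; it would be a recursive call of the same function on each required target.

```bash
#!/usr/bin/env bash
# Hypothetical bookkeeping files inside a target directory:
#   .ok         exists only if the last build finished successfully
#   .signature  checksum of the rule text plus method and parameter scripts

signature() {        # usage: signature RULE_TEXT SCRIPT...
  local rule=$1; shift
  { printf '%s\n' "$rule"; cat "$@"; } | cksum
}

is_up_to_date() {    # usage: is_up_to_date TARGET_DIR RULE_TEXT SCRIPT...
  local dir=$1; shift
  [[ -d $dir ]]            || return 1   # target directory exists
  [[ -f $dir/.ok ]]        || return 1   # last build succeeded
  [[ -f $dir/.signature ]] || return 1
  # Rule and scripts are unchanged since the last build.
  [[ $(cat "$dir/.signature") == "$(signature "$@")" ]] || return 1
  return 0
}

# Demonstration with made-up script contents.
work=$(mktemp -d)
rule="(idata_DIR = d00_idata) pdata_0 : pdata"
echo 'MIN_VALUE=10' > "$work/pdata_0.params.sh"
echo 'echo building' > "$work/pdata.sh"
mkdir "$work/d01_pdata"
touch "$work/d01_pdata/.ok"
signature "$rule" "$work/pdata.sh" "$work/pdata_0.params.sh" > "$work/d01_pdata/.signature"

if is_up_to_date "$work/d01_pdata" "$rule" "$work/pdata.sh" "$work/pdata_0.params.sh"
then first=fresh; else first=stale; fi

echo 'MIN_VALUE=15' > "$work/pdata_0.params.sh"   # the user changes a parameter
if is_up_to_date "$work/d01_pdata" "$rule" "$work/pdata.sh" "$work/pdata_0.params.sh"
then second=fresh; else second=stale; fi

echo "$first $second"   # prints: fresh stale
rm -rf "$work"
```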

Chaining pipelines

An important feature of MXP is that it allows new pipelines to be created by reusing pieces of existing pipelines. Each pipeline has a parent; only the root pipeline (which is part of the MXP base) has no parent. The Makefile, methods, and parameter sets defined in the parent pipeline are available in the child pipeline, and the child pipeline may override exactly those pieces of the parent pipeline that need to change. How this works is described in detail in the section “Chaining Pipelines”.
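The override idea itself can be illustrated with a short Bash sketch. It is not MXP's lookup mechanism: the resolve function and the directory names child and parent are hypothetical, and the sketch simply assumes that each pipeline keeps its scripts in one directory and that the child is searched before the parent.

```bash
#!/usr/bin/env bash
# resolve FILE DIR...: print the first DIR/FILE that exists.
# Directories are given from child to root, so the child wins.
resolve() {
  local file=$1 dir; shift
  for dir in "$@"; do
    if [[ -f $dir/$file ]]; then
      printf '%s\n' "$dir/$file"
      return 0
    fi
  done
  return 1
}

work=$(mktemp -d)
mkdir "$work/parent" "$work/child"
touch "$work/parent/pdata.sh" "$work/parent/pdata_0.params.sh"
touch "$work/child/pdata_0.params.sh"   # child overrides only the parameters

method=$(resolve pdata.sh "$work/child" "$work/parent")
params=$(resolve pdata_0.params.sh "$work/child" "$work/parent")
method_rel=${method#"$work"/}
params_rel=${params#"$work"/}
echo "$method_rel"   # prints: parent/pdata.sh
echo "$params_rel"   # prints: child/pdata_0.params.sh
rm -rf "$work"
```

The method is inherited from the parent, while the parameter set comes from the child, which is the kind of selective override described above.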

Logging

Another important feature of MXP is logging. When a target is built, a full log is written to the target directory. This log can be examined later to learn exactly how the target was built (after a successful build) or to find out why the build failed (after a failure).

It is also possible to save a log of a full MXP run, which may involve building multiple targets.

Useful trick

Looking at the example Makefile rule above, one may wonder why the targets are named  d00_idata  and  d01_pdata  rather than simply  idata  and  pdata.

The answer is simple. Targets are represented by directories, and you will often use the command  ls  (or  ls -l) to see which targets you already have. The  ls  command lists directories in alphabetical order, so prefixing directory names with a number ensures a convenient ordering.
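A quick way to see the effect is to compare the two naming schemes in a scratch directory. In this sketch the third step, align, is a made-up name added so that the alphabetical order and the execution order differ; LC_ALL=C is set only to make the sort order predictable.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)
mkdir "$work/plain" "$work/numbered"

# Three steps in execution order: idata, pdata, align ("align" is made up).
mkdir "$work/plain/idata" "$work/plain/pdata" "$work/plain/align"
mkdir "$work/numbered/d00_idata" "$work/numbered/d01_pdata" "$work/numbered/d02_align"

# Join the ls output into one line for easy comparison.
plain=$(LC_ALL=C ls "$work/plain" | paste -sd ' ' -)
numbered=$(LC_ALL=C ls "$work/numbered" | paste -sd ' ' -)

echo "plain:    $plain"      # prints: plain:    align idata pdata
echo "numbered: $numbered"   # prints: numbered: d00_idata d01_pdata d02_align
rm -rf "$work"
```

Without prefixes the last step is listed first; with prefixes the listing follows the order of the pipeline.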