Syntax of MXP Makefile

The Makefile is stored in a file named  Makefile.sh  in the  mxp  subdirectory.

As the filename implies,  Makefile.sh  is a Bash script. It defines entries of the associative array  MXP_MAKEFILE.  Each entry in this array is a rule. The name (key) of an entry is the name of the target defined by the rule, and the value (called a “makeline”) is a string specifying which other targets the defined target depends on and how the defined target is to be obtained.

Makeline syntax in BNF

makeline     ::= makegroups
makegroups   ::= makegroup ( ";" makegroup )*
makegroup    ::= ( "(" req-targets ")" )? ( params ":" )? methods
req-targets  ::= req-target ( "," req-target )*
req-target   ::= env-var-name "=" target-name
params       ::= param-name ( "+" param-name )*
methods      ::= method-name ( "+" method-name )*
env-var-name ::= any valid Bash variable name; the same as name
target-name  ::= name
param-name   ::= name
method-name  ::= name
name         ::= sequence of letters, digits and underscores starting with a letter

Here quoted symbols stand for literal characters, while unquoted parentheses, ? and * denote grouping, an optional part and repetition, respectively.

Outside of names, whitespace may be used freely to improve readability.

Makeline syntax explained

A makeline is a sequence of makegroups separated by semicolons.
Each makegroup consists of:

  • An optional specification of required targets, enclosed in parentheses. If the list of required targets is empty, it is omitted together with the parentheses. The specification of required targets (req-targets) is a comma-separated list of req-target items, each consisting of:
    • the name of an environment variable (env-var-name), followed by
    • an equal sign (=), followed by
    • the name of the required target (target-name)
  • An optional list of parameter sets (params), followed by a colon. If the list of parameter sets is empty, the colon is omitted as well. The list of parameter sets consists of param-names separated by plus signs (+).
  • A non-empty list of methods, consisting of method-names separated by plus signs (+).
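
The grammar is simple enough to parse with plain Bash string operations. The following is a minimal sketch, not MXP's own parser: the function name parse_makeline and its output format are hypothetical, and the input is assumed to be a well-formed makeline. It removes all whitespace (allowed, since whitespace is insignificant outside names), splits the makeline into makegroups at semicolons, and then peels off the optional parenthesized req-targets and the optional params part in front of the colon.

```bash
#!/usr/bin/env bash
# Minimal sketch of a makeline parser (hypothetical; not MXP's own code).
# Assumes a well-formed makeline as defined by the grammar above.

# parse_makeline MAKELINE
# Prints one line per makegroup: "req=<req-targets> params=<params> methods=<methods>"
parse_makeline() {
    # Whitespace is insignificant outside names, so remove all of it.
    local line=${1//[[:space:]]/}
    local group req params methods
    local -a groups
    IFS=';' read -r -a groups <<< "$line"      # makegroups are ';'-separated
    for group in "${groups[@]}"; do
        req=
        if [[ $group == "("* ]]; then          # optional "(req-targets)"
            req=${group%%")"*}                 # drop ")" and everything after it
            req=${req#"("}                     # drop the leading "("
            group=${group#*")"}                # keep what follows ")"
        fi
        if [[ $group == *:* ]]; then           # optional "params:"
            params=${group%%:*}
            methods=${group#*:}
        else
            params=
            methods=$group
        fi
        printf 'req=%s params=%s methods=%s\n' "$req" "$params" "$methods"
    done
}
```

For the fourth example makeline in the next section, this sketch prints one line per makegroup: req=pdata_DIR=d01_pdata params=psub_0+psub_FEMALE+psub_WHITE methods=psubset, followed by req=idata_DIR=d00_idata,pdata_DIR=d01_pdata params= methods=link.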

Examples of makelines

idata
(  idata_DIR = d00_idata  )                  pdata
(pdata_DIR = d01_pdata) psub_0 + psub_MALE : psubset + link
(pdata_DIR=d01_pdata) psub_0 + psub_FEMALE + psub_WHITE : psubset;  (idata_DIR=d00_idata, pdata_DIR=d01_pdata) link

  1. This makeline has an empty list of required targets and an empty list of parameter sets. It contains only a single method name,  idata.
  2. This makeline says that there is one required target,  d00_idata,  and that the full path to the required target's directory will be assigned to the environment variable  idata_DIR  before the parameter and method scripts are executed. There are no parameter sets, and there is one method,  pdata.
  3. This makeline specifies one required target,  d01_pdata.  Further, it says that two parameter sets,  psub_0  and  psub_MALE,  and two methods,  psubset  and  link,  should be used to obtain the resulting target.
  4. This makeline consists of two makegroups. The first one specifies one required target, three parameter sets, and one method. The second group specifies two required targets and one method.

What Makefile.sh looks like

MXP_MAKEFILE[d00_idata]="                                                                         idata"
MXP_MAKEFILE[d01_pdata]="            (idata_DIR = d00_idata)                                      pdata"
MXP_MAKEFILE[d02_psub_MALE]="        (pdata_DIR = d01_pdata)  psub_0 + psub_MALE :                psubset + link"
MXP_MAKEFILE[d02_psub_FEMALE_WHITE]="(pdata_DIR = d01_pdata)  psub_0 + psub_FEMALE + psub_WHITE : psubset; \
                                     (idata_DIR=d00_idata, pdata_DIR=d01_pdata)                   link"

Remember that  Makefile.sh  is a Bash script and must follow Bash syntax. In particular, no spaces are allowed between  MXP_MAKEFILE  and the opening square bracket, inside the square brackets, between the closing square bracket and the equal sign, or between the equal sign and the opening quote mark. Spaces may, however, be used freely inside the quote marks; we use this to make our  Makefile.sh  more readable. Of course, everyone is free to adopt their own conventions for formatting makelines.
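
For these assignments to behave as intended,  MXP_MAKEFILE  has to be an associative array. The sketch below assumes it is declared with declare -A before the rules are read; the text does not say how MXP itself loads  Makefile.sh,  so two rules are simply inlined in place of sourcing the file. It also shows that the padding inside the quotes is stored verbatim as part of the makeline.

```bash
#!/usr/bin/env bash
# Assumption: the array is declared associative before the rules are read.
# Without `declare -A`, Bash would create an indexed array, evaluate each
# subscript arithmetically (an unset name such as d00_idata yields 0),
# and every rule would overwrite element 0.
declare -A MXP_MAKEFILE

# These two lines stand in for sourcing mxp/Makefile.sh.
MXP_MAKEFILE[d00_idata]="                              idata"
MXP_MAKEFILE[d01_pdata]="     (idata_DIR = d00_idata)  pdata"

# Whitespace inside the quotes is part of the stored value.
# Key order of an associative array is unspecified, so sort for display.
while read -r target; do
    printf '%s => [%s]\n' "$target" "${MXP_MAKEFILE[$target]}"
done < <(printf '%s\n' "${!MXP_MAKEFILE[@]}" | sort)
```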

Abbreviations in Makefile

A Makefile often has to include many similar lines. As an example, consider subsetting input data in a GWAS analysis. One may want to analyze certain groups: all individuals, males, females, white males, etc. In addition, one may want to apply quality control at different stages, e.g., either perform quality control on all individuals and then select the males among those who passed it, or first select males and then apply quality control to them (the two orders do not commute!). Note that quality control is itself a kind of subsetting.

This requires the following lines in the Makefile:

Makefile without abbreviations
MXP_MAKEFILE[d02_psub_QC]=" (pdata_DIR = d01_pdata) psub_0 + psub_QC : psubset + link"
MXP_MAKEFILE[d02_psub_QC_MALE]=" (pdata_DIR = d02_psub_QC) psub_0 + psub_MALE : psubset + link"
MXP_MAKEFILE[d02_psub_QC_FEMALE]=" (pdata_DIR = d02_psub_QC) psub_0 + psub_FEMALE : psubset + link"
MXP_MAKEFILE[d02_psub_QC_MALE_WHITE]=" (pdata_DIR = d02_psub_QC_MALE) psub_0 + psub_WHITE : psubset + link"
MXP_MAKEFILE[d02_psub_QC_FEMALE_WHITE]=" (pdata_DIR = d02_psub_QC_FEMALE) psub_0 + psub_WHITE : psubset + link"
MXP_MAKEFILE[d02_psub_QC_MALE_BLACK]=" (pdata_DIR = d02_psub_QC_MALE) psub_0 + psub_BLACK : psubset + link"
MXP_MAKEFILE[d02_psub_QC_FEMALE_BLACK]=" (pdata_DIR = d02_psub_QC_FEMALE) psub_0 + psub_BLACK : psubset + link"
MXP_MAKEFILE[d02_psub_MALE]=" (pdata_DIR = d01_pdata) psub_0 + psub_MALE : psubset + link"
MXP_MAKEFILE[d02_psub_FEMALE]=" (pdata_DIR = d01_pdata) psub_0 + psub_FEMALE : psubset + link"
MXP_MAKEFILE[d02_psub_MALE_QC]=" (pdata_DIR = d02_psub_MALE) psub_0 + psub_QC : psubset + link"
MXP_MAKEFILE[d02_psub_FEMALE_QC]=" (pdata_DIR = d02_psub_FEMALE) psub_0 + psub_QC : psubset + link"
# …..
# … many more lines for MALE_WHITE, MALE_QC_WHITE, MALE_WHITE_QC …
# …..

Moreover, this list is still far from complete: if one also needs to subset by disease state, many more lines must be added, and the size of the Makefile grows exponentially.

MXP allows this to be compacted into:

MXP_MAKEFILE["d02_psub_{S2}"]="         (pdata_DIR = d01_pdata)          psub_0 + psub_{S2} : psubset + link"
MXP_MAKEFILE["d02_psub_{{S1}}_{S2}"]="  (pdata_DIR = d02_psub_{{S1}})    psub_0 + psub_{S2} : psubset + link"
MXP_MAKEFILE["d02_psub_{{S1}}_PC"]="    (pdata_DIR = d02_psub_{{S1}})    prcomp_0 + prcomp  : prcomp  + link"
MXP_MAKEFILE["d02_psub_{{S1}}_EXP"]="   (pdata_DIR = d02_psub_{{S1}})    pexp_0 + pexp      : pexport + link"

In fact, the first two lines express everything in the non-abbreviated example, and much more. The last two lines are shown here to demonstrate other properties of MXP Makefile abbreviations.

Makefile abbreviation syntax

An abbreviated Makefile rule may contain “Makefile variable names”, both in the name of the defined target (the key of the MXP_MAKEFILE associative array) and in the makeline. A Makefile variable name is a name (i.e., a sequence of letters, digits and underscores starting with a letter) enclosed in either single or double curly braces. In the example above, the Makefile variable names are “{{S1}}” and “{S2}”.

A variable name in single curly braces is called a narrow name, and a variable name in double curly braces is called a wide name. Thus, “{{S1}}” is a wide name, and “{S2}” is a narrow name.

An occurrence of a variable name in the defined target name is a defining occurrence, and an occurrence in the makeline is an applied occurrence. Every variable name mentioned in a rule must have exactly one defining occurrence and may have any number of applied occurrences (in particular, none at all, although this rarely makes sense).
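
The occurrence constraint can be checked mechanically. Below is a sketch of a hypothetical lint helper (list_vars and check_rule are illustrative names, not part of MXP). It only counts occurrences as described above and does not compare the brace style of defining and applied occurrences.

```bash
#!/usr/bin/env bash
# Hypothetical lint helper; not part of MXP.

# list_vars STRING: print the name of every Makefile variable reference
# ({NAME} or {{NAME}}) in STRING, one per line, in order of appearance.
list_vars() {
    local rest=$1
    local pat='[{]([{]?)([A-Za-z][A-Za-z0-9_]*)[}]([}]?)'
    while [[ $rest =~ $pat ]]; do
        printf '%s\n' "${BASH_REMATCH[2]}"
        rest=${rest#*"${BASH_REMATCH[0]}"}     # continue after this reference
    done
}

# check_rule KEY MAKELINE: return 0 if every variable has exactly one defining
# occurrence in KEY and every applied occurrence in MAKELINE is defined.
check_rule() {
    local key=$1 makeline=$2 name status=0
    local -A defs=()
    # Count defining occurrences (in the target name).
    while read -r name; do
        defs[$name]=$(( ${defs[$name]:-0} + 1 ))
    done < <(list_vars "$key")
    for name in "${!defs[@]}"; do
        if [[ ${defs[$name]} -ne 1 ]]; then
            echo "error: variable $name has ${defs[$name]} defining occurrences" >&2
            status=1
        fi
    done
    # Every applied occurrence (in the makeline) needs a definition.
    while read -r name; do
        if [[ -z ${defs[$name]:-} ]]; then
            echo "error: variable $name is used but never defined" >&2
            status=1
        fi
    done < <(list_vars "$makeline")
    return "$status"
}
```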

Makefile abbreviation semantics

MXP treats Makefile variables in a special way (which is why they are not part of the makeline grammar above). When the Makefile is loaded, Makefile variable names are not processed at all: rules are stored in the MXP_MAKEFILE associative array “as is”, i.e.,  d02_psub_{S2}  is literally a key in this array.

Variable names come into play only when MXP needs to find a rule for a given target name. When a target name A is specified on the command line, MXP needs to find a rule for building this target. From that rule MXP may learn that target B is required to build target A; consequently, it needs a rule for target B, and so on.

When MXP needs to find a rule for target A, it matches every (abbreviated) target name defined in the Makefile (i.e., every key of the MXP_MAKEFILE associative array) against the name A. Matching follows these rules:

  • any part of the abbreviated name outside variable names must be matched literally
  • any narrow variable matches any sequence of letters and digits
  • any wide variable matches any sequence of letters, digits and underscores
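
One straightforward way to implement these rules is to translate an abbreviated name into an anchored POSIX extended regular expression: literal parts stay as they are, a narrow variable becomes ([A-Za-z0-9]+), and a wide one becomes ([A-Za-z0-9_]+). The sketch below does this in Bash. The helper name match_abbrev is hypothetical, and the sketch assumes well-formed names whose literal parts contain only letters, digits and underscores, and that a variable never matches the empty string (the rules above do not say whether empty values are allowed).

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the matching rules; not MXP's own code.

# match_abbrev ABBREV TARGET
# If TARGET matches ABBREV, print "NAME=value" for each variable and return 0;
# otherwise return 1.
match_abbrev() {
    local rest=$1 target=$2 re='^' i
    local -a vars=()
    # literal prefix, "{" or "{{", variable name, "}" or "}}", remainder
    local pat='^([^{]*)[{]([{]?)([A-Za-z][A-Za-z0-9_]*)[}]([}]?)([^}].*)?$'
    while [[ $rest =~ $pat ]]; do
        re+=${BASH_REMATCH[1]}                 # literal part, matched as is
        if [[ -n ${BASH_REMATCH[2]} && -n ${BASH_REMATCH[4]} ]]; then
            re+='([A-Za-z0-9_]+)'              # wide: letters, digits, underscores
        elif [[ -z ${BASH_REMATCH[2]} && -z ${BASH_REMATCH[4]} ]]; then
            re+='([A-Za-z0-9]+)'               # narrow: letters and digits
        else
            return 1                           # unbalanced braces
        fi
        vars+=("${BASH_REMATCH[3]}")
        rest=${BASH_REMATCH[5]}
    done
    re+="$rest\$"
    [[ $target =~ $re ]] || return 1
    # Capture group i+1 holds the value of the i-th variable.
    for i in "${!vars[@]}"; do
        printf '%s=%s\n' "${vars[i]}" "${BASH_REMATCH[i+1]}"
    done
}
```

For instance, match_abbrev 'd02_psub_{{S1}}_{S2}' d02_psub_QC_MALE_WHITE prints S1=QC_MALE and S2=WHITE. When a name can be split in more than one way (for example {{A}}_{{B}} against x_y_z), this sketch returns whichever split the regex engine picks; how MXP resolves such ambiguity is not specified here.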

For example, when target  d02_psub_QC_MALE_WHITE  is given on the command line, the abbreviated name  d02_psub_{{S1}}_{S2}  from the second line of the compact Makefile matches. This match gives the variables the following values:

  • {{S1}}  gets the value  QC_MALE
  • {S2}  gets the value  WHITE

After matching, these values are substituted into the makeline, which yields an instance of the rule:

MXP_MAKEFILE["d02_psub_QC_MALE_WHITE"]="  (pdata_DIR = d02_psub_QC_MALE)    psub_0 + psub_WHITE : psubset + link"
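
The substitution step itself is plain text replacement. A minimal sketch, assuming that applied occurrences use the same braces as the defining occurrence; instantiate is a hypothetical helper, and it replaces {{NAME}} before {NAME} so that a wide name never leaves stray braces behind.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not MXP's own code): substitute variable values into
# an abbreviated string to obtain the corresponding rule instance.
# Assumes applied occurrences use the same braces as the defining one.

# instantiate TEXT NAME=VALUE...
instantiate() {
    local line=$1 pair name value wide narrow
    shift
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        wide="{{$name}}"
        narrow="{$name}"
        line=${line//"$wide"/$value}     # wide occurrences first...
        line=${line//"$narrow"/$value}   # ...then narrow ones
    done
    printf '%s\n' "$line"
}
```

Applied to the makeline of the second compact rule with S1=QC_MALE and S2=WHITE, it prints the makeline of the instance shown above; applied to the key d02_psub_{{S1}}_{S2}, it gives back d02_psub_QC_MALE_WHITE.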

Now MXP can see that it needs target  d02_psub_QC_MALE.  Again, it matches  d02_psub_QC_MALE  against all (abbreviated) target names defined in the Makefile, and again the abbreviated name from the second line matches, now with variable values  {{S1}}==QC  and  {S2}==MALE.  This gives the following rule instance:

MXP_MAKEFILE["d02_psub_QC_MALE"]="  (pdata_DIR = d02_psub_QC)    psub_0 + psub_MALE : psubset + link"

Next, target  d02_psub_QC  is needed, and an instance of the rule from the first line is used:

MXP_MAKEFILE["d02_psub_QC"]="         (pdata_DIR = d01_pdata)          psub_0 + psub_QC : psubset + link"
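
The walk-through above is a recursive lookup: find a rule, instantiate it, collect its required targets, and repeat for each of them. The sketch below combines matching and substitution into a hypothetical resolve function; it is not MXP's implementation. It deliberately ignores the multiple-match case discussed in the next section and uses only rules for which every target in this chain matches exactly one rule; the rules for d00_idata and d01_pdata are taken from the earlier Makefile.sh example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the rule lookup chain; not MXP's implementation.
# Simplifications: each target here matches exactly one rule (no
# most-specific selection), names are well formed, values are non-empty.
declare -A MXP_MAKEFILE
MXP_MAKEFILE[d00_idata]="idata"
MXP_MAKEFILE[d01_pdata]="(idata_DIR = d00_idata) pdata"
MXP_MAKEFILE["d02_psub_{S2}"]="(pdata_DIR = d01_pdata) psub_0 + psub_{S2} : psubset + link"
MXP_MAKEFILE["d02_psub_{{S1}}_{S2}"]="(pdata_DIR = d02_psub_{{S1}}) psub_0 + psub_{S2} : psubset + link"

# abbrev_regex ABBREV: set REGEX to an anchored ERE for ABBREV and VARS to
# its variable names, in the order of their capture groups.
abbrev_regex() {
    local rest=$1 re='^'
    local pat='^([^{]*)[{]([{]?)([A-Za-z][A-Za-z0-9_]*)[}]([}]?)([^}].*)?$'
    VARS=()
    while [[ $rest =~ $pat ]]; do
        re+=${BASH_REMATCH[1]}
        if [[ -n ${BASH_REMATCH[2]} ]]; then
            re+='([A-Za-z0-9_]+)'   # wide variable
        else
            re+='([A-Za-z0-9]+)'    # narrow variable
        fi
        VARS+=("${BASH_REMATCH[3]}")
        rest=${BASH_REMATCH[5]}
    done
    REGEX="$re$rest\$"
}

# resolve TARGET [INDENT]: print the rule instance used for TARGET, then
# resolve every target listed in its "( VAR = target, ... )" groups.
resolve() {
    local target=$1 indent=${2:-} key line= name value wide narrow i s
    local rpat='[(]([^)]*)[)]'
    local -a items=() reqs=()
    for key in "${!MXP_MAKEFILE[@]}"; do
        abbrev_regex "$key"
        [[ $target =~ $REGEX ]] || continue
        line=${MXP_MAKEFILE[$key]}
        for i in "${!VARS[@]}"; do          # substitute values into the makeline
            name=${VARS[i]}
            value=${BASH_REMATCH[i+1]}
            wide="{{$name}}"
            narrow="{$name}"
            line=${line//"$wide"/$value}
            line=${line//"$narrow"/$value}
        done
        break
    done
    if [[ -z $line ]]; then
        echo "error: no rule for $target" >&2
        return 1
    fi
    printf '%s%s: %s\n' "$indent" "$target" "$line"
    s=${line//[[:space:]]/}                 # whitespace is insignificant
    while [[ $s =~ $rpat ]]; do
        IFS=, read -r -a items <<< "${BASH_REMATCH[1]}"
        for i in "${items[@]}"; do
            reqs+=("${i#*=}")               # keep only the target name
        done
        s=${s#*")"}
    done
    for i in "${reqs[@]}"; do
        resolve "$i" "$indent  " || return 1
    done
}
```

Calling resolve d02_psub_QC_MALE_WHITE prints the same three rule instances as above, followed by the instances for d01_pdata and d00_idata, each indented one level deeper than the target that requires it.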

Handling multiple matches

A target name may match multiple rules. For example, in the compact Makefile above, target name  d02_psub_QC_MALE_PC  matches both rules 2 and 3. In such cases MXP uses the most specific rule.

We say that abbreviated name A is more specific than abbreviated name B (denoted A ≼ B) if every instance of A (i.e., every target name that A matches) is also an instance of B.

Thus,  d02_psub_{{S1}}_PC  ≼  d02_psub_{{S1}}_{S2}.

For a given set of abbreviated names A1, …, An, we say that Ai is the most specific in this set if Ai ≼ Aj for every j.

Of the two abbreviated names  d02_psub_{{S1}}_PC  and  d02_psub_{{S1}}_{S2},  the first one is the most specific.

A more complex example follows. Consider these rules in a Makefile:

MXP_MAKEFILE["{Var1}_{Var2}"]=" ..... "
MXP_MAKEFILE["{Var1}_B"]="      ..... "
MXP_MAKEFILE["A_{Var2}"]="      ..... "
MXP_MAKEFILE["A_B"]="           ..... "

We have the following relations:

  • A_B ≼ {Var1}_B ≼ {Var1}_{Var2}
  • A_B ≼ A_{Var2} ≼ {Var1}_{Var2}

Then:

  • Target name  X_Y  matches only rule 1, so (the instance of) this rule is used.
  • Target name  X_B  matches rules 1 and 2. Of these, rule 2 is the most specific, so rule 2 is used.
  • Target name  A_Y  matches rules 1 and 3. Similarly, rule 3 is used.
  • Target name  A_B  matches all 4 rules. Rule 4 is the most specific, so it is used.

If, however, rule 4 is absent, there is no most specific rule for target  A_B:  rules 2 and 3 both match, but neither is more specific than the other. In this case, MXP reports an error.
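
The selection procedure can be sketched as follows. Deciding A ≼ B exactly amounts to checking inclusion between two regular languages; this sketch instead uses a simpler sufficient test, which is our own choice and not necessarily what MXP does. A is encoded as a string in which each narrow variable becomes # and each wide variable becomes %, and B is turned into a regular expression whose narrow variables may also absorb # and whose wide variables may also absorb # and %. If that expression matches the encoded A, every instance of A is an instance of B, because each placeholder lands inside a variable of B whose character class accepts any value of the corresponding variable of A. We have not shown that the converse always holds, so the test could miss some genuine ≼ relations, but it reproduces the results stated above for the four-rule example. All function names are hypothetical, and variable values are assumed to be non-empty.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of most-specific rule selection; not MXP's own code.

PAT='^([^{]*)[{]([{]?)([A-Za-z][A-Za-z0-9_]*)[}]([}]?)([^}].*)?$'

# to_regex ABBREV MODE: print an anchored ERE for ABBREV.
#   MODE=target: variables match target-name text, as in the matching rules
#   MODE=cover:  variables may also absorb the placeholders '#' and '%'
to_regex() {
    local rest=$1 re='^' narrow='([A-Za-z0-9]+)' wide='([A-Za-z0-9_]+)'
    if [[ $2 == cover ]]; then
        narrow='([A-Za-z0-9#]+)' wide='([A-Za-z0-9_#%]+)'
    fi
    while [[ $rest =~ $PAT ]]; do
        re+=${BASH_REMATCH[1]}
        if [[ -n ${BASH_REMATCH[2]} ]]; then re+=$wide; else re+=$narrow; fi
        rest=${BASH_REMATCH[5]}
    done
    printf '%s\n' "$re$rest\$"
}

# encode ABBREV: replace each narrow variable by '#' and each wide one by '%'.
encode() {
    local rest=$1 out=
    while [[ $rest =~ $PAT ]]; do
        out+=${BASH_REMATCH[1]}
        if [[ -n ${BASH_REMATCH[2]} ]]; then out+='%'; else out+='#'; fi
        rest=${BASH_REMATCH[5]}
    done
    printf '%s\n' "$out$rest"
}

# more_specific A B: succeed if the sufficient test establishes A ≼ B.
more_specific() {
    local re enc
    re=$(to_regex "$2" cover)
    enc=$(encode "$1")
    [[ $enc =~ $re ]]
}

# select_rule TARGET RULE...: print the most specific matching rule, or fail.
select_rule() {
    local target=$1 a b re best=
    shift
    local -a matches=()
    for a in "$@"; do
        re=$(to_regex "$a" target)
        if [[ $target =~ $re ]]; then matches+=("$a"); fi
    done
    for a in "${matches[@]}"; do
        for b in "${matches[@]}"; do
            more_specific "$a" "$b" || continue 2   # a is not ≼ b: try the next a
        done
        best=$a
        break
    done
    if [[ -z $best ]]; then
        echo "error: no most specific rule for $target" >&2
        return 1
    fi
    printf '%s\n' "$best"
}
```

With the four rules above passed in order, select_rule X_B prints {Var1}_B and select_rule A_B prints A_B; dropping A_B from the list makes select_rule A_B fail with an error. If two distinct rules were mutually ≼ (equivalent), this sketch would simply pick the first one listed.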