The Makefile is stored in a file named Makefile.sh within the mxp subdirectory.
As the filename implies, Makefile.sh is a Bash script. It defines entries of the associative array MXP_MAKEFILE. Each entry in this array is a rule. The name (key) of an entry is the name of the target defined by the rule, and the value (called a “makeline”) is a string specifying which other targets the defined target depends on and how the defined target should be obtained.
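Makeline syntax in BNF
<makeline> ::= <makegroup> ; <makeline> | <makegroup>
<makegroup> ::= ( <requirements> ) <paramsets> : <methods> | ( <requirements> ) <methods> | <paramsets> : <methods> | <methods>
<requirements> ::= <requirement> , <requirements> | <requirement>
<requirement> ::= <variable> = <name>
<paramsets> ::= <name> + <paramsets> | <name>
<methods> ::= <name> + <methods> | <name>
<variable> ::= any valid Bash variable name; the same as <name>
<name> ::= sequence of letters, digits and underscores starting with a letter
Outside names, whitespace may be used freely to improve readability.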
Makeline syntax explained
A makeline is one or more makegroups separated by semicolons. Each makegroup consists of:
- An optional specification of required targets, enclosed in parentheses. If the list of required targets is empty, it is omitted together with the parentheses. The specification of required targets is a comma-separated list of requirements, with each requirement being:
  - the name of an environment variable, followed by
  - an equal sign (=), followed by
  - the name of the required target.
- An optional list of parameter sets, followed by a colon. If the list of parameter sets is empty, the colon is omitted as well. The list of parameter sets consists of parameter set names separated by a plus sign (+).
- A non-empty list of methods, consisting of method names separated by a plus sign (+).
Examples of makelines
idata
( idata_DIR = d00_idata ) pdata
(pdata_DIR = d01_pdata) psub_0 + psub_MALE : psubset + link
(pdata_DIR=d01_pdata) psub_0 + psub_FEMALE + psub_WHITE : psubset; (idata_DIR=d00_idata, pdata_DIR=d01_pdata) link
- The first makeline has an empty list of required targets and an empty list of parameter sets. It contains only a single method name, idata.
- The second makeline says that there is one required target, d00_idata, and that the full path to the required target directory will be assigned to the environment variable idata_DIR before execution of parameter and method scripts. There are no parameter sets, and one method, pdata.
- The third makeline specifies one required target, d01_pdata. Further, it says that two parameter sets, psub_0 and psub_MALE, and two methods, psubset and link, should be used to obtain the resulting target.
- The fourth makeline consists of two makegroups. The first one specifies one required target, three parameter sets, and one method. The second group specifies two required targets and one method.
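To make the structure of a makeline concrete, here is a minimal Bash sketch that splits a makeline into makegroups and prints the three parts of each group. The function name parse_makeline is hypothetical and is not part of MXP; the sketch assumes a well-formed makeline and does no error checking.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MXP): print the parts of each makegroup.
parse_makeline() {
    local makeline=$1 group reqs params methods
    local -a groups
    # Makegroups are separated by semicolons.
    IFS=';' read -r -a groups <<< "$makeline"
    for group in "${groups[@]}"; do
        # Whitespace outside names is insignificant, so drop all of it.
        group=${group//[[:space:]]/}
        reqs="" params=""
        # Optional "(VAR=target,...)" prefix with the required targets.
        if [[ $group == \(*\)* ]]; then
            reqs=${group%%\)*}     # text before the first ")"
            reqs=${reqs#\(}        # drop the leading "("
            group=${group#*\)}     # remainder after the ")"
        fi
        # Optional "paramset+...:" prefix before the method list.
        if [[ $group == *:* ]]; then
            params=${group%%:*}
            group=${group#*:}
        fi
        methods=$group
        echo "requires=[$reqs] params=[$params] methods=[$methods]"
    done
}

parse_makeline "(pdata_DIR = d01_pdata) psub_0 + psub_MALE : psubset + link"
# prints: requires=[pdata_DIR=d01_pdata] params=[psub_0+psub_MALE] methods=[psubset+link]
```

For a makeline with two makegroups, such as the fourth example, the function prints one line per group.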
What Makefile.sh looks like
MXP_MAKEFILE[d00_idata]=" idata"
MXP_MAKEFILE[d01_pdata]=" (idata_DIR = d00_idata) pdata"
MXP_MAKEFILE[d02_psub_MALE]=" (pdata_DIR = d01_pdata) psub_0 + psub_MALE : psubset + link"
MXP_MAKEFILE[d02_psub_FEMALE_WHITE]="(pdata_DIR = d01_pdata) psub_0 + psub_FEMALE + psub_WHITE : psubset; \
(idata_DIR=d00_idata, pdata_DIR=d01_pdata) link"
Remember that Makefile.sh is a Bash script and must abide by Bash syntax. In particular, no spaces are allowed between MXP_MAKEFILE and the opening square bracket, within the square brackets, between the closing square bracket and the equal sign, or between the equal sign and the opening quote mark. However, spaces may be used freely within the quote marks. We use this fact to make our Makefile.sh more readable. Of course, you may prefer your own rules for formatting makelines.
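The following sketch shows these assignment rules in a form that runs on its own. It assumes Bash 4 or later (associative arrays) and declares MXP_MAKEFILE itself; presumably MXP declares the array before reading Makefile.sh, but that is an assumption. The function list_rules is a hypothetical helper for inspecting what was defined.

```bash
#!/usr/bin/env bash
# Assumption: MXP declares this array before it reads Makefile.sh.
# It is declared here only so that the sketch is self-contained.
declare -A MXP_MAKEFILE

# Correct: no spaces around "[", "]" and "="; spaces inside the quotes are fine.
MXP_MAKEFILE[d00_idata]="   idata"
MXP_MAKEFILE[d01_pdata]="   (idata_DIR = d00_idata) pdata"

# Wrong: with spaces around "=", Bash parses the line as a command
# invocation, not as an assignment.
# MXP_MAKEFILE[d01_pdata] = "(idata_DIR = d00_idata) pdata"

# Hypothetical helper: print every rule as "target -> makeline", sorted.
list_rules() {
    local target
    for target in "${!MXP_MAKEFILE[@]}"; do
        printf '%s -> %s\n' "$target" "${MXP_MAKEFILE[$target]}"
    done | sort
}

list_rules
```

Sorting is used because Bash does not guarantee any particular order when iterating over the keys of an associative array.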
Abbreviations in Makefile
A Makefile often has to include many similar lines. As an example, consider subsetting input data in a GWAS analysis. One may want to analyze certain groups: all individuals, males, females, white males, etc. In addition, one may want to apply quality control at different stages, e.g., either do quality control for all individuals and then select males from those who passed quality control, or first select males and then apply quality control to them (the two operations do not commute). Note that quality control is itself a kind of subsetting.
The above will require the following lines in Makefile:
Moreover, it is not comprehensive enough: if one needs to do subsetting by a disease state, one needs to add many more lines, and the size of the Makefile will grow exponentially.
MXP allows this to be compacted into:
MXP_MAKEFILE["d02_psub_{S2}"]=" (pdata_DIR = d01_pdata) psub_0 + psub_{S2} : psubset + link"
MXP_MAKEFILE["d02_psub_{{S1}}_{S2}"]=" (pdata_DIR = d02_psub_{{S1}}) psub_0 + psub_{S2} : psubset + link"
MXP_MAKEFILE["d02_psub_{{S1}}_PC"]=" (pdata_DIR = d02_psub_{{S1}}) prcomp_0 + prcomp : prcomp + link"
MXP_MAKEFILE["d02_psub_{{S1}}_EXP"]=" (pdata_DIR = d02_psub_{{S1}}) pexp_0 + pexp : pexport + link"
In fact, the first two lines express everything from the non-abbreviated example, and much more. The last two lines are shown here to demonstrate other properties of MXP Makefile abbreviations.
Makefile abbreviation syntax
An abbreviated Makefile rule may contain “Makefile variable names”, both in the name of the defined target (the key of the MXP_MAKEFILE associative array) and in the makeline. A Makefile variable name is a name (i.e., a sequence of letters, digits and underscores starting with a letter) enclosed in either single or double curly braces. In the example above, the Makefile variable names are “{{S1}}” and “{S2}”.
A variable name in single curly braces is called a narrow name, and a variable name in double curly braces is called a wide name. Thus, “{{S1}}” is a wide name, and “{S2}” is a narrow name.
An occurrence of a variable name in the defined target name is a defining occurrence, and an occurrence in the makeline is an applied occurrence. Any variable name mentioned in a rule must have exactly one defining occurrence and may have an arbitrary number of applied occurrences (in particular, it may have none at all, although that rarely makes sense).
Makefile abbreviation semantics
MXP treats Makefile variables in a special way (which is why variables are not included in the Makefile syntax directly). Namely, when the Makefile is loaded, no processing of Makefile variable names is performed. Rules are stored in the MXP_MAKEFILE associative array “as is”, i.e., d02_psub_{S2} is a key in this array.
Variable names come into play when a target name is given, i.e., when MXP needs to find a rule to use for a target. More specifically, when a target name A is specified on the command line, MXP needs to find a rule for building this target. MXP then learns that target B is required to build target A; consequently, it needs a rule for target B, and so on.
When MXP needs to find a rule for target A, it matches all (abbreviated) target names defined in the Makefile (i.e., the keys of the MXP_MAKEFILE associative array) against the name A. Matching follows these rules:
- any part of abbreviated name outside variable names must be matched literally
- any narrow variable matches any sequence of letters and digits
- any wide variable matches any sequence of letters, digits and underscores
For example, when target d02_psub_QC_MALE_WHITE is given on the command line, the abbreviated name d02_psub_{{S1}}_{S2} from the second line will match. This match gives values for the variables:
- {{S1}} gets the value QC_MALE
- {S2} gets the value WHITE
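The matching rules can be approximated in Bash by translating an abbreviated name into an extended regular expression. This is only a sketch: abbrev_to_regex is a hypothetical helper, not MXP's actual implementation. It assumes that a variable must match at least one character and that the literal parts of a target name contain only letters, digits and underscores, so they need no regex escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MXP): turn an abbreviated target name
# into an anchored extended regular expression with one group per variable.
abbrev_to_regex() {
    local re=$1
    # Wide names first, so that {{S1}} is not mistaken for a narrow name.
    re=$(sed -E 's/\{\{[A-Za-z][A-Za-z0-9_]*\}\}/([A-Za-z0-9_]+)/g' <<< "$re")
    # Narrow names: letters and digits only, no underscores.
    re=$(sed -E 's/\{[A-Za-z][A-Za-z0-9_]*\}/([A-Za-z0-9]+)/g' <<< "$re")
    printf '^%s$\n' "$re"
}

re=$(abbrev_to_regex 'd02_psub_{{S1}}_{S2}')
if [[ d02_psub_QC_MALE_WHITE =~ $re ]]; then
    # Capture groups follow the order of the variables in the abbreviated name.
    echo "S1=${BASH_REMATCH[1]} S2=${BASH_REMATCH[2]}"
fi
# prints: S1=QC_MALE S2=WHITE
```

Because the narrow name {S2} cannot contain an underscore, it can only take the last underscore-separated component, which leaves QC_MALE for {{S1}}. How MXP splits a target name between two wide names is not covered by this sketch.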
After matching and obtaining variable values, these values are substituted in the makeline and thus an instance of the rule is obtained:
MXP_MAKEFILE["d02_psub_QC_MALE_WHITE"]=" (pdata_DIR = d02_psub_QC_MALE) psub_0 + psub_WHITE : psubset + link"
Now MXP can see that it needs target d02_psub_QC_MALE. Again, it matches d02_psub_QC_MALE against all (abbreviated) target names defined in the Makefile, and again the abbreviated name from the second line matches, now with variable values {{S1}}==QC and {S2}==MALE. This gives the instance of the rule to use:
MXP_MAKEFILE["d02_psub_QC_MALE"]=" (pdata_DIR = d02_psub_QC) psub_0 + psub_MALE : psubset + link"
Next, target d02_psub_QC is needed, and an instance of the rule from the first line is used:
MXP_MAKEFILE["d02_psub_QC"]=" (pdata_DIR = d01_pdata) psub_0 + psub_QC : psubset + link"
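The substitution step that produces these rule instances can be sketched as a small Bash function. The name instantiate and its NAME=VALUE argument convention are hypothetical, chosen for illustration only; MXP's own implementation may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MXP).
# Usage: instantiate MAKELINE NAME=VALUE...
# Replaces every applied occurrence of each variable with its value.
instantiate() {
    local line=$1 pair name value wide narrow
    shift
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        wide="{{$name}}"
        narrow="{$name}"
        # Quoting the pattern makes Bash treat the braces literally.
        # The wide form is replaced first so that the narrow replacement
        # cannot damage a "{{NAME}}" occurrence.
        line=${line//"$wide"/$value}
        line=${line//"$narrow"/$value}
    done
    printf '%s\n' "$line"
}

rule='(pdata_DIR = d02_psub_{{S1}}) psub_0 + psub_{S2} : psubset + link'
instantiate "$rule" S1=QC_MALE S2=WHITE
# prints: (pdata_DIR = d02_psub_QC_MALE) psub_0 + psub_WHITE : psubset + link
```

Calling the function again with S1=QC and S2=MALE gives the makeline of the second instance.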
Handling multiple matches
It may happen that a target name matches multiple rules. For example, in the example above, target name d02_psub_QC_MALE_PC matches both rules 2 and 3. In such cases MXP uses the most specific rule.
We say that abbreviated name A is more specific than abbreviated name B (denoted A ≼ B) if any instance of A is also an instance of B.
Thus, d02_psub_{{S1}}_PC ≼ d02_psub_{{S1}}_{S2}.
For a given set of abbreviated names A1, …, An, we say that Ai is the most specific in this set if Ai ≼ Aj for every j.
Of the two abbreviated names d02_psub_{{S1}}_PC and d02_psub_{{S1}}_{S2}, the first one is the most specific.
A more complex example follows. Consider these rules in the Makefile:
MXP_MAKEFILE["{Var1}_{Var2}"]=" ..... "
MXP_MAKEFILE["{Var1}_B"]=" ..... "
MXP_MAKEFILE["A_{Var2}"]=" ..... "
MXP_MAKEFILE["A_B"]=" ..... "
We have the following relations:
- A_B ≼ {Var1}_B ≼ {Var1}_{Var2}
- A_B ≼ A_{Var2} ≼ {Var1}_{Var2}
Then:
- Target name X_Y matches only rule 1, and (the instance of) this rule will be used.
- Target name X_B matches rules 1 and 2. Among these rules, rule 2 is the most specific, and thus rule 2 will be used.
- Target name A_Y matches rules 1 and 3. Similarly, rule 3 will be used.
- Target name A_B matches all 4 rules. Rule 4 is the most specific, and thus it will be used.
If, however, rule 4 is not given, there is no most specific rule for target A_B: it matches rules 1, 2 and 3, but neither {Var1}_B ≼ A_{Var2} nor A_{Var2} ≼ {Var1}_B holds. In this case, MXP will report an error.
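The selection of the most specific rule can be illustrated with the following Bash sketch. All function names are hypothetical, and the test for A ≼ B is a heuristic rather than MXP's actual algorithm: it builds one sample instance of A (using the placeholder Zq9, assumed not to occur literally in any target name) and checks whether that instance matches B. This is sufficient for the examples in this section but is not a general proof that every instance of A is an instance of B.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of MXP).

# Regex for an abbreviated name: wide -> [A-Za-z0-9_]+, narrow -> [A-Za-z0-9]+.
abbrev_to_regex() {
    local re=$1
    re=$(sed -E 's/\{\{[A-Za-z][A-Za-z0-9_]*\}\}/[A-Za-z0-9_]+/g' <<< "$re")
    re=$(sed -E 's/\{[A-Za-z][A-Za-z0-9_]*\}/[A-Za-z0-9]+/g' <<< "$re")
    printf '^%s$\n' "$re"
}

# One sample instance of an abbreviated name: wide -> Zq9_Zq9, narrow -> Zq9.
sample_instance() {
    local s=$1
    s=$(sed -E 's/\{\{[A-Za-z][A-Za-z0-9_]*\}\}/Zq9_Zq9/g' <<< "$s")
    s=$(sed -E 's/\{[A-Za-z][A-Za-z0-9_]*\}/Zq9/g' <<< "$s")
    printf '%s\n' "$s"
}

# Heuristic test for "A is more specific than B":
# the sample instance of A ($1) matches the pattern of B ($2).
more_specific() {
    local re
    re=$(abbrev_to_regex "$2")
    [[ $(sample_instance "$1") =~ $re ]]
}

# Usage: pick_rule TARGET ABBREVIATED_NAME...
# Prints the most specific abbreviated name that matches TARGET,
# or reports an error and returns 1 if there is none.
pick_rule() {
    local target=$1 a b ok re
    shift
    local -a matching=()
    for a in "$@"; do
        re=$(abbrev_to_regex "$a")
        [[ $target =~ $re ]] && matching+=("$a")
    done
    for a in "${matching[@]}"; do
        ok=1
        for b in "${matching[@]}"; do
            more_specific "$a" "$b" || ok=0
        done
        if (( ok )); then
            printf '%s\n' "$a"
            return 0
        fi
    done
    echo "no most specific rule for $target" >&2
    return 1
}

pick_rule A_B '{Var1}_{Var2}' '{Var1}_B' 'A_{Var2}' 'A_B'
# prints: A_B
```

If the last argument (rule 4) is omitted, none of the three remaining names is more specific than both of the others, so pick_rule prints an error message and returns a non-zero status, mirroring the error case described above.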