Using the Power of Bash Scripting

Everything in MXP is a Bash script, so you can use the full power of Bash anywhere.

In this section, we provide a few examples of such advanced Bash usage.

Generating Makefile contents

Suppose you have successfully built the targets d02_psub_TOTAL, d02_psub_MALE, and d02_psub_FEMALE, which contain PLINK filesets of some genotyping data for the whole population, males only, and females only, respectively. Now you need to split each dataset by chromosome; this is required, for example, for phasing the genotype data with SHAPEIT or Eagle2.

This means that you need the following lines in the Makefile:

MXP_MAKEFILE[d02_psub_TOTAL_chr1]="(psub_DIR = d02_psub_TOTAL) psub_0 + psub_chr1 : psubset"
MXP_MAKEFILE[d02_psub_TOTAL_chr2]="(psub_DIR = d02_psub_TOTAL) psub_0 + psub_chr2 : psubset"
.....
MXP_MAKEFILE[d02_psub_MALE_chr1]="(psub_DIR = d02_psub_MALE) psub_0 + psub_chr1 : psubset"
.....

Remarks

Here we assume that: (1) the psubset method invokes PLINK to do the subsetting, with parameters defined in parameter scripts; (2) the psub_0 parameter script resets all environment variables used by the psubset script to their default values; (3) the psub_chr1 script sets the PLINK parameter “--chr 1”, which tells PLINK to extract chromosome 1, and so on.
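To make these assumptions concrete, here is a minimal, hypothetical model of “psub_0 + psub_chr1 : psubset”: each parameter script is represented by a Bash function, the functions are applied from left to right, and psubset only prints the PLINK call instead of running it. The function bodies and the file prefixes (d02_psub_TOTAL/data, etc.) are assumptions for illustration, not MXP's actual mechanism; --bfile, --chr, --make-bed and --out are standard PLINK 1.9 options.

```shell
#!/usr/bin/env bash
# Hypothetical model: parameter scripts as functions, applied left to right.
# The real MXP mechanism and psubset internals may differ.

psub_0() {       # reset every variable psubset uses to its default
    PSUB_PLINK_FILTER=""
}

psub_chr1() {    # restrict the subset to chromosome 1
    PSUB_PLINK_FILTER="--chr 1"
}

# Build (but do not run) a PLINK call; file prefixes are hypothetical.
psubset() {
    local in_prefix=$1 out_prefix=$2
    # PSUB_PLINK_FILTER is deliberately unquoted: "--chr 1" splits into
    # two words, and an empty filter contributes no words at all.
    local -a cmd=(plink --bfile "$in_prefix" $PSUB_PLINK_FILTER --make-bed --out "$out_prefix")
    echo "${cmd[*]}"
}

psub_0
psub_chr1
PSUBSET_CMD=$(psubset d02_psub_TOTAL/data d02_psub_TOTAL_chr1/data)
echo "$PSUBSET_CMD"
```

Running psub_0 first matters: without the reset, a filter left over from an earlier parameter script would leak into the next subset.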

Writing out every combination takes 26 chromosomes times 3 subpopulations = 78 Makefile lines. This is where Makefile abbreviations (see Makefile Syntax) really help: all 78 lines can be replaced by a single line:

MXP_MAKEFILE["d02_psub_{{S1}}_chr{N}"]="(psub_DIR = d02_psub_{{S1}}) psub_0 + psub_chr{N} : psubset"
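For comparison, the explicit entries that this one line stands for (for the three subpopulations used here) could also be generated with plain nested Bash loops. This is only a sketch of what the abbreviation replaces, not how MXP resolves {{S1}} and {N} internally; it assumes the subpopulations are exactly TOTAL, MALE and FEMALE and declares MXP_MAKEFILE itself so that it runs outside MXP.

```shell
#!/usr/bin/env bash
# Assumption: outside MXP we must declare the associative array ourselves.
declare -A MXP_MAKEFILE

# Write out every entry explicitly: 3 subpopulations x 26 chromosomes.
for s1 in TOTAL MALE FEMALE; do
    for n in {1..26}; do
        MXP_MAKEFILE["d02_psub_${s1}_chr$n"]="(psub_DIR = d02_psub_${s1}) psub_0 + psub_chr$n : psubset"
    done
done

echo "entries: ${#MXP_MAKEFILE[@]}"
echo "${MXP_MAKEFILE[d02_psub_MALE_chr1]}"
```

The abbreviation keeps the Makefile to a single readable line, while the loop makes the expansion visible, which can be handy when debugging a pattern.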

Now you can run commands such as:

     mxp   d02_psub_TOTAL_chr1
     mxp   d02_psub_TOTAL_chr2
     .....

to obtain per-chromosome data.

Of course, you will run a couple of such commands by hand to check that everything works. Typing all 78 of them, however, is too cumbersome. To save effort, you can define a dummy target that creates no new data but lists all per-chromosome targets as its required targets. Bash keeps this Makefile entry short:

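# Build a comma-separated list of required targets, one per chromosome:
# "psub_DIR_1 = d02_psub_{{S1}}_chr1, psub_DIR_2 = d02_psub_{{S1}}_chr2, ..."
required_targets=""
infix=""
for n in {1..26}; do
    required_targets="${required_targets}${infix}psub_DIR_$n = d02_psub_{{S1}}_chr$n"
    infix=", "    # separator goes before every item except the first
done
# Dummy target: the nop method creates no data, it only pulls in the required targets
MXP_MAKEFILE["d02_psub_{{S1}}_chrALL"]="($required_targets) nop"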
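The infix variable above is one way to avoid a leading separator. An equivalent idiom, sketched here as an alternative rather than MXP's recommended style, collects the clauses in a Bash array and joins them with printf -v, then strips the trailing separator. MXP_MAKEFILE is declared locally only so that the sketch runs on its own.

```shell
#!/usr/bin/env bash
declare -A MXP_MAKEFILE   # normally provided by MXP

# Collect one "variable = target" clause per chromosome.
parts=()
for n in {1..26}; do
    parts+=("psub_DIR_$n = d02_psub_{{S1}}_chr$n")
done

# printf reuses its format for every argument, appending ", " after each
# clause; the trailing ", " is then removed with a suffix pattern.
printf -v required_targets '%s, ' "${parts[@]}"
required_targets=${required_targets%, }

MXP_MAKEFILE["d02_psub_{{S1}}_chrALL"]="($required_targets) nop"
echo "${MXP_MAKEFILE["d02_psub_{{S1}}_chrALL"]}"
```

Both versions produce the same string; the array form is convenient when the list of clauses is built in several places before being joined.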

Now you can run:

     mxp   d02_psub_TOTAL_chrALL
     mxp   d02_psub_MALE_chrALL
     mxp   d02_psub_FEMALE_chrALL

to build all per-chromosome targets. You could also define another dummy target, d02_psub_ALL_chrALL, to reduce these three commands to one. Alternatively, since your terminal runs Bash, you can simply type:

     for s1 in TOTAL MALE FEMALE;   do   mxp d02_psub_${s1}_chrALL;   done
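The dummy target d02_psub_ALL_chrALL mentioned above could be defined with the same loop pattern used for the chrALL targets. This is a sketch under two assumptions: the variable names psub_DIR_TOTAL, psub_DIR_MALE and psub_DIR_FEMALE are hypothetical (any distinct names should do), and MXP resolves this explicit name sensibly even though it also looks like an instance of d02_psub_{{S1}}_chrALL; check the Makefile Syntax section for the exact matching rules.

```shell
#!/usr/bin/env bash
declare -A MXP_MAKEFILE   # normally provided by MXP

# Hypothetical top-level dummy target: it requires the three chrALL
# targets and, like them, creates no data of its own (method nop).
required_targets=""
infix=""
for s1 in TOTAL MALE FEMALE; do
    required_targets="${required_targets}${infix}psub_DIR_$s1 = d02_psub_${s1}_chrALL"
    infix=", "
done
MXP_MAKEFILE[d02_psub_ALL_chrALL]="($required_targets) nop"
echo "${MXP_MAKEFILE[d02_psub_ALL_chrALL]}"
```

With this entry in place, a single mxp d02_psub_ALL_chrALL would request all 78 per-chromosome targets.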

Using the MXP environment

However, the example above still requires 26 parameter scripts, psub_chr1.params.sh, …, psub_chr26.params.sh. These scripts are very simple; each consists of a single line, such as:

PSUB_PLINK_FILTER="--chr 1"

The only thing that differs between them is the chromosome number. Note that these scripts are used to obtain targets whose names end with “_chr1”, “_chr2”, and so on. The target name is available as the value of the environment variable MXP_TARGET, so we can extract the chromosome number from it. This allows us to replace all 26 parameter scripts with a single script, psub_chrN.params.sh:

PSUB_PLINK_FILTER="--chr ${MXP_TARGET#*_chr}"
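To see what this parameter expansion yields, the sketch below sets MXP_TARGET by hand to a few sample target names (inside MXP the variable is set for you) and prints the resulting filter.

```shell
#!/usr/bin/env bash
# MXP normally sets MXP_TARGET; here we loop over sample target names.
for MXP_TARGET in d02_psub_TOTAL_chr1 d02_psub_MALE_chr17 d02_psub_FEMALE_chr26; do
    # ${var#pattern} removes the shortest prefix matching the pattern;
    # "*_chr" matches everything up to and including "_chr".
    PSUB_PLINK_FILTER="--chr ${MXP_TARGET#*_chr}"
    echo "$MXP_TARGET -> $PSUB_PLINK_FILTER"
done
```

This prints “--chr 1”, “--chr 17” and “--chr 26” for the three names. If a subpopulation name could itself contain “_chr”, the longest-match form ${MXP_TARGET##*_chr} would be safer, because it strips everything up to the last “_chr”.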

Of course, after this we have to update the Makefile:

MXP_MAKEFILE["d02_psub_{{S1}}_chr{N}"]="(psub_DIR = d02_psub_{{S1}}) psub_0 + psub_chrN : psubset"

(that is, the curly braces are removed from the parameter script name: psub_chr{N} becomes psub_chrN).