Adding read group information

Adding this piece of information has been much more of a headache than I expected, so I think it is worthwhile to create a new thread just for this topic.

Ultimately, we would like to add this information at the BWA sampe/samse step using the -r option.
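As a rough sketch of what that could look like, assuming a BWA version whose sampe/samse commands accept -r: the read group line is built once and passed to the aligner. The file names (ref.fa, s_3_1.sai and so on) and the ID/SM/LB values are placeholders of mine, not the pipeline's real inputs.

```shell
# Build the @RG header line once so the same ID can be reused downstream.
# All values below are hypothetical placeholders.
rg_id="Run_1_3"
rg_sample="als9c2"
rg_library="ga"
# Inside double quotes the shell keeps \t as a literal backslash-t;
# bwa is expected to turn it into a real tab when it writes the header.
rg_line="@RG\tID:${rg_id}\tSM:${rg_sample}\tLB:${rg_library}\tPL:Illumina"

# Run the alignment step only if bwa and the (hypothetical) reference exist.
if command -v bwa >/dev/null 2>&1 && [ -f ref.fa ]; then
    bwa sampe -r "$rg_line" ref.fa s_3_1.sai s_3_2.sai s_3_1.fq s_3_2.fq > s.3.RG.sam
fi
```

If this works in the updated pipeline, the manual steps below would no longer be needed.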

Working directory: /home/chavi/jl407/GATK-testing/testAddRG

Recalibration steps and expected output files:

Step 1: This step should disappear once the new pipeline is updated (a -r option should be added at the BWA step). For now, we need to manually add “read group” information and merge the BAM files. However, there has been an error using this option.

Another source with details on the samtools options is here.
Merge the BAM files with read group information, then process the merged BAM file further. Expected output: sampleName.RG.rmdup.bam

Just for testing purposes, I will add the read group to each individual BAM file. Help was posted on the SEQanswers forum by freeseek:

echo -e "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina" > rg.txt
samtools view -h ga.bam | cat rg.txt - | awk '{ if (substr($1,1,1)=="@") print; else printf "%s\tRG:Z:ga\n",$0; }' | samtools view -uS - | samtools rmdup - - | samtools rmdup -s - aln.bam

With my modification:

echo -e "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina" > rg.txt
samtools view -h ga.bam | cat rg.txt - | awk '{ if (substr($1,1,1)=="@") print; else printf "%s\tRG:Z:ga\n",$0; }' | samtools view -bS -o aln.bam -

It did NOT work!!!

Take a look at the following terminal output:

[jl407@head4 testAddRG]$ samtools view -H /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/Run1/s.3.RG.bam
@RG ID:Run_1_3 SM:Run_1_3 LB:ga PL:Illumina

[jl407@head4 testAddRG]$ samtools view /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/Run1/s.3.RG.bam | more
…………….
XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100 RG:Z:ga
………………

Each step takes 20 minutes!!
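Since each full run is that slow, the awk tagging step can be tried on a few lines of SAM text first, without samtools at all. This is only a minimal sketch: the toy records, the file names toy.sam, rg.toy.txt and toy.RG.sam, and the ID value are made up for illustration. The one thing it tries to get right is using the same ID in the @RG header line and in the RG:Z tag.

```shell
# Toy SAM text: one header line and two alignment records (made-up data).
printf '@SQ\tSN:chr1\tLN:1000\n' > toy.sam
printf 'r1\t0\tchr1\t1\t37\t4M\t*\t0\t0\tACGT\tIIII\n' >> toy.sam
printf 'r2\t16\tchr1\t5\t37\t4M\t*\t0\t0\tTTGA\tIIII\n' >> toy.sam

rg_id="Run_1_3"   # hypothetical ID; reused for the header and for every read
printf '@RG\tID:%s\tSM:%s\tLB:ga\tPL:Illumina\n' "$rg_id" "$rg_id" > rg.toy.txt

# Header lines pass through unchanged; each record gets a tab-separated
# RG:Z tag carrying the same ID that the @RG line declares.
cat rg.toy.txt toy.sam |
    awk -v rg="$rg_id" 'BEGIN { OFS = "\t" }
        /^@/ { print; next }
        { print $0, "RG:Z:" rg }' > toy.RG.sam

cat toy.RG.sam
```

Passing the ID into awk with -v keeps it in one place, so changing the ID in the header cannot leave the old value hard-coded in the tag.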

To check whether the “read group” has been properly added, use samtools view: the -H option prints the header, and the plain view prints the records with their tags.
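One thing worth checking in that output: the header above declares ID:Run_1_3, while the record carries RG:Z:ga. Downstream tools such as GATK generally expect each read's RG tag to name an ID that exists in the header, so a mismatch like this is one plausible reason the attempt failed (I have not confirmed that it is the only one). The helper below is a hypothetical sketch of mine, not a samtools feature. It reads SAM text and counts records whose RG tag is missing or not declared in the header.

```shell
# check_rg FILE: compare RG:Z tags on records against @RG IDs in the header.
# FILE is SAM text, e.g. the output of `samtools view -h in.bam`.
check_rg() {
    awk -F '\t' '
        /^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^ID:/) ids[substr($i, 4)] = 1; next }
        /^@/   { next }
        {
            tag = ""
            # Optional tags start at field 12 in a SAM record.
            for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) tag = substr($i, 6)
            if (tag == "") missing++
            else if (!(tag in ids)) unknown++
        }
        END { printf "missing=%d unknown=%d\n", missing, unknown }
    ' "$1"
}

# Made-up example reproducing the mismatch: header says Run_1_3, read says ga.
printf '@RG\tID:Run_1_3\tSM:Run_1_3\tLB:ga\tPL:Illumina\n' > bad.sam
printf 'r1\t0\tchr1\t1\t37\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:ga\n' >> bad.sam
check_rg bad.sam   # prints: missing=0 unknown=1
```

On a real file, the SAM text would come from samtools view -h, for example through a temporary file. Both counts should be zero before moving on.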

OK, let’s try the dumbest way!!!

Use script: /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/addRG.sh, which sends this command:
/nfs/chgv/seqanalysis/SOFTWARE/samtools-0.1.12a/samtools merge -rh /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/rg.txt - /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/Run1/s.2.bam /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/Run1/s.3.bam /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/Run1/s.4.bam /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/Run1/s.5.bam /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/Run2/s.7.bam /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/Run2/s.8.bam | samtools rmdup - - | samtools rmdup -s - als9c2_rmdup_RG.bam

Sent to run on May 18th at 9:11 a.m. Well, it worked!! Finished on May 19th at 23:04.

However, I need to “sort” it at the last step.
