Test-drive recalibration on als9c2

Refer to the page on work@chgv

Without ReadGroup

There has been an error with adding the readgroup information, therefore, I am testing this procedure just like I did with earlier trial (without read group information)

Input file preparation


Step 1: CountCovariates:

Script: /nfs/chgv/seqanalysis2/GATK/scripts/gatk_CountCov_noRG.sh
sample: als9c2
inputBAM: /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam
output: /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv
Command sent:

qsub /nfs/chgv/seqanalysis2/GATK/scripts/gatk_CountCov_noRG.sh als9c2 /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv

java -jar -Xmx12g /nfs/chgv/seqanalysis/SOFTWARE/GenomeAnalysisTK-1.0.5083/GenomeAnalysisTK.jar
-T CountCovariates
-nt 12
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-B:eval,VCF /home/chavi/jl407/GATK-testing/referenceFiles/00-All.vcf
-I /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam
–default_read_group als9c2
–default_platform illumina
-cov ReadGroupCovariate
-cov QualityScoreCovariate
-cov DinucCovariate
-cov CycleCovariate
-recalFile : /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv

2. Step two: TableCalibration:

Script: /nfs/chgv/seqanalysis2/GATK/scripts/gatk_TableRecal_noRG.sh
sample: als9c2
inputBAM: /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam
tabRecalFile: /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv
recalBAM: /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam
Command sent:

qsub /nfs/chgv/seqanalysis2/GATK/scripts/gatk_TableRecal_noRG.sh als9c2 /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam

java -jar -Xmx12g GenomeAnalysisTK.jar
-T TableRecalibration
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-I /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam
–default_read_group als9c2
–default_platform illumina
-recalFile /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv
–out /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam
*Note,
-outputBam is deprecated, use -–out instead
Currently, not supporting parallel, only single thread
Time: 22.75 hours

2.1. Step 2.1: Indexing recalibrated BAM file:

qsub /nfs/chgv/seqanalysis2/GATK/scripts/gatk_indexBAM.sh /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam

3. Step three: CountCovariates (on recalibrated bam file):

Script: /nfs/chgv/seqanalysis2/GATK/scripts/gatk_CountCov_noRG.sh
sample: als9c2
inputBAM: /nfs/chgv/seqanalysis2/GATK/recalBam
/als9c2.build37.recal.noRG.bam
output: /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.noRG.CntCov.csv
Command sent:

qsub /nfs/chgv/seqanalysis2/GATK/scripts/gatk_CountCov_noRG.sh als9c2 /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.noRG.CntCov.csv

java -jar -Xmx12g /nfs/chgv/seqanalysis/SOFTWARE/GenomeAnalysisTK-1.0.5083/GenomeAnalysisTK.jar
-T CountCovariates
-nt 12
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-B:eval,VCF /home/chavi/jl407/GATK-testing/referenceFiles/00-All.vcf
-I /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam
–default_read_group als9c2
–default_platform illumina
-cov ReadGroupCovariate
-cov QualityScoreCovariate
-cov DinucCovariate
-cov CycleCovariate
-recalFile /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.noRG.CntCov.csv

Totally: 8.9 hours

4. Step four: AnalyzeCovariates (on raw bam file):

java -jar -Xmx6g $gatk_dir/AnalyzeCovariates.jar
-outputDir $outputDir
-resources $resourcesDir
-Rscript $RDir (/opt/R271/bin/R) , we have to provide this Rscript parameter.
-recalFile $tabRecalFile

Command sent: qsub /scriptdir/gatk_analCov.sh /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv /nfs/chgv/seqanalysis2/GATK/samples/als9c2/preCal/

FIXME: BUT, the problem is that R script did not generate corresponding figures.

5. Step five: AnalyzeCovariates (on recalibrated bam file):

java -jar -Xmx6g $gatk_dir/AnalyzeCovariates.jar
-outputDir $outputDir
-resources $resourcesDir
-Rscript $RDir (/opt/R271/bin/R) , we have to provide this Rscript parameter.
-recalFile $tabRecalFile

Command sent: qsub /scriptdir/gatk_analCov.sh /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.noRG.CntCov.csv /nfs/chgv/seqanalysis2/GATK/samples/als9c2/noRG_reCal/

!! Need to fix the R code for assessment plots

With ReadGroup

ERRORS

When used samtools mpileup, I found this error “[mpileup] 1 samples in 1 input files
samtools: bam_plcmd.c:596: group_smpl: Assertion `id >= 0 && id n’ failed.” and later found out that the RG was not added as I expected. Here are the answer to my question at SEQanswers. Therefore, I have to recreate all the recalibrated bam files.

Use the dumb way to add ReadGroup information, hopefully it works. Fingers crossed

If it works, a pipeline GATK run will be fired up after checking the bam file and indexing it.

Step one: Check he bam file

samtools view -H /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/als9c2_rmdup_RG.bam

[jl407@head4 als9c2]$ samtools view -H als9c2_rmdup_RG.bam
@RG ID:Run_1_2 SM:Run_1_2 LB:ga PL:Illumina
@RG ID:Run_1_3 SM:Run_1_3 LB:ga PL:Illumina
@RG ID:Run_1_4 SM:Run_1_4 LB:ga PL:Illumina
@RG ID:Run_1_5 SM:Run_1_5 LB:ga PL:Illumina
@RG ID:Run_2_7 SM:Run_2_7 LB:ga PL:Illumina
@RG ID:Run_2_8 SM:Run_2_8 LB:ga PL:Illumina

samtools view /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/als9c2_rmdup_RG.bam | grep “RG”

HWI-ST426_0113:3:1101:1497:1989#0 99 1 177398310 29 100M = 177398565 355 ATTTGTCCAAAATGAGGAATACAATCTCTTATTCCAAAGAGCTATAAAAAAGAAGATAATCAATCTACT
TGAGAAGCCAATTGTTGGCATACATAAAGGC 6-./11142D<@DD@D;DD@6@@@??9??;?=7?<:6::99942555535=;=6<66<<?6<?=::6:7<9<DD=9=??<==6496145<494 XT:A:U NM:i:1 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:
1 XO:i:0 XG:i:0 MD:Z:92C7 RG:Z:s.3

Step two: mv the bam file to /nfs/chgv/seqanalysis2/GATK/rawBam/

Step three: index the bam file

qsub /nfs/chgv/seqanalysis2/GATK/scripts/gatk_indexBAM.sh /nfs/chgv/seqanalysis2/GATK/rawBam/

Step four: fire up the “GATK pipeline”

qsub /nfs/chgv/seqanalysis2/GATK/scripts/GATK_recal_RG.pl als9c2 /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2_rmdup_RG.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.RG.CntCov.csv /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.RG.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.RG.CntCov.csv

Log file:

Process command:
java -jar -Xmx12g /nfs/chgv/seqanalysis/SOFTWARE/GenomeAnalysisTK-1.0.5083/GenomeAnalysisTK.jar
-T CountCovariates
-nt 12
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-B:eval,VCF /home/chavi/jl407/GATK-testing/referenceFiles/00-All.vcf
-I /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2_rmdup_RG.bam
-cov ReadGroupCovariate
-cov QualityScoreCovariate
-cov DinucCovariate
-cov CycleCovariate
-recalFile /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.RG.CntCov.csv

Time to start the process is: Thu May 19 11:37:59 2011
Time when the process finishes is: Thu May 19 11:37:59 2011

Process command:
java -jar -Xmx12g /nfs/chgv/seqanalysis/SOFTWARE/GenomeAnalysisTK-1.0.5083/GenomeAnalysisTK.jar
-T TableRecalibration
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-I /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2_rmdup_RG.bam
-recalFile /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.RG.CntCov.csv
–out /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.RG.bam

Time to start the process is: Thu May 19 11:37:59 2011
Time when the process finishes is: Thu May 19 11:37:59 2011

Process command:
/nfs/chgv/seqanalysis/SOFTWARE/samtools-0.1.12a/samtools index /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.RG.bam

Time to start the process is: Thu May 19 11:37:59 2011
Time when the process finishes is: Thu May 19 11:37:59 2011

Process command:
java -jar -Xmx12g /nfs/chgv/seqanalysis/SOFTWARE/GenomeAnalysisTK-1.0.5083/GenomeAnalysisTK.jar
-T CountCovariates
-nt 12
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-B:eval,VCF /home/chavi/jl407/GATK-testing/referenceFiles/00-All.vcf
-I /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.RG.bam
-cov ReadGroupCovariate
-cov QualityScoreCovariate
-cov DinucCovariate
-cov CycleCovariate
-recalFile /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.RG.CntCov.csv

Time to start the process is: Thu May 19 11:37:59 2011
Time when the process finishes is: Thu May 19 11:37:59 2011

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.