Refer to the page on work@chgv
Without ReadGroup
There has been an error with adding the readgroup information, therefore, I am testing this procedure just like I did with earlier trial (without read group information)
Input file preparation
Step 1: CountCovariates:
Script: /nfs/chgv/seqanalysis2/GATK/scripts/gatk_CountCov_noRG.sh
sample: als9c2
inputBAM: /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam
output: /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv
Command sent:
qsub /nfs/chgv/seqanalysis2/GATK/scripts/gatk_CountCov_noRG.sh als9c2 /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv
java -jar -Xmx12g /nfs/chgv/seqanalysis/SOFTWARE/GenomeAnalysisTK-1.0.5083/GenomeAnalysisTK.jar
-T CountCovariates
-nt 12
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-B:eval,VCF /home/chavi/jl407/GATK-testing/referenceFiles/00-All.vcf
-I /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam
–default_read_group als9c2
–default_platform illumina
-cov ReadGroupCovariate
-cov QualityScoreCovariate
-cov DinucCovariate
-cov CycleCovariate
-recalFile : /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv
2. Step two: TableCalibration:
Script: /nfs/chgv/seqanalysis2/GATK/scripts/gatk_TableRecal_noRG.sh
sample: als9c2
inputBAM: /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam
tabRecalFile: /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv
recalBAM: /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam
Command sent:
qsub /nfs/chgv/seqanalysis2/GATK/scripts/gatk_TableRecal_noRG.sh als9c2 /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam
java -jar -Xmx12g GenomeAnalysisTK.jar
-T TableRecalibration
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-I /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2.build37.raw.bam
–default_read_group als9c2
–default_platform illumina
-recalFile /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv
–out /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam
*Note,
-outputBam is deprecated, use -–out instead
Currently, not supporting parallel, only single thread
Time: 22.75 hours
2.1. Step 2.1: Indexing recalibrated BAM file:
qsub /nfs/chgv/seqanalysis2/GATK/scripts/gatk_indexBAM.sh /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam
3. Step three: CountCovariates (on recalibrated bam file):
Script: /nfs/chgv/seqanalysis2/GATK/scripts/gatk_CountCov_noRG.sh
sample: als9c2
inputBAM: /nfs/chgv/seqanalysis2/GATK/recalBam
/als9c2.build37.recal.noRG.bam
output: /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.noRG.CntCov.csv
Command sent:
qsub /nfs/chgv/seqanalysis2/GATK/scripts/gatk_CountCov_noRG.sh als9c2 /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.noRG.CntCov.csv
java -jar -Xmx12g /nfs/chgv/seqanalysis/SOFTWARE/GenomeAnalysisTK-1.0.5083/GenomeAnalysisTK.jar
-T CountCovariates
-nt 12
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-B:eval,VCF /home/chavi/jl407/GATK-testing/referenceFiles/00-All.vcf
-I /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.noRG.bam
–default_read_group als9c2
–default_platform illumina
-cov ReadGroupCovariate
-cov QualityScoreCovariate
-cov DinucCovariate
-cov CycleCovariate
-recalFile /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.noRG.CntCov.csvTotally: 8.9 hours
4. Step four: AnalyzeCovariates (on raw bam file):
java -jar -Xmx6g $gatk_dir/AnalyzeCovariates.jar
-outputDir $outputDir
-resources $resourcesDir
-Rscript $RDir (/opt/R271/bin/R) , we have to provide this Rscript parameter.
-recalFile $tabRecalFile
Command sent: qsub /scriptdir/gatk_analCov.sh /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.noRG.CntCov.csv /nfs/chgv/seqanalysis2/GATK/samples/als9c2/preCal/
FIXME: BUT, the problem is that R script did not generate corresponding figures.
5. Step five: AnalyzeCovariates (on recalibrated bam file):
java -jar -Xmx6g $gatk_dir/AnalyzeCovariates.jar
-outputDir $outputDir
-resources $resourcesDir
-Rscript $RDir (/opt/R271/bin/R) , we have to provide this Rscript parameter.
-recalFile $tabRecalFile
Command sent: qsub /scriptdir/gatk_analCov.sh /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.noRG.CntCov.csv /nfs/chgv/seqanalysis2/GATK/samples/als9c2/noRG_reCal/
!! Need to fix the R code for assessment plots
With ReadGroup
ERRORS
When used samtools mpileup, I found this error “[mpileup] 1 samples in 1 input files
samtools: bam_plcmd.c:596: group_smpl: Assertion `id >= 0 && id n’ failed.” and later found out that the RG was not added as I expected. Here are the answer to my question at SEQanswers. Therefore, I have to recreate all the recalibrated bam files.
Use the dumb way to add ReadGroup information, hopefully it works. Fingers crossed
If it works, a pipeline GATK run will be fired up after checking the bam file and indexing it.
Step one: Check he bam file
samtools view -H /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/als9c2_rmdup_RG.bam
[jl407@head4 als9c2]$ samtools view -H als9c2_rmdup_RG.bam
@RG ID:Run_1_2 SM:Run_1_2 LB:ga PL:Illumina
@RG ID:Run_1_3 SM:Run_1_3 LB:ga PL:Illumina
@RG ID:Run_1_4 SM:Run_1_4 LB:ga PL:Illumina
@RG ID:Run_1_5 SM:Run_1_5 LB:ga PL:Illumina
@RG ID:Run_2_7 SM:Run_2_7 LB:ga PL:Illumina
@RG ID:Run_2_8 SM:Run_2_8 LB:ga PL:Illumina
samtools view /nfs/chgv/seqanalysis/ALIGNMENT/samples/als9c2/als9c2_rmdup_RG.bam | grep “RG”
HWI-ST426_0113:3:1101:1497:1989#0 99 1 177398310 29 100M = 177398565 355 ATTTGTCCAAAATGAGGAATACAATCTCTTATTCCAAAGAGCTATAAAAAAGAAGATAATCAATCTACT
TGAGAAGCCAATTGTTGGCATACATAAAGGC 6-./11142D<@DD@D;DD@6@@@??9??;?=7?<:6::99942555535=;=6<66<<?6<?=::6:7<9<DD=9=??<==6496145<494 XT:A:U NM:i:1 SM:i:29 AM:i:29 X0:i:1 X1:i:0 XM:i:
1 XO:i:0 XG:i:0 MD:Z:92C7 RG:Z:s.3
Step two: mv the bam file to /nfs/chgv/seqanalysis2/GATK/rawBam/
Step three: index the bam file
qsub /nfs/chgv/seqanalysis2/GATK/scripts/gatk_indexBAM.sh /nfs/chgv/seqanalysis2/GATK/rawBam/
Step four: fire up the “GATK pipeline”
qsub /nfs/chgv/seqanalysis2/GATK/scripts/GATK_recal_RG.pl als9c2 /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2_rmdup_RG.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.RG.CntCov.csv /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.RG.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.RG.CntCov.csv
Log file:
Process command:
java -jar -Xmx12g /nfs/chgv/seqanalysis/SOFTWARE/GenomeAnalysisTK-1.0.5083/GenomeAnalysisTK.jar
-T CountCovariates
-nt 12
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-B:eval,VCF /home/chavi/jl407/GATK-testing/referenceFiles/00-All.vcf
-I /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2_rmdup_RG.bam
-cov ReadGroupCovariate
-cov QualityScoreCovariate
-cov DinucCovariate
-cov CycleCovariate
-recalFile /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.RG.CntCov.csvTime to start the process is: Thu May 19 11:37:59 2011
Time when the process finishes is: Thu May 19 11:37:59 2011Process command:
java -jar -Xmx12g /nfs/chgv/seqanalysis/SOFTWARE/GenomeAnalysisTK-1.0.5083/GenomeAnalysisTK.jar
-T TableRecalibration
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-I /nfs/chgv/seqanalysis2/GATK/rawBam/als9c2_rmdup_RG.bam
-recalFile /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.raw.bam.RG.CntCov.csv
–out /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.RG.bamTime to start the process is: Thu May 19 11:37:59 2011
Time when the process finishes is: Thu May 19 11:37:59 2011Process command:
/nfs/chgv/seqanalysis/SOFTWARE/samtools-0.1.12a/samtools index /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.RG.bamTime to start the process is: Thu May 19 11:37:59 2011
Time when the process finishes is: Thu May 19 11:37:59 2011Process command:
java -jar -Xmx12g /nfs/chgv/seqanalysis/SOFTWARE/GenomeAnalysisTK-1.0.5083/GenomeAnalysisTK.jar
-T CountCovariates
-nt 12
-R /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX_JYL/human_g1k_v37.fasta
-B:eval,VCF /home/chavi/jl407/GATK-testing/referenceFiles/00-All.vcf
-I /nfs/chgv/seqanalysis2/GATK/recalBam/als9c2.build37.recal.RG.bam
-cov ReadGroupCovariate
-cov QualityScoreCovariate
-cov DinucCovariate
-cov CycleCovariate
-recalFile /nfs/chgv/seqanalysis2/GATK/resultsDir/als9c2.build37.recal.bam.RG.CntCov.csvTime to start the process is: Thu May 19 11:37:59 2011
Time when the process finishes is: Thu May 19 11:37:59 2011