Monthly Archives: June 2011

Filtering “highly variate” SNPs

A list of “highly variate” regions is available at: /nfs/goldstein/goldsteinlab/Bioinformatics/artifact_regions/ I am filtering the na12878 variant calls against these regions, with two primary questions: 1. Does the FDR drop? 2. Does sensitivity increase? Those files have been copied to the DSCR side.
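The filtering and the two metrics can be sketched roughly as follows. This is a minimal sketch, not the pipeline's actual code. It assumes the artifact regions can be read as (chromosome, start, end) tuples with inclusive coordinates, and that the calls and a trusted na12878 truth set are sets of (chromosome, position) pairs. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_region_index(regions):
    """Group (chrom, start, end) regions by chromosome, sorted and merged."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end + 1:  # overlapping or adjacent: merge
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def in_regions(index, chrom, pos):
    """Return True if pos falls inside any region on chrom."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    i = bisect_right(starts, pos) - 1  # last region starting at or before pos
    return i >= 0 and merged[i][0] <= pos <= merged[i][1]


def filter_calls(calls, index):
    """Drop every call that lies inside a listed region."""
    return {(c, p) for c, p in calls if not in_regions(index, c, p)}


def fdr_and_sensitivity(calls, truth):
    """FDR = FP / (TP + FP); sensitivity = TP / (TP + FN)."""
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return fdr, sensitivity
```

Calling fdr_and_sensitivity once on the raw calls and once on the output of filter_calls gives the before/after comparison for both questions.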

Posted in Uncategorized | Leave a comment

Test assessment on dodrp751952 build37

Following up on the earlier assessment thread, I am doing the real assessment on dodrp751952 build37, focusing on the following metrics: 1. ti/tv ratio 2. number of raw and filtered SNVs 3. number of indels called by samtools 4. proportion … Continue reading
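For reference, the ti/tv ratio is the number of transitions (A<->G, C<->T) divided by the number of transversions (all other single-base changes). A minimal sketch, assuming the SNVs are already available as (ref, alt) base pairs; the function names are hypothetical and this is not the pipeline's own code:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """A transition stays within purines (A<->G) or within pyrimidines (C<->T)."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Return transitions / transversions, or None if there are no transversions."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a simple single-base substitution.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else None
```

For example, three transitions and two transversions give a ratio of 1.5.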

Zoom in comparison on na12878

An email from Jessica: Hi Jianying, Here are the answers to your questions: 1. Where is the 1000 Genomes SNP file stored locally? If we lose it, which file should we re-download from 1000 Genomes? I think this file might have … Continue reading

New test run on build37

So much information has been squeezed into my head that I have to keep good notes on the analysis. Anyway: 1. Genome database 2. Run_summary.xls 3. Individual and combined genome/exom_alignment_stats.txt 4. Searching for alignment storage: perl /nfs/goldstein/goldsteinlab/software/sh/find_files sampleList … Continue reading

Learning variant calling with samtools

It has been quite a learning experience here at Duke CHGV. One thing to learn is variant calling with samtools. The main procedure in our pipeline is as follows: /nfs/chgv/seqanalysis/SOFTWARE/samtools-0.1.12a/samtools mpileup -d 500 -ugf /nfs/chgv/seqanalysis/ALIGNMENT/BWA_INDEX/human_ref_36_50.fa /nfs/chgv/seqanalysis2/GATK/recalBam/mchd002A2.build36.recal.noRG.bam > /nfs/chgv/seqanalysis2/GATK/samples/mchd002A2/noRG_recal//file1 /nfs/chgv/seqanalysis/SOFTWARE/samtools-0.1.12a/bcftools/bcftools view … Continue reading
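To make the mpileup options easier to read, here is a small sketch that assembles only the first command (the bcftools step is truncated above, so it is left out). The helper names are hypothetical and this is only one way to wrap the call, not the pipeline's own wrapper.

```python
import subprocess


def mpileup_argv(samtools, reference, bam, max_depth=500):
    """Build the argument list for the samtools 0.1.x mpileup call shown above."""
    return [
        samtools, "mpileup",
        "-d", str(max_depth),  # cap on reads considered per position per BAM
        "-u",                  # emit uncompressed BCF (suited to piping or redirecting)
        "-g",                  # compute genotype likelihoods, output in BCF
        "-f", reference,       # reference FASTA used for the pileup
        bam,
    ]


def run_mpileup(argv, out_path):
    """Run the command, redirecting stdout to out_path like '>' in the shell."""
    with open(out_path, "wb") as out:
        subprocess.run(argv, stdout=out, check=True)
```

The shell form combines the last three options as -ugf; passing them separately is equivalent, as long as the reference path directly follows -f.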

Running GATK recalibration in pipeline mode

Script: /nfs/chgv/seqanalysis2/GATK/scripts/GATK_recal_b36.pl It performs 4 steps: 1. Count/tally covariates 2. Recalibrate the bam file 3. Index the recal bam file 4. Count/tally covariates on the recal bam. Command issued: qsub /nfs/chgv/seqanalysis2/GATK/scripts/GATK_recal_b36.pl mchd002A2 /nfs/chgv/seqanalysis2/GATK/rawBam/mchd002A2.build36.raw.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/mchd002A2.build36.raw.bam.noRG.CntCov.csv /nfs/chgv/seqanalysis2/GATK/samples/mchd002A2/noRG_recal/mchd002A2.build36.recal.noRG.bam /nfs/chgv/seqanalysis2/GATK/resultsDir/mchd002A2.build36.recal.bam.noRG.CntCov.csv Sent June 14th, 2011. A sample run with build … Continue reading
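Since the qsub line is long and easy to mistype, the arguments can be generated from the sample name. This is a sketch only: it assumes that every sample follows the same path naming as mchd002A2 in the command above, and the reading of the five arguments (sample, raw bam, raw covariate table, recal bam, recal covariate table) is inferred from the file names rather than from the script itself. The helper name is hypothetical.

```python
GATK_ROOT = "/nfs/chgv/seqanalysis2/GATK"


def recal_command(sample):
    """Build the qsub argument list for one sample (naming pattern assumed)."""
    script = f"{GATK_ROOT}/scripts/GATK_recal_b36.pl"
    raw_bam = f"{GATK_ROOT}/rawBam/{sample}.build36.raw.bam"
    raw_cov = f"{GATK_ROOT}/resultsDir/{sample}.build36.raw.bam.noRG.CntCov.csv"
    recal_bam = (
        f"{GATK_ROOT}/samples/{sample}/noRG_recal/{sample}.build36.recal.noRG.bam"
    )
    recal_cov = f"{GATK_ROOT}/resultsDir/{sample}.build36.recal.bam.noRG.CntCov.csv"
    return ["qsub", script, sample, raw_bam, raw_cov, recal_bam, recal_cov]


if __name__ == "__main__":
    # Print the command for inspection instead of submitting it.
    print(" ".join(recal_command("mchd002A2")))
```

Printing the command first makes it possible to check it against a known-good submission before handing the list to a job runner.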

GATK troubleshooting

Scenario: Jianying, I have thought it through and now think the most likely explanation is that the reference sequences (build 37 only) used in SVA and in your alignment pipeline are very different. To test this hypothesis, can you extract the … Continue reading

Finding files on the SVA cluster

Jessica showed me a central location for documentation: /nfs/goldstein/goldsteinlab/Bioinformatics It contains a file, “documentation.txt”, which holds the key information. To extract a file by name: Step 1. Create a simple text file Step 2. Send this … Continue reading

Working with the master project

Normally, the master project is run and saved at /nfs/svaprojects/Master.SVA It keeps a master list of .gsap/.bco files for the samples that have been annotated with SVA. I had found very low concordance for sample als9c2 (for the second … Continue reading

Gain from GATK

The current GATK assessment shows good improvement from quality score recalibration. BUT, when it comes down to the SNV calls, it did NOT show what has been claimed. A few points from Mark’s Nature Genetics paper: 1. Even though … Continue reading
