Work note Jan. 2011
01032011
Follow up emails to
1. Dr. Chen, Hao and Dr. Chen, Xiaoxin from NCCU for their esophagus microarray study
2. Dr. Jean Cook for DNA re-replication project
3. Tangi Smallwood for her survival analysis
01042011
1. Explore the GTD installation
2. Started esophagus microarray study
3. Emailed Dr. Arlin Rodgers and started to investigate the SAM package (see the detailed post on SAM)
01052011 (sick day off)
01062011
1. Dr. Chen, Xiaoxin’s project
01122011
1. Arlin Rodgers’ project
a. Finished reading the Agilent CH3 methylation array documentation
b. Sent a response to Dr. Rodgers asking for further direction.
2. Xiaoxin Chen’s project
Getting things done
Getting Things Done (GTD), a web-based project management tool, offers comprehensive features for keeping track of projects. Details are here
What else
Protected: Survival analysis
SeqWare debugging
For SeqWare production, I need to monitor the process reports and fix errors accordingly.
Two important pages:
11042010
A. It seems that Brian has cleaned up the errors, thanks!!
1. No event has happened to “101028_UNC2-RDR300275_00040_FC_62E9AAXX” yet
2. Only three successes with “101027_UNC7-RDR3001641_00050_FC_62J8EAAXX”, what is going on?
3. Hold on a minute: Brian corrected (deleted) the error records but did NOT fix the underlying problem. The missing files still need to be restored!!
B. There was a problem (linking the file). For flowcell: 101022_UNC7-RDR3001641_00049_FC_62J92AAXX
1. 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.vendorQCsummary.txt exists on the FS, but clicking it from LIMS fails
2. The detailed output files are as follows:
/datastore/nextgenout/seqware-analysis/illumina/101022_UNC7-RDR3001641_00049_FC_62J92AAXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0
[jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ ls -tral 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.*
-rw-r----- 1 seqware nextgenseq 154 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.uniqRead.txt
-rw-r----- 1 seqware nextgenseq 7494 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.stat.txt
-rw-r----- 1 seqware nextgenseq 139 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.qualFilter.txt
-rw-r----- 1 seqware nextgenseq 478 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.vendorQCsummary.txt
-rw-r----- 1 seqware nextgenseq 65 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.adapterCont.txt
-rw-r----- 1 seqware nextgenseq 1781 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.txt
-rw-r----- 1 seqware nextgenseq 405 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.stat.summary.txt
-rw-r----- 1 seqware nextgenseq 6327810056 2010-11-02 02:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.fastq
-rw-r----- 1 seqware nextgenseq 14187 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.bxp.png
-rw-r----- 1 seqware nextgenseq 13962 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCtrend.png
-rw-r----- 1 seqware nextgenseq 366 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCtrend.txt
-rw-r----- 1 seqware nextgenseq 15 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCflag.txt
-rw-r----- 1 seqware nextgenseq 344 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.qcGenome.txt
-rw-r----- 1 seqware nextgenseq 298 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.GenericQCGenome.txt
-rw-r----- 1 seqware nextgenseq 11917 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.ntDistr.png
-rw-r----- 1 seqware nextgenseq 2013317829 2010-11-02 03:49 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.raw.bam
-rw-r----- 1 seqware nextgenseq 2068081081 2010-11-02 04:36 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.bam
-rw-r----- 1 seqware nextgenseq 9765 2010-11-02 07:34 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.TRcov.png
-rw-r----- 1 seqware nextgenseq 972296 2010-11-02 07:34 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.gene.quantification.txt
-rw-r----- 1 seqware nextgenseq 2003 2010-11-02 07:34 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.TRcov.txt
-rw-r----- 1 seqware nextgenseq 315 2010-11-02 07:34 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.alignStat.txt
-rw-r----- 1 seqware nextgenseq 3343555 2010-11-02 07:34 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.transcript.quantification.txt
-rw-r----- 1 seqware nextgenseq 2060617606 2010-11-02 07:36 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.translated_to_genomic.bam
-rw-r----- 1 seqware nextgenseq 8458 2010-11-02 08:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.geneCoverage.png
-rw-r----- 1 seqware nextgenseq 811 2010-11-02 08:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.geneCoverage.txt
-rw-r----- 1 seqware nextgenseq 8950203 2010-11-02 08:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.spljxn.quantification.txt
-rw-r----- 1 seqware nextgenseq 13842696 2010-11-02 08:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.exon.quantification.txt
-rw-r----- 1 seqware nextgenseq 70608115 2010-11-02 08:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.bigwig
C. As of today, 11042010, how is QC doing? Using 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1 as an example
1. IllQualStat (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.vendorQCsummary.txt
2. BCtrend (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCtrend.png
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCtrend.txt
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCflag.txt
3. qualStat (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.stat.txt
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.stat.summary.txt
4. uniqRead (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.uniqRead.txt
5. adapter/TrimCountAdapt (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.adapterCont.txt
6. qualFilter (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.qualFilter.txt
7. qcGenome (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.qcGenome.txt
8. alignStat (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.alignStat.txt
9. geneCoverage (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.geneCoverage.txt
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.geneCoverage.png
10. plotStats (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.bxp.png
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.ntDistr.png
11. coverageXTranscript (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.TRcov.txt
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.TRcov.png
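The per-module check above can be automated. Here is a minimal Python sketch, assuming the file suffixes in the checklist are the complete expected output of each module (qualFilter.txt is taken from the directory listing). The dictionary and function names are hypothetical and not part of SeqWare.

```python
import os

# Suffixes taken from the 11042010 checklist; the mapping and the helper
# name are illustrative assumptions, not SeqWare code.
EXPECTED_QC_OUTPUTS = {
    "IllQualStat": ["vendorQCsummary.txt"],
    "BCtrend": ["BCtrend.png", "BCtrend.txt", "BCflag.txt"],
    "qualStat": ["stat.txt", "stat.summary.txt"],
    "uniqRead": ["uniqRead.txt"],
    "TrimCountAdapt": ["adapterCont.txt"],
    "qualFilter": ["qualFilter.txt"],
    "qcGenome": ["qcGenome.txt"],
    "alignStat": ["alignStat.txt"],
    "geneCoverage": ["geneCoverage.txt", "geneCoverage.png"],
    "plotStats": ["bxp.png", "ntDistr.png"],
    "coverageXTranscript": ["TRcov.txt", "TRcov.png"],
}


def missing_qc_outputs(directory, lane_prefix):
    """Return {module: [missing suffixes]} for one flowcell lane.

    lane_prefix is "<flowcell>.<lane>", e.g.
    "101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1".
    """
    present = set(os.listdir(directory))
    missing = {}
    for module, suffixes in EXPECTED_QC_OUTPUTS.items():
        absent = [s for s in suffixes if f"{lane_prefix}.{s}" not in present]
        if absent:
            missing[module] = absent
    return missing
```

An empty result means every module on the checklist left its files in the directory. It says nothing about whether the DB records exist.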
11052010
1. Helped Alicia to get her data
2. For IllQualStat
3. For BCtrend
Work note November 2010
Nov. 1st, 2010
1. Proposed meeting with Sai and Brian for redefining my position, see Brian’s email:
********************
Hi Jianying,
OK, let’s sit down together on Monday some time and iron out the issues you described below.
Also Sai wants to meet with the two of us to talk about redefining your position. So we should save some time for that on Monday too.
Not a huge change for the time being, more of a change in funding source.
Also, to follow up with your workflow question ${bin_dir} corresponds to the directory provisioned by Pegasus on the cluster node (which stores the binaries used by the modules). Not the bin directories in SeqWare subversion. So this workflow code is correct.
–Brian
***************
2. Checked the adapterCont output and found that 29 files are missing the “real” adapterCont output. See the error log for details.
3. Using “TrimCountAdapterSingleRun.pl” to correct those files and re-process them.
4. Error in the script??
5. It turns out that “/PullFromIlluminaSequencerToSRF/” does NOT exist for those flowcells!!!
[jyli@lbg-compute qcFlagFixing]$ ls -tral /datastore/nextgenout/seqware-analysis/illumina/100722_UNCI-RDR301647_0014_706GFAAXX/
total 20
drwxrwsr-x 3 brianoc nextgenseq 4096 2010-08-16 16:55 .
drwxrwsr-x 3 brianoc nextgenseq 4096 2010-08-16 16:55 seqware-0.7.0_RNASeqAlignmentBWA-0.7.1
drwxrwsr-x 117 brianoc nextgenseq 12288 2010-10-27 08:35 ..
ls -tral /datastore/nextgenout/seqware-analysis/illumina/100603-UNC3-RDR300156_0007/
[jyli@lbg-compute qcFlagFixing]$ ls -al /datastore/nextgenout/seqware-analysis/illumina/100608_UNC3-RDR300156_0008/
total 40
drwxrwsr-x 8 brianoc nextgenseq 4096 2010-09-09 07:05 .
drwxrwsr-x 117 brianoc nextgenseq 12288 2010-10-27 08:35 ..
drwxrwsr-x 2 brianoc nextgenseq 4096 2010-09-23 16:31 seqware-0.7.0_PullFromIlluminaSequencerToSRF-0.7.0
drwxrwsr-x 2 brianoc nextgenseq 4096 2010-08-03 17:01 seqware-0.7.0_PullFromIlluminaSequencerToSRF-0.7.0_1277671957000
drwxrwsr-x 3 brianoc nextgenseq 4096 2010-06-29 17:21 seqware-0.7.0_RNASeqAlignmentBWA-0.7.0
drwxrwsr-x 3 brianoc nextgenseq 4096 2010-08-18 14:05 seqware-0.7.0_RNASeqAlignmentBWA-0.7.2
drwxrwsr-x 3 brianoc nextgenseq 4096 2010-09-09 07:05 seqware-0.7.0_RNASeqAlignmentBWA-0.7.4
drwxr-sr-x 12 jyli nextgenseq 4096 2010-07-29 10:30 seqware-0.7.0_RNASeqQC-0.7.0
6. Or a mismatch happened, e.g. /datastore/nextgenout/seqware-analysis/illumina/100414_UNC4-RDR3001561_0001/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/PEROU_REF_6_TCGA00099_UNCCH_100414_UNC4-RDR3001561_0001_3.adapterCont.txt
7. So, put a halt on this since this is so unique!
8. For the fixing effort, it ran smoothly, and reported 152 missing adapterCont.txt files
As of this morning, four fixing scripts are done: adapterCont_reCal.pl, qcGenome_reCal.pl, uniqRead_reCal.pl, and alignStat_reCal.pl
Now trying to write the fixing scripts for the BCtrend and IllQualStat modules.
11022010
select count(*) from (select f.file_path from processing p, processing_files pf, file f where p.processing_id = pf.processing_id and pf.file_id = f.file_id and algorithm = 'BCtrend' and status not like '%failed%' and f.file_path like '%BCtrend.png%') tbla;
11032010
To avoid losing database information, always back up before deleting or updating anything, using the following procedure:
pg_dump -h swmaster.bioinf.unc.edu -U seqware -W seqware_meta_db > MetaDB_schema.txt
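To keep several backups side by side, the dump file can carry the date. Below is a small Python sketch that only builds the pg_dump command line. The date-stamped file name is my own convention (an assumption), and `-f` is pg_dump's output-file option, used in place of the shell redirect.

```python
import datetime


def build_pg_dump_command(host, user, dbname, out_dir=".", today=None):
    """Build a pg_dump argument list that writes to a date-stamped file.

    The <dbname>_<MMDDYYYY>.sql naming scheme is an illustrative assumption.
    """
    today = today or datetime.date.today()
    out_file = f"{out_dir}/{dbname}_{today:%m%d%Y}.sql"
    # -W forces a password prompt, as in the command above.
    return ["pg_dump", "-h", host, "-U", user, "-W", "-f", out_file, dbname]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` so that a failed dump stops whatever delete or update was meant to follow.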
Matt is helping with the queries for qualStat and BCtrend.
BCtrend modules which were run:
select count(*) from (select f.file_path from processing p, processing_files pf, file f where p.processing_id = pf.processing_id and pf.file_id = f.file_id and algorithm = 'BCtrend' and status not like '%failed%' and f.file_path like '%BCTrend.txt%') tbla;
select count(*) from (select f.file_path from processing p, processing_files pf, file f where p.processing_id = pf.processing_id and pf.file_id = f.file_id and algorithm = 'BCtrend' and status not like '%failed%' and f.file_path like '%BCtrend.png%') tbla;
Without including sw_accession:
select processing_id from processing p where algorithm = 'BCtrend' and status like 'success';
seqware_meta_db=> select count(*) from (select processing_id from processing p where algorithm = 'qualStat' and status like 'success') success_qs where success_qs.processing_id not in (select processing_id from processing p where algorithm = 'BCtrend' and status = 'success');
count
-------
929
(1 row)
select * from (select processing_id from processing p where algorithm = 'qualStat' and status like 'success') success_qs where success_qs.processing_id not in (select processing_id from processing p where algorithm = 'BCtrend' and status = 'success');
\dt lists the tables in the database
using processing_relationship, processing table..
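A note on the qualStat-versus-BCtrend count: if processing_id is the primary key of processing (as the joins suggest), a qualStat row can never share an ID with a BCtrend row, so the NOT IN filter removes nothing and the count is simply the number of successful qualStat events. The Python/sqlite3 sketch below shows this on toy data and contrasts it with a comparison through the shared parent. The processing_relationship columns (parent_id, child_id), and the idea that qualStat and BCtrend hang off the same parent event, are assumptions to check against the real schema.

```python
import sqlite3


def build_toy_db():
    """Toy stand-in for seqware_meta_db; the schema details are assumed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE processing (processing_id INTEGER PRIMARY KEY,
                                 algorithm TEXT, status TEXT);
        CREATE TABLE processing_relationship (parent_id INTEGER, child_id INTEGER);
        INSERT INTO processing VALUES
            (1, 'srf2fastq', 'success'), (2, 'srf2fastq', 'success'),
            (10, 'qualStat', 'success'), (11, 'BCtrend', 'success'),
            (20, 'qualStat', 'success'), (21, 'BCtrend', 'failed');
        INSERT INTO processing_relationship VALUES (1, 10), (1, 11), (2, 20), (2, 21);
    """)
    return conn


def naive_not_in_count(conn):
    """Same shape as the NOT IN query above: compares IDs that never overlap."""
    sql = """SELECT count(*) FROM processing
             WHERE algorithm = 'qualStat' AND status = 'success'
               AND processing_id NOT IN (
                   SELECT processing_id FROM processing
                   WHERE algorithm = 'BCtrend' AND status = 'success')"""
    return conn.execute(sql).fetchone()[0]


def qualstat_without_bctrend(conn):
    """Successful qualStat events whose parent has no successful BCtrend child."""
    sql = """SELECT q.processing_id
             FROM processing q
             JOIN processing_relationship qr ON qr.child_id = q.processing_id
             WHERE q.algorithm = 'qualStat' AND q.status = 'success'
               AND NOT EXISTS (
                   SELECT 1
                   FROM processing b
                   JOIN processing_relationship br ON br.child_id = b.processing_id
                   WHERE b.algorithm = 'BCtrend' AND b.status = 'success'
                     AND br.parent_id = qr.parent_id)
             ORDER BY q.processing_id"""
    return [row[0] for row in conn.execute(sql)]
```

On the toy data the naive query counts both qualStat events, while the parent-based query returns only the one whose sibling BCtrend failed.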
Why did I lose the notes from these two days?
11042010
They were documented on the SeqWare debugging page!!
11052010
Had a talk with Brian and Sai about my transition from TCGA development to bioinformatics consultation duty. In the new capacity, I will be responsible for the following tasks:
1. Supporting the Bioinformatics Core’s analytical component, which includes:
a. Microarray gene expression analysis
b. Bioinformatics Core walk-in clinic
c. Handling analysis requests
d. Supporting the Genomic Core, Yan Shi and Michael Topal’s facility
2. Discussion with Chris Fan and Katie Hoadley for technical information exchange
3. Monthly presentation on work progress etc.
4. Set up a wiki page for such effort
5. Develop survey (SOP) for our support for internal evaluation and better communication with P.I. and scientists as customers
6. A point person for NGS data, eyes and ears
7. Will report to Sai from then on
11082010
1. While waiting for Matt on the BCtrend fixing, started to work on the qc aggregate plot (generic R code)
2. First, there are problems with the QC files:
a. E.g. for 100813_UNC4-RDR3001561_00015_FC_629FHAAXX, all sorts of files are missing for some lanes (lane 1), including QC files as well as processed files for quantification
b. Some newly processed files have no qc report??
3. Some newly processed flowcells had problems with file association; it seems Brian (or someone) has fixed this
4. /home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/sw_module_aggrePlotAuto.pl seems to have a bug: it does NOT collect newly processed qc flag files. Strange?? Need to fix this. Get this done!!
While checking, found the following:
qcGenome flag did not get updated (101018_UNC8*):
For example:
[jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ more 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.qcGenome.txt
qcgenome_query_file datastore/nextgenout/seqware-analysis/illumina/101018_UNC8-RDR3001640_00040_FC_706KRAAXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.fastq
qcgenome_read_total 32799173
reads_rRNA 811883
percnt_rRNA 2.48
reads_viral 53074
percnt_viral 0.16
qcgenome_flag 1
But the script has been updated:
vi ~/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/sw_module_RiboAndViralAln.pl
$flag = 0;
#if ($pctR > 2 || $pctV > 1){$flag = 1;}
if ($pctR > 10 || $pctV > 5){$flag = 1;} #lowered stringency for qc flag, need to be parameterized in seqware 0.7.5
print OUT "qcgenome_flag\t$flag\n";
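Since the existing qcGenome.txt files were written under the old thresholds (2% rRNA, 1% viral), the flag can be recomputed from the percentages already in the file without re-running the module. A minimal Python sketch, assuming the file is key/value text with the key names shown in the example above. The function name and the parameter defaults (10 and 5, matching the updated script) are illustrative.

```python
def recompute_qcgenome_flag(text, max_pct_rrna=10.0, max_pct_viral=5.0):
    """Return the qcGenome key/value text with qcgenome_flag recomputed.

    The flag is 1 when either percentage exceeds its threshold, else 0.
    """
    values = {}
    kept_lines = []
    for line in text.splitlines():
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) == 2:
            values[parts[0]] = parts[1].strip()
        if not line.startswith("qcgenome_flag"):
            kept_lines.append(line)
    over = (float(values["percnt_rRNA"]) > max_pct_rrna
            or float(values["percnt_viral"]) > max_pct_viral)
    kept_lines.append(f"qcgenome_flag\t{1 if over else 0}")
    return "\n".join(kept_lines) + "\n"
```

For the 101018_UNC8 lane 2 example (2.48% rRNA, 0.16% viral), the old thresholds give flag 1 and the new ones give flag 0, which is consistent with the stale flag seen in the file.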
Some QC outputs are missing!
This is properly reflected in the qc-flag-html report!
[jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ ls -tral 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.*
-rw-r----- 1 seqware nextgenseq 346 2010-10-27 13:32 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.qcGenome.txt
CHECK -rw-r----- 1 seqware nextgenseq 11921 2010-10-27 13:32 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.ntDistr.png
-rw-r----- 1 seqware nextgenseq 298 2010-10-27 13:32 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.GenericQCGenome.txt
-rw-r----- 1 seqware nextgenseq 14580 2010-10-27 13:32 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.bxp.png
-rw-r----- 1 seqware nextgenseq 1961163791 2010-10-27 13:34 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.raw.bam
-rw-r----- 1 seqware nextgenseq 2018005939 2010-10-27 13:38 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.bam
-rw-r----- 1 seqware nextgenseq 2005 2010-10-27 18:13 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.TRcov.txt
-rw-r----- 1 seqware nextgenseq 991206 2010-10-27 18:13 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.gene.quantification.txt
CHECK -rw-r----- 1 seqware nextgenseq 316 2010-10-27 18:13 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.alignStat.txt
-rw-r----- 1 seqware nextgenseq 9748 2010-10-27 18:13 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.TRcov.png
-rw-r----- 1 seqware nextgenseq 3329977 2010-10-27 18:13 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.transcript.quantification.txt
-rw-r----- 1 seqware nextgenseq 2011755497 2010-10-27 18:16 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.translated_to_genomic.bam
-rw-r----- 1 seqware nextgenseq 8486 2010-10-27 19:46 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.geneCoverage.png
-rw-r----- 1 seqware nextgenseq 806 2010-10-27 19:46 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.geneCoverage.txt
-rw-r----- 1 seqware nextgenseq 8943835 2010-10-27 19:46 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.spljxn.quantification.txt
-rw-r----- 1 seqware nextgenseq 13433672 2010-10-27 19:46 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.exon.quantification.txt
-rw-r----- 1 seqware nextgenseq 63351543 2010-10-27 19:46 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.bigwig
CHECK
Some timestamps also seem confusing, e.g. 101018_UNC4:
[jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ ls -al 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.2.*
-rw-r----- 1 seqware nextgenseq 421 2010-11-01 13:18 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.2.vendorQCsummary.txt
[jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ ls -al 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.*
-rw-r----- 1 seqware nextgenseq 316 2010-10-27 19:48 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.alignStat.txt
-rw-r----- 1 seqware nextgenseq 49825998 2010-10-27 20:27 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.bigwig
-rw-r----- 1 seqware nextgenseq 13706 2010-10-27 16:05 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.bxp.png
-rw-r----- 1 seqware nextgenseq 8536 2010-10-27 20:27 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.geneCoverage.png
-rw-r----- 1 seqware nextgenseq 790 2010-10-27 20:27 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.geneCoverage.txt
-rw-r----- 1 seqware nextgenseq 298 2010-10-27 16:05 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.GenericQCGenome.txt
-rw-r----- 1 seqware nextgenseq 11946 2010-10-27 16:05 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.ntDistr.png
-rw-r----- 1 seqware nextgenseq 345 2010-10-27 16:05 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.qcGenome.txt
-rw-r----- 1 seqware nextgenseq 9816 2010-10-27 19:48 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.TRcov.png
-rw-r----- 1 seqware nextgenseq 1998 2010-10-27 19:48 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.TRcov.txt
-rw-r----- 1 seqware nextgenseq 1687645852 2010-10-27 17:30 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.bam
-rw-r----- 1 seqware nextgenseq 13482365 2010-10-27 20:27 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.exon.quantification.txt
-rw-r----- 1 seqware nextgenseq 1010573 2010-10-27 19:48 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.gene.quantification.txt
-rw-r----- 1 seqware nextgenseq 8926642 2010-10-27 20:27 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.spljxn.quantification.txt
-rw-r----- 1 seqware nextgenseq 3373515 2010-10-27 19:48 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.transcript.quantification.txt
-rw-r----- 1 seqware nextgenseq 1686530032 2010-10-27 19:49 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.translated_to_genomic.bam
-rw-r----- 1 seqware nextgenseq 1640686890 2010-10-27 16:07 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.raw.bam
-rw-r----- 1 seqware nextgenseq 421 2010-11-01 13:18 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.vendorQCsummary.txt
11092010
1. Keep working on aggregate plot…
2. Need to check the re-run of the flag computation.
a. Where are the re-run scripts: uniqRead_reCal.pl, alignStat_reCal.pl, qcGenome_reCal.pl, and adapterCont_reCal.pl
b. Four modules can be re-run with the lowered thresholds (uniqRead, alignStat, adapterCont, and qcGenome), with the four scripts located at /home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/qcFlagFixing/
3. For IllQualStat:
a. Brian deleted the erroneous records
b. For those with output on the FS, need to insert into the DB
c. For those with no output available at all, need to re-run
d. Solution: an R script was created at /home/jyli/current_projects/SeqWareModules/qc_flags/R-stuffs/flag_summary_11092010.R
e. It seems that all current flowcells/lanes have the IllQualStat output
f. Therefore, the only thing needed is to insert it into the DB
11112010
A. Getting data from the “scratch” space as the seqware user
1. Log in to lbg-compute as the seqware user
2. Go to directory: /home/seqware/scratch/pegasus/
3. find . -name "101103_UNC10_SN254_0172_B209C5ABXX.*.*.txt" > temp.txt
3.1 find . -name "101103_UNC10_SN254_0173_A209AUABXX.*.*.txt" | wc -l
4. cat temp.txt | perl cp_qcFiles.pl
5. By now, the qc report files have been copied to the /tmp/ directory
B. Next, work as jyli:
1. Log in to lbg-compute as jyli
2. Copy qc report file to working directory
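The find-and-copy steps can also be done in one pass. The Python sketch below walks a directory tree, matches file names against the same kind of glob pattern passed to find, and copies the hits into one destination directory. It stands in for find plus cp_qcFiles.pl, whose actual behaviour is not recorded here, so treat it as an assumption. The destination should sit outside the tree being walked.

```python
import fnmatch
import os
import shutil


def copy_matching_files(root, pattern, dest):
    """Copy every file under root whose name matches pattern into dest.

    Returns the sorted list of copied file names. Files with the same
    name in different subdirectories overwrite each other in dest.
    """
    os.makedirs(dest, exist_ok=True)
    copied = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            shutil.copy2(os.path.join(dirpath, name), os.path.join(dest, name))
            copied.append(name)
    return sorted(copied)
```

For example, `copy_matching_files("/home/seqware/scratch/pegasus", "101103_UNC10_SN254_0172_B209C5ABXX.*.*.txt", "/tmp/qc")` would gather that flowcell's text reports. The /tmp/qc destination is made up for the example.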
Help desk work with Tangi
1. Met her and got a feel for her project
2. Will start work on this on the 16th,
11122010
1. Presented the on-going QC assessment on two new HiSeq flowcells
2. Keep monitoring the seqware process from the scratch space; as of 2:00 p.m.:
[seqware@lbg-compute RNASeqAlignmentBWA]$ find . -name 101103_UNC10_SN254_0172_B209C5ABXX.*.*.txt | wc -l
64
[seqware@lbg-compute RNASeqAlignmentBWA]$ find . -name 101103_UNC10_SN254_0173_A209AUABXX.*.*.txt | wc -l
76
[seqware@lbg-compute RNASeqAlignmentBWA]$ find . -name 101103_UNC10_SN254_0173_A209AUABXX.*.*.png | wc -l
21
[seqware@lbg-compute RNASeqAlignmentBWA]$ find . -name 101103_UNC10_SN254_0173_A209AUABXX.*.*.png | wc -l
21
3. One thing that must be done is to insert the IllQualStat output into the analysis.
a. May need to delete/clean up the error first
b. Re-run IllQualStat and insert into DB.
11152010
A. A request from Chuck in the morning
For Chuck’s request email:
**************************************************
Jianying,
can you run your quality metrics and give me a slide similar to slide 2 and slide 3 for the following samples. I also need this for TCGA meeting, so please do soon. thanks CHUCK
PHR-MB 100617_UNC8-RDR3001640_0008_61L3JAAXX
PHR-MB 100625_UNC1-RDR301647_0009_705PLAAXX
PHR-MB 100701_UNC4-RDR3001561_0011_622TVAAXX
PHR-MG 100617_UNC8-RDR3001640_0008_61L3JAAXX
PHR-RIBO- 100617_UNC8-RDR3001640_0008_61L3JAAXX
PHR-RIBO- 100701_UNC4-RDR3001561_0011_622TVAAXX
PHR-RIBO- 100625_UNC1-RDR301647_0009_705PLAAXX
PHR-T 100916_UNC4-RDR3001561_00024_FC_62HW7AAXX
PHR-TD 100617_UNC8-RDR3001640_0008_61L3JAAXX
PHR-TD 100701_UNC4-RDR3001561_0011_622TVAAXX
**********************************************************************************************************
1. Getting experimental information via DB query:
psql -h swmaster.bioinf.unc.edu -U seqwarero -W seqware_meta_db
Use the readonly account:
u: seqwarero
p: read0Nly%
2. The following query returns the information we need:
select s.title, sr.name, l.lane_index+1 as lane from sample as s, sequencer_run as sr, lane as l where s.sample_id = l.sample_id and l.sequencer_run_id = sr.sequencer_run_id and s.title like '%PHR%' order by sr.name, l.lane_index asc;
title | name | lane
-----------+-------------------------------------------+------
PHR-MB | 100617_UNC8-RDR3001640_0008_61L3JAAXX | 2
PHR-MG | 100617_UNC8-RDR3001640_0008_61L3JAAXX | 3
PHR-RIBO- | 100617_UNC8-RDR3001640_0008_61L3JAAXX | 4
PHR-TD | 100617_UNC8-RDR3001640_0008_61L3JAAXX | 5
PHR-MB | 100625_UNC1-RDR301647_0009_705PLAAXX | 2
PHR-RIBO- | 100625_UNC1-RDR301647_0009_705PLAAXX | 3
PHR-TD | 100701_UNC4-RDR3001561_0011_622TVAAXX | 1
PHR-MB | 100701_UNC4-RDR3001561_0011_622TVAAXX | 2
PHR-RIBO- | 100701_UNC4-RDR3001561_0011_622TVAAXX | 3
PHR-T | 100916_UNC4-RDR3001561_00024_FC_62HW7AAXX | 3
3. Tried to get the QC file information the same way as for the HiSeq data, but found that 5 of the 10 lanes are missing. These happen to be the five lanes Matt has been actively working on, so I can only provide a summary for the other five.
B. Cleaning up SeqWare LIMS
11182010
1. Refer to the Survival analysis blog post
11302010
1. Get vendorQCsummary inserted to 101103_UNC10_SN254_0172_B209C5ABXX
a. Log in to swmaster as seqware and go to /perl/bin/
b. Run the following command:
perl sw_util_insert_processing_event.pl --username seqware --password ***** --dbhost swmaster.bioinf.unc.edu --db seqware_meta_db --parent-accession 80934 --file /datastore/nextgenout/seqware-analysis/illumina/101103_UNC10_SN254_0172_B209C5ABXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/101103_UNC10_SN254_0172_B209C5ABXX.1.vendorQCsummary.txt=text/key_value --algorithm IllQualStat --version 0.7.0
c. The detailed command was saved as command_11302010
2. To delete an erroneous run whose status is either “failed” or “running” (for a prolonged period of time), do the following:
a. Be very careful!!!! The delete will recursively remove all events after the parent accession ID
b. It would be good to run pg_dump first, just in case
c. Prior to deleting, check the meta-db for details:
select * from processing where sw_accession = 81726;
select * from processing where stdout like '%101103_UNC10_SN254_0172_B209C5ABXX.3.trimmed.annotated.bam%';
It turned out that two attempts were made, one of which was successful.
d. So, it is safe to use the following command to delete the “pending” record:
perl sw_util_delete_processing_events.pl --username seqware --password seqware --dbhost swmaster.bioinf.unc.edu --db seqware_meta_db --accession 81726
3. When checking the flag html report, found that some adapter contamination files were missing. It turned out that the files are missing from /datastore/nextgenout/seqware-analysis/illumina/101103_UNC10_SN254_0172_B209C5ABXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/. But they are still in the scratch space under the “seqware” user.
Talked to John but did not find a concrete answer.
Work progress doc (Oct. 2010)
Starting from today, I am using this blog to keep track of my work, replacing word, notepad, etc. It goes by month and hopefully, it helps!
10252010
(1). Check my QC modules
[jyli@lbg-compute ~]$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
147313 0.55500 TrimAdaptR jyli r 10/20/2010 14:02:12 tcga.q@lbg-comp7-pvt.bioinf.un 1
147941 0.55500 uniqReadRe jyli r 10/20/2010 20:45:20 tcga.q@lbg-comp11-pvt.bioinf.u 1
147948 0.55500 AlignStatR jyli r 10/20/2010 20:48:20 tcga.q@lbg-comp10-pvt.bioinf.u 1
Three QC modules are still running. Sent out the fourth one just to correct the adapter contamination output!
However, two more issues came up:
1. What about the threshold of qc genome?
2. IllQualStat?
Solution:
1. It might be easier to implement a system to monitor the flag, view it individually (for upcoming new one), and reset to “zero” if possible
3. Need write permission for BCtrend modules
4. Switch to seqware user is urgent!
(2). Check Ying Du’s data processing
Ying’s email:
Hi Jianying,
It seems that that seqware has already processed these two flow cells by last weekend.Thanks a lot. Here I just have some questions regarding some individual file content and hope you can help.
1. In the file, say, “101004_UNC5-RDR300700_00028_FC_62GLPAAXX.1.trimmed.annotated.gene.quantification.txt”:
a. what are the column headers? I figure that the first two are: gene and read.
b. How about the last two? Are they all RPKM?
c. If so, why there are two RPKM columns?
d. what does “?|100130436” mean? Is this a real gene or not?
The following is an example file:
?|100130426 3 0.558823529411765 0.316155262260932
?|100133144 21 1.32119205298013 0.747466414749356
?|100134869 18 0.858218318695107 0.485538319959073
?|10357 58 6.93081761006289 3.9211206111859
?|10431 574 38.8085867620751 22.0754422121086
— —
Solution: responded item by item
2. What’s the difference among the following files, particularly, c, and a & d?
a. “*.trimmed.annotated.exon.quantification.txt”
b. “*.trimmed.annotated.gene.quantification.txt”
c. “*.trimmed.annotated.spljxn.quantification.txt”
d. “*.trimmed.annotated.transcript.quantification.txt”
(3) Check html report on flags
(4) Start figuring out “generic” plot, which has been long postponed!!!
(5) Respond to Brian about work progress
In the past week:
1. Learned how to monitor a seqware run, dissect problems and debug if necessary. Detailed info is here
a. Consistently check LIMS for erroneous records, e.g. errors, stalled processing, flowcells without lane info, etc.
b. Check two cron jobs: runfolder2srf and seqware-bwa alignment
c. Check log files
d. Manually start some failed runs
e. For some strange errors, need to look deeper in the log files, experiment information etc.
(1) Whether a touch file exists?
(2) Whether experimental information was correct, e.g. RNAseq was misclassified as whole-genome shotgun
(3) Email Matt if it looks like the error is linked to the PIPE process
2. QC module curating
a. Added BCtrend to the workflow
b. Modified IllQualStat for proper parameterization
c. Modified the TrimCountAdapter script and code, helped Matt with the perl script and suggested a strategy to further adapt it for the small RNA project
d. Updated the corresponding qc modules
e. Lowered the stringency of alignQual, uniqRead, and adapterCont, and submitted (qsub) retrospective fixing runs
3. Worked on Ying Du’s request
4. QC html report
a. Modified cronjob for nightly run on qc flag
b. Update all key_value in the module for proper MetaDB update and facilitate future query
For this week:
1. Check all QC modules to ensure proper fixing happened
2. Solve some mystery, i.e.
*********
[jyli@lbg-compute ~]$ diff /datastore/nextgenout/seqware-analysis/illumina/100804_UNC6-RDR300211_00021_FC_629PLAAXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.1/seqware-0.7.0_BWA-0.7.0/100804_UNC6-RDR300211_00021_FC_629PLAAXX.6.adapterCont.txt /datastore/nextgenout/seqware-analysis/illumina/100804_UNC6-RDR300211_00021_FC_629PLAAXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/100804_UNC6-RDR300211_00021_FC_629PLAAXX.6.adapterCont.txt
But, in the qsub log, I don't see "100804_UNC6-RDR300211_00021_FC_629PLAAXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/100804_UNC6-RDR300211_00021_FC_629PLAAXX.6.adapterCont.txt", why??
*********
3. Need to switch to the /seqware/ user for future LIMS/pipeline curation
4. Figure out a qc-flag updating system for “questionable” flags and flag reset (need Brian’s input)
5. Generic plot (QC flag, and all); aggregate_plot still has some bugs ..
6. Need write permission on newly generated files (by Brian); will provide Brian a list of file extensions for periodic fixing
7. For those flowcells that got entered multiple times, need to figure out a systematic way to fix the problem, e.g. [jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ cd ../../../100603
100603-UNC3-RDR300156_0007/ 100603_UNC3-RDR300156_0007_61MM1AAXX/
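One possible way to find flowcells that were entered more than once is to group run directories by a normalized key. The Python sketch below assumes that duplicates differ only in the separator (“-” versus “_”) and in an optional trailing flowcell ID, so that the first four name tokens (date, machine, instrument serial, run number) identify the run. That is a heuristic drawn from the two directory names above, not a documented naming rule.

```python
import re
from collections import defaultdict


def run_key(dirname):
    """Normalize a run folder name to (date, machine, serial, run number)."""
    tokens = re.split(r"[-_]", dirname.strip("/"))
    return tuple(tokens[:4])


def find_duplicate_runs(dirnames):
    """Group directory names sharing a run key; return groups of size > 1."""
    groups = defaultdict(list)
    for name in dirnames:
        groups[run_key(name)].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

Feeding it the output of `os.listdir()` on the illumina directory would list the candidate duplicates. Each group still needs a manual look before anything is merged or deleted.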
10272010
1. Check uniqRead module fixing
a. From the qsub, it seems that it has finished
[jyli@swmaster ~]$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
147313 0.55500 TrimAdaptR jyli r 10/20/2010 14:02:12 tcga.q@lbg-comp7-pvt.bioinf.un 1
147948 0.55500 AlignStatR jyli r 10/20/2010 20:48:20 tcga.q@lbg-comp10-pvt.bioinf.u 1
151838 0.55500 AdapterCon jyli r 10/25/2010 12:21:30 tcga.q@lbg-comp6-pvt.bioinf.un 1
b. But, there are still many missing flags in the html report, i.e. 101004_UNC5-RDR300700_00028_FC_62GLPAAXX.8
c. Go to the directory: /datastore/nextgenout/seqware-analysis/illumina/101004_UNC5-RDR300700_00028_FC_62GLPAAXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/
d. But, 101011_UNC8-RDR3001640_00037_FC_62M6WAAXX.1 has the results!! (done on 10-26-2010)
e. What is going on with “101004_UNC5”?
Solution:
a. Go to the directory and check log “/home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/uniqReadDir”
b. Could not find a record associated with 101004 in the log (less uniqReadRetro.o147941), but did find 101011
c. Check script and check DB
d. Use this command to write the query result into a file. It stalled for some reason? Well, the host has to be swmaster!!!
**************************
psql -h swmaster -U seqwarero -W seqware_meta_db -c "select f.file_path from processing as p, processing_files as pf, file as f where f.file_path like '%uniqRead.txt' and p.algorithm = 'uniqRead' and p.processing_id = pf.processing_id and pf.file_id = f.file_id;" > uniqRead_files.txt
**************************
e. I think I found the answer: the file is recorded in the DB but does not exist on the file system
f. Need help from Brian for DB debugging!!!
g. It seems to happen randomly, but across the board. HELP!!
h. Another case: 100908_UNC8-RDR3001640_00020_FC_62ENHAAXX.7!! Lane 7 failed (so did lane 1), but two 100908_UNC8-RDR3001640_00020_FC_62ENHAAXX.4.uniqRead.txt were generated??
i. Regardless of what happened, I ran the script on the “erroneous” lane and found the reason it failed: no .srf file existed, so the conversion step failed and no .fastq file was produced for the rest of the process
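To see how widespread the “in DB but not on the file system” cases are, the psql output saved in uniqRead_files.txt can be checked path by path. A minimal Python sketch, assuming the file holds psql's default table output (a header, a separator line, one path per row, and a row-count footer) and that real paths start with “/”. The function names are hypothetical.

```python
import os


def paths_from_psql_output(text):
    """Extract absolute paths from psql table output, skipping decoration."""
    return [ln.strip() for ln in text.splitlines() if ln.strip().startswith("/")]


def split_by_existence(paths):
    """Return (present, missing) lists according to os.path.exists."""
    present = [p for p in paths if os.path.exists(p)]
    missing = [p for p in paths if not os.path.exists(p)]
    return present, missing
```

The `missing` list is what to hand to Brian for DB debugging: every entry is a file the meta-db believes in but the datastore does not have.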
From group meeting:
Two urgent items need deadlines:
1. Friday (10-29-10), for the qc threshold fixing
New strategy: instead of re-run module, just fixing the qc file, re-cal threshold and replacing the file.
2. Next week (first week in Nov.) for aggregate report
10282010
Code review from Brian
*************
Jianying, most have the error:
net.sourceforge.seqware.pipeline.modules.qc.IllQualStat.init
'infile' is not a recognized option
The method 'init' exited abnormally so the Runner will terminate here!
seqware_meta_db=> select algorithm, count(algorithm) from processing where status = 'failed' group by algorithm
For Jianying:
>>> perl/bin/calculate_adapter_contamination.pl <<$output”);
-#print OUT "query_file\t$infile\ninput_seq\t$primer\nadapter_seq\t$rcout\n";
+print OUT "query_file\t$infile\ninput_seq\t$primer\nadapter_seq\t$rcout\n";
$err = "";
if (! -e "$infile") { $err = "input file does not exist"; }
@@ -92,10 +92,9 @@
$pctA = sprintf("%.2f", 100*$ct2/$ct1);
$pctAD = sprintf("%.2f", 100*$ctsize0/$ct1);
$mean = $efflengthsum/$ct1;
- if ($pctA < 15){$flag =0;} #Lowered to low stringency, 10192010 JYL
- #print OUT "run_status\tSUCCESS\nadapter_ct\t$ct2\nadapter_percnt\t$pctA\nmean_effective_read_length\t$mean\nadapter_dimer%\t$pctAD\n";
- print OUT "run_status\tSUCCESS\nadapter_ct\t$ct2\nadapter_percnt\t$pctA\nadaptor_flag\t$flag\n";
- # print OUT "adaptor_flag\t$flag\n";
+ if ($pctA >> perl/bin/sw_module_TrimCountAdapter.pl <<<
15 needs to be a parameter
Ans: Yes, this should happen in 0.7.5
Index: perl/bin/sw_module_TrimCountAdapter.pl
===================================================================
--- perl/bin/sw_module_TrimCountAdapter.pl (revision 1788)
+++ perl/bin/sw_module_TrimCountAdapter.pl (working copy)
@@ -109,13 +109,7 @@
}
close(OUTFQ);
$pct = sprintf("%.2f", 100*$adpreadct/$allreadct);
-
-$flag = 1;
-if ($pct >> perl/bin/sw_module_alignStat.pl <<<
30 and 90 need to be parameters not hardcoded.
Ans: Yes, this should happen in 0.7.5
Index: perl/bin/sw_module_alignStat.pl
===================================================================
--- perl/bin/sw_module_alignStat.pl (revision 1788)
+++ perl/bin/sw_module_alignStat.pl (working copy)
@@ -57,8 +57,7 @@
print OUT "alignstat_total_read\t$totalCount\n";
print OUT "alignstat_mapped_read\t$mappedCount\n";
print OUT "alignstat_mapped_percnt\t$percentage\n";
– #if ($percentage 85){
– if ($percentage 90){ #lowered the
stringency for reporting purpose, per group meeting discussion with
Brian and Matt on 10132010
+ if ($percentage 85){
$flag = 1;
}else{ $flag = 0; }
print OUT "alignstat_flag\t$flag\n";
>>> perl/bin/sw_module_readDepth.pl << 50 && uniquePerc 30 && uniquePerc 1000000) { $totalFlag = 0;}
-#if ($totalSeq > 2000000) { $totalFlag = 0;} #lowered stringency 10202010 JYL
+if ($uniquePerc > 50 && uniquePerc 2000000) { $totalFlag = 0;}
print O "input_file\t$infile\n";
print O "total_reads\t$totalSeq\n";
print O "unique_reads\t$uniqueSeq\n";
Java code:
>>> java/src/net/sourceforge/seqware/pipeline/modules/qc/geneCoverage.java <<<
Did you change the workflow XML?
Ans: No, I don't know how to change workflow XML.
Index: java/src/net/sourceforge/seqware/pipeline/modules/qc/geneCoverage.java
===================================================================
--- java/src/net/sourceforge/seqware/pipeline/modules/qc/geneCoverage.java (revision 1788)
+++ java/src/net/sourceforge/seqware/pipeline/modules/qc/geneCoverage.java (working copy)
@@ -224,9 +224,8 @@
ReturnValue ret = new ReturnValue();
ret.setExitStatus(ReturnValue.SUCCESS);
ret.setRunStartTstmp(new Date());
- String output = (String)options.valueOf("outfile");
- String outplot = (String)options.valueOf("geneCoveragePlot");
-
+ String output = (String)options.valueOf("outfile");
+ String outplot = (String)options.valueOf("covergePlotFile");
StringBuffer cmd = new StringBuffer();
cmd.append(options.valueOf("perl") + " " +
options.valueOf("script") + " " + options.valueOf("infile") + " ");
cmd.append(options.valueOf("outfile") + " " +
options.valueOf("geneCoveragePlot") + " ");
@@ -245,7 +244,6 @@
//Need to fix here for plot in R, jyl
- //FIXME Only register single file??
FileMetadata fm2 = new FileMetadata();
fm2.setMetaType("png/geneCoverage plot");
fm2.setFilePath(outplot);
**************
Fixing QC flags ad hoc
Flag stringency has been lowered; the collected flag settings are located at: /home/jyli/current_projects/aggregat_report/qc_flags/collected_flags.xlsx
Current flag setting in the working modules:
uniqRead:
if ($uniquePerc > 30 && $uniquePerc < 95 1000000) { $totalFlag = 0;}
if ($uniqFlag ==1 || $totalFlag==1){$flag = 1;}
adapterCont/TrimCountAdapt:
if ($pctA < 15){$flag =0;}
if ($pct < 15) {$flag =0;}
IllQualStat:
if ($lines[$i] < 70) {$C_flag = 1;}
if ($lines[$i] < 70) {$I_flag = 1;}
if (($C_flag == 1) && ($I_flag == 1))
{ print OUT "illqualstat_flag\t1\n";
}else {print OUT "illqualstat_flag\t0\n";}
alignStat:
if ($percentage 90){ #lowered the stringency for reporting purpose, per group meeting discussion with Brian and Matt on 10132010
$flag = 1;
}else{ $flag = 0; }
qcGenome:
if ($pctR > 10 || $pctV > 5){$flag = 1;}
BCtrend:
$FLAG = 2;
if ($drop > $FLAG)
{
print OUT "$lineCnt\t$drop\n";
$flag = 1;
}
1. Instead of running the modules again (which takes much time), I am re-writing scripts to re-calculate the threshold and update the flags.
2. Workingdir: /home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/qcFlagFixing
3. For each QC flag
a. uniqRead: uniqRead_reCal.pl, fixing_uniqRead_flags.job
b. alignStat: alignStat_reCal.pl, fixing_alignStat_flags.job
c. qcGenome: qcGenome_reCal.pl, fixing_qcGenome_flags.job
d. adapterCont: adapterCont_reCal.pl, fixing_adapterCont_flags.job
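The re-calculation idea (fix the flag in an existing QC file instead of re-running the module) can be sketched as below for the adapterCont case. This is not the content of adapterCont_reCal.pl, only a guess at its shape: the function names are made up, the QC file is assumed to be tab-delimited "key value" lines as in the print statements above, and the flag is assumed to default to 1 and drop to 0 when adapter_percnt is below the threshold.

```python
def parse_qc(text):
    """Parse 'key<TAB>value' lines into an ordered list of [key, value]."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("\t")
        rows.append([key, value])
    return rows


def recalc_adapter_flag(text, threshold=15.0):
    """Recompute adaptor_flag from adapter_percnt in an existing QC file.

    Mirrors the rule 'if ($pctA < 15){$flag = 0;}' with the threshold
    exposed as a parameter (assumption: the flag is 1 otherwise).
    """
    rows = parse_qc(text)
    values = dict((key, value) for key, value in rows)
    pct = float(values["adapter_percnt"])
    flag = 0 if pct < threshold else 1
    for row in rows:
        if row[0] == "adaptor_flag":
            row[1] = str(flag)
    # Write the lines back out in their original order.
    return "".join("%s\t%s\n" % (key, value) for key, value in rows)
```

The returned text can be written over the old QC file, which is the "replace the file" step of the new strategy. The other flags (uniqRead, alignStat, qcGenome) would follow the same pattern with their own keys and thresholds.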
10292010
1. Need to watch over the SeqWare pipeline to catch errors early on
2. Two major module errors: srf2fastq and IllQualStat. Brian fixed IllQualStat, and Matt is working with Brian to fix the other
3. Matt sent a document on proper handling of parameterization in Perl programs
***beginning of program:
”
use strict;
use Getopt::Long;
my ($primer, $infile, $outfastq, $outstats, $seed_length, $flag_bad_adpt_value, $flag_bad_adpt_symbol, $min_read_length);
my $argSize = scalar(@ARGV);
my $getOptResult = GetOptions('primer=s' => \$primer,
'infile=s' => \$infile,
'outfastq=s' => \$outfastq,
'outstats=s' => \$outstats,
'seed_length=s' => \$seed_length,
'flag_bad_adpt_value=i' => \$flag_bad_adpt_value,
'flag_bad_adpt_symbol=s' => \$flag_bad_adpt_symbol,
'min_read_length=i' => \$min_read_length);
usage() if ( $argSize < 14 || !$getOptResult);
***end of program, or where ever your functions are:
"
# statement displayed to user if parameters not properly filled
sub usage {
print "Unknown option: @_\n" if ( @_ );
print "usage: program --primer [PRIMER or reverse complement of adapter, ACTG are possible] --infile [FASTQ format of reads] --outfastq [FASTQ format of trimmed reads] --outstats [tab-delimited text file of trimming statistics] --seed_length [Minimum length of the first adapter nucleotides you want to match sequencing reads] --flag_bad_adpt_value [Possible values are integers between [0..100]. What limit do you want to assign to adapter percentage so that the fastq file is marked 'bad'] --flag_bad_adpt_symbol [Possible values are 'lt' or 'gt'; Do you want to assign a 'bad' flag if the amount of adapter in fastq file is 'less than'(lt) or 'greater than'(gt) the 'flag_bad_adpt_value' parameter] --min_read_length [minimum number of nucleotides in each read of the filtered fastq file, the rest are thrown away] [--help|-?]\n";
exit;
}
************************************
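How the two new parameters (flag_bad_adpt_value and flag_bad_adpt_symbol) would drive the flag is only described in the usage text, so here is a rough sketch of that logic. It is an illustration in Python rather than Matt's Perl: build_parser and adapter_flag are made-up names, and only the two flag-related options are included.

```python
import argparse


def build_parser():
    """Command-line parser for the two threshold options only."""
    parser = argparse.ArgumentParser(description="Adapter flag threshold sketch")
    parser.add_argument("--flag_bad_adpt_value", type=int, required=True,
                        help="integer between 0 and 100")
    parser.add_argument("--flag_bad_adpt_symbol", choices=["lt", "gt"],
                        required=True,
                        help="flag as bad when adapter percentage is lt/gt the value")
    return parser


def adapter_flag(adapter_pct, value, symbol):
    """Return 1 ('bad') when adapter_pct is on the flagged side of value."""
    if not 0 <= value <= 100:
        raise ValueError("flag_bad_adpt_value must be between 0 and 100")
    if symbol == "lt":
        return 1 if adapter_pct < value else 0
    if symbol == "gt":
        return 1 if adapter_pct > value else 0
    raise ValueError("flag_bad_adpt_symbol must be 'lt' or 'gt'")
```

With "--flag_bad_adpt_value 15 --flag_bad_adpt_symbol gt", a lane with 4.42 percent adapter would get flag 0, which matches the hardcoded rule this is meant to replace.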
ERRORS!!
There is problem with adapterCont files, 23 of them were in the TrimCountAdapt format!!
Solution: rerun the module first to convert those files to the correct format!
Script used: /home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/qcFlagFixing/TrimCountAdapterSingleRun.pl
Once this is fixed, will run the adapter recalculation!
From Friday group meeting
1. Need to watch the seqware pipeline and identify problems if any
2. Need to re-run IllQualStat since it had the most failures in the pipeline run
3. Need to start/complete the aggregate plot urgently!!!!!
Checking QC output performance per Brian’s request
################
Also, you were going to make sure that all your code is checked in and you’re good for the 0.7.4 tag, is all your perl/java code checked in and passing compilation?
################
Using "100901_UNC6-RDR300211_00028_FC_62ERHAAXX" (lane 7) as an example since it has 328 successes with no errors:
a. adapterCont:
run_status SUCCESS
adapter_ct 119
adapter_percnt 1.09
adaptor_flag 0
b. alignStat:
alignstat_query_file /datastore/scratch/users/brianoc/pegasus/RNASeqAlignmentBWA/run0262/./temp33634210274080599152657268018340463/100901_UNC6-RDR300211_00028_FC_62ERHAAXX.7.trimmed.annotated.sam
alignstat_total_read 10667
alignstat_mapped_read 8844
alignstat_mapped_percnt 82.9099090653417
alignstat_flag 0
c. BCtrend
bctrend_flag 0
d. IllQualStat:
experiment 100901_UNC6-RDR300211_00028_FC_62ERHAAXX
machine UNC6-RDR300211
lane 7
lane_yield 2313
clusters_raw 313252
clusters_raw_sd 22114
clusters_pf 253570
clusters_pf_sd 11888
1st_cycle_int 370
1st_cycle_int_sd 15
percnt_intensity_after_20_cycles 78.88
percent_intensity_after_20_cycles_sd 1.61
percent_pf_dlusters 81.08
percent_pf_clusters_sd 2.06
percent_align 0
percnet_align_sd
alignment_score 0
alignment_score_pf
percent_error_rate 0
illqualstat_flag 0
e. QCGenome:
qcgenome_query_file datastore/nextgenout/seqware-analysis/illumina/100901_UNC6-RDR300211_00028_FC_62ERHAAXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.3/seqware-0.7.0_BWA-0.7.0/100901_UNC6-RDR300211_00028_FC_62ERHAAXX.7.trimmed.fastq
qcgenome_read_total 10667
reads_rRNA 49
percnt_rRNA 0.46
reads_viral 4
percnt_viral 0.04
qcgenome_flag 0
f. uniqRead:
input_file /home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/uniqReadDir/100901_UNC6-RDR300211_00028_FC_62ERHAAXX.7.fastq
total_reads 10872
unique_reads 10737
unique_percent 98.7582781456954
read_pf_flag 1
Another example: 100810_UNC2-RDR300275_00019_FC_629FEAAXX (lane 4)
adapterCont:
run_status SUCCESS
adapter_ct 940956
adapter_percnt 4.42
adaptor_flag 0
alignStat:
alignstat_query_file /datastore/scratch/users/brianoc/pegasus/RNASeqAlignmentBWA/run0163/./temp763227124195285300994823188765076/100810_UNC2-RDR300275_00019_FC_629FEAAXX.4.trimmed.annotated.sam
alignstat_total_read 21230415
alignstat_mapped_read 18109190
alignstat_mapped_percnt 85.2983326044262
alignstat_flag 0
BCtrend:
bctrend_flag 0
IllQualStat:
experiment 100810_UNC2-RDR300275_00019_FC_629FEAAXX
machine UNC2-RDR300275
lane 4
lane_yield 1619
clusters_raw 202556
clusters_raw_sd 16202
clusters_pf 177569
clusters_pf_sd 13108
1st_cycle_int 419
1st_cycle_int_sd 13
percnt_intensity_after_20_cycles 79.64
percent_intensity_after_20_cycles_sd 1.99
percent_pf_dlusters 87.71
percent_pf_clusters_sd 1.10
percent_align 0
percnet_align_sd
alignment_score 0
alignment_score_pf
percent_error_rate 0
illqualstat_flag 0
QCGenome:
qcgenome_query_file datastore/nextgenout/seqware-analysis/illumina/100810_UNC2-RDR300275_00019_FC_629FEAAXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/100810_UNC2-RDR300275_00019_FC_629FEAAXX.4.trimmed.fastq
qcgenome_read_total 21230415
reads_rRNA 223212
percnt_rRNA 1.05
reads_viral 135055
percnt_viral 0.64
qcgenome_flag 0
uniqRead:
input_file /home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/uniqReadDir/100810_UNC2-RDR300275_00019_FC_629FEAAXX.4.fastq
total_reads 21308392
unique_reads 11634508
unique_percent 54.6005911661471
read_pf_flag 0
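As a sanity check on the two example lanes, the qcGenome percentages and flag can be recomputed from the reported read counts. This is a small sketch, not the module code: the function names are made up, the two-decimal formatting is assumed to match the sprintf("%.2f", ...) style used in the adapter module, and the flag is assumed to be 0 unless the rule "$pctR > 10 || $pctV > 5" fires.

```python
def percent(part, total):
    """Percentage formatted to two decimals, like sprintf("%.2f", ...)."""
    return "%.2f" % (100.0 * part / total)


def qcgenome_flag(pct_rrna, pct_viral, max_rrna=10.0, max_viral=5.0):
    """Return 1 when either contamination percentage exceeds its limit."""
    return 1 if (pct_rrna > max_rrna or pct_viral > max_viral) else 0


# (qcgenome_read_total, reads_rRNA, reads_viral) copied from the output above.
lanes = {
    "100901_UNC6-RDR300211_00028_FC_62ERHAAXX.7": (10667, 49, 4),
    "100810_UNC2-RDR300275_00019_FC_629FEAAXX.4": (21230415, 223212, 135055),
}

for name, (total, rrna, viral) in sorted(lanes.items()):
    pct_r = percent(rrna, total)
    pct_v = percent(viral, total)
    print(name, pct_r, pct_v, qcgenome_flag(float(pct_r), float(pct_v)))
```

Both lanes come out well under the 10 and 5 percent limits, consistent with the qcgenome_flag of 0 reported for each.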
Checking the modules
Protected: Retrospective QC assessment
Programming involving database
This is my first "official" trial and early learning experience in programming that interacts with a database. I am using PostgreSQL as the example, since both our SeqWare MetaDB and our SeqWare LIMS rely on it.
Some background knowledge about postgreSQL goes here…
Some SeqWare know-how goes here….
First of all, test some database query
To log in to the database, issue the following command (for the best results, run it from swmaster.med.unc.edu)
psql -h swmaster.bioinf.unc.edu -U seqware -W seqware_meta_db
u: seqware
p: seqware
Here come the queries:
The first script involving a DB query is /home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/retro_IllQualStat.pl, and it performs the following three tasks:
1. Query the database and get all available runfolders
2. Find the most recent RNAseqAlignmentBWA output folder
3. Run sw_module_IllQualStat.pl (/home/jyli/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/sw_module_IllQualStat.pl) for the per-lane vendorQC
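Task 2 could look roughly like the sketch below. It is not taken from retro_IllQualStat.pl: the function name is made up, the folder pattern is inferred from paths such as seqware-0.7.0_RNASeqAlignmentBWA-0.7.4 seen elsewhere in these notes, and "most recent" is assumed to mean the latest modification time rather than the highest version number.

```python
import glob
import os


def latest_alignment_dir(runfolder_dir, pattern="seqware-*_RNASeqAlignmentBWA-*"):
    """Return the newest matching subdirectory by mtime, or None if absent."""
    candidates = [path
                  for path in glob.glob(os.path.join(runfolder_dir, pattern))
                  if os.path.isdir(path)]
    if not candidates:
        return None
    # Assumption: the most recently modified folder is the one wanted.
    return max(candidates, key=os.path.getmtime)
```

If version order matters more than timestamps (for example after files are copied or restored), the key function would need to parse the trailing version instead.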
For DB study
Question: how many flowcells do we have in the DB, and how many of them have an RNAseqAlignmentBWA event?
Answer: use the sequencer_run and lane tables. sequencer_run is linked to lane, lane is linked to processing_lanes, and processing_lanes is linked to processing
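The chain of links in the answer translates into a join like the one sketched below. To keep it runnable without access to swmaster, it uses an in-memory SQLite database as a stand-in for PostgreSQL. The column names (sequencer_run_id, lane_id, processing_id, name) and the algorithm string are assumptions for illustration and should be checked against the real MetaDB schema.

```python
import sqlite3

# Toy schema: table names follow the notes, column names are assumed.
SCHEMA = """
CREATE TABLE sequencer_run (sequencer_run_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE lane (lane_id INTEGER PRIMARY KEY, sequencer_run_id INTEGER);
CREATE TABLE processing (processing_id INTEGER PRIMARY KEY, algorithm TEXT);
CREATE TABLE processing_lanes (processing_id INTEGER, lane_id INTEGER);
"""

# sequencer_run -> lane -> processing_lanes -> processing
QUERY = """
SELECT COUNT(DISTINCT sr.sequencer_run_id)
FROM sequencer_run AS sr
JOIN lane AS l ON l.sequencer_run_id = sr.sequencer_run_id
JOIN processing_lanes AS pl ON pl.lane_id = l.lane_id
JOIN processing AS p ON p.processing_id = pl.processing_id
WHERE p.algorithm = ?
"""


def count_flowcells(conn, algorithm):
    """Return (total flowcells, flowcells with at least one such event)."""
    total = conn.execute("SELECT COUNT(*) FROM sequencer_run").fetchone()[0]
    with_event = conn.execute(QUERY, (algorithm,)).fetchone()[0]
    return total, with_event


def demo_connection():
    """Build a tiny in-memory database with two made-up flowcells."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sequencer_run VALUES (?, ?)",
                     [(1, "flowcell_A"), (2, "flowcell_B")])
    conn.executemany("INSERT INTO lane VALUES (?, ?)",
                     [(10, 1), (11, 1), (20, 2)])
    conn.executemany("INSERT INTO processing VALUES (?, ?)",
                     [(100, "RNASeqAlignmentBWA"),
                      (101, "RNASeqAlignmentBWA"),
                      (102, "uniqRead")])
    conn.executemany("INSERT INTO processing_lanes VALUES (?, ?)",
                     [(100, 10), (101, 11), (102, 20)])
    return conn
```

COUNT(DISTINCT ...) matters here because a flowcell has several lanes, each of which may carry its own alignment event; without it the same flowcell would be counted once per lane.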
SeqWare production pipeline tracking
As a transition, Brian will hand this duty off to Matt and me, so that we can cover for each other when one of us is on vacation.
1. Background:
Currently, the SeqWare automation pipeline is composed of two parts:
a. a cron job is run to convert runfolder to srf format (module Illumina2SRF)
b. a manually submitted run that covers everything afterward, from srf to the end of RNAseqAlignment. This job is run by firing up the script "decider.pl" (located where?). In the long run, this needs to change:
i) It should not run from any single user's home directory, but rather under a seqware user
ii) It also needs a cron job to periodically initiate the pipeline run.
2. Error checking
Need to learn “how to detect errors”
a. Go to LIMS at and look for processes with “error”s
b. All run logs are stored at "/home/brianoc/scratch/pegasus/RNASeqAlignmentBWA/run####"
c. Use the MetaDB to query for the cause of errors
Log in to an instance of swmaster, then log in to the SeqWare MetaDB as follows. Detail see
psql -h swmaster.bioinf.unc.edu -U seqware -W seqware_meta_db
u: seqware
p: seqware