SeqWare debugging

For SeqWare production, I need to monitor the process and report/fix errors accordingly.

Two important pages:

SeqWare LIMS page

debugging page

11042010

A. It seems that Brian has cleaned up the errors, thanks!!

1. No event has happened to “101028_UNC2-RDR300275_00040_FC_62E9AAXX” yet
2. Only three successes with “101027_UNC7-RDR3001641_00050_FC_62J8EAAXX”, what is going on?
3. Hold on a minute, Brian corrected (deleted) the error, but did NOT fix the problem. Those missing files need to be re-stored!!

B. There was a problem linking a file for flowcell 101022_UNC7-RDR3001641_00049_FC_62J92AAXX

1. 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.vendorQCsummary.txt exists on the FS, but clicking it from LIMS fails
2. The detailed output files are as follows:
/datastore/nextgenout/seqware-analysis/illumina/101022_UNC7-RDR3001641_00049_FC_62J92AAXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0
[jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ ls -tral 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.*
-rw-r----- 1 seqware nextgenseq 154 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.uniqRead.txt
-rw-r----- 1 seqware nextgenseq 7494 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.stat.txt
-rw-r----- 1 seqware nextgenseq 139 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.qualFilter.txt
-rw-r----- 1 seqware nextgenseq 478 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.vendorQCsummary.txt
-rw-r----- 1 seqware nextgenseq 65 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.adapterCont.txt
-rw-r----- 1 seqware nextgenseq 1781 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.txt
-rw-r----- 1 seqware nextgenseq 405 2010-11-02 02:33 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.stat.summary.txt
-rw-r----- 1 seqware nextgenseq 6327810056 2010-11-02 02:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.fastq
-rw-r----- 1 seqware nextgenseq 14187 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.bxp.png
-rw-r----- 1 seqware nextgenseq 13962 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCtrend.png
-rw-r----- 1 seqware nextgenseq 366 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCtrend.txt
-rw-r----- 1 seqware nextgenseq 15 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCflag.txt
-rw-r----- 1 seqware nextgenseq 344 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.qcGenome.txt
-rw-r----- 1 seqware nextgenseq 298 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.GenericQCGenome.txt
-rw-r----- 1 seqware nextgenseq 11917 2010-11-02 03:46 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.ntDistr.png
-rw-r----- 1 seqware nextgenseq 2013317829 2010-11-02 03:49 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.raw.bam
-rw-r----- 1 seqware nextgenseq 2068081081 2010-11-02 04:36 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.bam
-rw-r----- 1 seqware nextgenseq 9765 2010-11-02 07:34 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.TRcov.png
-rw-r----- 1 seqware nextgenseq 972296 2010-11-02 07:34 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.gene.quantification.txt
-rw-r----- 1 seqware nextgenseq 2003 2010-11-02 07:34 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.TRcov.txt
-rw-r----- 1 seqware nextgenseq 315 2010-11-02 07:34 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.alignStat.txt
-rw-r----- 1 seqware nextgenseq 3343555 2010-11-02 07:34 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.transcript.quantification.txt
-rw-r----- 1 seqware nextgenseq 2060617606 2010-11-02 07:36 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.translated_to_genomic.bam
-rw-r----- 1 seqware nextgenseq 8458 2010-11-02 08:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.geneCoverage.png
-rw-r----- 1 seqware nextgenseq 811 2010-11-02 08:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.geneCoverage.txt
-rw-r----- 1 seqware nextgenseq 8950203 2010-11-02 08:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.spljxn.quantification.txt
-rw-r----- 1 seqware nextgenseq 13842696 2010-11-02 08:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.trimmed.annotated.exon.quantification.txt
-rw-r----- 1 seqware nextgenseq 70608115 2010-11-02 08:47 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.bigwig

C. As of today, 11042010, how is QC doing? Using 101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1 as an example

1. IllQualStat (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.vendorQCsummary.txt

2. BCtrend (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCtrend.png
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCtrend.txt
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.BCflag.txt

3. qualStat (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.stat.txt
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.stat.summary.txt

4. uniqRead (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.uniqRead.txt

5. adapter/TrimCountAdapt (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.adapterCont.txt

6. qualFilter (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.qualFilter.txt

7. qcGenome (success):

101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.qcGenome.txt

8. alignStat (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.alignStat.txt

9. geneCoverage (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.geneCoverage.txt
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.geneCoverage.png

10. plotStats (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.bxp.png
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.ntDistr.png

11. coverageXTranscript (success):
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.TRcov.txt
101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1.TRcov.png
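
The per-module check above could be automated rather than eyeballed from `ls`. A minimal sketch, assuming the file suffixes listed above (with `qualFilter.txt` taken from the directory listing in B); the `EXPECTED_QC_SUFFIXES` mapping and the function name are made up for illustration and are not part of SeqWare:

```python
import os

# Suffixes copied from the per-module list in these notes. The mapping
# itself is a hypothetical structure, not something SeqWare provides.
EXPECTED_QC_SUFFIXES = {
    "IllQualStat": ["vendorQCsummary.txt"],
    "BCtrend": ["BCtrend.png", "BCtrend.txt", "BCflag.txt"],
    "qualStat": ["stat.txt", "stat.summary.txt"],
    "uniqRead": ["uniqRead.txt"],
    "TrimCountAdapt": ["adapterCont.txt"],
    "qualFilter": ["qualFilter.txt"],
    "qcGenome": ["qcGenome.txt"],
    "alignStat": ["alignStat.txt"],
    "geneCoverage": ["geneCoverage.txt", "geneCoverage.png"],
    "plotStats": ["bxp.png", "ntDistr.png"],
    "coverageXTranscript": ["TRcov.txt", "TRcov.png"],
}


def missing_qc_files(directory, lane_prefix):
    """Return {module: [missing file names]} for one flowcell lane.

    lane_prefix is "<flowcell>.<lane>", e.g.
    "101022_UNC7-RDR3001641_00049_FC_62J92AAXX.1".
    """
    missing = {}
    for module, suffixes in EXPECTED_QC_SUFFIXES.items():
        absent = []
        for suffix in suffixes:
            name = f"{lane_prefix}.{suffix}"
            if not os.path.isfile(os.path.join(directory, name)):
                absent.append(name)
        if absent:
            missing[module] = absent
    return missing
```

An empty dictionary means every expected QC file is on the FS for that lane; it says nothing about whether the file is also linked in LIMS.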

11052010

1. Helped Alicia to get her data
2. For IllQualStat
3. For BCtrend

Work note November 2010

Nov. 1st, 2010

1. Proposed meeting with Sai and Brian for redefining my position, see Brian’s email:

********************
Hi Jianying,

OK, let’s sit down together on Monday some time and iron out the issues you described below.

Also Sai wants to meet with the two of us to talk about redefining your position. So we should save some time for that on Monday too.
Not a huge change for the time being, more of a change in funding source.

Also, to follow up with your workflow question ${bin_dir} corresponds to the directory provisioned by Pegasus on the cluster node (which stores the binaries used by the modules). Not the bin directories in SeqWare subversion. So this workflow code is correct.

–Brian

***************
2. Checked the adapterCont output and found that 29 files are missing the “real” adapterCont output. For details, see the error log.
3. Used “TrimCountAdapterSingleRun.pl” to correct those files and re-process them.
4. Error in the script??
5. It turns out that “/PullFromIlluminaSequencerToSRF/” does NOT exist for those flowcells!!!
[jyli@lbg-compute qcFlagFixing]$ ls -tral /datastore/nextgenout/seqware-analysis/illumina/100722_UNCI-RDR301647_0014_706GFAAXX/
total 20
drwxrwsr-x 3 brianoc nextgenseq 4096 2010-08-16 16:55 .
drwxrwsr-x 3 brianoc nextgenseq 4096 2010-08-16 16:55 seqware-0.7.0_RNASeqAlignmentBWA-0.7.1
drwxrwsr-x 117 brianoc nextgenseq 12288 2010-10-27 08:35 ..

ls -tral /datastore/nextgenout/seqware-analysis/illumina/100603-UNC3-RDR300156_0007/

[jyli@lbg-compute qcFlagFixing]$ ls -al /datastore/nextgenout/seqware-analysis/illumina/100608_UNC3-RDR300156_0008/
total 40
drwxrwsr-x 8 brianoc nextgenseq 4096 2010-09-09 07:05 .
drwxrwsr-x 117 brianoc nextgenseq 12288 2010-10-27 08:35 ..
drwxrwsr-x 2 brianoc nextgenseq 4096 2010-09-23 16:31 seqware-0.7.0_PullFromIlluminaSequencerToSRF-0.7.0
drwxrwsr-x 2 brianoc nextgenseq 4096 2010-08-03 17:01 seqware-0.7.0_PullFromIlluminaSequencerToSRF-0.7.0_1277671957000
drwxrwsr-x 3 brianoc nextgenseq 4096 2010-06-29 17:21 seqware-0.7.0_RNASeqAlignmentBWA-0.7.0
drwxrwsr-x 3 brianoc nextgenseq 4096 2010-08-18 14:05 seqware-0.7.0_RNASeqAlignmentBWA-0.7.2
drwxrwsr-x 3 brianoc nextgenseq 4096 2010-09-09 07:05 seqware-0.7.0_RNASeqAlignmentBWA-0.7.4
drwxr-sr-x 12 jyli nextgenseq 4096 2010-07-29 10:30 seqware-0.7.0_RNASeqQC-0.7.0

6. Or a mismatch happened, e.g. /datastore/nextgenout/seqware-analysis/illumina/100414_UNC4-RDR3001561_0001/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/PEROU_REF_6_TCGA00099_UNCCH_100414_UNC4-RDR3001561_0001_3.adapterCont.txt
7. So, put a halt on this since this is so unique!
8. For the fixing effort, it ran smoothly, and reported 152 missing adapterCont.txt files

As of this morning, four fixing scripts are done: adapterCont_reCal.pl, qcGenome_reCal.pl, uniqRead_reCal.pl, and alignStat_reCal.pl

Now, trying to get BCtrend and IllQualStat modules fixing scripts.

11022010

select count(*) from (select f.file_path from processing p, processing_files pf, file f where p.processing_id = pf.processing_id and pf.file_id = f.file_id and algorithm = 'BCtrend' and status not like '%failed%' and f.file_path like '%BCtrend.png%') tbla;

11032010

To avoid losing database information, always back up before deleting/updating anything, using the following procedure:

pg_dump -h swmaster.bioinf.unc.edu -U seqware -W seqware_meta_db > MetaDB_schema.txt
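
To keep several backups apart, the dump file name could carry the date. A minimal sketch that only builds the `pg_dump` command line from the host, user and database used above; the function name and the dated file-name scheme are my own assumptions, and `-f` is the standard `pg_dump` option for the output file:

```python
from datetime import date


def pg_dump_command(host, user, database, day=None):
    """Build a pg_dump argv list and a dated output file name.

    Nothing is executed here; the caller decides when to run it.
    """
    day = day or date.today()
    outfile = f"{database}_{day:%Y%m%d}.sql"
    argv = ["pg_dump", "-h", host, "-U", user, "-W", "-f", outfile, database]
    return argv, outfile
```

The returned list can be handed to `subprocess.run(argv, check=True)` right before any delete/update, so a failed dump stops the procedure.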

Matt is helping to get the queries for qualStat and BCtrend:

BCtrends modules which were run:

select count(*) from (select f.file_path from processing p, processing_files pf, file f where p.processing_id = pf.processing_id and pf.file_id = f.file_id and algorithm = 'BCtrend' and status not like '%failed%' and f.file_path like '%BCtrend.txt%') tbla;

select count(*) from (select f.file_path from processing p, processing_files pf, file f where p.processing_id = pf.processing_id and pf.file_id = f.file_id and algorithm = 'BCtrend' and status not like '%failed%' and f.file_path like '%BCtrend.png%') tbla;

Without, include sw_accession:

select processing_id from processing p where algorithm = 'BCtrend' and status like 'success';

seqware_meta_db=> select count(*) from (select processing_id from processing p where algorithm = 'qualStat' and status like 'success') success_qs where success_qs.processing_id not in (select processing_id from processing p where algorithm = 'BCtrend' and status = 'success');
 count
-------
   929
(1 row)

select * from (select processing_id from processing p where algorithm = 'qualStat' and status like 'success') success_qs where success_qs.processing_id not in (select processing_id from processing p where algorithm = 'BCtrend' and status = 'success');

\dt lists the tables in the database

using processing_relationship, processing table..
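
A note on the `not in` query above: it compares the processing_id of qualStat events with the processing_id of BCtrend events. If processing_id is unique per event (an assumption on my part), the two sets never overlap, so the count is just the number of successful qualStat events. Going through processing_relationship might look like the sketch below. It runs against a toy in-memory SQLite database, not the real meta-db; the `parent_id`/`child_id` column names and the idea that qualStat and BCtrend events hang off the same parent event are both assumptions to verify against the real schema.

```python
import sqlite3


def parents_missing_bctrend(conn):
    """Parents with a successful qualStat child but no successful BCtrend child."""
    sql = """
        SELECT DISTINCT rq.parent_id
        FROM processing q
        JOIN processing_relationship rq ON rq.child_id = q.processing_id
        WHERE q.algorithm = 'qualStat' AND q.status = 'success'
          AND NOT EXISTS (
              SELECT 1
              FROM processing b
              JOIN processing_relationship rb ON rb.child_id = b.processing_id
              WHERE rb.parent_id = rq.parent_id
                AND b.algorithm = 'BCtrend' AND b.status = 'success')
        ORDER BY rq.parent_id
    """
    return [row[0] for row in conn.execute(sql)]


def demo_connection():
    """Toy data: parent 1 is complete, parent 2 has a failed BCtrend, parent 3 has none."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE processing (processing_id INTEGER PRIMARY KEY, algorithm TEXT, status TEXT);
        CREATE TABLE processing_relationship (parent_id INTEGER, child_id INTEGER);
        INSERT INTO processing VALUES
            (1, 'parent', 'success'), (2, 'parent', 'success'), (3, 'parent', 'success'),
            (10, 'qualStat', 'success'), (11, 'BCtrend', 'success'),
            (20, 'qualStat', 'success'), (21, 'BCtrend', 'failed'),
            (30, 'qualStat', 'success');
        INSERT INTO processing_relationship VALUES
            (1, 10), (1, 11), (2, 20), (2, 21), (3, 30);
    """)
    return conn
```

On the toy data, `parents_missing_bctrend(demo_connection())` returns parents 2 and 3, which are the ones that would need a BCtrend re-run.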

Why did I lose the notes from these two days?
11042010
It was documented in seqware debugging page!!
11052010

Had a talk with Brian and Sai about the transition from TCGA development to Bioinformatics consultation duty. In the new capacity, I will be responsible for the following tasks:

1. Supporting Bioinformatics Core’s analytical component, which includes:
a. Microarray gene expression analysis
b. Bioinformatics Core walk-in clinic
c. Handling analysis requests
d. Supporting Genomic Core, Yan Shi and Michael Topal’s facility

2. Discussion with Chris Fan and Katie Hoadley for technical information exchange

3. Monthly presentation on work progress etc.

4. Set up a wiki page for such effort

5. Develop survey (SOP) for our support for internal evaluation and better communication with P.I. and scientists as customers

6. A point person for NGS data, eyes and ears

7. Will report to Sai from then on


11082010

1. While waiting for Matt on the BCtrend fix, started to work on the QC aggregate plot (generic R code)
2. First, there are problems with the QC files:
a. E.g. for 100813_UNC4-RDR3001561_00015_FC_629FHAAXX, all sorts of files are missing for some lanes (e.g. lane 1), including QC files as well as processed quantification files
b. Some newly processed files have no QC report??
3. Some newly processed flowcells had a problem with file association; it seems Brian (or someone) has fixed this problem
4. /home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/sw_module_aggrePlotAuto.pl seems to have a bug: it does NOT collect newly processed QC flag files. Strange?? Need to fix this. Get this done!!

While checking, found the following:

qcGenome flag did not get updated (101018_UNC8*):

For example:

[jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ more 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.qcGenome.txt
qcgenome_query_file datastore/nextgenout/seqware-analysis/illumina/101018_UNC8-RDR3001640_00040_FC_706KRAAXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.fastq
qcgenome_read_total 32799173
reads_rRNA 811883
percnt_rRNA 2.48
reads_viral 53074
percnt_viral 0.16
qcgenome_flag 1

But the script has been updated:

vi ~/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/sw_module_RiboAndViralAln.pl

$flag = 0;
#if ($pctR > 2 || $pctV > 1){$flag = 1;}
if ($pctR > 10 || $pctV > 5){$flag = 1;} # lowered stringency for QC flag; needs to be parameterized in SeqWare 0.7.5
print OUT "qcgenome_flag\t$flag\n";
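
The “needs to be parameterized” comment could look like the sketch below, written in Python rather than the module's Perl. The function and parameter names are illustrative only; the thresholds are the two pairs from the Perl snippet (old: 2 / 1, new: 10 / 5). With the lane 2 numbers shown above (2.48% rRNA, 0.16% viral), the old thresholds give flag 1 and the new ones give flag 0, which is consistent with the file on disk still having been written by the old version.

```python
def qcgenome_flag(pct_rrna, pct_viral, max_rrna=10.0, max_viral=5.0):
    """Return 1 if either percentage exceeds its threshold, else 0."""
    return 1 if (pct_rrna > max_rrna or pct_viral > max_viral) else 0


def flag_line(flag):
    """Format the key/value line the way the Perl module prints it."""
    return f"qcgenome_flag\t{flag}\n"


# Values from 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.qcGenome.txt
old_flag = qcgenome_flag(2.48, 0.16, max_rrna=2.0, max_viral=1.0)
new_flag = qcgenome_flag(2.48, 0.16)
```

Keeping the thresholds as arguments means the re-run scripts and the module could share one definition instead of editing constants in place.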

Some QC outputs are missing!

They are properly reflected on qc-flag-html report!

[jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ ls -tral 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.*
-rw-r----- 1 seqware nextgenseq 346 2010-10-27 13:32 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.qcGenome.txt
CHECK -rw-r----- 1 seqware nextgenseq 11921 2010-10-27 13:32 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.ntDistr.png
-rw-r----- 1 seqware nextgenseq 298 2010-10-27 13:32 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.GenericQCGenome.txt
-rw-r----- 1 seqware nextgenseq 14580 2010-10-27 13:32 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.bxp.png
-rw-r----- 1 seqware nextgenseq 1961163791 2010-10-27 13:34 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.raw.bam
-rw-r----- 1 seqware nextgenseq 2018005939 2010-10-27 13:38 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.bam
-rw-r----- 1 seqware nextgenseq 2005 2010-10-27 18:13 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.TRcov.txt
-rw-r----- 1 seqware nextgenseq 991206 2010-10-27 18:13 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.gene.quantification.txt
CHECK -rw-r----- 1 seqware nextgenseq 316 2010-10-27 18:13 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.alignStat.txt
-rw-r----- 1 seqware nextgenseq 9748 2010-10-27 18:13 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.TRcov.png
-rw-r----- 1 seqware nextgenseq 3329977 2010-10-27 18:13 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.transcript.quantification.txt
-rw-r----- 1 seqware nextgenseq 2011755497 2010-10-27 18:16 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.translated_to_genomic.bam
-rw-r----- 1 seqware nextgenseq 8486 2010-10-27 19:46 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.geneCoverage.png
-rw-r----- 1 seqware nextgenseq 806 2010-10-27 19:46 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.geneCoverage.txt
-rw-r----- 1 seqware nextgenseq 8943835 2010-10-27 19:46 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.spljxn.quantification.txt
-rw-r----- 1 seqware nextgenseq 13433672 2010-10-27 19:46 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.trimmed.annotated.exon.quantification.txt
-rw-r----- 1 seqware nextgenseq 63351543 2010-10-27 19:46 101018_UNC8-RDR3001640_00040_FC_706KRAAXX.2.bigwig
CHECK

Some of the timing also seems confusing, e.g. 101018_UNC4:

[jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ ls -al 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.2.*
-rw-r----- 1 seqware nextgenseq 421 2010-11-01 13:18 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.2.vendorQCsummary.txt
[jyli@lbg-compute seqware-0.7.0_BWA-0.7.0]$ ls -al 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.*
-rw-r----- 1 seqware nextgenseq 316 2010-10-27 19:48 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.alignStat.txt
-rw-r----- 1 seqware nextgenseq 49825998 2010-10-27 20:27 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.bigwig
-rw-r----- 1 seqware nextgenseq 13706 2010-10-27 16:05 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.bxp.png
-rw-r----- 1 seqware nextgenseq 8536 2010-10-27 20:27 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.geneCoverage.png
-rw-r----- 1 seqware nextgenseq 790 2010-10-27 20:27 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.geneCoverage.txt
-rw-r----- 1 seqware nextgenseq 298 2010-10-27 16:05 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.GenericQCGenome.txt
-rw-r----- 1 seqware nextgenseq 11946 2010-10-27 16:05 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.ntDistr.png
-rw-r----- 1 seqware nextgenseq 345 2010-10-27 16:05 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.qcGenome.txt
-rw-r----- 1 seqware nextgenseq 9816 2010-10-27 19:48 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.TRcov.png
-rw-r----- 1 seqware nextgenseq 1998 2010-10-27 19:48 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.TRcov.txt
-rw-r----- 1 seqware nextgenseq 1687645852 2010-10-27 17:30 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.bam
-rw-r----- 1 seqware nextgenseq 13482365 2010-10-27 20:27 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.exon.quantification.txt
-rw-r----- 1 seqware nextgenseq 1010573 2010-10-27 19:48 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.gene.quantification.txt
-rw-r----- 1 seqware nextgenseq 8926642 2010-10-27 20:27 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.spljxn.quantification.txt
-rw-r----- 1 seqware nextgenseq 3373515 2010-10-27 19:48 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.transcript.quantification.txt
-rw-r----- 1 seqware nextgenseq 1686530032 2010-10-27 19:49 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.annotated.translated_to_genomic.bam
-rw-r----- 1 seqware nextgenseq 1640686890 2010-10-27 16:07 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.trimmed.raw.bam
-rw-r----- 1 seqware nextgenseq 421 2010-11-01 13:18 101018_UNC4-RDR3001561_00039_FC_706G7AAXX.1.vendorQCsummary.txt

11092010

1. Keep working on aggregate plot…

2. Need to check the scripts for rerunning the flag computation.
a. Where are the re-run scripts: uniqRead_reCal.pl, alignStat_reCal.pl, qcGenome_reCal.pl, and adapterCont_reCal.pl
b. Four modules can be rerun with the lowered threshold (uniqRead, alignStat, adapterCont, and qcGenome), with the four scripts located at /home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/qcFlagFixing/

3. For IllQualStat:
a. Brian deleted the erroneous records
b. For those with output on the FS, need to insert into the DB
c. For those not available at all, need to re-run
d. Solution: an R script was created at /home/jyli/current_projects/SeqWareModules/qc_flags/R-stuffs/flag_summary_11092010.R
e. It seems that all current flowcells/lanes have the IllQualStat output
f. Therefore, the only thing needed is to insert them into the DB

11112010

A. Getting data from “scratch” space as seqware user

1. Log in to lbg-compute as the seqware user
2. Go to directory: /home/seqware/scratch/pegasus/
3. find . -name '101103_UNC10_SN254_0172_B209C5ABXX.*.*.txt' > temp.txt
3.1 find . -name '101103_UNC10_SN254_0173_A209AUABXX.*.*.txt' | wc -l
4. cat "temp.txt" | perl cp_qcFiles.pl
5. By now, the QC report files have been copied to the /tmp/ directory
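
The find-then-copy steps above could also be done in one pass. A minimal sketch, not a port of cp_qcFiles.pl (I am not reproducing what that script does); the function name is made up, and the pattern is the same shell-style glob passed to `find -name`:

```python
import fnmatch
import os
import shutil


def copy_qc_reports(root, pattern, dest):
    """Copy every file under root whose name matches pattern into dest.

    Returns the sorted list of copied file names. Files with the same
    name in different subdirectories would overwrite each other in dest.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            shutil.copy2(os.path.join(dirpath, name), os.path.join(dest, name))
            copied.append(name)
    return sorted(copied)
```

For example, `copy_qc_reports("/home/seqware/scratch/pegasus", "101103_UNC10_SN254_0172_B209C5ABXX.*.*.txt", "/tmp")` would mirror steps 3 to 5, and `len()` of the result gives the same number as the `wc -l` check.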

B. Next, work as jyli:

1. Log in to lbg-compute as jyli
2. Copy qc report file to working directory

Help desk work with Tangi
1. Meet her and get a feeling about her project
2. Will start work on this on the 16th,

11122010

1. Presented the ongoing QC assessment of two new HiSeq flowcells
2. Kept monitoring the SeqWare process from the scratch space; as of 2:00 p.m.:
[seqware@lbg-compute RNASeqAlignmentBWA]$ find . -name 101103_UNC10_SN254_0172_B209C5ABXX.*.*.txt | wc -l
64
[seqware@lbg-compute RNASeqAlignmentBWA]$ find . -name 101103_UNC10_SN254_0173_A209AUABXX.*.*.txt | wc -l
76
[seqware@lbg-compute RNASeqAlignmentBWA]$ find . -name 101103_UNC10_SN254_0173_A209AUABXX.*.*.png | wc -l
21
[seqware@lbg-compute RNASeqAlignmentBWA]$ find . -name 101103_UNC10_SN254_0173_A209AUABXX.*.*.png | wc -l
21

3. One thing that must be done is to insert the IllQualStat output into the analysis.
a. May need to delete/clean up the error first
b. Re-run IllQualStat and insert into DB.

11152010

A. A request from Chuck in the morning

For Chuck’s request email:

**************************************************
Jianying,

can you run your quality metrics and give me a slide similar to slide 2 and slide 3 for the following samples. I also need this for TCGA meeting, so please do soon. thanks CHUCK

PHR-MB 100617_UNC8-RDR3001640_0008_61L3JAAXX
PHR-MB 100625_UNC1-RDR301647_0009_705PLAAXX
PHR-MB 100701_UNC4-RDR3001561_0011_622TVAAXX
PHR-MG 100617_UNC8-RDR3001640_0008_61L3JAAXX
PHR-RIBO- 100617_UNC8-RDR3001640_0008_61L3JAAXX
PHR-RIBO- 100701_UNC4-RDR3001561_0011_622TVAAXX
PHR-RIBO- 100625_UNC1-RDR301647_0009_705PLAAXX
PHR-T 100916_UNC4-RDR3001561_00024_FC_62HW7AAXX
PHR-TD 100617_UNC8-RDR3001640_0008_61L3JAAXX
PHR-TD 100701_UNC4-RDR3001561_0011_622TVAAXX

**********************************************************************************************************
1. Getting experimental information via DB query:
psql -h swmaster.bioinf.unc.edu -U seqwarero -W seqware_meta_db

Use the readonly account:

u: seqwarero
p: read0Nly%

2. The following query returns the information we need:

select s.title, sr.name, l.lane_index+1 as lane from sample as s, sequencer_run as sr, lane as l where s.sample_id = l.sample_id and l.sequencer_run_id = sr.sequencer_run_id and s.title like '%PHR%' order by sr.name, l.lane_index asc;

   title   |                   name                    | lane
-----------+-------------------------------------------+------
 PHR-MB    | 100617_UNC8-RDR3001640_0008_61L3JAAXX     |    2
 PHR-MG    | 100617_UNC8-RDR3001640_0008_61L3JAAXX     |    3
 PHR-RIBO- | 100617_UNC8-RDR3001640_0008_61L3JAAXX     |    4
 PHR-TD    | 100617_UNC8-RDR3001640_0008_61L3JAAXX     |    5
 PHR-MB    | 100625_UNC1-RDR301647_0009_705PLAAXX      |    2
 PHR-RIBO- | 100625_UNC1-RDR301647_0009_705PLAAXX      |    3
 PHR-TD    | 100701_UNC4-RDR3001561_0011_622TVAAXX     |    1
 PHR-MB    | 100701_UNC4-RDR3001561_0011_622TVAAXX     |    2
 PHR-RIBO- | 100701_UNC4-RDR3001561_0011_622TVAAXX     |    3
 PHR-T     | 100916_UNC4-RDR3001561_00024_FC_62HW7AAXX |    3

3. Retrieved the QC file information the same way as for the HiSeq data, but found that 5 out of 10 lanes are missing. These five lanes happen to be the ones Matt has been actively working on. Therefore, I can only get a summary for the other five.

B. Cleaning up SeqWare LIMS

11182010

1. Refer to the Survival analysis blog

11302010

1. Get vendorQCsummary inserted for 101103_UNC10_SN254_0172_B209C5ABXX

a. Log in to swmaster as seqware and go to /perl/bin/
b. Run the command as follows:
perl sw_util_insert_processing_event.pl --username seqware --password ***** --dbhost swmaster.bioinf.unc.edu --db seqware_meta_db --parent-accession 80934 --file /datastore/nextgenout/seqware-analysis/illumina/101103_UNC10_SN254_0172_B209C5ABXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/101103_UNC10_SN254_0172_B209C5ABXX.1.vendorQCsummary.txt=text/key_value --algorithm IllQualStat --version 0.7.0
c. The detailed command was saved as command_11302010
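
Since the same insert has to be repeated per lane, the command line could be generated instead of hand-edited. A minimal sketch that only assembles the argument list; the options and values are the ones in the command above (assuming the long options take the usual double-hyphen form), the function name is hypothetical, and the parent accession for each lane still has to be looked up by hand (only 80934 for lane 1 is recorded here):

```python
import os


def insert_event_argv(run_dir, flowcell, lane, parent_accession, password,
                      username="seqware",
                      dbhost="swmaster.bioinf.unc.edu",
                      db="seqware_meta_db"):
    """Build the argv for sw_util_insert_processing_event.pl for one lane.

    Nothing is executed; the list is meant to be reviewed and then passed
    to subprocess.run from the directory that holds the Perl script.
    """
    qc_file = os.path.join(run_dir, f"{flowcell}.{lane}.vendorQCsummary.txt")
    return [
        "perl", "sw_util_insert_processing_event.pl",
        "--username", username,
        "--password", password,
        "--dbhost", dbhost,
        "--db", db,
        "--parent-accession", str(parent_accession),
        "--file", f"{qc_file}=text/key_value",
        "--algorithm", "IllQualStat",
        "--version", "0.7.0",
    ]
```

Printing the list for each lane before running anything gives the same record as saving the command to a file.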

2. To delete an erroneous run with status either “failed” or “running” (for a prolonged period of time), do the following:
a. Have to be very careful!!!! The delete will recursively delete all events after the parent accession ID
b. It would be good to run pg_dump first, just in case
c. Prior to deleting, check the meta-db for some details:
select * from processing where sw_accession = 81726;
select * from processing where stdout like '%101103_UNC10_SN254_0172_B209C5ABXX.3.trimmed.annotated.bam%';
It turned out that two attempts were made, one of them successful.
d. So, it is safe to use the following command to delete the “pending” record:
perl sw_util_delete_processing_events.pl --username seqware --password seqware --dbhost swmaster.bioinf.unc.edu --db seqware_meta_db --accession 81726
3. When checking the flag HTML report, found that some adapter contamination files were missing. It turned out that the files are missing from /datastore/nextgenout/seqware-analysis/illumina/101103_UNC10_SN254_0172_B209C5ABXX/seqware-0.7.0_RNASeqAlignmentBWA-0.7.4/seqware-0.7.0_BWA-0.7.0/. But they are still in the scratch space under the “seqware” user.
Talked to John but did not find a concrete answer.