Programming involving database

This is the first “official” trial and my early learning experiences doing program involving interaction with a database. Here I am using postgreSQL database as an example, which is our SeqWare MetaDB as well our SeqWareLIMS are relying on.

Some background knowledge about postgreSQL goes here…

Some SeqWare know-how goes here….

First of all, test some database query

To log in database, issue the following command (for the best results from swmaster.med.unc.edu)
psql -h swmaster.bioinf.unc.edu -U seqware -W seqware_meta_db
u: seqware
p: seqware

Here comes queries:

The first script involves DB query is /home/jyli/svnroot/TCGA-dev/scripts/user/jyli/QC_modules/retro_IllQualStat.pl and it performs the following two tasks:

1. Query the database and get all available runfolder
2. Find the most recently RNAseqAlignmentBWA output folder
3. Run sw_module_IllQualStat.pl (/home/jyli/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/sw_module_IllQualStat.pl) for per lane based vendorQC

For DB study

Question: to find out how many flowcells do we have in DB and out of which how many have RNAseqAlignmentBWA event?
Answer: sequence_run table, lane table, sequencer_run linked to lane, lane is linked to processing_lanes, and processing_lanes is linked to processing

SeqWare production pipeline tracking

As a transition, Brian will hand off such duty onto me and Matt, in an attempt that Matt and I can cover each other in case one is on vacation.

1. Background:

Currently, two pars of run composes the seqware automation pipeline

a. a cron job is run to convert runfolder to srf format (module Illumina2SRF)
b. a manually submitted run, afterward basically, from srf to the end of RNAseqAlignment. this job is run by firing up a script “decider.pl” where? In the long run, this needs to change:
i) Does not need to run off any single user’s home directory, but rather a seqware user
ii) It needs a cron job too to periodically initiate the pipeline run.

2. Error checking

Need to learn “how to detect errors”

a. Go to LIMS at and look for processes with “error”s

b. All run (log) are stored at “/home/brianoc/scratch/pegasus/RNASeqAlignmentBWA/run####”

c. Use the MetaDB to query for the cause of errors
Log in to an instance of swmaster, and log in seqware metadb as following. Detail see

psql -h swmaster.bioinf.unc.edu -U seqware -W seqware_meta_db
u: seqware
p: seqware

SeqWare LIMS query

There is the help page for accessing SeqWare LIMS system.

For postgresql, the query language is slightly different from other sql language.

To clean up SeqWareLIMS event, follow the following steps:

1. To to SeqWareLIMS and find run with either running or errors. Currently, Brian will need to fix the bug in the deleting scripts, also he will take care errors associated with Illumina2srf workflow.

2. Leave Illumina2srf untouched (Brian will take care of it)
3. Doing this from the most recent finished run
4. Drill down to deeper hiearchy and file the access number
5. Log into swmaster and go into ~/svnroot/*/seqware-pipeline/
6. Issue the following command:

perl perl/bin/sw_util_delete_processing_events.pl –username seqware
–password seqware –dbhost swmaster.bioinf.unc.edu –db
seqware_meta_db –accession

7. This will delete the LIMS event as a result, leave output files on the file systems as orphan
8. Further actions are needed and maybe new run will be executed, which will overwrite those output files.
9. In the mean time, if the researchers want the data, we can still get them through the file system.

Query for samples done with pipeline process (little report)

A script (sw_reports_find_files.pl) is written by Brian located at /home/jyli/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/site-specific/unc/sw_reports_find_files.pl

1. log into swmaster
2. run the script as ” perl /home/jyli/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/site-specific/unc/sw_reports_find_files.pl –username seqware –password seqware –dbhost swmaster.bioinf.unc.edu –db seqware_meta_db –workflow-accession 23324 > temp.txt ”
3. Keep in mind that we need “amdin” log in “seqware seqware”
4. To get –workflow-accession number:
a. Log into seqware_meta_db as “seqware seqware”
b. Issue query “select * from workflow where name = ‘RNASeqAlignmentBWA’;”
seqware_meta_db=> select sw_accession, version, name, update_tstmp from workflow where name = ‘RNASeqAlignmentBWA’;
-[ RECORD 1 ]+—————————
sw_accession | 7535
version | 0.7.2
name | RNASeqAlignmentBWA
update_tstmp | 2010-08-13 16:23:21.069316
-[ RECORD 2 ]+—————————
sw_accession | 7603
version | 0.7.0
name | RNASeqAlignmentBWA
update_tstmp | 2010-08-14 16:42:02.141734
-[ RECORD 3 ]+—————————
sw_accession | 7604
version | 0.7.1
name | RNASeqAlignmentBWA
update_tstmp | 2010-08-14 16:43:29.010826
-[ RECORD 4 ]+—————————
sw_accession | 21544
version | 0.7.3
name | RNASeqAlignmentBWA
update_tstmp | 2010-08-28 06:28:02.482627
-[ RECORD 5 ]+—————————
sw_accession | 23324
version | 0.7.4
name | RNASeqAlignmentBWA
update_tstmp | 2010-08-30 00:45:13.497882

c. Look for the latest processing date.

To associate the processing event with the files stored on the system:

Email from Brian:
*******************************************************************************************
You’re querying the file table using sw_accession from the processing table. This is wrong. Use “d tablename” to see table keys and how they relate via foreign keys. Also look at the sample queries in the wiki. They will show you that you need to do a join on processing processing_files files to get the file path you need.
*******************************************************************************************
First: Understand the DB schema/layout, where? how?
Second: joined the tables processing processing_files files
Third: get the file path