SeqWare LIMS query

This is a help page for accessing the SeqWare LIMS system.

Note that PostgreSQL's SQL dialect is slightly different from other SQL dialects.

To clean up SeqWare LIMS events, follow these steps:

1. Go to SeqWare LIMS and find runs that are either still running or in an error state. Currently, Brian needs to fix a bug in the deletion scripts; he will also take care of errors associated with the Illumina2srf workflow.

2. Leave Illumina2srf untouched (Brian will take care of it)
3. Start from the most recently finished run
4. Drill down the hierarchy and find the accession number
5. Log into swmaster and go into ~/svnroot/*/seqware-pipeline/
6. Issue the following command:

perl perl/bin/sw_util_delete_processing_events.pl --username seqware
--password seqware --dbhost swmaster.bioinf.unc.edu --db
seqware_meta_db --accession [ACCESSION FROM STEP 4]

7. This deletes the LIMS event but, as a result, leaves the output files orphaned on the file system
8. Further action is needed; a new run may be executed, which will overwrite those output files
9. In the meantime, if researchers want the data, we can still retrieve it through the file system

Query for samples that are done with pipeline processing (a little report)

A script written by Brian, sw_reports_find_files.pl, is located at /home/jyli/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/site-specific/unc/sw_reports_find_files.pl

1. Log into swmaster
2. Run the script as "perl /home/jyli/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/site-specific/unc/sw_reports_find_files.pl --username seqware --password seqware --dbhost swmaster.bioinf.unc.edu --db seqware_meta_db --workflow-accession 23324 > temp.txt"
3. Keep in mind that we need the "admin" login "seqware seqware"
4. To get the --workflow-accession number:
a. Log into seqware_meta_db as "seqware seqware"
b. Issue the query "select * from workflow where name = 'RNASeqAlignmentBWA';"
seqware_meta_db=> select sw_accession, version, name, update_tstmp from workflow where name = 'RNASeqAlignmentBWA';
-[ RECORD 1 ]+---------------------------
sw_accession | 7535
version      | 0.7.2
name         | RNASeqAlignmentBWA
update_tstmp | 2010-08-13 16:23:21.069316
-[ RECORD 2 ]+---------------------------
sw_accession | 7603
version      | 0.7.0
name         | RNASeqAlignmentBWA
update_tstmp | 2010-08-14 16:42:02.141734
-[ RECORD 3 ]+---------------------------
sw_accession | 7604
version      | 0.7.1
name         | RNASeqAlignmentBWA
update_tstmp | 2010-08-14 16:43:29.010826
-[ RECORD 4 ]+---------------------------
sw_accession | 21544
version      | 0.7.3
name         | RNASeqAlignmentBWA
update_tstmp | 2010-08-28 06:28:02.482627
-[ RECORD 5 ]+---------------------------
sw_accession | 23324
version      | 0.7.4
name         | RNASeqAlignmentBWA
update_tstmp | 2010-08-30 00:45:13.497882

c. Look for the record with the latest update_tstmp (here, accession 23324, version 0.7.4).

To associate the processing event with the files stored on the system:

Email from Brian:
*******************************************************************************************
You're querying the file table using sw_accession from the processing table. This is wrong. Use "\d tablename" to see table keys and how they relate via foreign keys. Also look at the sample queries in the wiki. They will show you that you need to do a join on processing, processing_files, and files to get the file path you need.
*******************************************************************************************
First: understand the DB schema/layout (where? how?)
Second: join the tables processing, processing_files, and files
Third: get the file path
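I can't run this against seqware_meta_db from here, so the sketch below reproduces the three-table join on a throwaway SQLite database. The table/column names (processing_id, file_id, sw_accession, file_path) are assumptions inferred from Brian's hint, not the real SeqWare schema; on the real server the same join shape would be issued through psql.

```shell
# Toy reproduction of the processing -> processing_files -> files join.
# All table/column names here are assumptions, not the real SeqWare schema.
db=$(mktemp)
sqlite3 "$db" <<'SQL'
CREATE TABLE processing       (processing_id INTEGER, sw_accession INTEGER);
CREATE TABLE processing_files (processing_id INTEGER, file_id INTEGER);
CREATE TABLE files            (file_id INTEGER, file_path TEXT);
INSERT INTO processing       VALUES (1, 23324);
INSERT INTO processing_files VALUES (1, 10);
INSERT INTO files            VALUES (10, '/datastore/run1/output.bam');
SQL
# Map a processing event's accession to its file path via the linker table:
path=$(sqlite3 "$db" "SELECT f.file_path
  FROM processing p
  JOIN processing_files pf ON p.processing_id = pf.processing_id
  JOIN files f            ON pf.file_id       = f.file_id
  WHERE p.sw_accession = 23324;")
echo "$path"
rm -f "$db"
```

The processing_files table is the many-to-many linker, which is why a direct lookup of sw_accession in the file table finds nothing.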

Linux notes

Basic commands

To remove write permission for the group: chmod -R g-w .

To compare two files: diff file1 file2

To count how many files are in a directory: ls | wc -l

To extract a few lines from a file: sed -n '1,10p' filename > newfile

To run a shell script: chmod u+rwx yourfile.sh; ./yourfile.sh

To download files over HTTP without a password: wget -r -nd -np

To download files over HTTP with a password: wget --http-user= --http-password= -r -nd -np

To get unique entries (the input must be sorted first): uniq

To create a symbolic link: ln -s [TARGET DIRECTORY OR FILE] ./[SHORTCUT]

To sort a file: sort

To get to a person's home directory on topsail when it is inaccessible, we can try going through /ifs1/scr/someone/ instead.

To modify an environment variable: go to the home directory, open .bash_profile, and edit it, e.g. PATH=$PATH:/ifs1/home/ferizs/lbgapps/bin

To copy a whole directory: cp -r /dir/ .

To kill a process: ps -ef | grep "jyli", then kill -9 [process id]

To check for an installed package: rpm -q "package name"

To put a process into the background:
1. ctrl-z suspends it
2. bg resumes it in the background
3. fg brings it back up to the foreground
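ctrl-z/bg/fg only exist at an interactive prompt. Inside a script, the equivalent (a minimal sketch) is to launch the job with & and collect it with wait:

```shell
# Run a job in the background and wait for it to finish.
sleep 1 &           # '&' detaches the job, like ctrl-z + bg in one step
pid=$!              # $! is the PID of the most recent background job
echo "started background job $pid"
wait "$pid"         # block until it exits; wait returns the job's status
status=$?
echo "job exited with status $status"
```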

A use case: excluding redundancy in a file

To get SINGLE PLACEMENT reads
1. Work off the .mapView files with the unix command "uniq" (note that uniq only collapses adjacent duplicate lines, so sort first if the file isn't already sorted on those columns)
2. Chop the mapview file down to three columns: "cut -f 2,3,4 s_1_mapview.txt > s_1_mapview.txt.cut"
3. Extract rows that occur only once: "uniq -u s_1_mapview.txt.cut s_1_mapview.txt.cut.uniq"
4. Extract just a single copy of each duplicated row: "uniq -d s_1_mapview.txt.cut s_1_mapview.txt.cut.dup"
5. Concatenate the ".uniq" and ".dup" files and get a total count of the rows: "cat *.dup *.uniq > s_1_mapview.txt.cut.final" and "wc -l s_1_mapview.txt.cut.final"
6. Should also check out the "samtools" command
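The steps above, run end to end on a tiny made-up mapview file (three tab-separated columns standing in for fields 2-4 of the real format):

```shell
# Fake mapview data: read id plus chrom, pos, strand; the first two reads
# map to the same place, so only the third is a single placement.
printf 'r1\tchr1\t100\t+\nr2\tchr1\t100\t+\nr3\tchr2\t200\t-\n' > s_1_mapview.txt
cut -f 2,3,4 s_1_mapview.txt > s_1_mapview.txt.cut
sort -o s_1_mapview.txt.cut s_1_mapview.txt.cut        # uniq needs adjacent duplicates
uniq -u s_1_mapview.txt.cut s_1_mapview.txt.cut.uniq   # rows occurring exactly once
uniq -d s_1_mapview.txt.cut s_1_mapview.txt.cut.dup    # one copy of each duplicated row
cat s_1_mapview.txt.cut.dup s_1_mapview.txt.cut.uniq > s_1_mapview.txt.cut.final
total=$(wc -l < s_1_mapview.txt.cut.final)
echo "$total distinct placements"
```

With this input, .dup holds the one duplicated placement and .uniq the one singleton, so the final file has 2 rows.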

To exclude the first line of a file

tail -n +2 file > newfile

To exclude/remove blank lines from a text file, use one of the following commands:

$ sed '/^$/d' input.txt > output.txt
$ grep -v '^$' input.txt > output.txt
$ strings input.txt > output.txt (careful: strings also drops lines shorter than 4 printable characters, so it is not a faithful blank-line filter)
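A quick sanity check, on throwaway data, that the sed and grep variants do the same thing:

```shell
# Three non-blank lines separated by a varying number of blank lines.
printf 'first\n\nsecond\n\n\nthird\n' > input.txt
sed '/^$/d'  input.txt > out_sed.txt
grep -v '^$' input.txt > out_grep.txt
cmp -s out_sed.txt out_grep.txt && echo "sed and grep agree"
lines=$(($(wc -l < out_sed.txt)))   # arithmetic strips wc's padding
echo "$lines non-blank lines"
```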

Credit goes to I Love Linux.

To ssh across two linux systems without password:


From system one:

1. Go to the home directory ("cd" will do it, and "pwd" confirms)
2. cd .ssh
3. ssh-keygen -t dsa
4. Press "enter" 3 times (accepting the defaults and an empty passphrase)
5. Copy the public key to the other system and append it to that account's ~/.ssh/authorized_keys, e.g. scp id_dsa.pub user@system2:~/ and then, on system two, cat ~/id_dsa.pub >> ~/.ssh/authorized_keys (ssh-copy-id user@system2 does the same in one step)

From system two, do exactly the same process toward system one. This way, you have achieved your goal.

Difference between "scp" and "rsync":

"scp" copies and overwrites the files at the destination. Also, if the network is interrupted, it stops the process and loses its progress.

"rsync" checks the timestamps at the destination: if the file at the destination has the same timestamp, it won't be copied; otherwise rsync copies and/or updates the destination.

Use "rsync -av source destination"

Download files in Linux

Although it seems simple, sometimes a small roadblock can throw people off. Let's take a look at the options for downloading:

  • "wget" seems to be the first choice
  • Sometimes, I have to use "--no-check-certificate" to download from an "unauthorized", questionable site
  • We do have "rsync", "aspera", and who knows what else is available out there
  • "aria2" is another good option, in the sense that it can break a download into small chunks
  • About aria2:
    • "# yum install aria2" gets me the tool
    • aria2c -x 4 "url"
    • check out the aria2 wiki for details

Weird situation: since we don't have the "sge" environment, we have to rely on the Linux detach mechanisms to run long jobs.

    Scenario 1, using "nohup"

    Example: a script printing lines to both standard output and standard error

    while true
    do
        echo "standard output"
        echo "standard error" 1>&2
        sleep 1
    done

    Execute the script without redirection

    $ nohup sh custom-script.sh &
    [1] 12034
    $ nohup: ignoring input and appending output to `nohup.out'

    $ tail -f nohup.out
    standard output
    standard error
    standard output
    standard error

    Execute the script with redirection

    $ nohup sh custom-script.sh > custom-out.log &
    [1] 11069
    $ nohup: ignoring input and redirecting stderr to stdout

    $ tail -f custom-out.log
    standard output
    standard error
    standard output
    standard error
    ..

    Check process status

    $ ps aux | grep li11

    Send an email in Linux

    To send an email whose body is the contents of the file "Email_content", with a title:

    mail -s "Email title" li11@niehs.nih.gov < Email_content

    To email a file as an attachment:

    echo "Email body" | mutt -a command_02132012.txt -- li11@niehs.nih.gov

    Echoing an exclamation mark

    echo "something!" generates an "event not found" error in an interactive bash session, because "!" triggers history expansion and double quotes do not protect it; only single quotes do. So echo 'something!' does the trick. Thanks to the post here.

    It turns out that I had to resize the tmpfs on seqbig, which was set to the default size. I did plenty of research and found a link; here is what I did.

    You can raise the size limit in /etc/fstab:

    tmpfs                  /dev/shm      tmpfs     size=20G,nr_inodes=10k  0      0
    

    Then remount it:

    # mount -o remount /dev/shm
    

    Be careful with the size, though. Since it exists in RAM, you don't want a tmpfs partition to be bigger than your RAM, otherwise the big bad OOM killer will come along and start assassinating your processes.

    Download a website recursively

    I found a very good web page for teaching R, from Syracuse University bio793. I really liked it and wanted to download all the pages.
    I found a helpful video on this. The command to use is: wget --random-wait -r -p -e robots=off -U Mozilla www.mozilla.com

    In fact, there is a more straightforward note, from pure Linux documentation, on downloading a website with wget.

    As a test case, I followed those instructions and downloaded Dr. Dickey's ST512 notes. It was very successful.

    wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains ncsu.edu \
    --no-parent http://www.stat.ncsu.edu/people/dickey/courses/st512/index.html
    

    Modify a file header

    I encountered a very annoying situation: I have 277 files, some with "DI" as the header and some with "DNA_Index". Now, how can I fix them under a Linux shell?

  • If it were a matter of adding a line at the end, it could easily be appended with the "echo" command.
  • But I wanted to add to the beginning!
  • This approach is much better
  • So, the solution is two steps:

  • Step 1, remove the header: tail -n +2 initial_file > temp
  • Step 2, add the new header: sed -i '1i\'"DNA_Index" temp
  • Step 3, move it back: mv temp initial_file
  • All together, over every csv: for k in *.csv; do tail -n +2 $k > temp; sed -i '1i\'"DNA_Index" temp; mv temp $k; done
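The same three-step loop, exercised on a couple of throwaway CSVs (a.csv/b.csv are stand-ins for my 277 files; the sed form is the GNU one-line variant of the command above):

```shell
# Two files whose first line is the old "DI" header.
printf 'DI\n1\n2\n' > a.csv
printf 'DI\n3\n'    > b.csv
for k in a.csv b.csv; do
    tail -n +2 "$k" > temp        # step 1: drop the old header
    sed -i '1i DNA_Index' temp    # step 2: insert the new header (GNU sed)
    mv temp "$k"                  # step 3: move it back
done
head -1 a.csv
```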
  • I wanted to kill running jobs

  • List the process ids: ps aux | grep li11 | grep -v grep | awk '{print $2}' (awk picks the PID field regardless of column spacing, which is more reliable than cut -d" " -f6)
  • Kill them all: for k in `ps aux | grep li11 | grep -v grep | awk '{print $2}'`; do kill -9 $k; done;
  • Just the R sessions: for k in `ps aux | grep li11 | grep "R" | grep -v grep | awk '{print $2}'`; do echo $k; kill -9 $k; done;