SeqWare LIMS query
This is the help page for accessing the SeqWare LIMS system.
Note that PostgreSQL's query language differs slightly from other SQL dialects.
To clean up a SeqWare LIMS event, follow these steps:
1. Go to SeqWare LIMS and find runs that are either still running or have errors. Currently, Brian needs to fix the bug in the deletion scripts; he will also take care of errors associated with the Illumina2srf workflow.
2. Leave Illumina2srf runs untouched (Brian will take care of them).
3. Start from the most recently finished run.
4. Drill down the hierarchy and find the accession number.
5. Log into swmaster and go into ~/svnroot/*/seqware-pipeline/
6. Issue the following command:
perl perl/bin/sw_util_delete_processing_events.pl --username seqware --password seqware --dbhost swmaster.bioinf.unc.edu --db seqware_meta_db --accession
7. This deletes the LIMS event but, as a result, leaves the output files on the file system as orphans.
8. Further action is needed; a new run may be executed, which will overwrite those output files.
9. In the meantime, if researchers want the data, we can still retrieve it through the file system.
Query for samples finished with the pipeline process (little report)
Brian wrote a script (sw_reports_find_files.pl), located at /home/jyli/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/site-specific/unc/sw_reports_find_files.pl
1. Log into swmaster.
2. Run the script: perl /home/jyli/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/site-specific/unc/sw_reports_find_files.pl --username seqware --password seqware --dbhost swmaster.bioinf.unc.edu --db seqware_meta_db --workflow-accession 23324 > temp.txt
3. Keep in mind that we need the "admin" login "seqware seqware".
4. To get the --workflow-accession number:
a. Log into seqware_meta_db as "seqware seqware".
b. Issue the query: select * from workflow where name = 'RNASeqAlignmentBWA';
seqware_meta_db=> select sw_accession, version, name, update_tstmp from workflow where name = 'RNASeqAlignmentBWA';
 sw_accession | version |        name        |        update_tstmp
--------------+---------+--------------------+----------------------------
         7535 | 0.7.2   | RNASeqAlignmentBWA | 2010-08-13 16:23:21.069316
         7603 | 0.7.0   | RNASeqAlignmentBWA | 2010-08-14 16:42:02.141734
         7604 | 0.7.1   | RNASeqAlignmentBWA | 2010-08-14 16:43:29.010826
        21544 | 0.7.3   | RNASeqAlignmentBWA | 2010-08-28 06:28:02.482627
        23324 | 0.7.4   | RNASeqAlignmentBWA | 2010-08-30 00:45:13.497882
(5 rows)
c. Look for the row with the latest update timestamp (here, sw_accession 23324, version 0.7.4).
To associate the processing event with the files stored on the system:
Email from Brian:
*******************************************************************************************
You're querying the file table using sw_accession from the processing table. This is wrong. Use "\d tablename" to see table keys and how they relate via foreign keys. Also look at the sample queries in the wiki. They will show you that you need to do a join on processing, processing_files, and files to get the file path you need.
*******************************************************************************************
First: understand the DB schema/layout (where? how?).
Second: join the tables processing, processing_files, and files.
Third: get the file path.
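Brian's hint above can be sketched as the following query. This is only a sketch: the key column names are assumed from the table names and should be confirmed with "\d processing", "\d processing_files", and "\d files" in psql.

```sql
-- Hypothetical three-table join; key column names are assumed, not confirmed.
SELECT p.sw_accession, f.file_path
FROM processing p
JOIN processing_files pf ON pf.processing_id = p.processing_id
JOIN files f ON f.file_id = pf.file_id
WHERE p.sw_accession = 12345;  -- replace with the accession of interest
```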
Linux notes
Basic commands
To remove write permission for the group: chmod -R g-w .
To compare two files: diff file1 file2
To count how many files are in a directory: ls | wc -l
To extract a few lines from a file: sed -n '1,10p' filename > newfile
To run a shell script: chmod u+rwx yourfile.sh; ./yourfile.sh
To download a file from an http portal without a password: wget -r -nd -np
To download a file from an http portal with a password: wget --http-user= --http-password= -r -nd -np
To get unique entries: uniq
To create a symbolic link: ln -s [TARGET DIRECTORY OR FILE] ./[SHORTCUT]
To sort a file: sort
To get to a person's home directory on topsail, which is inaccessible, we can try through: /ifs1/scr/someone/
To modify an environment variable: go to the home directory, open .bash_profile, then edit that file, e.g. PATH=$PATH:/ifs1/home/ferizs/lbgapps/bin
To copy a whole directory: cp -r /dir/ .
To kill a process: ps -ef | grep "jyli", then kill -9 <process id>
To check an installed package: rpm -q "package name"
To put a process into the background:
1. ctrl-z suspends it
2. bg puts it in the background
3. fg brings it back to the foreground
A use case: excluding redundancy in a file
To get SINGLE PLACEMENT reads:
1. Work off .mapView files with the unix command "uniq" (note: uniq only collapses adjacent duplicate lines, so the input must be sorted first).
2. Chop the mapview file down to three columns: cut -f 2,3,4 s_1_mapview.txt > s_1_mapview.txt.cut
3. Extract rows that occur only once: uniq -u s_1_mapview.txt.cut s_1_mapview.txt.cut.uniq
4. Extract a single copy of each duplicated row: uniq -d s_1_mapview.txt.cut s_1_mapview.txt.cut.dup
5. Concatenate the ".uniq" and ".dup" files and get a total row count: cat *.dup *.uniq > s_1_mapview.txt.cut.final and wc -l s_1_mapview.txt.cut.final
6. Should check out the "samtools" command.
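The recipe above can be demonstrated on a toy file (the file contents here are made up; a sort is inserted because "uniq" only collapses adjacent duplicates):

```shell
# Toy mapview-like file: read id, chrom, position, strand (tab-separated).
printf 'r1\tchr1\t100\t+\nr2\tchr1\t100\t+\nr3\tchr2\t200\t-\n' > s_1_mapview.txt
cut -f 2,3,4 s_1_mapview.txt | sort > s_1_mapview.txt.cut   # placement columns, sorted
uniq -u s_1_mapview.txt.cut s_1_mapview.txt.cut.uniq        # rows occurring exactly once
uniq -d s_1_mapview.txt.cut s_1_mapview.txt.cut.dup         # one copy of each duplicated row
cat s_1_mapview.txt.cut.dup s_1_mapview.txt.cut.uniq > s_1_mapview.txt.cut.final
wc -l s_1_mapview.txt.cut.final                             # 2 distinct placements
```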
To exclude the first line of a file:
tail -n +2 file > newfile
To exclude/remove blank lines from a text file, use one of the following commands:
$ sed '/^$/d' input.txt > output.txt
$ grep -v '^$' input.txt > output.txt
$ strings input.txt > output.txt (careful: by default, strings also drops lines with fewer than 4 printable characters)
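A quick check of both tips on a throwaway file (the file contents are made up for illustration):

```shell
printf 'header\n\nrow1\n\nrow2\n' > input.txt
tail -n +2 input.txt > noheader.txt      # drop the first line
sed '/^$/d' noheader.txt > output.txt    # then drop blank lines
cat output.txt                           # row1, then row2
```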
Credit goes to the "I love Linux" post.
To ssh between two Linux systems without a password:
On system one:
1. Go to the home directory ("cd" will do it, and "pwd" confirms).
2. cd .ssh
3. ssh-keygen -t dsa
4. Press "enter" 3 times.
5. Copy the public key to the other system and append it to the authorized keys there: scp ~/.ssh/id_dsa.pub user@system2: and then, on system two, cat id_dsa.pub >> ~/.ssh/authorized_keys
On the other system (two), do exactly the same process in the other direction. This way, you have achieved your goal.
Difference between "scp" and "rsync":
"scp" copies and overwrites the files at the destination. Also, if the network is interrupted, it stops and loses track of its progress.
"rsync" checks the timestamp at the destination; if the file there has the same timestamp, it won't copy it again, otherwise it copies and/or updates the destination.
Use "rsync -av source destination".
Download files in Linux
Although it seems simple, sometimes a small roadblock can throw people off. Let's take a look at the options for downloading:
"wget" seems to be the first choice. Sometimes I have to use "--no-check-certificate" to download from an "unauthorized", questionable site.
We also have "rsync", "aspera", and who knows what else is available out there.
"aria" is another good option, in the sense that it can break a download into small chunks. About aria:
"# yum install aria2" gets me the tool; then run aria2c -x 4 "url". Check out the aria wiki for details.
A weird situation: since we don't have the "sge" environment, we have to rely on Linux's detached way of running long jobs.
Scenario 1: using "nohup"
Example: a script (custom-script.sh) printing lines to both standard output and standard error
while true
do
echo "standard output"
echo "standard error" 1>&2
sleep 1
done
Execute the script without redirection
$ nohup sh custom-script.sh &
[1] 12034
$ nohup: ignoring input and appending output to `nohup.out'
$ tail -f nohup.out
standard output
standard error
standard output
standard error
Execute the script with redirection
$ nohup sh custom-script.sh > custom-out.log &
[1] 11069
$ nohup: ignoring input and redirecting stderr to stdout
$ tail -f custom-out.log
standard output
standard error
standard output
standard error
..
Check the process status:
$ ps aux | grep li11
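A finite variant of the scenario above, for quick verification: the infinite loop is replaced by two echoes, and "2>&1" is added explicitly, since nohup only merges stderr into stdout automatically when stderr is a terminal.

```shell
# Write a finite version of custom-script.sh.
cat > custom-script.sh <<'EOF'
echo "standard output"
echo "standard error" 1>&2
EOF
# Detach it with nohup, capturing both streams in one log.
nohup sh custom-script.sh > custom-out.log 2>&1 &
wait $!                 # wait for the background job to finish
cat custom-out.log      # both lines end up in the log
```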
Send an email in Linux
To send an email with the file "Email_content" as the body and a subject line:
mail -s "Email title" li11@niehs.nih.gov < Email_content
To email a file as an attachment:
echo "Email body" | mutt -a command_02132012.txt li11@niehs.nih.gov
Echo with an exclamation mark
echo "something!" will generate an error in an interactive bash session, because "!" triggers history expansion.
It is because only single quotes protect this special character. So,
echo 'something!' will do the trick. Thanks to the post here.
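For instance (in a non-interactive script both quoting styles print literally; the "event not found" error only bites at an interactive bash prompt):

```shell
echo 'something!'    # single quotes: '!' is literal, no history expansion
```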
Resize /tmp on seqbig
It turns out that I had to resize /tmp on seqbig, which was set to the default size. I did plenty of research and found a link. Here is what I did.
You can raise the size limit in /etc/fstab:
tmpfs /dev/shm tmpfs size=20G,nr_inodes=10k 0 0
Then remount it:
# mount -o remount /dev/shm
Be careful with the size, though. Since it exists in RAM, you don't want a tmpfs partition to be bigger than your RAM, otherwise the big bad OOM killer will come along and start assassinating your processes.
Download website recursively
I found a very good web page for teaching R, from Syracuse University bio793. I really like it and wanted to download all the pages.
I found a video by an Indian guy; it is very helpful.
The command to use is: wget --random-wait -r -p -e robots=off -U Mozilla www.mozilla.com
In fact, there is a more straightforward note in the pure Linux documentation on downloading a web site with wget.
As a test case, I followed those instructions and downloaded Dr. Dickey's ST512 notes. It was very successful.
wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains ncsu.edu --no-parent http://www.stat.ncsu.edu/people/dickey/courses/st512/index.html
Modify file header
I encountered a very annoying situation. I have 277 files; some files have "DI" as the header but some have "DNA_Index". Now, how can I modify them under the Linux shell?
If it were a matter of adding a line at the end, that could easily be appended with the "echo" command. But I wanted to change the beginning! sed is much better for this.
So, the solution takes two steps.
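The note does not spell the two steps out, but a likely reconstruction is: (1) detect whether a file's first line is the old "DI" header, and (2) rewrite only that line with sed. A sketch on a toy file (the file name "sample1.txt" and its contents are hypothetical):

```shell
printf 'DI\n1.02\n0.98\n' > sample1.txt
# Step 1: does this file use the old header?
if head -1 sample1.txt | grep -qx 'DI'; then
    # Step 2: rewrite line 1 only, writing to a new file (portable across sed versions).
    sed '1s/^DI$/DNA_Index/' sample1.txt > sample1.fixed
fi
head -1 sample1.fixed    # DNA_Index
```

To fix all 277 files, the same two steps could be wrapped in a for loop over the file names.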
I wanted to kill running jobs.