SeqWare LIMS query
This is the help page for accessing the SeqWare LIMS system.
Note that PostgreSQL's query language differs slightly from other SQL dialects.
To clean up a SeqWare LIMS event, follow these steps:
1. Go to SeqWare LIMS and find runs that are either still running or have errors. Currently, Brian needs to fix the bug in the deletion scripts; he will also take care of errors associated with the Illumina2srf workflow.
2. Leave Illumina2srf runs untouched (Brian will take care of them).
3. Start from the most recently finished run.
4. Drill down the hierarchy and find the accession number.
5. Log into swmaster and go into ~/svnroot/*/seqware-pipeline/
6. Issue the following command:
perl perl/bin/sw_util_delete_processing_events.pl --username seqware --password seqware --dbhost swmaster.bioinf.unc.edu --db seqware_meta_db --accession
7. This deletes the LIMS event but, as a result, leaves the output files on the file system as orphans.
8. Further action is needed; a new run may be executed, which will overwrite those output files.
9. In the meantime, if researchers want the data, we can still retrieve it through the file system.
Query for samples finished with the pipeline process (little report)
Brian wrote a script (sw_reports_find_files.pl), located at /home/jyli/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/site-specific/unc/sw_reports_find_files.pl
1. Log into swmaster.
2. Run the script: perl /home/jyli/svnroot/seqware-complete/trunk/seqware-pipeline/perl/bin/site-specific/unc/sw_reports_find_files.pl --username seqware --password seqware --dbhost swmaster.bioinf.unc.edu --db seqware_meta_db --workflow-accession 23324 > temp.txt
3. Keep in mind that we need the "admin" login "seqware seqware".
4. To get the --workflow-accession number:
a. Log into seqware_meta_db as "seqware seqware".
b. Issue the query: select * from workflow where name = 'RNASeqAlignmentBWA';
seqware_meta_db=> select sw_accession, version, name, update_tstmp from workflow where name = 'RNASeqAlignmentBWA';
 sw_accession | version |        name        |        update_tstmp
--------------+---------+--------------------+----------------------------
         7535 | 0.7.2   | RNASeqAlignmentBWA | 2010-08-13 16:23:21.069316
         7603 | 0.7.0   | RNASeqAlignmentBWA | 2010-08-14 16:42:02.141734
         7604 | 0.7.1   | RNASeqAlignmentBWA | 2010-08-14 16:43:29.010826
        21544 | 0.7.3   | RNASeqAlignmentBWA | 2010-08-28 06:28:02.482627
        23324 | 0.7.4   | RNASeqAlignmentBWA | 2010-08-30 00:45:13.497882
(5 rows)
c. Look for the row with the latest update timestamp (here, sw_accession 23324, version 0.7.4).
To associate the processing event with the files stored on the system:
Email from Brian:
*******************************************************************************************
You're querying the file table using sw_accession from the processing table. This is wrong. Use "\d tablename" to see table keys and how they relate via foreign keys. Also look at the sample queries in the wiki. They will show you that you need to do a join on processing, processing_files, and files to get the file path you need.
*******************************************************************************************
First: understand the DB schema/layout (where? how?).
Second: join the tables processing, processing_files, and files.
Third: get the file path.
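Brian's hint above can be sketched as the following query. This is only a sketch: the key column names are assumed from the table names and should be confirmed with "\d processing", "\d processing_files", and "\d files" in psql.

```sql
-- Hypothetical three-table join; key column names are assumed, not confirmed.
SELECT p.sw_accession, f.file_path
FROM processing p
JOIN processing_files pf ON pf.processing_id = p.processing_id
JOIN files f ON f.file_id = pf.file_id
WHERE p.sw_accession = 12345;  -- replace with the accession of interest
```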
Linux notes
Basic commands
To remove write permission for the group: chmod -R g-w .
To compare two files: diff file1 file2
To count how many files are in a directory: ls | wc -l
To extract a few lines from a file: sed -n '1,10p' filename > newfile
To run a shell script: chmod u+rwx yourfile.sh; ./yourfile.sh
To download a file from an http portal without a password: wget -r -nd -np
To download a file from an http portal with a password: wget --http-user= --http-password= -r -nd -np
To get unique entries: uniq
To create a symbolic link: ln -s [TARGET DIRECTORY OR FILE] ./[SHORTCUT]
To sort a file: sort
To get to a person's home directory on topsail, which is inaccessible, we can try through: /ifs1/scr/someone/
To modify an environment variable: go to the home directory, open .bash_profile, then edit that file, e.g. PATH=$PATH:/ifs1/home/ferizs/lbgapps/bin
To copy a whole directory: cp -r /dir/ .
To kill a process: ps -ef | grep "jyli", then kill -9 <process id>
To check an installed package: rpm -q "package name"
To put a process into the background:
1. ctrl-z suspends it
2. bg puts it in the background
3. fg brings it back to the foreground
A use case: excluding redundancy in a file
To get SINGLE PLACEMENT reads:
1. Work off .mapView files with the unix command "uniq" (note: uniq only collapses adjacent duplicate lines, so the input must be sorted first).
2. Chop the mapview file down to three columns: cut -f 2,3,4 s_1_mapview.txt > s_1_mapview.txt.cut
3. Extract rows that occur only once: uniq -u s_1_mapview.txt.cut s_1_mapview.txt.cut.uniq
4. Extract a single copy of each duplicated row: uniq -d s_1_mapview.txt.cut s_1_mapview.txt.cut.dup
5. Concatenate the ".uniq" and ".dup" files and get a total row count: cat *.dup *.uniq > s_1_mapview.txt.cut.final and wc -l s_1_mapview.txt.cut.final
6. Should check out the "samtools" command.
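The recipe above can be demonstrated on a toy file (the file contents here are made up; a sort is inserted because "uniq" only collapses adjacent duplicates):

```shell
# Toy mapview-like file: read id, chrom, position, strand (tab-separated).
printf 'r1\tchr1\t100\t+\nr2\tchr1\t100\t+\nr3\tchr2\t200\t-\n' > s_1_mapview.txt
cut -f 2,3,4 s_1_mapview.txt | sort > s_1_mapview.txt.cut   # placement columns, sorted
uniq -u s_1_mapview.txt.cut s_1_mapview.txt.cut.uniq        # rows occurring exactly once
uniq -d s_1_mapview.txt.cut s_1_mapview.txt.cut.dup         # one copy of each duplicated row
cat s_1_mapview.txt.cut.dup s_1_mapview.txt.cut.uniq > s_1_mapview.txt.cut.final
wc -l s_1_mapview.txt.cut.final                             # 2 distinct placements
```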
To exclude the first line of a file:
tail -n +2 file > newfile
To exclude/remove blank lines from a text file, use one of the following commands:
$ sed '/^$/d' input.txt > output.txt
$ grep -v '^$' input.txt > output.txt
$ strings input.txt > output.txt (careful: by default, strings also drops lines with fewer than 4 printable characters)
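A quick check of both tips on a throwaway file (the file contents are made up for illustration):

```shell
printf 'header\n\nrow1\n\nrow2\n' > input.txt
tail -n +2 input.txt > noheader.txt      # drop the first line
sed '/^$/d' noheader.txt > output.txt    # then drop blank lines
cat output.txt                           # row1, then row2
```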
Credit goes to the "I love Linux" post.
To ssh between two Linux systems without a password:
On system one:
1. Go to the home directory ("cd" will do it, and "pwd" confirms).
2. cd .ssh
3. ssh-keygen -t dsa
4. Press "enter" 3 times.
5. Copy the public key to the other system and append it to the authorized keys there: scp ~/.ssh/id_dsa.pub user@system2: and then, on system two, cat id_dsa.pub >> ~/.ssh/authorized_keys
On the other system (two), do exactly the same process in the other direction. This way, you have achieved your goal.
Difference between "scp" and "rsync":
"scp" copies and overwrites the files at the destination. Also, if the network is interrupted, it stops and loses track of its progress.
"rsync" checks the timestamp at the destination; if the file there has the same timestamp, it won't copy it again, otherwise it copies and/or updates the destination.
Use "rsync -av source destination".
Download files in Linux
Although it seems simple, sometimes a small roadblock can throw people off. Let's take a look at the options for downloading:
"wget" seems to be the first choice. Sometimes I have to use "--no-check-certificate" to download from an "unauthorized", questionable site.
We also have "rsync", "aspera", and who knows what else is available out there.
"aria" is another good option, in the sense that it can break a download into small chunks. About aria:
"# yum install aria2" gets me the tool; then run aria2c -x 4 "url". Check out the aria wiki for details.
A weird situation: since we don't have the "sge" environment, we have to rely on Linux's detached way of running long jobs.
Scenario 1: using "nohup"
Example: a script (custom-script.sh) printing lines to both standard output and standard error
while true
do
echo "standard output"
echo "standard error" 1>&2
sleep 1
done
Execute the script without redirection
$ nohup sh custom-script.sh &
[1] 12034
$ nohup: ignoring input and appending output to `nohup.out'
$ tail -f nohup.out
standard output
standard error
standard output
standard error
Execute the script with redirection
$ nohup sh custom-script.sh > custom-out.log &
[1] 11069
$ nohup: ignoring input and redirecting stderr to stdout
$ tail -f custom-out.log
standard output
standard error
standard output
standard error
..
Check the process status:
$ ps aux | grep li11
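A finite variant of the scenario above, for quick verification: the infinite loop is replaced by two echoes, and "2>&1" is added explicitly, since nohup only merges stderr into stdout automatically when stderr is a terminal.

```shell
# Write a finite version of custom-script.sh.
cat > custom-script.sh <<'EOF'
echo "standard output"
echo "standard error" 1>&2
EOF
# Detach it with nohup, capturing both streams in one log.
nohup sh custom-script.sh > custom-out.log 2>&1 &
wait $!                 # wait for the background job to finish
cat custom-out.log      # both lines end up in the log
```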
Send an email in Linux
To send an email with the file "Email_content" as the body and a subject line:
mail -s "Email title" li11@niehs.nih.gov < Email_content
To email a file as an attachment:
echo "Email body" | mutt -a command_02132012.txt li11@niehs.nih.gov
Echo with an exclamation mark
echo "something!" will generate an error in an interactive bash session, because "!" triggers history expansion.
It is because only single quotes protect this special character. So,
echo 'something!' will do the trick. Thanks to the post here.
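For instance (in a non-interactive script both quoting styles print literally; the "event not found" error only bites at an interactive bash prompt):

```shell
echo 'something!'    # single quotes: '!' is literal, no history expansion
```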
Resize /tmp on seqbig
It turns out that I had to resize /tmp on seqbig, which was set to the default size. I did plenty of research and found a link. Here is what I did.
You can raise the size limit in /etc/fstab:
tmpfs /dev/shm tmpfs size=20G,nr_inodes=10k 0 0
Then remount it:
# mount -o remount /dev/shm
Be careful with the size, though. Since it exists in RAM, you don't want a tmpfs partition to be bigger than your RAM, otherwise the big bad OOM killer will come along and start assassinating your processes.
Download website recursively
I found a very good web page for teaching R, from Syracuse University bio793. I really like it and wanted to download all the pages.
I found a video by an Indian guy; it is very helpful.
The command to use is: wget --random-wait -r -p -e robots=off -U Mozilla www.mozilla.com
In fact, there is a more straightforward note in the pure Linux documentation on downloading a web site with wget.
As a test case, I followed those instructions and downloaded Dr. Dickey's ST512 notes. It was very successful.
wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains ncsu.edu --no-parent http://www.stat.ncsu.edu/people/dickey/courses/st512/index.html
Modify file header
I encountered a very annoying situation. I have 277 files; some files have "DI" as the header but some have "DNA_Index". Now, how can I modify them under the Linux shell?
If it were a matter of adding a line at the end, that could easily be appended with the "echo" command. But I wanted to change the beginning! sed is much better for this.
So, the solution takes two steps.
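The note does not spell the two steps out, but a likely reconstruction is: (1) detect whether a file's first line is the old "DI" header, and (2) rewrite only that line with sed. A sketch on a toy file (the file name "sample1.txt" and its contents are hypothetical):

```shell
printf 'DI\n1.02\n0.98\n' > sample1.txt
# Step 1: does this file use the old header?
if head -1 sample1.txt | grep -qx 'DI'; then
    # Step 2: rewrite line 1 only, writing to a new file (portable across sed versions).
    sed '1s/^DI$/DNA_Index/' sample1.txt > sample1.fixed
fi
head -1 sample1.fixed    # DNA_Index
```

To fix all 277 files, the same two steps could be wrapped in a for loop over the file names.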
I wanted to kill running jobs.