Considerations for updating the pipeline

Here are some suggestions for updating our pipeline:

1. Pass in the parameters for the major programs (e.g. BWA, Samtools) together with their settings. Parameters that can be null should be reported explicitly as “null”.

2. For the BWA samse/sampe step, add the “-r” option to attach read group information.

3. Consolidate the code (merge the bash and tcsh scripts) and put it under version control with Subversion.

4. Validate all input, e.g. the run_summary information from the core facility. Our existing pipeline had a versioning problem: pipeline versions and sample_info files were mixed up.

5. Document our “pipeline”, including its version.

6. The code is a mixture of languages (Perl, bash, tcsh, etc.), and this needs to be documented.

7. There are too many sub-directories, which are hard to manage and keep track of.

8. There are too many copy-and-paste steps, which can introduce errors. These steps should be eliminated, and a log should be created for each step.

9. A database/LIMS would be helpful for storing all of this information.
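Items 1 and 2 can be handled together, as in the minimal Python sketch below. It keeps each program's parameter settings in one place and writes unset values out as “null”. It also builds the argument list for bwa samse/sampe with a read group passed through “-r”.

The dictionary layout, the function names and the sample/read-group IDs are hypothetical. Only the samse/sampe subcommands and the “-r '@RG\tID:...\tSM:...'” form come from BWA itself.

```python
from typing import Optional

# Hypothetical per-program settings; None marks a parameter that is
# deliberately left unset and should be reported as "null".
Params = dict[str, dict[str, Optional[str]]]


def report_params(params: Params) -> list[str]:
    """One 'program<TAB>option<TAB>value' line per parameter, with None written as 'null'."""
    return [
        f"{program}\t{option}\t{'null' if value is None else value}"
        for program, options in params.items()
        for option, value in options.items()
    ]


def read_group(sample: str, rg_id: str, platform: str = "ILLUMINA") -> str:
    """Build an @RG line with literal backslash-t separators, the form bwa's -r takes."""
    return rf"@RG\tID:{rg_id}\tSM:{sample}\tPL:{platform}"


def bwa_sam_cmd(index_prefix: str, sai: list[str], fastq: list[str], rg: str) -> list[str]:
    """Argument list for 'bwa samse' (single-end) or 'bwa sampe' (paired-end) with a read group.

    bwa writes SAM to stdout, so the caller redirects it to a file.
    """
    if len(sai) != len(fastq) or len(sai) not in (1, 2):
        raise ValueError("expected one or two matching .sai/.fastq files")
    mode = "samse" if len(sai) == 1 else "sampe"
    return ["bwa", mode, "-r", rg, index_prefix, *sai, *fastq]
```

The command is returned as a list rather than run directly. That way it can be written to the step log first and then handed to subprocess.run, with stdout pointed at the output SAM file. Passing a list also avoids shell quoting of the @RG string.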
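Items 4 and 8 could also be approached together. The idea is to check sample_info against the running pipeline version before anything starts, and to wrap each step so that it always leaves a log entry.

This sketch assumes a tab-separated key/value sample_info file with a “pipeline_version” field. That format, the field names and the version string are assumptions, not our actual file layout.

```python
import logging
from pathlib import Path
from typing import Any, Callable

PIPELINE_VERSION = "1.2"  # hypothetical version string for the running pipeline

log = logging.getLogger("pipeline")


def read_sample_info(path: Path) -> dict[str, str]:
    """Parse an assumed 'key<TAB>value' sample_info file, skipping blanks and '#' comments."""
    info: dict[str, str] = {}
    for line in path.read_text().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        key, _, value = line.partition("\t")
        info[key.strip()] = value.strip()
    return info


def validate_sample_info(
    info: dict[str, str],
    required: tuple[str, ...] = ("sample_id", "run_id", "pipeline_version"),
) -> list[str]:
    """Return a list of problems; an empty list means the input passed."""
    problems = [f"missing field: {key}" for key in required if not info.get(key)]
    version = info.get("pipeline_version")
    if version and version != PIPELINE_VERSION:
        problems.append(
            f"sample_info is for pipeline {version}, but {PIPELINE_VERSION} is running"
        )
    return problems


def run_step(name: str, func: Callable[..., Any], *args: Any) -> Any:
    """Run one pipeline step, logging its start and its success or failure."""
    log.info("start %s", name)
    try:
        result = func(*args)
    except Exception:
        log.exception("failed %s", name)  # records the traceback, then re-raises
        raise
    log.info("done %s", name)
    return result
```

Calling logging.basicConfig with a filename and level=logging.INFO once at start-up would collect every step's entries in one log file. A run should stop if validate_sample_info returns any problems.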

How we may do better

Part I, Version control for all third-party software (BWA, Samtools, Picard, GATK, etc.) and for the “pipeline” itself. Goal: make it easy to manage, migrate and update, with a modular design that makes it easy to plug in new components.

Part II, Subversion for all code, code cleanup, and consolidation of the mixed-language code.

Part III, Implement ASPERA (if not already in place) for all file transfers in data management.

Part IV, Error checking/QC system

Part V, An in-house database and LIMS to replace the Excel spreadsheet for sample tracking, and an in-house database to keep a log of every analysis. Help from a web programmer will be needed.
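One way to approach Part I is a version manifest. It records the pipeline version and the pinned version of each third-party tool, is stored as JSON next to the results, and is compared against what is installed before a run.

How installed versions are detected is left out here, since BWA, Samtools, Picard and GATK each report their version differently. The Manifest class and its fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Manifest:
    """Hypothetical record of the pipeline version and the pinned tool versions."""

    pipeline_version: str
    tools: dict[str, str] = field(default_factory=dict)  # tool name -> pinned version

    def check(self, installed: dict[str, str]) -> list[str]:
        """Compare installed tool versions with the pinned ones; an empty list means all match."""
        issues = []
        for tool, pinned in sorted(self.tools.items()):
            found = installed.get(tool)
            if found is None:
                issues.append(f"{tool}: not installed (pinned {pinned})")
            elif found != pinned:
                issues.append(f"{tool}: installed {found}, pinned {pinned}")
        return issues

    def to_json(self) -> str:
        """Serialize the manifest for storage next to a run's output."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Manifest":
        """Rebuild a manifest from its JSON form."""
        return cls(**json.loads(text))
```

Keeping the manifest file itself under Subversion would tie each pipeline version to the exact tool versions it was tested with.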
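For Part V, a possible first step before a full LIMS is a small SQLite database with a sample table and an analysis log. This would already replace the spreadsheet for sample tracking and record which pipeline version processed each sample.

The schema and the table and column names below are a hypothetical starting point, built on Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical schema: one row per sample, one row per logged analysis step.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sample (
    sample_id TEXT PRIMARY KEY,
    run_id    TEXT NOT NULL,
    received  TEXT NOT NULL          -- ISO date the sample arrived
);
CREATE TABLE IF NOT EXISTS analysis_log (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id        TEXT NOT NULL REFERENCES sample(sample_id),
    step             TEXT NOT NULL,
    pipeline_version TEXT NOT NULL,
    status           TEXT NOT NULL,
    logged_at        TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database with foreign-key checks enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_sample(conn: sqlite3.Connection, sample_id: str, run_id: str, received: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sample VALUES (?, ?, ?)", (sample_id, run_id, received))


def log_analysis(conn: sqlite3.Connection, sample_id: str, step: str,
                 pipeline_version: str, status: str) -> None:
    with conn:
        conn.execute(
            "INSERT INTO analysis_log (sample_id, step, pipeline_version, status) "
            "VALUES (?, ?, ?, ?)",
            (sample_id, step, pipeline_version, status),
        )


def history(conn: sqlite3.Connection, sample_id: str) -> list[tuple[str, str, str]]:
    """All logged steps for one sample, oldest first."""
    return conn.execute(
        "SELECT step, pipeline_version, status FROM analysis_log "
        "WHERE sample_id = ? ORDER BY id",
        (sample_id,),
    ).fetchall()
```

Because of the foreign key, a log entry for an unregistered sample is rejected. A web front end could later sit on top of the same tables.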
