While working on the miRNA project, I ran into weird errors with Picard, with SAM-to-BAM conversion, and even with duplicate removal. I thought they were caused by a Java security problem. In the end, it turned out that they were caused by “duplicate miRBase IDs”:
grep hsa-let-7c /ddn/gs1/home/li11/refDB/miRBase/hsa_combined.fa
>hsa-let-7c MIMAT0000064 Homo sapiens let-7c
>hsa-let-7c MI0000064 Homo sapiens let-7c stem-loop
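Both headers begin with the same name, hsa-let-7c, before the first space, so I had to replace all the “spaces” in the FASTA headers.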
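The check and the fix can be sketched in a few lines of Python. This is only a minimal illustration, not the exact commands I ran: it assumes that downstream tools take the sequence ID to be the header text up to the first whitespace, and the underscore replacement, the function names, and the placeholder sequences are all hypothetical choices made for the example.

```python
from collections import Counter


def first_token_ids(fasta_lines):
    """Return the ID of each header: the text after '>' up to the first whitespace."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def duplicated_ids(fasta_lines):
    """Return the sorted IDs that appear in more than one header."""
    counts = Counter(first_token_ids(fasta_lines))
    return sorted(name for name, n in counts.items() if n > 1)


def replace_header_spaces(fasta_lines, replacement="_"):
    """Replace spaces in header lines only; sequence lines are left untouched."""
    fixed = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            line = line.replace(" ", replacement)
        fixed.append(line)
    return fixed


if __name__ == "__main__":
    # Headers copied from the grep output above; "NNNN" is a placeholder,
    # not the real miRBase sequence.
    records = [
        ">hsa-let-7c MIMAT0000064 Homo sapiens let-7c",
        "NNNN",
        ">hsa-let-7c MI0000064 Homo sapiens let-7c stem-loop",
        "NNNN",
    ]
    print(duplicated_ids(records))
    for line in replace_header_spaces(records):
        print(line)
```

After the replacement, each header is a single whitespace-free token, so the mature and the stem-loop entries no longer collapse to the same ID.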
Now, with the WES project, I am using Picard to remove duplicates. Here are some good posts:
1. People are getting high duplication in WES
2. How to determine removed reads for PE sample
3. A thorough Picard documentation
4. Remove or not remove??
5. Duplication metric in details
Why Picard MarkDuplicates does not produce a histogram
1. The first post concludes that it is caused by multiple groups, but I do NOT think so.