R markdown and MikTex trouble shoot

Posted on March 18, 2025 by Jin Tong

I have encountered difficulty using r kniter to produce a pdf file from R markdown, it has been a while and I really want to fix it. It turns out much complicated than I have expected.

It is related with MikTex installed on a Windows 10 machine

I have issue related to Qt platform plugin, similar to this: “[…] no Qt platform plugin could be initialized”. Not only am I not able to use MikTex properly, it does not even allow me to unstall MikTex!!

I found this link, I followed the instruction and was able to solve this problem.

Following “miktex-qt5-bin-x64” should be repaired made MiKTeX Console to stop #1486

Then, it comes with R markdown related issue

Even Yuhui’s own link does not work

this post offers correct solution by doing

"tinytex:::install_prebuilt()"

then 

"tinytex::tlmgr_update()"

It works!
then I had issues with loading Seurat because it requires SeuratObject with specific version to be loaded also, anyway, it all work out eventually, and I am so happy!!

wanted to give credits to this helping post

Making a website with Dreamhost

Posted on February 20, 2025 by Jin Tong

Of course the hosting company we use is: dreamhost
A doomain name can be purchase from namecheap

I learned quite a lot from the following three youtube videos

David give a step-by-step tutorial on building a website.

Running jupyter notebook via tunneling

Posted on December 11, 2024 by Jin Tong

I am trying to run jupyter notebook from our network server and I am connecting it from my windows machine with PuTTY. It works and I want to document this for future reference.

This link provides a very good help!

Basic set up via PuTTY connection from windows

Tunneling set up

Cn040701 is the proper name for ib01, which is the one I usually log on to.

The to run jupyter:
jupyter notebook –no-browser –port=8889 –ip=0.0.0.0

And finally I went to http://localhost:8889 to open it in a browser.

R apply usage

Posted on August 29, 2022 by Jin Tong

Scenario 1

I have a datamatrix of RNAseq data, the ID column looks something like the following

-----------------
head(geneIDs)
-----------------
"Xkr4|497097|locus1of1|1"  
"Sox17|20671|locus1of1|5"  
"Mrpl15|27395|locus1of1|3"
"Lypla1|18777|locus1of1|1" 
"Tcea1|21399|locus1of1|3"  
"Rgs20|58175|locus1of1|3"

I want to parse them out and retain only the gene symbols, “sapply” here is very handy

as.vector(sapply (geneIDs, FUN = function (x) {strsplit(x, "\\|")[[1]][1]}))

Running the above line get me the following:

"Xkr4"  
"Sox17"  
"Mrpl15"
"Lypla1" 
"Tcea1"  
"Rgs20"

WGS and WES variant calling — alignment score interpretation

Posted on May 19, 2022 by Jin Tong

Our collaborator, Dr. Laura Riva, indicates a criteria to filter out variant in her paper.

“Second, we removed base substitutions with a median alignment score of mutation-reporting reads <140 and complex indels."

This leads to my research of what it is.

Firstly, there is an explanation of alignment score by the Broad Institute

grep -v "@" seq_aln.sam | awk -F"\t" 'BEGIN{print "flag\toccurrences"} {a[$2]++} END{for(i in a)print i"\t"a[i]}'

Single Cell Analysis

Posted on November 17, 2021 by Jin Tong

Main steps:

1. Cellranger — get read
2. Seurat — analysis
3. Deconvolute — analysis

CIBERSORTx:  https://cibersortx.stanford.edu/
 
A benchmark for RNA-seq deconvolution analysis under dynamic testing environments
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02290-6
 
Benchmarking of cell type deconvolution pipelines for transcriptomics data
https://www.nature.com/articles/s41467-020-19015-1

STAR – RNAseq – GATK – Mutect2

Posted on August 18, 2021 by Jin Tong

Scenario:

For RNAseq data, if one uses STAR for the alignment, it could present difficulty for GATK toolkit for the following reasons (https://sequencing.qcfail.com/articles/mapq-values-are-really-useful-but-their-implementation-is-a-mess/)

STAR uses a similar scoring scheme to Tophat except that the value for uniquely mapped reads is 255 instead of 50.
The mapping quality MAPQ (column 5) is 255 for uniquely mapping reads, and int(-10*log10(1-1/[number of loci the read maps to])) for multi-mapping reads. This scheme is same as the one used by Tophat…

In the BQSR steps, we will have this:

64162342 reads (100.00% of total) failing MappingQualityUnavailableFilter

Others (http://seqanswers.com/forums/showthread.php?t=21593) reported GTAK related challenge:

“255 actually in GATK’s CountCovariatesWalker is filtered out by MappingQualityUnavailableFilter in the top of walker:”

It will fail at the Mutect2 variant calling step.

From the SAM specification (http://samtools.github.io/hts-specs/SAMv1.pdf):

MAPQ: MAPping Quality. It equals −10 log10 Pr{mapping position is wrong}, rounded to the nearest integer. A value 255 indicates that the mapping quality is not available.

In the GATK legacy post (https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2013-06-06-2013-02-12/2187-Can-I-bypass-the-MappingQualityUnavailableFilter-in-Unified-Genotyper)

Bismark sets MAPQ to 255 for all alignments.

“ -rf ReassignMappingQuality -DMQ 60 ”

People have posted the same errors here at (https://github.com/bcbio/bcbio-nextgen/issues/2163) and propose the following work around

java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

Solution:

Work around with STAR

“So what is needed I think is just adding –outSAMmapqUnique 60 in STAR CLI.”

Work around with GTAK

java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

Alternatively, one can add the option “-T PrintReads -U ALLOW_N_CIGAR_READS” in the BQSR step to allow CIGAR string with “N”.

And, can turn off the filter with the option “–disable-read-filter MappingQualityAvailableReadFilter” in the mutect2 calling method

Top 75 bioinformatics blogs

Posted on November 12, 2020 by Jin Tong

Haha, there are ONLY up to 69 of them and it is not hard to make top 75, lol. I would like to see them all.

Over the time, I would like to collect them and rank them (by personal view only now) hopefully according to some “objective metric”.

But, I like to rank Titus Brown’s site above Dave Tang’s. No offense to Dave, it it my personal opinion that I find Titus’ contains higher quality of work. Sorry Dave!. Titus’ home page has an interesting name
Here is one from Dave Tang

Machine learning note

Posted on November 4, 2020 by Jin Tong

Here, an old topic but keeps coming to me — machine learning. Java? Python? or R?

Let’s focus on some readings.

A guy posts decision making algorithm with implementation in python, 
of course!

He does not reply why data link is broken, which can be easily found 
here at kaggle competition site.

Here is a very simple tutorial about decision tree algorithm.

Another example here should be enough

Found another quite basic explanation

More on decision tree — iterative preorder traversal

Here is the first example

Decision Tree implementation

A simple python example.

Another guy provides smile  (Statistical Machine Intelligence and 
Learning Engine) package implemented in Java

A guy posts decision making algorithm with implementation in python, 
of course!

Hierarchical Dirichlet Process in Cancer Study

Posted on October 20, 2020 by Jin Tong

From a recent paper on Nature Genetics by our collaborator Laura, from the supplemental data I learned that she used Hierarchical Dirichlet Process to extract the signature.

In fact, it is her colleague Nicola Roberts, who first posted the HDP R package in 2015. Nicola also has a vignette on his package using the TCGA data to publish his HDP package.

It is quite interesting but the process is a computational intensive process.

onlinenote

Category Archives: Uncategorized