Tissue specific expression

A request from David Fargo

This is for Jack.  They have population (~1,000 samples) 27K methyl chip data and want to relate changes in those data to changes in expression using public data. They are interested in different tissues as well as developmental programs (e.g. stem cells).  They’ve classified the methyl data as hypo- and hyper-methylation changes as well as CpG island associated and not associated.

This was his note:

Here's the gene list and their IDs that are covered on the 27K array.  I guess the two things I can think of would be tissue-specific expression levels (perhaps tricky, because one would want good comparability between the methods used for the different tissues), and also some developmental program expression data (since some of these may only come on at specific times during early development, and then only briefly).  The latter may be the toughest to find.  If you can get us that data, we could then look at the different subsets of genes that we're interested in.

David found out the following:

Tissue-specific Gene Expression and Regulation (TiGER) at Wilmer Institute

 a paper 


or can we get these

 a paper 

or

 Human Body map might be good?? 





Perhaps the ideal would be the data in GeneCards (BioGPS) for all genes.


My response was posted within a day of the request:

Hi, David & all,

First of all, I have to admit that I have very limited knowledge of tissue-specific expression in general. I just followed the links David provided, and here is what I have understood so far:

The TiGER database was built on two of their previous papers. But what they call “tissue-specific expression” is really based on published EST sequences deposited in the public domain (i.e. NCBI). They applied a statistical assessment to decide whether a gene's ESTs were significantly over-represented in a tissue, relative to their own collection of EST libraries. That measure can easily be skewed toward the tissues researchers have favored and published on so far. Take the P53 pathway as an analogy: almost every microarray expression result will hit P53 in a pathway analysis, simply because this pathway has been studied so much! So I would not suggest this database. The good part of this source is their two earlier papers, in which they tried to find cis-regulatory modules (CRMs) associated with tissue-specific expression. That framework is quite portable, and it at least provides a starting point for studying the association between methylation status and expression.

TiSGeD was also based on a collection of publicly available microarray experiments, but it gives only a very vague definition of what counts as “well curated” public data. Plus (as you indicated) it has a dead link anyway.

The Shyamasundar, R. et al. paper in Genome Biology (2005) was a good pick. They actually ran 115 microarray experiments themselves, and the data are available on GEO (GSE2193).

The Body Map project at Ensembl is the most advanced source, as they moved beyond the common microarray platforms to RNA-seq. The only caveat is their gene model, which was okay according to their pilot study with zebrafish. Still, that is a different species, so who knows whether the model holds up well for human. Raw sequences (paired-end RNA-seq) are available, and so is a live database. The best part is that the database is “supposed” to be searchable (although there is a learning curve with their API).

I agree with you that the “Weizmann Institute of Science” is a great resource, since they have evidence from three platforms concurrently confirming the tissue-specific expression pattern. However, that creates its own problem: what if the platforms do NOT agree with each other? Which one does one believe? I believe their database is also searchable, but it definitely needs more work, and people may need to acquire an academic license or something similar. One company, LifeMap Sciences, Inc., must have acquired a license from them.

My take is:

Tier one: Download the microarray data from GEO (GSE2193), published by Shyamasundar, R. et al., and build a Gene x Tissue matrix. Maybe get more data from the sources described by TiSGeD.
Possibly test out the methodologies explained in Yu, X. et al. BMC Bioinformatics (2006) and Yu, X. et al. NAR (2006).
Try to append the quantifiable “methylation status”, “CpG status”, or other relevant information to a subset of the above-mentioned matrix (by tissue), then run an association-like analysis to pull out the significant relationships.
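
To make the Tier one steps above a bit more concrete, here is a minimal sketch in Python of what I have in mind. Everything in it is an assumption for illustration: the gene names and expression values are made up (not taken from GSE2193), the values are assumed to be positive, linear-scale intensities, the "at least 2-fold over the mean of the other tissues" rule is just one simple way to call a gene tissue-specific, and a one-sided Fisher exact test stands in for the “association like” analysis. It is not the method from any of the papers mentioned.

```python
from collections import defaultdict
from math import comb
from statistics import mean


def build_gene_tissue_matrix(records):
    """Turn (gene, tissue, expression) records into {gene: {tissue: mean value}}.

    Replicate measurements of the same gene in the same tissue are averaged.
    """
    cells = defaultdict(list)
    for gene, tissue, value in records:
        cells[(gene, tissue)].append(value)
    matrix = defaultdict(dict)
    for (gene, tissue), values in cells.items():
        matrix[gene][tissue] = mean(values)
    return dict(matrix)


def tissue_specific_genes(matrix, tissue, fold=2.0):
    """Genes whose level in `tissue` is >= `fold` times the mean of all other tissues.

    Hypothetical rule; assumes positive, linear-scale expression values.
    """
    hits = set()
    for gene, row in matrix.items():
        others = [v for t, v in row.items() if t != tissue]
        if tissue in row and others and row[tissue] >= fold * mean(others):
            hits.add(gene)
    return hits


def contingency(universe, marked, specific):
    """2x2 counts (a, b, c, d) for marked vs. tissue-specific genes within `universe`."""
    marked, specific = marked & universe, specific & universe
    a = len(marked & specific)   # marked and tissue-specific
    b = len(marked - specific)   # marked, not tissue-specific
    c = len(specific - marked)   # tissue-specific, not marked
    d = len(universe) - a - b - c
    return a, b, c, d


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value, P(X >= a), for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy data: hypothetical genes and values, for illustration only.
records = [
    ("GENE_A", "liver", 10.0), ("GENE_A", "liver", 14.0),
    ("GENE_A", "brain", 2.0), ("GENE_A", "heart", 2.0),
    ("GENE_B", "liver", 3.0), ("GENE_B", "brain", 3.0), ("GENE_B", "heart", 3.0),
    ("GENE_C", "liver", 9.0), ("GENE_C", "brain", 1.0), ("GENE_C", "heart", 2.0),
    ("GENE_D", "liver", 1.0), ("GENE_D", "brain", 8.0), ("GENE_D", "heart", 1.0),
]
hypomethylated = {"GENE_A", "GENE_C"}  # hypothetical methylation calls

matrix = build_gene_tissue_matrix(records)
liver_specific = tissue_specific_genes(matrix, "liver")
table = contingency(set(matrix), hypomethylated, liver_specific)
p_value = fisher_exact_greater(*table)
print(sorted(liver_specific), table, p_value)
```

With real data the same table could be built separately for each subset Jack cares about (hypo vs. hyper, CpG island associated vs. not), one tissue at a time, and the resulting p-values would then need a multiple-testing correction.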

Tier two: Try the same route with the Human Body Map RNA-seq expression data.

Lastly, try to contact the Weizmann Institute of Science (with a bag of $) to see whether we can get a license for their database and some advice on how to search it. This is really a good resource.

Regarding Jack’s comment about tissue from experiments acquired at different developmental stages, I guess we can only work with whatever data we have, since we don’t have any control over this. Make sense?

This is just my 2 cents, and I really don’t know how complex it can get.

Jianying Li
