Monthly Archives: October 2007

Under the hood web 2.0: PUBMED and RSS

Snap poll: Did you know that PUBMED, and a lot of the NCBI, give you the ability to create a custom RSS feed around just about any query?

My answer: Yes, I do, but I learnt about this feature only very recently, even though I have been using PUBMED for years. You can watch how this feature works in this screencast.

The reason I bring this up is my observation that the NCBI and the EBI have been putting a lot of energy into staying current while preserving their original interfaces. Under the hood at the NCBI you can find features like MY-NCBI, which has some killer personalization options (to borrow a web 2.0 buzzword). The NCBI forms have started to use more and more JavaScript, and things have started to look “more ajaxy”. Even the collections option just about ensures that I don't use Connotea as often (other than when I want to share collections on the web).

The point is that the bio-search space has seen, and continues to see, variations on the theme of search. A lot of these offerings seem to be mere duplications of functionality already present at the EBI and NCBI. This is still a good thing: only through such attempts will we arrive at a “Google” (in the verb sense) for biosearch.

I sometimes wonder if it would be more worthwhile, at least for the career scientists among us, to learn how to use NCBI and EBI first. Then maybe private research dollars could be diverted to tackle the harder problems in bio-search.

References: Helia searches PUBMED/MEDLINE


I feel Chicken Gunya? Can social networks help track emerging diseases?

I remember being particularly amazed at Jonathan Harris and Sepandar Kamvar's “We feel fine” visualization, which is based on statements extracted from blog posts around the world. Reading about the re-emergence of dengue fever, chikungunya, West Nile and other viruses across South Asia, I started wondering if there are ways of keeping track of emerging pathogens using the many social networks that span the globe.

In many countries there is none of the paranoia about sharing health information that exists in the developed world. And even where the paranoia exists, as it does in the US, we are curiously caught in a world where people reading my blog are more likely to know I have contracted the flu than any two of my healthcare providers, who need that information to treat me better.

While the debate on the best way to handle health information online continues, I was wondering how open I would be to sharing information about what afflicts me if there was a societal benefit to be derived from it. It could be something as simple as monitoring allergy symptoms around where I live, or something fancy like tracking an emerging pathogen.

Imagine all of us, globally, updating a common channel with “de-personalized” information on what afflicts us. I can imagine the system working something like this: I could “submit” information about what ails me to this service, and the machine could obfuscate my details, preserving only things like my approximate geographical area, my age and my sex, and add the result to this health information social network.
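To make the obfuscation step a little more concrete, here is a minimal Python sketch of how such a service might strip a submission down before sharing it. Everything in it is an assumption made for illustration: the record fields, the names RawReport, PublicReport and depersonalize, the one-degree location grid and the ten-year age bands are all hypothetical, and coarsening data like this does not by itself guarantee anonymity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawReport:
    """Hypothetical submission as the user might send it (illustrative fields)."""
    name: str
    email: str
    latitude: float
    longitude: float
    age: int
    sex: str
    ailment: str


@dataclass(frozen=True)
class PublicReport:
    """What the hypothetical service would keep: no name, no email."""
    area: tuple      # (latitude, longitude) snapped to a coarse grid
    age_band: str    # for example "30-39"
    sex: str
    ailment: str


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with the band it falls into."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def depersonalize(report: RawReport, grid_degrees: float = 1.0) -> PublicReport:
    """Drop direct identifiers and coarsen location and age.

    grid_degrees is an assumed tuning knob: larger values give a
    coarser (more private, less precise) geographical area.
    """
    def snap(value: float) -> float:
        return round(value / grid_degrees) * grid_degrees

    return PublicReport(
        area=(snap(report.latitude), snap(report.longitude)),
        age_band=age_band(report.age),
        sex=report.sex,
        ailment=report.ailment.strip().lower(),
    )


# Example submission with made-up details.
submission = RawReport(
    name="A. Reader",
    email="reader@example.org",
    latitude=12.97,
    longitude=77.59,
    age=34,
    sex="F",
    ailment=" Chikungunya ",
)
print(depersonalize(submission))
```

A real service would also need to think about small populations, where even an approximate area plus age band and sex can point to one person, so the grid size and band width here are only placeholders.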

If implemented well, we could then have daily visualizations along the lines of “We feel fine”, or possibly something like “We feel chicken Gunya”.

Of $6 web hosts and dying web apps

Just yesterday I was reading Tiago's blog, where he requested hosting for a computationally intensive bioinformatics web app that he wrote. The application queries and systematizes mitochondrial genome information from the Entrez databases, and I assume it would be quite useful to animal geneticists and ecologists. Tiago is physically moving institutes, and his blog post talks of his fear that the app might die if his personal computer goes down.

In one of my personal projects, I have been wrestling with cloning kappa light chains from several monoclonal antibodies that I generated. The cloning required a good knowledge of the antibody light and heavy chain leader sequences. Several papers I was reading reference the Kabat and Wu database, which catalogs thousands of sequences of antibodies and other immunological proteins from mice and humans. Sadly, the links to the Kabat and Wu database in some of these papers do not point to any meaningful location. The resulting Google and PUBMED searches to find this lost data greatly increased the time and effort required to design my cloning experiments.

Which brings me to my question.

In an era when we have free wiki hosting, 4 GB of free email, supercomputers that power maps, gigabyte-scale free image sharing applications, and $6-per-month web hosting with terabytes of bandwidth, why are we still so far from an advertisement-supported “free” app host for meaningful scientific data?

Perhaps it's because only a few thousand people who are saving a rare turtle species somewhere on this planet will find Tiago's web app useful. Surely that's not yet worth enterprise-level attention. Or maybe we should all just write our web apps to run off Facebook!

RefSeq and UniProtKB groups collaborate

A lot of you have heard me complain (sometimes unfairly) about how hard it is to tie up sequence data from NCBI with protein data from Swiss-Prot and UniProt.
I just saw this on the gene announce mailing list:

In collaboration with UniProtKB, the RefSeq group is now reporting explicit cross-references to Swiss-Prot and TrEMBL proteins that correspond to a RefSeq protein. These correspondences are being calculated by the UniProtKB group, and will be updated every three weeks to correspond to UniProt’s release cycle. The data are being made available from several sites within NCBI:

This is a very nice development. I have always tended to look at the cross-references within NCBI records for information on Swiss-Prot IDs. But now I can easily link out to the wealth of protein information provided at UniProt from my NCBI search results.

This simple announcement also brings to the fore once again the complex inter-relationships between a lot of life-science data, and why I don't think there will ever be a single Google-style life-science database.