I remember being particularly amazed at Jonathan Harris and Sepandar Kamvar's “We Feel Fine” visualization, based on statements extracted from blog posts around the world. Reading about the re-emergence of dengue fever, chikungunya, West Nile and other viruses across South Asia, I started wondering if there are ways of keeping track of emerging pathogens using the many social networks that span the globe.
In many countries there is no paranoia associated with sharing health information like there is in the developed world. And even where the paranoia exists, as it does in the US, we are curiously caught in a world where people reading my blog are more likely to know I have contracted the flu than any two of my healthcare providers, who need that information to treat me better.
While the debate on the best way to handle health information online continues, I was wondering how open I would be to sharing information about what afflicts me if there was a societal benefit to be derived from it. It could be something as simple as monitoring allergy symptoms around where I live, or something fancy like tracking an emerging pathogen.
Imagine all of us updating a common channel with “de-personalized” information on what afflicts us globally. I can imagine the system working something like this: I could “submit” to this service information about what ails me, and the machine could obfuscate my details, preserving only things like my approximate geographical area, my age and my sex, and add it to this health information social network.
If implemented well, we could then possibly have daily visualizations along the lines of “We Feel Fine”, or possibly something like “We Feel Chikungunya”.
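The obfuscation step I have in mind could be sketched roughly as follows. This is only a minimal illustration in Python: the `Submission` fields, the `depersonalize` function and the one-degree grid are all assumptions of mine, not part of any existing service.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """What a user sends in (all field names are hypothetical)."""
    name: str
    email: str
    latitude: float
    longitude: float
    age: int
    sex: str
    ailment: str


def depersonalize(sub: Submission, grid_degrees: float = 1.0) -> dict:
    """Drop identifying fields and coarsen the ones that are kept."""
    # Snap coordinates to a coarse grid so only the approximate area survives.
    lat = round(sub.latitude / grid_degrees) * grid_degrees
    lon = round(sub.longitude / grid_degrees) * grid_degrees
    # Report age as a decade bracket rather than an exact number.
    decade = (sub.age // 10) * 10
    # Name and email are deliberately not copied into the result.
    return {
        "area": (lat, lon),
        "age_bracket": f"{decade}-{decade + 9}",
        "sex": sub.sex,
        "ailment": sub.ailment.strip().lower(),
    }


report = depersonalize(
    Submission("A. Blogger", "a@example.org", 12.97, 77.59, 34, "M", " Chikungunya ")
)
print(report)
```

Coarsening like this is the simplest possible choice; a real service would probably need to think much harder about how few people share a given area, age bracket and ailment.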
Just yesterday I was reading Tiago's blog, where he requested hosting for a computationally intensive bioinformatics web app that he wrote. The application queries and systematizes mitochondrial genome information from the Entrez databases, and I assume it would be quite useful to animal geneticists and ecologists. Tiago is physically moving institutes, and his blog post talks of his fears that the app might die if his personal computer goes down.
In one of my personal projects, I have been wrestling with cloning kappa light chains from several monoclonal antibodies that I generated. The cloning required a good knowledge of the antibody light and heavy chain leader sequences. Several papers I was reading reference the Kabat and Wu database, which catalogs thousands of sequences of antibodies and other immunological proteins from mice and humans. Sadly, the links to the Kabat and Wu database in some of these papers do not point to any meaningful location. The resulting Google and PubMed searches to find this lost data greatly increased the time and effort required to design my cloning experiments.
Which brings me to my question.
In an era when we have free wiki hosting, 4 GB of free email, supercomputers that power maps, gigabyte-scale free image sharing applications and $6-per-month web hosting with a terabyte of bandwidth, why are we still so far from an advertisement-supported “free” app host for meaningful scientific data?
Perhaps it's because only a few thousand people, who are saving a rare turtle species somewhere on this planet, will find Tiago's web app useful. Surely that's not yet worth enterprise-level attention. Or maybe we should all just write our web apps to run off Facebook!
My good friend Deepak had a quote in his blog from Lincoln Stein about making bioinformatics as much an everyday tool to the practicing biologist as a pipettor (a device used by experimental biologists and chemists to dispense liquids).
I totally agree, but think we are quite far away. For example, this morning I had to obtain the sequences of 772 Swiss-Prot entries that were part of an alignment, for some downstream analysis. Of course my first choice was to query the NCBI Entrez database. I soon realized that the NCBI query box did not return any results for the first few queries I tried, all of which were probably new UniProt/Swiss-Prot IDs (e.g. sequence IDs Q57T52_SALCH, Q325Y4_SHIBS).
Disappointed, I turned to the EBI search engine. Within seconds I realized that the EBI does indeed serve up all of the entries. So there is a subset of UniProt entries that the NCBI does not have in its database.
Out of sheer curiosity I entered the queries that drew a blank at the NCBI into Google.
Wonder of wonders, Google pulled up all of the hard-to-find UniProt entries as the very first match.
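For 772 IDs, the try-one-source-then-the-next routine I just went through by hand is the kind of thing worth scripting. Here is a rough Python sketch of the idea. The `fetch_with_fallback` function and the dictionary-backed sources are hypothetical stand-ins of mine; they do not call any real NCBI or EBI service, and the sequence shown is a placeholder rather than the real sequence for that entry.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A source takes an entry ID and returns a sequence, or None if it has no hit.
Source = Callable[[str], Optional[str]]


def fetch_with_fallback(
    ids: Iterable[str], sources: List[Tuple[str, Source]]
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Try each source in order for every ID.

    Returns (found, missing): found maps ID -> (source name, sequence),
    and missing lists the IDs that no source could resolve.
    """
    found: Dict[str, Tuple[str, str]] = {}
    missing: List[str] = []
    for entry_id in ids:
        for name, lookup in sources:
            sequence = lookup(entry_id)
            if sequence is not None:
                found[entry_id] = (name, sequence)
                break
        else:
            # No source returned a hit for this ID.
            missing.append(entry_id)
    return found, missing


# Stand-in sources backed by dictionaries (placeholder data, not real records).
first_choice: Source = {}.get
second_choice: Source = {"Q57T52_SALCH": "MKV"}.get

found, missing = fetch_with_fallback(
    ["Q57T52_SALCH", "Q325Y4_SHIBS"],
    [("first_choice", first_choice), ("second_choice", second_choice)],
)
print(found, missing)
```

In practice each source would wrap a call to a database's web service, and the `missing` list is exactly the set of IDs I would end up pasting into a general search engine.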
Thanks to the increasing use of publicly accessible web service APIs, Google is becoming more and more aware of a lot of very specific sequence data.
I will be very happy when I can type Q57T52_SALCH calc=MW and get an answer back from right inside google. Maybe that day bioinformatics will move one step closer to becoming just another tool.
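To make the wish concrete, here is a small Python sketch of what might sit behind such a query. The `calc=MW` syntax is just my wishful notation, the `answer` and `molecular_weight` functions are hypothetical, and the lookup is a toy dictionary rather than a real database. The molecular weight is taken as the sum of standard average residue masses plus one water, which ignores modifications and non-standard residues.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def molecular_weight(sequence: str) -> float:
    """Average molecular weight of an unmodified peptide chain."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    # Unknown residue letters raise KeyError rather than being guessed at.
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def answer(query: str, lookup) -> float:
    """Handle a query of the form '<entry id> calc=MW'."""
    entry_id, _, directive = query.strip().partition(" ")
    key, _, value = directive.strip().partition("=")
    if key != "calc" or value.upper() != "MW":
        raise ValueError(f"unsupported query: {query!r}")
    sequence = lookup(entry_id)
    if sequence is None:
        raise KeyError(entry_id)
    return molecular_weight(sequence)


# Toy lookup table: a made-up ID and a made-up two-residue peptide.
toy_db = {"TOY1_EXAMPLE": "GG"}
print(round(answer("TOY1_EXAMPLE calc=MW", toy_db.get), 2))
```

The interesting part is not the arithmetic but the lookup: everything hinges on the search engine being able to resolve the ID to a sequence in the first place.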
Till then I am stuck with learning about Equery, WSDL, SOAP and so on.
The Blink database at the NCBI
The NCBI, which I have blogged about before, has a number of outgoing links embedded in its search results. One of these, which I use extensively, is the Blink link, which is basically a precompiled BLAST run. The BLAST link (Blink) exists for every annotated sequence in the database and is a great way to look at homologs of any given sequence without messing around with cutting and pasting sequences into web forms or setting BLAST input parameters. It is as easy as clicking on a link: the results come nicely laid out with all the homologs color-coded by taxonomy, i.e. archaea, bacteria, fungi, plants, etc. Clicking on any graphic takes you to the pairwise alignment. Clicking on a GI (a unique numeric ID for every sequence in the database) takes you to the BLAST results (Blink) page for that GI.
This interface is very powerful and a great way to explore the sequence space for any protein. I have put together a screencast documenting one of my recent explorations; you can see it at the YouTube link above.
Documentation: The Blink documentation
A while back I had written about how the NCBI had probably mis-indexed a particular protein domain annotation, because their search algorithm yielded irreproducible and probably erroneous results for a particular search pattern (see Whassup NCBI).
Back then I had contacted the NCBI support staff, who were extremely helpful and helped me troubleshoot the problem to some extent, but told me I had to wait for the possible mis-indexing error to get corrected. Some time after my initial post I got an email from NCBI support asking me if the problem persisted and whether I had tried repeating the search. I tried repeating the search, and the problem had indeed been rectified.
Throughout the whole process I had the good fortune of speaking one-on-one with several of the support staff at NCBI, who attempted to help me troubleshoot the search pattern. Often these conversations lasted 30-40 minutes and contributed immensely to my knowledge of how the NCBI is laid out: the numerous interconnected databases, the precompiled BLAST options, the LinkOut options and the My NCBI “personalization” options.
Before this encounter, I had on many an occasion found myself hoping that Google would index all the biomedical information out there and make a lot of it available in a “google-ized” format. I am beginning to change my mind on this. Having realized how powerful the ways of querying the data at NCBI are, I can certainly say that I turn to the NCBI more and more to construct and optimize queries in the genomic sequence space. The dedication with which the NCBI support staff pursued my problem is something I have only come across on many open-source forums. In direct contrast, a few of my posts to Google “support” to get some basic questions on their Google Calendar bugs sorted out were largely ignored.
Vertical search is becoming a huge buzzword in this web 2.0 aroused world. Let's face it: the NCBI has been our vertical search platform. It's been around since long before the term “search” meant what it does today. It is comprehensive, cross-referenced and annotated to the hilt, and has personalization, RSS feeds, email alerts, and links to full text and private libraries all built in for FREE.
In my very humble opinion it has evolved into an extremely powerful and, dare I say, user-friendly front end for querying all biomedical information. If only more of us Omics practitioners would take the time to learn its very powerful features.
I for one will definitely try to blog more about its many facets (along these lines see Search in the Omics age, Compiling search strategies in the biosciences, Screencast 101). And I will never wish that Google would waste its time on what the NCBI has already done so well. I only wish the NCBI would popularize its offerings more. I also wish all the bioinformaticians out there would do their part in teaching everyday users the ins and outs of its tools. For now I am glad that we have something as powerful and dedicated as the NCBI.