Category Archives: Information technology

I feel Chicken Gunya? Can social networks help track emerging diseases?

I remember being particularly amazed at Jonathan Harris and Sepandar Kamvar's “We feel fine” visualization, which is based on statements extracted from blog posts around the world. Reading about the re-emergence of dengue fever, chikungunya, West Nile and other viruses across South Asia, I started wondering if there are ways of keeping track of emerging pathogens using the many social networks that span the globe.

In many countries there is none of the paranoia about sharing health information that exists in the developed world. And even where the paranoia does exist, as it does in the US, we are curiously caught in a world where people reading my blog are more likely to know I have contracted the flu than any two of my healthcare providers, who need that information to treat me better.

While the debate on the best way to handle health information online continues, I was wondering how open I would be to sharing information about what afflicts me if there was a societal benefit to be derived from it. It could be something as simple as monitoring allergy symptoms around where I live, or something fancy like tracking an emerging pathogen.

Imagine all of us updating a common channel with “de-personalized” information on what afflicts us globally. I can imagine the system working something like this: I could “submit” information about what ails me to this service, and the machine could obfuscate my details, preserving only things like my approximate geographical area, my age and my sex, and add it to this health information social network.
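A minimal sketch of what that obfuscation step might look like. Everything here is an assumption made for illustration: the `depersonalize` function, the `DepersonalizedReport` record, the one-degree grid used for "approximate geographical area" and the ten-year age bands are all hypothetical choices, not part of any existing service.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DepersonalizedReport:
    """Hypothetical record: only the coarse fields survive submission."""
    ailment: str
    region: Tuple[float, float]  # south-west corner of a coarse lat/lon cell
    age_band: str
    sex: str


def depersonalize(ailment, lat, lon, age, sex, grid_degrees=1.0, band_years=10):
    """Drop identifying detail, keeping approximate area, age band and sex."""
    # Snap the exact position to the corner of a coarse grid cell.
    coarse_lat = math.floor(lat / grid_degrees) * grid_degrees
    coarse_lon = math.floor(lon / grid_degrees) * grid_degrees
    # Replace the exact age with a band such as "30-39".
    low = (age // band_years) * band_years
    age_band = f"{low}-{low + band_years - 1}"
    return DepersonalizedReport(
        ailment=ailment.strip().lower(),
        region=(coarse_lat, coarse_lon),
        age_band=age_band,
        sex=sex.strip().upper(),
    )


report = depersonalize("Chikungunya ", lat=12.97, lon=77.59, age=34, sex="m")
print(report)
```

Coarser grid cells and wider age bands trade detail for privacy; a real service would also have to worry about small populations, where even these coarse fields could single someone out.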

If implemented well, we could then have daily visualizations along the lines of “We feel fine”, perhaps something like “We feel chicken Gunya”.

Of $6 web hosts and dying web apps

Just yesterday I was reading Tiago's blog, where he requested hosting for a computationally intensive bioinformatics web app that he wrote. The application queries and systematizes mitochondrial genome information from the Entrez databases, and I assume it would be quite useful to animal geneticists and ecologists. Tiago is physically moving institutes, and his blog post talks of his fear that the app might die if his personal computer goes down.

In one of my personal projects, I have been wrestling with cloning kappa light chains from several monoclonal antibodies that I generated. The cloning required a good knowledge of the antibody light and heavy chain leader sequences. Several papers I was reading reference the Kabat and Wu database, which catalogs thousands of sequences of antibodies and other immunological proteins from mice and humans. Sadly, the links to the Kabat and Wu database in some of these papers do not point to any meaningful location. The resulting Google and PubMed searches to find this lost data greatly increased the time and effort required to design my cloning experiments.

Which brings me to my question.

In an era when we have free wiki hosting, 4 GB of free email, supercomputers that power maps, gigabyte-large free image sharing applications and $6-per-month, terabyte-bandwidth web hosting, why are we still so far from an advertisement-supported “free” app host for meaningful scientific data?

Perhaps it's because only a few thousand people who are saving a rare turtle species somewhere on this planet will find Tiago's web app useful. Surely that's not yet worth enterprise-level attention. Or maybe we should all just write our web apps to run off Facebook!

Sequence first, ask questions later?

I am a little confused after reading about the metagenomics approach that identified the causative agent for colony collapse disorder, which Deepak and I blogged about.

After trawling through PubMed, it seems that a number of the potential honeybee pathogens were already quite well known. The Kashmir bee virus and the Israeli acute paralysis virus were also lurking among bee populations. Was it not then possible to query for these with a quick microarray designed following some text and sequence mining?

Or maybe it's just faster to sequence the whole bee and then perform the in vitro RT-PCR experiments, which are a little more targeted.

Maybe this does say something about the difficulty of on-the-fly, bioinformatics-driven microarray fabrication. Since the closest I have come to a microarray experiment is seeing the images on the web, I am just wondering aloud. I am hardly an expert.

Addendum: There is of course no denying the added benefits of the metagenomic approach, such as the many other conclusions the paper made possible: that mite levels in both CCD and non-CCD samples were similar, and that the microflora (like the bacteria in the bee gut) of Australian and American bees are similar. So I guess the question then is whether metagenomics is just so much more direct that it is going to be the first choice for all such open-ended questions, like “What causes infectious disease X?”

Exciting times on the science web: Timo Hannay on Nascent

I was very excited to read Timo Hannay's post on the Nature Nascent blog, where he reproduced an excerpt from his piece for STM news on “how O'Reilly and the alpha-geek crowd have influenced Nature Magazine”. Titled “web opportunity”, the post talks about the great opportunities that lie in the web for all of science and science publishing.

In the very interesting post, Timo talks about the democratization of audio and video and Nature's experiments with the Nature podcast. The Nature podcast apparently started off as just an experiment and then grew to almost 30,000 downloads at the end of its first year.
The article talks about scientists who listen to the podcast when they are at the microscope, commuting or exercising. In my own case, I find that thanks to the Nature podcast I am now even more inclined to pick up my print copy to follow up on something exciting I heard there.

Apart from the ability of audio and video to organize and nucleate communities, Timo also talks about databases as the conduits that enable collaborations, and about the role that publishers have in building communities. Toward this end, Nature's several Gateways are database-driven community resources that aggregate content from both the community and NPG journals in several areas.

The article makes good reading and I will not paraphrase it any further.

If I were to rank the web offerings from Nature in terms of their value to my current scientific life, my ranking would go thus:
1) The Nature podcast
2) Connotea
3) The Nature Omics gateway

Powered by ScribeFire.

Why Google may be better than the NCBI for finding UniProt sequences

My good friend Deepak had a quote in his blog from Lincoln Stein about making bioinformatics as much an everyday tool to the practicing biologist as a pipettor (a device used by experimental biologists and chemists to dispense liquids).

I totally agree, but think we are quite far away. For example, this morning I had to obtain the sequences of 772 Swiss-Prot entries, which were part of an alignment, for some downstream analysis. Of course my first choice was to query the NCBI Entrez database. I soon realized that the NCBI query box did not return any results for the first few queries I tried, all of which were probably new UniProt/Swiss-Prot IDs (e.g. sequence IDs Q57T52_SALCH, Q325Y4_SHIBS).

Disappointed, I turned to the EBI search engine. Within seconds I realized that the EBI does indeed serve up all of the entries. So there is a subset of UniProt entries that the NCBI does not have in its database.

Out of sheer curiosity I entered the queries that drew a blank at the NCBI into Google.

Wonder of wonders, Google pulled up every one of the hard-to-find UniProt entries as the very first match.
Thanks to the increasing use of publicly accessible web service APIs, Google is becoming more and more aware of a lot of very specific sequence data.

I will be very happy when I can type Q57T52_SALCH calc=MW and get an answer back from right inside Google. Maybe on that day bioinformatics will move one step closer to becoming just another tool.
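For what it's worth, the calculation behind such a query is small once the sequence is in hand. Here is a rough sketch, assuming a made-up `parse_query` helper for the imagined `calc=MW` syntax and a `molecular_weight` function that sums standard average residue masses plus one water. It does not fetch anything from UniProt, and the peptide used below is an arbitrary placeholder, not the real sequence of Q57T52_SALCH.

```python
# Average residue masses in daltons for the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain, for the free N- and C-termini


def parse_query(query):
    """Split a hypothetical query like 'Q57T52_SALCH calc=MW' into its parts."""
    parts = query.split()
    identifier = parts[0]
    options = dict(part.split("=", 1) for part in parts[1:])
    return identifier, options


def molecular_weight(sequence):
    """Average molecular weight of an unmodified protein sequence."""
    residues = sequence.strip().upper()
    # Raises KeyError on non-standard letters (B, X, Z, ...) rather than guessing.
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER_MASS


identifier, options = parse_query("Q57T52_SALCH calc=MW")
placeholder_sequence = "MKTAYIAK"  # stand-in only; the real sequence must be looked up
if options.get("calc") == "MW":
    print(identifier, round(molecular_weight(placeholder_sequence), 2))
```

The hard part, as the morning's experience shows, is the lookup step that this sketch leaves out.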

Till then I am stuck with learning about Equery and WSDL and SOAP and so on.


Blink and it's done

The Blink database at the NCBI

The NCBI, which I have blogged about before, has a number of outgoing links embedded in its search results. One of these links, which I use extensively, is the Blink link, which is basically a precompiled BLAST run. The BLAST link (Blink) exists for every annotated sequence in the database and is a great way to look at homologs of any given sequence without messing around with cutting and pasting sequences into web forms or setting BLAST input parameters. It is as easy as clicking on a link: the results come nicely laid out with all the homologs color coded by taxonomy, i.e. archaea, bacteria, fungi, plants, etc. Clicking on any graphic takes you to the pairwise alignment. Clicking on a GI (a unique numeric ID for every sequence in the database) takes you to the BLAST results (Blink) page for that GI.

This interface is very powerful and a great way to explore the sequence space for any protein. I have put together a screencast documenting one of my recent explorations; you can see it at the YouTube link above.

Documentation: The Blink documentation

Why I think the NCBI Rocks!

A while back I wrote about how the NCBI had probably mis-indexed a particular protein domain annotation, because their search algorithm yielded irreproducible and probably erroneous results for a particular search pattern (see Whassup NCBI).

Back then I contacted the NCBI support staff, who were extremely helpful. They helped me troubleshoot the problem to some extent but told me I had to wait for the possible mis-indexing error to get corrected. Some time after my initial post, I got an email from NCBI support asking me if the problem persisted and whether I had tried repeating the search. I tried repeating the search and the problem had indeed been rectified.

Throughout the whole process I had the good fortune of speaking one on one with several of the support staff at NCBI, who attempted to help me troubleshoot the search pattern. Often these conversations lasted 30-40 minutes and contributed immensely to my knowledge of how the NCBI is laid out: the numerous interconnected databases, the precompiled BLAST options, the linkout options and the MyNCBI “personalization” options.

Before this encounter, I had on many an occasion found myself hoping that Google would index all the biomedical information out there and make a lot of it available in a “google-ized” format. I am beginning to change my mind on this. Having realized how powerful the ways of querying the data at NCBI are, I can certainly say that I turn to the NCBI more and more to construct and optimize queries in the genomic sequence space. The dedication with which the NCBI support staff pursued my problem is something I have otherwise come across only on open-source forums. In direct contrast, a few of my posts to Google “support” to get some basic questions on their Google Calendar bugs sorted out were largely ignored.

Vertical search is becoming a huge buzzword in this Web 2.0 aroused world. Let's face it: the NCBI has been our vertical search platform. It's been around since long before the term “search” meant what it does today. It is comprehensive, cross-referenced and annotated to the hilt, and has personalization, RSS feeds, email alerts and links to full-text and private libraries all built in for FREE.

In my very humble opinion it has evolved into an extremely powerful and, dare I say, user-friendly front end for querying all biomedical information. If only more of us Omics practitioners would take the time to learn its very powerful features.

I for one will definitely try to blog more about its many facets (along these lines see Search in the Omics age, Compiling search strategies in the biosciences, Screencast 101). And I will never again wish that Google waste its time on what the NCBI has already done so well. I only wish the NCBI would popularize its offerings more. I also wish all the bioinformaticians out there would do their part in teaching everyday users the ins and outs of its tools. For now I am glad that we have something as powerful and dedicated as the NCBI.