Monthly Archives: July 2006

Whassup NCBI

Is it just me, or has the NCBI changed something drastic in the way its search engine works?

The problem: Simple searches seem to yield very incomplete information.
The solution: Is there anyone out there?

I have recently started using the slightly advanced features of the Entrez search utility at the NCBI. The NCBI education utilities do a very good job of explaining how to construct detailed searches for any biological information from its massive collection of heavily inter-linked databases.

What I set out to do was simple: a search that I have performed repeatedly over the last several months, identifying all chloride channel proteins with TrkA_C domains.
The keywords I used were a simple combination of “chloride”, “voltage” and “TrkA”, combined using the “History” option of Entrez.
In the past I was delighted to find at least 90 sequences, which I saved as a “TinySeq XML” file and then imported into Excel for organization and downstream processing.
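For anyone who wants to try the same search from a script, here is a minimal sketch that builds the equivalent E-utilities ESearch request. It only assembles the URL and does not contact NCBI. The choice of the protein database, the plain AND between keywords, and the helper names build_term and build_esearch_url are my assumptions for illustration, not a record of exactly what the Entrez web page does behind the scenes.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(keywords):
    """AND the keywords together, quoting any keyword that contains a space."""
    parts = []
    for kw in keywords:
        kw = kw.strip()
        if not kw:
            continue  # skip empty entries
        parts.append(f'"{kw}"' if " " in kw else kw)
    return " AND ".join(parts)


def build_esearch_url(keywords, db="protein", retmax=500):
    """Return an ESearch URL; usehistory=y asks NCBI to keep the result set
    on its History server so later requests can refer back to it."""
    params = {
        "db": db,  # assumption: the search targets the protein database
        "term": build_term(keywords),
        "usehistory": "y",
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url(["chloride", "voltage", "TrkA"])
print(url)
```

Keeping the exact URL alongside the date of the search makes it much easier to show someone else precisely what was asked of the database.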

But last week I ran into a strange problem. No matter what I tried, I could not get back the earlier matches, even though each of those entries was still in the database. Further, the same search strategy yielded only two matches, neither of which was in the original set of 90. What's more disconcerting is that some matches contained entries like “p53 human tumour suppressor”, which we know (and hope) has nothing to do with a chloride channel.
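A comparison like this is easy to make concrete if the old and new results are both saved as TinySeq XML. The sketch below pulls the accessions out of two exports and reports what was lost, gained and kept. I am assuming the accession lives in a TSeq_accver element inside each TSeq record; the accessions and deflines in the sample data are made up purely for illustration, and the function names are my own.

```python
import xml.etree.ElementTree as ET


def accessions_from_tinyseq(xml_text):
    """Collect the accession.version strings from a TinySeq XML export.

    Assumes each record stores its accession in a <TSeq_accver> element.
    """
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter("TSeq_accver") if el.text}


def compare_runs(old, new):
    """Summarise how two result sets differ."""
    return {
        "lost": sorted(old - new),    # in the earlier run only
        "gained": sorted(new - old),  # in the later run only
        "kept": sorted(old & new),    # present in both runs
    }


# Hypothetical exports: these accessions are placeholders, not real records.
OLD_XML = """<TSeqSet>
  <TSeq><TSeq_accver>EXAMPLE_0001.1</TSeq_accver><TSeq_defline>placeholder channel A</TSeq_defline></TSeq>
  <TSeq><TSeq_accver>EXAMPLE_0002.1</TSeq_accver><TSeq_defline>placeholder channel B</TSeq_defline></TSeq>
  <TSeq><TSeq_accver>EXAMPLE_0003.1</TSeq_accver><TSeq_defline>placeholder channel C</TSeq_defline></TSeq>
</TSeqSet>"""

NEW_XML = """<TSeqSet>
  <TSeq><TSeq_accver>EXAMPLE_0003.1</TSeq_accver><TSeq_defline>placeholder channel C</TSeq_defline></TSeq>
  <TSeq><TSeq_accver>EXAMPLE_0099.1</TSeq_accver><TSeq_defline>placeholder unrelated protein</TSeq_defline></TSeq>
</TSeqSet>"""

old_hits = accessions_from_tinyseq(OLD_XML)
new_hits = accessions_from_tinyseq(NEW_XML)
diff = compare_runs(old_hits, new_hits)
for label, accessions in diff.items():
    print(label, accessions)
```

A list of the "lost" accessions is also a far more useful thing to send to user support than a description of the problem.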

In my opinion something seems terribly wrong here.

Frantic emails to user support at NCBI have only gotten a response saying “we don't know why you cannot reproduce your search result”, along with the advice to use the MyNCBI option to save queries for future reproducibility.

I really don't want to air my grievances in public, but it is a tad disconcerting when a well-tuned search engine like the NCBI's starts behaving funny.


India and the HapMap

Disclaimer: I am no geneticist, nor am I an anthropologist. I have merely an extracurricular interest in history, with a greater focus on its misuse to divide people rather than unify them. This interest is what gave birth to the wish-list at the bottom of this post.

I have recently been fascinated by the possibilities opened up by the International HapMap Project. Another blog has also talked about the possible uses of the HapMap data for understanding population history.

I have always been fascinated by the cultural and ethnic diversity of India, a subcontinent that has seen invasions, migrations and occupation by almost every great power ever to have risen in the Old World, so I was curious to see how much diversity data was available for the Indian subcontinent. India, it turns out, is joining the Pan-Asian HUGO SNP mapping project, co-ordinated by the IGIB. Although some Indian families did make it into the initial HapMap sampling, a detailed analysis of genetic variation among Indians will obviously help to direct medical research and make it relevant to India and Indians. In addition, it will have a very pronounced side effect: it will make it possible for us to understand what India is really made up of and how different all our sub-populations are.

I am particularly curious about such a HapMap and its ability to answer several questions about India, particularly with regard to the North-South divide.

# How different are our different populations?

# Is there a reason to believe that South Indians come from a genetically different stock than North Indians?

# The thalassemia argument, according to which the northern part of India has a greater predominance of thalassemia, which is common among European populations, is used to corroborate the North-South divide. Does this correlate with other markers?

# Our indigenous people: how different are they from “mainstream” Indians?

# As far as clinical trials in India go, should we be sampling many communities?

# Does haplotype data have any light to shed on the effects of the gotra practice (an ancient genealogy criterion used to guide matrimony among South Indian Brahmins)?

# Can we combine the almost linear genealogies with haplotype data to get at multi-gene disorders that run in Indian sub-populations?

# Is there any evidence for selection amongst Indians?

Of course none of this is novel, and several such approaches are underway in several countries. But I just thought I would jot down this tangent and dub it “populomics” to justify posting it as part of this blog.

Open-notebooks anyone?

Doctors are generally known to lag behind the times as far as technology adoption goes (my personal experience: no citable reference yet). Scientists in the life sciences come a very close second.

I spent my morning reading about Vioxx and then read a very interesting article about postdocs and data falsification. All this got me thinking about my earlier opinion about the NIH taking the lead on developing frameworks to make electronic lab notebooks a routine practice in any lab receiving NIH funding.

As a graduate student some five years ago, I was always told about “NIH guidelines” on note-taking and about how every entry in my lab notebook was very important. But let's face the truth: my results were only written in a form most convenient to me and me alone. While that could obviously mean that I was the rotten apple in the barrel, there is no denying that notebooks can benefit enormously from becoming electronic.

I cannot wait for the NIH to take the lead in at least providing the frameworks (software, maybe?) for electronic information storage and retrieval and, importantly, for records that the NIH can mandate be made public once a paper is published.

As a crystallographer, I am now very used to depositing a lot of data on the process behind my structure solution into public databases, to be organized later and made available for all to use. There is no denying that, thanks to NIH-funded efforts, a variety of tools are available for understanding structural and genomic data. I would hope that the NIH or NSF would take the lead in funding the development of an NIH-wide notebook project that will eventually provide the “supplementary material” section for all future peer-reviewed publications.

Other reading:

“Electronic notebooks: a new leaf”: article by Declan Butler

ELNs: my collection of Connotea tags

HHMI Bulletin on archival data providing leads for discovery