Monthly Archives: December 2008

A naive biochemist wakes up to the closed world of chemical abstracts and such

We have a project in the lab that involves screening, on a “lab scale”, for small molecules that inhibit the transport activity of a membrane protein. Having identified one such inhibitor, we intended to look for similar molecules that share the same substructure. Substructure query is a standard procedure in chemical informatics. In the past I have screencast the use of the Sigma-Aldrich service to identify molecules from Sigma's catalog based on similarity. However, considering the wealth of biochemically relevant information PubChem offers, I was curious to try out its substructure query. This PubChem service works great and is very feature rich (screencast coming soon), and it gave me several molecules which could be of interest in my screen.

The next step, I assumed, was to locate these compounds in the catalogs of the many chemical providers using a suitable lookup ID. Naively, I assumed this would be the CAS ID, which is the “unique id” associated with each molecule. An hour of googling later I woke up to the realization that CAS is a closed, subscription-based service which has fought many political battles against the PubChem database. Also, while PubChem fortunately (and, I guess, surprisingly) allows lookups of its data by CAS IDs, sadly it does not spit out CAS IDs for the molecules it identifies as related (at least as far as I could tell).

I am glad for the Entrez-provided services that help look up CIDs (PubChem's IDs) from CAS IDs, and am now wishing I could go the other way, i.e. CID to CAS.
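One partial workaround for the CID-to-CAS direction can be sketched as follows. It assumes that the list of synonyms recorded for a compound has already been obtained somehow (the fetching step is not shown here), and that this list may happen to contain a CAS number among the names. CAS numbers have a fixed shape (2 to 7 digits, 2 digits, 1 check digit) and a simple checksum, so candidates can be picked out of a mixed list. The function names `is_valid_cas` and `cas_candidates` are made up for this illustration, and a string that passes the checksum is only a plausible CAS number, not a confirmed one.

```python
import re

# CAS numbers look like NNNNNNN-NN-N: 2 to 7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(candidate: str) -> bool:
    """Return True if the string has the CAS format and a correct check digit."""
    match = CAS_PATTERN.match(candidate.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one,
    # sum the products, and compare the sum modulo 10 with the check digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


def cas_candidates(synonyms):
    """Keep only the synonyms that look like valid CAS numbers."""
    return [name.strip() for name in synonyms if is_valid_cas(name)]


if __name__ == "__main__":
    # Hypothetical synonym list for one compound, typed in by hand.
    names = ["Aspirin", "acetylsalicylic acid", "50-78-2", "2-Acetoxybenzoic acid"]
    print(cas_candidates(names))
```

This only filters what is already in the synonym list; if no CAS number was ever deposited for a compound, nothing will be found, which is exactly the gap described above.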

It's been almost 10 years since I last used the CAS abstracts, since I mostly use the literature search available for free at PubMed. I guess I am finally waking up to the closed world of the chemical abstracts offered by the CAS service of the American Chemical Society. Seeing a non-profit service be this closed makes me thankful for Entrez and the NCBI being this open. With all this talk of open source drug discovery, I would think that the least we can do is make our unique ID lookups freely interconvertible and public.

refs: The Ridiculous Battles (my words) of PubChem vs CAS

Who has got the Bottle 

Of Bubbles and funding

I am writing to share my opinions after reading a very insightful commentary by Gregory Petsko in the September issue of Genome Biology (doi 10.1186/gb-2008-9-9-110), titled “When Bubbles Burst”.

In that article Greg Petsko analyses the parallels between the current economic bubble and the big-science bubble (my words). Just as we can attribute the financial bubble to the unregulated growth of the financial industry, we can possibly attribute the many problems ailing the research establishment to the unregulated growth of the “omics” bubble.

We have all witnessed the move of all science into the genomic age. We have witnessed the gradual shift of federal research dollars to consortium-based science. Whether it is the cancer genome or structural genomics, there has been a pronounced shift in the way we all do science: bigger, it seems, is better, and data gathering has gained precedence over hypothesis testing.

The argument often made is that from all this data will come better hypotheses, which can then be tested in the future. Yet when the big data prevents us from arriving at any cogent and testable hypothesis, our answer seems to be more big data.

We have all seen good researchers get caught in their respective “omics” bubbles. And with every such bubble, small labs that don't jump onto the bandwagon tend to suffer. Of course all of this would be useless talk if funding were increasing, but as Greg Petsko states, the “pie is finite”.

I think the time has come for us to rethink the way we treat fundamental research. When funding is tight, it makes sense to postpone our big-data projects and use our infrastructure for “smaller” science that pursues more manageable projects. Give individual labs the funding they need to probe the hypotheses that we have built up based on the available data.

Disband the consortia (or leave them to industry) and divert funding back to our research labs. There is no better way, in my opinion, to survive the current funding crisis.

Disclaimer: These opinions are heavily influenced by the fact that I am in an academic establishment and have never directly worked on any genome-level project.