A naive biochemist wakes up to the closed world of chemical abstracts and such

We have a project in the lab that involves screening small molecules, on a “lab scale”, for ones that inhibit the transport activity of a membrane protein. Having identified one such inhibitor, we wanted to look for similar molecules that share the same substructure. Substructure search is a standard procedure in chemical informatics. In the past I have screencast the use of the Sigma-Aldrich service to identify molecules from Sigma's catalog based on similarity. However, considering the wealth of biochemically relevant information PubChem offers, I was curious to try out its substructure search. The PubChem service works great and is very feature rich (screencast coming soon), and it gave me several molecules that could be of interest in my screen.

The next step, I assumed, was to locate these compounds in the catalogs of the many chemical suppliers using a suitable lookup ID. Naively, I assumed this would be the CAS number, the “unique ID” associated with each molecule. An hour of googling later, I woke up to the realization that CAS is a closed, subscription-based service that has fought many political battles against the PubChem database. Also, while PubChem (fortunately, and I guess surprisingly) allows lookups of its data by CAS number, sadly it does not spit out CAS numbers for the molecules it identifies as related (at least as far as I could tell).

I am glad for the Entrez services that help look up CIDs (PubChem's compound IDs) from a CAS number, and I now wish I could go the other way, i.e. from CID to CAS.
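One possible workaround for the CID to CAS direction, sketched here under assumptions: PubChem's PUG REST interface (not mentioned in the original post; the endpoint paths below are assumed) can list the depositor-supplied synonyms for a CID, and some of those synonyms are CAS numbers. Because every CAS number ends in a check digit (the remaining digits, weighted 1, 2, 3, ... from the right, summed, modulo 10), the synonyms can be filtered for well-formed CAS numbers. The helper names `is_valid_cas`, `cas_candidates`, `cas_to_cids` and `cid_to_cas` are hypothetical, and the sample synonym list is made up for illustration.

```python
import json
import re
import urllib.request

# Assumed base URL of PubChem's PUG REST interface (not from the original post).
PUG = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"

# A CAS number looks like: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(candidate: str) -> bool:
    """Check the format and the check digit of a CAS number (hypothetical helper)."""
    match = CAS_PATTERN.match(candidate)
    if not match:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def cas_candidates(synonyms):
    """Keep only synonyms that are well-formed CAS numbers with a valid check digit."""
    return [s for s in synonyms if is_valid_cas(s)]


def fetch_json(url: str):
    """Download and parse a JSON document (network access required)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def cas_to_cids(cas: str):
    """CAS -> CID(s): assumes PubChem resolves a CAS number given as a compound name."""
    data = fetch_json(f"{PUG}/name/{cas}/cids/JSON")
    return data["IdentifierList"]["CID"]


def cid_to_cas(cid: int):
    """CID -> CAS candidates: filter the synonyms PubChem lists for this CID (assumed layout)."""
    data = fetch_json(f"{PUG}/cid/{cid}/synonyms/JSON")
    synonyms = data["InformationList"]["Information"][0].get("Synonym", [])
    return cas_candidates(synonyms)


if __name__ == "__main__":
    # Offline check on a made-up synonym list; 50-78-2 is aspirin's CAS number,
    # 50-78-3 has a wrong check digit and is filtered out.
    sample = ["aspirin", "ACETYLSALICYLIC ACID", "50-78-2", "50-78-3"]
    print(cas_candidates(sample))  # ['50-78-2']
```

A passing check digit only shows that a string is shaped like a CAS number; since synonyms are supplied by depositors, a CID may carry several candidates (or deprecated numbers), and a supplier's catalog is still the place to confirm which one it uses.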

It's been almost 10 years since I last used Chemical Abstracts, because I mostly use the literature search available for free at PubMed. I guess I am finally waking up to the closed world of Chemical Abstracts, offered by the Chemical Abstracts Service (CAS) of the American Chemical Society. That a non-profit service is this closed makes me thankful that Entrez and the NCBI are so open. With all this talk of open source drug discovery, I would think the least we can do is make our unique ID lookups public and freely interconvertible.

refs: The Ridiculous Battles (my words) of PubChem vs CAS

