Monthly Archives: December 2006

Compiling search strategies in the biosciences

This morning I spent nearly two hours trolling through the various biological databases trying to find some information, but in the end my searches drew a blank. I grew dejected at having wasted all that time only to find nothing.

I then read Jon Udell’s blog post “Hunting the elusive search strategy”. In this post, he talks about how some of us are good searchers and others are not. He also says:

“Effective search depends on reservoirs of tacit knowledge and unconscious skill. Some people possess much deeper reservoirs, and/or can tap into them more effectively, than others. That makes them valuable.”

Some people can compile and relate matches and near matches on the fly during a search, interactively synthesizing a search strategy. Assuming this ability is learned and not innate, the post says it would be very useful to compile effective strategies in order to understand what makes a good search strategy.

Jon Udell started a tag called searchstrategy to compile his list. I too started one, called ncbi-serach-strategy, to compile my own list, which I hope will be more bioscience specific. Maybe through such tag aggregation we can all start to learn what makes a good search strategy.

I would definitely recommend reading Jon Udell’s post and the comments therein, and I also hope this practice of compiling search strategies catches on as we all learn to handle the gigabytes of omics information.

refs : Google Hacks

Search, in this Omics age

See “Search in this Omics age”

Search in this OMICS age

Often in the membrane protein world, we attempt to clone and express proteins from the microbial world that are homologous to a particular human protein. Typically we identify a homolog of interest using BLAST and then try to PCR the gene out of genomic DNA from the organism that carries that homolog.

Just this morning I wanted to clone a homolog of a gene we study here in the lab from the bug Yersinia pseudotuberculosis. The first step was to look for genomic DNA for the bug in the ATCC collection, which is, as its logo says, “A Global Bioresource Center”.
Using a simple word lookup across all the collections at the ATCC, I found a list of 22 entries – success. This list informed me that the same bug had been deposited into the ATCC under several different names. In addition, there was no genomic DNA for Yersinia pseudotuberculosis, but I could get genomic DNA for another member of the Yersinia genus, Yersinia enterocolitica. Since we always prefer to PCR from genomic DNA, I decided to check whether Y. enterocolitica had the same protein.
So here was my search strategy and the results obtained:

  1. I queried the NCBI database and learned that the genome was still unfinished, since sequences for only a few plasmids were present – partial success
  2. Next, from the genome link I got to the information on the genome and learned of the use of an NCBI field query, [orgn], as in txid630[orgn] – partial success
  3. I then used the linkout from the Lineage line, which took me to the Sanger Institute, the center sequencing the Yersinia enterocolitica type O:8 genome. I tried to BLAST my sequence against that genome but could not find Yersinia enterocolitica in the list there – failure
  4. I then went back to the genome page and found a link to a different genome project for the bug, this time at the Walter Reed Army Institute. This time the “BLAST genome” link led me to another NCBI page with a direct plugin for running a BLAST query against the genome – success
  5. I then pasted in the protein FASTA sequence of the particular protein I was interested in and tried to run a BLAST, and I got an error which said “INFO: No alias or index file found for component [Microbial/630], type [protein] in search path [/export/home/splitd/blastdb/blast1:/blast/db/disk.blast/blast1::]”. This seemed more like a database error than a failure to find the sequence in the bug – failure
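
The [orgn] field query mentioned in the steps above can also be sent to NCBI from a script instead of a web form. Below is a minimal sketch in Python, assuming the public E-utilities ESearch endpoint; the helper names `organism_term` and `esearch_url` are made up for illustration and are not part of any NCBI library. It only builds the query URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def organism_term(taxid, extra=None):
    """Build an Entrez term restricted to one taxonomy ID, e.g. txid630[orgn].

    `extra` is an optional additional Entrez expression that is ANDed on.
    (Hypothetical helper, named for this example only.)
    """
    term = f"txid{int(taxid)}[orgn]"
    if extra:
        term = f"{term} AND {extra}"
    return term


def esearch_url(db, term, retmax=20):
    """Return the ESearch URL for a database and term; no request is sent."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH_URL}?{query}"


if __name__ == "__main__":
    term = organism_term(630)
    print(term)
    print(esearch_url("protein", term))
```

Opening the printed URL in a browser should return an XML list of matching record IDs. Keeping the term construction separate from the URL makes it easy to save the exact query string alongside notes on whether it worked.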

So in the end I was just confused, learnt not to go looking for homologs among unfinished genomes, and considered my time wasted.

I then read a post from Jon Udell titled “Hunting the elusive search strategy”. In that post Jon talks about how some people are actually willing to pay “tech support” $100 for running a Google search, and further that

“Effective search depends on reservoirs of tacit knowledge and unconscious skill. Some people possess much deeper reservoirs, and/or can tap into them more effectively, than others. That makes them valuable.”

This made me re-evaluate my earlier appraisal of my two hours as “time wasted”. For sure, I will now always look at the “linkouts” mentioned above in future searches. All of the pages I navigated through are now part of my reservoir of tacit knowledge, and I hope they will be integrated into my future search methodology.

Jon Udell also proposes to compile a list of his search strategies under two tags. So, in the hope that this practice catches on, I will start my own collection here.
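
One way to keep such a collection is to record each step of a search together with its outcome, much like the success / partial success / failure notes in the list above. This is only a sketch; the class and field names (`SearchStep`, `SearchStrategy`, `add`, `summary`) are invented for this example and do not come from Jon Udell's post or from any tagging service.

```python
from dataclasses import dataclass, field


@dataclass
class SearchStep:
    """One action taken during a search and how it turned out."""
    action: str
    outcome: str  # e.g. "success", "partial success", "failure"
    note: str = ""


@dataclass
class SearchStrategy:
    """A named search with its tags and the ordered steps that were tried."""
    goal: str
    tags: list = field(default_factory=list)
    steps: list = field(default_factory=list)

    def add(self, action, outcome, note=""):
        # Append a step in the order it was performed.
        self.steps.append(SearchStep(action, outcome, note))

    def summary(self):
        # Count how many steps ended in each outcome.
        counts = {}
        for step in self.steps:
            counts[step.outcome] = counts.get(step.outcome, 0) + 1
        return counts


if __name__ == "__main__":
    s = SearchStrategy("Find a Y. enterocolitica homolog",
                       tags=["ncbi-serach-strategy"])
    s.add("Word lookup at the ATCC", "success")
    s.add("Query NCBI for the genome", "partial success")
    s.add("BLAST at the Sanger Institute", "failure")
    print(s.summary())
```

Even a record this simple preserves the dead ends, which are the part of a search most easily forgotten.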

Now, I only hope someone pays me $100 to run such a search in the future.
