Search in this OMICS age

Often in the membrane protein world, we attempt to clone and express proteins from the microbial world that are homologous to a particular human protein. Typically we identify a homolog of interest using BLAST and then try to PCR the gene out of genomic DNA from the organism that carries it.

Just this morning I wanted to clone a homolog of a gene we study here in the lab from the bug Yersinia pseudotuberculosis. My first step was to look for genomic DNA for the bug in the ATCC collection, which is, as its logo says, “A Global Bioresource Center”.
Using a simple word lookup across all the collections at the ATCC I found a list of 22 entries – success. This list informed me that the same bug was deposited into the ATCC under several different names. In addition, there was no genomic DNA for Yersinia pseudotuberculosis, but I could get genomic DNA for another member of the Yersinia genus, Yersinia enterocolitica. Since we always prefer to PCR from genomic DNA, I decided to check whether Y. enterocolitica had the same protein.
So here was my search strategy and the results obtained:

  1. I queried the NCBI database and learned that the genome is still unfinished, since sequences for only a few plasmids were present – partial success
  2. Next, from the genome link I got to the information on the genome and learned of the use of an NCBI field query [orgn], as in txid630[orgn] – partial success
  3. I then used the linkout from the Lineage line, which took me to the Sanger Institute, the center sequencing the Yersinia enterocolitica type O:8 genome. I tried to BLAST my sequence against that genome but could not find Yersinia enterocolitica in the list there – failure
  4. I then went back to the genome page and found a link to a different genome project for the bug, this one at the Walter Reed Army Institute. This time the “BLAST genome” link led me to another NCBI page with a direct hook into running a BLAST query against the genome – success
  5. I then pasted in the protein FASTA sequence of the particular protein I was interested in and tried to run a BLAST, and I got an error which said “INFO: No alias or index file found for component [Microbial/630], type [protein] in search path [/export/home/splitd/blastdb/blast1:/blast/db/disk.blast/blast1::]”. This seemed more like a database error than a failure to find the sequence in the bug – failure
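
For my own notes, here is a rough sketch of how the [orgn] field query from the steps above could be reused outside the browser. It only builds the search term and a URL for NCBI's E-utilities ESearch endpoint; it does not contact the server. The function names `organism_term` and `esearch_url` are my own hypothetical helpers, the extra `transporter[title]` filter is just an illustration, and I am assuming that taxonomy ID 630 is the one the genome page showed for Yersinia enterocolitica.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public search service behind Entrez).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def organism_term(taxid, extra=""):
    """Build an Entrez term restricted to one organism via the [orgn] field.

    `taxid` is an NCBI taxonomy ID; 630 is assumed here to be the ID the
    genome page gave for Yersinia enterocolitica.
    """
    term = f"txid{taxid}[orgn]"
    if extra:
        # Combine with a further condition using Entrez's boolean AND.
        term = f"{term} AND {extra}"
    return term


def esearch_url(db, term, retmax=20):
    """Return an ESearch URL for `term` in database `db` (no request is made)."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH_URL}?{query}"


if __name__ == "__main__":
    # Hypothetical example: protein records for taxid 630 with a title filter.
    term = organism_term(630, "transporter[title]")
    print(term)
    print(esearch_url("protein", term))
```

Pasting the printed URL into a browser should return the matching record IDs as XML, which at least tells you quickly whether any protein records exist for the organism before you go hunting for a BLAST page.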

So in the end I was just confused, learnt not to go looking for homologs among unfinished genomes, and considered my time wasted.

I then read a post from Jon Udell titled “Hunting the elusive search strategy”. In that post Jon talks about how some people are actually willing to pay “tech support” $100 for running a Google search, and further that

“Effective search depends on reservoirs of tacit knowledge and unconscious skill. Some people possess much deeper reservoirs, and/or can tap into them more effectively, than others. That makes them valuable.”

This made me re-evaluate my earlier appraisal of my two hours as “time wasted”. For sure, I will now always look at the “linkouts” mentioned above in future searches. All of the pages I navigated through are now part of my reservoir of tacit knowledge, and will hopefully be integrated into my future search methodology.

Jon Udell also proposes to compile a list of his search strategies under two del.icio.us tags. In the hope that this practice catches on, I will start my own collection here.

Now, I only hope someone pays me $100 to run such a search in the future.
