NCBI oddities

I have often blogged about my trials and tribulations with the NCBI database. This morning I was trying to locate all the kappa light chain genes in the NCBI database.

I tried the following search:

Immunoglobulin kappa mouse in the Genome database subsection.

The results I got were a curious mix of microbial genomes ranging from Aspergillus niger to Salmonella enterica. Maybe I left my search skills at home, or my eyes are playing tricks on me.

Addendum: Eric Jane from UniProt showed me how to do the same query on UniProt beta. UniProt really rocks. Not only could I run the query, I could also download the results in batch mode, both as FASTA sequences and in XML format. Thanks, Eric. I would definitely recommend UniProt beta to everyone. Isabelle Phan from UniProt posted an excellent screencast detailing the features of UniProt beta at this link. Do check it out, as well as Eric's comments below.
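For anyone who wants to script this rather than click through the site, here is a minimal sketch of building the same kind of search as a URL. It rests on assumptions that are not from the original exchange: it targets today's UniProt REST endpoint (rest.uniprot.org) rather than the UniProt beta site, and it uses the field name organism_id, whereas the query Eric gives in the comments uses organism. The function names build_query and build_search_url are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: the current UniProt REST search endpoint, not the old "UniProt beta" site.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_query(terms, taxon_id):
    """Combine quoted search phrases with OR and restrict them to one taxon.

    Assumption: 'organism_id' is the taxonomy field name in the current
    UniProt query syntax; the query in the comments uses 'organism'.
    """
    phrases = " OR ".join(f'"{term}"' for term in terms)
    return f"({phrases}) AND organism_id:{taxon_id}"


def build_search_url(query, fmt="fasta", size=500):
    """Return a search URL asking for results in the given format (e.g. fasta or xml)."""
    params = {"query": query, "format": fmt, "size": size}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # 10090 is the NCBI taxonomy ID for mouse, as used in the comments below.
    query = build_query(["immunoglobulin kappa", "ig kappa"], 10090)
    print(build_search_url(query, fmt="fasta"))
    print(build_search_url(query, fmt="xml"))
```

The sketch only builds the URLs and makes no network request; opening them in a browser or fetching them with any HTTP client should give the batch download described above, provided the endpoint assumptions hold.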


7 responses to “NCBI oddities”

  1. If you run the same query on UniProt you’ll get slightly more useful results, I hope:

    Unfortunately the search engine doesn’t know (yet) that “Ig” is a synonym for “immunoglobulin”, and “mouse” should either be restricted to the “organism” field (click on the suggestion) or, better, replaced with its TaxID (use the query builder):

    If you need to go back to NCBI you can then map these results to GenBank or RefSeq or GeneIDs:

  2. Looks like the second link got truncated, so let’s try this:

    The actual query is:

    (“immunoglobulin kappa” OR “ig kappa”) AND organism:10090

  3. I tried a query similar to the one you suggested as an all-database query on NCBI, but sadly NCBI returns nothing.
    Also, the tinyURL for the query with the faulty NCBI search results is
    I will try the query on UniProt beta as you suggested.
    Thanks, Eric.

  4. Wow, search works so much better. Thanks a tonne!
    Just curious: does UniProt have something like “blink”?


  5. Not exactly, but you can use UniRef to get sequences with 50%, 90% or 100% identity to another sequence, e.g. here are the three clusters for P00750:

    You can also use sequence checksums or map identifiers from other databases if you don’t have a UniProt accession to start with…

  6. Pingback: Why are our bioinformatics workflows so complicated! « The Omics world

  7. We stumbled over here from a different web address and thought I might check things out.
    I like what I see, so now I’m following you. Look forward to checking out your web page again.
