Why are our bioinformatics workflows so complicated?

Last week, to answer one question, I had to draw on information from several sources. Many of them contributed immense value to my “workflow”, but the steps ranged from very easy to very difficult to perform. To start, I have scored each step on both value (1 for no value to 10 for a lot of value) and ease of use (1 for very complicated to 10 for very easy).

# Assembling my sequences in DNASTAR (Value 10 : Ease 7)

# Compiling my sequences and pulling them into Jalview. I ran the CLUSTALW web service on the edited alignments and realized that all of my clones had basically two sequences for their CDRs. Jalview's excellent CLUSTALW web-service interface allowed me to quickly edit the 32 sequences, align them interactively and see that they belonged to two types. This got me thinking that the primers I used to clone the CDRs from my mouse kappa light chains were probably mis-priming (Value 10 : Ease 9)

# Using PubMed to look at precedents, i.e. going through all the papers I could find that had sequenced the mouse antibody kappa light chain CDR region as I had attempted to do, and deriving the sequences of the primers they had used. It took forever to get the right keywords to query and I still have only three kappa light chain primer sequences. And they are all different! (Value 10 : Ease 1)

# Taking my primer sequences, comparing them with the literature and figuring out how I had mis-primed and why my sequences were all one of two types (still in progress; Value immense : Ease 1, i.e. still difficult to do)

# Using PubMed / NCBI Genome to understand the sequence space for mouse kappa light chains (Value 10 : Ease 4)

# Using EBI to get the same sequence data (Value 10 : Ease 8)
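The primer comparison step is the kind of thing a few lines of script can take some of the eyeballing out of. What follows is only a rough Python sketch under some loud assumptions: the two primers are the same length and already lined up end to end, and IUPAC degenerate codes (Y, R, W and so on) count as a match for any base they stand for. The function names and both sequences are made up for illustration; they are not my primers or anyone's published ones.

```python
# IUPAC nucleotide codes mapped to the plain bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def bases_compatible(a, b):
    """True if two IUPAC codes share at least one possible base."""
    return bool(set(IUPAC[a.upper()]) & set(IUPAC[b.upper()]))


def primer_mismatches(primer_a, primer_b):
    """Return the 0-based positions where two equal-length primers disagree."""
    if len(primer_a) != len(primer_b):
        raise ValueError("this simple comparison needs primers of equal length")
    return [
        i
        for i, (a, b) in enumerate(zip(primer_a, primer_b))
        if not bases_compatible(a, b)
    ]


# Hypothetical sequences, for illustration only.
my_primer = "ATGCCGTTAGCA"
published_primer = "ATGYCGTWAGGA"

print(primer_mismatches(my_primer, published_primer))
```

A real comparison would probably need a proper alignment first, since primers from different papers rarely start at the same position, so treat this as a starting point only.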

This is still a work in progress. But to summarize:

The PubMed steps were the most painful. PubMed search has to improve!

Jalview contributed the most value. For a free app, it's a must-have in any bioinformatics toolkit! DNASTAR played its role, but for its cost (a few thousand dollars) it sure gave a lot less value than Jalview.

All of this raises the question: why are bioinformatics workflows so difficult? We are a long way from making these things easy for everyone to do!


NCBI oddities

I have often blogged about my trials and tribulations with the NCBI database. This morning I was trying to locate all the kappa light chain genes in the NCBI database.

I tried the following search:

Immunoglobulin kappa mouse in the Genome database subsection.

The results I got were a curious mix of microbe genomes ranging from Aspergillus niger to Salmonella enterica. Maybe I left my search skills at home or my eyes are playing tricks on me.
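One way a search like this might be narrowed is a fielded query through NCBI's E-utilities `esearch` endpoint, with the organism pinned down explicitly. The Python sketch below only builds the request URL and does not call the network. Pointing it at the Nucleotide database (`nuccore`) rather than Genome, and the particular `[Title]` and `[Organism]` field tags, are assumptions for illustration and have not been checked against this exact search. The helper name `build_esearch_url` is made up.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Build (but do not send) an ESearch request URL."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Assumed query: restrict the organism with a field tag instead of
# relying on the free-text word "mouse".
term = 'immunoglobulin kappa[Title] AND "Mus musculus"[Organism]'
url = build_esearch_url("nuccore", term)
print(url)
```

Pasting the printed URL into a browser returns the matching record IDs as XML, which at least makes it easy to see whether the hits are mouse sequences or microbes.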

Addendum: Eric Jane from UniProt showed me how to do the same query on UniProt beta. UniProt really rocks. Not only could I do the query, but I could also download the results in batch mode as FASTA sequences and in XML format. Thanks, Eric. I would definitely recommend UniProt beta to everyone. Isabelle Phan from UniProt posted an excellent screencast detailing the features of UniProt beta. Do check it out, as well as Eric's comments below.
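For anyone who wants to script that kind of batch download, here is a rough Python sketch. It targets UniProt's present-day REST service at `rest.uniprot.org`, not the beta site described above, and it only builds the URLs without fetching anything. The query string (mouse taxonomy ID 10090 combined with a free-text phrase) is my guess at an equivalent search, not the query Eric showed me, and the helper name `build_uniprot_url` is made up.

```python
from urllib.parse import urlencode

# Search endpoint of the current UniProtKB REST service.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_uniprot_url(query, fmt="fasta", size=500):
    """Build (but do not send) a UniProtKB search URL for a given format."""
    params = {"query": query, "format": fmt, "size": size}
    return UNIPROT_SEARCH + "?" + urlencode(params)


# Assumed query: mouse (NCBI taxonomy ID 10090) plus a free-text phrase.
query = '(organism_id:10090) AND ("immunoglobulin kappa")'

# Same query, two output formats, mirroring the FASTA and XML downloads.
fasta_url = build_uniprot_url(query, fmt="fasta")
xml_url = build_uniprot_url(query, fmt="xml")
print(fasta_url)
print(xml_url)
```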