Category Archives: genomics

My attempts at explaining personal genomics

The other day I was talking to my parents about the fascinating world of personal genomics, and I ventured to write up my description of some of the ideas behind the HapMap and personal genomics in an email.

I am reproducing the email here, hoping I can get pointers on explaining things better (I am, after all, only a biochemist/structural biologist).

The world of personal genomics is upon us. Companies like 23andMe and deCODE will run your genetic sample against a known list of variation markers and tell you things about yourself as suggested by your genes, or they will tell you which markers you share with which groups of people. Although this sounds amazing, a lot of it is very nuanced, and understanding it is a fun exercise. All of this will change everything, or at least it has the potential to.

Let's start at the beginning: when you ask to have yourself "typed", what exactly do your genes have? What is this stuff all about?
We are all quite different; you and I probably have several hundred thousand differences between us. To estimate exactly how different we are would require fully sequencing both of our genomes, which is expensive and takes a lot of time.
Instead, imagine I told you that scientists have figured out that these differences occur in groups, i.e. they are linked together. Very crudely: if one of these differences jumps from you to your child, it takes a few thousand of its neighbors along for the ride. So instead of gathering information about the several hundred thousand actual differences, we can learn a lot by looking at just tens of thousands of these labels. For any particular label (or marker, or SNP) we can look at all the variation determined thus far; e.g. at position 59, all known human variation has either an A or a T, so you can belong to one of those two groups. Once you know such a label, you can infer the rest of the variation that the label is tied to. Collecting information on these labels is what projects like the HapMap do, and it is exactly the identity of these labels that a genotyping service will provide you with.
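The idea of inferring linked neighbors from one typed label can be sketched in a few lines. A toy example (the marker names and linked variants below are invented for illustration, not real HapMap data):

```python
# Each tag SNP allele stands in for a whole block of variants that
# travel with it from one generation to the next.
# These marker names and linked alleles are made up for illustration.
tag_snps = {
    "rs0059_A": ["pos102:G", "pos340:T", "pos871:C"],
    "rs0059_T": ["pos102:A", "pos340:C", "pos871:G"],
}

def infer_linked_variants(observed_tag):
    """Given the allele seen at a tag SNP, return the variants it implies."""
    return tag_snps.get(observed_tag, [])

# Typing one label tells you about the whole linked block:
print(infer_linked_variants("rs0059_A"))  # ['pos102:G', 'pos340:T', 'pos871:C']
```

This is the whole trick behind genotyping chips: measure the tags, infer the rest.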

So what's the big deal? All of science is trying to figure out what makes person A die of a heart attack before he is 20, while person B lives till he is 80. As we all know, there are two parts to this story: "nature, or your genes" and "nurture, or your lifestyle". Science can more readily attempt to understand your genes, because that is "hard" information. So in the case of persons A and B, science asks: what in their genes might have led to these outcomes?
Coming back to the point I made in the previous few paragraphs: instead of scanning the entire genome for differences between persons A and B, we can start by asking which markers they share and which they do not. Then, taking the markers they don't share, we ask which of those are common among people who had heart attacks early. Looking at this information may give us clues about which genes led A and B to their respective life expectancies.

Let's take another example: say you want to test in advance "what genes cause allergies to sulfa drugs". You would give many people sulfa drugs, check which of them were allergic, and look at what groups their genes belong to. At the end you ask the obvious question: all those people who came down with severe allergies, what group (or markers) did they have in common? In most cases the answer is not a single gene or a single number, but for simplicity it looks something like: if you have marker A59C (a single nucleotide polymorphism, or SNP), then you have a 20% chance of being allergic. Also, since most traits don't depend on only one label or marker, the answers are quite diffuse and are given in terms of probabilities. Say the 20% chance with A59C alone may become an 80% chance in combination with label G456A, and a 0.2% chance if you also had F555A. Do you get the point? If you don't, don't worry; as you can tell, it is quite complicated!
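The way those probabilities shift with marker combinations can be mimicked with a simple lookup. A toy sketch (the markers and percentages echo the made-up example above; real risk models are far more involved):

```python
# Made-up risk table keyed by the combination of markers a person carries.
# Echoes the toy numbers in the text; not real pharmacogenomic data.
risk_table = {
    frozenset(["A59C"]): 0.20,
    frozenset(["A59C", "G456A"]): 0.80,
    frozenset(["A59C", "F555A"]): 0.002,
}

def allergy_risk(markers):
    """Report the risk for the most specific marker combination we know about."""
    best_risk, best_size = 0.0, 0
    for combo, risk in risk_table.items():
        if combo <= set(markers) and len(combo) > best_size:
            best_risk, best_size = risk, len(combo)
    return best_risk

print(allergy_risk(["A59C"]))           # 0.2
print(allergy_risk(["A59C", "G456A"]))  # 0.8
```

The same marker can mean very different things depending on what else it is seen with, which is exactly why the reports read as probabilities rather than verdicts.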

Regardless, the more we ask such questions, the more we learn about these probabilities, and that is what most genetic research is looking to do.
So instead of studying, say, 10,000 mostly white Americans, things become more meaningful if you study 1,000,000 people from every corner of the earth. Then the numbers may all add up and give us a clearer label to associate with any given outcome, like a heart attack. That is what the HapMap is all about.

Anyway, getting back to the point: such studies are what give rise to the field of personal genomics, i.e. look at what markers you have and compare them with known marker-outcome associations or known drug-effect associations.

As I have hopefully convinced you by now, these studies are very nuanced. People expect cut-and-dried answers, and many may come away disappointed. Also, people prone to hypochondria should definitely stay away, lest that 2% chance of a heart attack be converted into a very high chance of actually having one, thanks to the stress you put yourself through as a result of this knowledge (the nurture, or lifestyle, part).

This also points to how these studies, if carried out correctly, will change a lot of things: medicine, health care, the very nature of how we view ourselves.
These studies can attempt to answer questions like how similar south and north Indians are genetically. And as I just told you, nature is not cut and dried, and neither is human history! So interpreting results with social or political implications is a double-edged sword.

Before I end, you can check out one of the sample reports given out by companies like 23andMe.

So, are you excited? Do you want to find out whether you have a 10% chance of arthritis, or a 40% chance of descending from a Caucasian lineage?

More reading on this topic:

For the medically inclined: Science magazine has published an article on the implications of these studies for health care.

The salt lakes in the Egyptian desert, the Sargasso Sea, and the cutting edge in the optical querying of neuronal circuits

I recently went to a talk by Karl Deisseroth of Stanford University, whose lab has been at the forefront of developing tools that allow neurons to be activated or de-activated using pulses of light in combination with expressed opsin transgenes. The opsins are basically photoactivated proteins (like the rhodopsin in our retina). Deisseroth's lab had a while ago developed the use of a channelrhodopsin, ChR2, which shows a light-activated cationic (positive ion) current. Thus, when ChR2 is expressed in neurons, a flash of blue light triggers a current of sodium and potassium ions that resembles the neuron firing an action potential. The group then set out looking for opsins with the opposite effect, i.e. ones that accelerate the suppression of the neuronal action potential (or bring the neuron back to rest) in response to light. The protein had to be functional in neurons and bring about the light-activated hyperpolarization on the time scale of the action potential.

After trawling through a few opsins, the Deisseroth group, in close collaboration with the Georg Nagel lab at the University of Würzburg in Germany, zeroed in on the opsin from Natronomonas pharaonis (NpHR, or Halo), an archaeon that lives in transient salt lakes in the Egyptian desert at salt concentrations as high as 3.5 M NaCl and a pH of 11. The bug possibly uses its opsins to capture energy from sunlight and drive the uptake of chloride, which allows it to survive in the very salty lake it lives in.

This new opsin was able to drive a chloride current on photoactivation, which effectively turned the neuron off. Fortunately for the group, the NpHR opsin was also activated at a wavelength entirely different from that of the "on switch" ChR2.

Concurrently with the publication in Nature, the group released a video of a worm expressing both opsins, which could be paralysed and tickled into movement by pulses of light stimulating NpHR and ChR2 respectively. In the talk, Deisseroth also spoke of developments to move the experiments into a mouse model, where a transgenic mouse would have a fibre-optic probe inserted into its head, allowing light to stimulate a precise area of the brain (e.g. a pulse of light would turn on the neurons controlling whisker movement, and the mouse would twitch its whisker).

All of this clearly opens the road to other such opsins that might respond to other wavelengths and drive excitatory and inhibitory currents with different properties. Interestingly, way back in 2004 the metagenomics initiative by Craig Venter (the Sorcerer II expedition) published details of several novel opsins from the Sargasso Sea mass-sequencing samples. Who knows, maybe the next such opsin will come from some deep-sea archaeon and have totally different spectral and kinetic properties, allowing an added level of control in the optical querying of neuronal circuits.

References and additional material

  1. Video of light-stimulated suppression of worm twitching upon NpHR (Halo) activation
  2. NpHR genome project page
  3. Free full text PlosBiology paper on NpHR by Xue Han and Edward Boyden
  4. Image link from an article in the MIT technology review
  5. MIT Technology Review article on ChR2 and NpHR (worm video link)
  6. Request the ChR2 and NpHR (Halo) plasmids deposited by the Boyden lab at addgene
  7. The TED talks include an interesting one by Craig Venter; around minute 3:46 he talks about the opsins
  8. Full-text (free access) PLoS Biology article on the Sorcerer II data; one of the figures contains an analysis of the spectral characteristics of proteorhodopsins from the metagenomics dataset

The Resistome – the superbug arsenal characterized

I have heard of many large-scale omics studies and their resultant "omes", but it was only last week, while reading a review in the journal Cell on drug resistance in bacteria, that I chanced on a reference to the soil bacteria resistome, published by D'Costa et al. almost a year ago.
The paper maps the spectrum of antibiotic resistance among 480 Streptomyces strains, the bacteria that produce several of the classes of antibiotics used to kill other soil-dwelling microbes. These microbes also encode a myriad of resistance mechanisms to make sure they survive the battle themselves, and the mechanisms brought into play by these soil-dwelling bugs mimic those seen in clinically relevant bacteria. It is more a question of WHEN, not IF, most of these mechanisms will surface in the clinical world. Characterizing the diversity and density of resistance mechanisms prevalent in the environment is therefore extremely relevant given the emergence of many superbugs worldwide. The most surprising finding of the study was that very diversity and density of resistance mechanisms across all the strains.

The paper tested resistance to 20 different antibiotics spanning every known class of antibiotic produced. Shockingly, several of the bacteria were resistant to all 20 of them, with the average bacterium resistant to at least 7 different classes. Some of the antibiotics tested were ineffective in almost 100% of cases. Surprisingly, the newly launched daptomycin, which is effective against some multi-drug-resistant microbes found in hospitals, was inactive against almost all of the soil isolates.
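The study's headline numbers are just summary statistics over a strain-by-antibiotic resistance table. A toy sketch of that bookkeeping (the strains and profiles below are invented, and tiny compared to the real 480-strain panel):

```python
# Toy resistance profiles: strain -> set of antibiotic classes it resists.
# Invented data; the real study profiled 480 strains against 20 antibiotics.
profiles = {
    "strain_1": {"beta-lactam", "macrolide", "tetracycline"},
    "strain_2": {"beta-lactam", "macrolide", "aminoglycoside", "glycopeptide"},
    "strain_3": {"macrolide", "tetracycline"},
}

def average_resistance(profiles):
    """Mean number of antibiotic classes resisted per strain."""
    return sum(len(classes) for classes in profiles.values()) / len(profiles)

def fraction_resistant(profiles, drug_class):
    """Fraction of strains resistant to a given antibiotic class."""
    hits = sum(1 for classes in profiles.values() if drug_class in classes)
    return hits / len(profiles)

print(average_resistance(profiles))               # 3.0
print(fraction_resistant(profiles, "macrolide"))  # 1.0
```

On the real data these two numbers come out as roughly 7-8 classes per strain and near-total resistance for some drugs, which is what makes the paper so sobering.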
Resistance profile diagram
The authors also probed the resistome for possible mechanisms of inactivation and suggested that novel mechanisms, as well as variations of known mechanisms, were operational within it. The resistome illustrates how clever microbes are at outsmarting even the most well-thought-out antibiotic. Before we even think of creating one antibiotic to rule over all others, chances are that resistance to it is already lurking in some niche of the microbial world. The resistome cannot predict how likely these resistance mechanisms are to transfer from the soil to a bug that creates problems as it sweeps through a hospital ward, but it does make a case for using modern tools in drug discovery for this class of drugs, which indeed introduced chemotherapeutics to everyday people.
Superbugs have been the stuff of many a popular cover story. Drug-resistant tuberculosis, cholera, malaria, etc. continue to wreak havoc in the developing world. Recent articles in Nature and other journals spoke of the trials and tribulations of platensimycin, the only new class of antibiotic to be discovered in nearly two decades. Even that was hardly "resistance-proof", in that over-expression of its target protein was able to confer resistance on the bug. Although the resistome, you could argue, makes the case that no antibiotic is resistance-proof, it definitely underscores the importance of improving the diversity of our arsenal against infectious bacteria. Given that our pace of antibiotic discovery keeps slowing, the resistome puts some quantitative muscle behind the calls for renewed drug-discovery efforts in this area.

Image courtesy  Dr Gerry Wright Lab homepage

Blink and it's done

The Blink database at the NCBI

The NCBI, which I have blogged about before, has a number of outgoing links embedded in its search results. One that I use extensively is the Blink link, which is basically a precompiled BLAST run. The Blink exists for every annotated sequence in the database and is a great way to look at homologs of any given sequence without messing around with cutting and pasting sequences into web forms or fiddling with BLAST input parameters. It's as easy as clicking a link: the results come nicely laid out, with all the homologs color-coded by taxonomy, i.e. archaea, bacteria, fungi, plants, etc. Clicking on any graphic takes you to the pairwise alignment. Clicking on a GI (a unique numeric ID for every sequence in the database) takes you to the Blink page for that GI.
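Blink's taxonomy-grouped layout is easy to mimic for your own hit lists. A minimal sketch (the GIs, descriptions, and groupings below are invented for illustration, not real Blink output):

```python
from collections import defaultdict

# Toy BLAST hits: (subject GI, description, top-level taxonomic group).
# The records below are made up for illustration.
hits = [
    ("15832217", "rhodopsin-like protein", "bacteria"),
    ("16128552", "sensory rhodopsin", "archaea"),
    ("30694523", "opsin homolog", "fungi"),
    ("21536342", "putative rhodopsin", "bacteria"),
]

def group_by_taxonomy(hits):
    """Bucket hits by taxonomic group, like Blink's color-coded rows."""
    groups = defaultdict(list)
    for gi, desc, taxon in hits:
        groups[taxon].append((gi, desc))
    return dict(groups)

for taxon, members in sorted(group_by_taxonomy(hits).items()):
    print(taxon, "->", [gi for gi, _ in members])
```

This grouping step is the entire value of the Blink view: the raw BLAST hit list is flat, and the taxonomy buckets are what let you see the evolutionary spread at a glance.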

This interface is very powerful and a great way to explore the sequence space of any protein. I have put together a screencast documenting one of my recent explorations; you can see it at the YouTube link above.

Documentation: The Blink documentation

Systems biology education: bringing quantitation to biology

Nature Reviews Molecular Cell Biology has an excellent, freely accessible supplement called the systems biology user guide. In it are various sections on the applications and challenges facing systems biology. One of them is an essay on education for systems-level biologists titled "Back to the future".

The essay talks of how graduate-level scientists are hardly trained to appreciate the interdependence of modern research on concepts across physics, mathematics, and biology. Most undergraduate science majors, regardless of their "concentration", don't really know much of the disciplines outside their focus. Consequently, at the graduate level they are quite at a loss working on problems like systems biology, which clearly require a firm grasp of concepts across all of these disciplines. The article addresses the question of when and how to teach these concepts to future scientists and concludes that an undergraduate introductory class in biology is probably the best time to commence such an education. Wingreen and Botstein then relate their experiences conducting a seminar class at Princeton targeted at early graduate students with a mix of biology, physics, and math backgrounds.

The seminar class used classic papers from early and relatively recent biology that drew on skills across the three fields. Each paper represented a major breakthrough in biological understanding, arising from keen mathematical and physical insights applied to a biological observation.

Interestingly, almost all these classic papers came from the 1940s to the 1990s, a time when, the authors observe, "biologists' and physical scientists' educations were less different than they are today".

The seminar course at Princeton serves as an interesting model for preparing future graduate students to function and do research in the systems world. A lot of the lessons in these early papers are curiously being re-learnt by modern-day practitioners of systems biology. I think anyone wishing to appreciate the systems perspective would benefit immensely from re-reading some of these classic papers.

I, for one, being a trained reductionist, plan to go and read all of these classic papers and attempt to "re-educate" myself. Hopefully this will help me get a better grasp of the systems approach and develop a more quantitative frame of mind.

refs: Mol 515 at Princeton

Back to the future: education for systems-level biologists

The HapMap and personalized medicine

The October 2005 issue of Nature had the complete results of the first stage of the HapMap project. I first came across the HapMap when I heard Chris Smith's Nature podcast for that issue. The first stage marked the analysis of polymorphisms in 270 individuals from diverse ethnic backgrounds. Although I am no geneticist, the great thing about the HapMap data is that it finally allows us to understand human genetic variation.

We have all heard that the genetic differences between any two of us are very, very small. The HapMap is basically a way to understand what those differences are and how small "very small" is. Perhaps one of the facts emerging from studies leading up to the HapMap project was the extent of linkage disequilibrium among our differences. I will not even attempt to fully explain linkage disequilibrium; suffice to say that disequilibrium arises from differences traveling together from one generation to the next, i.e. they are linked. Now why is that a big deal? Well, it means that instead of studying the 7 million odd "polymorphisms" (i.e. the differences between you and me), thanks to linkage disequilibrium you can get a good picture of how "generally different" you are from me by looking at just around 10,000 places.

So why is this a big deal? Because it drastically reduces the cost and effort of tracking down these differences.

Now think of a drug undergoing clinical trials. We have all heard people remark, "I don't respond well to amoxicillin but do just fine with Zithromax" (two different antibiotics). Now imagine a clinical trial where that "you" and "me" are linked to your haplotype (the collection of differences) and mine. At the end, the trial might show a clear picture: people with haplotype X cannot tolerate this drug, while people with haplotype Y can. That is powerful information that can potentially save lives.
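The bookkeeping behind that kind of result is simple in principle: tally adverse reactions per haplotype. A toy sketch (the haplotype labels and outcomes below are invented, and real trials involve far more careful statistics):

```python
from collections import Counter

# Toy trial records: (participant haplotype, had adverse reaction?).
# Invented data for illustration.
trial = [
    ("X", True), ("X", True), ("X", False),
    ("Y", False), ("Y", False), ("Y", True), ("Y", False),
]

def adverse_rate_by_haplotype(records):
    """Fraction of participants with an adverse reaction, per haplotype."""
    totals, adverse = Counter(), Counter()
    for haplotype, reacted in records:
        totals[haplotype] += 1
        if reacted:
            adverse[haplotype] += 1
    return {hap: adverse[hap] / totals[hap] for hap in totals}

print(adverse_rate_by_haplotype(trial))  # haplotype X reacts far more often
```

In a real trial the per-haplotype rates would then be tested for statistical significance before anyone dared put them on a drug label.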

The HapMap and the study of such polymorphisms also have powerful implications for the study of human anthropology, human history, and natural selection in humans. In this context, the online seminar from Science magazine explaining how SNP data is used to examine natural selection makes for very interesting viewing (see References).

Overall, there is no denying the impact of rapid genotyping on understanding and classifying human responses in clinical trials. Of course, just as the launch of BiDil, the first pre-HapMap-era "personalized drug" specifically targeted at African-Americans, set off a whole slew of ethical debate, I am positive that having HapMap data guide drug discovery or dosage may also be a mixed blessing in this era of genomic medicine.

Relevant links:

Clues to Our Past: Mining the Human Genome for Signs of Recent Selection

Online Science magazine flash seminar on evidence for positive selection in human lineages, and the review by Sabeti et al.

Race based drug may halt trial for better clinical markers

Nature News and Views on the HapMap project

Wikipedia entry with an excellent description of what a haplotype and a SNP are

CRACking the scene: high-throughput screens identify mediators of calcium currents

The endoplasmic reticulum serves as the intracellular store of Ca2+, one of the major mediators of signals inside a cell. Typically a signal arrives at a protein receptor on the cell surface and is conveyed into the cell by a cascade of events that culminates in the release of Ca2+ from the ER stores, mostly into the cytoplasm. In many cells, like immune T cells, this dumping of stored calcium leads to the opening of calcium-release-activated calcium channels on the cell surface, which then restore Ca2+ levels inside the cell. These calcium-release-activated calcium currents (the so-called CRAC currents) have been observed in a variety of cell types, but the exact molecular nature of the current carriers (the wondrous ion channels) was unknown to science. Now three papers, in Nature, Science, and PNAS respectively, have all identified one of the genes that plays a role in mediating these CRAC currents.

The reason this caught my eye was twofold: one, I recently started working on ion channels, and two, all three papers used their own versions of high-throughput knockdown screens to arrive at the exact same gene. These studies all used whole-genome RNAi in Drosophila (fruit fly) cells to abolish function. The approach involves disrupting gene function, gene by gene, one by one, using small pieces of RNA applied to cells, and then looking for a change: in this case, a failure to replenish calcium levels in the cytoplasm after the stores had been dumped. Thus, genes involved in CRAC would kill CRAC currents when their RNAi pieces were introduced into the cells (all done in a 384-well high-throughput format), and those cells would not recover after losing their endoplasmic reticulum calcium.

Such screens are widely believed to be plagued by error and irreproducibility owing to their complexity: false negatives are a big problem, and false positives equally so. Although these caveats are well known and often brandished by the anti-systems-biology brigade in established science, studies like these point to the maturing of high-throughput screening approaches for identifying gene function. It is also heartening that all of the studies followed through on their 20-100 initial candidates with secondary assays, also conducted in a high-throughput manner (like SNP genotyping analysis and high-throughput patch clamping), and arrived at the identical gene. This gene is now an important candidate for mediating the CRAC currents.
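The convergence of the three screens boils down to intersecting candidate hit lists, which washes out many of the screen-specific false positives. A toy sketch (the gene names below are placeholders, not the actual screen hits):

```python
# Hypothetical candidate hits from three independent RNAi screens.
# Placeholder gene names; not the real published hit lists.
screen_nature = {"geneA", "geneB", "geneC", "geneQ"}
screen_science = {"geneB", "geneD", "geneQ"}
screen_pnas = {"geneB", "geneE", "geneQ", "geneF"}

# Genes implicated by all three screens survive the intersection;
# these are the candidates worth carrying into secondary assays.
consensus = screen_nature & screen_science & screen_pnas
print(sorted(consensus))  # ['geneB', 'geneQ']
```

A false positive unique to one screen never makes it through, which is why agreement across three independent screens is such strong evidence.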

There is no doubting that this initial identification will lead to the identification of other partners involved in CRAC currents, which are of key importance in understanding the molecular basis of T-cell activation and the several diseases that result from impaired CRAC currents.

refs: the Nature paper by Feske et al., the Science paper by Vig et al., and the PNAS paper by Zhang et al.