A Tale of Cookies and Milk: How We Adapted to Consuming Grains and Dairy

Humans are curious creatures. We like to poke and prod at new things to see what will happen. This curiosity is part of the reason we are successful. Though it can sometimes lead to disastrous outcomes, curiosity can drive not only cultural inventions but also biological changes. This is especially true for our diet, which has changed radically in the past 10 to 20 thousand years. Two of the biggest changes have been our ability to efficiently digest grains and dairy. The agricultural revolution brought many changes to the human diet, including the addition of grain and dairy, as humans experimented with many new types of food. I’m sure the first individual to start eating grain was met with a warmer reception than the one who suggested we start drinking cow and goat milk. At any rate, both ventures wound up changing our biology and culture. Just think: without amylase and lactase, Santa would be having something other than cookies and milk.

The Short Story of Amylase

In order to digest grains or any other starchy food, an organism needs an enzyme called amylase. Amylase hydrolyzes starch, eventually releasing the glucose molecules contained within the food. Though amylase is not unique to humans, some aspects of human amylase are. In humans, there is a positive correlation between the number of copies of AMY1, the gene responsible for producing salivary amylase, in a genome and the expression of amylase in the saliva. Interestingly, the average human carries about three times as many copies of AMY1 as chimpanzees, suggesting selection for amylase after our split from the common ancestor with chimpanzees. The small differences between the DNA sequences of human AMY1 copies suggest a fairly recent selection event. Moreover, populations with high-starch diets have more AMY1 copies than populations with low-starch diets, further supporting recent selection as well as fairly rapid evolution. When it comes to diet, it seems natural selection can act fairly quickly.

The process of carbohydrate digestion begins in the mouth with an enzyme called ptyalin, also known as salivary α-amylase. Ptyalin hydrolyzes the glycosidic bonds within starch molecules, breaking them down into the disaccharide sugar known as maltose. In the walls of the stomach, specialized cells called parietal cells secrete hydrogen and chloride ions, creating hydrochloric acid. Amylase, which works at an optimum pH of about 7, cannot function in the highly acidic environment of the stomach, so starch digestion pauses there.

The second part of starch digestion is initiated in the small intestine by an enzyme called pancreatic amylase. Though pancreatic amylase and salivary amylase are coded by two different DNA segments, they sit side by side in the genome. It has been suggested that an endogenous retrovirus inserted DNA between the two copies of amylase that existed in our ancestors’ genome; this interruption in the open reading frame of the gene caused a mutation that promoted amylase production in the saliva from one of the gene copies that originally coded for pancreatic amylase. This mutation would have had a clear advantage, allowing for greater breakdown of starchy foods. Further evidence for the positive selection of salivary amylase production can be seen in its independent, convergent evolution in mice and humans.

So the story for amylase is fairly short. Our ancestors began with two pancreatic amylase genes, which split to create one pancreatic and one salivary amylase gene. Over time, copy-number variations in genes occurred and were either selected for or against. Random gene duplication in conjunction with varying diets among human populations has resulted in the amylase locus being one of the most variable copy-number loci in the entire human genome.

The Somewhat Longer Story of Lactase

The Neolithic (agricultural) revolution brought about some of the biggest cultural changes that our species has ever seen. Small groups of hunter-gatherers began to morph into large societies of agriculture-based farmers that existed in tandem with groups of people who lived a nomadic herding lifestyle. Nomadic herders could travel between these newly formed cities, trading meat, milk, or animals for agricultural products such as recently domesticated plants and grains. This substantial change in lifestyle caused a rapid overhaul in many aspects of human biology, including immunity, body size, and prevalence of certain digestive enzymes.

Lactase is the enzyme that breaks down the disaccharide sugar lactose, found in dairy products, into the monosaccharides glucose and galactose. Lactase is an essential enzyme because it allows infants to break down the lactose in their mother’s milk. However, the lactase gene is down-regulated during childhood in a significant portion of the world’s population. Curiously, the portion of the world’s population that does not experience down-regulation of the lactase gene is mostly of European descent. There is an interesting correlation between geographic location and the percentage of the population with lactase persistence: the farther north you go in Europe, the more lactase persistence you find. This probably has to do with the fact that the colder climate of Europe, especially northern Europe, left fewer options for food consumption. The ability to digest and reap the benefits of lactose into adulthood could have acted as a major factor in surviving to reproductive age, thus increasing the prevalence of lactase persistence in those cultures.

Milk has a decent amount of calories and fat to keep energy reserves up, allowing people to survive harsh winters in Northern Europe. In addition, it provides nutrients such as calcium, protein, and vitamins B12 and D. Today in the Western world we see the high caloric and fat content of milk as a threat of weight gain. However, people living in 7000 B.C. would have seen this as a gold mine for survival. As essential as the calories and fats were to Northern European Neolithic people, the vitamin D content of milk may have been equally important. In order for the body to synthesize vitamin D, it needs UVB rays from sunlight. This is an issue at northern latitudes, where it’s colder and there’s less sunlight than in many other areas on Earth. Moreover, the amount of UVB light that can be absorbed depends on the angle at which the Sun’s rays strike the Earth. So even on a clear, sunny day in the winter, people living at northern latitudes may not be absorbing enough UVB.


One way to combat the low levels of UVB rays is to have fair skin. UVB rays that strike the skin cause the synthesis of cholecalciferol (vitamin D3) from 7-dehydrocholesterol that is already present in the skin, eventually leading to the production of a usable form of vitamin D. Specifically, 7-dehydrocholesterol is found predominantly in the two innermost layers of the epidermis. This can be an issue for UVB absorbance, since melanin, the pigment responsible for darker skin, absorbs UVB at the same wavelengths as 7-dehydrocholesterol. Indeed, it turns out that fair-skinned people (who tend to live in colder and more northern climates) are more efficient at producing vitamin D than darker-skinned people.

Vitamin D is really an underappreciated nutrient. It is essential for absorption of calcium, which is used nearly everywhere in the body, from brain functioning to muscle contraction. Recent research has illuminated other uses for vitamin D, including regulation of genes associated with autoimmune diseases, cancers, and infection. One study in Germany found that participants (average age of 62) with low vitamin D levels were twice as likely to die, particularly of cardiovascular problems, in the following 8 years as those with the highest vitamin D levels.

Though it isn’t too important to us today, lactase persistence might have saved the populations of Neolithic people in Northern Europe. Milk’s dose of fat and calories helped bump up energy stores, while the calcium and vitamin D found in whole milk provided significant nutritional benefits. Though there are still many questions surrounding the evolution of lactase persistence in sub-populations of humans, the selection for this phenotype is quite clear. Those with lactase persistence would have had supplemental nutrition that might have helped them survive the Northern European winters.



When DNA Isn’t Enough: Methylation, Forensics, and Twins

DNA evidence is often considered a “home run” in forensics. If you find readable DNA at the crime scene, and it matches a suspect, a correct conviction is almost assured. A DNA sample can often point to a single individual with ridiculous specificity, often 1 in a quadrillion or greater. But what happens when someone else shares your DNA?

Monozygotic, or “identical,” twins differ from dizygotic, or “fraternal,” twins in that they come from the same zygote, hence “mono”zygotic. In other words, identical twins come from one fertilized egg, while fraternal twins come from two. This means that identical twins will share the same DNA, while fraternal twins will share as much DNA as any other sibling pair. There are, of course, many variations of monozygosity depending on when during development the split actually takes place. This nuance has led scientists in Germany to a possible solution to the issue of identical twin DNA.

Early in development, only a few cells are present. These cells begin to differentiate into the different tissue types that they will become. As these cells divide rapidly to produce all of the daughter cells, mutations can occur in the DNA. If a mutation occurs earlier, it will be present in a larger proportion of the daughter cells, and will be more easily detectable during the twin’s lifetime. This also means that the earlier the twins split, the fewer mutations they will have in common (and, thus, the more differences you can detect in their DNA). It has been suggested recently that a handful of single nucleotide mutations, or “SNPs,” can be found between twins. However, these SNPs aren’t so easy to find in a sea of 3 billion other nucleotides. To find these few differences and find them reliably, the entire genome of both twins must be sequenced several times over. In the case of the German scientists, their experiment resulted in 94-fold coverage, meaning they covered each of the 3 billion nucleotides 94 times. This must be done to ensure accuracy. At 3 billion nucleotides, even 99.9% accuracy would still result in 3 million errors in a single pass. If anything, this shows how incredibly accurate our cellular machinery is.
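The arithmetic behind that coverage requirement can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the majority-vote calculation assumes every read errs independently at the same rate, which real sequencing data only approximates.

```python
from math import comb

GENOME_SIZE = 3_000_000_000  # nucleotides, as quoted in the text


def expected_errors(genome_size: int, accuracy: float) -> float:
    """Expected number of miscalled bases in a single pass over the genome."""
    return genome_size * (1.0 - accuracy)


def majority_wrong_probability(coverage: int, accuracy: float) -> float:
    """Chance that more than half of the reads covering one site are wrong.

    Assumption: reads are independent and share one per-base error rate.
    """
    error_rate = 1.0 - accuracy
    first_majority = coverage // 2 + 1
    return sum(
        comb(coverage, k) * error_rate**k * accuracy ** (coverage - k)
        for k in range(first_majority, coverage + 1)
    )


single_pass = expected_errors(GENOME_SIZE, 0.999)
per_site = majority_wrong_probability(94, 0.999)
print(f"Expected errors in one pass: {single_pass:,.0f}")
print(f"Chance a site is miscalled at 94-fold coverage: {per_site:.2e}")
```

Under these assumptions, a single pass leaves millions of miscalled bases, while the chance that most of the 94 reads covering any one site are wrong is vanishingly small. That gap is what makes a handful of true differences between twins detectable.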

At any rate, the scientists tested their new method on a set of twins, and it worked. In the end, twelve SNPs were identified between the twin brothers. Typically, one experiment is not considered to hold much weight in science, but this particular experiment is backed by strongly reinforced genetic theory, and the results were exactly what we would expect.

So, case solved, right? Well, maybe not. It turns out that this method comes with a hefty price tag: over $100,000. This is far too much to be practical in forensic casework, especially when you consider that about 1 person in 167 is an identical twin. Of course, this price will go down as DNA sequencing prices continue to plummet in light of newer, better technology. Still, it will be many years before anything like this will be affordable (a typical forensic DNA test costs in the neighborhood of $400-$1,000). Furthermore, the instruments used in this method (next-generation sequencing), though typical in research science, have not been approved for use in court. That in and of itself can be a challenging obstacle to overcome, regardless of costs.

Perhaps in a few decades these issues will be resolved. Perhaps not. Either way, it might be a good idea to have a plan in the meantime. This is (hopefully) where my master’s thesis comes in.

DNA is composed of four nucleotides, commonly noted as “A, T, C, and G.” Throughout life, a methyl group (a carbon and three hydrogens) attaches to some of the C’s in your genome. This is known as DNA methylation, which is a big component of the larger phenomenon known as epigenetics. As it turns out, these methyl groups attach largely at random to the C’s, though some evidence suggests that environmental conditions may play some part in this. In any case, the pattern of methyl groups attached to C’s differs among individuals, even identical twins. In fact, studies have shown that newborns already exhibit DNA methylation discordance. Presumably, these differences would become more pronounced as time goes on. Not many studies have looked at this, but the ones that have also show evidence of greater discordance with age.

There is a potential issue with studying DNA methylation: it doesn’t occur uniformly among tissues. In other words, a blood sample and a skin sample from the same individual will show different patterns of methylation. Moreover, cells within the same tissue can show different methylation patterns. Though not insurmountable, these issues make methylation analysis a tricky subject.

To tackle the first issue of tissue discordance, you could simply match the tissue type of the DNA you take from the suspect with the tissue type of the DNA you have at the scene. The second issue of intra-tissue discordance is a bit trickier to tackle. For starters, we don’t know too terribly much about how DNA methylation works. Ostensibly, if methylation differences occurred early in development, then they would show the same pattern of proliferation as the SNPs that occur early in development. This means that the same DNA methylation pattern would be present in all of the daughter cells, and would show up easily in a DNA sample from that tissue.

Another possible solution would be to take a statistical approach. This would involve looking at the methylation patterns several times and coming up with an “average” methylation. For example, let’s say there are 10 C’s susceptible to methylation in a particular DNA sequence. If I run 10 samples from a DNA swab, I might find the number of methylated C’s to be: 3, 4, 5, 3, 2, 4, 5, 3, 4, 4. If you average these, you get 3.7 out of 10 possible methylated C’s. Thus, you might say that this DNA sequence shows 37% methylation. If you do the same thing for the other twin and come up with 5.5 out of 10 possible methylated C’s, you could say that the other twin’s sequence shows 55% methylation. Ideally, these numbers would be relatively reproducible, especially as you increase the sample number and/or number of potentially methylated C’s per sequence.
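The averaging in that example is simple enough to write out as a short Python sketch. The function and variable names are made up for illustration, and only the first twin’s per-run counts come from the example above; the second twin’s figure is taken directly from the quoted 55%.

```python
from statistics import mean

SITES_PER_SEQUENCE = 10  # C's susceptible to methylation in the example


def percent_methylation(methylated_counts, sites=SITES_PER_SEQUENCE):
    """Average the methylated-C counts across runs and express as a percentage."""
    return 100.0 * mean(methylated_counts) / sites


twin_a_runs = [3, 4, 5, 3, 2, 4, 5, 3, 4, 4]  # counts from the example
twin_a = percent_methylation(twin_a_runs)
twin_b = 55.0  # percentage quoted for the other twin; no per-run data given

print(f"Twin A: {twin_a:.0f}% methylation")
print(f"Difference between twins: {twin_b - twin_a:.0f} percentage points")
```

A real protocol would also need some measure of run-to-run spread before calling two percentages different, which is one reason more samples and more sites per sequence help.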

Compared to the SNP method, my project is less definitive. However, good protocols would still make the method definitive enough. Once you narrow the suspects down to two twins via normal DNA testing, you have two possible outcomes: a match between one twin and the sample at the crime scene, or an inconclusive result. At this point, you just need to differentiate between two people, not 7 billion. Thus, the required statistical power is much, much lower. The big difference between my method and the SNP method is the price. Whereas the SNP method costs between $100,000 and $160,000, my method could be done in-house for less than $5,000. Furthermore, my method is performed using the same instruments as traditional DNA testing, meaning that no new instrumentation needs to be validated for use in court.

So, while it will take some work, and my project is more of a proof-of-concept study, the use of DNA methylation in forensics is generating a lot of attention. One of the issues with methylation in my study, i.e., different patterns in different tissues, has been a major benefit to a different use of DNA methylation: tissue identification. The idea here is that if you can identify consistent methylation patterns within a tissue type, you can use those patterns to identify the tissue. Another aspect that is relevant to my project, the change in methylation with age, has been explored as a possible investigative tool. If you can identify levels of methylation that are consistent with different age groups, you can potentially “age” a suspect just by their DNA methylation. Studies on methylation aging are few and far between, but preliminary results are promising, suggesting that age-based methylation analysis can get within +/- 5 years of an individual’s actual age.

As we learn more about DNA methylation, it will become more useful. This is true not only for forensics, but also for medicine, since methylation plays an important role in turning genes “on” or “off.” This is particularly true in cancer, where abnormal DNA methylation seems to occur. But before we try to cure cancer with methylation, perhaps we can perform the smaller task of telling two twins apart.

*Also published in part at


Multiplex Automated Genome Engineering: Changing the world with MAGE

Humans have evolved a unique mastery of toolmaking through advanced technology. As an extension of our biological bodies, technology has loosened the grip of natural selection. This is particularly true in the field of biomedicine and genetic engineering. We have the ability to directly alter the blueprint of life for any purpose we wish. Beginning in the 1970s with the creation of recombinant DNA and transgenic organisms, genetic engineering has offered scientists the ability to study genes on a level that may not have seemed possible at the time. The field has provided a wealth of knowledge as well as practical applications, such as knockout mice and the ability to produce near-endless amounts of human insulin for diabetics.

As of 2009, multiplex automated genome engineering (MAGE) has ushered in a new branch of genetic engineering – genomic engineering. We are no longer restricted to altering single genes, but rather are able to alter entire genomes by manipulating several genes in parallel. This new ability, brought about by MAGE technology, allows for nearly endless applications that stretch well beyond medicine or industry; agriculture, evolutionary biology, and conservation biology will benefit tremendously as MAGE technology progresses. Genetic engineering advancements such as MAGE are poised to revolutionize entire fields of science, including synthetic biology, molecular biology, and genetics by offering faster, cheaper, and more powerful methods of genome engineering.

Homologous Recombination

Genetic engineering underwent a revolutionary change in the 1980s, largely due to the pioneering work of Martin Evans, Mario Capecchi, and Oliver Smithies. Evans and Kaufman were the first to describe a method for extracting, isolating, and culturing mouse embryonic stem cells. This laid the foundation for gene targeting, a method that was independently discovered by both Oliver Smithies and Mario Capecchi. Mario Capecchi and his colleagues were the first to suggest that mammalian cells had the machinery capable of homologous recombination with exogenous DNA. Smithies took this a step further, demonstrating targeted gene insertion using the β-globin gene. Ultimately, the combined work of Evans, Smithies, and Capecchi on homologous recombination earned them the Nobel Prize in Physiology or Medicine in 2007. The science of homologous recombination has allowed for many scientific discoveries, primarily through the creation of knockout mice.

Homologous recombination works under many of the same principles as chromosomal recombination in meiosis, wherein homologous genetic sequences are randomly exchanged. The difference lies in the fact that homologous recombination works with exogenous DNA and at the level of the gene rather than the chromosome.



The method works by using a double-stranded genetic construct with flanking regions that are homologous to the flanking regions of the gene of interest. This allows for the sequence in the middle, containing a positive selection marker and new gene, to be incorporated. The positive selection marker should be something that can be selected for, such as resistance to a toxin or a color change. Outside of one of the flanking regions of the construct should lie a negative selection marker; the thymidine kinase gene is commonly used. If recombination is too lenient and the thymidine kinase gene is incorporated into the endogenous DNA, those cells can be detected and disposed of. This prevents too much genetic information from being exchanged.

Using this method, knockout mice can be created. A knockout mouse is a mouse that is lacking a functional gene, allowing for elucidation of the gene’s function. Embryonic stem cells are extracted from a mouse blastocyst and introduced to the gene construct via electroporation. The successfully genetically modified stem cells are selected using the positive and negative markers. These are isolated and cultured before being inserted back into mouse blastocysts. The mouse blastocysts can then be inserted into female mice, producing chimeric offspring. These offspring may be mated to wild-type mice. If the germ cells of the chimeric mouse were generated from the modified stem cells, then the offspring will be heterozygous for the modified gene and wild-type gene. These heterozygous mice can then be interbred, with a portion of the offspring being homozygous for the modified gene. This is the beginning of a mouse line with the chosen gene “knocked-out.”
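The “portion of the offspring” produced by interbreeding heterozygous mice follows ordinary Mendelian expectations, which can be tallied with a small Punnett-square sketch in Python. The allele labels and the `cross` helper are hypothetical, and the sketch assumes that both alleles are transmitted equally and that homozygous knockouts are viable, which is not true for every gene.

```python
from collections import Counter
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict:
    """Return expected genotype fractions for a single-gene cross.

    Each parent is a two-character string, one character per allele.
    """
    offspring = Counter(
        "".join(sorted(pair)) for pair in product(parent_a, parent_b)
    )
    total = sum(offspring.values())
    return {genotype: count / total for genotype, count in offspring.items()}


# "+" = wild-type allele, "-" = knocked-out allele (labels chosen for illustration)
het_by_het = cross("+-", "+-")
for genotype, fraction in sorted(het_by_het.items()):
    print(f"{genotype}: {fraction:.0%}")
```

Under these assumptions, a quarter of the pups from a heterozygous cross are expected to carry two knocked-out copies, and those animals found the knockout line.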


Multiplex Automated Genome Engineering Process

The major drawback of the previously described method of “gene targeting” is the inability to multiplex. The process is not very efficient, and targeting more than one gene becomes problematic, limiting homologous recombination to single genes. In 2009, George Church and colleagues solved this issue with the creation of multiplex automated genome engineering (MAGE). MAGE technology uses hybridizing oligonucleotides to alter multiple genes in parallel. The machine may be thought of as an “evolution machine,” wherein favorable sequences are chosen at a higher frequency than less favorable sequences. The hybridization free energy is a predictor of allelic replacement efficiency. As cycles complete, sequences become more similar to the oligonucleotide sequence, increasing the chance that those sequences will be further altered by hybridization. Eventually, the majority of endogenous sequences will be completely replaced with the sequence of the oligonucleotide. This process only takes about 6-8 cycles.
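One rough way to see why a handful of cycles can be enough is a toy model in which each cycle converts a fixed fraction of the sequences that are still unmodified. This is an assumption made purely for illustration, not the model used by Church and colleagues, and it ignores the way efficiency shifts as sequences change; the 10% per-cycle efficiency and the function names are hypothetical.

```python
def fraction_converted(per_cycle_efficiency: float, cycles: int) -> float:
    """Fraction of target sequences replaced after a number of cycles.

    Toy model: each cycle converts the same fraction of what remains.
    """
    return 1.0 - (1.0 - per_cycle_efficiency) ** cycles


def cycles_to_majority(per_cycle_efficiency: float, max_cycles: int = 1000) -> int:
    """Smallest number of cycles after which more than half are converted."""
    for n in range(1, max_cycles + 1):
        if fraction_converted(per_cycle_efficiency, n) > 0.5:
            return n
    raise ValueError("majority not reached within max_cycles")


ASSUMED_EFFICIENCY = 0.10  # hypothetical value, not a measured figure

for n in range(1, 9):
    print(f"cycle {n}: {fraction_converted(ASSUMED_EFFICIENCY, n):.1%} converted")
print("cycles to majority:", cycles_to_majority(ASSUMED_EFFICIENCY))
```

With a different assumed efficiency the cycle count changes accordingly, so the sketch is best read as showing the shape of the accumulation rather than predicting any particular experiment.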


After the E. coli cells are grown to the mid-log phase, expression of the beta protein is induced. Cells are chilled and the media is drained. A solution containing the oligonucleotides is added, followed by electroporation. This step is particularly lethal, killing many of the cells. The surviving cells are then selected based on positive markers (optional, but it increases efficiency) and allowed to reach the mid-log phase again before the process is repeated. Church and his colleagues have optimized the E. coli strain EcNR2 to work with MAGE. EcNR2 contains a plasmid with the λ phage genes exo, beta, and gam, and it is deficient in mismatch repair. When expressed, the phage genes help keep the oligonucleotide annealed to the lagging strand of the DNA during replication, while the mismatch repair deficiency prevents the cellular repair mechanisms from changing the oligonucleotide sequence once it is annealed. Using an improved technique called co-selection MAGE (CoS-MAGE), Church and colleagues created EcHW47, the successor to EcNR2. In CoS-MAGE, cells that exhibit naturally superior oligo uptake are selected for before attempting to target the genes of interest.

MAGE technology is currently in the process of being refined, but shows incredible promise in practical applications. Some of the immediate applications include the ability to more easily and directly study molecular evolution and the creation of more efficient bacterial production of industrial chemicals and biologically relevant hormones. Once the technique has been optimized in plants and mammals, immediate applications could be realized in GMO production and the creation of multi-knockout mice that will give scientists the ability to study gene-gene interactions on a level previously unattainable. A more optimistic and perhaps grandiose vision could see MAGE working towards ending genetic disorders (CRISPR technology, an equally incredible genomic editing technique, may beat MAGE there) and serving as a cornerstone technique in de-extinction. The ability to alter a genome in any fashion brings with it immense power. The possibilities for MAGE are boundless and unimaginable, and they are sure to change genomic science.

For more information on MAGE, see:

Wang, H. H., Isaacs, F. J., Carr, P. A., Sun, Z. Z., Xu, G., Forest, C. R., & Church, G. M. (2009). Programming cells by multiplex genome engineering and accelerated evolution. Nature, 460(7257), 894-898.

Wang, H. H., Kim, H., Cong, L., Jeong, J., Bang, D., & Church, G. M. (2012). Genome-scale promoter engineering by coselection MAGE. Nature Methods, 9(6), 591-593.