Each day of Genome Year is a lot of text: at 7.7-8.9 million letters, each one is longer than the King James Bible (4.4 million letters including spaces) and the complete works of Shakespeare (5.6 million letters including spaces). Compare Day 1 to Project Gutenberg’s complete Shakespeare in the same format. The pages are large enough to take a bit of time to load, but not so large (6 MB, 11 MB) that they will break your browser or bankrupt your data plan – about as much data as a vacation photo album on Facebook.
Click here to jump to a gene with a typical structure, AJAP1. The gene is in the forward direction – that is, it starts at the top of your screen. (Half of genes are in the reverse direction.) AJAP1 starts with a string of black letters, which are the 5′ untranslated region. Then come the instructions to start making the AJAP1 protein: an ATG start codon (which is AUG in the resulting mRNA). Almost all coding sequences start with ATG, which encodes methionine as the first amino acid in the protein. After several codons, a long intron starts in light blue. The coding region of the gene switches between blue introns and multicolored coding sequences before ending with a stop codon, in this case the opal stop codon TGA. The gene ends with a long 3′ untranslated region.
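The start-to-stop layout described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sequence below is a made-up toy, not the real AJAP1 sequence, and the sketch ignores introns entirely, reading codons three letters at a time from the first ATG until it meets a stop codon in the same reading frame. The function names are hypothetical.

```python
# The three stop codons of the standard genetic code, with their nicknames.
STOP_CODONS = {"TAA": "ochre", "TAG": "amber", "TGA": "opal"}


def transcribe(dna: str) -> str:
    """Turn coding-strand DNA into mRNA letters: T becomes U."""
    return dna.upper().replace("T", "U")


def find_open_reading_frame(dna: str):
    """Return (start, end) indices spanning the first ATG through the
    first in-frame stop codon, or None if either is missing.

    Introns are ignored, so this only suits an intron-free toy sequence.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return None
    # Step through the sequence one codon (three letters) at a time.
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


# Made-up toy "gene": 5' untranslated letters, a short coding sequence
# ending in the opal stop codon TGA, then 3' untranslated letters.
toy_gene = "ccgag" + "ATGGCTTCCAAATGA" + "ttgca"

orf = find_open_reading_frame(toy_gene)
coding_sequence = toy_gene.upper()[orf[0]:orf[1]]
start_codon_in_mrna = transcribe(coding_sequence[:3])  # "AUG"
stop_codon_name = STOP_CODONS[coding_sequence[-3:]]    # "opal"
```

In a real gene such as AJAP1 the introns would have to be spliced out before reading codons this way, which is why the toy sequence has none.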
Day 2 has 125 protein-coding genes, including the gene encoding mTOR (the mechanistic target of rapamycin). mTOR commemorates two places in its name. Rapamycin, an antifungal drug that is also used to prevent transplant rejection, was named after Rapa Nui (Easter Island) where it was discovered. And according to Joseph Heitman, who worked to discover the TOR genes in yeast:
TOR also means door or gateway in German, and the TOR protein serves as a gateway to cell growth and proliferation. This name also commemorates the city in which TOR was discovered, as Basel is an older European city once ringed by a protective wall with large decorative gates, including one still standing, named the Spalentor.
“Basel – Spalentor” by Taxiarchos228 – Own work. Licensed under FAL via Commons – https://commons.wikimedia.org/wiki/File:Basel_-_Spalentor.jpg#/media/File:Basel_-_Spalentor.jpg
mTOR is a kinase that regulates cell growth and is important in many diseases. Mutations that activate mTOR can lead to cancer. Therefore it is an attractive drug target.
The mTOR gene is truly ancient – it can be found in species as distant as rice. This suggests that it is as old as the common ancestor of eukaryotes (> 1.6 billion years).
Click here to jump to the location of an S2215F mutation in mTOR (flashing), which has been found in multiple skin cancers. Note that the mutation isn’t a SNP in the reference sequence – it’s listed as just the reference (A). That is because S2215F is found in tumors but not in normal genomes – it is a somatic mutation that happens in a subset of cells during cancer, but has never been observed as an inherited mutation.
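The name S2215F means that the serine (S) at position 2215 of the protein is replaced by phenylalanine (F). As a rough sketch of how a single changed letter can do that, the Python below builds the standard genetic code and lists every serine codon that becomes a phenylalanine codon through one base change. It does not look at the mTOR sequence and makes no claim about which codon mTOR actually uses at this position; the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_changes(from_aa: str, to_aa: str):
    """List (codon, mutant_codon) pairs that differ by exactly one base
    and change the encoded amino acid from from_aa to to_aa."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a change
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == to_aa:
                    hits.append((codon, mutant))
    return hits


# Which one-letter changes turn serine (S) into phenylalanine (F)?
serine_to_phenylalanine = single_base_changes("S", "F")
# Two routes exist, TCT -> TTT and TCC -> TTC: both are a C-to-T change
# in the middle letter of the codon.
```

Only two of the six serine codons can reach phenylalanine in one step, so a mutation like S2215F narrows down the possible underlying DNA change considerably.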
“RHESUS AB-“, Cyril Margouillat (metal sculpture)
Day 3 includes 105 protein-coding genes. Two, RHD and RHCE, define the Rh (Rhesus) blood groups, so-called because rhesus monkeys were instrumental to their discovery. The Rh proteins’ normal role is to transport ammonia across the surface of blood cells.
Like many genes that are similar and next to each other in the genome, RHD and RHCE arose from an ancient duplication of a single gene. RHD has been deleted in about 40% of European-ancestry chromosomes, but the deletion is rare elsewhere, indicating that the deletion was relatively recent in human history. If a mother with the deletion (Rh-) carries an Rh+ baby, her immune system can attack the baby’s blood cells.
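To get a feel for what "40% of chromosomes" means for people, here is a back-of-the-envelope sketch. It rests on two assumptions that are not stated in the text above: that the two copies a person inherits are drawn independently from the population (the Hardy-Weinberg assumption), and that a person is Rh- only when both copies of RHD are deleted. The function name is hypothetical and the numbers are estimates, not measured frequencies.

```python
def hardy_weinberg(deletion_freq: float) -> dict:
    """Expected share of people with 0, 1 or 2 deleted copies of a gene,
    assuming each person's two chromosomes are drawn independently."""
    q = deletion_freq   # chance a given chromosome carries the deletion
    p = 1.0 - q         # chance a given chromosome carries the intact gene
    return {
        "no deleted copies": p * p,
        "one deleted copy": 2 * p * q,
        "two deleted copies": q * q,
    }


# About 40% of European-ancestry chromosomes carry the RHD deletion.
freqs = hardy_weinberg(0.40)

# Under the assumptions above, Rh- people are those with two deleted
# copies: 0.4 * 0.4, or roughly 16% of this population.
expected_rh_negative = freqs["two deleted copies"]
```

Roughly half of the population (2 × 0.6 × 0.4 = 48%) would carry one deleted copy and still be Rh+, which is how two Rh+ parents can have an Rh- child.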
The RHD and RHCE genes can be found in other animals as distant as frogs, suggesting they arose in the common ancestor of tetrapods (390 million years ago).
Click here to see RHD (followed by RHCE) in the context of Day 3.
Day 4 has 134 protein-coding genes, more than any other day on the p arm of Chromosome 1. There are also some important non-coding genes here: a cluster of genes that encode snoRNAs (small nucleolar RNAs). The job of these RNAs is to help the nucleolus make chemical modifications to other RNA molecules.
One of the genes here is SNORA73A. The SNORA73A RNA goes to the nucleolus to chemically modify ribosomal RNAs, which then become part of the ribosome, the cell’s protein factory.
SNORA73 relatives are found across vertebrates, even the lamprey, implying that the gene’s common ancestor is at least 530 million years old.
Click here to see your SNORA73A gene. Note that, like many other short RNA genes, it is embedded within the intron of a longer gene.
“Argonauta argo Merculiano” by Comingio Merculiano (1845–1915) in Jatta Giuseppe – I Cefalopodi viventi nel Golfo di Napoli (sistematica) : monografia. Licensed under Public Domain via Commons – https://commons.wikimedia.org/wiki/File:Argonauta_argo_Merculiano.jpg#/media/File:Argonauta_argo_Merculiano.jpg
Day 5 has 108 protein-coding genes, including AGO1 (argonaute 1 RISC catalytic component). Argonaute is a critical part of the cell’s RNA interference (RNAi) machinery.
Fire and Mello won the 2006 Nobel Prize in Physiology or Medicine for their characterization of RNAi using the nematode C. elegans in 1998, but the gene Argonaute got its name from a group working in the plant A. thaliana. They named the gene family after the octopus Argonauta argo, because mutations in the plant’s version of the genes led to an appearance that reminded them of a small squid.
Argonaute proteins are ancient: even bacteria have a version of them, which they use to chew up foreign DNA as a defense against viruses.
Click here to see your human version of AGO1.
Day 6 contains 99 protein-coding genes, including MUTYH (mutY homolog). Throughout life, your cells suffer DNA damage, which is constantly repaired by enzymes – one of which is made by the gene MUTYH.
When the DNA encoding a repairer like MUTYH is itself mutated, though, mutations can start to run amok in the genome, leading to cancer. Inherited MUTYH variants are associated with polyposis and colon cancer.
MUTYH is named after the mutY gene in E. coli bacteria. The similarity to a bacterial gene means that it is as ancient as the common ancestor of prokaryotes and eukaryotes (>1.7 billion years).
Click here to see your MUTYH gene, where a cancer-associated variant will be flashing.
Two bags of fresh frozen plasma. The bag on the left was obtained from a patient with hypercholesterolemia.
Day 7 has 68 protein-coding genes, including PCSK9 (proprotein convertase subtilisin/kexin type 9). The PCSK9 gene was discovered in 2003 by studying families with very high cholesterol. It soon became clear that different people harbored a whole spectrum of PCSK9 variants, some of which deactivated the protein and led to low LDL and lower risk of cardiovascular disease. PCSK9 became an extraordinary case of a genetic finding leading quickly to new drugs to lower cholesterol.
Click here to see a common variant, rs11591147 (R46L), in PCSK9 – it will be flashing. Having a T instead of a G here leads to a 2-3 fold lower risk of heart disease. This variant is found in 1-2% of European-ancestry genomes but is rare elsewhere in the world.
Day 8 has 40 protein-coding genes, including LEPR (the leptin receptor). This gene encodes the protein in the brain that senses leptin, a hormone released by fat cells. Although leptin is often referred to as the “satiety hormone,” the relationship between leptin and satiety is more complex.
Mutations in the LEPR gene are associated with obesity, in both mice (above) and humans.
The LEPR gene is found in species as distant as fish, meaning it originated at least 440 million years ago.
Click here to see your LEPR gene.
Day 10 has 32 protein-coding genes. One is CYR61 (cysteine-rich angiogenic inducer 61), which encodes a protein that is secreted from cells into the extracellular matrix in response to wounds, tumors, and other sites of inflammation.
CYR61 is notable because the proteins we have seen so far have lived inside or on the surface of the cell. When a cell makes a protein like CYR61, how does it know to secrete it, while keeping others in the cell? The answer is in a sequence of amino acids at the beginning of the protein known as the signal peptide. Günter Blobel won the 1999 Nobel Prize in Physiology or Medicine for discovering signal peptides. The UniProt database highlights this sequence at the beginning of the CYR61 protein. Note that the amino acid sequence of the protein, unlike the 4-letter alphabet of DNA, is conveyed using a 20-letter alphabet.
CYR61 is found across bony vertebrates, meaning it originated over 420 million years ago.
Click here to see your CYR61 gene.