Introduction
Our understanding of each of the biological sciences is deepened by the study of biochemistry and molecular biology. In the last few decades, advances in laboratory techniques for the study of these microscopic sciences have led us to a greater understanding of the central dogma of molecular biology: DNA is transcribed into RNA, which is then translated into protein. Understanding protein synthesis is paramount in many medical fields, from the molecular basis of genetic diseases through antibiotic development to the expression of recombinant proteins as drugs or clinical laboratory reagents. As one of the foundational processes in biology, protein synthesis is sufficiently complex that many believe it evolved once, giving the protein synthetic machinery in all organisms on the planet a common ancestry. Despite underlying similarities in mechanism, protein synthesis in the three major lines of descent (bacteria, archaea, and eukaryotes) has diverged to the point that substantive mechanistic differences have arisen. These differences have been exploited in nature as organisms produce compounds targeting the protein synthetic machinery of competitors as they vie for limited resources. Many of these compounds, which target the machinery for protein synthesis in pathogenic microbes, have been modified for use in the clinic as antibiotics. As our understanding of the mechanisms of protein synthesis continues to grow, there will likely be many additional applications for this knowledge in medicine, research, and industry.
Fundamentals
Protein synthesis involves a complex interplay of many macromolecules.
- Ribosomes:
- The eukaryotic ribosome has two subunits: a 40S small subunit and a 60S large subunit. Together, the eukaryotic ribosome is 80S. There are several sites of functional significance, but the most important ones are the A (aminoacyl), P (peptidyl), and E (exit) sites. The eukaryotic ribosome is a ribonucleoprotein complex composed of 4 RNAs and 80 proteins. Many of the functions of the ribosome, including catalyzing peptide bond formation, are attributed to ribosomal RNA (rRNA) rather than ribosomal proteins, which instead play a primary role in subunit assembly. Ribosomes can be found either adherent to membranes of the endoplasmic reticulum or free within the cytoplasm.[1]
- Bacterial ribosomes have two subunits, 30S and 50S, that join to form a 70S particle. In general, bacterial ribosomes are smaller than their eukaryotic counterparts, including fewer ribosomal proteins (55) and shorter rRNAs (3 in total). Certain regions of rRNA and some of the ribosomal proteins remain conserved between bacteria and eukaryotes. Other regions of rRNA and proteins are unique to either eukaryotes or bacteria and account in part for differences in mechanisms of protein synthesis discussed above.
- Eukaryotic cells contain a second type of ribosome found within the mitochondrion, which maintains a system of protein synthesis distinct from that found in the cytoplasm. Despite their presence in eukaryotic cells, the origins of the mitochondrial ribosome are traceable to bacteria, consistent with the endosymbiont theory of mitochondrial origins. Care must be taken during antibiotic development to avoid targeting characteristics of the mitochondrial ribosome shared with bacterial ribosomes.
- Messenger RNA (mRNA): the mRNA is another type of ribonucleic acid that functions to carry the coding section of a gene for protein synthesis. It contains portions of non-coding and coding sequences. The coding sequence groups nucleotides into codons, which are three specific nucleotides that correspond to a particular amino acid specified by the genetic code.[2]
- Transfer RNA (tRNA): tRNAs are adaptors bridging the nucleotide sequence found in mRNAs to the amino acid sequence found in a growing protein. Each tRNA assumes a cloverleaf-like secondary structure, with an amino acid linked to its 3’ end through an ester linkage and a stretch of three nucleotides at the base of the cloverleaf referred to as the anticodon. The three bases of the anticodon base pair with complementary codon sequences in an mRNA during the process of protein synthesis. This base-pairing interaction plays a critical role in the readout of the genetic code from mRNA to protein. Amino acids are attached to their tRNAs by aminoacyl-tRNA synthetases; there are 20 different synthetases, one for each of the 20 common amino acids. Once an amino acid links to its cognate tRNA, the product is referred to as an aminoacyl-tRNA, or “charged” tRNA.[3]
- Genetic code: The genetic code is the set of three-nucleotide sequences, originally encoded in an organism’s genome, that specify the individual amino acids found in proteins. There are 20 common amino acids used by the protein synthetic machinery and 64 possible three-base sequences of the four bases available to specify them. Early studies revealed that the code was degenerate, with many of the amino acids specified by multiple 3-base combinations. In general, when multiple codons specify a single amino acid, degeneracy is found at the third or “wobble” position.[1] Sixty-one of the 64 sequences specify amino acids, whereas the remaining three serve as “stop” codons to terminate protein synthesis. While the code was initially thought to be the same for all living organisms, scientists now know that there are a small number of deviations from the universal code found in mitochondria and specific bacterial species.
- Genetic code and human disease: Appropriate readout of the genetic code is essential for human health. Mutations that alter protein-coding sequences can affect proteins in many different ways. The effect of mutations on the coding sequence can be classified as either synonymous or nonsynonymous depending on whether they are predicted to alter the primary structure of a protein.
- Synonymous mutations relate to the degeneracy of the code and the fact that changes in base sequence may not have an effect on which amino acid a codon represents (though it should be noted that some synonymous mutations may affect pre-mRNA splicing and so influence a protein’s primary structure). Synonymous mutations typically fall in the third position of a codon.
- Nonsynonymous mutations fall into three different classes:
- Missense mutations where there is substitution of one amino acid for another.
- Nonsense mutations which introduce a premature termination codon in an mRNA sequence. These mutations typically result in a truncated protein.
- Frameshift mutations result from insertion or deletion mutations that shift the reading frame of a coding sequence such that the sequence downstream of the mutational event no longer codes for the correct amino acid sequence of a protein.
- Protein factors: the process of protein synthesis requires multiple non-ribosomal proteins that transiently participate during the initiation, elongation, and termination phases of protein synthesis. These factors are named for the phase in which they function (for example, eukaryotic initiation factor 2, eIF2).[2]
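The codon counts and mutation classes described in the list above can be checked with a short Python sketch. It assumes the standard genetic code written in RNA letters. The helper names (`translate`, `classify_mutation`) and the example sequences are hypothetical, chosen only for illustration, and the classifier is deliberately simplified: it looks only at the coding sequence and ignores effects such as altered pre-mRNA splicing.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
# Degeneracy: how many codons specify each amino acid.
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def translate(coding_sequence):
    """Translate an in-frame coding sequence until the first stop codon."""
    protein = []
    for i in range(0, len(coding_sequence) - 2, 3):
        amino_acid = CODON_TABLE[coding_sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify_mutation(reference, mutant):
    """Classify a mutant coding sequence relative to a reference (simplified)."""
    length_change = len(mutant) - len(reference)
    if length_change % 3 != 0:
        return "frameshift"
    if length_change != 0:
        # Not one of the three classes in the text; reported separately.
        return "in-frame indel"
    ref_protein, mut_protein = translate(reference), translate(mutant)
    if mut_protein == ref_protein:
        return "synonymous"
    if len(mut_protein) < len(ref_protein):
        return "nonsense"  # premature stop codon, truncated protein
    if len(mut_protein) == len(ref_protein):
        return "missense"
    return "other"


# Hypothetical reference: Met-Glu-Glu-Lys followed by a stop codon.
REFERENCE = "AUGGAGGAGAAGUAA"

print(len(CODON_TABLE), len(sense_codons), stop_codons)
print(classify_mutation(REFERENCE, "AUGGAAGAGAAGUAA"))  # third-position change
print(classify_mutation(REFERENCE, "AUGGUGGAGAAGUAA"))  # Glu codon to Val codon
print(classify_mutation(REFERENCE, "AUGUAGGAGAAGUAA"))  # Glu codon to stop codon
print(classify_mutation(REFERENCE, "AUGGGGAGAAGUAA"))   # one base deleted
```

Building the table from the ordered amino acid string keeps the sketch compact; a production tool would more likely rely on an established library's codon tables.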
Issues of Concern
The proteins that a cell expresses are the ultimate manifestation of its phenotype. Cells within tissues of the human body have identical genomes, yet they differ in the phenotypes that define tissue organization and function because the genes within that genome are differentially expressed. While the differential regulation of gene expression primarily occurs at the level of transcription, regulation of gene expression can also take place at the post-transcriptional level, including regulated translation. Because of the importance of protein expression to the phenotypic properties of a cell, errors at any level in the readout of genetic information from gene to protein can alter the cellular proteome and have broad implications for health.
Cellular Level
The eukaryotic cell is compartmentalized, with different cellular compartments defined by biological membranes. The synthesis of components of the translational machinery begins with the transcription of mRNAs, tRNAs, and rRNAs in the nucleus by RNA polymerases II, III, and I, respectively. Transfer RNAs and the mRNAs encoding ribosomal proteins exit the nucleus, and the latter are translated in the cytoplasm. Ribosomal proteins then return to the nucleus, where they assemble hierarchically on rRNAs being transcribed by RNA polymerase I. This assembly process defines a compartment of the nucleus referred to as the nucleolus. Ribosome assembly is a complex process involving hundreds of accessory factors that transiently associate with ribosomal subunits during their maturation. While most of the steps involved in maturing ribosomal subunits occur within the nucleolus before the subunits exit through nuclear pores, the final steps in subunit maturation occur in the cytoplasm. Ribosomes translating most cellular mRNAs do so as free ribosomes in the cytoplasm. In contrast, ribosomes translating mRNAs encoding proteins destined for secretion from the cell or resident proteins of the endoplasmic reticulum, Golgi apparatus, lysosome, or plasma membrane are localized to the endoplasmic reticulum membrane.[4]
Mechanism
Briefly, translation can be broken down into three phases: initiation, elongation, and termination. Initiation consists of identifying the exact site in the sequence of nucleotides in an mRNA to begin translation. This process has significant differences between eukaryotes (described here) and prokaryotes. Upon identification of the start site for translation, elongation ensues as the ribosome moves along the mRNA “reading” groups of three nucleotides that specify each amino acid added to the growing polypeptide chain. Finally, termination occurs when the ribosome encounters one of three termination codons, and the completed protein is released from the ribosome.
Translation begins with the assembly of an 80S initiation complex on mRNA. This process involves identifying the appropriate codon to initiate translation. The AUG codon specifies the amino acid methionine, and virtually all proteins specified by the genetic code begin with methionine. In eukaryotes, the AUG used to initiate protein synthesis is usually the first AUG downstream of the cap structure found at the 5’ end of the mRNA. A protein complex known as eIF4F recognizes the cap structure. The eIF4F complex then recruits the 43S pre-initiation complex, composed of a 40S subunit together with a ternary complex formed of the initiator tRNA (Met-tRNA), eIF2, and GTP, to the 5’ end of an mRNA. The 43S complex subsequently scans down the mRNA until it encounters the first AUG, at which point the 48S initiation complex forms. In addition to eIF4F and eIF2, multiple other initiation factors facilitate the formation of the 48S initiation complex. At this point, the 60S ribosomal subunit joins the 48S initiation complex, all initiation factors are released, and the elongation phase of translation is set to begin. In the 80S initiation complex, the initiator Met-tRNA is base-paired to the initiating AUG in the ribosomal P site, with the next codon of the mRNA positioned in the ribosomal A site. Translational re-initiation is facilitated by the interaction of the eIF4F complex with both the 5’ cap and the 3’ polyA tail of an mRNA.[5]
As with initiation, elongation requires the use of non-ribosomal proteins known as elongation factors. Eukaryotic EF1A (eEF1A) forms ternary complexes with aminoacyl-tRNAs and GTP. These ternary complexes enter the empty A site of the ribosome, and if an appropriate codon-anticodon interaction forms between the incoming aminoacyl-tRNA and the codon in the A site, GTP is hydrolyzed and eEF1A is released. At this point, the peptidyl-transferase site of the ribosome catalyzes peptide bond formation as the free amino group of the incoming aminoacyl-tRNA attacks the ester bond linking the growing polypeptide to the tRNA in the ribosomal P site. The growing polypeptide chain previously in the P site is now elongated by one amino acid as it transfers to the aminoacyl-tRNA in the A site. The resultant uncharged tRNA occupying the P site moves to the E (exit) site and leaves the ribosome. The peptidyl-tRNA in the A site is then translocated to the P site with the help of eEF2 and GTP. The A site is now empty, and the entire process repeats as the ribosome moves down the mRNA.
Termination occurs when eRF1, a release factor structurally analogous to tRNA, recognizes a termination codon in an mRNA and, together with the GTPase eRF3, promotes hydrolysis of the bond linking the polypeptide chain to the tRNA occupying the P site. Termination of translation is completed by the dissociation of the ribosomal subunits, which are then able to initiate another round of protein synthesis. Multiple ribosomes can translate a single mRNA simultaneously, forming complexes known as polysomes.[5][6][7]
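The three phases described above can be mimicked with a toy model in Python. This is only a sketch under strong simplifying assumptions: initiation is reduced to scanning from the 5’ end for the first AUG, elongation to reading one codon at a time, and termination to stopping at the first in-frame stop codon. Initiation, elongation, and release factors, tRNA charging, and GTP hydrolysis are not modeled, and the function names and the example mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code (U, C, A, G ordering); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def scan_for_start(mrna):
    """Initiation: return the index of the first AUG from the 5' end, or -1."""
    return mrna.find("AUG")


def translate_mrna(mrna):
    """Return the peptide (one-letter codes) encoded by the first open reading frame."""
    start = scan_for_start(mrna)
    if start == -1:
        return ""  # no start codon found, so no initiation in this model

    peptide = []
    # Elongation: advance three nucleotides at a time from the start codon.
    for position in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[position:position + 3]]
        if amino_acid == "*":
            break  # Termination: a stop codon has been reached.
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: a short 5' untranslated region, a four-codon open
# reading frame ending in UGA, and a short 3' untranslated region.
example_mrna = "GCCACC" + "AUGGCUUGGAAAUGA" + "GGCAAUAAA"

print(scan_for_start(example_mrna))
print(translate_mrna(example_mrna))
```

If no in-frame stop codon is present, this model simply reads to the end of the sequence, which is a limitation of the sketch rather than a statement about what ribosomes do.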
Testing
There are many possible methods of confirming that a particular protein is being synthesized.
Immunostaining
Because of the large number of proteins synthesized in a typical cell, verifying the presence of a particular protein is understandably challenging. One way to confirm the presence of a specific protein in a clinical specimen is through immunostaining. This technique introduces an antibody to a protein of interest, and the exquisite specificity of the antibody serves for protein detection.
In immunostaining, the specimen is incubated with a primary antibody solution. This antibody can carry a fluorescent molecule on its heavy chain or an enzyme (such as horseradish peroxidase) that generates a detectable signal, such as light or a colored product, in the presence of a suitable substrate. The signal can be visualized under a microscope or, for light-emitting reactions, captured on photosensitive film in a dark room for later development. Immunostaining can be either direct, where the primary antibody itself carries the means of detection, or indirect, where a secondary antibody raised against the primary antibody carries the detectable label.[8]
Protein Electrophoresis
As with nucleic acids, proteins can be separated based on size and/or charge using gel electrophoresis. Proteins can be run in their native configurations or be denatured before electrophoresis. In denaturing electrophoresis, a detergent such as sodium dodecyl sulfate (SDS) is used to disrupt non-covalent bonding forces within proteins. SDS also gives proteins a common charge-to-mass ratio, so that separation during SDS-polyacrylamide gel electrophoresis depends essentially on the molecular sieving action of the polyacrylamide gel, that is, on protein size. Proteins separated in this manner can be detected either non-specifically with dyes like Coomassie blue or specifically using antibodies in a procedure referred to as Western blotting or immunoblotting.
Pathophysiology
Many human diseases result from changes in protein sequence caused by mutations that alter the correct readout of genetic information from gene to a functional protein. Defects in the protein synthetic machinery also cause a small but growing number of human diseases. Examples of such pathologies follow.
Sickle Cell Anemia
Human hemoglobin contains two alpha and two beta chains that assemble into a heterotetramer. In sickle cell anemia, the codon for the sixth amino acid of the mature beta chain contains a missense mutation in which glutamic acid, a charged amino acid, is replaced with valine, a neutral amino acid. This single amino acid difference affects the tertiary and quaternary structures of hemoglobin such that, under certain conditions, it distorts the biconcave shape of erythrocytes into sickle shapes.[9]
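The substitution described above can be traced at the codon level with a small Python sketch. The sequence used here is an illustrative fragment, assumed to represent the first nine codons of the beta-globin mRNA, and only the codons that appear in it are included in the lookup table. The sketch also assumes that the initiator methionine is removed from the mature chain, which is why the affected residue is counted as the sixth amino acid. The names `translate` and `substitute` are hypothetical helpers.

```python
# Only the codons needed for this fragment (three-letter amino acid names).
CODONS = {
    "AUG": "Met", "GUG": "Val", "CAU": "His", "CUG": "Leu",
    "ACU": "Thr", "CCU": "Pro", "GAG": "Glu", "AAG": "Lys",
}

# Assumed start of the beta-globin coding sequence, written as mRNA.
normal_mrna = "AUGGUGCAUCUGACUCCUGAGGAGAAG"


def translate(mrna):
    """Translate an in-frame fragment into a list of amino acid names."""
    return [CODONS[mrna[i:i + 3]] for i in range(0, len(mrna), 3)]


def substitute(mrna, index, base):
    """Return a copy of the sequence with one base replaced (0-based index)."""
    return mrna[:index] + base + mrna[index + 1:]


# A-to-U change in the middle base of the seventh codon (GAG becomes GUG).
sickle_mrna = substitute(normal_mrna, 19, "U")

# Drop the initiator methionine to number residues as in the mature chain.
normal_chain = translate(normal_mrna)[1:]
sickle_chain = translate(sickle_mrna)[1:]

for position, (normal, sickle) in enumerate(zip(normal_chain, sickle_chain), start=1):
    marker = "  <- missense" if normal != sickle else ""
    print(position, normal, sickle, marker)
```

A single base change alters exactly one residue, and every other position in the fragment is unaffected, which is what distinguishes a missense mutation from a frameshift.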
Duchenne Muscular Dystrophy
Like many X-linked diseases, Duchenne muscular dystrophy (DMD) primarily affects males at an early age. It is characterized clinically by muscle weakness, calf pseudohypertrophy, and the Gowers sign in a child. One of the pathophysiologic origins of this disease is the formation of a premature stop codon in an early exon of the dystrophin gene. The resulting truncated dystrophin protein compromises the integrity of the sarcomere and the contractile function of the muscle.[10]
Diamond-Blackfan Anemia
While many human diseases result from mutations in the coding sequences of genes that affect protein production, Diamond-Blackfan anemia (DBA) is one of a growing number of conditions resulting from defects in the protein synthetic machinery. DBA is caused by autosomal dominant mutations in genes encoding proteins of either the 40S or 60S ribosomal subunit. While the exact mechanisms underlying the pathophysiology of DBA are currently unknown, it seems likely that changes in cellular proteomes (the protein composition of a cell) resulting from suboptimal numbers of ribosomes contribute in part to the clinical features of the disease. These clinical features include a deficit in red blood cell production, small size, and a variable set of congenital anomalies.[11]
Clinical Significance
The clinical significance of protein synthesis lies not only in human translation but also in the differences between human and bacterial translation. The bacterial ribosome (70S) has the same core components as the eukaryotic ribosome (80S) and many structurally similar sites. However, translational differences between humans and bacteria create targets for antimicrobial drugs. These differences allow certain antibiotics to bind selectively to bacterial ribosomes at low concentrations, either inhibiting growth or killing the microbe. Several commonly prescribed antibiotics target specific components of the bacterial ribosome and mRNA. Aminoglycosides target the 30S small ribosomal subunit; specifically, this class binds to the rRNA segment active in the A site. The tetracyclines operate similarly by competing for the A site with charged aminoacyl tRNA. The macrolide antibiotics act on the 50S large ribosomal subunit. When they bind to the rRNA of the large subunit, they prevent the formation of the peptide bond and promote the early expulsion of the tRNA in the P site.[12][3]
The clinical manifestations of differences in protein synthesis can also be useful in diagnosis. Native protein electrophoresis can help identify hemoglobinopathies in newborn screenings. Similarly, serum protein electrophoresis can identify characteristic M protein spikes of monoclonal protein expression in multiple myeloma.