Large-scale experimental studies show unexpected amino acid effects on protein expression and solubility in vivo in E. coli

Springer Science and Business Media LLC - Volume 1 - Pages 1-20 - 2011
W Nicholson Price1,2, Samuel K Handelman1,2, John K Everett1,3, Saichiu N Tong1,3, Ana Bracic4, Jon D Luff1,2, Victor Naumov1,2, Thomas Acton1,3, Philip Manor1,2, Rong Xiao1,3, Burkhard Rost1,5, Gaetano T Montelione1,3,6, John F Hunt1,2
1Northeast Structural Genomics Consortium, USA
2Department of Biological Sciences, Columbia University, New York, USA
3Department of Molecular Biology and Biochemistry, Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, USA
4Wilf Family Department of Politics, New York University, New York, USA
5Department of Biochemistry and Molecular Biophysics, Columbia University, New York, USA
6Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, Piscataway, USA

Abstract

The biochemical and physical factors controlling protein expression level and solubility in vivo remain incompletely characterized. To gain insight into the primary sequence features influencing these outcomes, we performed statistical analyses of results from the high-throughput protein-production pipeline of the Northeast Structural Genomics Consortium. Proteins expressed in E. coli and consistently purified were scored independently for expression and solubility levels. These parameters nonetheless show a very strong positive correlation. We used logistic regressions to determine whether they are systematically influenced by fractional amino acid composition or several bulk sequence parameters including hydrophobicity, sidechain entropy, electrostatic charge, and predicted backbone disorder. Decreasing hydrophobicity correlates with higher expression and solubility levels, but this correlation apparently derives solely from the beneficial effect of three charged amino acids, at least for bacterial proteins. In fact, the three most hydrophobic residues showed very different correlations with solubility level. Leu showed the strongest negative correlation among amino acids, while Ile showed a slightly positive correlation in most data segments. Several other amino acids also had unexpected effects. Notably, Arg correlated with decreased expression and, most surprisingly, solubility of bacterial proteins, an effect only partially attributable to rare codons. However, rare codons did significantly reduce expression despite use of a codon-enhanced strain. Additional analyses suggest that positively but not negatively charged amino acids may reduce translation efficiency in E. coli irrespective of codon usage. While some observed effects may reflect indirect evolutionary correlations, others may reflect basic physicochemical phenomena. 
We used these results to construct and validate predictors of expression and solubility levels and overall protein usability, and we propose new strategies to be explored for engineering improved protein expression and solubility.
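
The sequence-level quantities named in the abstract (fractional amino acid composition, hydrophobicity, and electrostatic charge) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names are hypothetical, the Kyte-Doolittle hydropathy scale is an assumed choice of hydrophobicity measure, and the charge calculation is a simplification that counts only Lys and Arg as positive and Asp and Glu as negative.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not
# state which hydrophobicity measure was used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fractional_composition(seq):
    """Return the fraction of each standard amino acid in seq.

    Non-standard characters are ignored when computing the total.
    """
    counts = Counter(r for r in seq.upper() if r in KYTE_DOOLITTLE)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def mean_hydropathy(seq):
    """Composition-weighted mean Kyte-Doolittle hydropathy of seq."""
    comp = fractional_composition(seq)
    return sum(comp[aa] * KYTE_DOOLITTLE[aa] for aa in AMINO_ACIDS)


def net_charge_fraction(seq):
    """Simplified net charge per residue: (K + R) - (D + E) fractions.

    His and the chain termini are ignored in this sketch.
    """
    comp = fractional_composition(seq)
    return comp["K"] + comp["R"] - comp["D"] - comp["E"]


if __name__ == "__main__":
    example = "MKTAYIAKQRLEDLLRKG"  # arbitrary illustrative sequence
    print(fractional_composition(example)["L"])
    print(mean_hydropathy(example))
    print(net_charge_fraction(example))
```

Per-protein features of this kind would serve as the predictor variables in a logistic regression against the scored expression or solubility level. The regression itself is omitted here because the abstract does not specify its details.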
