Scientific paper

Gene Expression Profiling of Recombinant Protein Producing E. coli at Suboptimal Growth Temperature

Mitja Mahnič,1* Špela Baebler,3 Andrej Blejec,3 Špela Jalen,4 Kristina Gruden,3 Viktor Menart and Simona Jevševar2

1 Development center Slovenia, Lek Pharmaceuticals d.d., a Sandoz company, Verovškova 57, 1526 Ljubljana, Slovenia
2 Sandoz Biopharmaceuticals Mengeš, Lek Pharmaceuticals d.d., a Sandoz company, Kolodvorska 27, 1234 Mengeš, Slovenia
3 Department of Biotechnology and Systems Biology, National Institute of Biology, Večna pot 111, 1000 Ljubljana, Slovenia
4 Laboratory for Biosynthesis and Biotransformation, National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia

* Corresponding author: E-mail: mitja.mahnic@sandoz.com; Phone: + 386 1 5803275

Received: 07-03-2011

Dedicated to inspiring mentor dr. Viktor Menart.

Abstract

Recent studies have revealed that at lower cultivation temperatures (25 °C) a much higher percentage of correctly folded recombinant hG-CSF protein can be extracted from inclusion bodies. Hence, the goal of our research was to investigate the mechanisms determining the characteristics of non-classical inclusion body production using gene expression profiling, focusing on the expression of protease and chaperone genes. Statistical analysis of microarray data showed prominent changes in energy metabolism, in the metabolism of amino acids and nucleotides, as well as in the biosynthesis of cofactors and secondary metabolites when the culture was grown below its optimal temperature. Moreover, 24 differentially expressed genes, as far as currently known, are classified among proteases, chaperones and other heat or stress related genes. Among chaperones UspE, and among proteases YaeL and YeaZ, might play an important role in the accumulation of correctly folded recombinant proteins.
The yaeL gene, encoding a membrane-localized protease, was found to have higher activity at 25 °C and is thus potentially functionally related to the more efficient recombinant protein production at lower temperatures. The results of this study represent an advance in the understanding of recombinant protein production in E. coli. Genes potentially influencing the production of recombinant protein at lower growth temperature represent a basis for further research towards the improvement of E. coli production strains as well as the fermentation process.

Keywords: Recombinant protein production, non-classical inclusion bodies, E. coli, expression microarrays, YaeL protease, GroEL chaperone

1. Introduction

The "art nouveau" of the modern biopharmaceutical industry is moving towards an understanding of the mechanisms underlying recombinant protein production in different organisms. The knowledge obtained should further contribute to improvements of fermentation processes and thus yield economic advantages for the industry. Escherichia coli is the most widely used recombinant protein producing organism due to its ease of cultivation and fast production rate. Protein misfolding is a common event during bacterial over-expression of recombinant genes.1 Incorrectly folded or misfolded proteins can appear as a result of cell exposure to environmental stress, such as elevated temperatures and over-expression of recombinant genes.
The resulting misfolded proteins may be degraded by proteases, folded by chaperones, or aggregated and sequestered as inclusion bodies (IBs).2 Hence, a common limitation of recombinant protein production in bacteria is the formation of insoluble protein aggregates known as IBs.3 It has long been believed that IB proteins are biologically inactive and therefore undesired in bioprocesses.4 The potential of chaperones in assisting the folding of misfolded proteins has been investigated from several aspects.5 On the other hand, it has already been reported that functional proteins can be easily extracted from IBs using non-denaturing mild detergents and polar solvents, provided that the cultures were grown at lower temperatures.3,4,6-9 Such IBs, termed "non-classical" inclusion bodies (ncIBs) by Jevševar et al. (2005),7 are defined by containing a large amount of correctly folded protein precursor produced in E. coli at lower temperature (around 25 °C). Compared to classical IBs they are characterized by higher fragility and solubility, irreversible contraction at acidic pH and, most importantly, a high amount of correctly folded target protein or its precursor.6

One of the most important recombinant proteins in the field of modern oncology is human granulocyte colony-stimulating factor (hG-CSF). Due to its regulatory role in the growth, differentiation, survival, and activation of neutrophils and their precursors, hG-CSF is central to neutrophil-based immune defenses.1 Four types of hG-CSF are clinically available: a glycosylated form (lenograstim) produced in CHO cells, an N-terminally replaced nonglycosylated form of granulocyte colony-stimulating factor (nartograstim),10 and a nonglycosylated form (filgrastim), both produced by expression in E. coli.1 In addition to the aforementioned forms, a long-acting form of filgrastim (PEGfilgrastim), a PEGylated filgrastim enabling less frequent administration, has been available since 2002.
As shown before,7 cultivation temperature was the most important variable affecting the properties of hG-CSF IBs and thus the efficiency of its production. Therefore, the goal of our research was to investigate the mechanisms determining the characteristics of ncIB production by comparing the physiology of recombinant E. coli BL21 (DE3) at three different temperatures (25 °C (suboptimal), 37 °C (optimal) and 42 °C (heat shock)) using a gene expression profiling approach. As the formation of various proteases and chaperones under different temperature conditions was previously reported,6-8,11 we inspected the behavior of these genes in more detail.

2. Experimental

2.1. Cultures and Plasmids

In this study the recombinant E. coli strain BL21 (DE3) (Novagen), carrying the expression plasmid pET3a without the hG-CSF insert (control strain) or with the hG-CSF insert [Fopt5] (production strain), was used. The hG-CSF insert ([Fopt5]) was prepared as described previously.7

2.2. Culture Conditions

Bacterial inoculum of the production and control strain was prepared in a shake flask culture and grown overnight at 25 °C and at 160 rpm in the LBPG/amp100 medium.7 After reaching an optical density of OD600nm ~ 4, the inoculum was transferred to the GYSP medium and immediately induced with IPTG. The cultures were then incubated in shake flasks at 160 rpm and at three different temperatures (25 °C, 37 °C and 42 °C) until the culture's optical density (OD) indicated the transition from the exponential to the stationary phase (OD600nm ~ 10 for the culture grown at 25 °C and OD600nm ~ 4 for the cultures grown at 37 °C or 42 °C).7 At that point the cultures were stabilized with RNAprotect Bacteria Reagent (Qiagen), aliquoted and centrifuged, and the bacterial pellet was stored at -80 °C for further RNA and protein expression analysis. The cultivation experiment was repeated 3 times, thus yielding 18 samples altogether.

2.3. Isolation of Total RNA and DNase Treatment

RNA isolation and DNase treatment were performed as described by Petek et al. (2010),12 except for substituting lysostaphin with lysozyme (500 mg/ml) in the cell lysis step. RNA quality, quantity and integrity were checked by NanoDrop (NanoDrop Technologies, USA), gel electrophoresis and Bioanalyzer (Agilent Technologies).

2.4. Microarray Hybridization

Purified RNA (approximately 30 µg) was used for cDNA synthesis and direct labeling. Luciferase control mRNA (1 ng/µg; Promega) and 3 µg of random primers were added to each RNA sample. This was followed by 10 minutes of incubation at 70 °C and immediate chilling on ice. cDNA synthesis was carried out using SuperScript II reverse transcriptase (Invitrogen) according to the manufacturer's instructions. Synthesized cDNA was purified using the MinElute PCR Purification Kit (Qiagen). The concentration of cDNA, the efficiency of dye (Cy5) integration and the integrity of labeled cDNA were checked by NanoDrop and gel electrophoresis. Pre-designed oligonucleotide microarrays (CustomArray™ 12K Microarray, CombiMatrix Corporation) containing 12,000 features covering the complete E. coli genome (4,200 genes, positive and negative control sequences) were used. Labeled cDNA was hybridized to the arrays according to the protocol recommended by CombiMatrix, except for using 2X formamide-based hybridization buffer (Genisphere) containing salmon testis DNA (1 µg/µl, Sigma) and a shorter hybridization time (1 h).

2.5. Microarray Imaging and Data Analysis

After hybridization, the semiconductor microarray surfaces were covered with imaging solution and scanned using a fluorescence LS200 scanner (TECAN).13 CombiMatrix Microarray Imager Software was used for image analysis and quality control. Further data analysis was performed in the R software environment for statistical computing and graphics (http://www.r-project.org/).
Bioconductor's packages affy, limma and KEGGSOAP were used for quality control, preprocessing, statistical significance testing14 and annotation. The data were normalized using quantile normalization. Intensities of the factory built-in control probes were compared to information on the CombiMatrix FAQ internet site to confirm the validity of the chosen preprocessing approach. Differentially expressed transcripts were functionally analysed according to Gene Ontology (GO) using GSEA.15 Normalized data were further analyzed for significance using linear models with different contrast settings and empirical Bayes (p < 0.05). To minimize the possibility of false positive results, all log2 values of gene expression ratios between -0.5 and 0.5 were considered not relevant and were excluded from the data interpretation.16 The final results were expressed as log2 values (logFC) of the ratios between the mean expression in sample groups. Moreover, the dataset was annotated with GeneID identification tags for easier access to different knowledge databases. Differentially expressed (DE) genes were visualized in the EcoCyc database (http://www.ecocyc.org/expression.html), which allows representation of the obtained results on the metabolic and signaling pathways. The microarray data have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE25561 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE25561).

2.6. Real-Time PCR

Six DE genes, yeaZ, ydcP, groEL, ecpD, torD and uspD (yiiT), were selected for real-time PCR analysis based on TaqMan® MGB™ technology.17 16S rRNA was used as the reference gene. Gene-specific sequences were chosen for assay design using NCBI BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The assays were designed by Applied Biosystems; primer and probe sequences are listed in Table 1.
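As an illustration of the quantile normalization step named in Section 2.5, the following is a minimal Python sketch and not the authors' R/limma code. The function name `quantile_normalize` and the toy input are hypothetical, and ties are broken by order of appearance, a simplification that may differ from the tie handling in limma.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length intensity lists (one per array).

    Hypothetical helper for illustration only. After normalization every
    array has the same distribution: the k-th smallest value of each array
    is replaced by the mean of the k-th smallest values across all arrays.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must contain the same number of probes")

    # For each array, the probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]

    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(col[order[k]] for col, order in zip(columns, orders)) / len(columns)
        for k in range(n)
    ]

    # Write the reference value for each rank back to the probe's position.
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Toy example: three arrays with four probes each.
arrays = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
normalized = quantile_normalize(arrays)
```

Each array keeps the rank order of its probes, but all arrays share one set of intensity values, which is what makes their distributions directly comparable before the linear models are fitted.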
Total RNA (approximately 3 µg) was reverse transcribed using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) according to the manufacturer's instructions. Real-time PCR reactions were set up as described in Petek et al. (2010)12 in the ABI PRISM 7900 HT Fast Sequence Detection System (Applied Biosystems) using 5 µl reactions and standard cycling parameters. Data quality control and analysis were performed as described in Petek et al. (2010).12 For the purpose of comparing qPCR and microarray data, the results were expressed as the log2 of the ratio between relative gene expressions at two different temperatures for the control and production strain, and as the ratio between relative gene expressions in the control and production strain grown at the same temperature. Statistical significance of differences in gene expression was calculated using the same model as in the analysis of microarray data.13

2.7. Western Blotting

SDS-PAGE was performed using a 4-12% NuPAGE® Novex Bis-Tris gel (Invitrogen) according to the manufacturer's protocols. Prior to electrophoresis all samples were resuspended in 10 mM TRIS/HCl (pH = 8.0) buffer and diluted according to their final OD to obtain similar sample loads. Samples were further treated by addition of NuPAGE® LDS sample buffer, denatured for 10 minutes at 70 °C and applied to the gel. Electrophoresis was performed at 200 V, 125 mA for 40 minutes at room temperature. Proteins separated by SDS-PAGE were afterwards transferred onto a nitrocellulose membrane using the iBlot™ Dry Blotting System (Invitrogen). Immunodetection was performed with a primary antibody (GroEL antibody, mouse monoclonal, IgG1, Antibodies-online GmbH) followed by a secondary antibody (anti-mouse IgG - HRP, Sigma). Colorimetric detection was achieved by addition of a detection solution, a mixture of solution A (15 mg of 4-chloro-1-naphthol dissolved in 5 ml of methanol) and solution B (15 ml of H2O2 added to 25 ml of TBS, pH = 7.5).
Finally, the membrane was imaged and the obtained images were further analyzed with the ImageJ program (Image Processing and Analysis in Java, http://rsbweb.nih.gov/ij/), which was used for optical density measurements. We semi-quantified all proteins from the electrophoresis gel images and GroEL from the western blotting membrane images, and determined the relative GroEL content in different samples.

Table 1: Primer and probe sequences used for real-time PCR

Gene      Forward primer                Reverse primer             Probe
yeaZ      CGCTGATATTCGGCCCAGTAAA        GTGCTGGCAGCCATTGAC         CTTCGCCCATTCGCG
ydcP      GATATTGGCGCGTTCGATTCG         GAGATGATCTTTCGCCACTTTCAAT  CAGGCCGATAAATTT
groEL     GCAACTCTGGTTGTTAACACCAT       TGCAGCATAGCTTTACGACGAT     AACCGCAGCGACTTT
ecpD      GAACACGCTCTCTCTGTCTTTAGG      CAAACGTGGGCAAACAATCAAATT   ACAGCCAGCACCTCAC
torD      ACAGGACGAGCAAGAGATTAAACG      CGTTGAAATTGCCGCTGGTTT      CCCTGCCTCAACTAAC
uspD      CGGATAACAAGCTGTATAAACTGACGAA  GCATTTCTCCGCGTTCAATACG     TCGGCCATTGAATATT
16S rRNA  GGAGTACGGCCGCAAGGT            CATGCTCCACCGCTTGTG         AAAACTCAAATGAATTGACG

3. Results

It is known that cultivation at 25 °C yields high amounts of correctly folded hG-CSF protein within the IBs.7 98% of the expressed hG-CSF was in the form of IBs and only 2% was produced in the cytoplasm.8 Under these conditions, recombinant hG-CSF protein accumulated to 35% to 40% of the total proteins of E. coli, and almost 50% of the hG-CSF extracted from the IBs showed biological activity.7,8 In order to investigate the mechanisms underlying this phenomenon, the experiment was designed to profile E. coli gene expression at different cultivation temperatures. Therefore, the control strain (recombinant strain carrying the empty expression plasmid) and the production strain (recombinant strain carrying the expression plasmid containing hG-CSF) were incubated at three different temperatures: 25 °C, 37 °C and 42 °C. The cultures were sampled just before the transition into the stationary phase, when the cells were still fully viable and already under the stress of recombinant protein production.
At this point the plasmid copy number was found to be at its maximum and maximal productivity was achieved, as the accumulation level of hG-CSF reached a plateau.7,18 After initial data analysis, special attention was given to the temperature-dependent changes in expression levels of proteases, chaperones and other heat or stress related genes.

3.1. Overview of Identified Gene Expression Differences

Linear models were set up to identify DE genes in comparisons of the following group pairs: control strain cultures grown at 37 °C compared to 25 °C (C37_25), control strain cultures grown at 42 °C compared to 25 °C (C42_25), production strain cultures grown at 37 °C compared to 25 °C (P37_25), production strain cultures grown at 42 °C compared to 25 °C (P42_25), as well as comparisons of control and production strain cultures grown at 25 °C (P_C25), 37 °C (P_C37) and 42 °C (P_C42). In the control strain, 282 DE genes showed statistically significant differences in expression levels between cultures grown at 37 °C and 25 °C, and 226 DE genes between cultures grown at 42 °C and 25 °C. In the production strain, 28 and 34 DE genes showed changes in expression levels at 37 °C and 42 °C compared to 25 °C, respectively. Comparison of the gene expression profiles of the control and production strain grown at the same cultivation temperature identified on average 113 DE genes, with between 74 and 161 DE genes across the studied temperatures. Additionally, two-way analysis of variance was performed to test for the effects of recombinant protein production (P_C) and growth temperature, as well as the interaction of both factors (Supporting information 1). The percentage of DE genes identified was similar to that in the pair-wise comparisons, up to about 7% (i.e. up to about 305 DE genes) (Fig. 1).
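The pair-wise comparisons and the filtering rule from Section 2.5 (p < 0.05, log2 ratios between -0.5 and 0.5 excluded) can be sketched in Python as follows. This is an illustrative sketch only, not the authors' limma code: the names `CONTRASTS`, `log_fc` and `is_de` are hypothetical, logFC is assumed to be the difference of mean log2 intensities, the direction of the P_C contrasts (production minus control) is an assumption, and the p-value is taken as given rather than computed with empirical Bayes.

```python
from statistics import mean

# Contrast label -> (numerator group, denominator group), where a group is
# (strain, temperature in °C); "C" = control strain, "P" = production strain.
# The direction of the P_C contrasts is an assumption made for this sketch.
CONTRASTS = {
    "C37_25": (("C", 37), ("C", 25)),
    "C42_25": (("C", 42), ("C", 25)),
    "P37_25": (("P", 37), ("P", 25)),
    "P42_25": (("P", 42), ("P", 25)),
    "P_C25": (("P", 25), ("C", 25)),
    "P_C37": (("P", 37), ("C", 37)),
    "P_C42": (("P", 42), ("C", 42)),
}


def log_fc(log2_expr, contrast):
    """Return logFC of one gene for a named contrast.

    log2_expr maps (strain, temperature) to the list of replicate log2
    intensities of that gene; logFC is the difference of the group means.
    """
    numerator, denominator = CONTRASTS[contrast]
    return mean(log2_expr[numerator]) - mean(log2_expr[denominator])


def is_de(logfc, p_value, fc_cutoff=0.5, alpha=0.05):
    """Apply the two filters from the text: p < alpha and |logFC| > cutoff."""
    return p_value < alpha and abs(logfc) > fc_cutoff


# Toy data for one gene, three replicates per group (values are made up).
gene = {
    ("C", 25): [8.0, 8.2, 7.8],
    ("C", 37): [9.0, 9.2, 8.8],
    ("P", 25): [8.1, 8.3, 7.9],
}
fc = log_fc(gene, "C37_25")
```

With three replicates per group, as in the 18-sample design, each contrast compares two groups of three arrays; a gene is reported as DE only when it passes both filters.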
When comparing control