Essentially leading antibody production: An investigation of amino acids, myeloma and natural V-region signal peptides in producing Pertuzumab and Trastuzumab variants

Boosting the production of recombinant therapeutic antibodies is crucial in both academic and industry settings. In this work, we investigated the usage of varying signal peptides by antibody genes and their roles in recombinant transient production. Comparing myeloma and the native signal peptides of both heavy and light chains in 168 antibody permutation variants, we performed a systematic analysis and found amino acid counts to be involved in antibody production. We used these counts to construct a model for predicting co-transfection transient recombinant antibody production rates in the HEK293 system. The findings also provide insights into the usage of the large repertoire of antibody signal peptides.


Introduction
This is a provisional file, not the final typeset article.

The signal peptide (SP) of a protein is a short tag of amino acids at the N- or C-terminal that directs the protein either extracellularly or to organelles within the cell. Known organelle-targeting SPs include the nuclear localization (Kalderon et al., 1984) or export signal (Wen et al., 1995), mitochondria signals (Maccecchini et al., 1979), endoplasmic reticulum (ER) secretion (Blobel and Dobberstein, 1975) or retention signal (Stornaiuolo et al., 2003), and peroxisome signals (Gould et al., 1987).

Secreted by plasma B cells, antibodies are tagged with the ER secretion signal/SP at the N-terminal and translocated into the ER lumen (Walter et al., 1981), before being passed to the Golgi apparatus and sorted into secretory vesicles for extracellular secretion (Beams and Kessel, 1968). The SP is unique to the protein and generally contains a positively charged N-terminal, followed by a hydrophobic region and a neutral polar C-terminal (von Heijne and Gavel, 1988). Depending on its location at the N- or C-terminal of the protein, there is a cleavage site separating the SP from the protein that is recognized by a signal peptidase (Milstein et al., 1972). While secretory SPs are involved in the co-translational co-translocation pathway (Blobel and Dobberstein, 1975) with the primary function of exporting proteins (Kapp et al., 2000), it remains unclear why antibodies utilize a large repertoire of SPs (e.g. Vκ1 family with 22 SPs, VH3 family with 50 SPs, retrieved from the IMGT database (Giudicelli et al., 2005) at the point of writing), hinting at possible roles in antibody production.
Modifying SPs for better recombinant protein production has been largely successful in many studies (Haryadi et al., 2015; Huang et al., 2017; Ohmuro-Matsuyama and Yamaji, 2018; Zhou et al., 2016); however, the underlying mechanism of such effects remains enigmatic. With recent studies demonstrating cross-talk between antibody elements, where the constant regions (Lua et al., 2018; Su et al., 2018), variable regions and their pairings (Ling et al., 2018; Lua et al., 2019) affect antibody production and function, there is increasing evidence that antibodies ought to be investigated holistically (Phua et al., 2019), especially as therapeutics (Ling et al., 2020

Recombinant antibody production
All VH and Vκ sequences used were described previously (Ling et al., 2018) and codon optimized using GenSmart™ Codon Optimization (Genscript) online software to avoid codon usage bias, followed by synthesis by Eurofin (Japan). The genes were transformed into competent E. coli (DH5α) strains (Chan et al., 2013), followed by plasmid extraction (Biobasic Pte Ltd) and sub-cloning into the pTT5 vector (Youbio) using restriction enzyme sites, as previously performed (Ling et al., 2018; Lua et al., 2018; Lua et al., 2019; Su et al., 2017 [...]

[...] 2011)) was used in a 20-fold cross validation process to fine-tune the regularization strength. A weighted precision scoring function was used to evaluate the model performance, and the categorized classes (low, medium, high) were weighted to counter the slight imbalance in the dataset, i.e. 65 low, 70 medium, and 33 high production samples. The optimization process was performed in 1000 iterations.

The probability of each categorized production label (class label) of each antibody variant was calculated using the equation below:

[...]

where:
P(yk) is the probability of the predicted class label yk (i.e. the production level) of variant i, with k = 0 (low), 1 (medium), 2 (high)
β0 is the corresponding intercept of class k, with β0 = [-17.88, 14.62, 3.25]

[...] 2018) tend to be higher than those using native SPs (e.g. the highest producing pair using the IgE SP was Vκ4|VH3 at 125%, while the highest producing pair utilizing their respective SPs was Vκ3|VH7 at 80%), with exceptions (Vκ1|VH1, Vκ2|VH1, Vκ1|VH4, Vκ2|VH4, Vκ1|VH7, and Vκ2|VH7). Regardless of the SP, antibodies paired with Vκ5 family constructs gave poor yields (Figure 1C, sorted by light chain families).
Of the Trastuzumab variants (Figure 1B & D), there is general agreement with the trends observed in the Pertuzumab dataset, with IgE SP variants generally having higher production rates than native SP variants. The highest producing antibody with the IgE SP was Vκ4|VH3 at 235%, with its corresponding counterpart with native SPs at 94%. The only exceptions where the native SPs had higher production were the Vκ1|VH5 and Vκ1|VH7 pairs.

In the Pertuzumab dataset, the Vκ5 family was the sole poor producing family, whereas the low production families in the recombinant Trastuzumab model dataset (Figure 1D) extended to VH1, 2, 4 and 6 (light chains that paired with these VHs had lower yields). One notable exception was the Vκ5|VH3 pair, which was produced at higher levels than other Vκ5 family permutations in the Trastuzumab dataset (Figure 1D).

VH1 and VH7 genes shared the same SP amino acid sequence (as classified in IMGT) but with different codons. While this was normalized through codon optimization, there were still distinctly different production levels between the two VH families, with VH7 being the better producing partner (average production of VH7 with its light chain partners at 48% when using native SPs and at 58% when using the IgE SP) compared to VH1 (at 37% when using native SPs and at 48% when using the IgE SP) in the Pertuzumab dataset. The difference between VH1 and VH7 was even more pronounced in the Trastuzumab dataset, where VH1 recombinant production levels were significantly reduced (at 7% when using native SPs, and 20% when using the IgE SP) compared to VH7 (at 37% when using native SPs, and 110% when using the IgE SP).

[...] production at 48% and 87% for the Pertuzumab and Trastuzumab models, respectively. The Vκ1 SP (Vκ1SP-Vκ1, Vκ1SP-VH3) had the highest production at 112% and 240% in the Pertuzumab and Trastuzumab models, respectively.

Myeloma SP production rates
To study the effect of myeloma SPs on improving production levels, we performed single amino acid mutagenesis (P18R and P18S) on the Vκ1 SP. The mutated SPs were generated based on a previously reported myeloma Vκ1 SP associated with Fanconi's syndrome (Rocca et al., 1995), and the IgE SP sequence, also from a myeloma patient (Nilsson et al., 1970 [...] productions compared to the native Vκ1 SP, but the P18S SP Trastuzumab had notably higher recombinant production than the wild-type IgE SP.
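
The P18R and P18S notation denotes replacing the proline at position 18 of the SP with arginine or serine, respectively. As a minimal sketch of how such a substitution can be applied and sanity-checked on a sequence string, the Python snippet below uses a hypothetical helper, `apply_point_mutation`, and a made-up 22-residue placeholder sequence; neither is taken from the study.

```python
import re

def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Apply a substitution written as <from><1-based position><to>, e.g. 'P18R'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    original = match.group(1)
    position = int(match.group(2))
    replacement = match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError("position falls outside the sequence")
    index = position - 1  # convert to 0-based indexing
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]

# Placeholder sequence (hypothetical, NOT a real signal peptide); residue 18 is P.
PLACEHOLDER_SP = "M" + "A" * 16 + "PGAAC"
mutant_r = apply_point_mutation(PLACEHOLDER_SP, "P18R")
mutant_s = apply_point_mutation(PLACEHOLDER_SP, "P18S")
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when positions are counted from the start of the SP rather than from the mature chain.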

The roles of essential and non-essential amino acids in antibody production
With no clear correlation between production levels and the SP used, the content of the SPs was investigated. Essential amino acids (EAAs) were found to make up half of the SP length (Table 1), with generally varied usage and some being more prominent, e.g. leucine (L).

Light chain SPs tend to be longer, more homogenous (4-8 EAA) and to have lower EAA counts (10-12 amino acids) than the heavy chain SPs. This involvement of EAAs may explain the observations that the Vκ1 SP yielded better production rates than the IgE SP (Figures 1 & 2), which in turn yielded better production than the native SPs.

Analysis of EAA content (Table 1) demonstrated that native SPs with higher EAA counts resulted in lower production levels compared to the IgE or Vκ1 SPs (Figures 1 & 2).

[...] family. We observed that higher counts of L and arginine (R), and lower counts of I, lysine (K), and aspartic acid (D), may account for the increased production in the Vκ3 family. Higher counts of tryptophan (W) may also account for higher production in general.

Analysis of the Trastuzumab variants (Figure 5) showed similar trends in amino acid usage due to the high similarity of CDRs between Trastuzumab and Pertuzumab. This extended to the poor production in Vκ5 of both Trastuzumab (Figure 5) and Pertuzumab (Figure 4). Higher counts of D appear to improve production in Trastuzumab Vκ4 variants, contrary to Pertuzumab Vκ3 variants. Higher counts of tyrosine (Y), proline (P), glycine (G) and lower counts of glutamine (Q) are associated with better production in our Trastuzumab repertoire.
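
The amino acid tallies discussed in this section amount to counting residues in a sequence and splitting them into essential and non-essential groups. The following Python sketch illustrates one way to do this; the function name `amino_acid_profile` and the example peptide are hypothetical and are not among the SPs in Table 1. The set of nine EAAs (H, I, L, K, M, F, T, W, V) follows the standard human classification.

```python
from collections import Counter

# The nine essential amino acids (one-letter codes): H, I, L, K, M, F, T, W, V.
ESSENTIAL = frozenset("HILKMFTWV")

def amino_acid_profile(sequence: str):
    """Return per-residue counts plus total EAA and NEAA counts for a peptide."""
    counts = Counter(sequence.upper())
    eaa_total = sum(n for aa, n in counts.items() if aa in ESSENTIAL)
    neaa_total = sum(counts.values()) - eaa_total
    return counts, eaa_total, neaa_total

# Hypothetical peptide for illustration only (not one of the SPs in the study).
EXAMPLE_PEPTIDE = "MKWLLAVGLLTRA"
counts, eaa_total, neaa_total = amino_acid_profile(EXAMPLE_PEPTIDE)
eaa_fraction = eaa_total / len(EXAMPLE_PEPTIDE)
```

The same function can be applied to a full-length heavy or light chain sequence to obtain the per-residue counts that are compared against production levels.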

The effects of amino acid supply on recombinant antibody production
Since the culture medium is the predominant nutrient source in transient recombinant protein production, we analyzed the amino acid constituents and the demand for the respective amino acids. For simpler analysis, we deemed batch variations involving co-transfection procedure variations and serum differences to be negligible. Based on the DMEM formulation (Sigma Aldrich, Cat no. D1152), W was found to be the limiting amino acid (0.47 × 10^20 molecules) compared to other EAAs, restricting our maximum production to 1.88 × 10^18 antibodies (based on a representative average of amino acid usage from all the antibodies in this study, see Supplementary Table 1-2).

Five non-essential amino acids (NEAAs), A, D, E, N and P, were not provided in DMEM, probably because they were assumed to be sufficiently supplied by internal cell synthesis. However, three NEAAs, S, D, and glutamic acid (E), are precursors to cysteine (C), G, N, P, Q, and R but were also absent in the media, making these amino acids potential limiting factors.
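
The limiting-amino-acid argument can be illustrated with a short calculation: for each amino acid, divide the number of molecules supplied by the number required per antibody, and take the smallest ratio as the production ceiling. The Python sketch below rests on several assumptions that are not stated in the text: that the quoted W supply refers to one litre of medium containing 16 mg/L tryptophan (molar mass 204.23 g/mol), and that an average antibody requires 25 W residues, a figure implied by the ratio of the two reported values (0.47 × 10^20 / 1.88 × 10^18). The leucine entries are hypothetical placeholders, not values from Table 2.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules_per_litre(mg_per_litre: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration into a number of molecules per litre of medium."""
    return mg_per_litre / 1000.0 / molar_mass_g_per_mol * AVOGADRO

def production_ceiling(supply: dict, demand_per_antibody: dict):
    """Return (limiting amino acid, maximum antibody count) from supply/demand ratios."""
    ratios = {
        aa: supply[aa] / needed
        for aa, needed in demand_per_antibody.items()
        if needed > 0
    }
    limiting = min(ratios, key=ratios.get)
    return limiting, ratios[limiting]

# Assumed concentration and molar mass for tryptophan (see the lead-in).
w_supply = molecules_per_litre(16.0, 204.23)

# W supply is the reported figure; the L supply and demand are hypothetical.
supply = {"W": 0.47e20, "L": 4.8e20}
demand = {"W": 25, "L": 100}
limiting, max_antibodies = production_ceiling(supply, demand)
```

Under these assumptions the converted tryptophan supply agrees with the reported 0.47 × 10^20 molecules to within about 1%, and tryptophan remains the limiting residue for any amino acid whose supply-to-demand ratio exceeds 1.88 × 10^18.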

A statistical predictive model of antibody production rate
To demonstrate the noticeable involvement of varying SPs and the amino acid counts, we constructed a statistical model to computationally predict antibody production rates based on our co-transfection transient HEK293 cell system. The logistic regression model is based on the data in this study and includes other antibodies from our previous work (Ling et al., 2018) for better model training. The prediction scores of AUC ~0.79-0.95 (Figure 6) and F1-score of 0.62-0.7 (where 1 reflects the best balance between precision and recall) indicated a reasonable prediction model of production rate categories (low, medium, or high) using various signal peptide sequences.
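
To make the classification step concrete, the Python sketch below shows how class probabilities could be obtained from amino acid counts under a multinomial (softmax) logistic form. This form is an assumption, since the fitted equation and coefficients are not reproduced in this copy; only the three class intercepts are taken from the Methods text. The coefficient values, the two-feature input, and the function name `class_probabilities` are hypothetical and chosen purely for illustration.

```python
import math

# Intercepts per class (low, medium, high) as reported in the Methods text.
INTERCEPTS = [-17.88, 14.62, 3.25]
LABELS = ["low", "medium", "high"]

def class_probabilities(counts, coefficients, intercepts=INTERCEPTS):
    """Softmax over per-class linear scores: intercept_k + sum_j coef[k][j] * counts[j]."""
    scores = [
        b0 + sum(c * x for c, x in zip(coefs, counts))
        for b0, coefs in zip(intercepts, coefficients)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients for two features (e.g. counts of L and W);
# these are NOT the fitted values from the study.
HYPOTHETICAL_COEFS = [[0.10, -0.50], [-0.05, 0.20], [0.02, 0.60]]

probs = class_probabilities([100, 25], HYPOTHETICAL_COEFS)
predicted = LABELS[probs.index(max(probs))]
```

The predicted production category is the class with the highest probability. With all counts set to zero the prediction is driven by the intercepts alone, which is a quick way to check that the scores are wired up as intended.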

Discussion
We investigated the effects of the numerous antibody signal peptides (SPs) on recombinant antibody production in a co-transfection transient system using HEK293 cells, keeping the transfection agents and backbone plasmids the same so that only the signal peptides varied. We found that the IgE SP generally gave better yields than the native SPs (Figure 1).

To study possible effects of SPs from heavy and light chains (Haryadi et al., 2015), we grafted the Vκ1 SP on the wild-type Vκ1|VH3 of the recombinant Pertuzumab and Trastuzumab (Figure 2) for comparison to the IgE SP and the respective native SP counterparts. Both IgE and Vκ1 SPs yielded better production. With the possibility that myeloma SPs might give better production, we studied another myeloma-linked SP, a variant of the Vκ1 SP with P18R or P18S differing from the Vκ1 SP in IMGT (Rocca et al., 1995).

Grafting of the myeloma Vκ1 SP onto the wild-type Vκ1|VH3 recombinant Pertuzumab and Trastuzumab models showed that mutation P18S increased production only in the Trastuzumab model, while the other mutation had significantly lower production than the native Vκ1 SP (Figure 3 [...] hydrophobic center region of the SPs was shown to be conserved, while the N- and C-terminal regions were more distinctive. Given these findings, we did not find support for a role of the SP in hyperglobulinemia pathogenesis within myeloma, and the relationship between the SP and antibody production remains enigmatic.

Analyzing the content of the nine essential amino acids (EAAs) in SP sequences, especially given that previous experiments showed that the addition of supplements may be helpful in protein production, we found that light chain SPs had a lower variety and number of EAAs than the heavy chain SPs (Table 1), providing a clue to their possible contribution to recombinant production.
Extending the essential and non-essential amino acid analysis (Figures 4 and 5) to both full-length Pertuzumab and Trastuzumab variants, we found that the presence of F, H, I, A, N, L and S amino acids may underlie the poor production of Vκ5 Pertuzumab and Trastuzumab variants. On the other hand, L, R, I, K and D amino acids may allow for better production, as seen with specific variants paired with Vκ3 (Pertuzumab) and Vκ4 (Trastuzumab).

Considering the DMEM formulation (Table 2), on the assumption of negligible differences between batches of serum and co-transfected cell batches, we found that DMEM media had insufficient amino acids to support optimal antibody production, especially since the EAA W and the NEAAs A, D, E, N, and P were either not provided at all or present in inadequate quantities. This explains the boost in production when supplements such as peptone and casein are added (Davami et al., 2014; Heidemann et al., 2000; Michiels et al., 2011; Pham et al., 2005).

Our predictive model based on amino acid counts (Figure 6) was able to predict the ordinal production levels (low, medium, or [...]

With amino acid counts able to predict antibody production rates, the underlying reason that antibody genes utilize such a big repertoire of SPs may be rationalized. It is possible that the repertoire serves to mitigate over-reliance on specific EAAs that impact antibody production. Considering that the hypervariable antibody VDJ genes (Tonegawa, 1983) utilize a wide permutation of the amino acids, a fixed signal peptide for all antibodies can increase the probability of heavy bias towards specific amino acids. Such biases can thus create bottlenecks for certain EAA(s) and hamper not only antibody production, but also essential cellular protein production.
Therefore, by varying the SPs with differing EAA content, such bottlenecks of EAAs may be mitigated, thus explaining a function of the large repertoire of SPs in antibody genes.

In conclusion, the study of antibody SPs with EAA factors provides a new approach to understanding transient mammalian protein production and possible insights into the repertoire of SPs utilized by antibodies. These new EAA factors could have potential for widespread application to other production systems for a better understanding of protein production.

Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

[...] promises of a holistic view of proteins-impact on antibody engineering and drug discovery. Bioscience Reports 39.

Rocca, A., Khamlichi, A.A., Touchard, G., Mougenot, B., Ronco, P., Denoroy, L., Deret, S., Preud'homme, J.L., Aucouturier, P., and Cogné, M. (1995). Sequences of V kappa L subgroup light chains in Fanconi's syndrome. Light chain V region gene usage restriction and peculiarities in myeloma-associated Fanconi's syndrome. J Immunol 155, 3245-3252.

Stornaiuolo, M., Lotti, L.V., Borgese, N., Torrisi, M.R., Mottola, G., Martire, G., and Bonatti, S. (2003). KDEL and KKXX retrieval signals appended to the same reporter protein determine different trafficking between endoplasmic reticulum, intermediate compartment, and Golgi complex. Mol Biol Cell 14, 889-902.

Su, C.T.-T., Ling, W.-L., Lua, W.-H., Poh, J.-J., and Gan, S.K.-E. (2017) [...]

Tonegawa, S. (1983). Somatic generation of antibody diversity. Nature 302, 575-581.

von Heijne, G., and Gavel, Y. (1988). Topogenic signals in integral membrane proteins. Eur J Biochem 174 [...]

[...] axis) and amino acid counts (line, right axis) charts for the comparison to amino acid counts in recombinant production of Pertuzumab variants. The green arrows/boxes refer to an increase in production level while the red arrows depict a decrease in production levels.

[...] production in Trastuzumab variants. The green arrows/boxes refer to an increase in production level while the red arrows depict a decrease in production levels.

Figure 6. Logistic regression model to predict the recombinant antibody production rate using amino acid counts. The prediction is evaluated using the area under the ROC curve (AUC) in triplicates, each with two independent (a, b) testing datasets. The 'dummy' classifier is used as a baseline control.