The soft ionisation revolution in mass spectrometry took place in the last decade of the 20th century. After that, mass spectrometry quickly rose to prominence in life science laboratories. The paradigm of pharmaceutical R&D has also changed over the same period: biological drugs, generally called ‘big molecules’, are being brought to market, and many more are expected in the coming decades. It is therefore hardly a coincidence that the role of mass spectrometry in biological drug discovery is increasing. It is already an indispensable tool in biopharma R&D and is on its way to occupying the centre stage of this industry.
Protein drugs make up almost the whole of today's biopharma portfolio. These include signalling proteins (e.g. insulin, erythropoietin), monoclonal antibodies used as drugs (e.g. Bevacizumab, Rituximab), peptide drugs (e.g. Liraglutide, Icatibant), antibody-drug conjugates (e.g. Trastuzumab emtansine) and protein vaccines (e.g. tetanus toxoid). These protein drugs are mostly produced by manipulating biological processes and entities. Because the production systems are extremely complex, delicate and sensitive living organisms, drug development is highly complicated. Characterising these protein drugs, their variants and their impurities is far more arduous than characterising small-molecule drugs. Further, pre-clinical and clinical studies of protein drugs are laden with problems that are unique to this class of therapeutics. Mass spectrometry-based proteomics helps in all of these phases, and for many of the roadblocks it is the only available platform.
The objective of this article is to summarise the mass spectrometry-based workflows currently used in the R&D of therapeutic proteins. Although soft ionisation mass spectrometry has been widely used to analyse carbohydrates, lipids, nucleic acids and small metabolites, we shall focus on protein-related applications.
A protein molecule consists of one or more polypeptide chains folded in a specific way. The polypeptide chains are unique sequences of amino acid residues produced by living cells through the process of translation. Post-translational modifications are then incorporated, and the protein is folded into its functional state. Sometimes the finished protein product is artificially processed after recovery to introduce further chemical changes, for example conjugation to a drug payload as in antibody-drug conjugates. Characterisation of a therapeutic protein product therefore comprises the following aspects.
The first aspect is the molecular mass of the whole protein, usually called the ‘intact mass’ to distinguish it from the ‘reduced mass’ (explained later). Intact mass analysis quickly reveals the heterogeneity of the product. It can also be used to determine whether, and roughly how much of, the intended protein is present in a given sample. Intact mass analysis is typically performed on time-of-flight (TOF) analysers, with either MALDI or ESI as the ionisation method. Because ESI produces a series of multiply charged ions from each protein, ESI-TOF data generally need further software processing (deconvolution) to obtain the intact mass. The sample should be reasonably pure: the more proteins it contains, the lower the probability that the protein of interest is properly ionised and identified.
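Because ESI spreads one protein over a series of multiply charged ions, the intact mass has to be recovered from the m/z values of that charge envelope. The sketch below shows the underlying arithmetic for consecutive charge states. The function names are hypothetical, and it assumes clean, already-picked peak apexes with no missing charge states; production deconvolution software uses more robust algorithms (for example maximum-entropy deconvolution) that cope with noise and overlapping species.

```python
PROTON = 1.007276  # mass of a proton (Da)

def charge_from_adjacent_peaks(mz_z, mz_z_plus_1):
    """Charge z of the higher-m/z peak of two adjacent charge states.

    With m1 = (M + z*p)/z and m2 = (M + (z+1)*p)/(z+1),
    solving for z gives z = (m2 - p) / (m1 - m2).
    """
    return round((mz_z_plus_1 - PROTON) / (mz_z - mz_z_plus_1))

def neutral_mass(mz, z):
    """Neutral mass M from an [M + zH]z+ ion."""
    return z * (mz - PROTON)

def deconvolute_envelope(mz_values):
    """Average neutral mass over a gap-free series of consecutive charge states."""
    mzs = sorted(mz_values, reverse=True)  # highest m/z carries the lowest charge
    z0 = charge_from_adjacent_peaks(mzs[0], mzs[1])
    masses = [neutral_mass(mz, z0 + i) for i, mz in enumerate(mzs)]
    return sum(masses) / len(masses)

# Hypothetical 25 kDa protein observed at charge states 20+ to 24+.
envelope = [(25000.0 + z * PROTON) / z for z in range(20, 25)]
print(round(deconvolute_envelope(envelope), 2))  # 25000.0
```

Averaging over all assigned charge states, rather than using a single peak, reduces the influence of error in any one peak position.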
When the protein is composed of more than one polypeptide chain, the mass of each chain can be checked separately. The chains are typically held together covalently by disulphide links. These links are broken using a reducing agent (hence the name ‘reduced mass’), and the freed cysteines are alkylated to prevent the links from re-forming spontaneously. Once the chains are separated, their intact masses can be determined on TOF analysers as described above.
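The bookkeeping between intact and reduced masses can be sketched as follows. It assumes monoisotopic masses, that each disulphide link removes two hydrogen atoms, and that iodoacetamide is the alkylating agent (adding a carbamidomethyl group, +57.021 Da, per cysteine); the text does not name the reagent, and the chain masses in the example are hypothetical.

```python
H_ATOM = 1.007825            # monoisotopic mass of a hydrogen atom (Da)
CARBAMIDOMETHYL = 57.021464  # added per Cys by iodoacetamide (assumed alkylating agent)

def intact_mass_from_chains(free_thiol_chain_masses, n_disulphides):
    """Intact mass from reduced (free-thiol) chain masses.

    n_disulphides counts every link, intra-chain and inter-chain, because the
    chain masses are those of fully reduced chains; each link loses 2 H.
    """
    return sum(free_thiol_chain_masses) - 2 * H_ATOM * n_disulphides

def alkylated_chain_mass(free_thiol_chain_mass, n_cysteines):
    """Mass of a reduced chain after every cysteine has been carbamidomethylated."""
    return free_thiol_chain_mass + n_cysteines * CARBAMIDOMETHYL

# Hypothetical two-chain protein with two disulphide links in total.
print(round(intact_mass_from_chains([1000.0, 2000.0], 2), 4))  # 2995.9687
```

Comparing the value from intact_mass_from_chains with the measured intact mass is a quick consistency check; the same arithmetic applies with average masses, the correction being about 2.016 Da per link either way.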
Peptide mapping confirms the complete amino-acid sequences of all the polypeptides present in the protein under enquiry. The protein is reduced, alkylated and digested with a site-specific protease, and the resulting peptides are extracted. These peptides are separated by reversed-phase chromatography (RP-HPLC) and fed online to an ESI mass spectrometer. The peptides are then fragmented in a collision cell, and the masses of the fragments are reported as a ‘fragment ion spectrum’. This technique follows the ‘bottom-up proteomics’ philosophy. Since it is a tandem MS method, peptide mapping requires instruments capable of tandem MS; a variety of mass analysers can be used, including various ion traps and TOF analysers. The method described here is the mainstream technique. Many variations have been attempted successfully, such as ‘top-down’ approaches, offline coupling of RP-HPLC to MALDI-based instruments, and peptide separation methods other than RP-HPLC. The data obtained from peptide mapping are extremely large and complex, so specialised software is needed to process the raw data and derive the amino-acid sequence of the protein. The expected sequence of the protein is added to the software's sequence database, and the software digests it in silico to obtain the theoretical peptides and their theoretical fragment ion spectra. The experimental data are then matched against these in silico data to identify the peptides. Newer software packages can handle larger datasets and more proteins, and use better models for matching fragment ion spectra.
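The in silico side of peptide mapping can be illustrated with a small sketch: a tryptic digest, monoisotopic peptide masses, and singly charged b- and y-ion ladders. Trypsin is assumed as the site-specific protease (cleaving after K or R, but not before P, with no missed cleavages), cysteines are left unmodified, and the ppm-window counting in count_matches is a deliberately simplified, hypothetical stand-in for the probabilistic scoring used by real search engines.

```python
import re

# Monoisotopic residue masses (Da); Cys is unmodified here (add 57.02146 if carbamidomethylated).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER

def b_ions(peptide):
    """Singly charged N-terminal fragments b1 .. b(n-1)."""
    return [sum(MONO[aa] for aa in peptide[:i]) + PROTON for i in range(1, len(peptide))]

def y_ions(peptide):
    """Singly charged C-terminal fragments y1 .. y(n-1)."""
    return [sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON for i in range(1, len(peptide))]

def count_matches(observed_mz, theoretical_mz, tol_ppm=10.0):
    """Number of theoretical fragments with an observed peak within tol_ppm."""
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tol_ppm for obs in observed_mz)
        for theo in theoretical_mz
    )

for pep in tryptic_digest("MKWVTFISLLRPAK"):
    print(pep, round(peptide_mass(pep), 4))
```

Counting matched b and y ions per peptide is the simplest form of spectrum scoring; it shows why a complete theoretical ladder from the known sequence is the prerequisite for identification.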
Identifying the locations of inter-chain and intra-chain disulphide links is highly recommended, since these links largely define the folding of the protein, and correct folding is required for its biological activity. The protein under enquiry is digested with a site-specific protease without reducing the disulphide links. This digestion yields disulphide-linked peptide pairs along with the other peptides. The peptides and linked peptide pairs are subjected to tandem mass spectrometry as described in the peptide mapping section. The fragment ion spectra of the disulphide-containing peptide pairs are analysed with software tools to locate the disulphide links. Peptide-mapping software is often equipped for disulphide identification as well. If the expected locations of the disulphide links are already known (as in biosimilar development), specialised software is available to speed up the analysis. Although disulphide links are technically post-translational modifications (PTMs), their mapping is covered separately because the methodology differs from that used for other PTMs.
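For disulphide mapping, the key calculation is that a disulphide-linked peptide pair weighs the sum of its two peptides minus two hydrogen atoms. The sketch assumes neutral monoisotopic precursor masses, one link per pair and a 10 ppm tolerance; candidate_links is a hypothetical helper that enumerates every pair of cysteine-containing peptides, including a peptide paired with itself (as for a link between two identical chains).

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
H_ATOM = 1.007825

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER

def linked_pair_mass(pep_a, pep_b, n_links=1):
    """Two peptides joined by n disulphide links lose 2 H per link."""
    return peptide_mass(pep_a) + peptide_mass(pep_b) - 2 * H_ATOM * n_links

def candidate_links(peptides, observed_mass, tol_ppm=10.0):
    """Pairs of Cys-containing peptides whose linked mass matches the observation."""
    cys_peptides = [p for p in peptides if "C" in p]
    hits = []
    for a, b in combinations_with_replacement(cys_peptides, 2):
        m = linked_pair_mass(a, b)
        if abs(m - observed_mass) / m * 1e6 <= tol_ppm:
            hits.append((a, b))
    return hits

print(candidate_links(["ACK", "CR", "GGK"], 595.257))  # [('ACK', 'CR')]
```

Precursor-mass matching only nominates candidate pairs; confirming a link still relies on the fragment ion spectra, as the text describes.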
Several other PTMs, for example deamidation, phosphorylation and the addition of carbohydrate or lipid moieties, can be analysed efficiently by mass spectrometry. The general workflow resembles that of peptide mapping, but the in silico processing is considerably more challenging. For some modifications, such as N-linked glycans, the glycans are enzymatically released from the protein (for example with PNGase F) and analysed separately to identify the full set of glycans present. This information is then used when analysing the peptide mapping data, so that the location of each type of glycan on the protein can be identified. The approaches taken for PTM characterisation are as varied as the PTMs themselves.
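A first step in PTM analysis is asking which modification, or combination of modifications, explains the mass difference between an observed peptide and its unmodified theoretical mass. The sketch uses a small, hypothetical list of common monoisotopic mass shifts and an absolute tolerance of 0.01 Da; real PTM searches also localise the modification from fragment ions, which is not shown here.

```python
from itertools import combinations_with_replacement

# Monoisotopic mass shifts (Da) for a few common modifications (illustrative subset).
PTM_DELTAS = {
    "deamidation": 0.984016,
    "oxidation": 15.994915,
    "phosphorylation": 79.966331,
    "HexNAc": 203.079373,
}

def explain_mass_shift(delta, max_mods=2, tol=0.01):
    """All combinations of up to max_mods modifications whose summed shift matches delta."""
    hits = []
    for n in range(1, max_mods + 1):
        for combo in combinations_with_replacement(sorted(PTM_DELTAS), n):
            total = sum(PTM_DELTAS[name] for name in combo)
            if abs(total - delta) <= tol:
                hits.append(combo)
    return hits

print(explain_mass_shift(0.984))  # [('deamidation',)]
```

The +0.984 Da shift of deamidation lies close to the roughly 1.003 Da spacing of the 13C isotope peak, which is why deamidation assignments depend on adequate resolution and correct monoisotopic peak picking.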
Quantitation of drugs in biological matrices for bioequivalence (BE) and bioavailability (BA) studies using triple-quadrupole mass analysers is standard practice in clinical studies. Performing similar studies on therapeutic proteins is beset with problems. Being a protein, the drug may interact non-specifically with other matrix components, interfering with quantitation. Moreover, mass spectrometric identification of an intact protein within a complex mixture of other proteins is generally not feasible. Quantitation is therefore performed by a ‘signature peptide’ approach. A signature peptide is a peptide formed when the protein under enquiry is digested with a site-specific protease, and which is unique among the peptides present in the digest of the biological matrix. Choosing a signature peptide also requires consideration of its chemical nature, ease of ionisation and the reproducibility of its quantitation; the choice relies on both software support and the user's judgement. In the quantitation experiment, the entire biological matrix is digested with the site-specific protease and the digest is run on RP-HPLC coupled to an ESI-based mass spectrometer. Before digestion, the matrix may be pre-processed to enrich the protein of interest and to remove some unwanted high-abundance proteins. The calibration curve is obtained by spiking the synthetic signature peptide into blank matrix at various concentrations. The assay broadly follows the quantitation principles of ELISA, and the data are analysed in a similar way, using the peak areas of the signature peptide in its extracted ion chromatogram. The assay can be performed on almost all routinely used mass analysers, including the triple-quadrupole analysers used ubiquitously in the traditional small-molecule BA/BE sector.
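Two steps of the signature peptide approach lend themselves to a short sketch: screening for drug peptides absent from the background matrix digest, and back-calculating a concentration from a linear calibration curve. The peptide lists and peak areas are hypothetical, exact string matching stands in for a proper uniqueness search, and an unweighted fit is assumed, whereas many validated assays use a labelled internal standard and weighted regression.

```python
def unique_peptides(target_peptides, background_peptides):
    """Candidate signature peptides: target peptides absent from the matrix digest."""
    background = set(background_peptides)
    return sorted(set(target_peptides) - background)

def fit_line(x, y):
    """Ordinary (unweighted) least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def back_calculate(area, slope, intercept):
    """Concentration of an unknown sample from its signature-peptide peak area."""
    return (area - intercept) / slope

# Hypothetical calibration: spiked concentrations (ng/mL) and peak areas.
conc = [1, 2, 5, 10, 20]
areas = [2 * c + 1 for c in conc]
slope, intercept = fit_line(conc, areas)
print(round(back_calculate(11, slope, intercept), 3))  # 5.0
```

In practice the uniqueness screen is only the first filter; the remaining candidates are then ranked on the chemical and ionisation criteria named above.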
Drug-target interactions can be studied using specialised applications such as hydrogen-deuterium exchange (HDX) and native mass spectrometry. HDX can also be used, to a certain extent, to study the folding of a protein. Interacting partners of the protein under enquiry can be identified by co-immunoprecipitation or pull-down assays followed by mass spectrometry-based identification of the captured proteins. This approach is commonly called ‘interactomics’. The field has considerable room for development, and this potential of mass spectrometry remains largely untapped. More automation and better software support will boost structural studies of therapeutic proteins by mass spectrometry.
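In HDX experiments, the deuterium uptake of a peptide is commonly read from the shift of its isotope-envelope centroid relative to an undeuterated control, and comparing uptake with and without a binding partner highlights protected regions. The sketch below assumes centroided peak lists, ignores back-exchange correction, and uses an arbitrary 0.5 Da threshold; protected_peptides is a hypothetical helper.

```python
def centroid(mz_values, intensities):
    """Intensity-weighted mean m/z of an isotope envelope."""
    total = sum(intensities)
    return sum(m * i for m, i in zip(mz_values, intensities)) / total

def deuterium_uptake(centroid_labelled, centroid_undeuterated, charge):
    """Mass increase (Da) of a peptide, roughly the number of deuteriums taken up."""
    return (centroid_labelled - centroid_undeuterated) * charge

def protected_peptides(uptake_free, uptake_bound, threshold=0.5):
    """Peptides whose uptake drops by at least `threshold` Da when the partner is bound."""
    return sorted(
        pep for pep, free in uptake_free.items()
        if pep in uptake_bound and free - uptake_bound[pep] >= threshold
    )

print(protected_peptides({"P1": 3.0, "P2": 2.0}, {"P1": 1.5, "P2": 1.9}))  # ['P1']
```

Reduced uptake in the bound state suggests the region is shielded from solvent by the partner, which is how HDX maps candidate binding interfaces.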
High-throughput proteomics is useful in the initial exploratory studies that identify potential protein therapeutics. Such studies may also aim to identify biomarkers for a specific disorder, or to study the effects of certain factors on the proteome of a model system; they are more common in fundamental academic research. In this approach, a large number of proteins from a given biological matrix or complex sample are identified and quantified very rapidly. Data collection normally takes a few hours, followed by in silico analysis. Given the increasing capacity of new-generation mass spectrometers and analysis software, a few thousand proteins can be identified and quantified in a single experiment. Such an enormous amount of data is usually interpreted against the background of the overall molecular functioning of the model system, so making sense of it requires many bioinformatic tools and considerable expertise. Proteins can be quantified in high-throughput mode either label-free or with labelling technologies. The most widely used labelling methods are isobaric chemical labels (e.g. TMT, iTRAQ) and stable-isotope labels (e.g. SILAC). Orbitrap mass analysers are currently the instruments of choice for such exploratory studies, although linear ion traps were used with reasonable success before Orbitraps reached proteomics labs. High-throughput proteomics and data-intensive discovery are currently a rich source of candidate therapeutic and diagnostic molecules.
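Label-free quantification can be illustrated with a global median normalisation followed by log2 fold changes between two runs. This is a minimal sketch under stated assumptions: one run per condition, protein intensities already summarised, and no statistical testing, whereas real studies use replicates and dedicated software.

```python
import math
from statistics import median

def median_normalise(intensities):
    """Divide every protein intensity in a run by that run's median intensity."""
    m = median(intensities.values())
    return {protein: value / m for protein, value in intensities.items()}

def log2_fold_changes(control, treated):
    """log2(treated / control) for proteins quantified in both runs."""
    c = median_normalise(control)
    t = median_normalise(treated)
    return {p: math.log2(t[p] / c[p]) for p in sorted(c.keys() & t.keys())}

control = {"A": 100.0, "B": 200.0, "C": 400.0}
treated = {"A": 200.0, "B": 400.0, "C": 1600.0}
print(log2_fold_changes(control, treated))  # {'A': 0.0, 'B': 0.0, 'C': 1.0}
```

Normalising first removes run-to-run differences in total loading, so that in the example only protein C shows a genuine (two-fold) change even though every raw intensity rose.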
Proteins from the host system in which the recombinant therapeutic protein is produced must be removed from the final preparation as completely as possible. Purification process development uses a variety of strategies depending on the nature of the contaminants, and knowing the identity of the contaminating host proteins at every step helps to decide the further course of purification. Mass spectrometry-based high-throughput identification can determine host protein contamination accurately and sensitively.
All the approaches described so far assume that the sequence of the protein under enquiry is known; in high-throughput proteomics, the database of the entire proteome is used. But there are cases where the sequence of the protein or peptide is unknown or only partially known. The screening and discovery of antimicrobial peptides, neuropeptides, non-ribosomal peptides and insect or reptile venoms, as well as the identification of impurities in chemically synthesised peptides, require de novo sequencing. De novo sequencing means deriving the peptide sequence directly from the fragment ion spectra, without a sequence database. Many software tools have been developed, based on different scoring models for fragment ion matches. For a small number of peptides, a trained and experienced analyst can perform de novo sequencing manually. This technique makes it possible to discover new therapeutics from biological sources about which no sequence information is available. De novo sequencing requires tandem MS and good-quality fragment ion spectra. Different mass analysers may be used depending on the chemical nature of the peptide under enquiry.
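The core idea of de novo sequencing, reading residues from the mass differences between consecutive fragment ions, can be sketched with a b-ion ladder. The sketch assumes a complete, singly charged b-ion series with no missing peaks or modifications and a 0.01 Da tolerance; isobaric leucine and isoleucine cannot be distinguished this way and are reported as [IL]. Real de novo tools score both b and y series and tolerate gaps.

```python
# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276

def read_ladder(b_ion_mz, tol=0.01):
    """Infer residues from mass gaps between consecutive singly charged b ions."""
    ladder = [PROTON] + sorted(b_ion_mz)  # a bare proton acts as the "b0" start
    residues = []
    for lo, hi in zip(ladder, ladder[1:]):
        diff = hi - lo
        hits = sorted(aa for aa, m in MONO.items() if abs(m - diff) <= tol)
        if not hits:
            residues.append("?")  # gap: missing peak or modified residue
        elif len(hits) == 1:
            residues.append(hits[0])
        else:
            residues.append("[" + "".join(hits) + "]")  # isobaric, e.g. I/L
    return "".join(residues)

b_series = [sum(MONO[aa] for aa in "PEPTIDE"[:i]) + PROTON for i in range(1, 8)]
print(read_ladder(b_series))  # PEPT[IL]DE
```

A single missing peak merges two residues into one gap that usually matches no single residue, which is why good-quality fragment ion spectra are essential for de novo work.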
Despite being a powerful technology, mass spectrometry has not yet gained the place it deserves. The main hurdle is the high cost of purchasing and maintaining mass spectrometers and analysis software. In addition, training personnel in biological mass spectrometry takes far longer than training them in traditional techniques. Many quantitative analyses could be performed by mass spectrometry in a more defined way and with more refined data output than traditional techniques provide. However, the biosimilar industry tends to prefer traditional techniques because it needs to match its data against the innovator's data. Still, mass spectrometry-based assays are a growing trend in the R&D of new therapeutic proteins, and as more therapeutic proteins are developed, mass spectrometry is set to become one of the most important tools in biopharma research.