Demystifying Biomarker Discovery

Leticia Cano, IRTA Postdoctoral Fellow, Laboratory of Applied Mass Spectrometry, National Heart, Lung and Blood Institute, National Institutes of Health, USA

The discovery of a disease biomarker using mass spectrometry involves careful planning and strategy. This article describes how a rheumatoid arthritis biomarker was identified and explains the steps involved in biomarker discovery.

Proteomics is an emerging field that is frequently used for clinical biomarker discovery. For this article, proteomics is defined as a mass spectrometry-based approach used to identify proteins. The biomarker discovery process is distinct from biomarker validation or the development of a clinical test. Developing a clinical biomarker is expensive and requires relevant expertise and resources. The biomarker discovery process consists of choosing the best proteomic strategy, selecting the right clinical samples, designing the appropriate protein separations, performing the best mass spectrometry, analysing the data, and verifying the results with another set of clinical samples and another assay. The strategy used to discover a biomarker in Rheumatoid Arthritis (RA) and the process of biomarker discovery are discussed below.

Rheumatoid arthritis biomarker discovery

A biomarker for RA was identified in the synovial fluid of patients with RA using a modified version of a strategy previously used to identify cancer biomarkers. An Agilent immunodepletion column was used to remove six abundant serum proteins (albumin, antitrypsin, haptoglobin, IgA, IgG and transferrin), and the depleted synovial fluid was then fractionated using a Beckman ProteomeLab PF2D Protein Fractionation System. Proteins were fractionated in the first dimension by chromatofocussing and in the second dimension by reversed-phase High Performance Liquid Chromatography. This protein fractionation strategy increased the chance of identifying the lower-abundance proteins by separating them from the higher-abundance proteins that could mask their presence.

The second-dimension fractions were used to construct protein arrays on nitrocellulose membranes, which were used to test for the presence of autoantigens by analysing the differential reactivity of RA and control sera. Only sera obtained from RA patients are thought to contain autoantibodies that bind to these autoantigens. Fractions that tested positive when probed with RA serum but negative when probed with normal control serum were found in the first-dimension fraction eluting at pH 5.63-5.45.

To obtain enough material for mass spectrometry analysis, synovial fluid from nine different patients was pooled and used for a large-scale procedure. Fractions corresponding to the region that tested positive for RA serum binding in the protein array were digested with trypsin and analysed by LC/MS/MS. SEQUEST searches were performed against the SwissProt database, limiting the search to tryptic peptides. From the protein array experiments, the candidate autoantigen specifically detected by the RA serum was estimated to be in fractions 20-22. Amongst other proteins, these fractions contained fibrinogen, a known autoantigen that can be citrullinated in vivo. Fibrinogen alpha (FIBA_HUMAN, SwissProt Accession # P02671) was identified in fraction 20 with 9 unique peptide hits (15 per cent sequence coverage) and in fraction 22 with 18 unique peptides (24 per cent sequence coverage). All of the peptides found originated from the centre of the fibrinogen protein (amino acids 250-599), corresponding to the alphaC domain of fibrin (amino acids 221-610).
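The terms "tryptic peptides" and "sequence coverage" used above can be illustrated with a minimal Python sketch. It assumes the commonly used simplified trypsin rule (cleavage after lysine or arginine, except before proline) and a made-up toy sequence; it does not use the fibrinogen sequence and does not reproduce the enzyme settings of SEQUEST. The function names are hypothetical.

```python
def tryptic_digest(sequence):
    """Split a protein into peptides using a simplified trypsin rule:
    cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def sequence_coverage(protein, identified_peptides):
    """Percentage of residues covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for peptide in identified_peptides:
        start = protein.find(peptide)
        while start != -1:
            for k in range(start, start + len(peptide)):
                covered[k] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy sequence (an assumption for illustration, not a real protein).
toy_protein = "AKRPGKLLRDE"
toy_peptides = tryptic_digest(toy_protein)

# Pretend only two of the peptides were identified by the search.
toy_coverage = sequence_coverage(toy_protein, ["RPGK", "LLR"])
```

In this toy case the digest yields four peptides, and identifying two of them covers 7 of the 11 residues. Coverage figures such as the 15 and 24 per cent reported for fibrinogen alpha are calculated in the same spirit, over the full protein sequence.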

Careful examination of the mass spectra assigned to arginine-containing fibrinogen peptides led to the assignment of a citrullinated peptide corresponding to fibrinogen 259-287. The mass measured from the mass spectrum was one Dalton higher than the calculated mass of the unmodified peptide. The lower-resolution MS/MS spectrum obtained with the ion trap part of the LTQ-FT did not allow the exact location of the modification to be determined. However, a number of other fragments corresponding to parts of the peptide were observed in the spectra obtained with the Ion Cyclotron Resonance (ICR) analyser of the LTQ-FT, which provided precise mass measurements. There are four sites on the fibrinogen 259-287 peptide at which a modification could produce a mass shift of +1 Da: the peptide contains three arginines that can be citrullinated and an asparagine that can be deamidated to form aspartic acid. All fragments that did not show the expected tryptic cleavage at Arg 271 were one Dalton higher than the expected mass of the unmodified form. Trypsin does not cleave after citrullinated Arg residues, so the conversion of Arg 271 to citrulline is consistent with the failure of trypsin to cleave at that site. As a final confirmation that the citrullination site was correctly assigned, the peptide corresponding to residues 259-287 was synthesised with arginine (271R) and with citrulline (271X) at position 271. Both the charge state distribution in the MS spectrum and the fragment masses in the MS/MS spectrum of the 3+ charge state of 271X matched the spectral data obtained with the clinical sample.
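The "+1 Da" shift discussed above can be worked out from elemental compositions. The short Python sketch below uses standard monoisotopic atomic masses and assumes the usual net changes for the two modifications: both citrullination of arginine and deamidation of asparagine amount to gaining one oxygen and losing one nitrogen and one hydrogen. The variable and function names are illustrative only.

```python
# Standard monoisotopic masses of the relevant atoms, in Daltons.
MONOISOTOPIC = {"H": 1.00782503, "N": 14.00307400, "O": 15.99491462}


def mass_shift(element_changes):
    """Sum the mass change for a dict of element -> gained (+) or lost (-) atoms."""
    return sum(MONOISOTOPIC[el] * n for el, n in element_changes.items())


# Arg -> citrulline: the guanidino =NH is replaced by =O.
citrullination = mass_shift({"O": +1, "N": -1, "H": -1})

# Asn -> Asp: the side-chain -NH2 is replaced by -OH (same net change).
deamidation = mass_shift({"O": +1, "N": -1, "H": -1})


def mz_shift(delta_mass, charge):
    """Shift observed on the m/z axis for an ion carrying `charge` protons."""
    return delta_mass / charge


shift_at_3_plus = mz_shift(citrullination, 3)
```

Both modifications come to roughly +0.984 Da, which is why the intact mass alone cannot distinguish between the four candidate sites. For the 3+ charge state the shift on the m/z axis is only about 0.328, which shows why accurate mass measurement and fragment-level evidence matter for localising the modification.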

To establish that the citrullinated fibrinogen 259-287 peptide was recognised specifically by RA patient sera, the two fibrinogen 259-287 synthetic peptides (unmodified 271R and citrullinated 271X) were tested in an ELISA. An additional citrullinated synthetic peptide, corresponding to profilaggrin 619-631 (FIL) with a citrulline substitution at Arg 625, was included as a control. The immobilised peptides were incubated with sera from patients with RA, patients with systemic lupus erythematosus (SLE) or healthy controls, and bound antibodies were detected with HRP-conjugated anti-human IgG, IgA and IgM antibodies and a colorimetric assay. Of the 18 healthy control sera tested, two reacted to the 271R peptide, two reacted to the 271X peptide and one reacted to the FIL peptide. Of the 12 RA sera tested, four reacted to the 271R peptide, 10 reacted to the 271X peptide and three reacted to the FIL peptide. Of the 10 SLE sera tested, one reacted to all three peptides. The number of sera that reacted exclusively to the 271X peptide, and not to the 271R or FIL peptides, was 5/12 for RA sera, 0/18 for healthy sera and 0/10 for SLE sera. These results provide evidence that antibodies in RA sera bind specifically to a fibrinogen peptide biomarker generated by substituting citrulline for arginine at position 271. This study shows that clinical biomarkers can be discovered using proteomics. General guidelines for discovering more biomarkers are described in the next section.
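The ELISA tallies above can be summarised with a few lines of Python. The counts are transcribed from the text; reading the SLE result as one reactive serum per peptide is an interpretation of "one reacted to all three peptides". The one-sided Fisher's exact test is an illustrative addition, not an analysis reported in the study, and the function names are hypothetical.

```python
from math import comb

# Sera tested per group and number reactive to each peptide (from the text).
TESTED = {"RA": 12, "healthy": 18, "SLE": 10}
REACTIVE = {
    "RA": {"271R": 4, "271X": 10, "FIL": 3},
    "healthy": {"271R": 2, "271X": 2, "FIL": 1},
    "SLE": {"271R": 1, "271X": 1, "FIL": 1},  # one serum reacted to all three
}
EXCLUSIVE_271X = {"RA": 5, "healthy": 0, "SLE": 0}


def reactive_fraction(group, peptide):
    """Fraction of sera in a group that reacted to a given peptide."""
    return REACTIVE[group][peptide] / TESTED[group]


def fisher_one_sided(pos_a, n_a, pos_b, n_b):
    """Probability of seeing at least pos_a positives in group A if
    positives were distributed at random (hypergeometric tail)."""
    total, positives = n_a + n_b, pos_a + pos_b
    upper = min(n_a, positives)
    tail = sum(
        comb(positives, k) * comb(total - positives, n_a - k)
        for k in range(pos_a, upper + 1)
    )
    return tail / comb(total, n_a)


# RA sera versus all control sera (healthy plus SLE) for the 271X peptide.
control_pos = REACTIVE["healthy"]["271X"] + REACTIVE["SLE"]["271X"]
control_n = TESTED["healthy"] + TESTED["SLE"]
p_value = fisher_one_sided(REACTIVE["RA"]["271X"], TESTED["RA"], control_pos, control_n)
```

With these counts, 10 of 12 RA sera but only 3 of 28 control sera reacted to the 271X peptide. Sample sizes this small are typical of the discovery stage, which is one reason verification with a larger sample set is treated as a separate step.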

Biomarker discovery process

The first and most important step of biomarker discovery is to plan the proteomic strategy. Each strategy needs to be custom-designed to achieve the specific goals of the individual biomarker discovery project. There are many different proteomic approaches, each with distinct advantages and disadvantages. Planning should cover the required amount, number, type and quality of the samples. The available equipment, expertise and computer resources have to be assessed. Details such as the amount of time that can be dedicated to the project, the expected budget and any competing projects also have to be considered. A lot of time needs to be spent on this step.

The selection of the clinical samples is the second crucial step. Clinical proteomics should use a team science approach with leaders from different disciplines. Input is needed from each of the leaders, and there needs to be an honest discussion amongst the group. The samples chosen should be those likely to contain biomarkers that can be identified using the available resources. The patient information required to complete the project has to be collected at this stage, because access to it might not be available later. The entire process of collecting and processing the samples has to be monitored closely. Serum and plasma are complex proteomes and are unlikely to yield biomarkers using shotgun approaches. Proximal fluids and tissues are more likely to contain biomarkers in relatively high concentrations. Using a biorepository to obtain all or some of the clinical samples might be considered.

Protein separations should be designed to produce clean, pure and fresh samples for mass spectrometry. Separations and any sample processing should be performed in a clean environment of the kind found in a tissue culture room. One should be careful not to introduce new contaminants (detergents, salts and keratin) that might interfere with the analysis, and should be aware that high concentrations of sample may contaminate the instrument and interfere with subsequent runs. Sample carryover may become a problem in shared instruments. Proteins that bind non-specifically are a problem in every experiment, and the same proteins will bind non-specifically in different sample sets.

Mass spectrometry should be performed by an expert or in collaboration with a mass spectrometry facility. There are many types of mass spectrometers. The type of mass spectrometer used for the analysis should be discussed, and the mass spectrometrist should be trained to use that particular instrument. Scientists at the instrument company can be approached for advice. When a core facility is used for a large number of complex samples that require expert handling and analysis, its scientists may be expected to ask for authorship. This is fair, since their expertise will be important for the project.

Analysing the mass spectrometry data is the most time-consuming part of the project. Complex projects can result in hundreds of thousands of spectra. Protein identifications are performed by matching experimentally derived mass spectra to theoretical spectra derived from a specific database. A statistical analysis is performed to rank the best matches. Even the poorest spectra will be assigned a match, and the software can match only sequences that are present in the chosen database. Searches performed with different databases, search parameters or algorithms can yield slightly different results. It is important to estimate the probability that the correct peptide and protein have been identified.
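One widely used way to estimate how many identifications are likely to be wrong is the target-decoy approach, in which spectra are also searched against reversed or shuffled "decoy" sequences that cannot be true matches. The Python sketch below is a minimal illustration of that idea, not a method described in this article: the scores are invented, the decoys-over-targets ratio is only one of several estimators in use, and the function names are hypothetical.

```python
def estimated_fdr(psms, threshold):
    """Estimate the false discovery rate among peptide-spectrum matches
    scoring at or above `threshold` as (decoy hits) / (target hits)."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_passing_threshold(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is within max_fdr,
    or None if no threshold qualifies."""
    passing = [
        score
        for score in sorted({s for s, _ in psms})
        if estimated_fdr(psms, score) <= max_fdr
    ]
    return passing[0] if passing else None


# Invented (score, is_decoy) pairs for illustration only.
toy_psms = [
    (4.1, False), (3.8, False), (3.5, False), (3.2, True), (3.0, False),
    (2.7, False), (2.5, True), (2.2, False), (2.0, True), (1.8, True),
]

fdr_strict = estimated_fdr(toy_psms, 3.5)   # no decoys above this score
fdr_relaxed = estimated_fdr(toy_psms, 2.0)  # three decoys, six targets
cutoff = lowest_passing_threshold(toy_psms, 0.25)
```

Lowering the score threshold admits more identifications but also more decoy hits, so the estimated error rate rises. Choosing the threshold from the data in this way gives an explicit, if approximate, answer to the question of how much confidence a list of identified peptides deserves.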

The final step of biomarker discovery is verification. Generally this is an ELISA-based experiment performed with a different and larger sample set. The samples should include disease samples, healthy controls and disease controls. The disease controls will help identify markers of inflammation. The sample set should reflect the population seen in the clinic.

The final point is to find people who are genuinely interested in studying the particular disease. Although the long hours spent completing these enormous projects cannot be compensated, people need to feel proud that their work is meaningful. People need to value the clinical samples and involve themselves in the process of innovation. Planning the strategy also requires a lot of time.

Author Bio

Leticia Cano is currently an IRTA Postdoctoral Fellow in the Laboratory of Applied Mass Spectrometry at the National Heart, Lung and Blood Institute (NHLBI) of the National Institutes of Health (NIH). Her research focusses on identification of biomarkers in autoimmune disease and the development of novel protein separation strategies. She identified a biomarker for rheumatoid arthritis for her PhD dissertation at the City of Hope Graduate School of Biological Sciences. She is also a former MALDI-TOF MS Applications Scientist with Bruker Daltonics Inc.


