Mass spectrometry instrumentation
Owing to its superior performance, mass spectrometry has played a critical role in analytical chemistry for a very long time. During the last few decades, it has undergone a new revolution with the advent of hybrid instrumentation and new applications in the large-scale characterization of biomolecules, such as the proteome, lipidome, and metabolome. A mass spectrometer typically has three components: an ionization source, a mass analyzer, and a detector. Gas-phase ions are first produced in the ionization source and then transmitted into the analyzer, where they are separated by their mass-to-charge ratio (m/z). The detector records the signal, and the computer system converts the signal into spectra.
Mass spectrometry separates analytes by their mass-to-charge ratio, so the first step in a measurement is to generate gas-phase ions of the molecules. A variety of ionization techniques are available, and the choice depends mostly on the properties of the sample. The techniques can be divided into two groups: low-pressure ionization methods and atmospheric-pressure ionization methods. Low-pressure ionization methods include electron ionization (EI) and chemical ionization (CI).101 Atmospheric-pressure ionization methods include electrospray ionization (ESI), matrix-assisted laser desorption/ionization (MALDI),102 and atmospheric-pressure chemical ionization (APCI).103 EI and CI ionize only vapor-phase molecules and are therefore incompatible with nonvolatile and thermally labile molecules, which include most biomolecules. In addition, EI is a hard ionization method that induces extensive fragmentation during ionization. Among the atmospheric-pressure methods, APCI and ESI operate at atmospheric pressure; MALDI is normally performed in vacuum but can also be performed at atmospheric pressure.104 APCI works for molecules with a molecular mass below 2000 Da, and its sensitivity, ruggedness, and reliability make it a good choice for pharmaceutical applications.103,105
MALDI and ESI are the two major ionization techniques used for characterizing large biomolecules. In MALDI, the sample is typically mixed with a matrix. When the sample is irradiated with a laser, the matrix absorbs the laser radiation and transfers a proton to the analyte. MALDI has the advantages of easy sample preparation and a high tolerance to contaminants such as salts, buffers, and detergents. However, typical MALDI spectra contain mainly singly charged ions, and the pulsed nature of MALDI limits the mass analyzers that can be coupled to it. The time-of-flight (TOF) analyzer is the one most commonly coupled with MALDI, since it has a wide mass range and is well suited to pulsed ionization.106,107
ESI can generate multiply charged ions, which facilitates the characterization of large biomolecules such as proteins by allowing their molecular weight to be determined on instruments with a lower mass range. In addition, ESI achieves high sensitivity and can be easily coupled with separation techniques such as high-performance liquid chromatography (HPLC), μHPLC, or capillary electrophoresis. The principle of ESI has been extensively studied.108 Generally, the charged gaseous species are generated by applying a strong electric field to a liquid that flows slowly through a heated capillary tube. The electric field is produced by applying a potential difference between the capillary and the counter-electrode. This field is strong enough to accumulate sufficient charge on the surface of the liquid at the end of the capillary to form the Taylor cone; solvent evaporation and Coulomb explosion then break the liquid up into many highly charged droplets. This process is described by the Rayleigh equation, which shows that surface tension and charge must be properly balanced for a charged liquid droplet to keep a stable shape. Once the charge becomes sufficiently large relative to the surface tension, the droplet becomes unstable and fission takes place.109 When the droplets enter the capillary, the remaining solvent molecules are finally removed with the assistance of the heated capillary or a heated inert gas.
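The multiple-charging point made at the start of the paragraph above can be illustrated with a short calculation. This is a minimal sketch that assumes positive-mode ions of the form [M + zH]z+; the function names and the 16,950 Da protein mass are hypothetical and chosen only for illustration.

```python
PROTON_MASS = 1.007276  # Da, mass of one proton (charge carrier assumed to be H+)


def mz_for_charge(neutral_mass, z):
    """m/z of an [M + zH]z+ ion for a neutral mass given in Da."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return (neutral_mass + z * PROTON_MASS) / z


def mass_from_adjacent_peaks(mz_high, mz_low):
    """Recover (z, neutral mass) from two adjacent charge-state peaks.

    mz_high is the peak carrying z charges and mz_low the neighbouring
    peak carrying z + 1 charges, so mz_high > mz_low.
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON_MASS)


if __name__ == "__main__":
    protein_mass = 16950.0  # hypothetical protein, Da
    for z in (10, 15, 20):
        # Every charge state falls well below the neutral mass on the m/z axis.
        print(z, round(mz_for_charge(protein_mass, z), 3))
```

Because adjacent peaks in the charge-state envelope differ by exactly one charge, two neighbouring peaks are enough to recover both the charge and the neutral mass, which is why a protein far heavier than the upper m/z limit of the analyzer can still be weighed.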
The spray starts at an “initial voltage” that depends on the specific source and on the surface tension of the solvent used. The voltage needs to be high enough to overcome the surface tension of the solvent and thereby deform the drop into a “Taylor cone”. A solvent with a higher surface tension needs a higher initial voltage, and vice versa.110 Small charged droplets are released from the Taylor cone and shrink as the solvent evaporates. When the Rayleigh stability limit is reached, the charged droplets divide. This division continues for several generations until the electric field at the droplet surface becomes large enough for charged molecules to desorb from the surface. This explains why ions that are more concentrated at the droplet surface, which are normally the more hydrophobic ones, are detected with better sensitivity and can suppress the ionization of ions that sit in the droplet interior. The ions are then transferred to the high-vacuum mass analyzer for m/z determination. The electrospray ionization process is presented in Figure 2.3.
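The Rayleigh stability limit mentioned above can be made concrete with a small calculation. The sketch below assumes the usual form of the Rayleigh limit for a spherical droplet, q_R = 8π·sqrt(ε0·γ·r³), with γ the surface tension and r the droplet radius; the function names and the example droplet size are assumptions made for illustration.

```python
import math

EPSILON_0 = 8.8541878128e-12         # vacuum permittivity, F/m
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def rayleigh_limit_charge(radius_m, surface_tension_n_per_m):
    """Maximum charge (C) a spherical droplet can carry before fission."""
    return 8.0 * math.pi * math.sqrt(
        EPSILON_0 * surface_tension_n_per_m * radius_m ** 3
    )


def rayleigh_limit_elementary_charges(radius_m, surface_tension_n_per_m):
    """The same limit expressed as a number of elementary charges."""
    return rayleigh_limit_charge(radius_m, surface_tension_n_per_m) / ELEMENTARY_CHARGE


if __name__ == "__main__":
    water = 0.072  # N/m, approximate surface tension of water at room temperature
    for radius in (1e-6, 0.5e-6, 0.1e-6):  # hypothetical droplet radii in metres
        print(radius, round(rayleigh_limit_elementary_charges(radius, water)))
```

Since the limit scales with r to the power 3/2 while evaporation leaves the charge unchanged, a shrinking droplet inevitably reaches the limit. The dependence on γ is also consistent with the observation that solvents with a higher surface tension tolerate more charge before breaking up.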
Nano-ESI (nanoelectrospray ionization) emerged about ten years after conventional ESI; it requires a very small amount of sample yet achieves higher sensitivity than ESI.111 As mentioned above, the ESI process depends largely on the electrochemical processes occurring at the probe tip, which are influenced by the physical and chemical properties of the sample liquid (surface tension, conductivity, ionizable sites) and by external factors such as the voltage, the liquid flow rate, and the length of the capillary. Since the introduction of ESI, a number of sprayer modifications, such as pneumatically assisted electrospray, ultrasonic nebulizer electrospray, electrosonic spray, and nanoelectrospray, have been developed to expand the range of ESI applications. Nanoelectrospray is the most popular of these techniques.108
Nano-ESI is a more efficient spraying technique than conventional ESI. Because the flow rate is decreased to the nL/min range, nanospraying requires less sample and a lower voltage. The conventional spray needle is replaced by a borosilicate glass capillary with a fine tip (1–4 μm inner diameter), which is pulled with a micropipette puller. With nano-ESI, the first-generation charged droplets are much smaller than those generated by conventional ESI, so desolvation is much easier and quicker, which improves the ionization efficiency by 50–60%. Currently, nano-ESI is the most widely used ionization technique in peptide and protein analysis. All the mass spectrometers used in our lab are coupled with nano-ESI.111
Mass Analyzers and Detectors
When gas-phase ions are transmitted into the analyzer, they are separated by their m/z. Different mass analyzers, which operate on different principles (Table 2.2), are available.
The most basic principle is that all analyzers use static or dynamic electric and magnetic fields, either alone or in combination. Each analyzer has its own advantages and limitations, and none is ideal for all applications. The selection of the mass analyzer depends on the sample type and the experimental goal, and the performance of a mass analyzer can be evaluated by the following figures of merit: mass range, scan speed, sensitivity, mass accuracy, and resolution. Mass range is the span between the lowest and highest m/z that can be detected. Scan speed is the rate at which the mass analyzer measures over a particular m/z range. Sensitivity describes the change in signal response per unit amount of analyte and is generally expressed as the slope of the calibration curve. Mass accuracy is the difference between the theoretical m/z and the measured m/z and is often expressed in parts per million (ppm). Resolution describes the capability to distinguish two m/z values with a very small difference and is commonly obtained from the full width at half maximum (FWHM) of a peak, as the m/z of the peak divided by its FWHM. The ion trap and the Orbitrap are the two types of analyzer used in this dissertation, so their operational principles are introduced below. The principles of the quadrupole analyzer are also introduced, since it represents a very important separation scheme in MS analyzers and helps in understanding the principles of the ion trap analyzer.
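Two of the figures of merit above, mass accuracy and resolution, reduce to one-line formulas. The following sketch uses the ppm and FWHM definitions given in the paragraph; the function names and the example peak values are hypothetical.

```python
def mass_error_ppm(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def resolving_power_fwhm(peak_mz, fwhm):
    """Resolving power as m/z divided by the peak full width at half maximum."""
    if fwhm <= 0:
        raise ValueError("FWHM must be positive")
    return peak_mz / fwhm


if __name__ == "__main__":
    # Hypothetical peak: theoretical m/z 400.0000 measured at 400.0012.
    print(mass_error_ppm(400.0012, 400.0))
    # Hypothetical peak width of 0.004 at m/z 400.
    print(resolving_power_fwhm(400.0, 0.004))
```

A 3 ppm error at m/z 400 corresponds to only 0.0012 m/z units, which shows why ppm rather than absolute units is the convenient way to compare accuracy across the mass range.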
Both the quadrupole and the ion trap mass analyzers separate ions of different m/z with oscillating electric fields that control which ions have stable trajectories. A quadrupole is made up of four electrically isolated hyperbolic or cylindrical rods that are arranged exactly parallel to one another. The principle of the quadrupole has been described previously.112,113 The combination of an alternating field and a constant field creates a hyperbolic field, which is an electrical region with strong focusing and selectivity. Ions enter the space between the quadrupole rods along the z axis with a certain speed, but the electric fields also force them to move along the x and y axes.113 As long as an ion does not hit the rods (x≠r0, y≠r0), it keeps a stable trajectory along the z direction towards the detector. Ions that do not have a stable trajectory strike the rods, are neutralized upon contact, and are therefore not detected. Ion motion in a quadrupole follows the Mathieu equation, which can be further rearranged as follows:
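In the standard textbook treatment of the quadrupole mass filter, the rearranged Mathieu parameters are usually written in the form below. It is given here on the assumption that it matches the notation used in the surrounding text (r0, ω, U, V), with ze the charge of the ion and m its mass.

```latex
a_u = a_x = -a_y = \frac{8 z e U}{m r_0^{2} \omega^{2}},
\qquad
q_u = q_x = -q_y = \frac{4 z e V}{m r_0^{2} \omega^{2}}
```

Both parameters are inversely proportional to m/z, so for fixed r0 and ω each m/z maps to its own point in the stability diagram for a given pair of U and V.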
For a given quadrupole, r0 (the field radius) is constant, and ω = 2πv (the angular frequency in radians per second, where v is the frequency of the RF field) is also held constant, leaving U (the direct potential) and V (the amplitude of the RF voltage) as the variables that determine the transmitted m/z. For an ion of any mass, the stability area can be represented in a V, U diagram. Ions with different m/z can be separated by ramping the RF and DC voltages at a nearly constant ratio, and different resolutions can be achieved by adjusting the slope of the operating line in the V, U diagram.
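The role of U and V as the scanning variables can be sketched numerically. The code below assumes the commonly used Mathieu parameters a = 8zeU/(m·r0²·ω²) and q = 4zeV/(m·r0²·ω²); the function name and the instrument values (4 mm field radius, 1 MHz RF) are hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27           # kg


def mathieu_a_q(mz, dc_volts, rf_amplitude_volts, r0_m, rf_frequency_hz):
    """Return (a, q) for an ion of the given m/z (Da per elementary charge)."""
    omega = 2.0 * math.pi * rf_frequency_hz
    # m / (z e) in SI units multiplied by r0^2 * omega^2
    denom = (mz * DALTON / ELEMENTARY_CHARGE) * r0_m ** 2 * omega ** 2
    return 8.0 * dc_volts / denom, 4.0 * rf_amplitude_volts / denom


if __name__ == "__main__":
    r0, f_rf = 4e-3, 1.0e6     # hypothetical field radius (m) and RF frequency (Hz)
    u, v = 80.0, 500.0         # hypothetical DC and RF amplitudes (V)
    for mz in (300.0, 500.0, 1000.0):
        a, q = mathieu_a_q(mz, u, v, r0, f_rf)
        # a/q equals 2U/V for every m/z: all ions lie on one operating line.
        print(mz, round(a, 4), round(q, 4), round(a / q, 4))
```

Because a/q depends only on U/V, ramping both voltages at a constant ratio moves ions of successive m/z along the same operating line through the tip of the stability region, and changing that ratio changes how narrow a slice of m/z is transmitted.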
Besides serving as a mass analyzer, a quadrupole can also be used as an ion guide or a collision cell. An ion guide, also called an ion focusing device, is used to transmit ions efficiently from one point to another without mass separation. Ion loss, normally caused by collisions with residual gas molecules or by space-charge effects, results in lower sensitivity, so an ion guide should transmit ions efficiently and minimize this loss. A quadrupole operating in RF-only mode (U=0) can keep ions over a large range of m/z on stable trajectories at a given V. However, when V is set very low, more ions remain stable but ions of large m/z are poorly focused. Increasing V improves the transmission of ions with larger m/z, at the cost of losing ions of lower m/z, whose trajectories become unstable. The appropriate V is therefore a compromise.
Ion Trap Analyzers
The ion trap is a device that can not only trap ions with oscillating electric fields but also serve as a mass filter and a collision cell. The quadrupole analyzer adjusts the potentials so that only ions of the selected m/z pass through the rods. In an ion trap, ions of all masses are held in the trap and are then expelled according to their m/z to reach the detector.
There are two types of ion trap: the 2D ion trap and the 3D ion trap. Both operate on similar principles; however, because the 2D ion trap can hold more ions before reaching the space-charge limit, it has better sensitivity and a wider dynamic range than the 3D ion trap. The 2D ion trap is also known as the linear ion trap (LIT), and it is now commonly combined with the Orbitrap mass analyzer for proteomics measurements.
The LIT is composed of an array of four rods that ends in lenses that can repel ions. As in the quadrupole analyzer, a direct potential (DC) and an alternating potential (AC) are applied to the metal rods to control the ion motion. In the trap, ions are confined in the radial dimension by the RF electric field and in the axial dimension by a static electric field produced by DC voltages at the ends of the trap. Once in the LIT, ions are cooled with an inert gas to reduce collisions with neighboring ions, and their motion follows the Mathieu equations:
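The canonical form of the Mathieu equation, together with the parameters usually quoted for a linear (2D) quadrupole field, is sketched below. The expressions for a and q are an assumption about the intended notation: they hold for the 2D geometry, whereas a 3D trap with a ring electrode uses a different geometric factor.

```latex
\frac{d^{2}u}{d\xi^{2}} + \left(a_u - 2 q_u \cos 2\xi\right) u = 0,
\qquad \xi = \frac{\omega t}{2}

a_u = \frac{8 z e U}{m r_0^{2} \omega^{2}},
\qquad
q_u = \frac{4 z e V}{m r_0^{2} \omega^{2}},
\qquad
\beta_u \approx \sqrt{a_u + \frac{q_u^{2}}{2}} \quad (q_u \lesssim 0.4)
```

Here u stands for either transverse coordinate, ze is the ion charge, and m is the ion mass.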
U is the DC voltage, V is the RF amplitude, r0 is the radius of the ring electrode, and ω is the RF angular frequency. The equations contain three key parameters: a, q, and β. β is important because the amplitude and the frequency of the oscillation performed by the ions in the trap are closely related to it. β is a function of a and q; for low values of q and a=0, β is given by β ≈ q/√2.
An ion trap can expel ions in two different ways. The first takes advantage of the stability limit: ions that reach the boundary of the stability diagram become unstable and are scanned out first. In this case, the DC voltage is set to 0 and only the fundamental RF voltage is applied to the rods; its frequency is held constant while its amplitude V is varied. An additional RF voltage of selected frequency and amplitude is then applied to the end caps. Because heavier ions have a lower β value and a lower secular frequency, increasing V raises β for all ions, and the lighter ions reach the stability limit first and are therefore expelled from the trap first.114
The second way to expel ions from the trap is resonant ejection. An ion oscillates in the trap at a secular frequency given by fz = βzv/2, where v is the frequency of the RF field. Since βz is related to qz, the value of V at which an ion of a given m/z oscillates at a selected frequency fz can be calculated from qz. If an RF voltage at frequency fz is applied to the end caps, the ion resonates along the z axis and the amplitude of its oscillation increases. Once the amplitude becomes large enough, the ion becomes unstable and is expelled from the trap. A range of m/z can also be ejected by applying multiple frequencies.
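The relation between q, β and the secular frequency used in resonant ejection can be checked with a few lines of code. This is a sketch that assumes the low-q approximation β ≈ sqrt(a + q²/2), which reduces to β ≈ q/√2 for a = 0 and is only reasonable for q below roughly 0.4; the function names and the 1 MHz RF frequency are hypothetical.

```python
import math


def beta_low_q(q, a=0.0):
    """Approximate beta, valid only for small q (roughly q < 0.4)."""
    return math.sqrt(a + q * q / 2.0)


def secular_frequency_hz(q, rf_frequency_hz, a=0.0):
    """Secular frequency f = beta * v / 2 for RF frequency v."""
    return beta_low_q(q, a) * rf_frequency_hz / 2.0


if __name__ == "__main__":
    rf = 1.0e6  # hypothetical RF drive frequency, Hz
    # q is inversely proportional to m/z, so a heavier ion sits at lower q.
    for q in (0.10, 0.20, 0.30):
        print(q, round(secular_frequency_hz(q, rf)))
```

Within this approximation the secular frequency is proportional to q, so at a fixed V heavier ions (lower q) oscillate more slowly, and raising V sweeps ions of successively higher m/z through any fixed auxiliary frequency.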
The Orbitrap is an electrostatic ion trap; the first commercial instrument was introduced in 2005 and immediately attracted great attention. The design of the Orbitrap analyzer is based on a new concept, described by Cooks in 2005.115 A typical Orbitrap is composed of a barrel-shaped external electrode and a spindle-shaped central electrode. An electrostatic voltage of polarity opposite to the charge of the ions is applied to the central electrode, while the outer electrode is at ground potential. Ions are injected into the trap and, under the electrostatic field, orbit the central electrode while oscillating along the z axis. Careful mathematical calculation is required to ensure a reasonably circular or oval trajectory of the ions around the central electrode. For example, for positive ions with the inner electrode set at about −3200 V, the ions need a kinetic energy of around 1600 eV when they enter the trap. Ions with different m/z oscillate at different frequencies, and these frequencies are independent of the kinetic energy of the injected ions:
ω = √(k·q/m), where ω is the oscillation frequency; m is the mass; q is the total charge (q=ze); and k is the field curvature.
This is a very important property of the Orbitrap: the separation of different ions depends only on their m/z. The current induced by the oscillating ions is measured and converted by Fourier transform into mass spectra. In a complete Orbitrap instrument, which comprises an atmospheric-pressure source, an analytical quadrupole, a storage linear trap, and an Orbitrap, the ions are transmitted from the atmospheric-pressure source to the vacuum region and accumulated in the C-trap. The curved shape of the C-trap allows a large number of ions to be injected quickly into the Orbitrap, which improves the detection sensitivity. Because the Orbitrap operates in a pulsed fashion, it is this trapping device that makes it possible to couple continuous ion sources to the Orbitrap mass analyzer. The trapping device also allows the number of ions injected into the Orbitrap to be controlled, which reduces space-charge effects. Ions with the same m/z oscillate very coherently along the z axis, whereas their rotation around the central electrode does not have this kind of coherence, which reduces the background noise. This coherent axial oscillation yields very high resolution and accuracy. The resolution increases with the detection time, so a longer detection time gives even higher resolution (>240,000 at m/z 400). High mass accuracy can also be achieved with the Orbitrap: <3 ppm with external calibration and <1 ppm with internal calibration.
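The consequence of the axial frequency depending only on m/z can be explored numerically. The sketch below assumes ω proportional to sqrt(1/(m/z)) and, as a further assumption, a fixed transient (detection) time, in which case the mass resolving power scales with the oscillation frequency; the function names are hypothetical and the reference value of 240,000 at m/z 400 is taken from the text.

```python
import math


def frequency_ratio(mz_a, mz_b):
    """Ratio omega_a / omega_b of axial frequencies in the same field."""
    return math.sqrt(mz_b / mz_a)


def resolving_power_at(mz, ref_resolving_power=240000.0, ref_mz=400.0):
    """Resolving power at another m/z for the same transient length.

    Assumes resolving power is proportional to the axial frequency,
    that is, to 1 / sqrt(m/z).
    """
    return ref_resolving_power * math.sqrt(ref_mz / mz)


if __name__ == "__main__":
    for mz in (200.0, 400.0, 800.0, 1600.0):
        print(mz, round(frequency_ratio(mz, 400.0), 3), round(resolving_power_at(mz)))
```

Under these assumptions a fourfold increase in m/z halves both the frequency and the resolving power, which is why Orbitrap resolution is always quoted at a stated m/z.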
Hybrid MS Instrumentation
Since each analyzer has its own strengths and limitations, combinations of different analyzers have been engineered to obtain better performance. Various combinations are on the market, and the LTQ-Orbitrap is one of the favorite instrument platforms in the proteomics field. In this hybrid configuration, the LTQ is normally used for ion trapping, ion selection, ion fragmentation, and low-resolution ion detection, while the Orbitrap is used for high-resolution, high-mass-accuracy ion detection. The hybrid instrument can operate in two different detection modes. In FT-IT mode, the MS1 precursor ions are detected in the Orbitrap while the MS2 fragment ions are detected in the LTQ. In FT-FT mode, both MS1 and MS2 are detected in the Orbitrap. There are several generations of both the LTQ and the Orbitrap analyzer, with continuously improved performance.116 The LTQ-Orbitrap Elite (Figure 2.4), which combines the most recent generation of the LTQ with an ultra-high-resolution Orbitrap, is introduced in detail here. The S-lens at the front is an ion guide made of spaced stacked rings, which improves the pumping efficiency and ion transmission. A square quadrupole with neutral blocking facilitates ion focusing. The dual-pressure linear ion trap is designed to achieve better overall performance: the high-pressure cell is more efficient at ion trapping, isolation, and fragmentation, while the low-pressure cell offers higher scan speed, resolving power, and mass accuracy. The gas-free multipole optics that follow provide higher ion transmission, and the gas-filled C-Trap efficiently traps all the ions. The new collision cell (HCD) next to the C-Trap offers another way to fragment ions (Thermo Fisher Scientific Orbitrap Elite Hardware Manual). HCD stands for higher-energy collisional dissociation, a CID technique specific to the Orbitrap analyzer.
The precursor ions trapped in the C-Trap are transmitted into the HCD cell, where fragmentation occurs; the fragment ions are then transferred back to the C-Trap and finally reach the Orbitrap for high-resolution detection. Compared with traditional ion trap-based collision-induced dissociation, HCD has a higher activation energy and a shorter activation time. It also has no low-mass cut-off restriction and can generate more high-quality MS/MS spectra. One drawback is that the spectral acquisition time is up to two-fold longer.117 The Orbitrap was developed with a decreased gap between the inner and outer electrodes, which provides even higher frequencies of ion oscillation and hence higher resolving power. Overall, the key performance characteristics of the Orbitrap Elite are as follows: Mass Range: m/z 50 – 2,000 or m/z 200 – 4,000; Resolution (FWHM): > 240,000 at m/z 400; Mass Accuracy: < 3 ppm with external calibration and < 1 ppm using internal calibration; Dynamic Range: > 5,000 within a single scan; MS Scan Power: MSn, for n = 1 through 10.
The ions passing through the mass analyzer are then detected and recorded by a detector. Several types of detector exist, but only the two that are commonly coupled with the LTQ and the Orbitrap (the electron multiplier and image current detection) are described here. The choice of detector depends on the type of instrument and its analytical application. Electron multipliers are the most widely used detectors in mass spectrometry. In this detector, the conversion dynode is held at a high potential, from ±3 to ±30 kV, of polarity opposite to the charge of the targeted ions. Because of this high potential, ions leaving the analyzer are accelerated to a high velocity, which enhances the detection efficiency. The ions strike the conversion dynode and generate several secondary particles. These secondary particles then strike the first dynode, leading to the emission of secondary electrons, which are amplified by a cascade effect to produce the final current. There are two types of electron multiplier: the discrete dynode type and the continuous dynode type (channeltron, microchannel plate, or microsphere plate). Discrete dynode electron multipliers are composed of a series of 12 to 20 dynodes held at decreasing negative potentials. The first dynode is held at the highest negative potential, while the output of the multiplier is held at ground potential. The secondary electrons generated when the secondary particles strike the first dynode are therefore accelerated towards the next dynode, which is at a less negative potential. They strike the second dynode and generate more electrons, and this process is repeated at each subsequent dynode until ground potential is reached. The amplified electric current is collected at the end of the electron multiplier. The LTQ-Orbitrap Elite introduced above uses this type of electron multiplier.
The other type, the continuous dynode electron multiplier, replaces the discrete dynodes with one continuous dynode. The channeltron is used here as an example to briefly illustrate its principle. The channeltron has a curved tube with a voltage applied between its two ends, which produces a graded accelerating field along the tube walls. Secondary particles generated at the conversion dynode by an ion strike the curved inner wall and produce secondary electrons. These electrons travel further down the tube and strike the wall again, generating more and more electrons. A cascade of electrons is thus created, and the amplified electric current is measured. Continuous dynode electron multipliers are also widely used in LIT detection systems. The number of secondary particles and the multiplying factor of the dynodes determine the amplifying power of the electron multiplier. Given their high amplification and fast response time, electron multipliers can be coupled with rapidly scanning analyzers, such as the quadrupole or linear ion trap analyzer. However, electron multipliers have a limited lifetime (1 to 2 years) because of surface contamination from ions or a relatively poor vacuum.
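The statement that the amplifying power depends on the multiplying factor of the dynodes can be illustrated with an idealized model. The sketch assumes that every dynode stage releases the same average number of secondary electrons per incident electron, so the overall gain is that yield raised to the number of stages; the function name and the yield values are hypothetical.

```python
def multiplier_gain(secondary_yield, n_dynodes):
    """Ideal gain of a discrete dynode multiplier with identical stages."""
    if n_dynodes < 1:
        raise ValueError("at least one dynode is required")
    return secondary_yield ** n_dynodes


if __name__ == "__main__":
    # The text quotes 12 to 20 dynodes; the per-stage yields are assumptions.
    for n in (12, 16, 20):
        for secondary_yield in (2.0, 3.0):
            print(n, secondary_yield, f"{multiplier_gain(secondary_yield, n):.3g}")
```

Because the gain grows exponentially with the number of stages, a modest drop in per-stage yield, for example from surface contamination, reduces the overall gain substantially, which fits with the limited lifetime noted above.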
Image current detection is commonly employed in Orbitrap and FTICR detection systems. Unlike the electron multiplier, this detector consists of a pair of metal plates located within the mass analyzer region, very close to the ion trajectories. To be detected, ions of a given mass circulate as a tight packet in the same orbit. Ions of the same mass are excited to the same energy and thus have the same orbiting frequency. The movement of the ions between the plates induces an electric current, which is measured and then converted into an m/z spectrum by a fast Fourier transform algorithm. This detection method can measure multiple masses at once and detects all the ions that are present at the same time. In addition, image current detection does not require amplification of the signal by an ion cascade.
Tandem mass spectrometry
Tandem mass spectrometry, abbreviated as MS/MS or MSn (n=2,3,…), normally refers to two or n stages of mass analysis combined with an ion fragmentation process. It is employed to improve the specificity of detection, in particular to facilitate structure elucidation from fragment information, and to enhance the signal-to-noise ratio of the spectra. In MS/MS, a first mass analyzer isolates the precursor ion of interest, which then undergoes ion activation to produce product ions and neutral fragments.
Generally, tandem mass spectrometry can be performed in space or in time. Tandem mass spectrometry in space requires coupling two distinct mass analyzers, as in the triple quadrupole (QqQ): fragmentation happens in the second quadrupole (q), and the first and third quadrupoles are the two mass analyzers used to measure MS1 and MS2, respectively. Tandem mass spectrometry in time means that MS1 and MS2 are measured sequentially in the same mass analyzer with the help of an ion trapping device. Mass analyzers suitable for tandem mass spectrometry in time include the ion trap, the Orbitrap, and FTICR.
Ion activation is achieved by increasing the internal energy of the ion to induce dissociation. Three major methods can supply the energy for ion dissociation: (1) energetic collisions of the ion with a neutral collision gas, as in collision-induced dissociation (CID), also called collision-activated dissociation (CAD); (2) photons, as in photodissociation (PD) and infrared multiphoton dissociation (IRMPD); (3) electrons, as in electron-capture dissociation (ECD).
Different ion activation processes fragment ions in different ways. In CID (or CAD), ion kinetic energy is converted into ion internal energy upon collision with a neutral collision gas, which brings the ion to an excited state and later leads to unimolecular dissociation of the activated ion. Since the dissociation rate is slower than the rate of energy randomization, the energy is redistributed among all of the vibrational modes before dissociation occurs. Under this condition, the weakest bonds are preferentially cleaved. Upon collision with a neutral collision gas, only a fraction of the ion kinetic energy can be converted into ion internal energy. This energy fraction can be expressed by the following equation:
Ecom = Elab × m(collision gas) / (m(collision gas) + m(ion))
Ecom is the maximum amount of kinetic energy that can be converted into internal energy in a single collision event; Elab is the ion kinetic energy. The equation shows that increasing the ion kinetic energy or the mass of the collision gas increases the energy available for conversion. In practice, there are two collision regimes: high energy (several thousand electronvolts) and low energy (1–100 eV). Normally, only TOF, electromagnetic, or hybrid instruments can function at high energy, while quadrupole, ICR, or ion trap instruments can only operate at low energy. Generally speaking, high-energy collisions generate simpler and clearer fragmentation, whereas low-energy collisions lead to more diverse fragmentation pathways, sometimes including rearrangements. As the equation indicates, the mass of the collision gas also affects the convertible energy. Helium is the collision gas commonly used in high-energy collisions, while heavier gases such as argon or xenon are normally used in low-energy collisions. As mentioned above, CID preferentially breaks the weakest bonds, so for peptides the peptide bonds are preferentially cleaved and b and y ions are predominantly formed (Figure 2.5).
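The center-of-mass energy relation above is easy to evaluate for different collision gases. The following sketch applies the equation as written in the text; the function name, the 1000 Da ion, and the 30 eV laboratory energy are hypothetical values chosen for illustration.

```python
GAS_MASS_DA = {"He": 4.0026, "Ar": 39.948, "Xe": 131.293}  # average atomic masses


def center_of_mass_energy(e_lab_ev, ion_mass_da, gas_mass_da):
    """Maximum energy (eV) convertible to internal energy in one collision."""
    return e_lab_ev * gas_mass_da / (gas_mass_da + ion_mass_da)


if __name__ == "__main__":
    ion_mass = 1000.0  # hypothetical peptide ion, Da
    e_lab = 30.0       # hypothetical laboratory-frame kinetic energy, eV
    for gas, mass in GAS_MASS_DA.items():
        print(gas, round(center_of_mass_energy(e_lab, ion_mass, mass), 3))
```

For a heavy ion only a small percentage of the laboratory energy is available per collision, and the heavier the target gas, the larger that share, which is consistent with the preference for argon or xenon in the low-energy regime.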
CID has an inherent disadvantage that follows from the nature of the collision process: the converted energy is limited, and so is the fragmentation. When MS/MS is used to fragment large molecules, the energy is redistributed over a great number of bonds, which slows the fragmentation reaction. In addition, introducing collision gas into the instrument compromises the vacuum. Other ion activation methods (SID, ECD, IRMPD) were therefore developed to avoid these drawbacks. Since none of these methods was used in my dissertation work, they are not introduced here.
Data acquisition in proteome measurement
Advances in chromatography and MS instrumentation have significantly improved peptide separation and scan speed, so that more peptides can be identified in less time. The selection of peptide ions for MS/MS analysis, which is controlled by the data acquisition method, is critical for the final results. Two data acquisition methods are currently available: data-dependent acquisition (DDA) and data-independent acquisition (DIA).
Data-dependent acquisition refers to a method in which the top n most abundant precursor ions are selected for MS/MS analysis within a user-defined time. The parameters involved in this process are the repeat count, repeat duration, minimum MS signal, and dynamic exclusion. Optimizing these parameters has been shown to improve proteome coverage. In particular, the number of MS/MS events per cycle and the dynamic exclusion setting are directly associated with the depth of the identified proteome and with protein quantification.118,119
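The top-n selection with dynamic exclusion described above can be sketched as a small routine. This is a simplified illustration, not the logic of any vendor's acquisition software: it assumes a repeat count of one (a precursor is excluded immediately after its first selection), and the function name, parameter names, and peak list are hypothetical.

```python
def select_precursors(ms1_peaks, top_n, exclusion, now_s,
                      exclusion_duration_s=30.0, mz_tolerance=0.01,
                      min_intensity=0.0):
    """Pick the top_n most intense precursors not on the exclusion list.

    ms1_peaks: list of (mz, intensity) tuples from one MS1 scan.
    exclusion: dict mapping excluded m/z -> expiry time in seconds;
               it is updated in place.
    """
    # Forget exclusion entries whose time window has expired.
    for mz in [m for m, expiry in exclusion.items() if expiry <= now_s]:
        del exclusion[mz]

    def is_excluded(mz):
        return any(abs(mz - m) <= mz_tolerance for m in exclusion)

    candidates = [(mz, inten) for mz, inten in ms1_peaks
                  if inten >= min_intensity and not is_excluded(mz)]
    candidates.sort(key=lambda peak: peak[1], reverse=True)

    selected = [mz for mz, _ in candidates[:top_n]]
    for mz in selected:
        exclusion[mz] = now_s + exclusion_duration_s
    return selected


if __name__ == "__main__":
    peaks = [(500.25, 1e6), (600.30, 5e5), (700.35, 2e5), (800.40, 1e4)]
    excluded = {}
    for t in (0.0, 10.0, 35.0):
        print(t, select_precursors(peaks, 2, excluded, t))
```

With dynamic exclusion in place, the second cycle moves on to the less abundant precursors instead of re-fragmenting the same two dominant ions, which is how this setting increases the depth of the identified proteome.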
The data-independent acquisition (DIA) method does not look for the top n most abundant precursor ions; instead, it divides the entire m/z range into a series of consecutive windows, and the precursor ions that appear in the same window are fragmented simultaneously regardless of their intensity. In this way, all peptides across the entire m/z range are systematically fragmented, which delivers more complete proteome information. The differences between DDA and DIA have been examined previously.118 However, DIA also fragments more interfering species and thus leads to more complicated data interpretation, and this challenge grows with the complexity of the proteome sample. Therefore, data-dependent acquisition is still the most widely used scheme in proteome analysis and is used throughout this dissertation.
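The windowing scheme of DIA can be sketched in a few lines. The example assumes fixed-width, non-overlapping windows, which is only one of several schemes used in practice; the function names, the 400 to 1000 m/z range, and the 25 m/z window width are hypothetical.

```python
def isolation_windows(mz_start, mz_end, width):
    """Split [mz_start, mz_end) into consecutive (low, high) windows."""
    if width <= 0:
        raise ValueError("window width must be positive")
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        low = high
    return windows


def group_precursors(precursor_mzs, windows):
    """Map each window index to the precursors that would be co-fragmented."""
    groups = {i: [] for i in range(len(windows))}
    for mz in precursor_mzs:
        for i, (low, high) in enumerate(windows):
            if low <= mz < high:
                groups[i].append(mz)
                break  # windows do not overlap, so one match is enough
    return groups


if __name__ == "__main__":
    windows = isolation_windows(400.0, 1000.0, 25.0)
    groups = group_precursors([512.3, 520.1, 980.0, 1200.0], windows)
    for index, members in groups.items():
        if members:
            print(windows[index], members)
```

Precursors that share a window are fragmented together irrespective of intensity, which is the source of both the completeness of DIA data and the chimeric spectra that complicate their interpretation.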
Bioinformatics for proteome interrogation
With the advent of robust and highly efficient LC/MS/MS platforms, systematic analysis of the whole proteome of a cell or community has become feasible. However, manually assigning this large amount of raw data to peptides is impossible, which has led to the development of various bioinformatics tools for the task. Because of the complexity of proteome data, this task still represents a significant challenge for these tools. In principle, if a peptide is fragmented completely, the mass difference between two neighboring fragment ions of the same series in its tandem mass spectrum corresponds to the mass of one amino acid, so the whole tandem mass spectrum can deliver the amino acid sequence. Based on this principle, different computational strategies have been proposed to identify peptides and proteins from tandem mass spectrometry (MS/MS) data. One method is de novo assignment, which is database independent. Deducing amino acid sequences from MS/MS spectra in this way relies largely on very high-quality MS/MS spectra and sophisticated mathematical models, sometimes assisted by chemical techniques such as isotopic labelling. This method is superior in its potential to use all the information collected by the mass spectrometer, but measured tandem mass spectra are always much more complicated than predicted, which makes the method more challenging. Several software packages are currently available, such as Lutefisk, PEAKS 120 and PepNovo 121, which use this method to extract entire or partial amino acid sequences without the use of databases. Given the difficulty of de novo sequencing, the more widely adopted method is the database searching algorithm, which compares the experimental spectra with theoretical spectra generated from a known proteome database and assigns the matched spectra to specific peptides through a series of statistical calculations.
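The principle behind de novo sequencing, reading residues from the mass differences between neighboring fragment ions, can be demonstrated on an idealized spectrum. The sketch assumes a complete, noise-free ladder of singly charged b ions and unmodified residues with standard monoisotopic masses; the function names and the example peptide are hypothetical, and real tools such as those named above use far more elaborate models.

```python
PROTON_MASS = 1.007276  # Da

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ion_ladder(sequence):
    """Theoretical singly charged b-ion m/z values (b1, b2, ...)."""
    ladder, running = [], PROTON_MASS
    for residue in sequence:
        running += RESIDUE_MASS[residue]
        ladder.append(running)
    return ladder


def residues_from_ladder(ladder, tolerance=0.01):
    """Assign a residue to each gap between consecutive ladder peaks."""
    calls = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        gap = heavier - lighter
        matches = sorted(aa for aa, mass in RESIDUE_MASS.items()
                         if abs(mass - gap) <= tolerance)
        calls.append("/".join(matches) if matches else "?")
    return calls


if __name__ == "__main__":
    print(residues_from_ladder(b_ion_ladder("GASVLK")))
```

The first residue cannot be read from a gap, and leucine and isoleucine are reported together because they have identical masses, two small examples of why real de novo interpretation is harder than the principle suggests.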
Database searching algorithms for peptide assignment
Database searching for peptide identification requires an available proteome database. Although various MS/MS database search algorithms have been developed, their basic principle is similar. Here, I introduce the first database searching algorithm to be developed, SEQUEST, followed by several others that are used in my dissertation. The user-provided database first undergoes in silico digestion to generate a list of theoretical peptides. For each experimental MS/MS spectrum, the program selects candidate theoretical peptides whose molecular weight matches that of the precursor ion within a defined mass tolerance. The next step assigns a preliminary score, Sp, to every candidate using several criteria: the number and intensity of the predicted fragment ions that match measured fragment ions, the continuity of an ion series, and the presence of immonium ions for the amino acids His, Tyr, Trp, Met, or Phe. The Sp score is calculated by the following formula:
Sp = (Σ i) × n × (1 + β) × (1 + ρ) / nt
where i is the intensity of each matching ion; n is the number of matching ions within a 1 Da mass tolerance; β reflects the continuity of an ion series; ρ reflects the presence of immonium ions for peptides containing His, Tyr, Trp, Met, or Phe; and nt is the total number of predicted sequence ions.
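As a worked illustration of how the symbols defined above combine, the sketch below assumes the commonly cited form of the preliminary score, Sp = (Σ i) × n × (1 + β) × (1 + ρ) / nt. The function name and the example values are hypothetical, and the way β and ρ are derived from a spectrum is left out.

```python
def preliminary_score(matched_intensities, n_predicted, beta=0.0, rho=0.0):
    """Sp = (sum of matched intensities) * n * (1 + beta) * (1 + rho) / nt.

    matched_intensities: intensities of predicted ions found in the spectrum
    n_predicted:         total number of predicted sequence ions (nt)
    beta, rho:           continuity and immonium-ion bonuses (assumed given)
    """
    if n_predicted <= 0:
        raise ValueError("n_predicted must be positive")
    n_matched = len(matched_intensities)
    return (sum(matched_intensities) * n_matched
            * (1 + beta) * (1 + rho) / n_predicted)


# Three matched ions out of twelve predicted, no bonuses
print(preliminary_score([10.0, 20.0, 30.0], 12))  # 15.0
```

Because the summed intensity is multiplied by the number of matched ions, a candidate that explains many peaks is favored over one that explains a single intense peak.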
The next step uses cross-correlation analysis to compare the top 500 candidates, ranked by Sp score, with the acquired tandem mass spectrum in terms of spectral similarity. For this comparison, a theoretical fragmentation spectrum containing the predicted b and y ions is constructed for each of the top 500 candidates. The experimentally obtained spectrum is also normalized and its noise is reduced, which gives it a simplified, barcode-like form. The predicted tandem spectra can then be compared with the measured spectrum in the same barcode-like form, and each comparison is assigned an XCorr score that indicates the level of similarity. If the score is above a given threshold and significantly better than the next best score, the match is reported in the final output. A second score, termed DeltCn, is calculated at the same time to evaluate how well SEQUEST can distinguish the best hit from the second-best match. The higher this score, the more confident the match.23 Once a spectrum has been assigned a peptide sequence, the next question is whether the assignment is a true match. The false discovery rate (FDR) is therefore used to control false positive identifications. The FDR is estimated by searching the MS/MS spectra against the target reference database together with a decoy database, such as a reversed, randomized or shuffled database. FDR is calculated using the formula [(2 × decoy IDs)/(total IDs)] × 100. The FDR value depends on the filter cutoffs applied, such as the XCorr threshold.
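The two calculations in this step can be sketched as follows. The cross-correlation function assumes both spectra have already been binned into equal-length intensity vectors, and it subtracts the mean correlation over shifted offsets from the zero-offset correlation, which is one common formulation of XCorr rather than the exact SEQUEST code. The FDR function follows the formula given above. All names are illustrative.

```python
def xcorr(observed, theoretical, max_shift=75):
    """Zero-offset dot product minus the mean dot product over shifted offsets."""
    if len(observed) != len(theoretical):
        raise ValueError("spectra must be binned to the same length")
    if max_shift < 1:
        raise ValueError("max_shift must be at least 1")
    n = len(observed)

    def dot_at(shift):
        # Correlation of the two vectors with the theoretical one displaced.
        return sum(observed[i] * theoretical[i + shift]
                   for i in range(n) if 0 <= i + shift < n)

    background = [dot_at(s) for s in range(-max_shift, max_shift + 1) if s != 0]
    return dot_at(0) - sum(background) / len(background)


def decoy_fdr_percent(n_decoy, n_total):
    """FDR (%) = (2 * decoy IDs / total IDs) * 100."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * 2 * n_decoy / n_total


print(decoy_fdr_percent(5, 1000))  # 1.0
```

Subtracting the shifted background penalizes spectra that correlate with almost anything, so a high score requires peaks that line up specifically at zero offset.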
Following the development of SEQUEST, other database searching algorithms were developed. MyriMatch and Andromeda are introduced here because they are applied in my dissertation work. MyriMatch is based on the same principle of comparing observed tandem spectra with theoretical spectra, but it improves the statistical model by taking peak intensity into account when scoring peptide matches. The first step is called “tunable preprocessing and scoring”. In this step, the total ion current (TIC), that is, the sum of fragment ion intensities, is calculated for each MS/MS spectrum and the peaks are ranked in descending order of intensity. The most intense peaks accounting for the top 95% of the TIC are retained and the remaining 5% is removed as noise. All retained peaks are divided into a user-defined number of intensity classes. The rule is that the most intense class holds the fewest peaks and each class is twice the size of the next more intense class (adjacent class sizes are in a ratio of 2:1). The second step generates the theoretical spectra from the provided database; MyriMatch employs a novel system for modeling fragments that takes the charge state into account. The third step is the comparison between acquired and predicted spectra. Each measured m/z value is first matched to a predicted m/z value within the mass tolerance appropriate to the instrument. Combined with the intensity class information, the probability of the match occurring by random chance is then evaluated with a multivariate hypergeometric (MVH) distribution. For each spectrum, the negative logarithm of the MVH probability is reported, so the higher this value, the lower the chance of a random match and the higher the confidence in the match.122
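The preprocessing and scoring steps described for MyriMatch can be approximated with the sketch below. It is a simplified illustration under stated assumptions, not MyriMatch's implementation: the function names are hypothetical, class sizes are obtained by simple rounding, and the class that represents m/z positions with no peak is omitted from the probability.

```python
from math import comb, log


def retain_by_tic(intensities, fraction=0.95):
    """Keep the most intense peaks until they account for `fraction` of the TIC."""
    ranked = sorted(intensities, reverse=True)
    tic = sum(ranked)
    kept, running = [], 0.0
    for value in ranked:
        if running >= fraction * tic:
            break
        kept.append(value)
        running += value
    return kept


def intensity_classes(ranked_peaks, n_classes=3):
    """Split peaks (most intense first) into classes whose sizes double each time."""
    total_units = 2 ** n_classes - 1  # 1 + 2 + 4 + ... shares
    classes, start = [], 0
    for c in range(n_classes):
        if c == n_classes - 1:
            end = len(ranked_peaks)  # last class takes the remainder
        else:
            end = start + round(len(ranked_peaks) * 2 ** c / total_units)
        classes.append(ranked_peaks[start:end])
        start = end
    return classes


def mvh_score(class_sizes, class_matches):
    """Negative natural log of the multivariate hypergeometric probability."""
    numerator = 1
    for size, hits in zip(class_sizes, class_matches):
        numerator *= comb(size, hits)
    probability = numerator / comb(sum(class_sizes), sum(class_matches))
    return -log(probability)


sizes = [len(c) for c in intensity_classes(list(range(14, 0, -1)), 3)]
print(sizes)  # [2, 4, 8]
```

With these class sizes, matching the two peaks of the most intense class is less likely to happen by chance than matching two peaks of the largest class, so `mvh_score(sizes, [2, 0, 0])` is larger than `mvh_score(sizes, [0, 0, 2])`.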
Andromeda is a database search engine that uses a probabilistic model to score peptide-spectrum matches. It was developed to cope with a large search space and so enable the analysis of complex data sets, such as data with different modifications and labeling, in a simple analysis workflow on a desktop computer. Probabilistic scoring was first employed in Mascot, in which the match between the experimental data and each sequence is evaluated by the probability that it occurred by chance. The match with the lowest such probability is regarded as the best match, and its significance also depends on the size of the database.123
In general, Andromeda first builds the theoretical fragment ions from the provided database, taking into account modifications and specific neutral losses such as H2O and NH3. Meanwhile, the algorithm processes the measured spectra by centroiding, de-isotoping and converting all higher charge states to charge 1. It then counts the predicted fragment ions that match peaks in the measured spectrum within the defined mass tolerance. From the number of matching ions and the total number of theoretical ions, it calculates the probability of obtaining this number of matches by chance. Other information, including intensity, peptide length, and the number of modifications or missed cleavages, further assists the assignment of peptides to spectra.124
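The chance-match probability described above can be illustrated with a binomial tail, which is one standard way to model it. The sketch assumes that each theoretical ion matches a peak by chance with a fixed probability `p_random` (a hypothetical parameter; in practice it depends on the peak density and the mass tolerance) and reports the result on a −10·log10 scale. It does not include the additional terms for intensity, peptide length, modifications or missed cleavages.

```python
from math import comb, log10


def binomial_match_score(n_theoretical, n_matched, p_random):
    """-10*log10 of P(at least n_matched of n_theoretical ions match by chance)."""
    if not 0 < p_random < 1:
        raise ValueError("p_random must lie strictly between 0 and 1")
    tail = sum(
        comb(n_theoretical, j)
        * p_random ** j
        * (1 - p_random) ** (n_theoretical - j)
        for j in range(n_matched, n_theoretical + 1)
    )
    return -10 * log10(tail)


# More matched ions out of the same 20 theoretical ions give a higher score.
low = binomial_match_score(20, 5, 0.04)
high = binomial_match_score(20, 10, 0.04)
print(high > low)  # True
```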
After database searching, peptide sequences carry score metrics that indicate how well they match the spectra. Because these results are complex, automated handling is still required to process the raw output, obtain more confident peptide identifications and assemble peptides back into proteins. IDPicker is one such automated tool: it filters peptide identifications to a desired FDR using decoy database matches and then assembles peptides into proteins. Since IDPicker is used only with MyriMatch in my dissertation work, the way IDPicker processes MyriMatch results is described in detail here.
IDPicker filters peptide identifications according to a user-defined peptide-spectrum match FDR (normally 2%). In IDPicker, FDR is reported at three levels: peptide-spectrum match FDR, peptide-level FDR and protein-level FDR. The most widely used threshold is a peptide-level FDR ≤ 0.01. Several parameters are known to affect the FDR value. In the peptide-spectrum match filter, the maximum q-value, minimum spectra per peptide, minimum spectra per match, and maximum protein groups can each be adjusted. IDPicker can be configured to use either a single score or “Monte Carlo weighted multiple scores” to compute the q-values. In our setting, we normally applied the Monte Carlo method, which finds the best weighted combination of multiple scores (XCorr, MVH, and mzFidelity output from MyriMatch) for computing the q-values; the score weights are determined by Monte Carlo simulation. The MVH score evaluates the probability of the match occurring by random chance, the mzFidelity score assesses the proximity of the observed and expected fragment m/z values, and XCorr reflects the quality of the match. In the protein-level filters, the minimum distinct peptides, minimum additional peptides and minimum spectra can each be adjusted to reach the desired FDR. Normally, confident identification of a protein requires one unique peptide and one additional peptide. The most important information in the IDPicker output is the spectral counts of the identified proteins and peptides, which are used for subsequent protein quantification.
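To make the role of the q-value concrete, the sketch below computes q-values from a single score using target and decoy matches. It is a generic target-decoy illustration, not IDPicker's code: it assumes that a higher score is better, estimates FDR as 2 × decoys / total at each score cutoff, and ignores the Monte Carlo weighting of multiple scores. The function name and the example data are hypothetical.

```python
def q_values(psms):
    """Return one q-value per PSM.

    psms: list of (score, is_decoy) tuples; a higher score means a better match.
    The q-value of a PSM is the lowest estimated FDR at which it is still accepted.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # Estimated FDR when accepting everything down to each rank.
    fdrs, decoys = [], 0
    for rank, idx in enumerate(order, start=1):
        decoys += psms[idx][1]
        fdrs.append(min(1.0, 2 * decoys / rank))

    # Walk from the worst score upwards, carrying the smallest FDR seen so far.
    q = [0.0] * len(psms)
    best = 1.0
    for pos in range(len(order) - 1, -1, -1):
        best = min(best, fdrs[pos])
        q[order[pos]] = best
    return q


psms = [(10, False), (9, False), (8, True), (7, False)]
print(q_values(psms))  # [0.0, 0.0, 0.5, 0.5]
```

Filtering to a maximum q-value then amounts to keeping the target matches whose q-value does not exceed the chosen threshold.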
Protein identification indicates the presence of a protein in the sample and is based mostly on tandem mass spectra, whereas abundance or relative abundance is provided by protein quantification, which is normally based on full-scan mass spectra. Quantitative proteomics is the approach used to determine the abundance ratios of thousands of proteins between different conditions. A mass spectrometer records the intensity of each ion signal, and this intensity is commonly used for quantification. MS intensity is influenced by analyte concentration, sample recovery during sample preparation and LC, ionization efficiency and instrument sensitivity. Therefore, the difference in MS intensity between analytes does not necessarily reflect their difference in abundance under natural conditions. Several quantification methods are available, depending on the research goals. The first is absolute quantification, which uses an internal standard of known concentration to determine the actual abundance of an analyte in a sample. This method can also be used to compare different analytes, and it generally requires a labeling technique such as AQUA (absolute quantification of proteins) to construct the corresponding internal standard. The second is relative quantification, which determines the abundance ratio of the same analyte between different samples and is routinely used in proteomics. Relative quantification can be achieved by label-free or label-based methods. For label-free methods, spectral counting, NSAF (normalized spectral abundance factor), spectral indexing, MS1 intensity, matched ion intensity, etc. can be used as quantification metrics. For label-based methods, different labeling techniques are available.
Chemical labeling includes iTRAQ (isobaric tags for relative and absolute quantification) and TMT (tandem mass tags); metabolic labeling includes SILAC (stable isotope labeling by amino acids in cell culture) and 15N (nitrogen-15) metabolic labeling.
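Among the label-free metrics listed above, NSAF is simple enough to show as a short worked example. The sketch uses the usual definition, in which the spectral count of each protein is divided by its length and the result is normalized by the sum over all proteins in the sample. The protein names, counts and lengths are invented for illustration.

```python
def nsaf(spectral_counts, lengths):
    """NSAF_k = (SpC_k / L_k) / sum over all proteins i of (SpC_i / L_i).

    spectral_counts: protein -> number of MS/MS spectra assigned to it
    lengths:         protein -> protein length in amino acids
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")
    return {p: value / total for p, value in saf.items()}


# Two hypothetical proteins with equal spectral counts but different lengths
counts = {"protA": 10, "protB": 10}
lengths = {"protA": 100, "protB": 300}
print(nsaf(counts, lengths))
```

Dividing by length corrects for the fact that longer proteins yield more peptides, so here the shorter protein receives the larger NSAF value even though both have the same spectral count.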
For the label-free quantification carried out in my dissertation, I used either the spectral counts output by IDPicker or the MS1 intensities output by MaxQuant for relative protein quantification. The spectral count of a protein is the sum of the spectral counts of all its peptides, where the spectral count of a peptide is the number of MS/MS spectra assigned to it. In data-dependent acquisition (DDA) analyses, more abundant peptides are sampled more often than less abundant ones. Although spectral counting is easy to apply and correlates well with relative protein concentration when a protein changes more than two-fold, its drawbacks are clear: 1) spectral counts do not use the inherent properties of the peptide ion peak recorded by the mass spectrometer, and so potentially discard important information about the peak; 2) the measurement is easily affected by competition with other ions for DDA selection within or across samples and by the dynamic exclusion settings; 3) saturation effects are observed; 4) the quantification power is weak for low-abundance proteins.17,35,125 The MS1 intensity method therefore offers greater quantification power than spectral counting because it avoids the drawbacks listed above. With the development of high-resolution instruments and advanced separation techniques, peptides can now be resolved well and their extracted ion currents (XICs) can be obtained, which is critical for accurate MS1 intensity-based quantification. MaxLFQ is a method developed by Matthias Mann’s group to achieve more accurate label-free quantification in shotgun proteomics. It introduces a new way to normalize the differences that arise between fractions and a new way to extract the maximum peptide ratio information for protein quantification.
Previously, there was ongoing debate about which peptide intensities best represent a protein. The simplest approach is to sum all peptide signals for each protein and compare the resulting protein ratios. Alternatives include using only the top n most abundant peptides or using the average peptide intensity to represent the protein. However, these approaches discard the ratio information carried by individual peptides, even though the signal ratio of a single peptide should in principle represent the ratio of its protein. To avoid this drawback, another approach selects only peptides that are identified in every sample; however, as the number of samples increases, the chance that a peptide is identified in all of them decreases dramatically. The MaxLFQ authors therefore proposed a new method to overcome these limitations. This LFQ method calculates peptide ratios pair-wise between samples and requires at least two peptide ratios for a pair-wise comparison to contribute to the final protein intensity. In other words, a quantifiable protein must have at least two shared peptides identified in at least two samples. Once the pair-wise peptide ratios have been calculated for a protein, the pair-wise protein ratio is defined as the median of the peptide ratios. A least-squares analysis is then performed to reconstruct the abundance profile from the individual protein ratios and assign an intensity to each protein. The method has been shown to be more accurate than other label-free quantification methods.126
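The pair-wise median ratio and least-squares steps can be sketched for a single protein as follows. This is a simplified illustration, not the MaxQuant implementation: it works in log space, assumes positive intensities, and uses a closed-form least-squares solution that is valid only when every pair of samples has a valid ratio. The final rescaling of the profile to the summed peptide intensity is omitted, and all names are hypothetical.

```python
from math import exp, log
from statistics import median


def pairwise_log_ratios(peptide_intensities, n_samples, min_peptides=2):
    """Median log peptide ratio for each sample pair.

    peptide_intensities: one row per peptide, one column per sample,
                         with None where the peptide was not quantified.
    A pair gets None if fewer than `min_peptides` peptides are shared.
    """
    ratios = {}
    for j in range(n_samples):
        for k in range(j + 1, n_samples):
            shared = [
                log(row[j] / row[k])
                for row in peptide_intensities
                if row[j] is not None and row[k] is not None
            ]
            ratios[(j, k)] = median(shared) if len(shared) >= min_peptides else None
    return ratios


def relative_profile(ratios, n_samples):
    """Least-squares abundance profile (log values sum to zero).

    Minimizes the sum over pairs of (log r_jk - (x_j - x_k))^2; with all
    pairs present the solution is x_j = (1/N) * sum over k of log r_jk.
    """
    if any(r is None for r in ratios.values()):
        raise ValueError("this closed form needs a ratio for every sample pair")
    log_abundance = []
    for j in range(n_samples):
        total = 0.0
        for k in range(n_samples):
            if k == j:
                continue
            total += ratios[(j, k)] if j < k else -ratios[(k, j)]
        log_abundance.append(total / n_samples)
    return [exp(x) for x in log_abundance]


# Three peptides of one hypothetical protein measured in three samples
rows = [[100.0, 200.0, 400.0], [10.0, 20.0, 40.0], [5.0, 10.0, None]]
profile = relative_profile(pairwise_log_ratios(rows, 3), 3)
print([round(v / profile[0], 6) for v in profile])  # [1.0, 2.0, 4.0]
```

Because only ratios between samples are used, peptides with very different absolute intensities contribute equally, and a peptide missing from one sample still informs the ratios between the remaining samples.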