UDK 533.5:543.51:621.384.8 Original scientific article/Izvirni znanstveni članek ISSN 1580-2949 MTAEC9, 43(1)39(2009) THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS PSEVDOGRADIENTNA METODA ZA ANALIZO MASNIH SPEKTROV Igor Belic Institute of metals and technology, Lepi pot 11, 1000 Ljubljana, Slovenia igor.belic@imt.si Prejem rokopisa — received: 2008-12-19; sprejem za objavo - accepted for publication: 2009-01-12 The article focuses on a special approach to the qualitative and quantitative residual gas analysis of a vacuum-chamber atmosphere. The main outline of mass-spectrometry usage is given together with a brief comparison of two completely different scientific fields. The new approach, where the mass spectra are formalized in a vector annotation, to residual gas analysis is proposed. The main problem with residual gas analysis based on a mass spectrum is its ambiguity, which cannot be easily resolved. One way to deal with the problem is practical knowledge about the interpretation of mass spectra. The pseudo-gradient algorithm for mass-spectra analysis was developed and tested. The general testing platform was developed and the algorithm was tested within it. The presented work is the foundation for a comprehensive study of mass-spectrum analysis techniques. An important concept is the virtual environment that provides the mass-spectrum generator, the standard fragmentation patterns' database, the data space for the various algorithms that can test various approaches to the mass-spectrum analysis, and the database of the achieved results, which is necessary for the comparison of different algorithms and represents the backbone for the dynamic mass-spectra generation and analysis. Kay words: mass spectra, residual gas analysis, gradient descent, optimization V prispevku je opisan nov način kvalitativne in kvantitativne analize rezidualnih plinov atmosfere v vakuumskem sistemu. Uvodoma je podan pregled različnih vrst uporabe masne spektrometrije, skupaj s pregledom dveh povsem različnih znanstvenih področij. Prikazan je nov način, ki pojmuje masne spektre v formalni vektorski obliki. Osnovni problem pri analizi masnih spektrov, ki ga ni mogoče enostavno rešiti, je njihova večličnost. Ena od možnosti je uvedba praktičnih izkušenj, ki algoritmu pomaga pri razrešitvi nekaterih dvoumij. Razvita in analizirana je psevdo-gradientna metoda za rezidualno analizo plinov iz masnega spektra. Razvito je bilo preizkusno okolje, v katerem je bil analiziran algoritem. Predstavljeno delo je temelj za široko študijo tehnik za analizo masnih spektrov. Pomemben koncept je virtualno okolje, ki vsebuje generator masnih spektrov, podatkovno bazo standardnih masnih spektrov, podatkovni prostor za hranjenje rezultatov preizkusov algoritmov, ki so nujni za kasnejšo primerjavo delovanja posameznih algoritmov. Virtualno okolje je tudi temelj za kasnejšo dinamično analizo masnih spektrov. Ključne besede: masna spektrometrija, rezidualna analiza plinov, gradientna metoda, optimizacija 1 INTRODUCTION The use of mass spectrometry is very widespread and versatile. Many types and various configurations of mass spectrometers are in use. Basically, there are two different ways that mass spectrometry is used: one is for the detection of substances through the fragmentation patterns detected by the mass spectrometers and the other is the use of mass spectrometry in vacuum science. The first is a very wide field that includes chemical, pharmaceutical, biological (from now on bio-chemical) and other purposes, mainly for the study of the structure of substances. The main feature of mass spectrometry that deals with the structures of numerous organic substances (combined with other methods - gas chromatography, etc.) is that one substance (or a small number of them) of a very complex and often unknown structure is introduced to the mass-spectrometer analyzer and the obtained mass spectrum is then analysed.1,2 Finding the fragmentation patterns of organic substances is a very complicated process, but necessary in order to implement the automatic mass-spectra identification.3 For this purpose several software products were developed and are widely used. To illustrate the unbalance in software support that utilizes the mass-spectra analysis in biochemical and vacuum fields, several products are listed. The bio-chemical field is very well supported by numerous commercial software packages. One such software product is the Mascot mass-database search program.4 This is a powerful search engine that uses mass spectrometry data to identify proteins from primary-sequence databases.5 6 Another one is Analyst software.7 A Microsoft Windows-based data system that provides instrument control and data analysis for the entire family of Thermo Scientific mass spectrometers and related instruments is the Xcalibur.8,9 Several specialized software modules have been designed to work with Xcalibur to meet the needs of specific applications. MassLynx10,11 is a fundamental platform for acquiring, analysing, managing and sharing mass-spectrometry information. The Sample List is a key feature of MassLynx, keeping everything about the samples together in one place. It is also the central place for initiating any activities relating to the sample. It provides Materiali in tehnologije / Materials and technology 43 (2009) 1, 39-42 39 I. BELIČ: THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS general purpose and specialised application managers dedicated to providing information for specific types of mass-spectrometry analyses and data. Agilent Technologies MassHunter Workstation software12 is designed to streamline mass-spectrometry analysis, from instrument tuning through to the final report. It enables users to find all the compounds in samples, find differences between samples and find sample sets. The MassHunter workstation uses data-mining techniques to search the mass databases in public and private domains. Geneva Bioinformatics (GeneBio) has launched a software platform known as SmileMS for the identification and analysis of small molecules by mass spectrometry.13 SmileMS provides machine-independent software for the analysis of MS spectra from small molecules. It allows each user to add both public and private databases. Mascot Distiller5 is used to identify the mass spectra corresponding to proteins from the Information Non-Redundant Protein Sequences Database. The list of commercial software dealing with bio-chemical mass spectrometry is long and verstile14 (4000 Series Explorer, Analyst QS, BioAnalyst, Cliquid, DiscoveryQuant, GPS Explorer, LightSight, MALDI Imaging, MarkerView, Metabolite ID, MRMPilot, MultiQuant, Pro ICAT, Pro ID, Pro QUANT, ProteinPilot, SimGlycan, TissueView, etc.). These mass spectra are obtained by various types of instrumentation layouts, such as tandem mass spectrometers MS/MS3'15'1617'1819, collision-induced dissociation (CID) MS/MS20 time-of-flight mass spectrometers (TOF)16'18'21'22'23'24, ion-trap mass spectrometers19, quadru-pole/ion-trap mass spectrometers15 25, and the MALDI (Matrix-assisted laser desorption/ionization) technique used for mass spectrometry6 23, etc. In the bio-chemical field mass spectrometry is widely used in order to identify the peaks resulting from a chro-matographic separation.26 The most common approach to solve the problem for unknowns on whom very little other structural information is available is the use of a retrieval algorithm and a reference mass-spectra database. There are large reference databases of mass spectra at the NIST and Wiley libraries. The NIST/EPA/NIH mass spectral databases27 contain, in the 2008 version, 220,460 spectra of 192,108 unique compounds (for electron-impact mass spectrometry), and 14,802 spectra of 5308 precursor ions for tandem mass spectrometry. The largest database is the Wiley's Registry of Mass Spectral Data28, in 2008 containing 560,000 different spectra, over 348,000 spectra with chemical structures. It is one of the most comprehensive mass spectral libraries ever published. There are also specially designed mass-spectra search programs to make use of these vast databases. In vacuum science there are again two major categories of how mass spectrometry is used. The first, which is in terms of operation very close to the bio-che- mical uses, is the vacuum-system leak detection.29 The tracer gas is introduced to the system and again one gas fragmentation pattern is monitored by a mass spectrometer. Another and yet much more complicated category is residual gas analysis, which is a general term for the analysis of gas and vapour species in vacuum chambers and vacuum processes. The main feature of all the previously mentioned methods is that the main objective of it is the detection and analysis of various species that compose the vacuum-chamber atmosphere simultaneously. It is no more the case where one or a very small number of species is to be detected and compared to the known spectra in a database. Here, the number of "expected" vacuum-chamber atmosphere constituent gases is to be combined in such a way as to provide the partial pressures of each and every one of them. A search over the internet reveals that the field is far less equipped with software than is the case with bio-chemical mass spectrometry. Among the features found in the literature30, the software (MASsoft30, Merlin Research31, Questor 5 Process Analysis software31) provides the following interesting features: • Histogram, Trend Analysis and Analog peak displays. • Mixed mode scanning, e.g., Trend, Histogram and Analog peaks in multiple-windows • Simultaneous real-time display of graphical and tabular trend analysis data. • Peak-height identification. • Real-time background subtraction. • Automatic mass-scale alignment. • Statistical analysis of data in real time • Partial pressure ratios • Quantitative or normalised composition analysis • Visualisations In this article we focus on a special approach to qualitative and quantitative residual gas analysis. 1.1 Mass spectra as vectors The fragment patterns of atoms and molecules that may coexist in a vacuum system are composed of a sequence of number pairs, as shown by the CH4 example in Table 1.32 Looking at this annotation from another perspective, one can see that the sequence of numbers (Table 1 left) represents only a fraction of all possible m/u components (only from m/u = 12 to m/u = 17). The complete "picture" of the CH4 standard fragmentation pattern is presented in Table 1, on the right. The same is true for all the other constituents of a vacuum-chamber atmosphere. The fragmentation pattern can be rewritten in vector form as follows: GCH4 = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.4, 7.7, 15.6, 85.8, 100, 1.2) (1) 12 Materiali in tehnologije / Materials and technology 43 (2009) 1, 11-21 I. BELIČ: THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS Table 1: CH4 fragmentation pattern Tabela 1: Fragmentacijski vzorec CH4 m/u Normalized amplitude 12 2.4 13 7.7 14 15.6 15 85.8 16 100 17 1.2 m/u Normalized amplitude 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 2.4 13 7.7 14 15.6 15 85.8 16 100 17 1.2 This is a classical vector annotation, where the number of all the used m/u components defines the dimensionality of the vector space. Any peak in a mass spectrum of a mixture of gases may consist of a combination of molecular ions and/or fragment ions. The contributions add linearly and the peak height for the discrete m/u is equal to the sum of the individual peak heights that would be produced if each constituent were alone in the system. This is the classical presumption of superposition.32 St =1 sj (2) J The sum in (2) is taken over all gases currently present in the vacuum chamber. The consecutive gas is marked by the index J; Si is the total peak height at the mass number i; sj is the peak height contribution from gas J at the m/u number i. sij is related to the fragmentation pattern, the analyzer sensitivity, and the partial pressure of gas J by the equation: sj = Ajpjgj (3) Where Aj is the analyzer's sensitivity to the gas J; Pj is the partial pressure of gas J; the principal peak of the standard fragment pattern is always set to 1. Other peaks at m/u = i represent the ratio to the principal peak and are denoted by gj. In a formal vector annotation the mass spectrum is a weighted sum of the constituent fragment patterns, such as: S = X WJGJ ; Wj = ajPj (4) S is the vector of the total peak heights; Wj is the weight, valid for the J-th constituent fragment pattern; and vector Gj, is the j-th gas-standard fragment pattern. However, the equation exactly models the ideal situation, where all the constituent components of the atmosphere are known a priori. In practice one must always expect the unknown substances to be present in the vacuum system and contribute to the mass spectra. Another problem that additionally complicates the interpretation of mass spectra is the inevitable noise produced by the instrumentation equipment. Both components add to the so-called noise. The equation can be therefore rewritten as: S = X WjGj + N ; Wj = ap■ N = Su + N (5) Where N is the vector representing the noise. The overall noise N is composed of two components: Su - the spectral components of unknown constituents, and Ne -the electrical noise. 1.2 The ambiguity of mass spectra The equations (4) and (5) formally represent the synthesis of the mass spectra. On the other hand, from the practical point of view, we are interested in a reverse process, i.e., analysis. The basic question of mass spectrometry is which constituents are composing the given mass spectra and in what quantities. From the mathematical theory, it is very well known that such a task is only achievable under the very strict condition of orthogonality, which must be fulfilled for all the constituent vectors. If this is not the case, the result can still be achieved, but it is never unique. Materiali in tehnologije / Materials and technology 43 (2009) 1, 11-21 21 I. BELIČ: THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS Table 2: The example of mass-spectra ambiguity Tabela 2: Primer večličnosti masnega spektra Standard Fragmentation Patterns m/u Constituent Constituent Constituent A B C 1 1 2 1 2 1 1 2 Case 1 Multiplication 2 1 3 factors 1 m/u Constituent Constituent Constituent A B C 1 2 2 3 2 2 1 6 Point T 7 9 Case 2 Multiplication 1 1.333 3.333 factors 2 m/u Constituent Constituent Constituent A B C 1 1 2.666 3.333 2 1 1.333 6.666 Point T 7 9 Let us suppose we have three constituent gases, and that we observe only two m/u values: m/u = 1 and m/u = 2. The experiment to demonstrate the basic ambiguity of the mass spectra is shown in Table 2. For this purpose, the standard fragmentation patterns for three virtual gases were set. The next step is to generate the mass spectra that contains the three gases, For this, the multiplication factor for each constituent must be set and the mass spectra is created, according to equation (4), taking into account that all the sensitivity factors a,- are set to 1. The result of the mass-spectra synthesis is the actual spectra, or "target" point T, which represents the superposition of all three constituent gases, multiplied by their factors. The next step is the inverse process, meaning that we have the mass spectra - point T, and we seek the multiplication factors that produce that mass spectra. The figures in Table 2 clearly show that the same mass spectra can be achieved by the factors 2, 1, 3, and 1, 1.333, 3.333 as well. The experiment is graphically illustrated in Figure 1. Figure 1: The illustration of mass-spectra ambiguity Slika 1: Ilustracija večličnosti masnega spektra 12 Materiali in tehnologije / Materials and technology 43 (2009) 1, 11-21 I. BELIČ: THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS When the mass spectra are considered to be vectors, the ambiguity in its interpretation can be clearly demonstrated. 1.3 Interpretation of mass spectra for residual gas analysis - general considerations It is obvious that from knowing only the resulting mass spectra it is next to impossible to reconstruct the proper contribution for each constituent spectrum. The first step in the mass-spectra analysis is the identification of the residual gases that constitute the observed spectrum. It is to be expected that the major residual constituents in vacuum systems are inorganic gases, such as H2O, H2, CO2, CO, O2, N2, Ar, Ne, etc. Most inorganic gas molecules found in vacuum systems are composed of 2, 3, or 4 atoms. Their fragment patterns are therefore simple. These gases are relatively easy to identify in the overall mass spectrum. There are some simple rules at the basis of the experimental work with mass spectrometry. These rules are extremely important for the understanding of mass-spectra properties. As such they can provide the valuable additional information needed for a more accurate mass-spectra identification.32 • The spectra of simple inorganic gas molecules generally have even mass numbers. For these inorganic gas molecules a simple ionization is more probable than fragmentation. • The parent peaks of almost all inorganic gases are the largest. They are found at an even mass number. • Mass spectra should not contain large quantities of highly reactive gases (F2, O2, etc.). Generally these gases are most easily adsorbed by the vacuum system walls. In cases where the mass-spectra analysis gives large quantities of such gases, the algorithm should be capable of suppressing their quantities (or to signal possible erroneous events, e.g., vacuum-system leaks) • If the mass peak at m/u = 14 (mostly N+) is larger than the mass peaks at m/u = 12 and 16 (C+ and O+ from CO) then an air leak is very possible. • The noble gases, He, Ne, Ar, etc., are usually not the dominant residual gases. They have a highly un-reactive nature and they can scarcely be found in the atmosphere. Like with the atmosphere, in vacuum systems where gas reactivity is important for efficient pumping, noble gases (especially Ar and Ne) can be observed in appreciable quantities. • A m/u = 1 peak greater than a few percent of m/u = 2 is often a sure indication of significant amounts of water vapor in the system. Please note that except for water, H+ is not a significant fragment of any gas usually seen in a vacuum system. Even with the standard spectra for H2O, the m/u = 1 is not plotted. The partial pressures of gases like H2O, CO, CO2, etc. can provide useful information about the progress of vacuum-system bakeouts. • A mass spectrum with a series of peaks that are separated by A m/u = 14 or A m/u = 15 indicates that there are hydrocarbons present in the vacuum system. • Organic gases are unstable, thus the parent peaks may not appear in the spectrum. Fragments often represent the dominant mass peaks. • Fragments with odd m/u numbers are expected to be the most populated. • The m/u peaks 57, 55, 53; or 43, 41, 39; or 29, 27; or combinations of these are a sure indication of organic species in the vacuum system. If the highest m/u of each of these series is the largest peak then the organic is of the saturated type, (forepump oil). If a lower mass number in the series is the largest, this is generally an indication of some degree of unsaturation (Multi-bonded carbon-carbon) and can be caused by some polymeric substance. 1.4 Descent techniques One of the numerous methods available to solve the problem of mass-spectrum identification is the use of the descent method. There are, in fact, several descent methods. In our research, the classical gradient descent method has been altered to the so-called pseudo-gradient descent method. We are well aware that any used method cannot overcome the basic problem of mass spectra: the ambiguity, therefore, is that one cannot expect the results of the proposed method to be flawless. Our aim is to find a method that will provide results that are good enough for practical use. Descent techniques33 are generally used for the solution of unconstrained minimization problems. In our case we are trying to find the combination of standard fragment patterns that yield the target, i.e., measured spectra. The standard fragment patterns are formalized as vectors, and we are seeking the set of multiplication factors, i.e., the weight that in the weighted sum forms the target vector. The distance from the calculated weighted sum to the target vector is measured by means of the classical Euclidean distance. The distance function is also often called the error function. This is the classical optimization problem. The unconstrained minimization problem is min D(w, x). Here, a value of the variable vector w = (w1, ..., wn)r is sought that minimizes the objective, in our case the distance function D(w, x). The distance is calculated in m-dimensional Euclidean space, spanned over the unit vector x = (x1, ..., xm)r. The problem min D(w, x) is a special case of the general nonlinear programming or optimization problem. For the sake of simplicity, the distance function D(w, x) is often denoted by D(w), bearing in mind that the distance function is primarily changed by w, but is calculated in the space defined by x. Materiali in tehnologije / Materials and technology 43 (2009) 1, 11-21 21 I. BELIČ: THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS A new descent direction is generated for each iteration. The descent iteration may involve the evaluation of the first and possibly the higher order derivatives of the objective distance function. Each step of the descent process yields a considerable improvement of the objective function. The descent techniques involve a series of iterations that generally consist of three parts: 1) Determination of the direction of descent sk, 2) Determination of the descent length Ak, 3) Calculation of the descent step wk+1 = wk + Xk wk. The descent direction is an n-dimensional vector s = (s1, ... , Sn)T. It exists in the same space as the weights do and represents the direction in which weights should be changed in order to decrease the distance function. At the k-th iteration the direction vector sk originates at the current point wk. It points in a descent or "downhill" direction, i.e., the value of the objective function decreases from wk to a point at some distance in that direction. A unit vector sk is said to be the descent direction, with respect to the objective function D(w) at wk, if there is a Ap > 0, such that for all A satisfying Xp> X > 0 we have D(wk+') = D(wk + Isk ) < D(xk ) (6) If D(w) is differentiable, sk is a descent direction if lim D(wk + Xsk ) < D( xk ) _ dD(wk + Xsk ) X— 0 X dX = ( sk )T VD(wk ) < 0 (7) k-1 k -1 wk _ w0 +£X's' _ w0 +£ Aw' (10) l_0 l_ 0 At the k-th iteration a matrix AWk is defined by KWk _[Aw0,Aw1,...,Awk-1 ] (11) That is, the columns of AWk are the k descent steps Aw0, Aw1, Awk-1 preceding Awk. The choice of the locally steepest direction as a descent direction leads to the steepest descent techniques. A locally steepest direction is obtained if the descent step Awk minimizes Ad _f dD( w k,x k ) Awk dw. (12) The distance between two points x1 and x2 in the m-dimensional space is defined as D(x1, x2) _[(x1 - x2)T A(x1 - x2)]1 (13) Where the VD(w) denotes the gradient of the objective function D(w) evaluated at the point wk. If D(w) is differentiable, the product (sk)rVD(w) of the directions sk and the gradient VD(w) is, by definition, the directional derivative of D(w) in the direction sk evaluated at wk. If this directional derivative exists and is negative, then sk is a descent direction. The descent step-length is a scalar. It is a measure of the distance along the descent direction sk between two successive iteration points wk and wk+1. In other words, at the k-th iteration a step of length Ak is taken from point wk to the point wk+1. The Descent Iteration A typical descent iteration can be summarized in the following steps: 1) Compute a descent direction sk = (s1k,..., s/)T 2) Compute a descent step-length k 3) Perform a descent step to obtain a new point wk+1 _ wk + Xk sk (8) The k-th descent step is defined as follows Awk _ wk+1 - wk _ Xk sk (9) A sequence of k descent steps leads from a starting point w0 to a point wk given by A is a positive definite m x m symmetric metric matrix. The definition of positive definiteness ensures that D(x1, x2) > 0 for any nonzero x1 and x2. In general, it is not necessary to use the same unit of distance along the different coordinate axes. The choice of I, m x m identity matrix as a metric, i.e., A = I, leads to the first-order steepest descent technique. Rescaling of the variables, e.g., x = Bx, is equivalent to introducing a new metric matrix relative to the old coordinate system x. It is not necessary to use the same metric matrix throughout the whole iterative process of a descent technique. 2 EXPERIMENTAL WORK Our aim is to design an algorithm that is able to perform qualitative and quantitative mass-spectra analysis. It is not possible to undertake the task without the formation of a virtual, simulated environment where various aspects of mass-spectra analysis can be addressed. In real vacuum systems, it is very expensive and time-consuming work to create a large amount of different situations to test primarily the algorithms. We want to establish the situation where the creation of analyzed spectra is completely under our control. Undivided attention can therefore be placed on the studied algorithms. All the problems are then concentrated on algorithms alone. When in a situation where the behaviour of the algorithm is very thorough, it can be implemented on real vacuum systems, giving the chance to assess its behaviour in real circumstances. At this stage of the implementation of the algorithm in "real life" is a process where the majority of problems is due to the used equipment, while the algorithm's behaviour is known, tested and understood. Such an approach is not a very common one, and it is almost general practise that two problems are studied at X=0 12 Materiali in tehnologije / Materials and technology 43 (2009) 1, 11-21 I. BELIČ: THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS once, i.e., the mass spectra in a real situation and the algorithm. 2.1 The environment The environment that binds together the generation of the mass spectra, the various algorithms for the mass-spectra identification, the experiments database and the evaluation procedures was established. The first step is to create the environment where static mass-spectra analysis can be performed. By static, it is meant that the mass spectrum is created and the algorithm to analyze the spectrum is activated. During the analysis time, the mass spectrum is fixed. Our intention is to create the environment where mass spectrum is generated as the time variant spectrum, i.e., dynamic. The environment (Figure 2) enables the study of behaviour of different algorithms in noisy conditions. Figure 2: The environment for the generation of mass spectra, algorithms for the mass-spectra constituents analysis, the experiments database and evaluation Slika 2: Okolje za generiranje masnih spektrov, algoritmi za analizo, podatkovna baza za shranjevanje podatkov poskusov in njihovih obdelav rivu r 1 H 2 He 1 0.0500 0.0000 2 1,0000 0,0000 3' 0,0000 0,0000 4 0,0000 1,0000 5' 0,0000 0,0000 0000 0000 0000 0000 ch4 nh3 "0,0000 0.0000 0,0000 0,0000 0,0000 0,0000 10 11 12 6 0,0000 0. 7 0.0000 0. 8 0.0000 0. 9 0,0000 0. 10 0.0000 0. 11 0.0000 o, 12 0.0000 0, 13 0.0000 0. 14 0.0000 0, 15 0.0000 0. 1g 0.0000 0, 17 0.0000 0. 18 0.0000 0. 19 0.0000 0. 20 0,0000 0. 21 0,0000 0. 22 0.0000 0. 23 0.0000 0. 24 0.0000 0. 25 0,0000 0. 0,0000 0,0000 OOOO 0,0000 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0,0000 0,0000 0000! 0.0000 0.0000 0000 0.0000 0,0000 0000 0,0240 0,0000 00001 0,0770 0,0000 " 0.1560 0.0220 0.8580 0.0750 1,0000 0,8000 ooool 0,0120 1.0000 0000' 0,0000 0.0040 0000 0,0000 0.0000 0000 0,0000 0.0000 .0000 0.0000 .0000 0.0000 ,0000 0.0000 .0000 0,0000 ,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0.0000 0.0000 0,0000 0,0000 0.0000 0,0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0,0110 0,0000 0,2300 0.0000 1,0000 0.0000 0,0010 0,0000 0,0030 1,0000 0,0000 0,0030 0,0000 0.0990 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 :2h2 c2h4 n2 co c2h6 0,0000 0,0000 0.0000 0,0000 0.0000 0.0000 0,0000 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000. 0,0000 0,0000 0,0000 0.0000 0,0000 0.0000 0,0000 0,0000 0.0000 0,0000 0.0000 0,0000 0,0000 0.0250 0,0210 0,0000 0.0450 0,0000 0.0560 0,0350 0.0000 0,0000 0,0000 0,0020 0.0630 0.0720 0.0060 0,0340 0.0000 0.0000 0.0000 0.0000 0,0460 0,0000 0,0000 0,0000 0,0090 0,0000 0,0000 0,0000 0.0000 0,0000 0.0000 0,0000 0,0000 0.0000 0,0000 0.0000 0,0000. 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0,0000 0,0000 0.0000 0.0000 0.0000 0,0000 0.0000 0.0000 0,0000 0,0000 0,0000 0.0000 0,0000 0.0560 0,0370 0,0000 0,0000 0,0000 0,2010 0,1170 0,0000 0,0000 0,0420 NO ch40 0,0000 0.0000 0,0000 0.0000 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0.0000 0.0000 0,0000 0,0000 0,0000 0,0000 0.0000 0,0000 0,0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0750 0.0000 0.0240 0.0000 0,0150 0.0000 0,0000 0.0000 0,0000 0.0190 0,0000 0.0000 0,0000 0.0000 0,0000 0.0000 0.0000 0.0000 0,0000 0,0000 0,0000 0.0000 0^0000 0,0000 Figure 3: The standard mass-spectra pattern database (section) Slika 3: Podatkovna baza standardnih masnih spektrov (izrez) Figure 4: The structure of the experiments database Slika 4: Struktura podatkovne baze poskusov The core of the environment is the database, including the standard fragment patterns for molecular ions and/or fragment ions (Figure 3). In our experiments, 47 gases, with m/u ratios from 1 to 47 formed the database. However, by no means is the system limited to these figures. The database leaves the user to freely add or remove the standard fragmentation patterns. The database is a vital element of the environment since it provides the standard fragment patterns as vectors. The data is used for the generation of the mass spectra, which is done in two different ways. One is random generation, where the multiplication factors Wj from equation (4) are chosen randomly. Another option is the user-defined selection of Wj, where the user can make the selection of gases to compose the vacuum-system atmosphere. The third option makes it possible to add noise to the generated spectrum -equation (5). Once the spectrum is generated it can be edited by the user. The spectrum-generation process results in the vector annotation of mass spectra. As such it is prepared to be analyzed by various algorithms. Each experiment is saved in the so-called experiments database, which consists of separate files, each for a separate experiment. Figure 4 depicts the structure of the experiments database. Each experiment file holds the data of the generated spectrum and the results of its analysis by the algorithm. The following data and the graphical representations describe the experiment: 1) Each experiment starts with the random generation (Figure 2) of virtual vacuum system atmosphere composition in terms of multiplication factors w. Multiplication factors are stored, and shown in Materiali in tehnologije / Materials and technology 43 (2009) 1, 11-21 21 I. BELIČ: THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS Figure 5: The data stored in the experiment file Slika 5: Podatki, shranjeni v datoteki, ki pripadajo enemu poskusu Figure 5 in rightmost table (the column entitled COMPOSITION). The same data is graphically represented in Figure 6, lower graph. Please note that the randomly generated data is presented as the left (blue) bars in a bar graph. 2) Once the randomly generated composition is known, the mass spectrum is formed (Equation 4), where standard fragmentation patterns (Figure 3) and the corresponding weights are combined to form the spectra (the column COMPOSITION in the middle table of Figure 5). The same data is graphically shown in Figure 6 middle bar graph. Again the generated spectrum is shown by the left (blue) bars. These data form the so called "target" for the mass spectrum analysis algorithm. 3) At this stage the spectrum generation process is complete the data is stored and graphically presented. The reverse process of spectrum analysis can commence. The pseudo gradient algorithm (or any other) works on a spectrum (column COMPOSITION in middle table of Figure 5) and seeks the multiplication factors w that best fit the target spectrum. The calculated weights are stored in column CALCULATION in rightmost table of Figure 5. The same data is graphically presented in lower bar graph (the right (red) bars) of Figure 6. Similarly the calculated spectrum is stored in column CALCULATION in middle table of Figure 5, and graphically presented in middle bar graph (again the right (red) bars). 4) The data that describe the convergence of the used algorithm (the leftmost table of Figure 5) consisting of consequent iteration counter (column ITERATION), and the current distance to the given target spectrum (column DISTANCE TO TARGET) is stored. The same data is graphically represented in Figure 6 - upper graph. 5) The lower right table in Figure 5 summarizes the five mass spectra constituents where the errors produced by the tested algorithm were absolutely the largest. 6) In addition the number of iterations required to fulfil the preset algorithm termination criteria, and the overall RMS error for all mass spectra constituents (Figure 5 upper right part) are presented. In this stage, the system is prepared to generate and analyze static mass-spectra vectors. The design of the environment is such that a dynamic module can easily be added in order to generate and simultaneously analyze the time profiles of the mass spectra. 2.2 The pseudo-gradient descent algorithm The classical gradient approach requires the calculation of the first derivative of the distance to the target function D(w, x). The distance function in the D(., x) is the standard Euclidean distance (13), unfortunately the dependence D(w, . ) is more complicated. In our approach we have decided to calculate the differences produced by the small changes of w rather than produce the necessary step directions from the formal first derivative of the function. Therefore, the formal gradient descent algorithm becomes the pseudo-gradient algorithm. • Read the data from the standard fragment pattern database. • Calculate increments for all gases (These are the differences used for the calculation of pseudo gradients. Increments mean that each m/u component of the standard fragment pattern is multiplied by a small value - 0.001). • Define the memory data space for the error function. For each iteration, the error value represents the distance from the current mass spectrum produced as the result of current weight values, and the target spectrum. The error function is the dependence of the error and the iteration counter. • Read the target spectrum. • Set the initial weights for all possible gases included in the database to 0.01 - this is the starting point where the gradient descent starts. • Calculate the first distance from the spectra that uses the initial weights and the target spectrum. • Calculate the step-length A* for the current iteration. The algorithm uses the variable step-length k, which is set to 1/20 of the current distance (Equation 13). This step is necessary to ensure the smooth convergence of the algorithm. • Start the iteration loop, which is executed until the actual distance remains larger than 0.1 or the repetition counter remains under 3000. o Calculate the differences in distance that are caused by making very small differences - increments in the gas factors - weights. Both directions are 12 Materiali in tehnologije / Materials and technology 43 (2009) 1, 11-21 I. BELIČ: THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS Figure 6: The graphical representation of mass-spectra analysis data Slika 6: Grafična predstavitev podatkov analize masnih spektrov probed and the change that produces the reduction of distance to the target is maintained. Special care must be taken during reduction, while the weights for each constituent gas must always remain positive. The lowest possible weight value is therefore 0. o Keep only incremental changes that produce a reduction of the distance to the target. o Find the direction of downhill change - the pseudo gradient (12). o Use the step-length and direction vector to form the new, closer set of weights. o Calculate the new distance (6). o Calculate the new step-length 2k+1, again as 1/20 of the actual distance to the target. • As the algorithm executes iterations, the data structures (Figures 5 and 6) are consequently filled with data. 3 RESULTS AND DISCUSSION The mass spectrum is the linear superposition of the individual peak heights of the constituent gases. The standard fragmentation patterns of the constituent gases are not orthogonal; therefore, the algorithms that are used for the mass-spectra identification produce ambiguous results. The experiment was designed to probe the pseudogradient descent algorithm for 1000 randomly generated spectra. The pseudo gradient algorithm shows excellent convergence. In all 1000 examples the convergence was without any detected instabilities. For all the performed tests the error function is a monotonously decreasing one. It is important to note that the algorithm seeks the values for weights that produce the mass spectrum as close as possible to the given spectra. This means that the end spectrum is always as near as possible to the preset error tolerance in the m/u "space", while the actual weight values can sometimes produce serious errors in the weight space. The average starting error for all 1000 tests was 23.84, and the error tolerance to stop the algorithm was arbitrarily set to 0.1. A total of 142 tests out of 1000 did not fulfil the pre-set condition of error tolerance in 3000 iterations. All the others did the job in an average 802 iterations. The average error for the 142 failed tests was 0.14. The problem of 142 unfinished tests could be easily solved by moving the number of allowed iterations from 3000 to some higher value. The arbitrarily set error tolerance is also very easy to change - according to the needs posed by the concrete problem. Figure 7 plots the root-mean-square errors (of the calculated weights) for all the constituent gases and for all 1000 tests. Since the mass spectrum analysed by the tested algorithm is always exactly known (it is generated by the mass-spectrum generator), the error of the performed analysis can always be calculated (not only assessed, as is the case during the analysis of real data). The root-mean-square error (RMS) is calculated by equation 14. Figure 7: Root-mean-square error for the calculated weights of the constituent gases Slika 7: Srednjekvadratna napaka izračunanih uteži plinov, ki sestavljajo atmosfero v vakuumskem sistemu Materiali in tehnologije / Materials and technology 43 (2009) 1, 11-21 21 I. BELIČ: THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS Figure 8: Mass-spectra standard fragment patterns - the black squares represent the presence of a peak Slika 8: Standardni masni spektri - črni kvadrati predstavljajo prisotnost posameznega vrha v spektru Where Ej represents the RMS error for the j-th constituent gas, the index i is the test counter that runs from 1 to 1000, Wj is the weight of the j-th constituent gas (the result of the analysis) during the i-th test, wtj is the target weight (of the generated spectrum) of the j-th constituent gas during the i-th test. The value for N is, in our case, 1000. From Figure 7 we can conclude that there is a group of constituent gases where analysis gives excellent results, almost without noticeable errors. Such gases are: H2, He, CH4,NH3, H2O, Ne, H2S , A, C3H6, C2HCl3. The group of constituent gases with a "medium" error would consist of C2H2, C2H4, NO, CH4O, O2, C3H8, C2H4O, C2H6O, NO2, CH2O2, C4H10, C3H6O. The largest errors are detected for N2, CO, C2H6, CO2, N2O. The reason for the errors is mainly in the mentioned mass-spectra ambiguity. Putting it another way: the standard fragmentation patterns for the gases overlap, so it is very hard to distinguish which is which. In Figure 8 the existence of the peak in the fragment pattern is symbolized as a black square. The constituent gasses that can be recognised precisely are shown with a grey background, while the dotted area represents the gases with the poorest results. 4 CONCLUSION The presented work is a foundation for the comprehensive study of mass-spectrum analysis techniques. It introduces several new ideas in the field of mass spectrometry. The most important concept is the virtual environment that provides the mass-spectrum generator, the space for the various algorithms that can test various approaches to the mass-spectrum identification, the database of the achieved results, which is extremely important for the comparison of different algorithms, the backbone for the dynamic mass-spectra generation and analysis. The environment also makes it possible to analyse the immunity to noise for all algorithms. The main purpose of the virtual environment is the controlled mass-spectra data, which allows an exact evaluation of the errors produced by the studied algorithm. Normally, such algorithms are developed and tested directly on "live" data, which combines the problems that originate from the measurements environment with those produced by the algorithm, often without a real chance of dividing the two. The approach with the virtual environment makes it possible to study problems regarding the algorithms alone, prior to applying them to the "live" environment. Such an approach also makes it possible to run numerous tests in a relatively short time, which would never be possible with real vacuum systems. The pseudo-gradient descent algorithm for mass-spectrum identification was proposed and tested. The results are promising, taking into account that additional work will be needed to overcome the mass-spectrum ambiguity. The steps that will be tested are: 1) Introduction of "common knowledge" regarding the mass spectrometry and identification techniques. 2) Finding out how does the selection of the initial point influence the algorithm's performance. 3) Introduction of several steps in the identification process - after the first run of the algorithm the constituent gases that can be detected with the highest precision should be removed and the algorithm should be run again for the remaining gases. This step should be repeated several times. Another problem is the study of a dynamic mass spectrum, i.e., the analysis of mass-spectrum time 12 Materiali in tehnologije / Materials and technology 43 (2009) 1, 11-21 I. BELIČ: THE PSEUDO-GRADIENT ALGORITHM FOR RESIDUAL GAS ANALYSIS profiles. The virtual environment is prepared to enable such studies. 5 REFERENCES 1 K. Schulz, K. Schlenz, S. Malt, R. Metasch, W. Römhild, J. Dreßler, D. W. Lachenmeier, Journal of Chromatography A, 1211 (2008), 1-2, 113-119 21. S. Sheoran, A. R. S. Ross, D. J. H. Olson, V. K. Sawhney, Plant Science, 176 (2008), 99-104 3 J. Chena, X. Li, C. Suna, Y. Pana, U. P. r Schlunegger, Talanta, 77 (2008), 152-159 4 A. Cingöz, F. Hugon-Chapuis, V. Pichon, Journal of Chromato-graphy A, 1209 (2008), 95-103 5 http://www.matrixscience.com/ 6 H. M. Santosa, C. Mota, C. Lodeiroa, I. Mouraa, I. Isaacb, J.L. Capelo, Talanta, 77 (2008), 870-875 7 https://products.appliedbiosystems.com/ 8 http://www.thermo.com/com/cda/product/detail/ 0,1055,1000001009250,00.html 9T. Murakamia, T. Kawasakia, A. Takemuraa, N. Fukutsua, N. Kishia, F. Kusud, Journal of Chromatography A, 1208 (2008), 164-174 10 http://www.scientific-computing.com/products/product_details.php? product_id=207 11 S. Shia, Y. Zhaob, H. Zhouc, Y. Zhanga, X. Jianga, K. Huanga, Journal of Chromatography A, 1209 (2008), 145-152 12 http://www.chem.agilent.com 13 http://scientific-computing.com 14 http://www3.appliedbiosystems.com 15 L. Lionetto, A. M. Lostia, A. Stigliano, P. Cardelli, M. Simmaco, Clinica Chimica Acta, 398 (2008), 53-56 16 M. Mezcua, C. Ferrer, J. F. Garci'a-Reyes, M.J. Marti'nez-Bueno, M. Sigrist, A. R. Fernandez-Alba, Food Chemistry, 112 (2009), 221-225 17 K. M. Robinson, J. T. Morr, J.S. Beckman, Archives of Biochemistry and Biophysics, 423 (2004), 213-217 18 M. Y. Zhanga, N. Kagana, M. A. Sungb, M. M. Zaleskab, M. Monaghanb, Journal of Chromatography B, 874 (2008), 51-56 19 W. C. Chau, J. Wu, Z. Cai, Chemosphere, 73 (2008), S13-S17 20 X. Deng, G. Gao, S. Zheng, F. Li, Journal of Pharmaceutical and Biomedical Analysis, 48 (2008), 562-567 21L. Qi, J. Cao, P. Li, Q. Yu, X. Wen, Y. Wang, C. Li, K. Bao, X. Ge, X. Cheng, Journal of Chromatography A, 1203 (2008), 27-35 22 H. Qu, B. Li, X. Li, G. Tu, J. Lü, W. Sun, Microchemical Journal, 89 (2008), 159-164 23 E. Vanlaere, K. Sergeant, P. Dawyndt, W. Kallow, M. Erhard, H. Sutton, D. Dare, B. Devreese, B. Samyn, P. Vandamme, Journal of Microbiological Methods, 75 (2008), 279-286 24 R. Wihlborg, D. Pippitt, R. Marsili, Journal of Microbiological Methods, 75 (2008), 244-250 25 J. Radjenovic, S. Péreza, M. Petrovič, D. Barceloa, Journal of Chromatography A, 1210 (2008), 142-153 26 R. Oprean, L. Oprean, M. Tamas, R. Sandulescu, L. Roman, Journal of Pharmaceutical and Biomedical Analysis, 24 (2001), 1163-1168 27 http://www.sisweb.com/software/ms/nist.htm 28 http://eu.wiley.com/WileyCDA/Section/id-301546.htm 29 A. Calcatelli, M. Bergoglio, D. Mari, Vacuum, 81 (2007), 1538-1544 30 http://www.hidenanalyticalinc.com 31 http://www.extrel.com 32 M. J. Drinkwine, D. Lichtman, Partial pressure analyzers and analysis, American Vacuum Society, Milwaukee, 1979 33 J. L. S. Samuel, Iterative methods for nonlinear optimization problems. Prentice HALL, Englewood Cliffs, 1972 Materiali in tehnologije / Materials and technology 43 (2009) 1, 11-21 21