Classification of weathered crude oils using multimethod chemical analysis, statistical methods and SIMCA pattern recognition

Marine Pollution Bulletin, Volume 17, No. 8, pp. 366-373, 1986.

Printed in Great Britain.

0025-326X/86 $3.00+0.00 © 1986 Pergamon Journals Ltd.

K. URDAL, N. B. VOGT, S. P. SPORSTOL, R. G. LICHTENTHALER, H. MOSTAD, K. KOLSET, S. NORDENSON and K. ESBENSEN*
Department of Analytical Chemistry, Center for Industrial Research, P.B. 350, 0314 Blindern, Oslo 3, Norway.
*Norwegian Computing Center, Forskningsveien 1 b, 0314 Blindern, Oslo 3, Norway.

As part of a continuous research programme on oil spill identification, new methods in computerized statistical pattern recognition are being investigated. Twenty-six artificially weathered crude oils have been analysed for Ni, V, n-C17 alkane, pristane, n-C18 alkane, phytane, sulphur, triterpanes and steranes. These parameters, 47 in all, have been screened to investigate the possibility of classification and distinction between the different sources. Four different statistical methods were considered: absolute ranking ('peak-index-fit'), t-test, K nearest neighbour (KNN) and soft independent modelling of class analogy (SIMCA). The results show that all groups of oils may be separated by geographic origin using a 'peak-index-fit' ranking, a simple t-test and KNN. SIMCA is the only method capable of distinguishing between sources within the same geographical region. Based on the experience with these methods, a statistical multimethod approach for identification of unknown oils from fingerprint databases is suggested.

Oil spill identification is important for both judicial and environmental reasons (Bentz, 1978). Recently, the Nordic countries agreed on a Nordic standard for a chemical multimethod approach to identifying oil spill sources (NT CHEM 001). This involves chemical analyses by Infrared Spectroscopy, High Resolution Gas Chromatography (HRGC), X-ray fluorescence and Atomic Absorption or, alternatively, Inductively Coupled Plasma emission spectrometry (ICP). The choice of chemical analytical methods has in part been based on the experience of, and the methods reported by, the Institute of Petroleum Oil Pollution Analysis Committee (Anon, 1974) and the US Coast Guard (Anon, 1977). The use of High Pressure Liquid Chromatography (HPLC) and HRGC combined with low resolution Mass Spectrometry has been suggested as a supplement in especially difficult cases.

At our laboratory all the methods mentioned have been used successfully on several occasions to identify sources of oil spilled along the Norwegian coast. This has been done by visual comparison of the fingerprints of suspected oil sources with the fingerprint of the spilled oil. With the development of off-the-shelf computer programmes for multivariate analysis of complex data sets, several approaches to identifying oils using statistical methods and pattern recognition have been reported (Chien & Killeen, 1977). Infrared analysis combined with multivariate normal probability analysis has been used to classify unweathered oils and lubricating oils (Mattson et al., 1977). Infrared analysis combined with the log-ratio method and the vector method has been used to identify weathered light fuel oils (Anderson et al., 1980). Sjostrom & Kowalski (1979) used seven trace elements and five different multivariate techniques (LLM, LDA, Bayesian, KNN and SIMCA)* to analyse 40 classes of crude oils with 10 samples in each class. These data had previously been analysed statistically by Duewer et al. (1975). Each class consisted of one oil sample and nine samples obtained by different artificial weatherings of this oil. Duewer et al. (1975) found SIMCA and KNN to give superior results to quantitative matchings. Sjostrom & Kowalski (1979) reported that SIMCA and the Bayesian classification rule were best for classification, although SIMCA is the only method used in this investigation which can deal with outliers, i.e. level-two problems (Albano et al., 1978), in multivariate analysis. Clark & Jurs (1979) analysed four oil types by gas chromatography, before and after weathering, and used 13 parameters with Bayesian discriminant analysis and the non-parametric linear learning machine method. They found that the normal alkanes from C16 through C25, pristane, phytane and the unresolved background were not invariant through weathering processes.

*LLM, Linear Learning Machine; LDA, Linear Discriminant Analysis.

All these investigations suggest that multivariate analysis is a powerful tool for identifying a weathered oil or classifying it to a source. The problem is often not only one of sample classification (level 1 pattern recognition), but also involves level 2 pattern recognition for outlier identification and ranking. SIMCA principal component analysis has been found to be a powerful method capable of handling these types of problems (Sjostrom & Kowalski, 1979), although fuzzy clustering has recently been used to identify sources of polycyclic aromatic hydrocarbons (Thrane, 1984). At the same time, statistical analysis and classification should not be based on one multivariate method alone (Massart & Kaufman, 1983). Previous investigations using pattern recognition methods have relied on data from one chemical analytical method. One chemical analytical method alone may not give enough information to handle the complex changes which occur during weathering (Bentz, 1976). The complexity of crude oil composition and of the weathering processes therefore suggests that several chemical analytical and statistical classification methods, using different classification strategies, should be investigated for use in identifying oil spill sources from weathered oils. The approach suggested in this work combines several chemical analytical methods (trace elements Ni and V analysed by ICP; pristane/phytane, triterpanes and steranes analysed by HRGC-MS; and sulphur analysed by XRF (Sporstol & Lichtenthaler, 1983)) with four different data-analytical methods which independently, in a four-stage confirmatory statistical analysis, classify weathered oils into groups.

Materials and Methods

Standardized Weathering of Crude Oils

The crude oils used in the present work were supplied from several sources (Table 1). The artificial weathering is based on a US Coast Guard method (Anon, 1977). Sunlight is simulated by a UV lamp and wind by a fan. Crude oil (12 g) is added on top of water in a glass petri dish with a radius of 18 cm. The thin oil film is then subjected to radiation from an Osram-Vitalux lamp (300 W) at a distance of 20 cm for 6 h. At the same time evaporation is aided by blowing air over the dish at a speed of 2.6 m s⁻¹ (Anderson et al., 1980).

TABLE 1
The crude oils that have been analysed, with country of origin.

Crude oil          Country      Crude oil            Country
Statfjord B        Norway       Soviet Export        Soviet
Statfjord blend    Norway       Topped Romaskino     Soviet
Statfjord A*       Norway       Romaskino Crude      Soviet
Valhall A          Norway       Nigerian Light       Nigeria
Ekofisk A          Norway       Nigerian Medium      Nigeria
Ekofisk B          Norway       Nigerian Forcados    Nigeria
Edda               Norway       Es Sider             Libya
Eldfisk B          Norway       Iranian Heavy        Iran
Teeside pipeline   U.K.         Iranian Light        Iran
Forties            U.K.         Murban               Abu-Dhabi
Brent              U.K.         Statfjord A#1        Norway
Beryl              U.K.         Statfjord A#2        Norway
Flotta             U.K.         Statfjord A#3        Norway
Buchan             U.K.         Statfjord A#4        Norway
Gorm               Denmark      Statfjord A#5        Norway

*This Statfjord A crude oil has not been weathered.

Chemical Analysis

The inorganic chemical variables analysed were the elements Ni, V and S, and the organic variables were the C17/pristane ratio, the C18/phytane ratio, the Σ-triterpanes/Σ-steranes ratio, 16 triterpanes and 25 steranes. The methods of chemical analysis have previously been reported by Sporstol & Lichtenthaler (1983). The chemical raw data on which this paper is based may also be found in the same report. Nickel and vanadium have been analysed by ICP. Sulphur has been analysed by X-ray fluorescence. The triterpane and sterane identifications and quantifications are done by computerised HRGC-MS and have been based on retention times and masses m/z 191 and m/z 217 respectively. Typical m/z 191 and m/z 217 fragmentograms are given in Fig. 1, identifying the compounds selected for statistical analysis. For the 16 triterpanes and 25 steranes, data have been normalized on the base peak for m/z 191 and m/z 217 respectively. Table 2 contains selected typical examples of the data matrix applied in the SIMCA analysis.

Quality assurance is an important part of chemical analyses. All the methods used at our laboratory are continuously controlled through participation in international intercalibration programmes and by internal quality assurance tests. The reproducibility of the analyses used in this paper, for n = 5 samples, ranges between 1% for the C17/pristane ratio and 13.6% for the Ni content.

Statistical Analysis

Four different approaches to statistical interpretation have been investigated. These are:

1. A simple absolute ranking equivalent to a 'peak-index-fit', using summed absolute differences between 47 parameters.

2. A t-test statistic ranking where the number of parameters having a significant difference (t-test, p = 0.999) between two oils is summed and used as a ranking. If one parameter shows a difference the oils are considered to be distinguishable.

For the t-test the differences are expressed as numbers between 0 and 47, 47 being the largest number of possible different parameters using the NT CHEM 001 multimethod approach. The absolute ranking list is calculated by dividing the absolute parameter difference between the reference oil and the actual 'unknown' oil by the mean value of the reference oil and the 'unknown' oil. Only those parameters having differences found to be statistically significant (t-test, p = 0.95) are used. The numbers obtained for each parameter are then summed over all parameters. The oil with the largest difference to the reference oil is set to 10 and the others are expressed as numbers between 0 and 10 relative to this, see Table 3.

From the t-test statistics 30 parameters were selected which had the best separation power for most of the crude oils analysed. These were then used for both the KNN and the SIMCA analysis. These parameters (variables) were sulphur, vanadium, the C17/pristane ratio, the C18/phytane ratio, the Σ-triterpanes/Σ-steranes ratio, 12 triterpanes and 12 steranes.

3. K five nearest neighbour (K5NN). This allows the Euclidean distance between objects placed in a multidimensional space to be investigated (Duewer et al., 1975; Sjostrom & Kowalski, 1979; Kvalheim, 1984).

4. SIMCA pattern recognition using principal components (Wold et al., 1984) in both the unsupervised and supervised, i.e. disjoint, mode (Albano et al., 1978). The mathematical basis has been described by Wold & Sjostrom (1977) and Wold et al. (1984), and an 'easy-to-understand' graphical description may be found in Albano et al. (1978). Applications within environmental analysis may be found in Sjostrom & Kowalski (1979), Grahl-Nielsen et al. (1983) and Vogt (1984). This pattern recognition technique allows classes (groups of samples) to be identified in the multivariate space and includes the possibility of determining sample outliers (Wold et al., 1984) by using an approximate F-test to compare the sample residual standard deviation to the class standard deviation (Wold & Sjostrom, 1977). A ranking list may be made by using the residual standard deviation of each sample when compared to a class as a measure of distance to the class (Knutsen & Vogt, 1985). Variable importance in describing classes or distinguishing (discriminating) between classes may be found either from variable loading plots (Vogt & Knutsen, 1985) or by determining parameter modelling and discrimination powers (Wold & Sjostrom, 1977). In the present work the Statoil crude oils were used as a reference group (class), and the residual standard deviations of all oils compared to this class were used as the basis for a ranking list.

Fig. 1 Mass fragmentograms of masses m/z = 191 for triterpanes and m/z = 217 for steranes. Numbers identify peaks used as parameters.

TABLE 2
Content of sulphur and vanadium and ratios of C17/pristane, C18/phytane and Σ-triterpanes/Σ-steranes for typical samples of artificially weathered crude oils. The values for triterpanes and steranes used are obtained from HRGC-MS and are normalized by dividing by the base peak, i.e. the 100% peak, in the mass fragmentograms for the two masses 191 and 217 respectively.

Variable/Oil               Statfjord   Iranian   Soviet   Nigerian
% Sulphur                  0.41        2.04      2.06     0.25
ppm Vanadium               1.28        44.80     22.70    3.47
C17/Pristane               1.81        2.99      2.13     0.39
C18/Phytane                2.28        2.25      1.60     1.08
Σ-Triterpane/Σ-Sterane     2.59        3.53      3.55     14.04
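The two univariate rankings listed as approaches 1 and 2 under Statistical Analysis can be illustrated with a minimal Python sketch. It is not the authors' program. It assumes that replicate measurements are available for every parameter of both oils, uses a pooled-variance two-sample t statistic, and leaves the critical t value (`t_crit`) to the caller, who would look it up for the chosen confidence level and degrees of freedom. The function names and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Absolute pooled-variance two-sample t statistic for two replicate lists."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = abs(mean(a) - mean(b))
    if pooled == 0.0:
        # Identical replicates in both oils: any difference in means is decisive.
        return float("inf") if diff > 0 else 0.0
    return diff / sqrt(pooled * (1.0 / na + 1.0 / nb))


def compare_oils(reference, unknown, t_crit):
    """Return (absolute index, number of differing parameters) for two oils.

    reference, unknown: dict mapping parameter name -> list of replicate values.
    t_crit: critical t value supplied by the caller (an assumption of this sketch).
    """
    index, n_differ = 0.0, 0
    for name, ref_values in reference.items():
        unk_values = unknown[name]
        if t_statistic(ref_values, unk_values) > t_crit:
            # Relative difference: |difference| divided by the mean of the two oils.
            pair_mean = (mean(ref_values) + mean(unk_values)) / 2.0
            index += abs(mean(ref_values) - mean(unk_values)) / pair_mean
            n_differ += 1
    return index, n_differ


def scale_to_ten(indices):
    """Rescale raw indices so that the most different oil scores 10."""
    largest = max(indices.values())
    return {oil: (10.0 * v / largest if largest else 0.0) for oil, v in indices.items()}


# Made-up replicate values for two parameters (not data from the paper).
reference = {"sulphur": [0.40, 0.41, 0.42], "vanadium": [1.2, 1.3, 1.3]}
unknown = {"sulphur": [2.03, 2.04, 2.05], "vanadium": [1.2, 1.3, 1.3]}
index, n_differ = compare_oils(reference, unknown, t_crit=2.776)
print(index, n_differ)
```

Calling `compare_oils` with the critical value for the 95% level gives the index and the count n of contributing parameters, while a second call with the critical value for the 99.9% level gives the t-test count. `scale_to_ten` then expresses the indices of all oils relative to the most different one.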

Results

Table 3 shows the results from the SIMCA classification, the absolute ranking and the t-test when comparing all oils to the Statfjord A weathered crude oil. The SIMCA ranking has been used as reference. Table 4 shows a KNN distance ranking in which the five closest neighbours to four of the oils have been calculated.

Statistical Methods

The absolute ranking test and the t-test statistical analysis were used on all 47 parameters. Table 3 shows that the summed absolute distance is useful for differentiating between the Statfjord crude oils, as reference group, and most of the crude oils from other geographical regions. Although not included in the present paper, the method allows weathered crude oils from the different geographical regions, e.g. Soviet oils vs North Sea oils, to be distinguished from each other. Distinction between oils from the same area, e.g. between North Sea oils, is somewhat problematic. The t-test method with 99.9% confidence is found to be useful for the same type of distinctions, but likewise does not distinguish between oils from the same geographical region: Brent and Beryl oils are not distinguishable from the Statfjord crude oils. The possibility of a false conclusion, i.e. rejecting H0 when it is in fact true (type I error (Zar, 1974)), is approximately 4.7% (Wold et al., 1983) for the t-test method with 47 parameters and 99.9% confidence level (47 × 0.1% = 4.7%; for independent tests the exact value, 1 − 0.999^47, is about 4.6%). Both methods do allow oils from the same geographical source to be identified. From analysis comparing all oils against each other it has been found that it is necessary to use most of the parameters if the absolute ranking ('peak-index-fit') and the t-test methods are selected for identifying sources of oils.

TABLE 3
Ranking lists for distances from the Statfjord crude oil by using SIMCA, absolute distance and the t-test method. The SIMCA method has been used as reference for tabulation. The value given for SIMCA ranking is the residual standard deviation (RSD) of the sample to the class model of the total Statfjord class. The class residual standard deviation for the Statfjord class using a 95% confidence interval is 1.4286. The object numbers are the same as those used in Fig. 2. Only the Statfjord samples are found to be members of this class. See text for description of the absolute distance calculation. The n is the number of parameters used to calculate the Index, i.e. the number of parameters which differ at the 95% confidence level. The t-test numbers are the number of parameters found to differ between the Statfjord weathered crude oil and the crude oil of interest using a 99.9% confidence level.

                              SIMCA      Absolute Dist.
Object name        Object #   RSD        Index    n     t-test
Stat. bl.          3          0.2973*    0.27     2     0
Statfj. A(uw)†     4          0.3350*    0.86     6     0
Statfj. A#1        27         0.3880*    0.00     0     0
Statfj. A#4        30         0.4052*    0.00     0     0
Statfj. A#3        29         0.4380*    0.00     0     0
Statfj. A#2        28         0.5834*    0.00     0     0
Statfj. B          2          0.7760*    0.00     0     0
Statfj. A#5        31         0.9000*    0.00     0     0
Brent              12         2.6292     0.61     4     0
Beryl              13         3.1005     0.81     6     0
Es Sider           23         4.8355     5.36     19    2
Ekofisk A          6          5.8298     3.41     14    3
Forties            11         6.5343     5.86     22    4
Gorm               16         6.6190     5.67     20    6
Eldfisk B          9          6.9688     2.91     11    2
Ekofisk B          7          6.9918     3.43     15    3
Valhall            5          7.4409     5.66     23    5
Teeside            10         8.9152     3.75     13    3
Nig. light         20         10.2798    10.00    31    11
Nig. Forcados      22         10.4025    8.68     26    9
Nig. medium        21         11.6476    9.96     28    11
Edda               8          11.7451    3.67     16    3
Flotta             14         12.2418    5.39     20    2
Romask. crude      19         20.8015    6.38     21    2
Buchan             15         22.9441    5.27     19    3
Iran. light        25         28.2214    9.47     29    6
Soviet Export      17         32.3056    8.20     24    5
Topp. Romaskin.    18         38.6459    7.18     24    2
Murban             26         58.6036    9.84     29    9
Iran. heavy        24         62.0982    9.05     29    5

*These samples have been used to construct the Statfjord class in SIMCA.
†uw is unweathered Statfjord A.

Multivariate Methods

The KNN approach to clustering is illustrated in Table 4. Four crude oils were analysed for their five nearest neighbours to see how this method performed in classification. It is seen from Table 4 that for the four oils selected there is no problem in identifying samples with the same geographical origin. The increase in KNN distance between samples of the same geographical origin and the next closest neighbours, found for the Nigerian Medium case, is illuminating. There is a substantially larger distance between samples that do not have the same origin than between those which have the same geographical origin. This suggests that the KNN method may be used to select the closest neighbours to an unknown oil and thereby identify possible sources.

TABLE 4
For KNN the distance described is from the 'unknown' crude oil to the one mentioned, using Euclidean distance.

'Unknown' oil        Neighbour            Distance
Statfjord A#5        Statfjord A#4        0.2474
Statfjord A#5        Statfjord A#3        0.2981
Statfjord A#5        Statfjord A#1        0.3138
Statfjord A#5        Statfjord B          0.3419
Statfjord A#5        Statfjord A#2        0.3453

Nigerian Medium      Nigerian Light       0.4719
Nigerian Medium      Nigerian Forcados    0.6499
Nigerian Medium      Gorm                 1.4353
Nigerian Medium      Valhall A            1.5106
Nigerian Medium      Ekofisk A            1.6674

Topped Romaskino     Soviet Export        0.3410
Topped Romaskino     Romaskino Crude      0.6153
Topped Romaskino     Iranian Light        0.7759
Topped Romaskino     Iranian Heavy        0.8191
Topped Romaskino     Flotta               1.0260

Iranian Heavy        Iranian Light        0.4976
Iranian Heavy        Topped Romaskino     0.8191
Iranian Heavy        Soviet Export        0.8916
Iranian Heavy        Romaskino Crude      1.1762
Iranian Heavy        Buchan               1.3210

The results from the SIMCA analysis are illustrated in Figs 2 and 3 and Table 3. Figure 2a shows a principal component plot of a SIMCA analysis of all the 30 weathered crude oils, including five parallels of weathered Statfjord A oil. Table 3 shows the object numbers as they appear in Figs 2 and 3. A visual inspection of Fig. 2a shows the Iranian, Nigerian, Soviet/Romaskino, Statfjord/Ekofisk, the one Libyan (Es Sider) and the one Murban oil to be separated into six groups. In addition, Flotta and Buchan seem to form a group, and the one Forties sample is separated from all others. The other North Sea oils are seen to be grouped in the lower part of the principal component plot. Figure 2b is a variable loading plot of the variables which have been used to construct the principal components in Fig. 2a. Triterpanes are marked with T and steranes with St. Since too few samples are present for most of the groups, disjoint principal component models have only been made for two North Sea crude oil groups of samples and the Statfjord group of samples. Figure 3a is a principal component plot of the North Sea oils (objects 2-16 and 27-31). The visual interpretation of this plot shows that Gorm and Valhall (objects 5 and 16) form a group, that Flotta and Buchan (objects 14 and 15) form a group and that the Forties sample (object 11) is separated from all the others. The other North Sea oils are grouped together.
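The SIMCA ranking in Table 3 rests on fitting a principal component model to one class and measuring how far each sample lies from that model. The following pure-Python sketch illustrates the idea only. It extracts components with a plain NIPALS iteration and uses one common convention for the degrees of freedom of the residual standard deviations, which is an assumption here and not taken from the paper. The approximate F-test for class membership, any variable scaling, and the choice of the number of components are left out, and all names and numbers are hypothetical.

```python
from math import sqrt


def fit_class_model(samples, n_components):
    """Fit a principal component class model to the rows of `samples` by NIPALS."""
    n, m = len(samples), len(samples[0])
    means = [sum(row[k] for row in samples) / n for k in range(m)]
    resid = [[row[k] - means[k] for k in range(m)] for row in samples]
    loadings = []
    for _ in range(n_components):
        # Start the scores from the residual column with the largest sum of squares.
        start = max(range(m), key=lambda k: sum(r[k] ** 2 for r in resid))
        scores = [r[start] for r in resid]
        for _ in range(500):
            ss = sum(t * t for t in scores)  # assumed non-zero (data not perfectly fitted)
            p = [sum(scores[i] * resid[i][k] for i in range(n)) / ss for k in range(m)]
            norm = sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            new_scores = [sum(r[k] * p[k] for k in range(m)) for r in resid]
            change = sum((a - b) ** 2 for a, b in zip(scores, new_scores))
            scores = new_scores
            if change < 1e-14:
                break
        loadings.append(p)
        # Deflate: remove the fitted component before extracting the next one.
        resid = [[resid[i][k] - scores[i] * p[k] for k in range(m)] for i in range(n)]
    a = n_components
    # Class residual standard deviation with (n - a - 1)(m - a) degrees of freedom.
    class_rsd = sqrt(sum(e * e for r in resid for e in r) / ((n - a - 1) * (m - a)))
    return {"means": means, "loadings": loadings, "class_rsd": class_rsd}


def sample_rsd(model, sample):
    """Residual standard deviation of one sample after fitting it to the class model."""
    e = [x - mu for x, mu in zip(sample, model["means"])]
    for p in model["loadings"]:
        t = sum(ei * pi for ei, pi in zip(e, p))
        e = [ei - t * pi for ei, pi in zip(e, p)]
    m, a = len(e), len(model["loadings"])
    return sqrt(sum(v * v for v in e) / (m - a))


# Made-up class of five samples lying close to a line in three variables.
class_samples = [[1.0, 2.0, 3.01], [2.0, 4.0, 5.99], [3.0, 6.0, 9.02],
                 [4.0, 8.0, 11.98], [5.0, 10.0, 15.0]]
model = fit_class_model(class_samples, n_components=1)
inside = sample_rsd(model, [6.0, 12.0, 18.0])   # follows the class pattern
outside = sample_rsd(model, [6.0, 12.0, 0.0])   # breaks the class pattern
print(model["class_rsd"], inside, outside)
```

Sorting all oils by `sample_rsd` against one class model gives a ranking of the same kind as the RSD column of Table 3; a sample whose value is large relative to `class_rsd` would be a candidate outlier.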
To investigate any subgroupings in the other North Sea oils, objects 11, 14 and 15 were left out and a new principal component construction made. Figure 3b shows that the oils grouped together in Fig. 3a may be separated. The Statfjord oils are all grouped together; the same is the case with the Ekofisk and its satellite rigs, together with the Teeside pipeline consisting mainly of Ekofisk oil. Brent and Beryl (objects 12 and 13) have been visually separated from the Statfjord oils in the SIMCA principal component plot. The Edda sample (object 8), although also an Ekofisk satellite, is seen to be separated from the other Ekofisk samples. Gorm and Valhall (objects 5 and 16) are both oils from the south-east part of the North Sea basin and are therefore grouped together. To verify the visual interpretation of Fig. 3a, Table 3 shows that the Statfjord oils may be considered as one class with all other oils as outliers at a 95% confidence level. The unweathered Statfjord sample (object 4) is found to be a member of the Statfjord class. This shows that the SIMCA method is capable of classifying the artificially weathered oils from different geographical regions, as well as from different sources within the same region.

Fig. 2 Principal component plots (PC1 vs PC2) of weathered crude oils. 30 parameters have been used. (a) Object score plot of all 30 oils. (b) Variable loading plot. (S = sulphur, V = vanadium, T = triterpanes, St = steranes.)

Fig. 3 (a) Principal component object score plot of North Sea oils. (b) Principal component object score plot of North Sea oils except samples (objects) 11, 14 and 15.

Discussion

Artificial weathering conditions will never be identical to the conditions oils are subject to on the sea. Still, the changes occurring within oils subject to artificial weathering will reflect those occurring under natural conditions. The US Coast Guard method (Anon, 1977) allows for photo-chemical, dilution and evaporative processes to affect the oil. The method must be considered a relatively mild weathering process simulating conditions encountered in the oceans at higher latitudes.

The multicomponent composition of oils and the complex changes which occur in oils that are weathered are such that great care should be taken during chemical analyses so that possible erratic results arising from chemical analytical problems may be identified. This is possible if several independent chemical analytical methods known to supply correlated information are used. Very subtle differences in chemical analytical data may, when classifying large numbers of oils, lead to oils being erroneously classified; thus several independent data-analytical methods should be used together. The digitization of analytical data should be accomplished with great emphasis on quality control so that errors in chemical analyses are not transferred to the database.

The relative differences between oils will change with the degree of weathering. Biomarkers, such as triterpanes and steranes, have been suggested to be among the more stable organic compound groups, although extreme conditions and long exposure will lead to these compound groups being degraded also (Goodwin et al., 1983; Aquino-Neto et al., 1983). The possibility of reducing the number of chemical variables to those necessary must be considered important when wanting to establish databases of many crude oils. We are therefore presently investigating the information content of the different chemical analytical methods using principal component analysis. The intention of this is twofold. The first is to identify chemical variables which are not, or very little, affected by weathering processes. The second is to reduce, if possible, the number of chemical analytical methods used in a routine oil-spill identification scheme while at the same time retaining the possibility to identify analytical errors and uniquely classify a spill to a suspected source.

Principal component analysis is a powerful tool in this type of work. Interpretation of variable loading plots gives information on variable correlation and may be used to rank variables according to their importance in the principal component analysis and their ability to distinguish between objects and classes. Variable loading plots (Wold et al., 1984), using the Fig. 2a objects (see Fig. 2b), show that vanadium and sulphur carry the same information (are correlated) in separating Iranian oils from the other oils in this class. Nickel has been found to correlate with vanadium. The Σ-triterpane/Σ-sterane ratio is seen to be responsible for the classification of Nigerian crudes. The variable loading plot also suggests that there is redundant information in the triterpane and sterane data matrix. This is seen by the close correlation between several of these variables (encircled) in the variable loading plot. It is also interesting that the C17/pristane and C18/phytane ratios do not participate in the classification of the oils. This agrees with the results reported by Clark & Jurs (1979), indicating that the normal alkanes, pristane and phytane do not represent identification parameters.

In order to cope with the many oil spills occurring around the world, databases for different oil products as well as their weathered products have been made (Duewer et al., 1975; Anon, 1977; Sporstol & Lichtenthaler, 1984). Such databases may be applied as method development and screening tools, but oil spill source identification should finally rest on comparison of the spilled oil to the suspected source. Confirmation of identity should be made using several chemical and data-analytical methods in combination.

To refine the combined chemical analytical and statistical approach with respect to simplicity of use and uniqueness in classification of weathered crude oils when comparing them to suspected sources, we have developed a databank for the 26 weathered oils with data from the above-mentioned methods. This databank is presently being used to compare several multivariate statistical methods such as KNN in level 2 type pattern recognition problems, establishing 'typical-nearest-neighbour' distances, and more advanced cluster analysis methods such as fuzzy clustering (Gunderson & Thrane, 1985) in combination with principal component analysis. The results so far suggest that identification of spilled oil sources is best based on a multi-chemical and multi-statistical approach requiring direct comparison of the spilled oil to the suspected source.

Proposed approach to oil spill identification

Statistical methods rely on analyses of several parallel samples to obtain measures of representative values and variance. From these parameters the confidence level of classification and identity of sample may be assessed. With data-analytical methods such as principal component analysis or other multivariable methods, all relevant available data are used to classify samples. Including several independent statistical methods and multivariate data analysis will provide the necessary quantitative interpretation possibility for classification in oil-spill identification schemes. We propose that routine oil-spill identification schemes should include several chemical analytical methods and that several independent mathematical and statistical methods should be used to analyse the data available. If databanks for weathered oils are available, the chemical analytical data from the spill oil should be compared to these to narrow the number of suspects and possibly identify special characteristics which might simplify the search for a source. The data-analytical methods should finally be used to compare the multicomponent chemical data of suspected sources to that of the oil spill.
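The database screening step proposed above, narrowing the number of suspects before the final comparison, can be sketched with the Euclidean nearest-neighbour ranking used for Table 4. This is a minimal illustration and not the authors' program. It assumes the variables have already been brought to comparable scales, which the text does not specify, and the oil names and profile values are hypothetical placeholders.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def nearest_sources(spill, database, k=5):
    """Return the k database oils closest to `spill` as (name, distance) pairs."""
    ranked = sorted((dist(spill, profile), name) for name, profile in database.items())
    return [(name, d) for d, name in ranked[:k]]


# Hypothetical three-variable profiles (placeholders, not measurements from the paper).
database = {
    "source_A": [1.0, 1.0, 1.0],
    "source_B": [5.0, 5.0, 5.0],
    "source_C": [1.5, 1.0, 1.0],
    "source_D": [9.0, 9.0, 9.0],
}
spill = [1.1, 1.0, 1.0]
shortlist = nearest_sources(spill, database, k=2)
print(shortlist)
```

A marked jump in distance between consecutive neighbours, as noted for the Nigerian Medium case in Table 4, would suggest where the shortlist of plausible sources ends.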

Conclusion

Classification of artificially weathered crude oils from different geographical regions may be based on simple statistical methods, i.e. univariate analysis. Differentiation and identification of oils from different sources within the same geographical region, e.g. North Sea oils, require the use of advanced multivariate data-analytical methods. Combining chemical analytical data from several methods with four independent statistical methods allows an oil spill identification approach in which artificially weathered crude oils may be uniquely classified. The absolute ranking ('peak-index-fit'), the t-test and the KNN are easy to perform when searching large numbers of possible oil spill sources in a database. They are all based on different mathematical strategies for determining classification or ranking. The different ways of data analysis are confirmatory when used together. The SIMCA method is then the final reference method for identification, allowing a fourth way of data analysis to confirm the others.

We thank the suppliers of oils. F. Oreld, B. Dirdal, K. Vadum, B. Enger and I. von-Heimburg are thanked for chemical analytical help. Comments and suggestions put forward by Otto Grahl-Nielsen at the University of Bergen, Norway, are gratefully acknowledged.

Anderson, C. P., Killeen, T. J., Taft, J. B. & Bentz, A. P. (1980). Improved identification of spilled oils by infrared spectroscopy. Environ. Sci. Technol. 14, 1230-1234.
Anon. (1974). Marine Pollution by Oil. Institute of Petroleum Oil Pollution Analysis Committee. Applied Science, Essex.
Anon. (1977). Oil spill identification system. In US Coast Guard Report Number: CG-D-52-77/task no. 4243.3, June 1977. pp. 207.
Albano, C., Dunn, W. J., Edlund, U., Johansson, E., Norden, B., Sjostrom, M. & Wold, S. (1978). Four levels of pattern recognition. Anal. Chim. Acta Comput. Tech. Optim. 103, 429-443.
Aquino-Neto, F. R., Trendel, J. M., Restle, A., Connan, J. & Albrecht, P. A. (1983). Occurrence and formation of tricyclic and tetracyclic terpanes in sediments and petroleums. In Advances in Organic Geochemistry, 1981 (M. Bjoroy et al., eds). Wiley, New York.
Bentz, A. P. (1976). Oil spill identification. Anal. Chem. 48, 454A-472A.
Bentz, A. P. (1978). Who spilled the oil. Anal. Chem. 50, 655A-658A.
Chien, Y. T. & Killeen, T. J. (1977). Workshop on pattern recognition applied to oil identification. Coronado, California. Nov. 11-12, 1976. IEEE catalog No. 76CH 1247-6C.
Clark, H. A. & Jurs, P. C. (1979). Classification of crude oil gas chromatograms by pattern recognition techniques. Anal. Chem. 51, 616-623.
Duewer, D. L., Kowalski, B. R. & Schatzki, T. F. (1975). Source identification of oil spills by pattern recognition analysis of natural element composition. Anal. Chem. 47, 1573-1583.
Goodwin, N. S., Park, P. J. D. & Rawlinson, A. P. (1983). Crude oil biodegradation under simulated natural conditions. In Advances in Organic Geochemistry, 1981 (M. Bjoroy et al., eds). Wiley, New York.
Grahl-Nielsen, O., Kvalheim, O. & Oygard, K. (1983). SIMCA multivariate data-analysis of blue-mussel components in environmental pollution studies. Anal. Chim. Acta 150, 145-152.
Gunderson, R. W. & Thrane, K. (1985). Monitoring polycyclic aromatic hydrocarbons: An environmental application of fuzzy C-varieties pattern recognition. In Environmental Applications of Chemometrics (J. J. Breen & P. E. Robinson, eds). ACS Symp. Ser. 292. pp. 130-147.
Knutsen, H. & Vogt, N. B. (1985). A supplementary approach to identifying feeding patterns of lobsters. Parts I and II. J. Exp. Mar. Biol. Ecol. 89, 109-119, 121-134.
Kvalheim, O. (1984). Manual for SIMCA program. University of Bergen, Norway. 36 p. (In Norwegian).
Massart, D. L. & Kaufman, L. (1983). The interpretation of analytical chemical data by the use of cluster analysis. Chemical Analysis. Vol. 65. Wiley, New York.
Mattson, J. S., Mattson, C. S., Spencer, M. J. & Starks, S. A. (1977). Multivariate statistical approach to the fingerprinting of oils by infrared spectroscopy. Anal. Chem. 49, 297-302.
Sjostrom, M. & Kowalski, B. (1979). A comparison of five pattern recognition methods based on the classification results from six real data bases. Anal. Chim. Acta 112, 11-30.
Sporstol, S. P. & Lichtenthaler, R. G. (1983). Identification of oil spills. Centre for Industrial Research Report Number: 83 06 06-1. pp. 170.
Thrane, K. (1984). Application of cluster analysis to identify sources of airborne polycyclic aromatic hydrocarbons. Presented at the 77th APCA Annual Meeting and Exhibition, June 24-29, 1984 in San Francisco, USA. Norwegian Air Research Institute (NILU) report NILU F: 20/84.84-16-6.
Vogt, N. B. (1984). Surface microlayer analysis: A multivariate approach. Cand. scient. thesis, Dep. of Organic Chemistry, University of Bergen, Norway. (In English).
Vogt, N. B. & Knutsen, H. (1985). SIMCA pattern recognition classification of five infauna taxonomic groups using non-polar compounds analysed by high resolution gas chromatography. Mar. Ecol. Prog. Ser. 26, 145-156.
Wold, S. & Sjostrom, M. (1977). SIMCA, a method for analysing chemical data in terms of similarity and analogy. In Chemometrics, Theory and Application (B. Kowalski, ed.). American Chem. Soc. Symp. Ser. 52.
Wold, S., Albano, C., Dunn III, W. J., Esbensen, K., Hellberg, S., Johansson, E. & Sjostrom, M. (1983). Multivariate analytical chemical data evaluation using SIMCA and MACUP. In Pattern Recognition in Chemistry. Scientific Symposium, Matrafured. Akademiai Kiado, Budapest.
Wold, S., Albano, C., Dunn, W. J., Edlund, U., Esbensen, K., Geladi, P., Hellberg, S., Johansson, E., Lindberg, W. & Sjostrom, M. (1984). Multivariate data analysis in chemistry. In Proceedings NATO Adv. Study Inst. on Chemometrics, Cosenza, Italy (B. Kowalski, ed.). Reidel, Dordrecht.
Zar, J. H. (1974). Biostatistical Analysis. Prentice Hall, New Jersey.