Most of the chemical tools we use to study fossils produce a data spectrum, which is just a set of measurements made over a series of observations. For example, think of the output from an x-ray diffraction analysis – the pattern of peak counts for each two-theta value is a spectrum. Other techniques that produce spectra include x-ray fluorescence, isotope ratio mass spectrometry, Fourier transform infrared spectroscopy and so on. We have become adept at reducing these data spectra to a single value or a few values of interest – calcite has an intense peak at a two-theta value of 29.4 degrees under irradiation from a copper source – and we tend to throw away the rest of the spectrum. While there are study questions that can be addressed with point values (“Is there calcite in my fossil clam shell?”), there are other questions that are better answered by working with the whole spectrum. Is there more than just calcite in my fossil shell? Does the calcite have an unusual chemistry? Is my fossil shell different to your fossil shell? These are questions that we can address by looking at variation between spectra, which we can do with principal component analysis.
I am going to switch study systems and talk about fossil bones and x-ray fluorescence. I will work this up as a practical tutorial of sorts, and talk about some data from South African fossils that I collected a couple of years ago. There are two advantages in working with these data. First, this is a real-world scenario with a tangible result. Second, the spectral data are available online, so you can perform your own principal component analyses on fossil data.
Principal component analysis (PCA) on spectra from fossil bones.
Premise: Stable isotopes in fossils are an incredibly rich source of information about ancient animals. Isotopes in fossil teeth and bones can tell us what an animal ate and how frequently it visited a source of water. Unfortunately, teeth and bones can become altered when they are buried, and the important biological information in the isotopic compositions can be lost. It would be nice to know ahead of time whether or not a fossil bone is altered, and one approach to assessing alteration is to study the burial history of the fossil bone or tooth. In this scenario we are going to examine the burial history of a fossil by studying the elemental composition of the fossil surface – changes in the environment over time will have changed the chemistry of the fossil. We will study fossils from two different Pleistocene locations in the Western Cape of South Africa, and the end goal is to decide which location has the best-preserved fossils.
Samples: Fossil teeth and bones from two localities were analysed. These bones were the horn cores of Pleistocene antelope – springbok (Antidorcas marsupialis), eland (Taurotragus oryx) and a relative of sable (Hippotragus sp.).
Data collection: Fossil teeth and horn cores from each site were analysed with a portable x-ray fluorescence instrument. We will use the energy spectra that the instrument produced, rather than the elemental ratios.
Swartklip fossil location, outside of Cape Town, South Africa
Elandsfontein fossil location, outside of Cape Town, South Africa
Results and Discussion: A principal component analysis (PCA) identifies the sources of variation in a dataset, which it sees as individual ‘components’. The largest source of variation is principal component one, the second largest source of variation is principal component two, and so on. By separating out sources of variation, PCA provides us with two very important sets of data about the samples we have analysed.
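To make the idea of ordered components concrete, here is a minimal sketch in Python with NumPy. It is not the analysis behind the results in this post: the spectra are synthetic, the energy axis and the two peak positions are made up for illustration, and the function name pca is my own. It assumes one spectrum per row and one energy channel per column, centres each channel, and uses a singular value decomposition to pull out the components in order of the variance they explain.

```python
import numpy as np


def pca(spectra, n_components=2):
    """PCA via SVD. Rows are samples (spectra), columns are channels."""
    X = np.asarray(spectra, dtype=float)
    centered = X - X.mean(axis=0)          # remove the mean spectrum
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (X.shape[0] - 1)  # variance along each component
    explained = variances / variances.sum()
    scores = U[:, :n_components] * s[:n_components]  # one value per sample per PC
    loadings = Vt[:n_components]                     # one value per channel per PC
    return scores, loadings, explained


# Synthetic example: 12 spectra built from two hypothetical peaks.
rng = np.random.default_rng(0)
energy = np.linspace(0, 40, 200)                           # made-up energy axis
peak_a = np.exp(-0.5 * ((energy - 6.4) / 0.3) ** 2)        # made-up peak
peak_b = np.exp(-0.5 * ((energy - 14.2) / 0.3) ** 2)       # made-up peak
amount_a = rng.uniform(0, 10, size=12)                     # varies a lot
amount_b = rng.uniform(0, 2, size=12)                      # varies a little
spectra = np.outer(amount_a, peak_a) + np.outer(amount_b, peak_b)
spectra += rng.normal(0, 0.01, size=spectra.shape)         # small noise

scores, loadings, explained = pca(spectra)
print("fraction of variance explained by PC1, PC2:", explained[:2])
print("channel with largest PC1 loading (energy):",
      energy[np.argmax(np.abs(loadings[0]))])
```

Because the amount of the first peak varies far more between samples than the second, principal component one should be dominated by the first peak, and the explained-variance fractions come out sorted from largest to smallest.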
The first set of data are the ‘score values’. (Scores are not the same thing as eigenvalues: an eigenvalue measures how much variance a component explains, whereas a score is where an individual sample falls along that component.) The spectrum from each sample is reduced to a single ‘score’ value for each principal component. Samples will have similar score values on a component when their spectra are similar in whatever that component describes. So, we can look at a distribution of score values and see which samples are more similar to which other samples by how closely positioned they are. Let’s take this image here:
These are the score values from principal component one and principal component two, for our fossil XRF data. The first thing to notice is that all of the Swartklip samples have similar PC1 values. That is, the fossils from the two sites can be separated by the variation that is being pulled out by principal component one. The fossils from both sites have a range of PC2 values, so there is variation at both sites in whatever this component is. So, what is the source of variation along principal component one that is separating these fossil sites?
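The same pattern can be reproduced with a small synthetic example. This is only a sketch in Python with NumPy, under stated assumptions: the two "sites" (A and B), the energy axis, and the peak shapes are all invented and are not the Swartklip or Elandsfontein data. Site A is given a small, tightly clustered amount of one peak, site B a larger amount, and both sites are given the same range of a second peak.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical energy axis and two made-up peak shapes (not real XRF lines).
energy = np.linspace(0, 40, 200)
peak_1 = np.exp(-0.5 * ((energy - 10.0) / 0.3) ** 2)
peak_2 = np.exp(-0.5 * ((energy - 25.0) / 0.3) ** 2)

# Ten synthetic samples per site. Peak 1 differs between sites;
# peak 2 varies over the same range at both sites.
n = 10
site = np.array(["A"] * n + ["B"] * n)
amount_1 = np.concatenate([rng.normal(2.0, 0.2, n), rng.normal(8.0, 1.0, n)])
amount_2 = rng.uniform(0.0, 2.0, 2 * n)
spectra = np.outer(amount_1, peak_1) + np.outer(amount_2, peak_2)
spectra += rng.normal(0.0, 0.01, spectra.shape)

# PCA via SVD of the mean-centred spectra; scores are sample coordinates.
centered = spectra - spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U[:, :2] * s[:2]

# Do the PC1 scores of the two sites occupy non-overlapping ranges?
# (The sign of a component is arbitrary, so check both orderings.)
pc1_a = scores[site == "A", 0]
pc1_b = scores[site == "B", 0]
pc1_separates_sites = bool(pc1_a.max() < pc1_b.min() or pc1_b.max() < pc1_a.min())

for label in ("A", "B"):
    pc1 = scores[site == label, 0]
    pc2 = scores[site == label, 1]
    print(f"site {label}: PC1 {pc1.min():.2f} to {pc1.max():.2f}, "
          f"PC2 {pc2.min():.2f} to {pc2.max():.2f}")
print("PC1 separates the sites:", pc1_separates_sites)
```

In this toy case the site difference was built into the first peak, so it is no surprise that PC1 picks it up. With real spectra the source of the separation is not known in advance, which is why the question above needs answering.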
More soon, including the R code for pre-treating spectral data, performing PCA, and presenting PCA results…