Multiproxy Calibration

While studies have demonstrated that well-chosen regional paleoclimatic reconstructions can act as surprisingly representative surrogates for large-scale climate, multiproxy networks nonetheless appear to provide the greatest opportunity for large-scale paleoclimate reconstruction and climate signal detection. There is a rich tradition of multivariate statistical calibration approaches to paleoclimate reconstruction, particularly in the field of dendroclimatology, where the relative strengths and weaknesses of various approaches to multivariate calibration have been well studied. Such approaches have been applied to regional dendroclimatic networks to reconstruct regional patterns of temperature and atmospheric circulation or specific climate phenomena such as the Southern Oscillation. Largely because of the inhomogeneity of the information represented by different types of indicators in a true 'multiproxy' network, we found conventional approaches (e.g., Canonical Correlation Analysis or 'CCA' of the proxy and instrumental datasets) to be relatively ineffective. Our approach to climate pattern reconstruction relates closely to statistical approaches which have recently been applied to the problem of filling in sparse early instrumental climate fields, based on calibration of the sparse sub-networks against the more widespread patterns of variability that can be resolved in shorter datasets. We first decompose the 20th century instrumental data into its dominant patterns of variability, and subsequently calibrate the individual climate proxy indicators against the time histories of these distinct patterns during their mutual interval of overlap. One can think of the instrumental patterns as 'training' templates against which we calibrate or 'train' the much longer proxy data (i.e., the 'trainee' data) during the shorter calibration period over which they overlap. This calibration allows us to subsequently solve an 'inverse problem' whereby best estimates of surface temperature patterns are deduced back in time before the calibration period, from the multiproxy network alone.

Implicit in our approach are at least three fundamental assumptions. We assume (1) that the indicators in our multiproxy trainee network are linearly related to one or more of the instrumental training patterns. In the relatively unlikely event that a proxy indicator represents a truly local climatic phenomenon which is uncorrelated with larger-scale climatic variations, or represents a highly non-linear response to climatic variations, this assumption will not be satisfied. We further assume (2) that a relatively sparse but widely distributed sampling of long proxy and instrumental records may nonetheless sample most of the relatively small number of degrees of freedom in climatic patterns at interannual and longer timescales. Regions not directly represented in the trainee network may nonetheless be indirectly represented through teleconnections with regions that are. The El Niño-Southern Oscillation (ENSO), for example, exhibits global-scale patterns of climatic influence, and is an example of a prominent pattern of variability which, if captured, can potentially describe variability in regions not directly sampled by the trainee data. Finally, we assume (3) that the patterns of variability captured by the multiproxy network have analogues in the patterns we resolve in the shorter instrumental data. This latter assumption represents a fairly weak 'stationarity' requirement--we don't require that the climate itself be stationary. In fact, we expect that some sizeable trends in the climate may be resolved by our reconstructions. We do, however, assume that the fundamental spatial patterns of variation which the climate has exhibited during the past century are similar to those by which it has varied during recent past centuries. Studies of instrumental surface temperature patterns suggest that such a form of stationarity holds up at least on multidecadal timescales, during the past century. The statistical cross-validation exercises we describe later provide the best evidence that these key underlying assumptions hold.

We isolate the dominant patterns of the instrumental surface temperature data through principal component analysis (PCA). PCA provides a natural smoothing of the temperature field in terms of a small number of dominant patterns of variability or 'empirical eigenvectors'. Each of these eigenvectors is associated with a characteristic spatial pattern or 'Empirical Orthogonal Function' (EOF) and its characteristic evolution in time or 'Principal Component' (PC). The ranking of the eigenvectors orders the fraction of variance they describe in the (standardized) multivariate data during the calibration period. The first 5 of these eigenvectors describe a fraction of 0.93 (i.e., 93%) of the global-mean ('GLB') temperature variations, 85% of the northern hemisphere-mean ('NH') variations, 67% of the NINO3 index, and 76% of the non trend-related ('DETR') NH variance (see 'Methods--statistics' section for a description of the statistic used here as a measure of resolved variance). A sizeable fraction of the total multivariate spatiotemporal variance ('MULT') in the raw (instrumental) data (27%) is described by these 5 eigenvectors, or about 30% of the standardized variance (1=12%, 2=6.5%, 3=5%, 4=4%, 5=3.5%). Figure 2 shows the EOFs of the first 5 eigenvectors. The associated time histories of these patterns (PCs) and their reconstructed counterparts ('RPC's) are discussed in the next section. The first eigenvector, associated with the significant global warming trend of the past century, describes much of the variability in the global (GLB=88%) and hemispheric (NH=73%) means. Subsequent eigenvectors, in contrast, describe much of the spatial variability relative to the large-scale means (i.e., much of the remaining multivariate variance 'MULT'). The second eigenvector is the dominant ENSO-related component, describing 41% of the variance in the NINO3 index. This eigenvector exhibits a modest negative trend which, in the eastern tropical Pacific, describes a 'La Niña'-like cooling trend, which opposes warming in the same region associated with the global warming pattern of the first eigenvector. The third eigenvector is associated largely with interannual-to-decadal scale variability in the Atlantic basin and carries the well-known temperature signature of the North Atlantic Oscillation (NAO) and decadal tropical Atlantic dipole. The fourth eigenvector describes a primarily multidecadal timescale variation with ENSO-scale and tropical/subtropical Atlantic features, while the fifth eigenvector is dominated by multidecadal variability in the entire Atlantic basin and neighboring regions that has been widely noted elsewhere.
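As a rough illustration of the decomposition described above, the following Python sketch (using NumPy) standardizes a synthetic temperature field and extracts its leading EOFs and PCs through a singular value decomposition. The function name `pca_decompose`, the synthetic data, and the omission of any area weighting or missing-data handling are assumptions made for illustration; they are not details taken from the study.

```python
import numpy as np

def pca_decompose(field, n_keep=5):
    """Leading EOFs/PCs of a (n_years, n_gridpoints) field.

    Hypothetical helper: standardizes each gridpoint series, then
    uses an SVD to obtain the empirical eigenvectors.
    """
    # Standardize each gridpoint series (zero mean, unit variance).
    anomalies = field - field.mean(axis=0)
    standardized = anomalies / anomalies.std(axis=0)
    # Columns of u carry the time histories, rows of vt the spatial patterns.
    u, s, vt = np.linalg.svd(standardized, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)   # fraction of standardized variance
    pcs = u[:, :n_keep] * s[:n_keep]  # (n_years, n_keep)
    eofs = vt[:n_keep]                # (n_keep, n_gridpoints)
    return pcs, eofs, var_frac[:n_keep]

# Synthetic example: 79 "years" (the length of 1902-1980) on an arbitrary grid.
rng = np.random.default_rng(0)
n_years, n_grid = 79, 200
trend = np.linspace(-1.0, 1.0, n_years)[:, None] * rng.normal(size=(1, n_grid))
field = trend + 0.5 * rng.normal(size=(n_years, n_grid))

pcs, eofs, var_frac = pca_decompose(field)
filtered = pcs @ eofs  # standardized field smoothed by the retained eigenvectors
```

Here `var_frac` plays the role of the per-eigenvector fractions of standardized variance quoted in the text, and `filtered` corresponds to the field "filtered" by the retained eigenvectors.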

We calibrate each of the indicators in the multiproxy data network against these empirical eigenvectors at annual mean resolution during the 1902-1980 training interval. While the seasonality of variability is potentially important--many extratropical proxy indicators, for example, reflect primarily warm-season variability--we seek in the present study to resolve only annual mean conditions, exploiting the seasonal climatic persistence, and the fact that the mutual information from data reflecting various seasonal windows should provide complementary information regarding annual mean climatic conditions. Following this calibration, we apply an overdetermined optimization procedure to determine the best combination of eigenvectors represented by the multiproxy network back in time on a year-by-year basis, with a spatial coverage dictated only by the spatial extent of the instrumental training data. From the reconstructed PCs or 'RPC's, spatial patterns and all relevant averages or indices can be readily determined. The details of the entire statistical approach are described in the 'Methods--calibration' section.
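The two steps just described, calibration of each proxy against the PCs followed by a year-by-year overdetermined inversion, might be sketched as below. This is a minimal illustration under stated assumptions: ordinary least squares is used for both steps, the proxies are synthetic, and the names `calibrate` and `reconstruct_pcs` are hypothetical. The actual procedure is the one given in the 'Methods--calibration' section and may differ in its details.

```python
import numpy as np

def calibrate(proxies_cal, pcs_cal):
    """Least-squares loadings of each proxy on the retained PCs.

    proxies_cal: (n_cal_years, n_proxies); pcs_cal: (n_cal_years, n_pcs).
    Returns G of shape (n_proxies, n_pcs) with proxies ~ pcs @ G.T.
    """
    coeffs, *_ = np.linalg.lstsq(pcs_cal, proxies_cal, rcond=None)
    return coeffs.T

def reconstruct_pcs(proxies, G):
    """Solve G @ rpc = proxy vector for every year in the least-squares sense.

    The system is overdetermined whenever n_proxies > n_pcs.
    """
    rpcs, *_ = np.linalg.lstsq(G, proxies.T, rcond=None)
    return rpcs.T  # (n_years, n_pcs)

# Synthetic demonstration (all numbers are made up for illustration).
rng = np.random.default_rng(1)
n_years, n_cal, n_pcs, n_proxies = 200, 79, 3, 20
true_pcs = rng.normal(size=(n_years, n_pcs))
G_true = rng.normal(size=(n_proxies, n_pcs))
proxies = true_pcs @ G_true.T + 0.1 * rng.normal(size=(n_years, n_proxies))

# Calibrate on the final 79 "years", then reconstruct the earlier ones.
G = calibrate(proxies[-n_cal:], true_pcs[-n_cal:])
rpcs = reconstruct_pcs(proxies[:-n_cal], G)
```

Given EOFs from a prior decomposition, a reconstructed spatial field would follow as the product of `rpcs` with those EOFs, from which hemispheric means or indices can be averaged.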

The skill of the temperature reconstructions (i.e., their statistical validity) back in time is established through a variety of complementary independent cross-validation or 'verification' exercises (see 'Methods--verification' below). We summarize the key results of these experiments below (a Table detailing the quantitative results of the calibration and verification procedures is provided in Nature's on-line supplementary information):

(1) In the reconstructions from 1820 onward based on the full multiproxy network of 112 indicators, 11 eigenvectors are skillfully resolved (# 1-5,7,9,11,14-16) describing ~70-80% of the variance in NH and GLB mean series in both calibration and verification (verification is based here on the independent 1854-1901 dataset which was withheld, as described in the 'Methods--verification' section). Figure 3 shows the spatial patterns of the calibration and verification resolved variance and of the squared correlation statistic r^2, demonstrating highly significant reconstructive skill over widespread regions of the reconstructed spatial domain. 30% of the full spatiotemporal variance in the gridded dataset is captured in calibration, and 22% of the variance is verified in cross-validation. Some of the degradation in the verification score relative to the calibration score may reflect the decrease in instrumental data quality in many regions before the 20th century rather than a true decrease in resolved variance. These scores thus compare favorably to the 40% total spatiotemporal variance that is described by simply filtering the raw 1902-1980 instrumental data with the 11 eigenvectors used in calibration, suggesting that the multiproxy calibrations are describing a level of variance in the data reasonably close to the optimal 'target' value. While a verification NINO3 index is not available from 1854-1901, correlation of the reconstructed NINO3 index with the available Southern Oscillation index (SOI) data from 1865-1901 of r=-0.38 (r^2=0.14) compares reasonably with its target value given by the correlation between the actual instrumental NINO3 and SOI index from 1902-1980 (r=-0.72). Furthermore, the correspondence between the reconstructed NINO3 index warm events and Quinn and Neal's historical El Niño chronology back to 1820 (see 'Methods--verification') is significant at the 98% level.

(2) The calibrations back to 1760, based on 93 indicators, continue to resolve at least 9 eigenvectors (1-5,7,9,11,15) with no degradation of calibration or verification resolved variance in NH, and only slight degradation in MULT (calibration ~27%, verification ~17%). Our reconstructions are thus largely indistinguishable in skill back to 1760.

(3) The network of 74 indicators available back to 1700 (including only 2 instrumental or historical indicators) skillfully resolves 5 eigenvectors (1,2,5,11,15) and shows some significant signs of decrease in reconstructive skill. In this case, about 60-70% of NH variance is resolved in calibration and verification, and about 14-18% of MULT in calibration, and 10-12% of MULT in verification. The verification correlation of NINO3 with the SOI is in the range of approximately r=-0.25 to -0.35, which is statistically significant (as is the correspondence with the Quinn historical chronology back to 1700) but notably inferior to the later calibrations. In short, both spatial patterns and large-scale means are skillfully resolved, but with significantly less resolved variance than in later calibrations.

(4) The network of 57 indicators back to 1600 (including 1 historical record) skillfully resolves 4 eigenvectors (1,2,11,15). 67% of NH is resolved in calibration, and 53% in verification. 14% of MULT is resolved in calibration, and 12% of MULT in verification. A significant but modest level of NINO3-scale variability is resolved in the calibrations.

(5) The network of 24 proxy indicators back to 1450 resolves 2 eigenvectors (1,2) and ~40-50% of NH in calibration and verification. Only ~10% of MULT is resolved in calibration and ~5% in verification. There is no skillful reconstruction of ENSO-scale variability. Thus spatial reconstructions are of marginal usefulness this far back, though the largest-scale quantities are still skillfully resolved.

(6) The multiproxy network of 22 indicators available back to 1400 resolves only the first eigenvector, associated with 40-50% of resolved variance in NH in calibration and verification. There is no useful resolution of spatial patterns of variability this far back. The sparser networks available before 1400 show little evidence of skill in reconstructing even the first eigenvector, terminating useful reconstruction at the initial year 1400 AD.

(7) Experiments using trainee networks containing only proxy (i.e., no instrumental or historical) indicators establish the most truly independent cross-validation of the reconstruction, since there is in this case neither spatial nor temporal dependence between the calibration and verification datasets. Such statistically significant verification is demonstrated at the gridpoint level (calibration and verification resolved variance ~15% for the 'MULT' statistic), at the largest scales (calibration and verification resolved variance ~60-65% for NH) and the ENSO-scale (90-95% statistical significance for all verification diagnostics). In contrast, networks containing only the 24 long historical or instrumental records available back to 1820 resolve only ~30% of NH in calibration or verification, and the modest multivariate calibration and verification resolved variance scores of MULT (~10%) are artificially inflated by the high degree of spatial correlation between the instrumental 'multiproxy' predictor and instrumental predictand data. No skillful ENSO-scale reconstruction is evident in these latter reconstructions. In short, the inclusion of the proxy data in the 'multiproxy' network is essential for the most skillful reconstructions. Certain sub-components of the proxy dataset, however (e.g., the dendroclimatic indicators), appear to be especially important in resolving the large-scale temperature patterns, with notable decreases in the NH scores obtained if all dendroclimatic indicators are withheld from the multiproxy network. On the other hand, the long-term trend in NH is relatively robust to the inclusion of dendroclimatic indicators in the network, suggesting that potential tree growth trend biases are not influential in multiproxy pattern reconstructions. The network of all combined proxy and long instrumental/historical indicators provides the greatest cross-validated estimates of skillful reconstruction, and is used in obtaining the reconstructions described below.
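The calibration and verification scores summarized in items (1)-(7) can be illustrated with a short sketch. It assumes, purely for illustration, that resolved variance is measured as one minus the ratio of summed squared errors to the summed squares of the reference series; the statistic actually used is defined in the 'Methods--statistics' section and is not reproduced here. The function names and the synthetic "reconstruction" are hypothetical.

```python
import numpy as np

def resolved_variance(reference, estimate):
    """Assumed form: 1 - sum((ref - est)^2) / sum(ref^2)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return 1.0 - np.sum((reference - estimate) ** 2) / np.sum(reference ** 2)

def squared_correlation(reference, estimate):
    """Squared Pearson correlation between reference and estimate."""
    r = np.corrcoef(reference, estimate)[0, 1]
    return r ** 2

# Made-up reference series and reconstruction spanning 1854-1980.
rng = np.random.default_rng(2)
years = np.arange(1854, 1981)
truth = rng.normal(size=years.size)
recon = 0.8 * truth + 0.3 * rng.normal(size=years.size)

cal = years >= 1902  # calibration interval, 1902-1980
ver = ~cal           # withheld verification interval, 1854-1901

scores = {
    name: (resolved_variance(truth[mask], recon[mask]),
           squared_correlation(truth[mask], recon[mask]))
    for name, mask in (("calibration", cal), ("verification", ver))
}
```

Under this assumed definition, a perfect reconstruction scores 1 and a reconstruction of all zeros scores 0, so the statistic penalizes errors in both mean and amplitude, whereas the squared correlation alone does not.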

Revised:  April 28, 1998