NOAA NCDC National Climatic Data Center
NOAA Paleoclimatology Program, NCDC Paleoclimatology Branch  

Tree-ring carbon isotope data and drought maps
for the U.S. Southwest

Gridded Interpolation Methodology
To examine spatial variability of any field, it is typical to fit a function of the form:

Y = f (Latitude, Longitude) + error

where Y is the dependent variable (here, the isotope variable). The errors are assumed to be independent and normally distributed with a mean of 0 and a standard deviation s. Traditional statistical methods fit a linear function that minimizes the sum of squared errors, known as linear regression, which has the form:

Y = a*Latitude + b*Longitude + c + error

The model parameters a, b, and c are estimated from the data by minimizing the mean squared error. The theory behind this approach and the procedures for parameter estimation and hypothesis testing are well developed (e.g., Helsel and Hirsch, 1995; Rao and Toutenburg, 1999) and widely used. However, these methods have some drawbacks:

(i) the assumption that the errors and the variables follow a Gaussian distribution, and
(ii) the fitting of a single global relationship (e.g., a linear equation in the case of linear regression) between the variables.

If the linear model is found to be inadequate, higher-order models (quadratic, cubic, etc.) have to be considered, and these can be difficult to fit to short data sets. Also, if the variables are not normally distributed, which is often the case in practice, suitable transformations have to be found to bring them to a normal distribution. All of this can make the process unwieldy, so a more flexible framework is desirable.

Local estimation methods (also known as nonparametric methods) provide an attractive alternative. In these methods, the function f is fitted to a small number of neighbors in the vicinity of the point at which an estimate is required, and the fit is repeated at every estimation point. Thus, instead of a single equation that describes the entire data set, there are several 'local fits', each capturing the local features. This makes it possible to model arbitrary features (linear or nonlinear) that the data exhibit.
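As a rough illustration of the idea of a local fit, the sketch below uses the simplest possible local estimator: the plain average of the K nearest observations (a polynomial of order 0). The data are synthetic, the function name knn_local_mean is hypothetical, and the use of Euclidean distance in latitude/longitude space is an assumption, not something specified here.

```python
import math

def knn_local_mean(points, values, target, k):
    """Estimate the field at `target` as the mean of its k nearest observations."""
    # Rank observations by Euclidean distance in (lat, lon) space (an assumption).
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], target))
    nearest = order[:k]
    return sum(values[i] for i in nearest) / k

# Synthetic (made-up) observations on a 5 x 5 grid with a bowl-shaped, nonlinear field.
points = [(lat, lon) for lat in range(5) for lon in range(5)]
values = [(lat - 2) ** 2 + (lon - 2) ** 2 for lat, lon in points]

# A single global summary cannot follow the shape of the field ...
global_mean = sum(values) / len(values)

# ... whereas local estimates differ from place to place.
local_at_center = knn_local_mean(points, values, (2, 2), k=5)
local_at_corner = knn_local_mean(points, values, (0, 0), k=3)
```

Here the global mean is the same everywhere, while the local estimates are low near the center of the bowl and high near its corner, which is the behavior the local fits are meant to capture.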

There are several approaches for local functional estimation, including kernel-based methods (Bowman and Azzalini, 1997), splines, K-nearest neighbor (K-NN) local polynomials (Rajagopalan and Lall, 1998; Owosina, 1992), and locally weighted polynomials (Loader, 1999). For an overview of nonparametric functional estimation methods and their application to hydrologic problems, see Lall (1995). Of these methods, Locally Weighted Polynomial regression (LWP) is simple and robust, and has been used with good results in a variety of hydrologic and hydroclimate applications: streamflow forecasting on the Truckee and Carson river basins (Grantz et al., 2005), salinity modeling on the upper Colorado river basin (Prairie et al., 2005), forecasting of Thailand summer rainfall (Singhrattna et al., 2005), and spatial interpolation of rainfall in a watershed model (Hwang, 2005).

Locally weighted polynomials, henceforth referred to as LOCFIT, are used in this application because they are computationally efficient, easy to implement, and robust. Furthermore, the 'locfit' library (Loader, 2004) for the statistical software R makes the implementation straightforward. The approach has been used successfully for salinity and flow modeling (Prairie et al., 2005, 2006a), streamflow forecasting (Grantz et al., 2005; Regonda et al., 2006), and other hydrologic applications.

The implementation steps of LOCFIT are as follows:
For any point of interest (i.e., a given latitude and longitude), say, x*
(i) The K = α*N nearest neighbors (K-NN) of x* are identified from the observational data, where N is the number of observations and α, with 0 < α ≤ 1, is the fraction of the observational data used in each local fit (not to be confused with the regression coefficient a above).
(ii) A polynomial of order P is fit to the identified K-NN
(iii) The fitted polynomial is used to estimate the value of the dependent variable, Y(x*), at x*
(iv) The residuals from the polynomial fitted to the K-NN are used to obtain the standard error variance ($s_{le}^2$) of the estimate (Loader, 1999, pages 29-30).
(v) Repeat (i) through (iv) for all points of interest.

The polynomial coefficients are estimated by minimizing the weighted mean squared errors - as opposed to the mean squared errors in the traditional linear regression. The K-nearest neighbors are weighted based on their proximity to x* with highest weights to the nearest neighbors and zero weights to the farthest. Any weight function can be used to provide the weights and the approach is insensitive to the choice of the weight function. Notice that if K is set to N (i.e., all the available observation data), if P is set to 1, and if all the neighbors are given equal weights, this approach collapses to the traditional linear regression. Thus, the local polynomial approach offers a general framework with the traditional linear regression model being a subset.
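The steps and weighting described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the 'locfit' library and not necessarily the exact procedure used for the maps: the polynomial order is fixed at P = 1, the tricube weight function is one common choice among many, distances are Euclidean in latitude/longitude, and the names solve, tricube, local_linear, alpha, and equal_weights are all hypothetical.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def tricube(u):
    """Assumed weight function: 1 at distance 0, falling to 0 at scaled distance 1."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def local_linear(points, values, target, alpha, equal_weights=False):
    """Weighted first-order (P = 1) fit around `target`; returns the estimate there."""
    n = len(points)
    k = min(n, max(4, math.ceil(alpha * n)))       # step (i): K = alpha * N neighbors
    dists = [math.dist(p, target) for p in points]
    nearest = sorted(range(n), key=lambda i: dists[i])[:k]
    h = dists[nearest[-1]]                         # distance to the farthest neighbor
    A = [[0.0] * 3 for _ in range(3)]              # weighted normal equations
    b = [0.0] * 3
    for i in nearest:
        w = 1.0 if equal_weights else tricube(dists[i] / h)
        row = (points[i][0], points[i][1], 1.0)    # terms of a*lat + b*lon + c
        for r in range(3):
            b[r] += w * row[r] * values[i]
            for c in range(3):
                A[r][c] += w * row[r] * row[c]
    coef_a, coef_b, coef_c = solve(A, b)           # step (ii): weighted least squares
    return coef_a * target[0] + coef_b * target[1] + coef_c   # step (iii)

# Synthetic (made-up) data: a curved "bowl" field and an exactly planar field.
points = [(float(lat), float(lon)) for lat in range(5) for lon in range(5)]
bowl = [(lat - 2) ** 2 + (lon - 2) ** 2 for lat, lon in points]
plane = [2 * lat - 3 * lon + 1 for lat, lon in points]

local_est = local_linear(points, bowl, (2.0, 2.0), alpha=0.5)
# K = N with equal weights is the ordinary global linear regression.
global_est = local_linear(points, bowl, (2.0, 2.0), alpha=1.0, equal_weights=True)
plane_est = local_linear(points, plane, (1.5, 2.5), alpha=0.5)
```

On the bowl-shaped field, whose true value at the center is 0, the local estimate is closer to the truth than the global plane fit. On the planar field, the local fit reproduces the plane, as expected for a first-order polynomial.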

The two parameters of the approach, K and P, have to be identified for a given set of observations. They are selected using the generalized cross validation (GCV) function: the combination of K and P that minimizes the GCV function is chosen as the best set of parameters for LOCFIT. The GCV function is defined as:

$$\mathrm{GCV}(K, P) = \frac{\frac{1}{N}\sum_{i=1}^{N} e_i^2}{\left(1 - \frac{m}{N}\right)^2}$$

where e_i is the model residual (i.e., the difference between the observed and fitted values) at the i-th data point, N is the number of data points, and m is the number of degrees of freedom of the fitted polynomial (Loader, 1999, page 31).
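The GCV function is straightforward to evaluate once the residuals and the degrees of freedom of a fit are known. The sketch below is illustrative only: the residuals and m values are made-up numbers for two hypothetical (K, P) candidates, and in practice they would come from the local fits themselves.

```python
def gcv(residuals, m):
    """GCV = (sum(e_i^2) / N) / (1 - m/N)^2 for residuals e_i and m degrees of freedom."""
    n = len(residuals)
    mse = sum(e * e for e in residuals) / n
    return mse / (1.0 - m / n) ** 2

# Hypothetical candidates: (K, P) -> (residuals, degrees of freedom m). Made-up values.
candidates = {
    (10, 1): ([0.5, -0.5] * 5, 3.0),   # smoother fit: larger residuals, fewer d.o.f.
    (7, 2): ([0.4, -0.4] * 5, 6.0),    # rougher fit: smaller residuals, more d.o.f.
}

scores = {kp: gcv(res, m) for kp, (res, m) in candidates.items()}
best = min(scores, key=scores.get)     # the (K, P) pair with the smallest GCV
```

In this made-up case the candidate with the smaller residuals is not selected, because the denominator penalizes its larger number of degrees of freedom.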

The best model parameters are then used to estimate the value of the dependent variable at any desired point.

Bowman, A. and A. Azzalini (1997)
Applied smoothing techniques for data analysis.
Oxford, UK.

Grantz, K., B. Rajagopalan, M. Clark, and E. Zagona (2005)
A technique for incorporating large-scale climate information in basin-scale ensemble streamflow forecasts
Water Resour. Res., 41, W10410, doi:10.1029/2004WR003467.

Helsel, D. R., and R. M. Hirsch (1995)
Statistical Methods in Water Resources
Elsevier Science, Amsterdam.

Hwang, Y. (2005)
Impact of input uncertainty in ensemble streamflow generation
Ph.D., thesis, Univ. of Colorado at Boulder, Colorado.

Lall, U. (1995)
Recent advances in nonparametric function estimation: Hydrologic applications
Rev. Geophys., 33, 1093-1102.

Loader, C. (1999)
Local Regression and Likelihood
Springer, New York.

Owosina, A. (1992)
Methods for assessing the space and time variability of groundwater data
M.S. Thesis, Utah State University, Logan, Utah.

Prairie, J., B. Rajagopalan, T. Fulp, and E. Zagona (2005)
Statistical nonparametric model for natural salt estimation
J. Environ. Eng., 131, 130-138.

Prairie, J., B. Rajagopalan, T. Fulp, and E. Zagona (2006a)
Modified K-NN model for stochastic streamflow simulation
Journal of Hydrologic Engineering, 11(4), 371-378.

Rajagopalan, B., and U. Lall (1998)
Nearest Neighbour Local Polynomial Estimation of Spatial Surfaces
Spatial Interpolation Comparison Contest
Journal of Geographic Information and Decision Analysis, 2(2), 48-57.

Rao, C. R., and H. Toutenburg (1999)
Linear models: least squares and alternatives,
Springer, New York.

Singhrattna, N., B. Rajagopalan, M. Clark and K. Krishna Kumar (2005)
Forecasting Thailand Summer Monsoon Rainfall
Int. J. Climatology, 25, 649-664.
Downloaded Monday, 16-Jan-2017 12:16:47 EST
Last Updated Wednesday, 20-Aug-2008 11:24:07 EDT by
Please see the Paleoclimatology Contact Page or the NCDC Contact Page if you have questions or comments.