
A Brief Study on Selection of the Optimum
Grid Box Size For Analysis of USHCN Data



Introduction

    The USHCN data sets have been widely distributed throughout the research community for use in climate change studies since their development in the late 1980s. Recently these data sets have been used by the Climate Monitoring Branch (CMB) at NCDC in producing monthly reports on the current state of the climate in the United States. Analyses are performed using the Climate Analysis System (CAS), a software package that provides numerous options for the analysis and visualization of climate anomalies. In creating USHCN time series for the analysis of U.S. mean temperatures, two widely accepted grid box sizes (5° x 5° and 2.5° x 2.5°) have been used. However, their use was based on common practice. Until this time, no effort had been made to determine an optimum grid box size for analysis of USHCN temperature data given the spatial distribution of the USHCN network of stations.

    For more information on the USHCN data sets refer to the USHCN page.

Objectives of this Study

    There were three primary objectives for determining the most appropriate grid box size.

    (1) Select a grid box size small enough to ensure optimum weighting of all parts of the domain.

      The availability of station data is typically not sufficient to ensure an even distribution of stations throughout a network. But by averaging station anomalies within regions of similar size (grid boxes) and then calculating the average of all the grid box averages, a more representative region-wide anomaly can be calculated. This makes grid box averaging superior to simply taking the average of all stations in the domain. A network of 1000 stations could theoretically have 700 stations in the eastern half of the domain and 300 stations in the western half. A simple average of the stations could easily bias the domain-wide average toward the stations in the east. In the same way, grid box averages taken within large grid boxes increase the chance of a bias toward the part of the grid box that has the denser station network. A simple example will help illustrate this point. Suppose one 5° x 5° grid box contained 50 stations, with 35 stations in the eastern side of the box and 15 in the western side. If during one month the stations in the eastern side were 0.1°C anomalously warm while those in the western side were 0.1°C anomalously cold, the grid box average would be 0.04°C above average. But if the box were split into two 5° x 2.5° grid boxes, the average of the two grid boxes would be a more representative 0°C anomaly.
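
      The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the function names are invented here and are not part of CAS, and the station values are the hypothetical ones from the paragraph above.

```python
from statistics import mean

def simple_average(anomalies):
    """Average all station anomalies with equal weight."""
    return mean(anomalies)

def grid_box_average(boxes):
    """Average the stations inside each box, then average the box means."""
    return mean([mean(box) for box in boxes])

# Example from the text: 35 eastern stations at +0.1 °C, 15 western at -0.1 °C.
east = [0.1] * 35
west = [-0.1] * 15

one_box = simple_average(east + west)        # one 5° x 5° box
two_boxes = grid_box_average([east, west])   # two 5° x 2.5° boxes

print(f"one box:   {one_box:+.2f} °C")
print(f"two boxes: {two_boxes:+.2f} °C")
```

      Splitting the box gives each half equal weight regardless of how many stations it holds, which is why the eastern bias disappears.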

    (2) Ensure that no grid box in the interior of the domain has fewer than two stations, and preferably that each has at least three, while also minimizing the total number of grid boxes in the domain that have only one or two stations.

      Areas without a dense station network may have grid boxes containing only one or two stations if the grid box size is too small. In these grid boxes, data from one station is weighted more heavily, which increases the potential for introducing unwanted biases in grid box averages and the subsequent calculation of the domain-wide average. For grid boxes containing three or more stations, a bad value for a single station will have less impact on the grid box average, which would in turn less severely impact the average for the domain.
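
      A small sketch can show how the impact of one bad value shrinks as the station count in a box grows. The error size (3.0°C), the ten-box domain, and the five stations assumed in every other box are arbitrary choices made for illustration, not values from this study.

```python
from statistics import mean

def domain_average(boxes):
    """Average of the grid box averages."""
    return mean([mean(box) for box in boxes])

def shift_from_bad_value(n_stations, error, n_boxes=10):
    """Change in the box and domain averages when one station is off by `error`.

    Assumes a hypothetical domain of `n_boxes` boxes in which all true
    anomalies are zero and every other box holds five stations.
    """
    clean = [[0.0] * n_stations] + [[0.0] * 5 for _ in range(n_boxes - 1)]
    bad = [box[:] for box in clean]
    bad[0][0] += error                      # corrupt a single station
    box_shift = mean(bad[0]) - mean(clean[0])
    domain_shift = domain_average(bad) - domain_average(clean)
    return box_shift, domain_shift

for n in (1, 2, 3, 6):
    box_shift, domain_shift = shift_from_bad_value(n, error=3.0)
    print(f"{n} station(s): box {box_shift:.2f}, domain {domain_shift:.3f}")
```

      In this setup the shift in the box average is the error divided by the number of stations in the box, and the shift in the domain average is smaller again by the number of boxes.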

    (3) Ensure the within-grid-box variability is reduced to a minimum while still achieving objectives 1 and 2.

      The temperature difference between each station in a grid box and the grid box average temperature increases as the size of the grid box increases. As the distance between stations increases, changes in latitude, elevation, and proximity to bodies of water create greater variations in climate. We calculated the average temperature difference between each station and its grid box average, as well as the standard deviation of the differences, to help us select the optimum grid box size. This helped us determine whether, on average, increased latitudinal distances would produce larger means and standard deviations than the same increase in the East/West direction, in which case reducing the North/South size of the grid box might be more beneficial than reducing the East/West size.

Assessment for Objectives 1 and 2

    For this analysis, we used the Climate Analysis System (CAS), a software package developed at NCDC that provides numerous analysis and visualization options for climate data. The CAS provided calculations of the number of stations in each grid box of the USHCN domain using a matrix of grid box sizes. Starting with a grid box size of 2° x 2° and increasing the size by 0.5 degrees in one direction at a time, CAS provided files containing the number of stations in each grid box as well as maps of the spatial distribution of all station counts. We performed a visual inspection of each map and also created frequency plots of the station counts to determine which grid box sizes best met objectives 1 and 2.
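
    The station-counting step can be sketched as follows. This is not the CAS implementation, whose internals are not described here. The grid origin (`lat0`, `lon0`), the function names, and the five station coordinates are all assumptions invented for illustration, not USHCN metadata.

```python
import math
from collections import Counter

def count_stations(stations, lat_size, lon_size, lat0=24.0, lon0=-125.0):
    """Return {(row, col): count} for grid boxes of lat_size x lon_size degrees.

    lat0/lon0 are an assumed south-west grid origin, not a CAS setting.
    """
    counts = Counter()
    for lat, lon in stations:
        row = math.floor((lat - lat0) / lat_size)
        col = math.floor((lon - lon0) / lon_size)
        counts[(row, col)] += 1
    return counts

def size_matrix(start=2.0, stop=5.0, step=0.5):
    """All (lat_size, lon_size) pairs from start to stop in `step` increments."""
    n = int(round((stop - start) / step)) + 1
    sizes = [start + i * step for i in range(n)]
    return [(la, lo) for la in sizes for lo in sizes]

# Made-up station coordinates (latitude, longitude in degrees).
stations = [(35.1, -97.4), (35.9, -96.2), (36.4, -98.8),
            (40.2, -105.1), (41.7, -104.3)]

for lat_size, lon_size in [(5.0, 5.0), (2.5, 2.5), (2.5, 3.5)]:
    counts = count_stations(stations, lat_size, lon_size)
    print(lat_size, lon_size, sorted(counts.values(), reverse=True))
```

    Looping over `size_matrix()` instead of the three sizes shown would produce one set of counts per grid box size, which could then be mapped or turned into frequency plots.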

    The station count distribution for the largest grid box size (5° x 5°) is shown in the plot below. With a maximum of 68 stations and an average of 20 stations per grid box, this size was judged too large to ensure that grid box averaging would always adequately capture the spatial variability.

    [Figure: 5° Grid Box]

    The 2.5° x 2.5° grid boxes yielded better results. The average station density per grid box was six stations, with a maximum station count of 24. But although there were no boxes with zero stations, there were 13 grid boxes with only one station and nine grid boxes with two stations, six of these in the interior of the domain. Based on objective two, it was determined this grid box size was not the best available.

    [Figure: 2.5° Square Grid Box]

    By extending the grid box size in the East/West direction, we were able to maintain a good station density average (8 stations per grid box) while ensuring all grid boxes in the interior of the domain had more than one station.

    [Figure: 2.5x3.5 Grid Box]

    This 2.5° x 3.5° grid appeared to best meet the requirements of objectives 1 and 2. There were only eight grid boxes containing two stations and only four grid boxes containing a single station. No grid box in the interior of the domain had only one station. There were two grid boxes in the interior with only two stations, which, although not ideal, was not considered cause for rejection of this grid.
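
    The statistics quoted in this section (average and maximum station count, and the number of boxes with one or two stations) are simple to derive once per-box counts are available. The sketch below assumes the counts are already in a list. The function names and the example counts are invented, and which boxes count as "interior" is left to the caller because the study does not give a formal definition.

```python
from collections import Counter
from statistics import mean

def summarize_counts(box_counts):
    """Summarize station counts for the occupied grid boxes of one grid size."""
    counts = list(box_counts)
    freq = Counter(counts)
    return {
        "mean": mean(counts),
        "max": max(counts),
        "one_station": freq[1],
        "two_stations": freq[2],
        "frequency": dict(sorted(freq.items())),
    }

def interior_meets_objective_2(interior_counts):
    """True if every interior box (as chosen by the caller) has at least two stations."""
    return min(interior_counts) >= 2

# Invented counts for illustration only.
example_counts = [1, 2, 2, 5, 8, 8, 12, 24]
summary = summarize_counts(example_counts)
print(summary)
print(interior_meets_objective_2([2, 5, 8, 8, 12]))
```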

Assessment for Objective 3

    The method used in evaluating objective three involved finding the average absolute difference and standard deviation of single stations with respect to grid box averages for each grid box size from 2° x 2° to 5° x 5° in one-degree increments. Data from the USHCN were used to calculate each station's monthly mean temperature for the current year/month and its average monthly temperature for the period of record, along with the corresponding grid box average monthly temperatures for the current year/month and for the period of record. We then calculated the station anomalies and the grid box average anomalies from the stations in each grid box.

    After calculating the anomalies, the differences between each grid box average anomaly and the anomalies for each station in the grid box were calculated. This was done for every grid box in the domain over all years. The differences (more than 100,000 in total) were then averaged into a domain-wide mean difference. The standard deviation of these differences was also calculated.
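
    The calculation described above can be sketched in Python. Several details are assumptions made for this sketch, not statements about the method actually used: anomalies are taken relative to each station's own mean over the years supplied, differences are taken as absolute values, only years common to all stations in a box are used, and the population standard deviation is computed. All names and temperatures are invented.

```python
from statistics import mean, pstdev

def station_anomalies(series):
    """series: {year: monthly mean temperature} for one calendar month."""
    normal = mean(series.values())          # period-of-record mean
    return {year: t - normal for year, t in series.items()}

def box_differences(box):
    """box: {station_id: {year: temperature}} for one grid box.

    Returns |station anomaly - grid box mean anomaly| for each station and year.
    """
    anoms = [station_anomalies(s) for s in box.values()]
    years = sorted(set.intersection(*[set(a) for a in anoms]))
    diffs = []
    for year in years:
        box_mean = mean([a[year] for a in anoms])
        diffs.extend(abs(a[year] - box_mean) for a in anoms)
    return diffs

def domain_stats(boxes):
    """Domain-wide mean and standard deviation of all differences."""
    all_diffs = [d for box in boxes for d in box_differences(box)]
    return mean(all_diffs), pstdev(all_diffs)

# Invented January temperatures (°C) for two grid boxes of two stations each.
box1 = {"A": {2000: 1.0, 2001: 3.0}, "B": {2000: 4.0, 2001: 4.0}}
box2 = {"C": {2000: 0.0, 2001: 2.0}, "D": {2000: 5.0, 2001: 1.0}}

mean_diff, std_diff = domain_stats([box1, box2])
print(mean_diff, std_diff)
```

    Repeating `domain_stats` for each grid box size yields one mean and one standard deviation per size, which is the form of the comparison described in the following paragraphs.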

    After these calculations were performed for each grid box size, the means and standard deviations were compared to determine how well, on average, the stations in each grid box represented the grid box means. It was necessary to know how these means and standard deviations changed as the size of the grid box was reduced latitudinally as well as in the East/West direction. Because stations in the same latitude bands tend to share a more similar climate, we wanted to know if there would be some advantage in reducing the size of the grid boxes in the North/South direction more than in the East/West direction.

    Results for one month (January) are shown in the two tables below. (Results were similar for other months, but the differences were smaller in the summer months because of weaker thermal gradients.) As expected, reducing the grid box size reduced both the mean and the standard deviation of the station minus grid box differences. The mean difference for the 2° x 2° grid box size was 0.36°C less than that for the 5° x 5° grid box size, about a 28% reduction. Also, a one-degree decrease in latitude reduced the mean difference and standard deviation slightly more than a one-degree decrease in longitude. This indicated that it might be more advantageous to use a grid box that is wider in the East/West direction. This slight improvement is not necessarily related to a more similar climate among stations within the same latitude band. It may have more to do with the fact that in the mid-latitudes, the distance spanned by one degree of latitude is about 20% greater than that spanned by one degree of longitude.

    Average of the Station minus Grid Box Means (January, °C)

                   5 Deg Lon   4 Deg Lon   3 Deg Lon   2 Deg Lon
    5 Deg Lat        1.29        1.26        1.22        1.18
    4 Deg Lat        1.23        1.20        1.14        1.10
    3 Deg Lat        1.16        1.12        1.07        1.02
    2 Deg Lat        1.11        1.06        1.00        0.93

    Average of the Station minus Grid Box Std Dev (January, °C)

                   5 Deg Lon   4 Deg Lon   3 Deg Lon   2 Deg Lon
    5 Deg Lat        1.20        1.19        1.15        1.12
    4 Deg Lat        1.17        1.15        1.09        1.05
    3 Deg Lat        1.10        1.07        1.02        0.98
    2 Deg Lat        1.06        1.03        0.95        0.89
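
    As a rough check on the latitude versus longitude comparison, the sketch below averages the one-degree steps in the January mean-difference table along each direction, and also computes the ratio of the length of one degree of latitude to one degree of longitude. The values come from the table of means above. The function names are invented, and the ratio assumes a spherical Earth, on which it equals 1/cos(latitude).

```python
import math

# January mean differences (°C): rows are latitude sizes 5..2 degrees,
# columns are longitude sizes 5..2 degrees.
MEAN_DIFF = [
    [1.29, 1.26, 1.22, 1.18],
    [1.23, 1.20, 1.14, 1.10],
    [1.16, 1.12, 1.07, 1.02],
    [1.11, 1.06, 1.00, 0.93],
]

def mean_step(table, axis):
    """Average drop per one-degree size reduction along 'lat' or 'lon'."""
    n = len(table)
    steps = []
    for i in range(n):
        for j in range(n - 1):
            if axis == "lat":
                steps.append(table[j][i] - table[j + 1][i])   # move down a row
            else:
                steps.append(table[i][j] - table[i][j + 1])   # move across a column
    return sum(steps) / len(steps)

def lat_to_lon_degree_ratio(lat_deg):
    """Length of 1° latitude divided by 1° longitude (spherical Earth assumed)."""
    return 1.0 / math.cos(math.radians(lat_deg))

print(f"mean step, latitude:  {mean_step(MEAN_DIFF, 'lat'):.3f} °C per degree")
print(f"mean step, longitude: {mean_step(MEAN_DIFF, 'lon'):.3f} °C per degree")
for lat in (30, 35, 40, 45):
    print(lat, round(lat_to_lon_degree_ratio(lat), 2))
```

    On these numbers the average latitude step is larger than the average longitude step. Under the spherical assumption the length ratio is about 1.22 at 35°N and about 1.31 at 40°N, so it varies noticeably across the latitudes of the contiguous United States.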

Conclusion

    The overall assessment of the results from these two analyses indicated that, although there may be more than one acceptable grid box size for analysis of USHCN data, the best grid box size is much closer to 2.5° x 2.5° than to 5° x 5°. After carefully reviewing the data from each analysis discussed above, we recommend the 2.5° x 3.5° grid box size for use in analyses involving USHCN temperature data. This grid box size is the size of choice for analyses related to the development of monthly monitoring reports within the Climate Monitoring Branch at NCDC.


Project Scientist: David Easterling
Downloaded Sunday, 20-Apr-2014 17:06:18 EDT
Last Updated Tuesday, 20-Oct-2009 11:23:56 EDT
Please see the NCDC Contact Page if you have questions or comments.