The USHCN Version 2 Serial Monthly Datasets

National Oceanic and Atmospheric Administration

National Climatic Data Center




Introduction

Since 1987, the National Oceanic and Atmospheric Administration's (NOAA's) National Climatic Data Center (NCDC) has used observations from the U.S. Historical Climatology Network (USHCN) to quantify national- and regional-scale temperature changes in the conterminous United States (CONUS). To that end, USHCN temperature records have been "corrected" to account for various historical changes in station location, instrumentation, and observing practice. The USHCN is a designated subset of the NOAA Cooperative Observer Program (COOP) Network, with USHCN sites selected according to their spatial coverage, record length, data completeness, and historical stability. The USHCN, therefore, consists primarily of long-term COOP stations whose monthly temperature records have been adjusted for systematic, nonclimatic changes that bias temperature trends.


Figure 1. Distribution of U.S. Cooperative Observer Network stations in the CONUS. U.S. HCN version 2 sites are indicated as red triangles.

The first USHCN datasets were developed at NOAA's NCDC in collaboration with the Department of Energy's Carbon Dioxide Information Analysis Center (CDIAC) in a project that dates to the mid-1980s (Quinlan et al. 1987). At that time, in response to the need for an accurate, unbiased, modern historical climate record for the United States, personnel at the Global Change Research Program of the U.S. Department of Energy and at NCDC defined a network of 1219 stations in the contiguous United States whose observations would comprise a key baseline dataset for monitoring U.S. climate. Since then, the USHCN dataset has been revised several times (e.g., Karl et al. 1990; Easterling et al. 1996; Menne et al. 2009). The three dataset releases described in Quinlan et al. (1987), Karl et al. (1990), and Easterling et al. (1996) are now referred to as the USHCN version 1 datasets. These version 1 datasets contained adjustments to the monthly mean maximum, minimum, and average temperature data that addressed inhomogeneities (nonclimatic biases) in data from USHCN stations that were documented in NCDC's station history archives. The documented changes that were addressed include changes in the time of observation (Karl et al. 1986) and station moves and instrument changes (Karl and Williams 1987; Quayle et al. 1991). Apparent urbanization effects were also addressed in version 1 with a specific urban bias correction (Karl et al. 1988).

In 2007, USHCN version 2 serial monthly temperature data were released and updates to the version 1 datasets were discontinued. Relative to the version 1 releases, version 2 monthly temperature data were produced using an expanded database of raw temperature values from COOP stations, a new set of quality control checks, and a more comprehensive homogenization algorithm. The version 2 temperature dataset and processing steps are described in detail in Menne et al. (2009) and more briefly below.

In October 2012, a revision to the version 2.0 dataset was released as version 2.5. The version 2.5 processing steps are essentially the same as in version 2.0, but the version number change reflects modifications to the underlying database as well as coding changes to the pairwise homogenization algorithm (PHA) that improve its overall efficiency. These modifications are listed in Table 1 below. Details regarding the PHA modifications are provided in NCDC Technical Reports GHCNM-12-01R (Williams et al. 2012a) and GHCNM-12-02 (Williams et al. 2012b).

Table 1. Differences between USHCN version 2.0 and version 2.5 processing.

Database construction and quality control
  Version 2.0: Monthly mean maximum and minimum temperatures (and total precipitation) were calculated using three daily datasets archived at NCDC (DSI-3200, DSI-3206, and DSI-3210). The daily values were first subject to the quality control checks described in Menne et al. (2009), and only those values that passed the evaluation checks were used to compute monthly average temperatures. Monthly averages were computed only when no more than nine daily values were missing or flagged by the quality checks. Monthly values calculated from the three daily data sources were then merged with two additional sources of monthly data (DSI-3220 and the USHCN version 1 raw database) to form a more comprehensive dataset of serial monthly temperature and precipitation values for each HCN station. Duplicate records between data sources were eliminated; values from the daily sources were used in favor of values from the two monthly sources, and DSI-3200 was used in favor of the USHCN version 1 database. Monthly values were subject to a separate suite of checks as described in Menne et al. (2009).
  Version 2.5: Monthly mean maximum and minimum temperatures (and total precipitation) are calculated using GHCN-Daily (Menne et al. 2012). The daily values are first subject to the quality control checks described in Durre et al. (2010), and only those values that pass the GHCN-Daily QC checks are used to compute the monthly values. Further, a monthly mean is calculated only when nine or fewer daily values are missing or flagged. Monthly values calculated from GHCN-Daily are merged with the USHCN version 1 monthly data to form a more comprehensive dataset of serial monthly temperature and precipitation values for each HCN station. Duplicate records between data sources are eliminated, and values from GHCN-Daily are used in favor of values from the USHCN version 1 raw database. USHCN version 1 data comprise about 5% of station months, generally in the earliest years of the station records. Monthly mean temperature values are then subject to an additional set of monthly QC tests as described in Lawrimore et al. (2011).

Pairwise Homogenization Algorithm (PHA) version number
  Version 2.0: 52d (source code)
  Version 2.5: 52i (source code)

Re-processing frequency
  Version 2.0: The raw database was constructed in 2006 using the sources described above (and in Menne et al. 2009) and updated thereafter with monthly values computed from GHCN-Daily. The temperature data were last homogenized by the PHA in May 2008; since then, more recent data have been added to the homogenized database using the monthly values computed from GHCN-Daily (but without re-homogenizing the dataset).
  Version 2.5: The raw database is routinely reconstructed using the latest version of GHCN-Daily, usually each day. The full period-of-record monthly values are re-homogenized whenever the raw database is reconstructed (usually once per day).

Data format
  Version 2.0: Six-digit station identification number; one data value flag (see the version 2 readme.txt file for details).
  Version 2.5: Eleven-digit station identification number similar to that used in GHCN-Daily; a network code of 'H' is used, and the last six characters of the ID are the COOP identification number. Three flags accompany each monthly value (data measurement flag, data quality flag, data source flag) as in GHCN-Monthly version 3 (see the version 2.5 readme.txt file for details).

Version control/time stamping
  Version 2.0: Data files are labeled with the latest available data month (e.g., 9641C_yyyymm.dataset.element.gz, where yyyy=year and mm=month; dataset=raw, tob, or F52; and element=max, min, avg, or pcp).
  Version 2.5: Data files are all of the form ushcn.element.latest.dataset.tar.gz, where element=tmax, tmin, tavg, or prcp; and dataset=raw, tob, or FLs.52i. The data untar/uncompress into a directory called ushcn.version.date, where version=2.5.0 and date=the year, month, and day (yyyymmdd) that the data were last reprocessed and updated.
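
The version 2.5 identifier and file-naming conventions described above lend themselves to simple scripted handling. The following Python sketch is based only on the patterns stated in Table 1 (eleven-character identifiers with network code 'H' and a trailing six-digit COOP number, and archive names of the form ushcn.element.latest.dataset.tar.gz); the example identifier is illustrative, and the authoritative record layout is given in the version 2.5 readme.txt file.

    # Sketch of handling USHCN v2.5 naming conventions as described in Table 1.
    # Consult the v2.5 readme.txt for the authoritative record layout.

    def parse_station_id(station_id):
        """Split an eleven-character USHCN v2.5 station identifier.

        Per Table 1, the network code is 'H' and the last six characters
        are the COOP identification number.
        """
        if len(station_id) != 11:
            raise ValueError("expected an eleven-character identifier")
        return {
            "country": station_id[:2],   # e.g., 'US'
            "network": station_id[2],    # 'H' for USHCN
            "coop_id": station_id[-6:],  # six-digit COOP number
        }

    def archive_name(element, dataset):
        """Build a v2.5 archive name: ushcn.element.latest.dataset.tar.gz."""
        assert element in ("tmax", "tmin", "tavg", "prcp")
        assert dataset in ("raw", "tob", "FLs.52i")
        return "ushcn.{}.latest.{}.tar.gz".format(element, dataset)

    # Illustrative identifier; real identifiers are listed in the station file.
    print(parse_station_id("USH00011084"))   # {'country': 'US', 'network': 'H', 'coop_id': '011084'}
    print(archive_name("tmax", "FLs.52i"))   # ushcn.tmax.latest.FLs.52i.tar.gz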

A brief summary of version 2 processing steps is provided below. A more comprehensive summary, including discussions of the sources and magnitude of bias in the raw (unadjusted) data, is provided in Menne et al. (2009). An assessment specifically addressing the reliability of the USHCN temperature trends in light of station siting concerns is also provided below and in more detail by Menne et al. (2010). Details of the pairwise homogenization algorithm and its evaluation against synthetic benchmark datasets with realistic bias-error scenarios are provided in Williams et al. (2012c). A comparison of the USHCN v2 homogenization algorithm (version 52d) and other approaches to homogenization is provided in Venema et al. (2012). A comparison of adjusted and unadjusted USHCN temperature trends with a number of atmospheric reanalysis datasets is described in Vose et al. (2012).


Version 2 Monthly Temperature Homogenization Processing Steps

The data from each HCN station were subject to the following quality control and homogeneity testing and adjustment procedures.

Quality Evaluation and Database Construction

See Table 1 for an overview of the database construction and quality control steps.

Time of Observation Bias Adjustments

After the quality control of the monthly database, monthly temperature values are adjusted for the time-of-observation bias (Karl et al. 1986; Vose et al. 2003). The time-of-observation bias (TOB) arises when the 24-hour daily summary period at a station begins and ends at an hour other than local midnight. When the summary period ends at an hour other than midnight, monthly mean temperatures exhibit a systematic bias relative to the local midnight standard (Baker 1975). In the U.S. Cooperative Observer Network, the ending hour of the 24-hour climatological day typically varies from station to station and can change at a given station during its period of record. The TOB-adjustment software uses an empirical model to estimate and adjust the monthly temperature values so that they more closely resemble values based on the local midnight summary period. The metadata archive is used to determine the time of observation for any given period in a station's observational history. The TOB is of particular concern for temperature trends because there has been a widespread conversion from afternoon to morning observation times in the USHCN (and in the COOP Network more generally). As discussed in Menne et al. (2009), these changes in the time of observation have led to a broad-scale, spurious reduction in mean temperatures caused simply by the change in the Cooperative Observers' daily reading schedule.
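
As a minimal illustration of how such an adjustment is applied (not the empirical model of Karl et al. 1986 itself), the Python sketch below assumes a hypothetical table of monthly offsets, indexed by the ending hour of the climatological day taken from the station history metadata; the offset values shown are placeholders, not estimates from the operational model.

    # Minimal sketch of applying a time-of-observation bias (TOB) adjustment.
    # The per-month offsets below are hypothetical placeholders; the operational
    # adjustment derives them from the empirical model of Karl et al. (1986).

    # Offsets (degrees C) subtracted from monthly means, indexed by the ending hour
    # of the 24-hour climatological day (17 = 5 p.m., 7 = 7 a.m., 24 = midnight).
    TOB_OFFSET = {
        17: [0.3, 0.3, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.4, 0.4, 0.3, 0.3],               # afternoon: warm bias
        7:  [-0.2, -0.2, -0.2, -0.3, -0.3, -0.3, -0.3, -0.3, -0.2, -0.2, -0.2, -0.2],    # morning: cool bias
        24: [0.0] * 12,                                                                  # midnight: no adjustment
    }

    def adjust_for_tob(monthly_means, obs_hours):
        """Adjust monthly means toward a local-midnight summary period.

        monthly_means : twelve monthly mean temperatures for one year
        obs_hours     : twelve ending hours of the climatological day, taken
                        from the station history metadata for that year
        """
        return [t - TOB_OFFSET[h][m]
                for m, (t, h) in enumerate(zip(monthly_means, obs_hours))]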

Homogeneity Testing and Adjustment Procedures

Following the TOB adjustments, the homogeneity of the TOB-adjusted temperature series is assessed. In the USHCN version 1 monthly temperature datasets, homogeneity adjustments were performed using the procedure described in Karl and Williams (1987). This procedure was used to evaluate nonclimatic discontinuities (artificial changepoints) in a station's temperature or precipitation series caused by known changes to a station, such as relocations and equipment changes. Since knowledge of such changes comes from the station history metadata archive maintained at NCDC, the original USHCN homogenization algorithm was known as the Station History Adjustment Program (SHAP).

Unfortunately, station histories are often incomplete, so artificial discontinuities in a data series may occur on dates with no associated record in the metadata archive. Undocumented station changes obviously limit the effectiveness of SHAP. To remedy the problem of incomplete station histories, the version 2 homogenization algorithm addresses both documented and undocumented discontinuities.

The potential for undocumented discontinuities adds a layer of complexity to homogeneity testing. Tests for undocumented changepoints, for example, require different sets of test-statistic percentiles than those used in analogous tests for documented discontinuities (Lund and Reeves, 2002). For this reason, tests for undocumented changepoints are inherently less sensitive than their counterparts used when changes are documented. Tests for documented changes should, therefore, also be conducted where possible to maximize the power of detection for all artificial discontinuities. In addition, since undocumented changepoints can occur in all series, accurate attribution of any particular discontinuity between two climate series is more challenging (Menne and Williams, 2005).
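
This difference in sensitivity can be illustrated with a generic Monte Carlo experiment (a statistical sketch, not part of the USHCN processing): under the null hypothesis of no changepoint, the maximum shift statistic taken over all candidate break dates has much larger percentiles than the statistic evaluated at a single documented date, so a larger shift is required before an undocumented break is judged significant.

    # Illustrative comparison of critical values for documented vs. undocumented
    # changepoint tests (generic statistics, not the USHCN code).
    import numpy as np

    rng = np.random.default_rng(42)
    n, n_sims = 120, 2000            # 120 "months" of homogeneous noise per simulation

    def shift_stat(x, k):
        """Two-sample statistic for a mean shift after index k."""
        s = x.std(ddof=1)
        return abs(x[:k].mean() - x[k:].mean()) / (s * np.sqrt(1.0 / k + 1.0 / (len(x) - k)))

    doc_stats, undoc_stats = [], []
    for _ in range(n_sims):
        x = rng.standard_normal(n)                     # series with no changepoint
        doc_stats.append(shift_stat(x, n // 2))        # documented: one known candidate date
        undoc_stats.append(max(shift_stat(x, k)        # undocumented: maximum over all dates
                               for k in range(10, n - 10)))

    # The 95th-percentile critical value is substantially larger when the break date
    # is unknown, which is why undocumented tests are inherently less sensitive.
    print("documented 95th percentile:  ", round(float(np.percentile(doc_stats, 95)), 2))
    print("undocumented 95th percentile:", round(float(np.percentile(undoc_stats, 95)), 2))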

The USHCN version 2 "pairwise" homogenization algorithm (PHA) addresses these and other issues according to the following steps, which are described in detail in Menne and Williams (2009). At present, only temperature series are evaluated for artificial changepoints. A simplified sketch of the pairwise-differencing approach is given after the numbered steps below.

  1. First, a series of monthly temperature differences is formed between numerous pairs of station series in a region. Specifically, difference series are calculated between each target station series and a number (up to 40) of highly correlated series from nearby stations. In effect, a matrix of difference series is formed for a large fraction of all possible combinations of station series pairs in each localized region. The station pool for this pairwise comparison of series includes U.S. HCN stations as well as other U.S. Cooperative Observer Network stations.
  2. Tests for undocumented changepoints are then applied to each paired difference series. A hierarchy of changepoint models is used to distinguish whether the changepoint appears to be a change in mean with no trend (Alexandersson and Moberg, 1997), a change in mean within a general trend (Wang, 2003), or a change in mean coincident with a change in trend (Lund and Reeves, 2002). Since each difference series is formed from two station series, a changepoint date in any one difference series is temporarily attributed to both station series used to calculate the differences. The result is a matrix of potential changepoint dates for each station series.
  3. The full matrix of changepoint dates is then "unconfounded" by identifying the series common to multiple paired-difference series that have the same changepoint date. Since each series is paired with a unique set of neighboring series, it is possible to determine whether more than one nearby series share the same changepoint date.
  4. The magnitude of each relative changepoint is calculated using the most appropriate two-phase regression model (e.g., a jump in mean with no trend in the series, a jump in mean within a general linear trend, etc.). This magnitude is used to estimate the "window of uncertainty" for each changepoint date, since the most probable date of an undocumented changepoint is subject to some sampling uncertainty, the magnitude of which is a function of the size of the changepoint. Any cluster of undocumented changepoint dates that falls within overlapping windows of uncertainty is conflated to a single changepoint date according to
    1. a known change date as documented in the target station's history archive (meaning the discontinuity does not appear to be undocumented), or
    2. the most common undocumented changepoint date within the uncertainty window (meaning the discontinuity appears to be truly undocumented)
  5. Finally, multiple pairwise estimates of relative step change magnitude are re-calculated (as a simple difference in mean) at all documented and undocumented discontinuities attributed to the target series. The range of the pairwise estimates for each target step change is used to calculate confidence limits for the magnitude of the discontinuity. Adjustments are made to the target series using the estimates for each shift in the series.
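
The following Python sketch illustrates steps 1 and 2 in highly simplified form: difference series are formed between a target station and its most correlated neighbors, and the strongest mean-shift date is flagged in each difference series. It is a conceptual illustration only; the operational PHA uses the hierarchy of changepoint models cited above, handles documented changes, and unconfounds and adjusts the results as described in steps 3 through 5.

    # Conceptual sketch of PHA steps 1-2: pairwise difference series and a simple
    # mean-shift scan. The real PHA uses a hierarchy of changepoint models
    # (Alexandersson and Moberg 1997; Wang 2003; Lund and Reeves 2002).
    import numpy as np

    def best_breakpoint(diff, min_seg=12):
        """Return (index, statistic) of the strongest mean shift in a difference series."""
        n, s = len(diff), diff.std(ddof=1)
        best_k, best_t = None, 0.0
        for k in range(min_seg, n - min_seg):
            t = abs(diff[:k].mean() - diff[k:].mean()) / (s * np.sqrt(1.0 / k + 1.0 / (n - k)))
            if t > best_t:
                best_k, best_t = k, t
        return best_k, best_t

    def pairwise_breaks(target, neighbors, max_pairs=40):
        """Pair the target with its most correlated neighbors (step 1) and flag a
        candidate changepoint in each target-minus-neighbor series (step 2)."""
        ranked = sorted(((np.corrcoef(target, nbr)[0, 1], name, nbr)
                         for name, nbr in neighbors.items()),
                        key=lambda item: item[0], reverse=True)
        results = {}
        for corr, name, nbr in ranked[:max_pairs]:
            k, t = best_breakpoint(target - nbr)
            results[name] = {"correlation": round(corr, 2),
                             "break_index": k, "statistic": round(t, 2)}
        return results

    # Synthetic demonstration: a 3-degree step is introduced in the target at month 240.
    rng = np.random.default_rng(0)
    months = 480
    regional = rng.standard_normal(months)                  # shared regional signal
    target = regional + 0.3 * rng.standard_normal(months)
    target[240:] += 3.0                                      # artificial discontinuity
    neighbors = {"nbr%d" % i: regional + 0.3 * rng.standard_normal(months) for i in range(5)}

    for name, info in pairwise_breaks(target, neighbors).items():
        print(name, info)                                    # each pair flags a break near month 240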
Estimation of Missing Values

Following the homogenization process, estimates for missing data are calculated using a weighted average of values from highly correlated neighboring stations. The weights are determined using a procedure similar to the SHAP routine. This program, called FILNET, uses the results from the TOB and homogenization algorithms to obtain a more accurate estimate of the climatological relationship between stations. The FILNET program also estimates data across intervals in a station record where discontinuities occur within a short time interval, which prevents reliable estimation of the appropriate adjustments.
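
A minimal sketch of the kind of estimate FILNET produces is given below. The anomaly formulation and the correlation-based weights are assumptions made for illustration; the operational weights are derived from the TOB-adjusted and homogenized station relationships as described above.

    # Minimal sketch of estimating a missing monthly value from neighboring stations,
    # in the spirit of FILNET. Weights and the anomaly formulation are illustrative
    # assumptions, not the operational FILNET weighting.
    import numpy as np

    def estimate_missing(target_mean, neighbor_series, neighbor_weights, month_index):
        """Estimate a missing value as the target's climatological mean plus a
        weighted average of neighbor departures from their own means."""
        departures = [series[month_index] - series.mean() for series in neighbor_series]
        return target_mean + np.average(departures, weights=neighbor_weights)

    # Hypothetical example: three neighbors weighted by assumed correlations with the target.
    neighbors = [np.array([10.2, 11.0, 12.5, 13.1]),
                 np.array([9.8, 10.5, 12.0, 12.8]),
                 np.array([11.0, 11.8, 13.2, 13.9])]
    weights = [0.95, 0.90, 0.85]
    print(estimate_missing(target_mean=11.5, neighbor_series=neighbors,
                           neighbor_weights=weights, month_index=2))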

Urbanization Effects

In the USHCN version 1 dataset, the regression-based approach of Karl et al. (1988) was employed to account for the impact of urban heat islands on USHCN station trends. In contrast, no specific urban correction is applied in version 2 because the changepoint detection algorithm effectively accounts for any "local" trend at any individual station. In other words, any residual impact of urbanization and other changes in land use on the homogenized HCN version 2 data is likely to be small. Figure 2 (the minimum temperature time series for Reno, Nevada) provides anecdotal evidence in this regard. In brief, the black line represents the unadjusted data, and the blue line represents fully adjusted data. The unadjusted data clearly indicate that the station at Reno experienced both major step changes (e.g., a move from the city to the airport during the 1930s) and trend changes (e.g., a possible growing urban heat island beginning in the 1970s). In contrast, the fully adjusted (homogenized) data indicate that both the step-type changes and the trend changes have been effectively addressed through the changepoint detection process used in HCN version 2.


Figure 2. (a) Mean annual unadjusted and fully adjusted minimum temperatures at Reno, Nevada. Error bars indicating the magnitude of uncertainty (±1 standard error) were calculated via 100 Monte Carlo simulations that sampled within the range of the pairwise estimates for the magnitude of each inhomogeneity; (b) difference between minimum temperatures at Reno and the mean from its 10 nearest neighbors.


Photographic documentation of poor siting conditions at stations in the USHCN has led to questions regarding the reliability of surface temperature trends over the CONUS. To evaluate the potential impact of poor siting and instrument exposure on CONUS temperatures, Menne et al. (2010) compared trends derived from poorly sited and well-sited USHCN stations using both unadjusted and bias-adjusted data. Results indicate that there is a mean bias associated with poor exposure sites relative to good exposure sites in the unadjusted USHCN version 2 data; however, this bias is consistent with previously documented changes associated with the widespread conversion to electronic sensors in the USHCN during the last 25 years (Menne et al. 2009). Moreover, the sign of the bias is counterintuitive given the photographic documentation of poor exposure, because the associated instrument changes have led to an artificial negative ("cool") bias in maximum temperatures and only a slight positive ("warm") bias in minimum temperatures.

Adjustments applied to USHCN version 2 data largely account for the impact of instrument and siting changes, although a small overall residual negative ("cool") bias appears to remain in the adjusted USHCN version 2 CONUS average maximum temperature (see also Fall et al. 2011). Nevertheless, the adjusted USHCN CONUS temperatures are well aligned with recent measurements from the U.S. Climate Reference Network (USCRN). This network was designed to the highest standards for climate monitoring and has none of the siting and instrument exposure problems present in the USHCN. The close correspondence in nationally averaged temperature from these two networks is further evidence that the adjusted USHCN data provide an accurate measure of U.S. temperature.

The Menne et al. (2010) results underscore the need to consider all changes in observation practice when determining the impacts of siting irregularities. Further, the influence of non-standard siting on temperature trends can only be quantified through an analysis of the data, and that analysis does not indicate that the CONUS average temperature trends are inflated by poor station siting.

Four sets of USHCN stations were used in the Menne et al. (2010) analysis. Set 1 includes stations identified as having good siting by the volunteers at surfacestations.org. Set 2 is a subset of Set 1 consisting of the Set 1 stations whose ratings are in general agreement with an independent assessment by NOAA's National Weather Service. Set 3 consists of those stations with moderate to poor siting ratings according to surfacestations.org. Set 4 is a subset of Set 3 consisting of the Set 3 stations whose ratings are in agreement with an independent assessment by NOAA's National Weather Service. For further information, please see Menne et al. (2010). The sets of Maximum Minimum Temperature Sensor (MMTS) stations and Cotton Region Shelter (Stevenson Screen) sites used in Menne et al. (2010) are also available (see the "readme.txt" file as described below for a description of the station list format). Access to the unadjusted, time-of-observation-adjusted, and fully adjusted USHCN version 2 temperature data is described below.


Data Access

U.S. HCN version 2.5 monthly data are available via ftp at ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2.5/. Users are requested to cite the version number and timestamp of the dataset when USHCN v2.5 data are used for analysis (as well as Menne et al. 2009). Please see the "readme.txt" file in the v2.5 directory for information on downloading and reading the v2.5 data. Information about the processing of the USHCN v2.5 data is provided in the "status.txt" file. U.S. HCN version 2 monthly data will continue to be updated through 2012 and will be available in static form thereafter; however, users are encouraged to make the transition to version 2.5 as soon as possible.
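
For users who script their downloads, the Python sketch below retrieves and unpacks one of the version 2.5 archives using the file-name pattern documented in Table 1; the archive contents and record layout should be taken from the readme.txt file rather than inferred from this example.

    # Sketch of retrieving a USHCN v2.5 archive from the FTP directory listed above.
    # File names follow the documented pattern ushcn.element.latest.dataset.tar.gz.
    import tarfile
    import urllib.request

    BASE = "ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2.5/"
    fname = "ushcn.tavg.latest.FLs.52i.tar.gz"   # fully adjusted average temperature

    urllib.request.urlretrieve(BASE + fname, fname)
    with tarfile.open(fname) as tar:
        tar.extractall()                          # unpacks into a ushcn.version.date/ directory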

Pairwise Homogeneity Adjustment Software

The automated pairwise homogenization algorithm (PHA) software (Menne and Williams 2009) version 52i, used to detect and adjust for documented and undocumented inhomogeneities in the USHCN version 2.5 monthly temperature dataset, is available via ftp at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v3/software. PHA version 52d, used to adjust the version 2 dataset, is available at ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2/monthly/software. Please refer to the README text file in these directories for guidance on how to download, uncompress, compile, and run the pairwise homogenization software. The tar/gzipped file contains all of the necessary software to run the pairwise homogenization procedure. A simulated test dataset is included with the software, along with a file of the expected output that can be used to verify proper execution of the code.


References