National Environmental Satellite, Data, and Information Service, National Climatic Data Center, U.S. Department of Commerce
Integrated Global Radiosonde Archive

Data Integration

a. Data Comparisons
A set of intersource data comparisons was performed to check for any inconsistencies in station number assignments or widespread systematic discrepancies among data sources. Using all elements at the five mandatory levels between 850 and 300 hPa, the data for each station in any one source were compared with the data for all other stations in every other source. To allow for differences in processing procedures among the various data sources, two overlapping station records are considered to match closely if a significant percentage of the differences between pairs of values falls within the similarity thresholds listed in Table 1. One would expect to find such a match when data for the same station (e.g., 72210) are available from different sources, but not for two entirely different stations. Yet the latter situation does occur on occasion. For example, for station 72210 in the core GTS data and station 72211 in the U.S. data source, 99.7% of compared values are "identical" during the overlapping period of 1992-1995. Based on an examination of station history information and the various sources of data, such cases were handled either by excluding one or both station records from further processing or, as in the aforementioned example, by reassigning one of the records to the station number of the other.
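The comparison described above can be illustrated with a minimal Python sketch. It is not the code used to build IGRA. The record layout (a dictionary keyed by time stamp, pressure level, and element), the function names, and the threshold values are all assumptions made for illustration; the actual thresholds are those listed in Table 1.

```python
# Hypothetical thresholds for illustration only; the real values are in Table 1.
EXAMPLE_THRESHOLDS = {"temperature": 0.5, "geopotential_height": 10.0}


def percent_similar(pairs, threshold):
    """Return the percentage of (a, b) pairs whose absolute difference is
    within the threshold, or None if there are no valid pairs."""
    valid = [(a, b) for a, b in pairs if a is not None and b is not None]
    if not valid:
        return None
    within = sum(1 for a, b in valid if abs(a - b) <= threshold)
    return 100.0 * within / len(valid)


def compare_records(rec_a, rec_b, thresholds):
    """Compare two station records over their overlapping observations.

    Each record is assumed to map (time stamp, level in hPa, element name)
    to a value. The result maps each element to its percentage of similar
    values, computed only over keys present in both records.
    """
    pairs_by_element = {}
    for key in rec_a.keys() & rec_b.keys():
        element = key[2]
        pairs_by_element.setdefault(element, []).append((rec_a[key], rec_b[key]))
    return {
        element: percent_similar(pairs, thresholds[element])
        for element, pairs in pairs_by_element.items()
    }
```

A high percentage for two records carrying different station numbers would flag a case like the 72210/72211 example for manual review against station history information.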

The comparison results further reveal a number of cases in which overlapping records for a particular station from different sources are less similar than might be anticipated or desirable. For example, for approximately one quarter of all stations compared, the percentage of similarity is less than 90% for at least one data element. Such relatively low similarities tend to be more common during the 1950s and 1960s than in later years. The disparities imply that the integration of different data sources can result in spurious shifts and additional noise in the resulting data set. As a result, the construction of a single merged archive from multiple sources necessitates the development of merging procedures that minimize the risk of introducing such undesirable characteristics.

b. Station Selection and Data Merging
The core IGRA station network consists of land-based stations with data in the NCDC real-time GTS (Figure 1), since these are the stations with the most reliable location information. This network is supplemented with identifiable stations that no longer report observations but significantly enhance the spatial coverage of the historical record (Figure 2). Given this combined network, the selection of data sources to be used takes place on a station-by-station basis. For any particular station, the core GTS data are used as the base record and supplemented with only those sources for which the percentage of similar values is at least 90% for each data element in all possible comparisons. Any new source whose record does not provide a period of overlap for comparison with at least one other source is excluded from that particular station’s record.

Once the sources to be used for a station have been selected, their data are merged on a sounding-by-sounding basis. When soundings with the same time stamp are available from multiple sources, the sounding with the largest number of values is chosen. The same procedure is also used to eliminate multiple occurrences of soundings for the same station and time within any one data source, which may arise from transmission or processing errors. Since some data sources report the nominal observation time (e.g., 00 UTC) as the observation hour, while others report the hour closest to the launch time (e.g., 23 UTC), the sounding with the largest number of values is also retained when identical soundings appear consecutively within two hours of each other. Allowing for differences in data processing, two soundings from different sources are considered identical if at least 90% of the absolute differences between values at levels common to both soundings fall within the previously defined similarity thresholds (Table 1). Consecutive soundings that meet these criteria of similarity and whose time stamps are more than 2 hours apart are discarded (i.e., the duplication of their data is considered erroneous).
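Two of the merging rules above, choosing the sounding with the most values for a given time stamp and testing whether two soundings are identical, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the IGRA processing code. The sounding layout (a dictionary with "time", "source", and "levels" keys), the function names, and the threshold values in the usage example are hypothetical. How ties in the number of values are broken is not stated in the text; this sketch keeps the first sounding seen.

```python
from datetime import datetime


def count_values(sounding):
    """Count the non-missing values across all levels of a sounding."""
    return sum(
        1
        for level in sounding["levels"].values()
        for value in level.values()
        if value is not None
    )


def merge_by_timestamp(soundings):
    """For each time stamp, keep the sounding with the largest number of
    values. Ties keep the first sounding seen (an assumption)."""
    best = {}
    for sounding in soundings:
        current = best.get(sounding["time"])
        if current is None or count_values(sounding) > count_values(current):
            best[sounding["time"]] = sounding
    return [best[time] for time in sorted(best)]


def soundings_identical(a, b, thresholds, min_fraction=0.9):
    """Return True if at least min_fraction of the absolute differences at
    levels common to both soundings fall within the thresholds."""
    within = []
    for level in a["levels"].keys() & b["levels"].keys():
        values_a, values_b = a["levels"][level], b["levels"][level]
        for element in values_a.keys() & values_b.keys():
            va, vb = values_a[element], values_b[element]
            if va is None or vb is None:
                continue  # skip missing values
            within.append(abs(va - vb) <= thresholds[element])
    if not within:
        return False  # nothing to compare
    return sum(within) / len(within) >= min_fraction


# Usage: two sources report the same time stamp; the fuller sounding is kept.
t0 = datetime(1995, 1, 1, 0)
gts = {"time": t0, "source": "gts",
       "levels": {850: {"temp": 10.0, "hgt": 1500.0}, 500: {"temp": -20.0, "hgt": None}}}
us = {"time": t0, "source": "us",
      "levels": {850: {"temp": 10.1, "hgt": 1502.0}, 500: {"temp": -20.0, "hgt": 5600.0}}}
merged = merge_by_timestamp([gts, us])
```

The handling of consecutive identical soundings (retaining one when they are within two hours of each other, discarding them when they are further apart) would build on `soundings_identical` and is omitted here.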

Two additional procedures were then applied to the merged dataset. First, to identify cases in which identical soundings are reported simultaneously at more than one station, the mandatory-level 850-to-300 hPa data of concurrent soundings from all stations were compared. Approximately 60,000 soundings (0.2%) were identified as interstation duplicates and removed from the dataset. Second, composite records were created for a number of stations whose radiosonde observations were reported under two or more station numbers over time. Many such changes in station number occurred without a discernible change in station location and were the result of changes in the numbering system used by the WMO (e.g., at Canadian stations in 1977). The compositing procedure merges the records of such stations into one record, which is then assigned the station number of the most recent station. In addition, at stations in the contiguous United States during the 1990s, radiosonde observations were moved from one site to another site close enough to reflect the same regional atmospheric conditions (Elliott et al. 2002). The records of such stations are also combined, as long as they are located within 150 km of each other and their periods of record do not overlap. The 151 composite stations are identified in the IGRA station list, and the dates and times of the first and last soundings of each original station record and the corresponding composite record are listed in an auxiliary documentation file. Users engaged in climate change studies are advised to consider the potential impact of the compositing on their specific analysis, particularly when the emphasis is on the planetary boundary layer.
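The two conditions for combining relocated stations (a separation of no more than 150 km and non-overlapping periods of record) can be checked with a short Python sketch. The text does not say how distance was measured; the great-circle (haversine) distance on a spherical Earth is assumed here. The station dictionary layout and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def periods_overlap(first_a, last_a, first_b, last_b):
    """Return True if the two closed periods share at least one instant."""
    return first_a <= last_b and first_b <= last_a


def can_composite(station_a, station_b, max_km=150.0):
    """Return True if two station records meet both compositing conditions:
    within max_km of each other and no overlap in their periods of record."""
    distance = great_circle_km(
        station_a["lat"], station_a["lon"], station_b["lat"], station_b["lon"]
    )
    overlap = periods_overlap(
        station_a["first"], station_a["last"], station_b["first"], station_b["last"]
    )
    return distance <= max_km and not overlap
```

The "first" and "last" fields may be any comparable values, such as `datetime.date` objects for the first and last soundings of each record.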

Elliott, W.P., R.J. Ross, and W. Blackmore, 2002: Recent changes in NWS upper-air observations with emphasis on changes from VIZ to Vaisala radiosondes. Bull. Amer. Meteor. Soc., 83, 1003-1017.
Last Updated Wednesday, 20-Aug-2008 12:09:11 EDT