The quality of radiosonde data is compromised by a variety of observation, transmission, and processing problems (Schwartz and Doswell 1991; Gandin et al. 1993; Gaffen 1994). In general, quality assurance procedures for sounding data rely on principles of internal consistency, basic physical relationships, and/or statistical methods (Kahl et al. 1992; Eskridge et al. 1995; Loehrer et al. 1996; Collins 2001a,b). Some approaches employ a decision-making algorithm that takes into account the results of multiple tests, while others apply a sequence of independent checks. Since the performance and complexity of the decision-making approach are highly dependent on the number and types of checks applicable to any particular data point, the sequential approach is more straightforward to evaluate when working with a dataset with variable temporal and spatial resolution. Consequently, a sequential approach is employed in IGRA.
To account for the variety of errors that may be present, the IGRA quality assurance system consists of a series of specialized algorithms that are applied successively. Each successive check makes a binary decision on the quality of a value, level, or sounding; either the data item passes the check and remains available, or it is identified as erroneous and thus set to missing. As discussed in Peterson and Vose (1997), this approach relieves the end-user from the burden of determining the meaning of quality flags. However, for users interested in making their own binary decision based on our quality assessment results, record-keeping files listing erroneous values are provided by the authors upon request. For all checks, the thresholds used to identify erroneous values were selected based on a careful evaluation of both summary statistics and specific examples of the values identified as unrealistic.
The IGRA quality assurance procedures can be grouped into seven general categories (Table 1):
- Fundamental "sanity" checks
- Checks on the plausibility and temporal consistency of surface elevation
- Internal consistency checks
- Checks for the repetition of values
- Climatologically-based checks
- Checks on the vertical and temporal consistency of temperature
- Data completeness checks
The first four categories eliminate gross errors that might compromise the performance of subsequent algorithms. The climatology and temperature consistency checks identify outliers based on station-specific climatological parameters and are applicable only when sufficient data are available for computing the required statistics. Although all variables are quality-assured, temperature, pressure, and geopotential height receive somewhat greater scrutiny in order to facilitate operational climate monitoring activities at NCDC.
a. Fundamental sanity checks
Each data source undergoes two sanity checks, the first being a basic plausibility check to determine whether the date, observation hour, launch time, and data values in each sounding fall within certain gross plausibility limits (Table 2). The date and time limits identify instances of invalid days of the month (e.g., April 31), invalid times of day, and soundings with a missing observation hour. Soundings with such invalid dates or times are excluded from further processing. The data limits are chosen so as to remove values that clearly exceed all known world extremes, such as temperatures less than -120°C or greater than 70°C. Overall, 0.25% of all date/time stamps as well as 0.025% of all data values were found to be implausible.
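The date/time and data-value limits can be sketched as follows. The temperature limits are the ones quoted above. The 0-23 hour range and the use of Python's calendar validation are assumptions of this sketch, and the function names are hypothetical. The remaining limits of Table 2 are not reproduced.

```python
import datetime

# Gross world-extreme temperature limits quoted in the text (deg C).
TEMP_MIN_C = -120.0
TEMP_MAX_C = 70.0

def valid_date_time(year, month, day, obs_hour):
    """Reject impossible calendar dates (e.g., April 31), invalid hours,
    and soundings with a missing observation hour (None)."""
    if obs_hour is None or not 0 <= obs_hour <= 23:
        return False
    try:
        datetime.date(year, month, day)  # raises ValueError for April 31
    except ValueError:
        return False
    return True

def plausible_temperature(temp_c):
    """True if a temperature lies within the gross plausibility limits."""
    return TEMP_MIN_C <= temp_c <= TEMP_MAX_C
```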
The second sanity check, which focuses on "duplicate" data, identifies cases in which two or more data levels within a sounding have identical pressure values or, if no pressure is reported, identical heights. Such cases of level duplication are addressed by removing any data values that differ among the duplicate levels and combining the remaining data into one level. For example, a sounding may contain two 500-hPa levels, one with geopotential height, temperature, and dewpoint depression and one with geopotential height, wind direction, and wind speed. If the geopotential height values at the two levels are identical, the data from the two levels are combined into one level containing all variables. If, however, the two geopotential height values do not agree, then the geopotential heights are removed from both levels, and the remaining values are combined into one level from which only geopotential height is missing. Of the more than 30 million soundings processed, approximately one quarter contained duplicate levels, with an average of three such levels per sounding. Discrepancies in data values, however, were found only at a few percent of these duplicate levels.
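The merging rule for duplicate levels can be illustrated with a short sketch. It assumes each level is a dictionary mapping hypothetical variable names to values, with None marking a missing value. A variable survives only if every duplicate level that reports it agrees, which reproduces the 500-hPa example above.

```python
def merge_duplicate_levels(levels):
    """Combine levels sharing the same pressure (or height) into one level.

    Values reported identically by all duplicates are kept; any variable
    whose values disagree among the duplicates is dropped (set to missing).
    """
    merged = {}
    conflicting = set()
    for level in levels:
        for var, value in level.items():
            if value is None or var in conflicting:
                continue
            if var in merged and merged[var] != value:
                del merged[var]          # disagreement: remove from all levels
                conflicting.add(var)
            else:
                merged[var] = value
    return merged
```

For instance, merging a level carrying height, temperature, and dewpoint depression with a level carrying the same height plus wind yields one level with all six values. If the two heights differ, the merged level lacks only the height.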
b. Checks on surface elevation
Surface observations are frequently included in a sounding as a "surface level" identified by a special level type indicator. The height of such levels generally originates either from the source of the sounding data or from various station lists used during initial processing at NCDC. The accuracy and temporal consistency of these heights can thus be compromised by errors in the original data sources or station lists, by processing problems, or by the integration of multiple sources that report different elevations for the same station over time. Consequently, it was necessary to develop procedures for the removal of gross errors and unrealistic temporal variations in surface level heights.
The two surface elevation checks involved the computation of "monthly median elevations" as well as the inspection of elevation time series for unrealistic spikes or jumps. First, isolated errors were removed and intersource discrepancies were reduced by replacing the surface level height in each sounding with the monthly median elevation generated from all available soundings for the corresponding station, year, and month. Next, each station's time series of monthly median elevations was examined for unrealistic features, periods with implausible elevations were identified, and the respective surface level heights were set to missing. In inspecting the elevation time series, features considered unrealistic included any combination of the following characteristics: significant (>50 m) discontinuities or spikes in the time series, inconsistencies with corresponding time series of surface pressure, and a large discrepancy with either the elevation reported in WMO Publication 9 Volume A (WMO 2004) or the elevation of the nearest grid point in the Global Land One-kilometer Base Elevation (GLOBE) dataset (NGDC 2004).
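A minimal sketch of the monthly median step and a crude screen for the >50-m discontinuities is shown below. In IGRA the time series inspection was manual and also used surface pressure, WMO, and GLOBE elevations, so the automatic jump flag here is only a hypothetical aid for selecting months to review. The input layout of (year, month, height) tuples is an assumption.

```python
import statistics
from collections import defaultdict

def monthly_median_elevations(soundings):
    """soundings: iterable of (year, month, surface_height_m or None).
    Returns {(year, month): median surface level height}."""
    heights = defaultdict(list)
    for year, month, height in soundings:
        if height is not None:
            heights[(year, month)].append(height)
    return {key: statistics.median(vals) for key, vals in heights.items()}

def replace_with_median(soundings, medians):
    """Replace each sounding's surface height with its monthly median."""
    return [(y, m, medians.get((y, m))) for y, m, _ in soundings]

def flag_elevation_jumps(medians, max_jump_m=50.0):
    """Months whose median differs from the preceding available month by
    more than max_jump_m; both edges of a spike are flagged for review."""
    keys = sorted(medians)
    return [cur for prev, cur in zip(keys, keys[1:])
            if abs(medians[cur] - medians[prev]) > max_jump_m]
```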
An example of a station with implausible and temporally inconsistent elevations is shown in Figure 1. This station, Atyran, Kazakhstan, has a WMO elevation of -28 m, a GLOBE elevation of -37 m, and a mean surface pressure of 1018 hPa. Thus, the monthly median elevations around 3000 m up to the early 1960s, around 500 m in the mid-1970s, and around 10 000 m in 1982 are grossly inconsistent with the remainder of the time series as well as with the other sources of station elevation. Consequently, Atyran's surface level heights during these months were set to missing in IGRA.
Insertion of the monthly median elevation changed the original surface level height in approximately 3% of all soundings with a designated surface level. Based on the time series inspection, the surface level height was removed from an additional 1% of surface levels. Since these procedures require both manual inspection and the availability of data for an entire month, they are not part of the system that updates the archive on a daily basis. In update mode, the height of a surface level is set to the station's most recent known elevation, and internal consistency checks are used to remove any grossly erroneous elevations.
c. Internal consistency checks
The internal consistency checks developed for IGRA address cases of physical inconsistency among different variables or among values of one variable at different levels within a sounding. For instance, two algorithms evaluate the physical consistency of pressure and geopotential height. Another series of checks ensures that a sounding contains at most one valid surface level and no below-surface levels. Additional checks include one that compares the release time to the reported observation hour and one that evaluates wind direction when the wind speed is 0.
The first algorithm comparing pressure and geopotential height is similar to a hydrostatic check (Gandin 1988) but is independent of the temperature profile within the sounding examined. In this "hypsometric check," the range of plausible pressure values for any given height is determined from the hypsometric equation using the extreme values of the average temperature of the atmospheric layer between the surface and the level in question. The extremes of the layer-average temperature are computed using the lapse rates from the 1976 U.S. standard atmosphere and assuming surface temperatures of -60°C for the cold extreme and 60°C for the warm extreme. Given these parameters, the hypsometric check removes gross inconsistencies, such as 30-hPa levels with geopotential heights of 0 and surface levels with geopotential heights of 3000 m (Figure 2). Such inconsistencies were found at 0.09% of the approximately 800 million levels in the dataset.
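The hypsometric bounds can be sketched as below. The paper does not give the exact formulation, so several choices here are assumptions: heights are measured from sea level, the surface pressure is bounded by hypothetical limits of 870 and 1085 hPa, and the layer-average temperature enters through the exact integral of dz/T, which is equivalent to using the harmonic-mean layer temperature. The lapse rates are those of the 1976 U.S. Standard Atmosphere, starting from -60 and +60 deg C at the surface.

```python
import math

G0 = 9.80665   # standard gravity, m s^-2
RD = 287.05    # gas constant for dry air, J kg^-1 K^-1

# 1976 U.S. Standard Atmosphere: (base geopotential height in m, lapse rate in K/m)
LAYERS = [(0.0, -6.5e-3), (11000.0, 0.0), (20000.0, 1.0e-3), (32000.0, 2.8e-3),
          (47000.0, 0.0), (51000.0, -2.8e-3), (71000.0, -2.0e-3)]

def inverse_temp_integral(z, t_surface):
    """Integral of dz/T from 0 to z (m) for a profile starting at t_surface (K)
    with standard-atmosphere lapse rates. Heights <= 0 m return 0 here;
    intended for radiosonde heights (well below 80 km)."""
    total, t_base = 0.0, t_surface
    for i, (z_base, lapse) in enumerate(LAYERS):
        if z <= z_base:
            break
        z_top = LAYERS[i + 1][0] if i + 1 < len(LAYERS) else math.inf
        dz = min(z, z_top) - z_base
        t_top = t_base + lapse * dz
        total += dz / t_base if lapse == 0.0 else math.log(t_top / t_base) / lapse
        t_base = t_top
    return total

def plausible_pressure_range(z, p_sl_min=870.0, p_sl_max=1085.0,
                             t_cold=213.15, t_warm=333.15):
    """Pressure bounds (hPa) at height z: the cold profile and lowest surface
    pressure give the lower bound, the warm profile and highest give the upper."""
    p_lo = p_sl_min * math.exp(-G0 / RD * inverse_temp_integral(z, t_cold))
    p_hi = p_sl_max * math.exp(-G0 / RD * inverse_temp_integral(z, t_warm))
    return p_lo, p_hi

def hypsometric_check(pressure_hpa, height_m):
    """True if the pressure is plausible for the reported height."""
    p_lo, p_hi = plausible_pressure_range(height_m)
    return p_lo <= pressure_hpa <= p_hi
```

With these assumptions a 30-hPa level at 0 m and a 1000-hPa surface level at 3000 m both fail, while 500 hPa at 5600 m passes.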
Although the hypsometric check removes gross inconsistencies between pressure and height, it does not guarantee the monotonic increase of geopotential height with decreasing pressure. To ensure that this basic relationship holds true in all soundings, a second algorithm, the "height sequence check," compares the changes in pressure and height between all possible pairs of levels within a sounding. In this iterative multi-step procedure, the height of each pressure level k is compared with the height of every level j having a higher pressure. If the geopotential height of level k is found to be less than or equal to the geopotential height of level j, the numbers of violations for levels j and k are each incremented by 1. Once all possible pairs of levels within the sounding have been compared, the level with the largest number of violations is removed. This process is then repeated until no more violations are found. Based on the height sequence check, approximately 0.003% of the levels in the dataset were removed.
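The iterative violation count of the height sequence check can be written as a small loop. Levels are assumed to be (pressure, height) pairs. How IGRA breaks ties between levels with equal violation counts is not stated, so removing the first (highest-pressure) such level is an assumption of this sketch.

```python
def height_sequence_check(levels):
    """levels: list of (pressure_hpa, height_m) pairs.
    Repeatedly removes the level with the most violations until height
    increases strictly with decreasing pressure. Returns retained levels
    ordered from high to low pressure."""
    kept = sorted(levels, key=lambda lev: -lev[0])
    while True:
        violations = [0] * len(kept)
        for j in range(len(kept)):
            for k in range(j + 1, len(kept)):
                p_j, z_j = kept[j]
                p_k, z_k = kept[k]
                # Level k has lower pressure, so it must lie above level j.
                if p_k < p_j and z_k <= z_j:
                    violations[j] += 1
                    violations[k] += 1
        worst = max(range(len(kept)), key=violations.__getitem__, default=None)
        if worst is None or violations[worst] == 0:
            return kept
        del kept[worst]  # ties resolved toward the highest-pressure level
```

In a sounding with heights of 100, 1500, 50, and 5600 m at 1000, 850, 700, and 500 hPa, the 700-hPa level collects two violations and is removed, after which the sequence is monotonic.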
Following the hypsometric and height sequence checks, each sounding is inspected for the existence of multiple surface levels. In soundings in which more than one surface level remains, all such levels are deleted. When a level containing only height and wind values is located at the same elevation as the surface pressure level, the two levels are merged into one surface level. Of the 28 million soundings processed, approximately 55% contained a valid surface pressure level, 8.4% required the merging of surface pressure and wind levels, and 0.04% contained multiple surface levels. In addition, a one-time manual inspection of the historical records of surface pressure and temperature was aimed at identifying gross shifts or inconsistencies in the two variables. This analysis revealed unrealistic features that prompted the removal of surface levels for 1968-1970 at former Soviet Union stations as well as for 1967-1972 and 1992-1997 at Chinese stations.
Several of the data sources contain levels whose pressure or geopotential height is below the surface pressure or elevation of the station. In general, these "below-surface" levels consist of data that have been extrapolated from the surface down to any mandatory pressure that happens to fall below the surface. When extrapolated levels are flagged as such in the source dataset, they are automatically excluded from IGRA. However, because some extrapolated levels are not correctly labeled and because transmission errors can also produce below-surface levels, an additional check identifies all types of below-surface levels. Specifically, a pressure level is considered to fall below the surface if its pressure is higher than the pressure of the surface level or its geopotential height is less than the height of the surface level. In a sounding without a valid surface level, any pressure level whose geopotential height is at least 10 m below the median elevation of the current month is removed. Based on these thresholds, 0.05% of the levels processed were identified as below-surface levels.
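The below-surface criteria translate into a short predicate. The 10-m margin is from the text. The argument layout and the function name are hypothetical.

```python
def is_below_surface(pressure_hpa, height_m, surface=None, monthly_median_elev_m=None):
    """surface: (pressure_hpa, height_m) of the sounding's valid surface level,
    or None if the sounding has no valid surface level."""
    if surface is not None:
        sfc_p, sfc_z = surface
        if pressure_hpa is not None and sfc_p is not None and pressure_hpa > sfc_p:
            return True
        if height_m is not None and sfc_z is not None and height_m < sfc_z:
            return True
        return False
    # No valid surface level: compare with the current month's median elevation.
    if height_m is not None and monthly_median_elev_m is not None:
        return height_m <= monthly_median_elev_m - 10.0
    return False
```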
An examination of the data revealed the necessity for two additional simple consistency checks. In the check comparing the observation hour of a sounding with the corresponding reported launch time, soundings are deleted if the launch time deviates by more than three hours from the observation hour. Differences of such magnitude were identified in approximately 0.25% of all soundings. Another check removes wind direction and speed when the speed is equal to 0 and the direction is neither 0 nor 360 degrees, a condition found at 0.16% of all levels.
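Both simple consistency checks can be sketched in a few lines. The launch time is assumed to be given as HHMM, and measuring the difference around the 24-h clock (so that a 2315 UTC launch for a 00 UTC observation differs by 45 minutes) is an assumption not stated in the text.

```python
def launch_time_ok(obs_hour, launch_hhmm, max_diff_h=3.0):
    """Compare a launch time (HHMM) with the nominal observation hour,
    measuring the difference around the 24-h clock."""
    launch_h = launch_hhmm // 100 + (launch_hhmm % 100) / 60.0
    diff = abs(launch_h - obs_hour) % 24.0
    diff = min(diff, 24.0 - diff)
    return diff <= max_diff_h

def calm_wind_ok(direction_deg, speed):
    """A zero wind speed is consistent only with a direction of 0 or 360."""
    return speed != 0 or direction_deg in (0, 360)
```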
d. Checks for the repetition of values
The next set of checks looks for runs of values in time and in the vertical. A run is defined as the repetition of a value over a certain number of consecutive soundings or levels, ending with a change to another non-missing data value; the absence of a value in a sounding or level does not interrupt a run.
The following four checks are applied:
(1) a check for runs in surface pressure, surface- and mandatory-level temperature, and mandatory-level geopotential height that extend over more than 15 consecutive soundings;
(2) an hour-specific (e.g., 00 UTC) runs-in-time check analogous to check (1);
(3) a procedure that looks for temperatures of the same value extending across at least five consecutive surface/mandatory levels or across at least five significant levels in a sounding;
(4) a pairwise vertical run check that identifies the repetition of the same value in either temperature and dewpoint depression or wind direction and speed over at least five consecutive pressure or height-only levels.
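The run definition above, in which missing values do not interrupt a run, can be sketched as a single pass over a series. Here None marks a missing value. Closing a run that is still open at the end of the series is an assumption of this sketch.

```python
def find_runs(values, min_length):
    """Indices belonging to runs of at least min_length identical values.
    None (missing) does not interrupt a run."""
    flagged = []
    run_value, run_idx = None, []
    for i, v in enumerate(values):
        if v is None:
            continue
        if run_idx and v == run_value:
            run_idx.append(i)
            continue
        # A different non-missing value ends the current run.
        if len(run_idx) >= min_length:
            flagged.extend(run_idx)
        run_value, run_idx = v, [i]
    if len(run_idx) >= min_length:  # run open at end of series (assumption)
        flagged.extend(run_idx)
    return flagged
```

Check (1) corresponds to `min_length=16` ("more than 15" soundings), and check (2) applies the same function to the subsequence of soundings at one observation hour. Check (3) uses `min_length=5` on a single profile.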
Among the more interesting runs identified are cases of 40 consecutive 1000-hPa surface levels, -7.5°C temperatures at nine consecutive mandatory levels between 850 and 30 hPa in a sounding, ten 24.4°C temperatures at significant levels between 937 and 429 hPa, and 0 wind speed and direction throughout an entire sounding.
The manual inspection of extremely long runs also revealed the existence of several peculiar data problems. These problems consist of excessively frequent occurrences of certain temperature or geopotential height values within specific geographical regions, periods, data sources, and atmospheric levels. In the most egregious case, mandatory levels at and above 100 hPa (as well as at 1000 hPa) contained an unusually high number of 7.1°C temperatures at many stations during November and December 1967. All such values were eliminated by specifically designed checks, as they might otherwise seriously degrade the quality of IGRA data. All in all, the various procedures for identifying excessive repetition of values removed approximately 0.02% of all data values.
e. Climatological checks
A two-tiered set of climatological checks removes geopotential height, temperature, and pressure values that deviate by more than a certain number of standard deviations (STDs) from their respective long-term means. In the first tier, the climatological means and STDs are calculated for the entire period of record for each station and pressure level, whereas in the second tier, the climatological statistics are stratified by time of year and time of day. Because of their less stringent data requirements, the tier-1 checks can be applied to a larger number of data values than the tier-2 checks. On the other hand, the tier-2 checks allow for the use of tighter thresholds in the identification of outliers because their STDs do not reflect the seasonal and diurnal variations included in the tier-1 statistics. Furthermore, the tier-2 statistics are not computed until after the tier-1 checks have been applied and thus are based on a cleaner set of data.
The means and STDs of surface pressure and temperature as well as mandatory-level geopotential height and temperature are calculated using biweight statistics as described by Lanzante (1996). The biweight statistics tend to be more resistant to outliers that may be present in data that have not undergone advanced quality assurance. For the tier-1 checks, a mean and STD are produced as long as at least 120 values are available for a given station, level, and variable during the station's period of record. For the tier-2 checks, statistics are calculated for 45-day windows centered on each day of the year and for 3-hour windows of the day, provided that at least 150 values are available for any station, level, and variable in a given time interval. The means and STDs at other pressure levels (e.g., significant levels) are derived as needed by interpolating linearly with respect to the logarithm of pressure between the nearest adjacent mandatory levels. Recognizing that actual changes in temperature with height are not always linear, we compared the statistics derived by linear interpolation with those computed using all available levels (mandatory and significant) in 1-hPa slabs throughout the troposphere and stratosphere. Visual inspection of the two types of climatological profiles at a set of 87 globally distributed stations (Lanzante et al. 2003) revealed few significant differences, suggesting the linearity assumption is viable from a quality assurance perspective.
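A minimal sketch of the biweight statistics and the log-pressure interpolation follows. The formulas follow the common biweight location and midvariance forms. The tuning constant c = 7.5 is an assumption (Lanzante 1996 discusses the choice), and the minimum sample size of 120 corresponds to the tier-1 requirement (150 for tier 2).

```python
import math
import statistics

def biweight_mean_std(values, c=7.5, min_count=120):
    """Outlier-resistant biweight mean and standard deviation.
    Returns None when fewer than min_count values are available."""
    n = len(values)
    if n < min_count:
        return None
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        return med, 0.0
    u = [(x - med) / (c * mad) for x in values]
    inside = [(x, ui) for x, ui in zip(values, u) if abs(ui) < 1.0]  # censor |u| >= 1
    w = [(1 - ui**2) ** 2 for _, ui in inside]
    mean = med + sum(wi * (x - med) for wi, (x, _) in zip(w, inside)) / sum(w)
    num = n * sum((x - med) ** 2 * (1 - ui**2) ** 4 for x, ui in inside)
    den = sum((1 - ui**2) * (1 - 5 * ui**2) for _, ui in inside)
    return mean, math.sqrt(num) / abs(den)

def interp_log_p(p, p_lower, p_upper, stat_lower, stat_upper):
    """Interpolate a statistic linearly in ln(p) between two mandatory levels."""
    frac = (math.log(p) - math.log(p_lower)) / (math.log(p_upper) - math.log(p_lower))
    return stat_lower + frac * (stat_upper - stat_lower)
```

Adding a single value of 1000 to 200 values between 0 and 9 moves the biweight mean by only a few hundredths, whereas the arithmetic mean would jump to roughly 9.5.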
To choose thresholds for labeling values as outliers, we visually compared, for all stations, the time series prior to the climatological checks to those following the application of the tier-1 and tier-2 checks, using various thresholds between 3 and 7 STDs. We subjectively identified thresholds such that the algorithms neither remove a disproportionate number of values within the normal range of variability nor fail to remove a significant number of points that are clear outliers. In the tier-1 check, a threshold of 6 STDs was chosen for all three variables. For the tier-2 check, a threshold of 5 STDs was chosen for geopotential height, temperature, and below-normal surface pressure, and a threshold of 4 STDs was selected for above-normal pressure. (The asymmetric thresholds for above- and below-normal surface pressure were set in recognition of the fact that high-pressure anomalies tend to be smaller in magnitude than low-pressure anomalies.) These thresholds resulted in the removal of approximately 0.1% of all pressure, temperature, and geopotential height values by the tier-1 and tier-2 checks.
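The chosen thresholds, including the asymmetric tier-2 limits for surface pressure, can be summarized in one function. The variable labels are hypothetical, and skipping values whose STD is zero is an assumption of this sketch.

```python
def climatology_outlier(value, mean, std, variable, tier):
    """True if a value exceeds the tier-1 or tier-2 STD threshold."""
    if std <= 0:
        return False  # statistics unusable; not flagged in this sketch
    z = (value - mean) / std
    if tier == 1:
        return abs(z) > 6.0  # same threshold for all three variables
    if variable == "surface_pressure":
        # Tier 2: 4 STDs above normal, 5 STDs below normal.
        return z > 4.0 or z < -5.0
    return abs(z) > 5.0      # tier 2: geopotential height and temperature
```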
f. Additional checks on temperature
The inspection of various temperature time series and soundings revealed that the climatological check alone is incapable of satisfactorily removing all outliers without also removing realistic extremes. Figure 3 and Figure 4 show examples of a time series and a sounding with outliers that are clearly erroneous when viewed in context with other temperatures within their temporal and vertical vicinity. To address outliers that pass the climatological checks but are vertically or temporally inconsistent, additional vertical and temporal consistency checks were developed specifically for temperature. These procedures are described briefly here and in more detail in a separate paper (in preparation).
The supplemental vertical consistency checks for temperature employ z-score profiles derived from the tier-2 climatological means and STDs. For instance, an entire temperature profile is eliminated if it is judged to be grossly abnormal in terms of either its median z score or its median absolute level-to-level z-score difference. Additional checks remove one or more temperatures from a profile if their z scores are clearly inconsistent with either the entire profile or values at adjacent levels. When applied to IGRA, the procedures together identified 0.08% of all temperatures as vertically inconsistent.
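The two whole-profile statistics named above can be computed as follows. The thresholds used by IGRA are not given here, so the defaults of 3.0 and 2.0 are purely hypothetical placeholders. The level-by-level removal checks are not sketched.

```python
import statistics

def profile_zscores(temps, means, stds):
    """z scores of a temperature profile against level-specific climatology."""
    return [(t - m) / s for t, m, s in zip(temps, means, stds)]

def profile_is_abnormal(z, max_median_abs_z=3.0, max_median_jump=2.0):
    """Flag a profile whose median z score, or median absolute level-to-level
    z-score difference, exceeds the (hypothetical) thresholds."""
    if len(z) < 2:
        return False
    median_z = statistics.median(z)
    median_jump = statistics.median([abs(b - a) for a, b in zip(z, z[1:])])
    return abs(median_z) > max_median_abs_z or median_jump > max_median_jump
```

A profile shifted uniformly by 5 STDs is caught by the median z score, while a profile zigzagging between +4 and -4 has a median z score near zero but is caught by the level-to-level statistic.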
Two temporal consistency checks are also applied to surface and mandatory-level temperatures. These checks are based on z scores derived using the overall mean and STD for any station and level, provided that at least 120 such values remain following the climatological and vertical consistency checks. The first variant identifies outliers that differ by more than two STDs from all other temperatures within ±22.5 days, while the second variant uses a difference threshold of one STD and a time window of 2.5 years on either side of the potential outlier. Both variants examine only those temperatures whose absolute z score is greater than 2.5 and require that temperatures be available on at least half of the days in the time window. The temporal consistency checks together removed approximately 0.004% of the temperatures from IGRA.
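The first temporal variant can be sketched for a daily series of z scores. It assumes at most one value per day at a given station and level, and it treats ±22.5 days as 22 whole days on either side (a 45-day window). The second variant would use `min_diff=1.0` and a half-window of about 913 days (2.5 years of 365.25 days), which is again an assumed conversion.

```python
def temporal_outliers(z_by_day, half_window_days=22.5, min_diff=2.0, min_abs_z=2.5):
    """z_by_day: {integer day number: z score} for one station and level.
    Flags days whose value differs by more than min_diff STDs from every
    other value in the window, if at least half the window's days have data."""
    flagged = []
    half = int(half_window_days)      # +-22.5 days -> 22 whole days each side
    window_len = 2 * half + 1
    for day, z in z_by_day.items():
        if abs(z) <= min_abs_z:
            continue                  # only large anomalies are examined
        neighbors = [z_by_day[d] for d in range(day - half, day + half + 1)
                     if d != day and d in z_by_day]
        if (len(neighbors) + 1) * 2 < window_len:
            continue                  # fewer than half of the days have data
        if all(abs(z - other) > min_diff for other in neighbors):
            flagged.append(day)
    return sorted(flagged)
```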
g. Checks for data completeness
The IGRA quality assurance process also ensures that the dataset adheres to certain minimum requirements for completeness. For example, each station must have at least 100 soundings. An "isolated sounding check" eliminates groups of fewer than three soundings surrounded by at least 31 days without data, groups of fewer than 15 soundings surrounded by gaps of three months (92 days), and groups of fewer than 28 soundings flanked by gaps of half a year (182.5 days). Within a sounding, wind speed and direction must always appear together, and a dewpoint depression may exist only if it is accompanied by a temperature at the same level. A pressure level is retained if it contains valid thermodynamic data and/or valid wind data. Levels with a height but no pressure are permitted to exist if they contain valid wind data. A sounding may consist of any combination of pressure levels and height-wind levels as long as there is at least one non-surface level.
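The isolated sounding check can be sketched by splitting the record at long gaps and discarding small groups. Observation times are assumed to be sorted and expressed in days. Treating the start and end of the record as gaps and applying the three rules in sequence are assumptions of this sketch.

```python
def remove_isolated(times, rules=((31.0, 3), (92.0, 15), (182.5, 28))):
    """times: sorted observation times in days. For each (gap, min_size) rule,
    drop groups of fewer than min_size soundings separated from the rest of
    the record by at least `gap` days on both sides."""
    kept = list(times)
    for gap, min_size in rules:
        groups, current = [], []
        for t in kept:
            if current and t - current[-1] >= gap:
                groups.append(current)   # long gap closes the current group
                current = []
            current.append(t)
        if current:
            groups.append(current)
        kept = [t for g in groups if len(g) >= min_size for t in g]
    return kept
```

Because each rule is applied to the output of the previous one, removing a small group can widen the gaps around its neighbors before the next, longer gap threshold is tested.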
Collins, W.G., 2001a: The operational complex quality control of radiosonde heights and temperatures at the National Centers for Environmental Prediction. Part I: Description of the method. J. Appl. Meteor., 40, 137-151.
_______, 2001b: The operational complex quality control of radiosonde heights and temperatures at the National Centers for Environmental Prediction. Part II: Examples of error diagnosis and correction from operational use. J. Appl. Meteor., 40, 152-168.
Eskridge, R. E., O.A. Alduchov, I.V. Chernykh, P. Zhai, A.C. Polansky, and S.R. Doty, 1995: A comprehensive aerological reference data set (CARDS): Rough and systematic errors. Bull. Amer. Meteor. Soc., 76, 1759-1775.
Finger, F.G., and F.J. Schmidlin, 1991: Meeting review: Upper-Air Measurements and Instrumentation Workshop. Bull. Amer. Meteor. Soc., 72, 51-56.
Gaffen, D.J., 1994: Temporal inhomogeneities in radiosonde temperature records. J. Geophys. Res., 99, 3667-3676.
Gandin, L.S., 1988: Complex quality control of meteorological observations. Mon. Wea. Rev., 116, 1137-1156.
_______, L.L. Morone, and W.G. Collins, 1993: Two years of operational comprehensive hydrostatic quality control at the National Meteorological Center. Wea. Forecasting, 8, 57-72.
Kahl, J.D., M.C. Serreze, S. Shiotani, S.M. Skony, and R.C. Schnell, 1992: In-situ meteorological sounding archives for Arctic studies. Bull. Amer. Meteor. Soc., 73, 1824-1830.
Lanzante, J.R., 1996: Resistant, robust and nonparametric techniques for analysis of climate data: Theory and examples, including applications to historical radiosonde station data. Int. J. Climatol., 16, 1197-1226.
_______, S.A. Klein, and D.J. Seidel, 2003: Temporal homogenization of monthly radiosonde temperature data. Part I: Methodology. J. Climate, 16, 224-240.
Loehrer, S.M., T.A. Edmands, and J.A. Moore, 1996: TOGA COARE Upper-Air Sounding Data Archive: Development and quality control procedures. Bull. Amer. Meteor. Soc., 77, 2651-2672.
National Geophysical Data Center (NGDC), cited 2004: Global Land One-kilometer Base Elevation (GLOBE). [Available online from the NGDC GLOBE Project.]
Peterson, T.C., and R.S. Vose, 1997: An overview of the Global Historical Climatology Network Temperature Database. Bull. Amer. Meteor. Soc., 78, 2837-2849.
Schwartz, B.E., and C.A. Doswell III, 1991: North American rawinsonde observations: Problems, concerns, and a call to action. Bull. Amer. Meteor. Soc., 72, 1885-1896.
WMO, cited 2004: Weather Reporting, Observing Stations. Publ. 9. [Available online from the WMO website.]