New Approach for Sampling Education Surveys

Research Project

The National Center for Education Statistics (NCES) continually faces both problems and opportunities as data gathering and data analysis evolve with rapid technological change. Problems include rising rates of non-response and an increasing need to reduce response burden. An alternative basic design for a survey or assessment was presented to NCES for consideration, with the objective of remedying the problem of decreasing response rates at all levels while simultaneously providing robust estimates of the measured outcomes and unbiased variance estimators. NCES charged the National Institute of Statistical Sciences (NISS) with convening a panel of technical experts to examine theoretical arguments in favor of the design and to consider whether this design is suited to a large-scale federal survey. Specific issues for the panel to consider were: i) the degree to which the proposed new methodology is truly novel; ii) the degree to which it is advantageous over current practices, especially in regard to accuracy of inferences and to variance estimation; and iii) the degree to which it is suited to a large-scale federal survey both in practical terms and in terms of magnitude of improvement over current designs.

Proposed Design Approach

The idea for the proposed design methodology comes from work in 1962 by J.N.K. Rao, H.O. Hartley and W.G. Cochran, who sought a computationally simple estimation process that would yield an exact variance formula and an unbiased variance estimator. The design proposed here would consider the population as divided into two classes: responders and non-responders. Then each stratum (sample size m) would be partitioned into m/2 zones “so that values of sorting variables deemed as non-response predictors are well distributed across zones.” Equivalence groups of units are created by dividing each zone into two groups completely at random. Following the Rao-Hartley-Cochran method, a single sample is drawn using proportionate unequal probabilities so that at the final stage, one unit is sampled from each of the m groups (2 groups per zone). If a sampled unit is a non-responder, another unit is sampled from the same group; non-responders are replaced until either a respondent unit is drawn or the group is exhausted without response.
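The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the proposers' implementation: the zones are taken as already formed, and the names `draw_rhc_sample`, `sizes` and `responds` are hypothetical. In practice the response status of a unit is only learned when it is contacted; here it is supplied up front so that the replacement loop can be shown.

```python
import random

def draw_rhc_sample(zones, sizes, responds, rng):
    """Sketch of the final-stage selection with replacement of non-responders.

    zones    : list of lists of unit ids, one list per zone (assumed given)
    sizes    : dict unit id -> positive size measure (hypothetical)
    responds : dict unit id -> True if the unit would respond (hypothetical)
    rng      : random.Random instance

    Returns one (group, chosen_unit_or_None, attempts) tuple per group.
    """
    results = []
    for zone in zones:
        # Divide the zone into two groups completely at random.
        units = list(zone)
        rng.shuffle(units)
        half = len(units) // 2
        for group in (units[:half], units[half:]):
            remaining = list(group)
            chosen = None
            attempts = 0
            # Draw with probability proportional to size among the units
            # not yet tried; stop at the first responder or when the
            # group is exhausted without response.
            while remaining and chosen is None:
                weights = [sizes[u] for u in remaining]
                pick = rng.choices(remaining, weights=weights, k=1)[0]
                attempts += 1
                if responds[pick]:
                    chosen = pick
                else:
                    remaining.remove(pick)
            results.append((group, chosen, attempts))
    return results
```

The `attempts` count per group makes the total number of units contacted visible, which can exceed the nominal sample size m when non-response is common.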

A novel application of this approach addresses repeated sampling of the same units across multiple surveys by eliminating sample overlap. By creating the stratum-zone-group structure, any unit within a group that has already been sampled as part of another survey can be treated a fortiori as a non-respondent in this sampling scheme.

The intent of this approach is to mitigate the problems arising from non-response of sampled units. Since there are alternative methods to deal with non-response, theoretical, simulation and real/pilot study comparisons are needed. For the proposed design, both technical and practical aspects need further development. On the technical side, the underlying assumptions have not been rigorously stated. Neither the justification for the estimation equations nor that for the variance estimation has been completely worked out.

The essential feature of the design appears to be a substitution procedure with a random component that is coupled to an estimation piece and to variance estimation by treating the groups as the sampled units. However, the group variation is not equivalent to the unit variation. Even a purely unconditional argument that unequal sampling probabilities are preserved under this scheme requires a technical proof. Of course, the number of (potential) non-respondents in the population does not change with the sampling design. So “solving/mitigating non-response” simply shifts those problems to problems of response bias and error.

Summary of Deliberations

Following presentation of the proposed method and discussion of a hypothetical application in the education setting, it was still not clear what problem this method would solve in the context of NCES studies, surveys and assessments. For NCES studies, non-response at the school level is dealt with at the outset for practical reasons. Studies are planned and launched at different times, often without the possibility of coordination, so it is difficult to see how the proposed method would fit the context.

  1. It is highly dubious that substitution of group non-response for unit non-response would conform with federal standards for reporting statistical data.
  2. Benefits of the proposed method have not been convincingly demonstrated either theoretically or via simulation. Implementation of this method would be premature. Based on the information available at this time, it is not clear that the method will prove advantageous even after careful study. Necessary steps to investigate the method and its properties are listed below.
  3. Technical development of the proposed method is incomplete. A complete technical formulation would include: i) explicit assumptions, ii) estimators and their properties, iii) variance estimators and their properties, iv) expected total “sample” sizes.
  4. If this is really rejective sampling at the final stage, then the theory should be linked to the extensive body of theory for rejective sampling and its properties.
  5. Simulation needs to be extensive to demonstrate the claimed properties in practice: improved variance estimation, reduced total “sample” size and robustness to misspecification. Unlike the simulation presented, which used simple random sampling (SRS) rather than probability-proportional-to-size (PPS) sampling, simulation studies should be based on a more realistic structure with unequal probabilities (as for PPS sampling) and should characterize the behavior of the variance estimator.
  6. Calculation (simulation) and analysis of expected costs is an initial step in planning for implementation, accompanied by development of an expected time schedule for applying the method in the NCES context.
  7. The final step prior to implementation would be demonstration of the method via field test and validation of comparative advantages identified in 2, 3 & 4 above.
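
As a starting point for the kind of unequal-probability simulation called for in item 5, the following Python sketch checks the classical Rao-Hartley-Cochran estimator of a total and its variance estimator under full response. It is not the proposed design: there is no non-response or replacement here, and the function names, the population values and the normalized size measures `p` are illustrative assumptions. The formulas used are the standard ones from the 1962 method: the estimated total is the sum over groups of P_g * y_g / p_g, and the variance estimator multiplies the P_g-weighted squared deviations of y_g / p_g from the estimate by (sum of N_g^2 - N) / (N^2 - sum of N_g^2).

```python
import random
import statistics

def rhc_estimate(y, p, n, rng):
    """One classical RHC draw with full response (requires n <= len(y)).

    y : list of unit values
    p : list of normalized size measures (summing to 1)
    n : number of random groups, one unit drawn per group
    Returns (estimated total, estimated variance).
    """
    N = len(y)
    idx = list(range(N))
    rng.shuffle(idx)
    # Random split into n groups whose sizes differ by at most one.
    groups = [idx[g::n] for g in range(n)]
    terms = []
    for g in groups:
        P_g = sum(p[i] for i in g)
        # Draw one unit with probability p_i / P_g within the group.
        i = rng.choices(g, weights=[p[j] for j in g], k=1)[0]
        terms.append((P_g, y[i] / p[i]))
    y_hat = sum(P_g * r for P_g, r in terms)
    sum_sq = sum(len(g) ** 2 for g in groups)
    factor = (sum_sq - N) / (N ** 2 - sum_sq)
    v_hat = factor * sum(P_g * (r - y_hat) ** 2 for P_g, r in terms)
    return y_hat, v_hat

def simulate(y, p, n, reps, seed=0):
    """Repeat the draw and summarize the estimator's behavior."""
    rng = random.Random(seed)
    draws = [rhc_estimate(y, p, n, rng) for _ in range(reps)]
    estimates = [d[0] for d in draws]
    return {
        "mean_estimate": statistics.fmean(estimates),
        "empirical_variance": statistics.variance(estimates),
        "mean_variance_estimate": statistics.fmean([d[1] for d in draws]),
    }
```

Comparing `mean_estimate` with the true total, and `mean_variance_estimate` with `empirical_variance`, gives a baseline against which a version with non-responder replacement could be judged.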

Project Goal:

Specific issues for the panel to consider were:

i) the degree to which the proposed new methodology is truly novel;

ii) the degree to which it is advantageous over current practices, especially in regard to accuracy of inferences and to variance estimation; and

iii) the degree to which it is suited to a large-scale federal survey both in practical terms and in terms of magnitude of improvement over current designs.

Research Team: 

Michael Larsen, Professor, Department of Statistics & Director, Graduate Certificate in Survey Design & Data Analysis, George Washington University

Brian P. Rowan, Burke A. Hinsdale Collegiate Professor in Education, a Research Professor at the Institute for Social Research, and a Professor of Sociology, University of Michigan

Keith Rust, Vice President, Associate Director, Westat, Inc.

Joseph L. Schafer, Senior Mathematical Statistician, Office of Associate Director for Research and Methodology, United States Census Bureau

Elizabeth A. Stasny, Professor Emeritus, Department of Statistics, Ohio State University

Nell Sedransk, Director, National Institute of Statistical Sciences; Statistics Professor, North Carolina State University