Information Technology and Statistics Comment on the PITAC Report

Publication Date: April 9, 1999

1. Overview

The President's Information Technology Advisory Committee report Information Technology Research: Investing in Our Future, to which we refer as PITAC (available on-line at www.ccic.gov/ac/report/), makes and justifies a series of rather specific recommendations for research covering a vast array of software and hardware practices and issues. These recommendations have major statistical ramifications that will figure significantly in building the 21st Century Information Infrastructure.

The report pointedly notes that the future of our society depends heavily on information technology, and presents a compelling case that economic and social benefits of IT cannot be realized without massive, continuing investment in IT research and development.

PITAC frames the research needs through the pull of "Grand Challenge Transformations" in the way we communicate, deal with information, learn, transact business, carry on work, deal with health care, design and manufacture, conduct research, maintain our environment, and provide governmental services and information.

The research agenda advanced by PITAC has four principal themes addressing both the future information infrastructure and the people who use it:

  • Developing software that is reliable, adaptable and predictable;
  • Building data networks that are flexible and scalable, to meet widespread demands, as well as the software applications that exploit the capabilities of the hardware;
  • Creating high-end computing systems for researchers and industry, and to support critical national interests;
  • Understanding and evaluating socio-economic impacts of IT, especially on the workforce.

This agenda clearly calls for important research in computer science and social science. Equally essential for developing the new infrastructure are new statistical methods, created in collaboration with the other disciplines, to work in the staggeringly complex settings of the future. Scalable statistical techniques are needed to integrate vast amounts of data of disparate types (laboratory experiments, observational data and output of numerical simulation models) and qualities with each other and with models having a range of capabilities and structures. Risks and uncertainties must be quantified. To date, none of these areas has received the necessary attention.

Formulating and addressing the details of the research are challenges to statistics research and researchers, and to the cross-disciplinary teams in which they join. These challenges must be taken on.

There is great potential in Centers (labeled in § 4.1 of PITAC as Expedition Centers) focused on specific disciplinary themes and making use of testbeds and simulations to explore new ideas and technologies. Establishing one Center, or a significant component of a Center, that emphasizes statistical aspects and interacts closely with other Centers would be an imaginative move to attain the goals of the report. Such a Center would forge the cross-disciplinary teams needed to do the research. In addition, it could stimulate the creation of innovative educational programs to produce scientists trained equally in statistics and computer science, whose role in the research will be central.

2. Specific Recommendations in PITAC

The examples below illustrate intimate connections between statistics and other sciences that are important for carrying out the PITAC recommendations.

Software. PITAC singles out the complexity of large software systems as a key obstacle: such systems cannot be described, nor can their behavior (for example, change with time or in consequence of new requirements) be predicted. Absent the ability to describe and predict, software is difficult or impossible to change or test, and scientifically based design of software remains a remote objective.

The necessary statistical response builds on but goes beyond paths laid out in the National Research Council report Statistical Software Engineering (National Academy Press, Washington, 1996). Data about software and software development organizations are obtainable through change management systems and other sources. Tools, visualizations and inference algorithms that cope with software change data, perfective maintenance (restructuring of "decayed" legacy code without adding new functionality, in order to facilitate future changes), and prediction of future changes and costs are being developed, but must be extended to allow for additional complexity. Scalable algorithms that characterize and quantify risk and uncertainty will be a principal contribution of statistics.
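As a toy illustration of the kind of prediction involved (a minimal sketch with invented monthly change counts; neither the data nor the model comes from the report, and real change-management records would demand far richer methods), one might fit a simple trend to change volume and extrapolate future effort:

```python
# Minimal sketch: fit a least-squares linear trend to monthly change
# counts from a (hypothetical) change management system and extrapolate.

def fit_trend(counts):
    """Ordinary least-squares line through (month index, change count)."""
    n = len(counts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(counts, months_ahead):
    """Extrapolate the fitted trend to forecast future change volume."""
    slope, intercept = fit_trend(counts)
    return intercept + slope * (len(counts) - 1 + months_ahead)

# Hypothetical monthly counts of changes to a legacy module
monthly_changes = [12, 15, 14, 19, 22, 21, 25, 28]
forecast = predict(monthly_changes, months_ahead=3)
```

A rising fitted slope here would signal growing maintenance burden; quantifying the uncertainty of such forecasts is precisely where the scalable statistical algorithms called for above enter.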

Data Networks. To scale the information infrastructure, PITAC calls for research to build and simulate models of large, complex, geographically distributed systems, and to create tools for network measurement and management. Achieving these goals requires understanding data (IP) networks to the same depth and in the same detail as we understand circuit-switched telephone networks, in order to:

  • Develop strategies for instrumentation and data collection;
  • Construct basic characterizations and visualizations that capture essential high-level temporal and spatial network behavior, as well as extreme behavior;
  • Quantify changes over time;
  • Build software tools to monitor and evaluate solutions to acute and chronic network operation problems, as well as monitor and maintain quality of service.

In conjunction with expertise on databases, network protocols and operations, and visualization, statistical advances are needed to address these issues as well as to cope with intrusions, inaccurate data, and large numbers of transactions.
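One way such a characterization might begin (a hedged sketch; the traffic counts and threshold rule are invented for illustration and are not drawn from the report) is to bin traffic by time interval and flag extreme intervals against a robust, median-based baseline:

```python
# Sketch: summarize per-interval traffic and flag extreme intervals
# using median + k * MAD (median absolute deviation). Data are invented.

def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def flag_extremes(counts, k=3.0):
    """Return indices of intervals whose count exceeds median + k * MAD."""
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    threshold = med + k * mad
    return [i for i, c in enumerate(counts) if c > threshold]

# Hypothetical packets-per-minute observed on one link
per_minute = [480, 510, 495, 505, 2400, 490, 515, 500]
spikes = flag_extremes(per_minute)  # the fifth interval stands out
```

Median-based baselines are used here because, unlike means, they are not themselves distorted by the very extremes being sought; real network data, with their heavy tails and strong temporal dependence, require substantially more sophisticated models.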

Human Implications. Assessing socio-economic impacts and effects on the workforce and workplace raises a broad range of issues such as:

  • Defining, measuring and evaluating worker and management interactions in new information technology settings;
  • Quantifying the effects of electronic commerce, on both buyers and sellers;
  • Maintaining privacy while enabling essential access to government (or other) data, a central issue for the digital government of the next century. (The report of a May 1997 Workshop on Research and Development Opportunities in the Federal Information Services is available on-line at www.isi.edu/nsf/prop.html.)

Instrumenting virtual organizations (tied together through networked computer systems) and analyzing the collected data are prerequisites to meaningful study of how workers and management treat information and exchange knowledge. Obtaining essential knowledge without impairing productivity or invading privacy requires techniques for instrumenting software, as well as ways to combine the resultant data with those from targeted surveys of individuals.

Electronic commerce extends virtual organizations to encompass customers and suppliers, creating new kinds of buyer-seller relationships. Faced with global markets that change with startling rapidity, organizations place enormous value on information and time. Securing these competitive advantages calls for new tools to deal with data (such as individual transactions) of unprecedented scale and richness. One example is the use of these data to produce products with specific features for sale in small but otherwise overlooked "niche markets." Statistical visualizations can identify interesting features, and inference tools, coupled with demographic and other databases, can predict whether the features will "sell."

An equally complex task will be to build and operate systems that enable use of confidential data without compromising privacy. This requires a response merging statistical science, computer science and social science to develop disclosure-risk-limited systems that preserve privacy by disseminating information and knowledge derived from data, rather than the data themselves. Key contributions must include models for disclosure risk that reflect such factors as the history of queries and user behavior, and statistical strategies to reduce risks that are estimated to be unacceptably high. NISS is currently initiating an NSF-funded project to develop Web-based tools for dissemination of statistical analyses based on confidential data, without violating confidentiality of individual data elements.
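A toy illustration of one classical disclosure-limitation rule, query-set-size restriction, follows; the records, threshold, and interface are invented for the sketch and do not describe the NISS system:

```python
# Sketch: answer aggregate queries over confidential records only when
# the query set is large enough to limit disclosure risk.
# Classical query-set-size restriction; data and threshold are invented.

MIN_QUERY_SET = 5  # refuse queries matching fewer than 5 records

def mean_salary(records, predicate):
    """Release a mean only if enough records match the predicate."""
    matched = [r["salary"] for r in records if predicate(r)]
    if len(matched) < MIN_QUERY_SET:
        return None  # suppressed: release could identify individuals
    return sum(matched) / len(matched)

confidential = [
    {"dept": "A", "salary": 52000},
    {"dept": "A", "salary": 61000},
    {"dept": "A", "salary": 58000},
    {"dept": "A", "salary": 49000},
    {"dept": "A", "salary": 55000},
    {"dept": "B", "salary": 90000},
]

broad = mean_salary(confidential, lambda r: r["dept"] == "A")   # released
narrow = mean_salary(confidential, lambda r: r["dept"] == "B")  # suppressed
```

A fixed size threshold alone is well known to be defeatable by sequences of overlapping queries, which is exactly why the report's call for risk models reflecting query history and user behavior matters.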

3. Conclusion

The PITAC report describes a huge, vital cross-disciplinary program that must engage statistical science. A great deal is at stake and strong efforts must be initiated to spur the research and develop the human resources required.

Acknowledgment

This commentary was prepared with the encouragement and advice of leaders in the statistics community, especially Stephen E. Fienberg, Maurice Falk University Professor of Statistics and Social Science, Carnegie Mellon University, and President, Institute of Mathematical Statistics; Sallie Keller-McNulty, Statistics Group Leader Los Alamos National Laboratory; and Jon Kettenring, Executive Director, Mathematical Sciences Research Center, Telcordia Technologies.