NISS Affiliates Workshop on Statistics and Counterterrorism

Saturday, November 20, 2004 - 7:30am to 2:30pm

Robert W. Anthony, Institute for Defense Analyses, Alexandria, VA

David Banks, Duke University

Vicki Bier, University of Wisconsin

Amy Braverman* and Ken Hurst, Jet Propulsion Laboratory

Howard Burkom [1], Yevgeniy Elbert* [2], Sean Murphy [1], Jacqueline Coberly [1], Kathy Hurt-Mullen [3]
[1] The Johns Hopkins Applied Physics Laboratory, [2] Walter Reed Army Institute of Research, [3] Montgomery County Department of Health

Daniel B. Carr, George Mason University

Ron Fricker, RAND

Colin Goodall*, Arnold Lent, Sylvia Halasz, Deepak Agarwal, Simon Tse, Guy Jacobson
AT&T Labs-Research

Bernard Harris, University of Wisconsin-Madison

Nick Hengartner, Los Alamos National Laboratory

Karen Kafadar* and Edward J. Wegman, University of Colorado-Denver and George Mason University

Dan Latham* and Paul H. Smith, LexisNexis/Seisint

Xiaodong Lin, University of Cincinnati

Sinjini Mitra, Carnegie Mellon University

G. P. Patil, Penn State Center for Statistical Ecology and Environmental Statistics

Paul Pulliam, Research Triangle Institute

William Seltzer, Fordham University

Galit Shmueli, University of Maryland

Nancy L. Spruill, Office of the Under Secretary of Defense for Acquisition, Technology and Logistics

 

An Empirical Model of the Psychology of Deterrence: Reality Does Not Conform to Theory

Robert W. Anthony, Institute for Defense Analyses, Alexandria, VA

Extensive interview data obtained from incarcerated drug smugglers are analyzed to derive an empirical model of the psychology of deterrence. The model is calibrated and validated by independent data characterizing the effectiveness of various initiatives to stem the flow of cocaine from Andean countries to the U.S. Additional validation is provided by Coast Guard data on their policing of fishery restrictions, as well as accident data for the "extreme sport" of automobile driving during the period from 1900 to 1910. The model, inexplicably, takes a very simple mathematical form that is logically inconsistent with Expected Utility Theory and other prevalent formulations of the psychology of risk taking behavior. Findings are related to the problem of deterring terrorists, and alternative theories for explaining the observed smuggler behavior are discussed.

 

Feasibility Considerations in Three Kinds of Data Mining

David Banks, Duke University

This talk explores some of the problems that arise in prospective data mining to discover anomalies in public and administrative databases that may be associated with terrorist activity. We consider applications in biometrics, computer threats, and generic record linkage. There appear to be at least three different kinds of data mining problems in this area, and each presents unique features and research issues.


Game-Theoretic and Reliability Methods in Counter-Terrorism and Security

Vicki Bier, University of Wisconsin

In dealing with rare and extreme events (such as disasters or failures of highly redundant engineered systems), for which empirical data is likely to be sparse, classical statistical methods have been of relatively little use. Instead, reliability and risk analysis techniques are commonly used to decompose complex systems into elements for which larger amounts of empirical data may be available. However, the routine application of reliability and risk analysis by itself is not adequate in the security domain. Protecting against intentional attacks is fundamentally different from protecting against accidents or acts of nature (which have been the more usual focus of engineering risk analysis). In particular, an intelligent and adaptable adversary may adopt a different offensive strategy to circumvent or disable our protective security measures. Game theory provides a way of taking this into account analytically. Thus, security and counter-terrorism require a combination of techniques that have not usually been used in tandem. This paper discusses approaches for applying risk and reliability analysis and game theory to the problem of defending complex systems against attacks by knowledgeable and adaptable adversaries. The results of such work yield insights into the nature of optimal defensive investments in networked systems, to obtain the best trade-off between the cost of the investments and the security of the resulting systems.
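As a toy illustration of the trade-off described above (a hedged sketch, not Bier's model): suppose an attack on target i succeeds with probability exp(-k_i * x_i) given defensive investment x_i, and the attacker strikes whichever target offers the highest expected damage. Minimizing the worst-case expected loss under a budget constraint then equalizes the post-investment attractiveness of the defended targets. All numbers below are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical illustration only: three targets, exponential effectiveness of investment.
values = np.array([10.0, 6.0, 3.0])   # assumed target values v_i
effect = np.array([0.8, 1.0, 1.2])    # assumed effectiveness k_i of each unit invested
budget = 5.0

def spend_needed(loss_level):
    """Total investment required to drive every target's expected loss
    v_i * exp(-k_i * x_i) down to loss_level (targets already below it get zero)."""
    x = np.maximum(np.log(values / loss_level) / effect, 0.0)
    return x.sum()

# The minimal worst-case loss is the level whose required spend exactly equals the budget.
loss_star = brentq(lambda L: spend_needed(L) - budget, 1e-6, values.max())
x_star = np.maximum(np.log(values / loss_star) / effect, 0.0)
print("allocation:", np.round(x_star, 2), "worst-case expected loss:", round(loss_star, 3))
```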


Acquisition, Analysis and Dissemination of Earth Observing System Data Sets: Applications to Homeland Security

Amy Braverman* and Ken Hurst, Jet Propulsion Laboratory

Massive data sets are now commonplace in many domains including Earth Science remote sensing and Homeland Security. At first blush the two applications seem quite different, but the Earth Science paradigm may in fact provide a good model for dealing with massive Homeland Security data sets. Over the years NASA has developed a large, complex data processing and distribution system. Our current challenge is to implement and conduct science analysis within this framework, where discrete chunks of data (e.g., individual orbits) must be dealt with as they pass through the system. We call this semi-streaming data analysis because, unlike truly streaming data, we receive one chunk at a time rather than one datum at a time. In this talk we describe the NASA paradigm and discuss how it might provide lessons for Homeland Security applications.
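As a small illustration of the semi-streaming idea (a sketch only, not JPL's pipeline): running summary statistics can be merged one chunk at a time with a standard parallel combining rule, so no chunk needs to be retained after it passes through the system.

```python
import numpy as np

def combine(stats_a, stats_b):
    """Merge two (count, mean, M2) summaries, where M2 is the sum of squared
    deviations from the mean (the standard parallel/pairwise update)."""
    n_a, mean_a, m2_a = stats_a
    n_b, mean_b, m2_b = stats_b
    n = n_a + n_b
    delta = mean_b - mean_a
    return n, mean_a + delta * n_b / n, m2_a + m2_b + delta**2 * n_a * n_b / n

# Simulated measurements arriving in 25 chunks (e.g., one chunk per orbit).
running = (0, 0.0, 0.0)
for chunk in np.array_split(np.random.default_rng(0).normal(size=10_000), 25):
    chunk_stats = (chunk.size, chunk.mean(), ((chunk - chunk.mean()) ** 2).sum())
    running = combine(running, chunk_stats)

n, mean, m2 = running
print(f"n={n}, mean={mean:.4f}, variance={m2 / (n - 1):.4f}")
```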


Public Health Monitoring Tools for Multiple Data Streams

Howard Burkom [1], Yevgeniy Elbert* [2], Sean Murphy [1], Jacqueline Coberly [1], Kathy Hurt-Mullen [3]

[1] The Johns Hopkins Applied Physics Laboratory, [2] Walter Reed Army Institute of Research, [3] Montgomery County Department of Health

As concerns have increased regarding both bioterrorism and novel natural infectious disease threats such as SARS and West Nile virus, advances in medical informatics are making a growing array of data sources available to the epidemiologist for routine, prospective monitoring of public health. The synthesis of this evidence requires tools that can find anomalies in single data streams as well as in various stream combinations while maintaining manageable false alarm rates. This presentation discusses the use of statistical hypothesis tests for this purpose.

Two testing problems from this complex data environment are considered. The first is a test of multiple hypotheses applied separately to parallel data streams, such as counts or rates of influenza-like illness from multiple treatment facilities or counties. The challenge is to monitor for anomalies without generating excessive alarms from multiple testing. We discuss combination methods, including the Simes modification to Bonferroni's rule and False Discovery Rate analysis, and present the results of applying them to authentic data streams.
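As a sketch of one of the combination methods named above, the following applies the Benjamini-Hochberg false discovery rate step-up rule to a set of p-values from parallel streams; the p-values and the rate q are illustrative, not from the authors' data.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of streams flagged by the BH step-up rule at FDR q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    flagged = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()      # largest i with p_(i) <= q*i/m
        flagged[order[: k + 1]] = True      # reject that p-value and all smaller ones
    return flagged

# Illustrative daily p-values from tests on 8 county-level influenza-like-illness series.
p = [0.001, 0.21, 0.03, 0.74, 0.004, 0.38, 0.049, 0.90]
print(benjamini_hochberg(p, q=0.05))
```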

The second problem is the combined application of multiple, disparate data sources to address a single null hypothesis, such as the absence of anomalies in gastrointestinal illness using syndromic counts of hospital admissions, physician office visits, and pharmacy sales. We discuss both multiple univariate approaches combining the results of separate testing and multivariate approaches treating the separate data types in a single algorithm. Multiple univariate methods based on rules of Fisher and Edgington are designed for application to independent time series, and their performance on surveillance data sources is presented. Results feature both a simulated set of streams and recent, actual data from a large county, including known outbreaks.
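For concreteness, here is a minimal sketch of Fisher's and Edgington's combination rules for independent p-values from disparate sources; the three p-values are invented for illustration.

```python
import numpy as np
from math import comb, factorial, floor
from scipy import stats

def fisher_combined(p_values):
    """Fisher's method: -2 * sum(log p) ~ chi-square with 2k df under the null."""
    p = np.asarray(p_values, dtype=float)
    return stats.chi2.sf(-2.0 * np.log(p).sum(), df=2 * p.size)

def edgington_combined(p_values):
    """Edgington's additive method: Irwin-Hall tail probability of sum(p)."""
    p = np.asarray(p_values, dtype=float)
    s, k = p.sum(), p.size
    return sum((-1) ** j * comb(k, j) * (s - j) ** k
               for j in range(floor(s) + 1)) / factorial(k)

# Illustrative daily p-values from three syndromic sources
# (hospital admissions, physician office visits, pharmacy sales).
p = [0.04, 0.11, 0.08]
print("Fisher:", round(fisher_combined(p), 4))
print("Edgington:", round(edgington_combined(p), 4))
```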

Truly multivariate methods offer the possibility of increased sensitivity from the consensus of faint evidence in separate data streams, but they must be proven sufficiently robust to customary variation in the cross-correlation among data streams to ensure that the advantage of this consensus signal is not overwhelmed by multivariate noise. We present a method using the regression control charts of Mandel and Hawkins and compare its results to multiple univariate results. For focused surveillance, this method may be used to build multisource matched filters based on plausible outbreak scenarios.
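A minimal sketch in the spirit of the regression control chart mentioned above, not the authors' implementation: one data stream is regressed on the others over a baseline window, and its standardized residuals are then monitored against 3-sigma limits. The data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: daily GI counts from three sources sharing a common seasonal
# driver; the monitored source has an injected bump on the final 5 days.
days = 200
season = 50 + 10 * np.sin(2 * np.pi * np.arange(days) / 365)
X = np.column_stack([season + rng.normal(0, 3, days) for _ in range(2)])
y = season + rng.normal(0, 3, days)
y[-5:] += 15                                    # simulated outbreak signal

# Fit the regression on a baseline window, then monitor residuals on all days.
baseline = slice(0, 150)
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(150), X[baseline]]),
                           y[baseline], rcond=None)
resid = y - np.column_stack([np.ones(days), X]) @ coef
sigma = resid[baseline].std(ddof=X.shape[1] + 1)
print("alarm days:", np.nonzero(np.abs(resid) > 3 * sigma)[0])
```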

Concise combinations of these methods must be chosen to facilitate a systematic approach to the overall health surveillance problem.


Notions of Visual Analytics and Dynamically Conditioned Choropleth Maps

Daniel B. Carr, George Mason University

The general purpose of visual analytics is to harness the power of analysis algorithms and strategically couple this with the power of human perceptual and cognitive abilities in order to enhance and extend human analytic thought processes. Specific notions of visual analytics depend on the context. The National Visualization and Analytics Center (NVAC), established by the Department of Homeland Security, has enlisted an expert panel to define visual analytics in their context and establish a research agenda. The panel report is to appear next year. This talk presents a few approved slides about NVAC and visual analytics. The talk then proceeds with some personal notions of visual analytics that are embodied in shareware called CCmaps. The notions include dynamic statistical annotation, searches for suggestive multivariate slider settings, and visual analysis management features such as state-space snapshots for live restoration.


Comparing Univariate and Multivariate Methods for Syndromic Surveillance

Ron Fricker, RAND

Disease surveillance is critical for detecting and responding to naturally emerging infections as well as biological terrorism. Because the first signs of an attack with a biological agent may be a change in disease patterns rather than reports of firmly diagnosed, specific diseases, syndromic surveillance has been developed: the real-time gathering and analysis of data on patients seeking care with certain syndromes that may be early signs of bioterrorist activity.

This talk will compare and contrast the performance of a number of univariate and multivariate statistical methods useful for syndromic surveillance. Drawing on the statistical process control literature, it compares Shewhart's method with variants of Hotelling's T-squared method and with various univariate and multivariate generalizations of the CUSUM. One would expect multivariate methods to generally outperform univariate methods, but do they? And if they do, is the gain in performance sufficient to warrant the additional complexity they entail? The comparisons are conducted both via simulation and on real data from seven hospitals in the Washington, D.C. area.
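As a point of reference for the methods being compared (a sketch with made-up parameters, not the talk's implementations), the following shows a one-sided univariate CUSUM and Hotelling's T-squared statistic computed against a known baseline.

```python
import numpy as np

def cusum(x, mean0, sd, k=0.5, h=5.0):
    """One-sided upper CUSUM in standardized units; returns indices of alarms."""
    z = (np.asarray(x, dtype=float) - mean0) / sd
    s, alarms = 0.0, []
    for t, zt in enumerate(z):
        s = max(0.0, s + zt - k)
        if s > h:
            alarms.append(t)
            s = 0.0                       # restart after each alarm
    return alarms

def hotelling_t2(X, mean0, cov0):
    """Hotelling T^2 of each multivariate observation against a known baseline."""
    d = np.asarray(X) - mean0
    return np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov0), d)

rng = np.random.default_rng(2)
counts = rng.normal(30.0, 5.0, 120)
counts[100:] += 6                          # simulated shift in one syndrome series
print("CUSUM alarms:", cusum(counts, 30.0, 5.0))

mean0, cov0 = np.array([30.0, 20.0]), np.array([[25.0, 10.0], [10.0, 16.0]])
X = rng.multivariate_normal(mean0, cov0, size=120)
X[100:] += [4.0, 3.0]
t2 = hotelling_t2(X, mean0, cov0)
print("T^2 signals:", np.nonzero(t2 > 13.8)[0])   # 13.8 ~ chi-square(2) 0.999 quantile
```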


Performance-Critical Anomaly Detection

Colin Goodall*, Arnold Lent, Sylvia Halasz, Deepak Agarwal, Simon Tse, Guy Jacobson, AT&T Labs-Research

In health surveillance, five areas are important to the success of performance-critical anomaly detection: 1) reliable health data that is also geo-temporally and demographically representative; 2) efficient and real-time large-scale information-processing capability; 3) comprehensive and tunable anomaly-detection algorithms; 4) a flexible platform for investigation and management of anomalies; 5) alert distribution and management.

The focus of this presentation is on anomaly detection methods, including a new adaptation of Bayesian shrinkage estimation. We monitor a stream of events for anomalous behavior relative to the historic pattern of counts. Each event is cross-classified into one of potentially millions of cells. We obtain reliable estimates of frequency ratios for all events under consideration, even the ones that are very rare. The methods are structured to follow hierarchies of data types, geographies (state, ZIP3, variable-radius disks, ZIP), and time.
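A minimal sketch of one way Bayesian shrinkage can be applied in this setting (illustrative only, not AT&T's algorithm): with a Gamma prior on each cell's rate ratio and Poisson counts, the posterior-mean ratio (observed + a) / (expected + b) pulls rare cells toward the overall rate, giving stable estimates even for cells with tiny baselines.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: observed vs. expected (historical baseline) counts in many cells.
expected = rng.gamma(shape=0.5, scale=20.0, size=100_000)    # most cells are rare
observed = rng.poisson(expected)
observed[:3] = (expected[:3] * 4 + 5).astype(int)            # a few injected anomalies

# Gamma(a, b) prior on the rate ratio r_i, with observed_i ~ Poisson(r_i * expected_i);
# the posterior mean shrinks the raw ratio toward a / b = 1.
a, b = 2.0, 2.0                       # assumed prior; in practice estimated from history
shrunk = (observed + a) / (expected + b)
raw = observed / np.maximum(expected, 1e-9)

top = np.argsort(shrunk)[::-1][:5]
print("top cells:", top)
print("shrunk ratios:", np.round(shrunk[top], 2), "raw ratios:", np.round(raw[top], 2))
```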

A case-management capability is incorporated into the anomaly-detection system to provide an exploratory environment for epidemiologists, subject matter experts, data experts and statistical experts to investigate and manage anomalies. Cases (a single anomaly or a group of anomalies) may be produced very quickly, depending on the data flow. Finally, an alert-management tool is integrated with the case management tool to provide alert distribution to and responses from a wide array of responsible personnel.

For years, AT&T Labs technologies have been developed for mission-critical production applications in managing call volume, IP traffic, and network performance. These tools, including anomaly-detection techniques, have found practical, efficient, sustainable deployment throughout AT&T operations, with hundreds of desktops accessible at multiple network operation centers as well as at national and global operations centers. They allow rapid reaction, within minutes or hours, to patterns of aberrant or unauthorized use of network resources within data streams of billions of records daily. These operations and technologies are analogous to those that will be performed for time-critical, data-analytic-intensive health-related anomaly detection.


An Interval Estimation Procedure for Probabilities of Rare Events

Bernard Harris, University of Wisconsin-Madison

A procedure is described which, under specified technical conditions, provides "approximately optimal" interval estimates for probabilities of rare events. The procedure has potential application to evaluating the efficacy of security procedures using probabilistic risk assessment (PRA) methodology.
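The abstract does not spell out the procedure; for context only, the sketch below computes a standard exact (Clopper-Pearson) interval for a rare-event probability, whose upper limit for zero observed events is close to the familiar 3/n "rule of three". It is not the procedure of the talk.

```python
from scipy import stats

def clopper_pearson(successes, trials, conf=0.95):
    """Exact two-sided binomial confidence interval for an event probability."""
    alpha = 1 - conf
    lo = 0.0 if successes == 0 else stats.beta.ppf(alpha / 2, successes, trials - successes + 1)
    hi = 1.0 if successes == trials else stats.beta.ppf(1 - alpha / 2, successes + 1, trials - successes)
    return lo, hi

# Zero detected breaches in 1,000 independent security screenings.
print(clopper_pearson(0, 1000))   # upper limit ~0.0037, near 3/1000
print(clopper_pearson(2, 1000))
```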

 

Bionet and the Challenges of Academic Research in Homeland Security

Nick Hengartner, Los Alamos National Laboratory

This talk will present preliminary results on a bionet study to assess the benefits of mobile versus fixed collector design.


Using Graphical Displays to Monitor Internet Traffic Data for Potential Cyberattacks

Karen Kafadar* and Edward J. Wegman
University of Colorado-Denver and George Mason University

The analysis of massive, high-volume data sets stresses usual statistical software systems and requires new ways of drawing inferences beyond the conventional paradigm (optimal estimation of parameters from a hypothesized distribution), since the entire data set often cannot be read into the software system. Internet traffic data raise additional challenges: nearly continuous streams of observations from multiple computer systems that interact and exchange information in nondeterministic ways. These features invite cyber attacks, which can be introduced and spread rapidly, and which thus require methods that can very rapidly detect potential departures from "typical" behavior. This talk presents a variety of "visual analytics": graphical displays from which inferences about both "typical" and "exotic" behavior can be drawn quickly, even by the novice user. Statisticians must be involved in the development of these displays, with attention to the needs and abilities of the people who will rely on them to detect cyberattacks. We describe components of internet traffic, propose some methods of visualizing them, and illustrate these methods on data collected at a university network. Some open problems in studying high-volume data in general are mentioned.

 

Role for Statistical Analysis in Risk Management

Dan Latham* and Paul H. Smith, LexisNexis/Seisint

In this Information Age, federal and state governments are using increasingly large amounts of disparate, diverse, and unverified data to derive sufficient, clear, and accurate knowledge to enable informed decisions in risk management affecting human life and welfare. Statistics has always played a role in the analysis of data, especially large data pools, in order to derive credible information. In this paper we explore examples of the analysis difficulties found in the Law Enforcement and Intelligence Communities, and offer a problem set that could benefit from the most powerful and most reliable statistical methods.

 

Privacy Preserving Statistical Analysis and its Application to Counter Terrorism

Xiaodong Lin, University of Cincinnati

Information sharing is critical to coordinating the efforts of different federal agencies. Privacy and security considerations can prevent such sharing, derailing subsequent statistical analyses. In this talk we will introduce several secure multiparty computation protocols and show how to build complex privacy-preserving statistical analysis models from them. We will discuss privacy-preserving regression and record linkage procedures for both vertically and horizontally partitioned data, assuming that the agencies are "semi-honest". The applications of these procedures to homeland security will also be discussed.
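As a small illustration of the kind of building block such protocols rest on (a sketch, not the protocols of the talk): an additive secret-sharing secure sum among semi-honest parties reveals only the total, never any individual agency's value.

```python
import secrets

MODULUS = 2**61 - 1   # arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Each party shares its value; summing one share from every party per slot
    reveals only the total, never any individual input."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    partial_sums = [sum(all_shares[p][slot] for p in range(n)) % MODULUS
                    for slot in range(n)]
    return sum(partial_sums) % MODULUS

# Three agencies each hold a confidential case count; only the total is revealed.
print(secure_sum([120, 45, 310]))      # -> 475
```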

 

The Role of Statistics in Biometric Authentication based on Facial Images

Sinjini Mitra, Carnegie Mellon University

In the modern electronic information age, there is an ever-growing need to authenticate and identify individuals to ensure the security of a system. Biometric authentication, based on a person's unique biological traits (physical or behavioral), is more reliable than traditional PINs and ID cards and is less prone to fraud. The field of biometrics has been growing exponentially in recent years (especially after the attacks of September 11, 2001), and the rapidly evolving technology is being widely used in forensics for criminal identification, in law enforcement, and in immigration. Moreover, the recent practice of recording biometric information (photo and fingerprint) of foreign passengers at all U.S. airports, as well as the proposed inclusion of digitized photos in passports, shows the growing importance of biometrics in U.S. homeland security.

The present paper reports some initial work on establishing a firmer statistical foundation for face authentication systems and on verifying the accuracy of methods proposed in engineering and computer science, which are mostly empirical in nature. Given the wide usage and sensitive nature of these applications today, rigorous authentication systems are imperative, since inaccurate results may have a drastic impact. We first present an existing authentication system based on a linear filter called the Minimum Average Correlation Energy (MACE) filter. It produces impressive results, but despite its success it suffers from a number of drawbacks owing to its non-model-based framework. We describe simple statistical tools that can be used to develop its statistical properties and make it more rigorous and useful in practice. We then propose a model-based system built entirely in the frequency domain by exploiting the well-known significance of the phase component of an image spectrum, a novel approach to authentication. Some associated challenges are also discussed.


Surveillance Geoinformatics of Hotspot Detection, Prioritization, and Early Warning

G. P. Patil, Penn State Center for Statistical Ecology and Environmental Statistics

Geoinformatic surveillance for spatial and temporal hotspot detection and prioritization is a critical need for the 21st century. A hotspot can mean an unusual phenomenon, anomaly, aberration, outbreak, elevated cluster, or critical area. The declared need may be for monitoring, etiology, management, or early warning. The responsible factors may be natural, accidental or intentional, with relevance to both infrastructure and homeland security.

This presentation describes a multi-disciplinary research project based on novel methods and tools for hotspot detection and prioritization, driven by a wide variety of case studies of potential interest to several agencies. These case studies deal with critical societal issues, such as public health, ecosystem health, biosecurity, biosurveillance, robotic networks, social networks, sensor networks, wireless networks, video mining, homeland security, and early warning.

Our methodology involves an innovation of the popular circle-based spatial scan statistic methodology. In particular, it employs the notion of an upper level set and is accordingly called the upper level set scan statistic system, pointing to the next generation of sophisticated analytical and computational systems, effective for detecting arbitrarily shaped hotspots along spatio-temporal dimensions. We also propose a novel prioritization scheme based on multiple indicator and stakeholder criteria, without having to integrate the indicators into an index, using Hasse diagrams and partially ordered sets. It is accordingly called the poset prioritization and ranking system.
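A minimal sketch of the upper level set idea (not the full ULS scan statistic, which evaluates a likelihood over the zones generated at every threshold): cells whose rates exceed a threshold are retained, and the connected components of that set form candidate, arbitrarily shaped hotspot zones. The grid and rates below are simulated.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(4)

# Illustrative disease rates on a 20x20 grid of cells, with one elevated patch.
rates = rng.gamma(shape=2.0, scale=1.0, size=(20, 20))
rates[5:9, 12:16] += 4.0

# Upper level set at a chosen threshold: keep cells above it; the connected
# components of that set are the candidate (arbitrarily shaped) zones.
threshold = np.quantile(rates, 0.95)
labels, n_zones = ndimage.label(rates > threshold)
sizes = np.bincount(labels.ravel())[1:]
print(f"{n_zones} candidate zones; largest has {sizes.max()} cells")
```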

We propose a cross-disciplinary collaboration to design and build the prototype system for a surveillance infrastructure of hotspot detection and prioritization. The methodological toolbox and software toolkit developed will support and leverage the core missions of several agencies as well as their interactive counterparts in society. The research advances in the allied sciences and technologies necessary to make such a system work are the thrust of this five-year project.

The project will have a dual disciplinary and cross-disciplinary thrust. Dialogues and discussions will be particularly welcome, leading potentially to well-considered synergistic case studies. The collaborative case studies are expected to be conceptual, structural, methodological, computational, applied, developmental, refinement-, validation-, and/or visualization-oriented in their individual thrusts.

The following websites have additional information:

(1) http://www.stat.psu.edu/hotspots/

(2) http://www.stat.psu.edu/~gpp/


Designing Registries of Persons Exposed to Emergency Events

Paul Pulliam, Research Triangle Institute

Since 2002, the Agency for Toxic Substances and Disease Registry (ATSDR) has established two environmental exposure registry programs that will assemble and track cohorts of individuals exposed to emergency events. The World Trade Center Health Registry (WTCHR) is designed to assess the health effects of the World Trade Center (WTC) disaster of September 11, 2001. The WTC Registry will follow participants to evaluate short- and long-term physical and mental health effects, including those resulting from exposures to the dust, fumes, and airborne particulates on 9/11 and in the ensuing weeks as the fires burned. Persons who may enroll in the Registry include those who were in lower Manhattan on September 11, 2001; residents south of Canal Street; school children and staff enrolled in schools south of Canal Street; and persons involved in rescue, recovery, clean-up, and other work at the WTC site or the Staten Island Recovery Operations between September 11, 2001 and June 30, 2002. More recently, the Rapid Response Registry (RRR) is being designed as a public health response to an emergency chemical, radiological, or biological event; it seeks to enroll individuals immediately following an event in order to help identify the future needs of exposed cohorts and to facilitate investigations of possible health effects of exposures.

Both of these registries face methodological issues in identifying and finding their cohorts and in measuring possible coverage error associated with the registry. The populations exposed in the WTC disaster and in possible RRR events are sprawling and heterogeneous, including residents, responders, and other special populations, at a time of high dispersion from the site of an event. Both registries face the challenge of maximizing coverage of individuals eligible for the registry through multiple methods of enrolling potential registrants. This presentation will compare some of the solutions that these two registries have designed to identify and trace exposed individuals and to measure possible coverage error in establishing the registries.


Statistics and Counterterrorism: The Role of Law, Policy, and Ethics

William Seltzer, Fordham University

Patriotism and professional pride motivate many statisticians to demonstrate how statistics can also serve in the war on terrorism. However, professional responsibility, concerns over fundamental human rights (which for Americans is another way of characterizing patriotism), and even enlightened self-interest suggest that caution is warranted in the unrestrained use of statistical methods, outputs, and personnel in counterterrorism efforts.

Statistical outputs assisted in the destructiveness and collateral damage of General Sherman's march through the South in our own Civil War. Similarly, statistical methods, outputs, and personnel assisted in the round-up and detention of Japanese Americans in World War II and appear to be assisting in the targeting of Arab Americans today for investigation, detention, and deportation. In each of these examples the federal statistical system - particularly the decennial population census - has played a central role.

Given this background, the talk will explore the legal, policy, and ethical issues involved in such uses and specifically comment on topics such as the fragility of the federal statistical system (at least in terms of public confidence), fundamental differences between individual and statistical surveillance, and an explicit examination of the trade-offs between false positives and individual harm, on the one hand, and false negatives and the risk of societal harm, on the other.

The bottom line of the argument to be presented is that the involvement of statistics in support of counterterrorism needs to be selective rather than indiscriminate. The question at hand is how statistical methods and data, and the professional expertise of statisticians, can be used effectively and ethically to fight terror. The purpose of discussing the subject is to encourage each statistician and user of statistics involved to engage in a thoughtful consideration of the legal, policy, and ethical issues that arise.


Current and Potential Statistical Methods for Bio-Surveillance

Galit Shmueli, University of Maryland (with Stephen Fienberg, Carnegie Mellon University, and
Bernard Dillard, University of Maryland)

The goal of modern bio-surveillance systems is the rapid detection of a disease outbreak, whether from a natural cause or a bio-terror attack. To achieve this goal, data are routinely collected from multiple sources on multiple data streams that are considered to carry early signals of an outbreak. Such data tend to be frequent (at least daily) and can vary widely within a data source, and even more so across data sources.

Current surveillance methods rely mostly on traditional statistical monitoring methods such as statistical process control and autoregressive time series models. However, these methods are not always suitable for monitoring non-traditional bio-surveillance data. Assumptions such as normality, independence, and stationarity are the backbone of such methods, whereas the types of data that are monitored in bio-surveillance almost always violate these assumptions.

Another challenge to current surveillance methods is to monitor multiple data streams (from within and across multiple sources) in a multivariate fashion. Most of the actual monitoring is typically done at the univariate level.

In this talk we discuss the suitability of statistical methods that are currently used in biosurveillance in light of the data characteristics, system requirements, and goal at hand. We then describe several univariate and multivariate monitoring methods that have evolved in other fields and are potentially useful in this context. We illustrate one type of these proposed methods using real data.
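As one concrete point of reference (a sketch on simulated data, not the methods proposed in the talk): an EWMA chart applied to day-of-week-adjusted daily counts illustrates both the kind of monitoring in current use and the adjustments such data typically require.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative daily syndromic counts with a strong day-of-week effect
# and an injected outbreak in the final week.
days = 365
dow_effect = np.tile([1.3, 1.1, 1.0, 1.0, 1.0, 0.7, 0.6], 53)[:days]
counts = rng.poisson(40 * dow_effect)
counts[-7:] += 15

# Remove the day-of-week pattern using a baseline period, then run an EWMA chart.
baseline = slice(0, 300)
dow = np.arange(days) % 7
dow_means = np.array([counts[baseline][dow[baseline] == d].mean() for d in range(7)])
adjusted = counts / dow_means[dow]

lam, target, sd = 0.3, adjusted[baseline].mean(), adjusted[baseline].std(ddof=1)
limit = target + 3 * sd * np.sqrt(lam / (2 - lam))   # asymptotic EWMA control limit
ewma, alarms = target, []
for t, x in enumerate(adjusted):
    ewma = lam * x + (1 - lam) * ewma
    if ewma > limit:
        alarms.append(t)
print("first alarm day:", alarms[0] if alarms else None)
```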


Giving the Warfighter the Tools for Counterterrorism

Nancy L. Spruill, Office of the Under Secretary of Defense for Acquisition, Technology and Logistics

DoD is pursuing several approaches in the acquisition, technology, and logistics areas to provide the warfighter the tools needed for countering terrorism. I will briefly discuss several, including: 1) making the acquisition process more flexible; 2) using technology to meet warfighter needs faster and more cheaply; 3) making testing more realistic and faster; and 4) making logistics more reliable through the use of technology, such as universal ID and radio-frequency ID. The Department has also recognized the need for individual privacy in data sharing, and I will briefly discuss recommendations from an external panel and the resulting DoD actions.

 

Location

Stern School of Business, New York University, NYC, NY
United States