Greetings for the new year, 2018!
NISS has taken up the challenge of living up to its name: the National Institute of Statistical Sciences. It aims to reach beyond the traditional statistical sciences and become a catalyst for keeping statistical ideas and thinking part of the big data and data science movement. With the ever-increasing size of the data available to analyze and explore, many of the traditional tools of the statistics profession have been supplanted by new algorithms from machine learning and artificial intelligence. These new tools have brought great advances to the analysis of large data sets for prediction and decision-making. However, challenges remain for statisticians to engage as active participants in defining the foundation of data science.
Forums for industry sectors
Another new focus for NISS is to organize gatherings, both meet-ups and workshops, that bring together representatives from various sectors. For example, HR representatives from industry could meet to share common challenges in hiring data scientists and the recommendations they have found effective. Another example would bring together CIOs from a particular industry sector, such as banking or insurance, to discuss best practices for organizing teams and structuring the division.
Workshops and training short courses
NISS is expanding its offering of short courses for members of affiliated groups, individual industry sectors, government agencies, and national laboratories. This past year we sponsored several R & Spark workshops, which provided training on tools for analyzing very large distributed data repositories without the need to combine the data in one central computing center or location. This strategy for analyzing data provides considerable cost savings and privacy protection. Privacy and confidentiality are protected by keeping the source data in its original location, behind its owners’ firewall. See the R & Spark article in this newsletter.
Conferences to address cutting-edge issues
NISS is planning two conferences to bring together experts on data-enabled research and on using evidence-based, real-world experience to address public policy issues. The idea of combining traditional survey techniques with privacy-protected administrative data sets is gaining support. Two recent reports describe the potential benefits in detail and propose strict guidelines for implementing these ideas: National Academies of Sciences, Engineering, and Medicine (2017), Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy, Washington, DC: The National Academies Press; and the bipartisan Commission on Evidence-Based Policymaking (2017), The Promise of Evidence-Based Policymaking. As a result, a bill, H.R. 4174, has already been proposed in Congress to create an agency to implement these ideas. See the Foundations for Evidence-Based Policymaking Act of 2017. This is truly a dramatic change from business as usual. It will require strong advocacy to enact and diligence to guarantee its implementation, while protecting privacy and confidentiality and retaining the public’s confidence that their privacy is being preserved. This new proposal about evidence-based policy will certainly generate strong debate in both the legislative and executive branches, and it will require neutral and independent voices to help reconcile the underlying benefits and concerns. Hopefully, NISS can contribute to this debate.
Curriculum needed to create data science programs
Working closely with our academic affiliates, the ASA, and an international team, NISS plans to organize several regional affiliate meetings to address the need to evolve the curriculum so that it meets the demands of industry and government for trained data scientists. Several statistics departments have been renamed statistics and data science, e.g., at Carnegie Mellon University, the University of Texas at Austin, and Yale. Other statistics departments have simply created new programs in data science, e.g., an M.S. in Data Science at Harvard, a track within the M.S. in Statistics at Stanford, or an option in the MPS in Statistical Science at Cornell. At Penn State, an undergraduate major in data science was initiated as a joint program of three departments (Statistics, Computer Science, and Information Science & Technology) in three different colleges. The statistics profession needs to continue discussing these issues.
There are many advocates for including data science and analytics in the statistical sciences, driven primarily by the size and complexity of many available data sources. Knowledgeable persons saw this trend evolving long before the title data scientist became widespread. David Donoho, in a presentation given at John Tukey’s memorial conference, drew on Tukey’s work to lay out the history and early origins of the importance of data in the evolution of statistical research and practice. Leo Breiman, in an early paper, nicely described the opposing perspectives of statisticians and computer scientists when analyzing data: the former fit models to understand the process, while the latter focus on the success of prediction regardless of the underlying algorithm being used.
Postdocs working with NISS government affiliates
NISS continues its strong postdoc program with the National Center for Education Statistics (NCES) and the National Agricultural Statistics Service (NASS). A session highlighting this ongoing research is scheduled for the SDSS meeting in Reston, VA, May 16-19, 2018. A presentation by Nell Sedransk, joint with Andrew White of NCES, will describe the program, and two NISS postdocs will present their research with NASS.
James L. Rosenberger
Director, NISS