The American Statistical Association invites you to join us at the seventh annual Symposium on Data Science and Statistics in Richmond, VA, June 4–7, 2024. SDSS provides a unique opportunity for data scientists, computer scientists, and statisticians to come together and exchange ideas. SDSS 2024 will offer many occasions to learn about new data tools and methodologies, see data science in action, and network with experts. Sessions will center on the following six topic areas:
Computational Statistics * Practice and Applications * Data Visualization * Statistical Data Science (formerly Machine Learning) * Education * Software & Data Science Technologies
Featured Speakers
Data Journalism
Alyssa Fowers, The Washington Post
Alyssa Fowers is a graphics reporter at The Washington Post. Before becoming a journalist, she worked in data management and analysis for businesses and nonprofits.
Ethics and Fairness in Statistics and Data Science: A Panel Discussion
Caitlin Wylie, University of Virginia
Caitlin Wylie studies underrecognized work and workers in research communities. This includes technicians whose names and work are missing from publications, students who contribute broad knowledge and learning opportunities to research groups, and community members whose expertise about their homes enriches environmental research. She uses qualitative social research methods, including interviews and participant observation.
Elham Tabassi, National Institute of Standards and Technology
Elham Tabassi is a senior research scientist at the National Institute of Standards and Technology and associate director for emerging technologies in the Information Technology Laboratory. She also leads the institute’s Trustworthy and Responsible AI Program, which aims to cultivate trust in the design, development, and use of artificial intelligence technologies. As associate director for emerging technologies, Elham helps the National Institute of Standards and Technology’s leadership and management determine strategic direction for research, development, standards, testing, and evaluation of emerging technologies. She also coordinates interaction related to artificial intelligence with the US research community, US industrial community, international standards community, and federal agencies and provides leadership within the institute.
Matthew D. Rotelli, Eli Lilly and Company
Matthew Rotelli is vice president for the bioethics program at Eli Lilly and Company, where he leads the company’s evaluation of bioethical considerations across the continuum of its research, development, and commercialization activities. He has led diverse disciplines to bring medicine to patients in oncology, immunology, cardiovascular, endocrine, and neuroscience indications and is passionate about making the drug development process more reliable, efficient, and trustworthy. Throughout his career, Rotelli has performed or directed statistical and pharmacokinetic/pharmacodynamic work in all phases of clinical development, including commercialization, pharmacovigilance, and real-world evidence generation. He is a graduate of the Lilly Bioethics Leadership Academy and a member of the American Statistical Association, American Society for Bioethics and Humanities, and Public Responsibility in Medicine and Research. He has also served on or been a member of many committees and working groups focused on pharmacometrics, clinical trials, and ethics.
Research Horizons for AI and CISE
Dilma Da Silva, Texas A&M University
Dilma Da Silva is a systems software researcher with primary research interests in operating systems, distributed computing, and computer science education. She is working on research projects focusing on streaming computing, cloud computing, cybersecurity, and autonomous vehicles and is passionate about broadening participation in computing. Da Silva is a professor and holder of the Ford Motor Company Design Professorship II of the department of computer science and engineering at Texas A&M University and interim director of the Texas A&M Cybersecurity Center.
We highlight the two NISS Special Events at this year's SDSS:
NISS-FCSM: AI in Federal Government (Pt 1)
Conference: Symposium on Data Science and Statistics (SDSS) 2024
06/05/2024: 1:15 PM - 2:45 PM EDT - Special Event
Chair: David Matteson, Cornell University/NISS
Presentations
AI Guidelines, Best Practice, and Use-Cases at the National Center for Health Statistics/CDC
The National Center for Health Statistics/Centers for Disease Control and Prevention (NCHS/CDC) is developing guidelines and best practices for the use of AI. AI, including generative AI, offers many potential benefits for NCHS/CDC: efficiency and resource savings through increased automation, supported code generation, synthesis and summarization of written material, and communication. However, the risks of AI, most recently those of generative AI, are regularly documented. These risks can harm the agency through fabrication and hallucination, poor model performance, bias and discrimination, privacy and data security failures, and other legal and ethical problems that threaten the agency's credibility and integrity. Use cases illustrate the opportunities and challenges of AI for two data processing tasks: identifying nonresponse in survey text responses and differentiating the absence or presence of conditions and risk factors using clinical notes. This presentation will describe processes for developing guidelines and best practices for AI use, with examples drawn from these use cases.
Presenting/First Author: Jennifer Parker, National Center for Health Statistics
One of the fundamental responsibilities of a statistical agency is to produce and publicly disseminate relevant, accurate, and credible statistical information. The scale and complexity of some of these data products (file size, number of variables, technical documentation), however, can hinder their direct use by non-technical audiences. Consequently, third parties will often repackage and share that information in myriad ways to make it more accessible and interpretable to the average person. The repackaging of statistical information by non-authoritative sources, however, may impact the integrity of the underlying statistics, calling their accuracy or credibility into question. Emerging technologies like mass-market Large Language Models (LLMs) and other generative artificial intelligence (AI) applications may provide an opportunity for statistical agencies to enhance their ability to disseminate statistics more directly to the average web user, but only if AI can properly and efficiently ingest and interpret the official statistics. The U.S. Department of Commerce, one of the world's largest producers of public data, has assembled a working group to help realize the benefits and mitigate the risks of AI models for finding, linking, and interpreting the Department's data. The goal is to advance dissemination standards for data and statistics from being machine-readable to being machine-understandable, capturing and conveying the information's context, structure, and meaning. This working group is currently drafting technical guidelines for publishing AI-ready open data. The Department of Commerce is interested in engagement from industry, academia, and other partners across the public data ecosystem. We will share the progress of the working group and elicit your feedback.
Presenting/First Author: Sallie Ann Keller, University of Virginia * Co-Authors: Michael Hawes, U.S. Census Bureau & Kenneth Haase, U.S. Census Bureau
Predictive Cropland Data Layer and Uncertainty Measures
The National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) uses High-Order Markov Chains (HOMC) to analyze crop rotation patterns over time and project future crop-specific planting. However, HOMCs often face issues with sparsity and identifiability due to the representation of categorical data as indicator variables. As the number of HOMCs needed for analysis increases, the parametric space's dimension grows exponentially. Parsimonious representations reduce the number of parameters but often produce less accurate predictions. To better represent the complexity of the data, a deep neural network model is suggested. To measure the degree of uncertainty surrounding categorical predictions, two uncertainty measures are also offered.
Presenting/First Author: Claire Boryan, USDA/NASS * Co-Author: Luca Sartore, NASS/National Institute of Statistical Sciences
NISS-FCSM: AI in Federal Government (Pt 2)
Conference: Symposium on Data Science and Statistics (SDSS) 2024
06/05/2024: 3:45 PM - 5:15 PM EDT - Special Event
Chair: Luca Sartore, NASS/National Institute of Statistical Sciences * Discussant: David Matteson, Cornell University/NISS
Presentations
Artificial Intelligence and Official Statistics: Key Elements of a Successful AI Approach
Implementing an AI project as a federal statistical agency involves navigating a complex landscape with various challenges and considerations. Transparent AI systems enhance accountability, trust, and user acceptance, but complex AI algorithms may be difficult to interpret, and there might be tension between transparency and proprietary considerations. However, when working with official statistics, ensuring the accuracy and reliability of AI systems is crucial, especially in sensitive domains. Even with the best technology and statistical methods, projects need a broader framework to be successfully implemented in a federal statistical agency. Understanding these broader elements, including the statutory and other guidance such as Executive Orders and OMB Memoranda, as well as the scientific integrity framework under which agencies support research and evidence-building, is crucial to successful implementation.
Presenting/First Author: Nancy Potok, NAPx Consulting LLC
Implications of federal policy on AI use in the federal statistical system
Chris Marcum from The Office of the Chief Statistician will present insights on the integration of Artificial Intelligence (AI) with the Federal statistical system in light of President Biden's recent executive order emphasizing secure and trustworthy AI development. This order extends across various sectors, emphasizing responsible AI entry not only within the government but also in broader societal contexts. Key aspects include establishing clear definitions for AI and AI models, aligning with Title 3 of The Evidence Act of 2018 and OMB's regulations for secure access to confidential statistical data. The executive order emphasizes lawful and secure data collection, usage, and retention, addressing privacy and confidentiality concerns within the federal system. Additionally, it guides agencies in advancing privacy-enhancing technologies to safeguard American data against risks, aligning with OMB responsibilities. The order also addresses how federal agencies obtain, quality control, and utilize commercially available information, directing OMB to consult with the Federal Privacy Council and the Interagency Council on Statistical Policy to guide agencies in mitigating privacy and confidentiality risks associated with such information.
Presenting/First Author: Christopher Marcum, Executive Office of the President, Office of Science and Technology Policy * Co-Author: Nancy Potok, NAPx Consulting LLC
Conference Committee
Program Chair
Amanda Koepke, National Institute of Standards and Technology
Program Chair-Elect (2025 Program Chair)
Stephanie Shipp, University of Virginia
Past Chair (2023)
Emily Dodwell, AT&T
Poster Chair
Robert Tumasian III, FDA
Short Course Chair
Emma Zhang, Emory University
Activity & Outreach Chair
Krissie Gierz, National Institute of Standards and Technology
Computational Statistics Track Chairs
Jun Yan, University of Connecticut
Chris Fonnesbeck, Philadelphia Phillies
Statistical Data Science (formerly Machine Learning) Track Chairs
Ginger Holt, Databricks
Glen Colopy, Wildfell
Data Visualization Track Chairs
Susan VanderPlas, University of Nebraska-Lincoln
RJ Andrews, Info We Trust
Practice and Applications Track Chairs
Kathy Ensor, Rice University
Lada Kyj, Vanguard
Sarah Kalicin, Intel
Education Track Chairs
Sunghwan Byun, NC State University
Kate Kozak, Coconino Community College
Software & Data Science Technologies Track Chairs
Haley Hunter-Zinck, US Census Bureau
Nathan Cruze, NASA
KEY DATES:
November 1, 2023
Refereed Online Abstract Submission Opens
January 8, 2024 11:59 PM
Refereed Online Abstract Submission Closes
February 5, 2024
Lightning Abstract Submission Opens
February 5, 2024
Early Registration and Housing Open
March 10, 2024 11:59 PM
Lightning Abstract Submission Closes
April 30, 2024
Early Registration Deadline
April 30, 2024
Speaker Registration Deadline
May 1, 2024
Regular Registration (increased fees apply)
May 14, 2024 5:00 PM
Housing Deadline
June 4, 2024 – June 7, 2024
SDSS 2024 in Richmond, VA
Agenda
Tuesday, June 4
7:30 a.m. – 6:30 p.m. | Registration
8:30 a.m. – 5:30 p.m. | Short Courses
5:30 p.m. – 7:00 p.m. | Opening Reception
Denotes a ticketed event that requires an additional fee

Wednesday, June 5
8:00 a.m. – 5:00 p.m. | Registration
8:00 a.m. – 9:00 a.m. | Continental Breakfast
9:00 a.m. – 10:15 a.m. | Welcome and Plenary Session
10:30 a.m. – 12:00 p.m. | Concurrent Sessions
12:00 p.m. – 1:15 p.m. | Lunch Meet-Ups (on own)
1:15 p.m. – 2:45 p.m. | Concurrent Sessions
2:45 p.m. – 3:45 p.m. | E-Posters and Refreshments
3:45 p.m. – 5:15 p.m. | Concurrent Sessions

Thursday, June 6
8:00 a.m. – 5:00 p.m. | Registration
8:00 a.m. – 8:45 a.m. | Continental Breakfast
8:45 a.m. – 9:45 a.m. | Plenary Session
9:55 a.m. – 10:30 a.m. | E-Posters and Refreshments
10:30 a.m. – 12:00 p.m. | Concurrent Sessions
12:00 p.m. – 1:15 p.m. | Lunch Meet-Ups (on own)
1:15 p.m. – 2:45 p.m. | Concurrent Sessions
2:45 p.m. – 3:45 p.m. | E-Posters and Refreshments
3:45 p.m. – 5:15 p.m. | Concurrent Sessions

Friday, June 7
8:00 a.m. – 12:30 p.m. | Registration
8:00 a.m. – 8:45 a.m. | Continental Breakfast
8:45 a.m. – 9:45 a.m. | Plenary Session
9:50 a.m. – 11:20 a.m. | Concurrent Sessions
11:20 a.m. – 11:55 a.m. | E-Posters and Refreshments
11:55 a.m. – 1:25 p.m. | Concurrent Sessions
1:30 p.m. – 2:00 p.m. | SDSS Wrap-Up & Fireside Chat
Event Type
- NISS Sponsored