Symposium on Data Science and Statistics (SDSS) 2024

June 4–7, 2024

The American Statistical Association invites you to join us at the seventh annual Symposium on Data Science and Statistics in Richmond, VA, June 4–7, 2024. SDSS provides a unique opportunity for data scientists, computer scientists, and statisticians to come together and exchange ideas. SDSS 2024 will offer many occasions to learn about new data tools and methodologies, see data science in action, and network with experts. Sessions will center on the following six topic areas:

Computational Statistics * Practice and Applications * Data Visualization * Statistical Data Science (formerly Machine Learning) * Education * Software & Data Science Technologies

Featured Speakers

Data Journalism

Alyssa Fowers, The Washington Post

Alyssa Fowers is a graphics reporter at The Washington Post. Before becoming a journalist, she worked in data management and analysis for businesses and nonprofits.

Ethics and Fairness in Statistics and Data Science: A Panel Discussion

Caitlin Wylie, University of Virginia

Caitlin Wylie studies underrecognized work and workers in research communities. This includes technicians whose names and work are missing from publications, students who contribute broad knowledge and learning opportunities to research groups, and community members whose expertise about their homes enriches environmental research. She uses qualitative social research methods, including interviews and participant observation.

Elham Tabassi, National Institute of Standards and Technology

Elham Tabassi is a senior research scientist at the National Institute of Standards and Technology and associate director for emerging technologies in the Information Technology Laboratory. She also leads the institute’s Trustworthy and Responsible AI Program, which aims to cultivate trust in the design, development, and use of artificial intelligence technologies. As associate director for emerging technologies, Tabassi helps the National Institute of Standards and Technology’s leadership and management determine strategic direction for research, development, standards, testing, and evaluation of emerging technologies. She also provides leadership within the institute and coordinates interaction related to artificial intelligence with the US research community, US industrial community, international standards community, and federal agencies.

Matthew D. Rotelli, Eli Lilly and Company

Matthew Rotelli is vice president for the bioethics program at Eli Lilly and Company, where he leads the company’s evaluation of bioethical considerations across the continuum of its research, development, and commercialization activities. He has led diverse disciplines to bring medicine to patients in oncology, immunology, cardiovascular, endocrine, and neuroscience indications and is passionate about making the drug development process more reliable, efficient, and trustworthy. Throughout his career, Rotelli has performed or directed statistical and pharmacokinetic/pharmacodynamic work in all phases of clinical development, including commercialization, pharmacovigilance, and real-world evidence generation. He is a graduate of the Lilly Bioethics Leadership Academy and a member of the American Statistical Association, American Society for Bioethics and Humanities, and Public Responsibility in Medicine and Research. He has also served on many committees and working groups focused on pharmacometrics, clinical trials, and ethics.

Research Horizons for AI and CISE

Dilma Da Silva, Texas A&M University

Dilma Da Silva is a systems software researcher with primary research interests in operating systems, distributed computing, and computer science education. She is working on research projects focusing on streaming computing, cloud computing, cybersecurity, and autonomous vehicles and is passionate about broadening participation in computing. Da Silva is a professor and holder of the Ford Motor Company Design Professorship II in the Department of Computer Science and Engineering at Texas A&M University and interim director of the Texas A&M Cybersecurity Center.

We highlight the two NISS Special Events at this year's SDSS:

NISS-FCSM: AI in Federal Government (Pt 1)

Conference: Symposium on Data Science and Statistics (SDSS) 2024

06/05/2024: 1:15 PM - 2:45 PM EDT - Special Event

Chair: David Matteson, Cornell University/NISS

Presentations

AI Guidelines, Best Practice, and Use-Cases at the National Center for Health Statistics/CDC

The National Center for Health Statistics/Centers for Disease Control and Prevention (NCHS/CDC) is developing guidelines and best practices for the use of AI. AI, including generative AI, offers many potential benefits for NCHS/CDC: efficiency and resource savings through increased automation; assisted code generation; synthesis and summarization of written material; and communication. However, the risks of AI, most recently those of generative AI, are regularly documented. These risks include fabrication and hallucination, poor model performance, bias and discrimination, privacy and data security failures, and other legal and ethical problems that can harm the agency by undermining its credibility and integrity. Two data processing use cases illustrate the opportunities and challenges of AI: identifying nonresponse in survey text responses and differentiating the absence or presence of conditions and risk factors using clinical notes. This presentation will describe processes for developing guidelines and best practices for AI use, with examples drawn from these use cases.

Presenting/First Author: Jennifer Parker, National Center for Health Statistics

Artificial Intelligence and Official Statistics: Responsibly Leveraging Large Language Models in Support of Open Data

One of the fundamental responsibilities of a statistical agency is to produce and publicly disseminate relevant, accurate, and credible statistical information. The scale and complexity of some of these data products (file size, number of variables, technical documentation), however, can hinder their direct use by non-technical audiences. Consequently, third parties will often repackage and share that information in myriad ways to make it more accessible and interpretable to the average person. The repackaging of statistical information by non-authoritative sources, however, may impact the integrity of the underlying statistics, calling their accuracy or credibility into question. Emerging technologies like mass-market Large Language Models (LLMs) and other generative artificial intelligence (AI) applications may provide an opportunity for statistical agencies to enhance their ability to disseminate statistics more directly to the average web user, but only if AI can properly and efficiently ingest and interpret the official statistics. The U.S. Department of Commerce, one of the world's largest producers of public data, has assembled a working group to help realize the benefits and mitigate the risks of AI models for finding, linking, and interpreting the Department's data. The goal is to advance dissemination standards for data and statistics from being machine-readable to being machine-understandable, capturing and conveying the information's context, structure, and meaning. This working group is currently drafting technical guidelines for publishing AI-ready open data. The Department of Commerce is interested in engagement from industry, academia, and other partners across the public data ecosystem. We will share the progress of the working group and elicit your feedback.

Presenting/First Author: Sallie Ann Keller, University of Virginia * Co-Authors: Michael Hawes, U.S. Census Bureau & Kenneth Haase, U.S. Census Bureau

Predictive Cropland Data Layer and Uncertainty Measures

The National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) uses High-Order Markov Chains (HOMC) to analyze crop rotation patterns over time and project future crop-specific planting. However, HOMCs often face issues with sparsity and identifiability due to the representation of categorical data as indicator variables. As the number of HOMCs needed for analysis increases, the parametric space's dimension grows exponentially. Parsimonious representations reduce the number of parameters but often produce less accurate predictions. To better represent the complexity of the data, a deep neural network model is suggested. To measure the degree of uncertainty surrounding categorical predictions, two uncertainty measures are also offered.

Presenting/First Author: Claire Boryan, USDA/NASS * Co-Author: Luca Sartore, NASS/National Institute of Statistical Sciences
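
As a rough illustration of the ideas in the abstract above (not the presenters' method), the sketch below counts the free transition probabilities of a full Markov chain of a given order, which grows exponentially with the order, and computes two common uncertainty scores for a categorical prediction. The abstract does not name its two uncertainty measures; normalized entropy and the top-two margin are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math


def homc_free_parameters(num_categories: int, order: int) -> int:
    """Free transition probabilities in a full Markov chain of the given order.

    Each of the num_categories**order possible histories carries a distribution
    over num_categories outcomes, which has num_categories - 1 free parameters.
    """
    return (num_categories ** order) * (num_categories - 1)


def normalized_entropy(probs):
    """Shannon entropy scaled to [0, 1]: 0 is a certain prediction, 1 is uniform.

    Illustrative choice only; the abstract does not say which measures it uses.
    """
    if len(probs) < 2:
        raise ValueError("need at least two categories")
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def top_two_margin(probs):
    """Gap between the two largest class probabilities; a small gap means ambiguity."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


if __name__ == "__main__":
    # Hypothetical setting: 10 crop categories, chain orders 1 through 4.
    for k in range(1, 5):
        print(k, homc_free_parameters(10, k))
    # Hypothetical predicted class probabilities for one field.
    prediction = [0.7, 0.2, 0.1]
    print(normalized_entropy(prediction), top_two_margin(prediction))
```

The two scores capture different things: entropy summarizes the whole predicted distribution, while the margin looks only at the two leading categories.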

NISS-FCSM: AI in Federal Government (Pt 2)

Conference: Symposium on Data Science and Statistics (SDSS) 2024

06/05/2024: 3:45 PM - 5:15 PM EDT - Special Event

Chair: Luca Sartore, NASS/National Institute of Statistical Sciences * Discussant: David Matteson, Cornell University/NISS

Presentations

Artificial Intelligence and Official Statistics: Key Elements of a Successful AI Approach

Implementing an AI project as a federal statistical agency involves navigating a complex landscape with various challenges and considerations. Transparent AI systems enhance accountability, trust, and user acceptance, but complex AI algorithms may be difficult to interpret, and there might be tension between transparency and proprietary considerations. However, when working with official statistics, ensuring the accuracy and reliability of AI systems is crucial, especially in sensitive domains. Even with the best technology and statistical methods, projects need a broader framework to be successfully implemented in a federal statistical agency. Successful implementation depends on understanding these broader elements, including statutory and other guidance, such as Executive Orders and OMB Memoranda, and the scientific integrity framework under which agencies support research and evidence-building.

Presenting/First Author: Nancy Potok, NAPx Consulting LLC

Implications of federal policy on AI use in the federal statistical system

Chris Marcum from the Office of the Chief Statistician will present insights on the integration of Artificial Intelligence (AI) with the Federal statistical system in light of President Biden's recent executive order on secure and trustworthy AI development. This order extends across various sectors, emphasizing responsible AI entry not only within the government but also in broader societal contexts. Key aspects include establishing clear definitions for AI and AI models, aligning with Title 3 of the Evidence Act of 2018 and OMB's regulations for secure access to confidential statistical data. The executive order calls for lawful and secure data collection, usage, and retention, addressing privacy and confidentiality concerns within the federal system. Additionally, it guides agencies in advancing privacy-enhancing technologies to safeguard American data against risks, aligning with OMB responsibilities. The order also addresses how federal agencies obtain, quality-control, and use commercially available information, directing OMB to consult with the Federal Privacy Council and the Interagency Council on Statistical Policy to guide agencies in mitigating privacy and confidentiality risks associated with such information.

Presenting/First Author: Christopher Marcum, Executive Office of the President, Office of Science and Technology Policy * Co-Author: Nancy Potok, NAPx Consulting LLC

Conference Committee

Program Chair

Amanda Koepke, National Institute of Standards and Technology

Program Chair-Elect (2025 Program Chair)

Stephanie Shipp, University of Virginia

Past Chair (2023)

Emily Dodwell, AT&T

Poster Chair

Robert Tumasian III, FDA

Short Course Chair

Emma Zhang, Emory University

Activity & Outreach Chair

Krissie Gierz, National Institute of Standards and Technology

Computational Statistics Track Chairs

Jun Yan, University of Connecticut

Chris Fonnesbeck, Philadelphia Phillies

Statistical Data Science (formerly Machine Learning) Track Chairs

Ginger Holt, Databricks

Glen Colopy, Wildfell

Data Visualization Track Chairs

Susan VanderPlas, University of Nebraska-Lincoln

RJ Andrews, Info We Trust

Practice and Applications Track Chairs

Kathy Ensor, Rice University

Lada Kyj, Vanguard

Sarah Kalicin, Intel

Education Track Chairs

Sunghwan Byun, NC State University

Kate Kozak, Coconino Community College

Software & Data Science Technologies Track Chairs

Haley Hunter-Zinck, US Census Bureau

Nathan Cruze, NASA

KEY DATES:

November 1, 2023	Refereed Online Abstract Submission Opens
January 8, 2024 11:59 PM	Refereed Online Abstract Submission Closes
February 5, 2024	Lightning Abstract Submission Opens
February 5, 2024	Early Registration and Housing Open
March 10, 2024 11:59 PM	Lightning Abstract Submission Closes
April 30, 2024	Early Registration Deadline
April 30, 2024	Speaker Registration Deadline
May 1, 2024	Regular Registration (increased fees apply)
May 14, 2024 5:00 PM	Housing Deadline
June 4, 2024 – June 7, 2024	SDSS 2024 in Richmond, VA

Agenda

Tuesday, June 4
7:30 a.m. – 6:30 p.m.	Registration
8:30 a.m. – 5:30 p.m.	Short Courses
5:30 p.m. – 7:00 p.m.	Opening Reception
Denotes a ticketed event that requires an additional fee

Wednesday, June 5
8:00 a.m. – 5:00 p.m.	Registration
8:00 a.m. – 9:00 a.m.	Continental Breakfast
9:00 a.m. – 10:15 a.m.	Welcome and Plenary Session
10:30 a.m. – 12:00 p.m.	Concurrent Sessions
12:00 p.m. – 1:15 p.m.	Lunch Meet-Ups (on own)
1:15 p.m. – 2:45 p.m.	Concurrent Sessions
2:45 p.m. – 3:45 p.m.	E-Posters and Refreshments
3:45 p.m. – 5:15 p.m.	Concurrent Sessions

Thursday, June 6
8:00 a.m. – 5:00 p.m.	Registration
8:00 a.m. – 8:45 a.m.	Continental Breakfast
8:45 a.m. – 9:45 a.m.	Plenary Session
9:55 a.m. – 10:30 a.m.	E-Posters and Refreshments
10:30 a.m. – 12:00 p.m.	Concurrent Sessions
12:00 p.m. – 1:15 p.m.	Lunch Meet-Ups (on own)
1:15 p.m. – 2:45 p.m.	Concurrent Sessions
2:45 p.m. – 3:45 p.m.	E-Posters and Refreshments
3:45 p.m. – 5:15 p.m.	Concurrent Sessions

Friday, June 7
8:00 a.m. – 12:30 p.m.	Registration
8:00 a.m. – 8:45 a.m.	Continental Breakfast
8:45 a.m. – 9:45 a.m.	Plenary Session
9:50 a.m. – 11:20 a.m.	Concurrent Sessions
11:20 a.m. – 11:55 a.m.	E-Posters and Refreshments
11:55 a.m. – 1:25 p.m.	Concurrent Sessions
1:30 p.m. – 2:00 p.m.	SDSS Wrap-Up & Fireside Chat

Event Type

NISS Sponsored

Host

American Statistical Association (ASA)

Website

Symposium on Data Science and Statistics (SDSS)

Location

Omni Richmond Hotel

100 South 12th Street, Richmond, Virginia 23219, United States