NISS AI, Statistics & Data Science in Practice: Veridical Data Science: Bridging the Gap in Education and Research

Tuesday, October 15, 2024 - 11:30am to 1:00pm

Speaker: 

Bin Yu, Statistics, EECS, Center for Computational Biology, and Simons Institute

Abstract:

The rapid advancement of AI relies heavily on the foundation of data science, yet data science education lags significantly behind its demand in practice. The upcoming book 'Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making' (Yu and Barter, MIT Press, 2024; free online at www.vdsbook.com) tackles this gap by promoting Predictability, Computability, and Stability (PCS) as core principles for trustworthy data insights. PCS for veridical data science (VDS) has been developed in the process of solving scientific data science problems. It thoroughly integrates these principles into the Data Science Life Cycle (DSLC), from problem formulation through data cleansing to result communication, fostering a new standard for responsible data analysis. This talk explores the motivations for PCS and compares the VDS book's approach with traditional ones. I will then describe two PCS projects: prostate cancer detection and the discovery of epistatic genetic drivers of a heart disease. I will end with ongoing work on PCS uncertainty quantification in regression and its comparison with conformal prediction, PCS software packages (v-flow, simChef), and MEIRTS guidelines for data-inspired simulations.


About the Speaker:

Bin Yu is the Class of 1936 Second Chair in the College of Letters and Science, and Chancellor's Distinguished Professor, Departments of Statistics and of Electrical Engineering & Computer Sciences, University of California at Berkeley. Her current research focuses on the practice, algorithms, and theory of statistical machine learning and causal inference. Her group is engaged in interdisciplinary research with scientists from genomics, neuroscience, and precision medicine. Her past work covered empirical process theory, information theory (MDL), MCMC methods, signal processing, machine learning, and high-dimensional data inference (e.g., sparse modeling such as boosting and the lasso, and spectral clustering). She is a member of the U.S. National Academy of Sciences and a fellow of the American Academy of Arts and Sciences. She was a Guggenheim Fellow in 2006 and the Tukey Memorial Lecturer of the Bernoulli Society in 2012. She was President of the IMS (Institute of Mathematical Statistics) in 2013-2014 and the Rietz Lecturer of the IMS in 2016. She received the E. L. Scott Award from COPSS (Committee of Presidents of Statistical Societies) in 2018 and was the Breiman Lecturer at NeurIPS 2019.

About the Moderator:

Nancy McMillan currently serves as Data Science Research Leader within Battelle's Health Research & Analytics Business Line. For a diverse set of federal government clients, she leads the development of a large language model (LLM) based biocuration acceleration pipeline and user tool; pipelines, analytics, and visualizations for electronic initial case reporting data; and analytical methods for achieving abbreviated new drug application (ANDA) approval for an agile drug manufacturing technology. Nancy has a long history of collaborative work across Battelle, bringing statistics and machine learning to Battelle's deep capability in biology, chemistry, and materials science. As a researcher and Project Management Professional, Nancy has worked and published on environmental exposure and risk assessment; transportation safety benefits; quantitative risk assessment related to chemical, biological, radiological, and nuclear (CBRN) terrorism; biosurveillance; and bioinformatics. She managed the Health Analytics Division from 2017 to 2023, a team of approximately 100 data scientists that supports Battelle's contract research business. Nancy is a member of the Board of Trustees of the National Institute of Statistical Sciences (NISS), the Chair of NISS's Affiliates Committee, and a member of the Organ Procurement and Transplantation Network's Data Advisory Committee.


About AI, Data Science, and Statistics in Practice

NISS AI, Data Science, and Statistics in Practice is a monthly event series, part of the NISS Collaboratory (CoLab), that brings together leading experts from industry and academia to discuss the latest advances and practical applications in AI, data science, and statistics. Each session features a keynote presentation on a cutting-edge topic, after which attendees can engage with the speaker on the challenges and opportunities of applying these technologies in real-world scenarios. The series is intended for professionals, researchers, and students interested in the intersection of AI, data science, and statistics. It aims to give participants exposure to and understanding of how modern data analytic methods are being applied across industries, offering theoretical insights, practical examples, and discussion of open issues.

Location

United States