About
The NISS AI, Statistics, and Data Science in Practice series is a monthly event series that brings together leading experts from industry and academia to discuss the latest advances and practical applications in AI, data science, and statistics. Each session features a keynote presentation on a cutting-edge topic, and attendees can engage with speakers on the challenges and opportunities of applying these technologies in real-world scenarios. The series is intended for professionals, researchers, and students interested in the intersection of AI, data science, and statistics and in how these fields are shaping various industries. It is designed to show participants how modern data analytic methods are being applied across those industries, offering theoretical insights, practical examples, and discussion of issues.
Featured Topics:
- Veridical Data Science - Speaker: Bin Yu, October 15, 2024
- Random Forests: Why They Work and Why That’s a Problem - Speaker: Lucas Mentch, November 19, 2024
- Causal AI in Business Practices - Speakers: Victor Lo and Victor Chen, January 24, 2025
- Large Language Models: Transforming AI Architectures and Operational Paradigms - Speaker: Frank Wei, February 18, 2025
- Machine Learning for Airborne Biological Hazard Detection - Speaker: Jared Schuetter, March 11, 2025
- ML and Bayesian geospatial approaches for prediction of opioid overdose deaths - Speaker: Soledad Fernández, April 22, 2025
- Trustworthy AI in Weather, Climate, and Coastal Oceanography - Speaker: Dr. Amy McGovern, May 13, 2025
- Statistics and Experimentation Needs in Industry
- Causal Inference in Marketing Analytics
- Practical Return on AI Investment
- Cost-benefit Analysis of Public Health Programs
Upcoming Webinars in Series
ML and Bayesian geospatial approaches for prediction of opioid overdose deaths
Speaker: Soledad Fernández | April 22, 2025
Join us for the next session of the NISS AI, Statistics, and Data Science in Practice webinar series, featuring Dr. Soledad Fernández, Distinguished Professor and Division Chief of Biostatistics and Population Health at The Ohio State University. Dr. Fernández also serves as the Director of the Center for Biostatistics in the Department of Biomedical Informatics, College of Medicine. In this talk, Dr. Fernández will discuss the application of machine learning (ML) and Bayesian geospatial modeling for predicting opioid overdose deaths. As the opioid crisis continues to impact communities nationwide, leveraging statistical and AI-driven approaches can provide critical insights into geographic and population-level risk factors. This presentation will explore how advanced modeling techniques can enhance surveillance efforts, inform public health interventions, and improve policy decision-making.
Trustworthy AI in Weather, Climate, and Coastal Oceanography
Speaker: Amy McGovern | May 13, 2025
Dr. Amy McGovern is a professor in the School of Computer Science and the School of Meteorology at the University of Oklahoma. Dr. McGovern is also the director of the NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography. Her research focuses on developing and applying trustworthy AI and machine learning methods, primarily for severe weather phenomena. Dr. McGovern received her PhD in Computer Science from the University of Massachusetts Amherst in 2002 and was a senior postdoctoral research associate at the University of Massachusetts until joining the University of Oklahoma in January 2005. She received her MS from the University of Massachusetts Amherst (1998) and her BS (honors) from Carnegie Mellon University (1996).
Statistics and Experimentation Needs in Industry
This webinar will focus on the critical role of statistics in meeting the experimentation needs of industry. Participants will gain insights into how statistical methods are used to optimize processes, improve product quality, and drive innovation. The session will highlight various case studies, illustrating the impact of data-driven experimentation in sectors such as manufacturing, technology, and pharmaceuticals. Key topics will include A/B testing, hypothesis testing, and adaptive designs, with an emphasis on balancing speed and rigor in industrial research. The speaker will also discuss the challenges of scaling statistical methods for large, complex systems. Attendees will learn best practices for designing and analyzing experiments to generate actionable insights in competitive environments.
Causal Inference in Marketing Analytics
This webinar will focus on the application of causal inference in marketing analytics, highlighting its role in optimizing campaigns and resource allocation. Attendees will learn about methods such as difference-in-differences and propensity score matching to measure the effectiveness of marketing interventions. The session will include case studies from the banking and retail sectors, demonstrating how causal models inform strategic decisions. Challenges related to data quality and model assumptions will be discussed, along with strategies for addressing them. The importance of collaboration between data scientists and marketing teams will be emphasized. Practical insights on leveraging causal inference to drive ROI and customer engagement will be provided.
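As a rough illustration of the difference-in-differences idea mentioned above, the sketch below compares the before/after change in a treated group with the change in a control group. The function name `diff_in_diff` and the sales figures are hypothetical and are not taken from the webinar; the estimate is only meaningful under the usual parallel-trends assumption.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate from four lists of outcomes.

    Hypothetical helper for illustration only: it subtracts the control
    group's average change from the treated group's average change.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up weekly sales for stores that did (treated) and did not (control)
# receive a marketing campaign.
treated_pre = [100, 102, 98]
treated_post = [115, 117, 113]
control_pre = [90, 92, 88]
control_post = [95, 97, 93]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print("Estimated campaign effect:", effect)
```

In this made-up example the treated stores gain 15 units on average and the control stores gain 5, so the estimated campaign effect is the 10-unit gap between those two changes.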
Practical Return on AI Investment
In this session, participants will explore strategies for evaluating the return on investment (ROI) from AI initiatives. The speaker will discuss frameworks for measuring both tangible and intangible benefits, such as cost savings and improved decision-making. Case studies will illustrate how organizations assess the impact of AI on productivity and customer satisfaction. Challenges related to quantifying ROI, including data availability and attribution, will be addressed. Attendees will learn best practices for aligning AI projects with business objectives and managing risks. Practical advice on communicating the value of AI investments to stakeholders will also be shared.
Cost-benefit Analysis of Public Health Programs
This webinar will examine the role of cost-benefit analysis in evaluating public health programs, emphasizing data-driven decision-making to allocate resources effectively. Attendees will gain insights into methodologies for assessing both direct and indirect costs, as well as measuring long-term societal benefits. The session will feature case studies from maternal health, disease prevention, and healthcare policy, illustrating how statistical and economic models inform public health investments. Key topics will include cost-effectiveness ratios, sensitivity analysis, and the challenges of quantifying health outcomes. The speaker will also discuss strategies for integrating equity considerations into public health evaluations. Practical guidance on using cost-benefit analysis to advocate for evidence-based policies and improve population health will be provided.
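To make the cost-effectiveness ratio and sensitivity analysis topics concrete, here is a minimal sketch of an incremental cost-effectiveness ratio (ICER) with a one-way sensitivity check on program cost. The function name `icer`, the costs, and the health-effect values are all hypothetical assumptions chosen for illustration; they do not come from the webinar.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect.

    Hypothetical helper for illustration; effects could be, for example,
    quality-adjusted life years (QALYs).
    """
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("Programs have equal effect; ICER is undefined.")
    return (cost_new - cost_old) / delta_effect


# Made-up figures: an existing program versus a proposed replacement.
cost_old, effect_old = 300_000, 40
effect_new = 60

# One-way sensitivity analysis: vary the new program's cost by +/- 20%
# around an assumed base case of 500,000 and recompute the ratio each time.
results = {
    cost_new: icer(cost_new, effect_new, cost_old, effect_old)
    for cost_new in (400_000, 500_000, 600_000)
}

for cost_new, ratio in results.items():
    print(f"New program cost {cost_new}: ICER = {ratio} per unit of effect")
```

A ratio that stays below a decision-maker's willingness-to-pay threshold across the plausible cost range suggests the conclusion is robust to that input.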
Previous Webinars + Recordings
Veridical Data Science
Speaker: Professor Bin Yu | October 15, 2024
Abstract: The rapid advancement of AI relies heavily on the foundation of data science, yet its education significantly lags its demand in practice. The upcoming book 'Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making' (Yu and Barter, MIT Press, 2024; free online at www.vdsbook.com) tackles this gap by promoting Predictability, Computability, and Stability (PCS) as core principles for trustworthy data insights. PCS for veridical data science (VDS) has been developed in the process of solving scientific data science problems. It thoroughly integrates these principles into the Data Science Life Cycle (DSLC), from problem formulation through data cleansing to result communication, fostering a new standard for responsible data analysis. This talk explores the motivations behind PCS and compares the VDS book's approach with traditional ones. Then I will describe two PCS projects, one on prostate cancer detection and one on the discovery of epistatic genetic drivers for a heart disease. I will end with ongoing work on PCS uncertainty quantification in regression and its comparison with conformal prediction, PCS software packages (v-flow, simChef), and MERITS guidelines for data-inspired simulations.
Random Forests: Why They Work and Why That’s a Problem
Speaker: Lucas Mentch | November 19, 2024
Abstract: Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success, a full and satisfying explanation for their success has yet to be put forth. In this talk, we will show that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. From a model-complexity perspective, this means that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicit regularization procedures like the lasso. Realizing this, we demonstrate that alternative forms of randomness can provide similarly beneficial stabilization. In particular, we show that augmenting the feature space with additional features consisting of only random noise can substantially improve the predictive accuracy of the model. This surprising fact has been largely overlooked within the statistics community, but has crucial implications for thinking about how best to define and measure variable importance. Numerous demonstrations on both real and synthetic data are provided.
Causal AI in Business Practices
Speakers: Victor Lo and Victor Chen | January 24, 2025
This webinar will explore the growing role of causal AI in uncovering cause-and-effect relationships within complex systems. The session will highlight how causal AI differs from traditional predictive models, emphasizing its potential to improve decision-making across various domains. Attendees will gain insights into techniques for measuring the impact of interventions and understanding causal mechanisms. Broader examples will illustrate its application in optimizing strategies and enhancing outcomes. Key challenges, such as data reliability and model validation, will also be explored. The webinar will conclude with practical guidance on leveraging causal AI in dynamic and high-impact settings.
Large Language Models: Transforming AI Architectures and Operational Paradigms
Speaker: Frank Wei | February 18, 2025
Abstract: The emergence of Large Language Models (LLMs) represents a paradigm shift in artificial intelligence, fundamentally transforming our approach to natural language processing and machine learning architectures. In this presentation, we will navigate through the evolutionary trajectory of LLMs, beginning with their historical foundations and theoretical underpinnings that have shaped the current landscape of AI. We will then delve into the architectural intricacies of transformer-based models, examining their self-attention mechanisms, positional encodings, and multi-head architectures that enable unprecedented language understanding and generation capabilities. As we explore the transformative impact of LLMs on traditional machine learning paradigms, we will analyze the evolution from conventional ML to LLM, highlighting the specialized operational frameworks, deployment strategies, and infrastructure requirements that distinguish these approaches. This transition encompasses novel considerations in computational orchestration, model versioning, prompt engineering, and systematic evaluation methodologies. We will critically examine how these operational paradigms are reshaping feature engineering, model architectures, and deployment pipelines in AI systems. To demonstrate these theoretical and operational principles in practice, we will conclude with a demonstration of our innovative LLM-based solution, illustrating how sophisticated architectural designs and robust operational frameworks converge to address complex real-world challenges.
Recording Coming Soon!
Machine Learning for Airborne Biological Hazard Detection
Speaker: Jared Schuetter | March 11, 2025
This session will explore the use of machine learning for detecting and identifying airborne biological hazards. Attendees will learn how supervised and unsupervised learning techniques can analyze spectral data to differentiate between harmful and benign substances. The speaker will discuss challenges related to data preprocessing and model accuracy in dynamic environments. Case studies will illustrate real-world applications in public health and national security. The importance of rapid detection and classification in mitigating risks will be emphasized. Practical strategies for deploying machine learning models in field settings will also be shared.
Recording Coming Soon!