AI, Data Science, and Statistics in Practice

About

AI, Statistics and Data Science in Practice is a monthly NISS event series that brings together leading experts from industry and academia to discuss the latest advances and practical applications in AI, data science, and statistics. Each session features a keynote presentation on a cutting-edge topic, and attendees can engage with the speaker on the challenges and opportunities of applying these technologies in real-world settings. The series is intended for professionals, researchers, and students interested in the intersection of AI, data science, and statistics, and is designed to show how modern data analytic methods are applied across industries, offering theoretical insights, practical examples, and discussion of open issues.


Upcoming Webinars in Series

Causal AI in Finance and Technology Industries

Speakers: Victor Lo and Victor Chen | January 24, 2025
This webinar will explore the growing role of causal AI in uncovering cause-and-effect relationships within complex systems. The session will highlight how causal AI differs from traditional predictive models, emphasizing its potential to improve decision-making across various domains. Attendees will gain insights into techniques for measuring the impact of interventions and understanding causal mechanisms. Broader examples will illustrate its application in optimizing strategies and enhancing outcomes. Key challenges, such as data reliability and model validation, will also be explored. The webinar will conclude with practical guidance on leveraging causal AI in dynamic and high-impact settings.
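
To make the contrast with purely predictive modeling concrete, here is a minimal Python sketch that estimates an average treatment effect with a simple T-learner on simulated data. The data, model choice, and variable names are illustrative assumptions, not material from the webinar.

    # T-learner sketch: fit separate outcome models for treated and control
    # units, then average the difference in their predictions.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    n = 2000
    X = rng.normal(size=(n, 3))                 # customer features (made up)
    T = rng.integers(0, 2, size=n)              # randomized intervention flag
    true_effect = 1.5
    y = X[:, 0] + true_effect * T + rng.normal(size=n)   # observed outcome

    m1 = GradientBoostingRegressor().fit(X[T == 1], y[T == 1])  # treated model
    m0 = GradientBoostingRegressor().fit(X[T == 0], y[T == 0])  # control model
    ate = np.mean(m1.predict(X) - m0.predict(X))
    print(f"estimated ATE: {ate:.2f} (true effect: {true_effect})")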


Large Language Models: Transforming AI Architectures and Operational Paradigms

Speaker: Frank Wei | February 18, 2025 
Abstract: 
The emergence of Large Language Models (LLMs) represents a paradigm shift in artificial intelligence, fundamentally transforming our approach to natural language processing and machine learning architectures. In this presentation, we will navigate through the evolutionary trajectory of LLMs, beginning with their historical foundations and theoretical underpinnings that have shaped the current landscape of AI. We will then delve into the architectural intricacies of transformer-based models, examining their self-attention mechanisms, positional encodings, and multi-head architectures that enable unprecedented language understanding and generation capabilities. As we explore the transformative impact of LLMs on traditional machine learning paradigms, we will analyze the evolution from conventional ML to LLMs, highlighting the specialized operational frameworks, deployment strategies, and infrastructure requirements that distinguish these approaches. This transition encompasses novel considerations in computational orchestration, model versioning, prompt engineering, and systematic evaluation methodologies. We will critically examine how these operational paradigms are reshaping feature engineering, model architectures, and deployment pipelines in AI systems. To demonstrate these theoretical and operational principles in practice, we will conclude with a demonstration of our innovative LLM-based solution, illustrating how sophisticated architectural designs and robust operational frameworks converge to address complex real-world challenges.
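
For readers unfamiliar with the self-attention mechanism the abstract refers to, here is a minimal single-head scaled dot-product attention computation in plain NumPy. The toy dimensions and random weights are illustrative; production LLMs add multi-head projections, positional encodings, masking, and far more.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """Single-head scaled dot-product self-attention."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
        return weights @ V                               # weighted mix of values

    rng = np.random.default_rng(0)
    seq_len, d_model = 5, 8                              # toy dimensions
    X = rng.normal(size=(seq_len, d_model))              # token embeddings
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)           # -> (5, 8)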


Statistics and Experimentation Needs in Industry

This webinar will focus on the critical role of statistics in meeting the experimentation needs of industry. Participants will gain insights into how statistical methods are used to optimize processes, improve product quality, and drive innovation. The session will highlight various case studies, illustrating the impact of data-driven experimentation in sectors such as manufacturing, technology, and pharmaceuticals. Key topics will include A/B testing, hypothesis testing, and adaptive designs, with an emphasis on balancing speed and rigor in industrial research. The speaker will also discuss the challenges of scaling statistical methods for large, complex systems. Attendees will learn best practices for designing and analyzing experiments to generate actionable insights in competitive environments.
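
As a concrete example of the A/B testing the session will cover, the following sketch runs a standard two-sided two-proportion z-test on made-up conversion counts; the numbers are placeholders, not data from any case study.

    from math import sqrt
    from statistics import NormalDist

    conv_a, n_a = 120, 2400    # control: conversions, visitors (illustrative)
    conv_b, n_b = 156, 2400    # variant: conversions, visitors (illustrative)

    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)        # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    print(f"lift: {p_b - p_a:.3f}, z = {z:.2f}, p = {p_value:.4f}")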


Generative AI for Use in Industry

This session will delve into the growing adoption of generative AI across various industries. Participants will explore real-world applications, from content creation in media to synthetic data generation in finance and healthcare. The speaker will highlight the technical underpinnings of generative models, such as GANs and transformers, and discuss their strengths and limitations. Ethical considerations, including issues of bias and intellectual property, will also be examined. The session will feature case studies showcasing how companies are leveraging generative AI to innovate and gain competitive advantages. Strategies for integrating generative AI into existing workflows and managing risks will round out the discussion.
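
The generative models discussed in the session (GANs, transformers) are too involved for a short snippet, but the core learn-then-sample idea behind synthetic data generation can be shown with a much simpler model. The sketch below fits a Gaussian mixture to made-up tabular records and draws synthetic rows from it; every number is illustrative.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    real = np.column_stack([rng.lognormal(3, 0.5, 1000),   # e.g., balances
                            rng.normal(40, 12, 1000)])     # e.g., ages

    gm = GaussianMixture(n_components=3, random_state=0).fit(real)
    synthetic, _ = gm.sample(1000)                         # new synthetic rows
    print("real mean:     ", real.mean(axis=0).round(1))
    print("synthetic mean:", synthetic.mean(axis=0).round(1))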


Experimental Design and Causal Inference

This webinar will explore the fundamental principles of experimental design and their role in drawing valid causal conclusions. Attendees will learn how carefully structured experiments can minimize bias and confounding factors, ensuring robust results. The session will cover key techniques such as randomization, blocking, and factorial designs, emphasizing their practical application. Additionally, the speaker will introduce causal inference methods, including the use of instrumental variables and propensity score matching. Real-world examples will illustrate how these methods are applied across fields such as healthcare, marketing, and public policy. The importance of ethical considerations in designing experiments and interpreting causal relationships will also be discussed.
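
To give a feel for one of the methods mentioned, here is a minimal propensity score matching sketch on simulated observational data. The data-generating process, logistic propensity model, and one-nearest-neighbor matching rule are all illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 5000
    X = rng.normal(size=(n, 2))                       # confounders
    T = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)  # biased uptake
    y = X[:, 0] + 2.0 * T + rng.normal(size=n)        # true effect = 2.0

    ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]  # propensity scores
    treated = np.flatnonzero(T == 1)
    control = np.flatnonzero(T == 0)
    # match each treated unit to the control unit with the closest score
    matches = control[np.abs(ps[treated][:, None] - ps[control]).argmin(axis=1)]
    att = np.mean(y[treated] - y[matches])
    print(f"naive diff: {y[T == 1].mean() - y[T == 0].mean():.2f}, matched ATT: {att:.2f}")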


Uncertainty Quantification for Random Forests

In this session, attendees will explore methods for uncertainty quantification in random forests, addressing a key challenge in using these models for critical decision-making. The talk will introduce techniques such as prediction intervals and bootstrap aggregation to assess the reliability of model outputs. Lucas Mentch will discuss the importance of accounting for uncertainty, especially in applications where errors can have significant consequences. Examples will illustrate how to interpret and communicate uncertainty to stakeholders. The webinar will also cover recent advancements in the field, including Bayesian approaches to ensemble learning. Practical guidelines for incorporating uncertainty measures into model evaluation and reporting will be shared.
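
As a rough flavor of the topic, and not one of the formal methods the talk will present, the sketch below reads off the spread of individual tree predictions in a scikit-learn random forest as a crude uncertainty band. Principled alternatives (e.g., quantile regression forests or jackknife-based estimators) are what the session will actually cover.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
    x_new = np.array([[1.0]])
    per_tree = np.array([t.predict(x_new)[0] for t in rf.estimators_])
    lo, hi = np.percentile(per_tree, [5, 95])        # crude 90% band
    print(f"prediction: {per_tree.mean():.2f}, tree-spread band: [{lo:.2f}, {hi:.2f}]")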


Deep Learning Methods for Closed-Loop Neuromodulation

This webinar will showcase innovative applications of deep learning in neuroscience, focusing on closed-loop neuromodulation systems. The session will explore how deep learning models can process neural signals in real-time to deliver targeted stimulation, improving outcomes for individuals with paralysis. Attendees will learn about the challenges of integrating deep learning with medical devices, including issues of latency and interpretability. The speaker will present case studies from clinical trials, highlighting breakthroughs in restoring motor function. Ethical considerations, such as patient privacy and data security, will also be discussed. Insights on future directions and interdisciplinary collaboration in this field will round out the session.
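
A full closed-loop system is far beyond a snippet, but the schematic sketch below shows the basic loop shape: decode a sliding window of signal, then gate a stimulation decision on the decoded probability. The decoder, threshold, and simulated signal are all illustrative assumptions, not a clinical design.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    fs, win = 1000, 200                        # sample rate (Hz), window length
    train_X = rng.normal(size=(400, win))      # pretend pre-recorded windows
    train_y = rng.integers(0, 2, size=400)     # labeled movement intent
    decoder = LogisticRegression(max_iter=1000).fit(train_X, train_y)

    stream = rng.normal(size=fs * 2)           # two seconds of incoming signal
    for start in range(0, len(stream) - win, win):
        window = stream[start:start + win].reshape(1, -1)
        p_intent = decoder.predict_proba(window)[0, 1]
        action = "stimulate" if p_intent > 0.7 else "hold"   # assumed threshold
        print(f"t={start / fs:.1f}s  p={p_intent:.2f} -> {action}")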


Machine Learning for Airborne Biological Hazard Detection

This session will explore the use of machine learning for detecting and identifying airborne biological hazards, with a focus on Raman spectroscopy. Attendees will learn how supervised and unsupervised learning techniques can analyze spectral data to differentiate between harmful and benign substances. The speaker will discuss challenges related to data preprocessing and model accuracy in dynamic environments. Case studies will illustrate real-world applications in public health and national security. The importance of rapid detection and classification in mitigating risks will be emphasized. Practical strategies for deploying machine learning models in field settings will also be shared.
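
To illustrate the general workflow, the sketch below classifies simulated spectra (single Gaussian peaks on a noise floor) as hazardous or benign with a standard classifier. Real Raman pipelines involve baseline correction, normalization, and far richer preprocessing; everything here is a toy assumption.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    wavenumbers = np.linspace(400, 1800, 350)

    def spectrum(peak):
        """One noisy spectrum with a single characteristic peak."""
        return np.exp(-((wavenumbers - peak) ** 2) / 800) + rng.normal(0, 0.05, 350)

    X = np.array([spectrum(1000) for _ in range(200)] +   # "hazardous" signature
                 [spectrum(1300) for _ in range(200)])    # "benign" signature
    y = np.array([1] * 200 + [0] * 200)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = SVC().fit(X_tr, y_tr)
    print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")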


Causal Inference in Marketing Analytics

This webinar will focus on the application of causal inference in marketing analytics, highlighting its role in optimizing campaigns and resource allocation. Attendees will learn about methods such as difference-in-differences and propensity score matching to measure the effectiveness of marketing interventions. The session will include case studies from the banking and retail sectors, demonstrating how causal models inform strategic decisions. Challenges related to data quality and model assumptions will be discussed, along with strategies for addressing them. The importance of collaboration between data scientists and marketing teams will be emphasized. Practical insights on leveraging causal inference to drive ROI and customer engagement will be provided.
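
As a taste of one method the session will cover, here is a minimal difference-in-differences computation on made-up campaign data: the pre/post change in a treated region is compared against the same change in a control region to strip out the shared market trend.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    # weekly sales per customer (all figures illustrative)
    treated_pre  = rng.normal(100, 10, n)
    treated_post = rng.normal(112, 10, n)   # campaign ran here
    control_pre  = rng.normal(100, 10, n)
    control_post = rng.normal(104, 10, n)   # shared market trend only

    did = ((treated_post.mean() - treated_pre.mean())
           - (control_post.mean() - control_pre.mean()))
    print(f"estimated campaign lift: {did:.1f} (trend-adjusted)")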


Practical Return on AI Investment

In this session, participants will explore strategies for evaluating the return on investment (ROI) from AI initiatives. The speaker will discuss frameworks for measuring both tangible and intangible benefits, such as cost savings and improved decision-making. Case studies will illustrate how organizations assess the impact of AI on productivity and customer satisfaction. Challenges related to quantifying ROI, including data availability and attribution, will be addressed. Attendees will learn best practices for aligning AI projects with business objectives and managing risks. Practical advice on communicating the value of AI investments to stakeholders will also be shared.
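
The shape of a basic ROI calculation is simple enough to show directly; every figure in the sketch below is a made-up placeholder, not a benchmark from the session.

    # Toy three-year ROI arithmetic for an AI initiative.
    annual_cost_savings = 450_000    # e.g., automated triage (assumed)
    annual_revenue_lift = 300_000    # e.g., better targeting (assumed)
    build_cost          = 400_000    # one-off development (assumed)
    annual_run_cost     = 150_000    # hosting, monitoring, retraining (assumed)

    years = 3
    benefit = years * (annual_cost_savings + annual_revenue_lift)
    cost = build_cost + years * annual_run_cost
    roi = (benefit - cost) / cost                 # net benefit per dollar spent
    print(f"{years}-year ROI: {roi:.0%}")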



Previous Webinars + Recordings

Veridical Data Science

Speaker: Professor Bin Yu | October 15, 2024
Abstract: The rapid advancement of AI relies heavily on the foundation of data science, yet data science education significantly lags its demand in practice. The upcoming book 'Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making' (Yu and Barter, MIT Press, 2024; free online at www.vdsbook.com) tackles this gap by promoting Predictability, Computability, and Stability (PCS) as core principles for trustworthy data insights. PCS for veridical data science (VDS) has been developed in the process of solving scientific data science problems. It thoroughly integrates these principles into the Data Science Life Cycle (DSLC), from problem formulation through data cleansing to result communication, fostering a new standard for responsible data analysis. This talk explores the motivations behind PCS and compares the VDS book's approach with traditional ones. Then I will describe two PCS projects, on prostate cancer detection and on the discovery of epistatic genetic drivers of a heart disease. I will end with ongoing work on PCS uncertainty quantification in regression and its comparison with conformal prediction, PCS software packages (v-flow, simChe), and MEIRTS guidelines for data-inspired simulations.
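
To illustrate the stability leg of PCS in the simplest possible terms, the sketch below re-runs a regression under bootstrap perturbations of the data and reports how stable each coefficient's sign is. This is a toy illustration in the spirit of the talk, not code from the book or its packages.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 300
    X = rng.normal(size=(n, 5))
    y = 2 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(size=n)   # one strong signal

    signs = []
    for _ in range(200):                     # perturb: bootstrap the rows
        idx = rng.integers(0, n, n)
        coef = LinearRegression().fit(X[idx], y[idx]).coef_
        signs.append(np.sign(coef))
    signs = np.array(signs)
    stability = np.mean(signs == np.sign(signs.mean(axis=0)), axis=0)
    print("per-coefficient sign stability:", stability.round(2))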


Random Forests: Why They Work and Why That’s a Problem

Speaker: Lucas Mentch | November 19, 2024
Abstract: Random forests remain among the most popular off-the-shelf supervised machine learning tools, with a well-established track record of predictive accuracy in both regression and classification settings. Despite this empirical success, a full and satisfying explanation for why they work has yet to be put forth. In this talk, we will show that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. From a model-complexity perspective, this means that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicit regularization procedures like the lasso. Realizing this, we demonstrate that alternative forms of randomness can provide similarly beneficial stabilization. In particular, we show that augmenting the feature space with additional features consisting of only random noise can substantially improve the predictive accuracy of the model. This surprising fact has been largely overlooked within the statistics community, but has crucial implications for thinking about how best to define and measure variable importance. Numerous demonstrations on both real and synthetic data are provided.
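
A small experiment in the spirit of the abstract: compare a random forest on the original features of a low signal-to-noise problem against one whose feature space is augmented with pure-noise columns. The data and settings below are illustrative, and the size (or presence) of any improvement will vary with the SNR.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, p = 300, 10
    X = rng.normal(size=(n, p))
    y = X[:, 0] + rng.normal(scale=3.0, size=n)          # low SNR: weak signal
    X_aug = np.hstack([X, rng.normal(size=(n, 40))])     # add 40 noise features

    for name, data in [("original", X), ("augmented", X_aug)]:
        rf = RandomForestRegressor(n_estimators=200, random_state=0)
        score = cross_val_score(rf, data, y, cv=5, scoring="r2").mean()
        print(f"{name:>9} features: CV R^2 = {score:.3f}")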