
Nancy McMillan (Battelle) welcomed attendees to the AI, Statistics, and Data Science in Practice webinar hosted by NISS and introduced Jared Schuetter, a principal data scientist at Battelle with over 15 years of experience integrating statistical methods with machine learning and AI techniques. Jared discussed the importance of detecting biological contaminants in various settings, the development of Battelle's REBS device, and the challenges of analyzing the data it produces. He also gave an overview of Battelle, a nonprofit independent research lab founded in 1925, traced the evolution of REBS, and explained Raman spectroscopy and how the REBS device uses it to analyze particles.
Jared shared his work on classifying chemical and biological spectral signatures, focusing on machine learning for airborne biological hazard detection. He also described his background in statistics and machine learning and his transition from hands-on technical work to task leadership and business development.
Detecting Biological Contaminants with Passive Sensors
Jared discussed the importance of detecting biological contaminants in settings including national security, military applications, pharmaceuticals, and water supplies. He explained the concept of using passive sensors to detect and identify airborne particles, whose data can then be analyzed for anomalies or potential threats. Detections can trigger alerts to the relevant authorities, such as first responders or dispatch services. Jared also stressed that designing such a system requires weighing factors like sensor placement, complexity, time resolution, and the scenarios the system is meant to address. He closed this part of the talk by inviting questions and feedback.
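To make the idea of flagging anomalies in a passive sensor's stream concrete, here is a minimal sketch that raises an alert when a particle count jumps far above its recent baseline. It assumes a simple trailing-window z-score rule; the function name, the window length, and the threshold are hypothetical illustrations, not values or methods from the talk.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=10, z_threshold=4.0):
    """Return the indices whose count sits more than z_threshold standard
    deviations above the mean of the preceding `window` counts.
    Both parameters are hypothetical, not values from the talk."""
    alerts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history) or 1.0  # avoid dividing by zero on a flat baseline
        if (counts[i] - mu) / sigma > z_threshold:
            alerts.append(i)
    return alerts


# Made-up per-minute counts of biological-looking particles with one spike.
counts = [12, 11, 13, 12, 10, 12, 11, 13, 12, 11, 12, 55, 12, 11]
print(flag_anomalies(counts))  # → [11]
```

In a deployed system the flagged index would be the point at which an alert is sent onward; a trailing window keeps the rule causal, so it only uses data already seen.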
Sensor Types and Limitations Discussed
Jared then surveyed the types of sensors used in the field, emphasizing the advantages and limitations of each. Low-cost, lightweight light-induced fluorescence (LIF) sensors can detect biological particles but cannot identify them. Spectroscopic methods such as mass spectrometry and Raman spectroscopy can identify particles, but some of these systems require reagents and frequent maintenance. Jared introduced the REBS sensor, which operates autonomously with less maintenance but is limited by battery life and the need to replace its tape reel. He concluded by mentioning newer methods, such as microfluidic sensors, sequencing analysis, and remote sensing, which require more human interaction.
Battelle's National Security and Health Work
Jared provided an overview of Battelle, a nonprofit independent research lab founded in 1925, whose work spans national security, health, and environment and infrastructure. The organization has three main pillars: contract research, charitable giving, and management of national laboratories. Battelle has a significant presence in work on chemical, biological, radiological, nuclear, and explosive materials and operates a large, privately owned biosafety level 3 (BSL-3) facility in the US. It supports the testing of medical countermeasures, including animal and in vitro testing, and offers capabilities in threat analysis, sensors and detection data, device testing, and operations planning.
Evolution of REBS Device
Jared traced the evolution of REBS, whose development began in 2006. The device was initially built to collect data for material characterization in a lab setting. As it matured, it was adapted for other applications, including military and national security uses, and was made more portable, more rugged, and able to operate in extreme temperatures. Shrinking the device, however, made it harder to collect good data within the limited space. Jared also mentioned the development of REBS+ for urban biothreat environments and for indoor environments such as hospitals. He concluded by describing the data the device produces and how the approach to analyzing it has matured.
Raman Spectroscopy and REBS Device
Jared explained the process of Raman spectroscopy and how the REBS (Resource Effective Bioidentification System) device uses it to analyze particles. Particles are collected, deposited on a tape, and interrogated with a laser. The device captures a spectrum from each particle, which is then processed through several steps, including background removal, trend removal, and interpolation onto a common axis so that data are standardized across instruments. Jared also mentioned that the team has algorithms to handle multiple particle types in a single image and to classify different types of spectral responses.
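The three preprocessing steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the REBS pipeline: background removal is simplified to subtracting a blank spectrum (the talk's own background filter is not reproduced), trend removal is a least-squares linear fit, and the common axis is a made-up Raman shift grid. All function names and data values are hypothetical.

```python
def subtract_background(spectrum, background):
    """Assumption: background is a blank-tape spectrum recorded on the same instrument."""
    return [s - b for s, b in zip(spectrum, background)]


def remove_linear_trend(x, y):
    """Fit y ~ a + b*x by ordinary least squares and subtract the fitted line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def interpolate_to_grid(x, y, grid):
    """Linearly interpolate (x, y) onto a shared grid.
    x and grid must be increasing, and grid must lie within [x[0], x[-1]]."""
    out, j = [], 0
    for g in grid:
        # Advance to the segment [x[j], x[j+1]] that contains g.
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        t = (g - x[j]) / (x[j + 1] - x[j])
        out.append(y[j] + t * (y[j + 1] - y[j]))
    return out


# Made-up instrument axis (Raman shift, cm^-1) with uneven spacing.
axis = [400.0, 412.0, 425.0, 437.0, 450.0]
raw = [5.0, 6.2, 9.5, 6.7, 7.0]
blank = [1.0, 1.1, 1.0, 0.9, 1.0]
common_grid = [400.0, 410.0, 420.0, 430.0, 440.0, 450.0]

processed = interpolate_to_grid(
    axis, remove_linear_trend(axis, subtract_background(raw, blank)), common_grid
)
print([round(v, 3) for v in processed])
```

Interpolating every instrument's output onto the same grid is what makes spectra from different units directly comparable as classifier inputs.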
Classification System for Organisms Based on Spectra
Jared discussed the development of a system for classifying organisms based on their spectra. The team initially used principal component analysis and later moved to support vector machines, random forests, and convolutional neural nets. The system has evolved as both data quality and method complexity have improved; the current method uses convolutional neural nets and does not require a separate dimension-reduction step. Nancy asked about non-negative matrix factorization and functional data analysis, and Jared responded that both were considered but not pursued far because the spectra did not follow certain expected functional forms. Jared also clarified that the "library" in the autoencoder diagram refers to the organisms in the training set.
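As a rough illustration of the early PCA stage, the sketch below finds the leading principal components of a set of spectra by power iteration on the sample covariance matrix (with deflation) and projects a spectrum onto them. It is a generic PCA sketch in plain Python, not Battelle's code; the function names, iteration count, and toy three-channel "spectra" are hypothetical.

```python
import math
import random


def pca_components(spectra, k=2, iters=200, seed=0):
    """Return the column means and the leading k principal components,
    found by power iteration on the sample covariance with deflation."""
    n, d = len(spectra), len(spectra[0])
    means = [sum(row[j] for row in spectra) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in spectra]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is (numerically) zero
                break
            v = [x / norm for x in w]
        # Eigenvalue estimate (Rayleigh quotient), then deflate so the next
        # pass converges to the next component.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components


def project(spectrum, means, components):
    """Scores of one spectrum on the principal components."""
    centered = [x - m for x, m in zip(spectrum, means)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


# Toy three-channel "spectra" whose variation lies along one direction.
spectra = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.0], [4.0, 4.0, 0.0]]
means, comps = pca_components(spectra, k=1)
print(comps[0], project([4.0, 4.0, 0.0], means, comps))
```

A downstream classifier would then work on the low-dimensional scores; the CNN approach described in the talk skips this reduction step.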
Collecting Training Data for Algorithms
Jared described how training data are collected for the algorithms: target material is injected into a chamber and spectra are measured. He explained the challenges of noisy data, such as spectra from particles that only partially overlap the laser beam or spectra dominated by background noise. To address this, the team developed automated methods for assigning ground truth to spectra and used clustering to group them. Jared also discussed the importance of the signal-to-noise ratio (SNR) in the analysis, noting that they have settled on a Fourier analysis approach that defines signal and noise ranges and calculates the SNR from them.
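One plausible reading of a Fourier-based SNR is sketched below: broad spectral features live in low-frequency Fourier bins, while shot and read noise spread into high-frequency bins, so the ratio of mean power in a "signal" band to mean power in a "noise" band gives an SNR. The band edges, function names, and synthetic data are hypothetical assumptions, not the ranges Battelle settled on; a direct DFT is used to keep the sketch dependency-free.

```python
import cmath
import math
import random


def power_spectrum(y):
    """|Y_k|^2 for k = 0..n//2 from a direct DFT (O(n^2); fine for a sketch)."""
    n = len(y)
    return [abs(sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def fourier_snr(y, signal_bins=(1, 8), noise_bins=(32, None)):
    """Mean power per bin in a low-frequency 'signal' range divided by the
    mean power per bin in a high-frequency 'noise' range.
    The bin ranges are hypothetical and would need tuning to real spectra."""
    power = power_spectrum(y)
    signal = power[signal_bins[0]:signal_bins[1]]
    noise = power[noise_bins[0]:noise_bins[1]]
    noise_level = sum(noise) / len(noise)
    return (sum(signal) / len(signal)) / noise_level if noise_level > 0 else math.inf


# Made-up spectrum: one broad peak on 128 channels plus Gaussian noise.
n = 128
peak = [math.exp(-((t - 64) / 8) ** 2) for t in range(n)]
rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(n)]
clean = [p + 0.01 * w for p, w in zip(peak, white)]
noisy = [p + 0.1 * w for p, w in zip(peak, white)]
print(fourier_snr(clean), fourier_snr(noisy))
```

Bin 0 (the mean level) is excluded from the signal band so that a constant offset does not inflate the SNR.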
Challenges in Organism Signature Classification
Jared discussed the challenge of classifying different signatures of the same organism, particularly spores and vegetative cells. He explained the process of “truthing,” which involves identifying spectra and sorting them into distinct signatures. He also highlighted that, for similar organisms, the between-class variability can be smaller than the within-class variability, which makes classification difficult. He mentioned using signal-to-noise filters and more complex models to improve accuracy. Additionally, Jared discussed classifying organisms at different levels of specificity and the impact of environmental factors on the spectra.
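The between-class versus within-class problem can be quantified with a simple one-feature ratio, in the spirit of Fisher's criterion: the variance of the class means divided by the average within-class variance. The sketch below is illustrative only; the function name, organism labels, and values are hypothetical, and classes are weighted equally regardless of size.

```python
from statistics import mean, pvariance


def between_within_ratio(groups):
    """groups maps a class label to values of one spectral feature
    (e.g. intensity at a single Raman shift). Returns the variance of the
    class means divided by the average within-class variance; values below
    1 mean the classes overlap more than they differ."""
    class_means = [mean(values) for values in groups.values()]
    between = pvariance(class_means)
    within = mean([pvariance(values) for values in groups.values()])
    return between / within


# Made-up feature values for hypothetical organisms.
similar = {"organism_A": [1.0, 1.4, 0.8, 1.2], "organism_B": [1.1, 1.5, 0.9, 1.3]}
distinct = {"organism_A": [1.0, 1.4, 0.8, 1.2], "organism_C": [5.0, 5.4, 4.8, 5.2]}
print(round(between_within_ratio(similar), 3), round(between_within_ratio(distinct), 1))
```

For the "similar" pair the class means differ by less than the spread inside each class, so the ratio falls well below 1, which is the situation that pushes toward noise filtering and richer models.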
Mitigating Data Collection Issues
Jared discussed the variability across data collection sessions and the environmental conditions that affect the instruments. He highlighted issues such as fluorescence signatures disrupting the rolling circle filter, pixels with differing sensitivities in the CCD array, and a sinusoidal pattern in the data caused by etaloning in the CCD array. He explained the mitigations: smoothing over the bad pixels, adjusting the analysis region, and using Fourier space to detect and remove the sinusoidal pattern. Jared noted that hardware improvements have reduced the impact of these issues over time.
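Two of those mitigations can be sketched in a few lines: replacing known bad pixels with the mean of their nearest good neighbours, and notch-filtering a sinusoidal fringe by zeroing its Fourier bin (and the mirror bin) before transforming back. This is a generic sketch under assumptions, not the REBS implementation: the function names, the fringe search range, and the synthetic data are hypothetical, and a real etalon fringe rarely falls on a single integer bin, so in practice a small band of bins would be suppressed.

```python
import cmath
import math


def repair_bad_pixels(y, bad):
    """Replace each flagged pixel with the mean of its nearest good neighbours."""
    bad = set(bad)
    out = list(y)
    for i in bad:
        left = next((y[j] for j in range(i - 1, -1, -1) if j not in bad), None)
        right = next((y[j] for j in range(i + 1, len(y)) if j not in bad), None)
        neighbours = [v for v in (left, right) if v is not None]
        if neighbours:
            out[i] = sum(neighbours) / len(neighbours)
    return out


def dft(y):
    n = len(y)
    return [sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def detect_fringe_bin(y, search=(10, 32)):
    """Strongest Fourier bin inside a hypothetical search range for the fringe."""
    mags = [abs(c) for c in dft(y)]
    return max(range(*search), key=lambda k: mags[k])


def remove_fringe(y, k):
    """Zero bin k and its mirror bin n - k, then transform back (a notch filter)."""
    n = len(y)
    coeffs = dft(y)
    coeffs[k] = 0
    coeffs[(n - k) % n] = 0
    return idft(coeffs)


# Made-up example: flat baseline, one hot pixel, and a fringe at bin 20.
n = 64
signal = [1.0 + 0.3 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
signal[5] = 50.0  # hot pixel
repaired = repair_bad_pixels(signal, [5])
k = detect_fringe_bin(repaired)
cleaned = remove_fringe(repaired, k)
print(k, [round(v, 3) for v in cleaned[:8]])
```

Repairing bad pixels first matters: a single hot pixel spreads energy across every Fourier bin and could otherwise be mistaken for, or mask, the fringe.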
Challenges in Classification and Data Collection
Jared discussed the challenges of classification when the algorithm is trained on a fixed set of known organisms: particles from unknown organisms can be misclassified as one of the known classes, so methods are needed to avoid such misclassifications. He also mentioned the difficulties of R&D work with changing hardware and data quality, and the need to pivot toward new markets. He touched on the difficulty of collecting data on dangerous organisms and the need for robust training libraries. Nancy asked about testing small organic particles from trees, and Jared responded that their focus is on particles small enough to be inhaled, although they have explored other applications such as monitoring liquid streams and chemical analysis.
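One common baseline for the unknown-organism problem is to reject low-confidence predictions rather than force every spectrum into a known class. The sketch below uses maximum-softmax-probability thresholding; this is a swapped-in, generic technique (known to be overconfident on some out-of-distribution inputs), not necessarily what Battelle uses, and the class labels, scores, and threshold are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw classifier scores to probabilities (shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_reject(scores, labels, threshold=0.8):
    """Return (label, confidence), or ("unknown", confidence) when the most
    likely known class is below the (hypothetical) threshold."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return labels[best], probs[best]


# Hypothetical classes and made-up classifier scores.
labels = ["organism_A", "organism_B", "background"]
print(classify_with_reject([5.0, 0.0, 0.0], labels))  # confident: organism_A
print(classify_with_reject([1.0, 0.9, 0.8], labels))  # ambiguous: unknown
```

The threshold trades missed detections against false alarms and would need to be tuned on held-out data that includes organisms outside the training library.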
Data Analysis Challenges and Lessons
Jared closed with lessons learned from the project's data analysis and algorithm development. He emphasized regularly reviewing both raw and processed data, understanding the variability in data sources, and considering multiple perspectives when developing solutions. He also highlighted the need for both skepticism and open-mindedness in R&D. Nancy then asked about the potential use of a Kruskal-Wallis test to create a frequency distribution on a matrix from a laser, which Jared agreed to consider. The session ended with Nancy thanking Jared for his presentation and offering to share his slides in PDF format.
Thank You and Recognition
We extend our heartfelt gratitude to Jared Schuetter for his insightful presentation at the NISS AI, Statistics, and Data Science in Practice webinar. His expertise in machine learning for airborne biological hazard detection provided a fascinating look into the innovative work being done at Battelle. Jared’s discussion on the evolution of the REBS device, the challenges in spectral classification, and the complexities of data analysis in biological hazard detection was both informative and engaging. His ability to break down intricate concepts, from Raman spectroscopy to convolutional neural networks, offered valuable insights to our audience. We appreciate his time, knowledge, and dedication to advancing technology in this critical field. Thank you, Jared, for sharing your expertise and for your contributions to the intersection of AI, data science, and public safety.