R & Spark: Tools for Data Science Workflows
Co-sponsored by NISS and the Canadian Statistical Sciences Institute (CANSSI), this workshop brought together E. James Harner, Professor Emeritus of Statistics at West Virginia University, and 15 participants in Toronto on April 12th and 13th, 2018.
In this workshop, Dr. Harner worked through the initial steps in the data science process: extracting data from source systems, transforming it into a workable format, and then loading it into distributed file systems, distributed data warehouses, or NoSQL databases. SparkR and sparklyr were then used as interfaces for modeling big data with supervised learning methods for regression and classification. Unsupervised learning methods, such as clustering and dimension reduction, were also covered. Additional methods, such as gradient boosting and deep learning, were illustrated using the h2o and rsparkling R packages. Methods for analyzing streaming data were also presented. At the end of the two-day workshop, participants walked away tired but happy.
Comments from participants:
"For the content and level of complexity this course provides, it was taught extremely well."
"I found the workshop very helpful. I know I'm going to look at the course notes again in the near future."
"The professor's years of experience can be seen very well. Provided a lot of insight on the field, as well as some small tips/tricks from his code."
If this is the type of workshop that you would like to see take place in your area, please do not hesitate to contact NISS with your suggestions.