Supercomputing and Big Data Tutorial: Parallel and Scalable Machine Learning Algorithms

NextGen @ Helmholtz Conference 2018
GFZ German Research Centre for Geosciences, Potsdam, Germany
2018-07-26
[ Event ]

Abstract

The fast training of traditional machine learning models and more innovative deep learning networks on ever-growing quantities of scientific and engineering data (aka ‘Big Data’) requires high performance computing (HPC) on modern supercomputers. HPC technologies such as those developed within the European DEEP-EST project provide innovative approaches to processing, memory, and modular supercomputing usage during training, testing, and validation. This workshop therefore focuses on parallel and scalable machine learning driven by HPC and will pave the way for participants to use parallel processing on supercomputers as a key enabler for a wide variety of machine learning and deep learning algorithms in use today. Examples include scientific and engineering applications that leverage traditional machine learning techniques such as scalable feature engineering, density-based spatial clustering of applications with noise (DBSCAN), and support vector machines (SVMs) with kernel methods. These applications of traditional machine learning will also be compared with innovative deep learning models built with Keras and TensorFlow, which take advantage of convolutional neural networks (CNNs) for image datasets and long short-term memory (LSTM) networks for sequence data. While working through these concrete models, participants will also learn the required aspects of statistical learning theory and how to avoid overfitting in applications by using various regularization and cross-validation techniques.
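
For readers unfamiliar with the DBSCAN technique named above, the following is a minimal serial sketch in plain Python. It is not the parallel implementation covered in the tutorial; the function names (`region_query`, `dbscan`), the parameter names (`eps`, `min_pts`), and the toy points are assumptions chosen for illustration only. The brute-force neighbor search compares every point with every other point, and that quadratic cost is one reason parallel and scalable variants are of interest for large datasets.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # illustrative label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Toy data (hypothetical): two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (9.0, 0.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

With these toy values the two tight groups receive separate cluster labels and the isolated point is marked as noise. The results depend strongly on `eps` and `min_pts`, which have to be chosen per dataset.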



Materials

[ Lecture 1 – HPC Introduction & Parallel and Scalable Clustering using DBSCAN – Slides ~13.2 MB (pdf) ]

[ Lecture 2 – Parallel and Scalable Classification using SVMs with Applications – Slides ~11.4 MB (pdf) ]

Lecture 3: Deep Learning using CNNs driven by HPC & GPUs – will be available shortly
Lecture 4: Deep Learning using LSTMs driven by HPC & GPUs – will be available shortly