Supercomputing and Big Data Tutorial: Parallel and Scalable Machine Learning Algorithms
NextGen @ Helmholtz Conference 2018
GFZ German Research Centre for Geosciences, Potsdam, Germany
2018-07-26
[ Event ]
Abstract
Training traditional machine learning models and more innovative deep learning networks quickly on ever-growing quantities of scientific and engineering data (aka 'Big Data') requires high performance computing (HPC) on modern supercomputers. HPC technologies such as those developed within the European DEEP-EST project provide innovative approaches to processing, memory, and modular supercomputing usage during training, testing, and validation. This workshop therefore focuses on parallel and scalable machine learning driven by HPC and will pave the way for participants to use parallel processing on supercomputers as a key enabler for a wide variety of the machine learning and deep learning algorithms in use today. Examples include scientific and engineering applications that leverage traditional machine learning techniques such as scalable feature engineering, density-based spatial clustering of applications with noise (DBSCAN), and support vector machines (SVMs) with kernel methods. These applications of traditional machine learning will also be compared with innovative deep learning models built with Keras and TensorFlow, which take advantage of convolutional neural networks (CNNs) for image datasets and long short-term memory (LSTM) networks for sequence data. While working through these concrete models, participants will also learn the required aspects of statistical learning theory and how to avoid overfitting in applications by using various regularization and cross-validation techniques.
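As a rough illustration of the density-based clustering named in the abstract, the following is a minimal serial sketch of DBSCAN in plain Python. It is not taken from the tutorial materials: the function names `dbscan` and `region_query`, the parameters `eps` and `min_pts`, and the sample points are assumptions chosen for illustration, and a parallel implementation for a supercomputer would distribute the neighborhood queries rather than run them in one loop.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may still become a border point later.
            labels[i] = NOISE
            continue
        # Core point found: start a new cluster and expand it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue             # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Each `region_query` call scans every point, so this sketch costs quadratic time overall. That cost is what motivates spatial indexing and parallelization on large datasets.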
Materials
Lecture 1: HPC Introduction & Parallel and Scalable Clustering using DBSCAN – Slides ~13.2 MB (pdf)
Lecture 3: Deep Learning using CNNs driven by HPC & GPUs – will be available shortly
Lecture 4: Deep Learning using LSTMs driven by HPC & GPUs – will be available shortly
Thanks to @HelmholtzJrs for an invited keynote on 'Supercomputing Impact in Big Data & AI' including modular supercomputing driven by @DEEPprojects @fzj_jsc @fz_juelich & @Haskoli_Islands alongside @helmholtz_de president Prof. Otmar Wiestler at @GFZ_Potsdam #NextGenHelmholtz pic.twitter.com/0yUUS31z5e
— Morris Riedel (@MorrisRiedel) July 25, 2018