Upskilling Researchers in Machine Learning

Maxime Rio, Jens Brinkmann

New Zealand eScience Infrastructure (NeSI), The University of Auckland (UoA)

Tuesday, October 17, 2023

Who are we?

Maxime Rio

  • Data Science Engineer @ NeSI
  • Data Scientist @ NIWA
  • Help researchers optimise and scale up
    their code
  • Develop ML pipelines and models
  • Organise ML and data science training

Jens Brinkmann

  • Senior eResearch Engagement Specialist @ UoA
  • Mechanical Engineer with a background in Photography/Videography
  • Support researchers with their computational needs and related training

About this talk

  • We want to tell you about our experience with Machine Learning (ML) workshops.
  • We want to share some recommendations.
  • You can do it!

A shared goal

  • Introduce researchers to Machine Learning and Deep Learning
  • Start with foundational ML aspects and build up from there
  • Research-field agnostic
  • Hands-on approach (teaching by doing)
    • no show and tell of existing commercial solutions
    • no theoretical lectures
    • no deep dive into the maths behind ML
  • Not an exhaustive course, but one that makes attendees confident enough to try things on their own

Workshops overview

UoA workshops

  • Audience: The University of Auckland (UoA) researchers
  • Two runs
    • Run #1: March 2023
    • Run #2: September 2023
  • Well-received
    • filtering by mandatory Expression of Interest (EoI)
    • about 100 applications for 40 spots

NeSI workshops

First ML101 workshop at eResearch NZ 2021

  • Audience: Aotearoa – NZ researchers
  • ML 101
    • Intro to Machine Learning
    • started in 2021
    • 7 workshops (in person, online)
    • 127 attendees in total (from 10 to 32)
  • ML 102
    • Intro to Deep Learning (CNNs)
    • started in 2022
    • 2 workshops (online)
    • 44 attendees in total (20 and 24)
  • Mixture of direct registration and EoIs

Recommendations

There will be a lot of interest so…

  • use an Expression of Interest for registration and filter,
  • 30 participants is a good number for an online training session,
  • expect no-shows (if the event is free and online).

Platform(s)

UoA workshops

  • online-only event: Zoom
  • BYOD (bring your own device)
  • major deviation from The Carpentries: No local Python installs
  • Google Colab, a browser-based Jupyter Notebook environment running on Google infrastructure (a virtual machine, to which a GPU can be added)

Google Colab in a Browser

NeSI workshops

JupyterLab session running on Jupyter-on-NeSI

  • Online and in person
    • 2 delivered in person (1 had wifi issues 😓)
    • 5 delivered online
  • Use Jupyter-on-NeSI
    • JupyterHub platform
    • Requires a NeSI account
    • ML101: 2 cores & 4 GB of RAM
    • ML102: 4 cores & 8 GB of RAM
  • Use Slurm jobs for GPU training (only briefly)
  • Tip: make sure the Platform team does not schedule upgrades that day 😬…

Recommendations

  • Make it online
  • Leverage online computational platforms (Google Colab, JupyterHub, Open OnDemand…)
  • No need for a GPU to start (small ones are available on Google Colab)

Schedule

UoA workshops

Run #1

Time Budget           Activity
two afternoons (8h)   Python
one afternoon (4h)    ML
two afternoons (8h)   DL

Run #2

Time Budget           Activity
two afternoons (8h)   ML
two afternoons (8h)   DL

  • all workshops took place in the same week
  • no mixing and matching, signing up = coming to all sessions
  • Major adjustment for Run #2: Python as a prerequisite, not part of the series

NeSI workshops

  • ML 101
    • 6 hours with 3 breaks ☕
    • at first in one day
    • now split over 2 mornings
  • ML 102
    • 3 hours with 2 breaks 🍵
  • Independent workshops
  • But organised “close” to each other

ML101 runsheet, used to keep track of time

Recommendations

  • Split/shorter sessions (bearing in mind the scheduling challenges for researchers)
  • Stick to scheduled breaks
  • Follow best practices for online audiences

Material

UoA workshops

Lesson Title                                         Status     Run #1     Run #2
Programming with Python                              Released   Mon, Tue   -
Introduction to Machine Learning with Scikit Learn   Alpha      Wed        Mon, Tue
Introduction to Deep Learning                        Beta       Thu, Fri   Wed, Thu

NeSI workshops

My rehearsal and source of inspiration 💓

Recommendations

The Centre for eResearch (CeR) at UoA and NeSI each decided independently to base their workshops on existing material.

  • Don’t reinvent the wheel
  • Reuse/adapt content

Content

Machine Learning

  • Data preparation
  • Supervised vs. unsupervised learning
    • regression
    • classification
    • clustering
    • dimensionality reduction
  • Ensemble models (random forests)
  • Validation
    • train/test/validation split
    • cross-validation
    • validation and learning curves
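
As an illustration of the validation topics above, here is a minimal sketch using scikit-learn. It is not taken from the workshop material: the iris dataset, the logistic regression model and the split sizes are assumptions chosen only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Small built-in dataset, chosen only for illustration (assumption).
X, y = load_iris(return_X_y=True)

# Hold out a test set that is never used while choosing or tuning the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on the training portion only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"cross-validation accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")

# Final check on the held-out test set, after fitting on all training data.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```

The cross-validation scores guide model choices, while the test set is touched only once at the end to estimate performance on unseen data.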

Deep Learning

  • Model architectures
    • Multi-layer perceptron
    • Convolutional neural network
      (CNN | computer vision)
  • Model training
    • optimisers and mini-batch
    • overfitting and early stopping
    • data augmentation
    • dropout, batch normalisation, …
  • Transfer learning
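
Several of the training notions above (mini-batches, an optimiser, early stopping) can be sketched with a small multi-layer perceptron. This is an illustration only, not the workshop code: scikit-learn's MLPClassifier stands in for a dedicated deep learning framework, and the digits dataset, layer sizes and patience value are assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 8x8 digit images flattened to 64 features (illustrative dataset, assumption).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

mlp = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # architecture: two hidden layers
    solver="adam",                # optimiser
    batch_size=32,                # mini-batch size
    early_stopping=True,          # monitor a held-out validation split
    validation_fraction=0.1,      # 10% of the training data used for monitoring
    n_iter_no_change=10,          # stop after 10 epochs without improvement
    max_iter=200,                 # upper bound on the number of epochs
    random_state=0,
)

# Neural networks usually need scaled inputs, hence the preprocessing step.
model = make_pipeline(StandardScaler(), mlp)
model.fit(X_train, y_train)

n_epochs = model[-1].n_iter_
test_accuracy = model.score(X_test, y_test)
print(f"epochs run: {n_epochs}, test accuracy: {test_accuracy:.2f}")
```

Convolutional networks, data augmentation and transfer learning need a deep learning framework and are outside the scope of this sketch.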

Recommendations

  • Random forest is a good first non-linear model to learn
    • intuitive to understand how it works
    • good performance on tabular data
    • doesn’t require too much care in terms of data preparation
  • Resist the temptation to use an MLP (multi-layer perceptron) in an ML intro
    • it requires more concepts (architecture, training, data preprocessing, …)
    • keep it for the Deep Learning introduction
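
To illustrate why a random forest suits a first session, here is a minimal sketch assuming scikit-learn and its built-in wine dataset (both are choices made for this example, not necessarily the workshop's). The features are on very different scales, yet they are passed to the model without any scaling.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Tabular dataset with features on very different scales (illustrative choice).
data = load_wine()
X, y = data.data, data.target

# No scaling or other preprocessing: the raw features go straight into the model.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(forest, X, y, cv=5)
print(f"cross-validation accuracy: {scores.mean():.2f}")

# Feature importances give attendees a first, intuitive look inside the model.
forest.fit(X, y)
ranked = sorted(zip(forest.feature_importances_, data.feature_names), reverse=True)
top_features = ranked[:3]
for importance, name in top_features:
    print(f"{name}: {importance:.2f}")
```

An MLP on the same data would typically need feature scaling plus choices about architecture and training, which is the extra load the recommendation above suggests deferring.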

Summary

  • Use Expression-of-Interest for registrations
  • Make it online
  • Use an online compute platform
    • (Google Colab, JupyterHub or Open OnDemand at your institution)
  • You don’t need fancy GPUs, even though some are available
  • Re-use and adapt existing material
  • Use shorter sessions and stick to breaks
  • Keep MLP for the Deep Learning section, start with Random Forests

How to get in touch

Kererū / New Zealand pigeon, not use(ful) for mails 😅 Image Credit: Department of Conservation

Maxime Rio
📨 maxime.rio@nesi.org.nz

Jens Brinkmann
📨 j.brinkmann@auckland.ac.nz
