Peaceful Co-existence of HPC and Kubernetes Workloads for Research Computing

Time: 
11:00 AM to 11:40 AM
Room: 
Genentech Hall S202
Track: 
Research
Description: 

Sponsored by IT Governance

In 2018, our campus stood up a high-performance computing (HPC) cluster by networking together several departments' research data centers. This consolidated compute nodes and storage into a single, much larger cluster that any lab could use. We followed a co-operative model: a department could donate hardware to the cluster, and its jobs would be "niced" at a level proportionate to its contribution. Nearly all of these jobs are traditional batch computing jobs and are approved to work only with non-PHI data; over time, however, new research computing use cases have started emerging.
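To make the co-op priority idea concrete, here is a minimal Python sketch of how contribution-proportional shares might be computed. The talk does not name the actual scheduler or its configuration, so the function, department names, and contribution units below are illustrative assumptions only.

    # Illustrative only: the talk does not specify the scheduler, so this
    # models the co-op idea abstractly. Department names and contribution
    # units (node counts or dollar values) are hypothetical.

    def fair_share_weights(contributions: dict[str, float]) -> dict[str, float]:
        """Normalize each department's contribution into a share of priority."""
        total = sum(contributions.values())
        return {dept: value / total for dept, value in contributions.items()}

    if __name__ == "__main__":
        shares = fair_share_weights({"neurology": 12, "genomics": 4})
        for dept, share in shares.items():
            # A real scheduler would translate these shares into nice levels
            # or fair-share factors (e.g. Slurm fair-share, SGE share trees).
            print(f"{dept}: {share:.0%} of scheduling priority")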

First, there is growing demand for batch computing with PHI data, including de-identification jobs that would run much more efficiently with access to the HPC cluster. There are also interactive data science use cases that require hosting application services such as Spark and JupyterHub. And for machine learning, artificial intelligence, and deep learning use cases, more complex container orchestration is desired, using tools such as Kubernetes (K8s).
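As a rough illustration of the container-orchestration demand described above, the following sketch uses the official Kubernetes Python client to request a GPU-backed notebook pod. The image, namespace, and pod name are assumptions chosen for illustration, not UCSF's actual setup.

    # Illustrative sketch using the official Kubernetes Python client
    # (pip install kubernetes). Image, namespace, and pod name are example
    # values, not UCSF's actual configuration.

    from kubernetes import client, config

    def launch_notebook_pod() -> None:
        config.load_kube_config()  # use local kubeconfig credentials

        pod = client.V1Pod(
            metadata=client.V1ObjectMeta(name="ml-notebook"),
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="notebook",
                        image="jupyter/datascience-notebook",  # example image
                        # One GPU for the ML/AI/DL workloads described above;
                        # assumes the NVIDIA device plugin is installed.
                        resources=client.V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "1"}
                        ),
                    )
                ],
            ),
        )
        client.CoreV1Api().create_namespaced_pod(namespace="research", body=pod)

    if __name__ == "__main__":
        launch_notebook_pod()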

But how do you take an HPC cluster that was built from the ground up for batch computing and make it support K8s? Do you spin up an entirely new cluster? Since supporting PHI requires security controls such as encryption and audit logging, how do you keep those controls from imposing a significant performance hit on non-PHI jobs? And how would the co-op model work in a new environment that includes GPUs, containers, and shared security responsibility for working with PHI? These are the questions our group is working through as we design an approach that lets our HPC environment move from a batch-only model to one that also accommodates container orchestration and PHI. This presentation will walk you through our journey and include interactive discussion so we can learn from each other's experiences on how best to support these diverse research computing use cases.
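One commonly used isolation pattern that speaks to the PHI-overhead question above is to taint PHI-hardened nodes so that only jobs explicitly approved for PHI schedule onto them, keeping the security-control overhead away from ordinary batch work. This is a sketch of that general pattern with the Kubernetes Python client, not the presenters' design; node names, labels, and images are hypothetical.

    # One possible isolation pattern, sketched with the Kubernetes Python
    # client: taint PHI-hardened nodes (where encryption and audit logging
    # run) so non-PHI jobs never land on them. Names here are hypothetical.

    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Taint a PHI-hardened node so that pods without a matching toleration
    # are kept off it, sparing non-PHI jobs the security-control overhead.
    v1.patch_node(
        "phi-node-01",  # hypothetical node name
        {"spec": {"taints": [
            {"key": "data-class", "value": "phi", "effect": "NoSchedule"}
        ]}},
    )

    # A PHI-approved job opts in with a toleration plus a node selector.
    phi_pod_spec = client.V1PodSpec(
        restart_policy="Never",
        tolerations=[client.V1Toleration(
            key="data-class", operator="Equal", value="phi",
            effect="NoSchedule",
        )],
        node_selector={"data-class": "phi"},  # assumes nodes carry this label
        containers=[client.V1Container(
            name="deid-job",
            image="example/deid:latest",  # placeholder image
        )],
    )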

Slides: https://ucsf.box.com/s/cwrmo11oeq3a2fitnl5it5yh9l5aki4l (MyAccess login required)

Presenter(s): 
Sandeep Giri
Rick Larsen
Session Type: 
Skill Level: 
Intermediate
Previous Knowledge: 

Basic knowledge of research computing, HPC, and containers

Speaker Experience: 

Sandeep Giri is a project manager for Academic Research Services, where he manages project Wynton, which provides on-site high-performance computing clusters in a co-op model. Previously, he was program manager for the AWS Research Cloud (ARC) program at SOM Tech, and he led product management for CDHI to develop a solution that streamlines the patient referral processing workflow using interoperability technologies such as FHIR. Sandeep has held several CTO, product manager, and software architect roles since the late 1990s.

Rick Larsen is the UCSF Director of Research Informatics and has 30 years of experience in the healthcare informatics field.