Running a massively scalable CUDA-accelerated AI/ML lab on WSL 2 with Determined
Getting a scalable AI/ML model training environment set up and running on WSL 2, with Docker Desktop and CUDA GPU compute.
![Running a massively scalable CUDA-accelerated AI/ML lab on WSL 2 with Determined](/content/images/size/w2000/2022/06/Screenshot-2022-06-21-011700-2.png)
Determined is an open source platform for AI/ML model development and training at scale.
Determined handles machine provisioning, networking, and data loading, and provides fault tolerance.
![](https://boxofcables.dev/content/images/2022/06/image-22.png)
It allows AI/ML engineers to pool and share computing resources, track experiments, and supports deep learning frameworks like PyTorch, TensorFlow, and Keras.
I am still learning about AI/ML. My interest was piqued after GPU compute arrived on Windows Subsystem for Linux (WSL), starting with CUDA.
Determined seems like a very cool and easy-to-use platform to learn on: it offers a web-based dashboard and includes a built-in AI/ML IDE.
There are several ways to deploy Determined, including `pip`, and several ways to interact with it, such as the terminal CLI tool `det`.
My preference is a more cloud native approach deploying with containers and interacting through the web-based dashboard.
This guide will cover setting up a local Determined deployment on WSL 2 with Docker Desktop.
We will:
- Verify a working GPU setup on WSL
- Deploy a database backend container
- Deploy a Determined master node container connected to the database
- Deploy and connect a Determined agent node container to the Determined master node
- Launch the JupyterLab IDE in the Determined web interface
Requirements for this tutorial:
- Windows 11 (recommended) or Windows 10 21H2
- Windows Subsystem for Linux Preview from the Microsoft Store (recommended), or the standard Windows Subsystem for Linux feature, but run `wsl.exe --update` to make sure you have the latest WSL kernel
- The latest NVIDIA GPU drivers directly from NVIDIA, not just Windows Update drivers
- Any WSL distro
- Docker Desktop 4.9+ installed with WSL integration enabled for the WSL distro you are going to be working in
- A CUDA-enabled NVIDIA GPU, e.g. GeForce GTX 1080 or higher*

*This workflow does work without a CUDA-enabled NVIDIA GPU, but will default to CPU-only mode if no GPU is available.
Links
- Determined.AI
- Determined Docs
- Enable NVIDIA CUDA on WSL (Microsoft Docs)
Basics
Verify that Docker Desktop is accessible from WSL 2:
docker --version
![](https://boxofcables.dev/content/images/2022/06/image.png)
This should not be docker-ce or an equivalent installed in WSL, but the aliases Docker Desktop places using WSL integration:
![](https://boxofcables.dev/content/images/2022/06/image-3.png)
Verify that GPU support is working in Docker and WSL 2:
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
![](https://boxofcables.dev/content/images/2022/06/image-2.png)
Note that my NVIDIA GeForce RTX 2070 Super is visible in the `nvidia-smi` output.
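The check above can be wrapped in a small helper that degrades gracefully on machines without a GPU (a sketch; the image tag matches the one used above):

```shell
# Probe CUDA pass-through by running nvidia-smi inside a throwaway container.
# If anything fails (no GPU, no driver, no Docker), report CPU-only instead
# of aborting the whole setup.
check_gpu() {
  if docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 \
       nvidia-smi >/dev/null 2>&1; then
    echo "GPU pass-through OK"
  else
    echo "No CUDA GPU visible - agent will run CPU-only"
  fi
}
```

Run `check_gpu` before deciding which agent command to use later in this guide.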
Set up PostgreSQL
Start an instance of PostgreSQL:
docker run -d --name determined-db -p 5432:5432 -v determined_db:/var/lib/postgresql/data -e POSTGRES_DB=determined -e POSTGRES_PASSWORD=password postgres:10
I recommend changing your password to anything besides `password`.
![](https://boxofcables.dev/content/images/2022/06/image-6.png)
![](https://boxofcables.dev/content/images/2022/06/image-4.png)
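Before wiring the master to this database, it can help to wait until Postgres is actually accepting connections. A minimal sketch, assuming `pg_isready` is available inside the `postgres:10` image (it ships with the standard Postgres client tools):

```shell
# Poll the determined-db container until Postgres accepts connections,
# giving up after ~30 seconds.
wait_for_db() {
  i=0
  while [ "$i" -lt 30 ]; do
    if docker exec determined-db pg_isready -U postgres >/dev/null 2>&1; then
      echo "postgres is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "gave up waiting for postgres"
  return 1
}
```

Call `wait_for_db` after the `docker run` above and before starting the master.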
Get your WSL IP address
Grab your WSL instance's `eth0` IP address from `ip`, parse it with `sed`, and stash it in an environment variable, `$WSLIP`:
WSLIP=$(ip -f inet addr show eth0 | sed -En -e 's/.*inet ([0-9.]+).*/\1/p')
![](https://boxofcables.dev/content/images/2022/06/image-7.png)
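To sanity-check the extraction, the same `sed` expression can be run against a captured line of `ip` output (the address below is a made-up example, not your address):

```shell
# A typical `ip -f inet addr show eth0` line (sample data):
sample='    inet 172.29.112.5/20 brd 172.29.127.255 scope global eth0'

# The sed expression keeps only the dotted-quad after "inet".
WSLIP=$(printf '%s\n' "$sample" | sed -En -e 's/.*inet ([0-9.]+).*/\1/p')
echo "$WSLIP"   # prints 172.29.112.5
```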
Start the Determined Master Node
Start up an instance of the determined-master image, connected to the PostgreSQL `determined` database we spun up on port 5432:
docker run -d --name determined-master -p 8080:8080 -e DET_DB_HOST=$WSLIP -e DET_DB_NAME=determined -e DET_DB_PORT=5432 -e DET_DB_USER=postgres -e DET_DB_PASSWORD=password determinedai/determined-master:latest
![](https://boxofcables.dev/content/images/2022/06/image-8.png)
![](https://boxofcables.dev/content/images/2022/06/image-9.png)
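Before opening the dashboard, you can poll the master for liveness. A sketch, assuming `curl` is installed and that the master serves its `/info` REST endpoint (the same endpoint the `det` CLI uses to query the master):

```shell
# Return success once the Determined master answers on its REST API.
master_up() {
  curl -sf "http://$1:8080/info" >/dev/null
}
```

Usage: `master_up "$WSLIP" && echo "master is up"`.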
Launch the Determined Master Node web dashboard:
powershell.exe /c start http://$WSLIP:8080
![](https://boxofcables.dev/content/images/2022/06/image-11.png)
Log in with the default `admin` account (no password).
Now you have access to the Determined dashboard.
![](https://boxofcables.dev/content/images/2022/06/image-12.png)
But we do not have any agents connected to run experiments on.
![](https://boxofcables.dev/content/images/2022/06/image-13.png)
Attach a Determined Agent Node
Start up an instance of the determined-agent image, pointed at our Determined Master host IP ($WSLIP) and port (8080):
docker run -d --gpus all -v /var/run/docker.sock:/var/run/docker.sock --name determined-agent -e DET_MASTER_HOST=$WSLIP -e DET_MASTER_PORT=8080 -e NVIDIA_DRIVER_CAPABILITIES=compute,utility determinedai/determined-agent:latest
![](https://boxofcables.dev/content/images/2022/06/image-17.png)
Note:
- `--gpus all` passes our NVIDIA GPU through to the determined-agent container.
- `NVIDIA_DRIVER_CAPABILITIES` is set to also include `compute`, overriding the determined-agent default of just `utility`. This enables the agent to detect the passed-through CUDA GPU. This issue was documented and I submitted a PR.
- If you do not have a CUDA-enabled GPU and wish to use CPU only, use:
docker run -d -v /var/run/docker.sock:/var/run/docker.sock --name determined-agent -e DET_MASTER_HOST=$WSLIP -e DET_MASTER_PORT=8080 determinedai/determined-agent:latest
![](https://boxofcables.dev/content/images/2022/06/image-15.png)
Return to the Determined dashboard, to see our clusters:
powershell.exe /c start http://$WSLIP:8080/det/clusters
We can now see 1 connected agent and 0/1 CUDA slots allocated, ready for training deep learning models:
![](https://boxofcables.dev/content/images/2022/06/image-18.png)
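The same information is available from the terminal if you installed the `det` CLI mentioned earlier (`pip install determined`; the `-m` flag points it at the master). A thin wrapper so the master address is not hard-coded:

```shell
# List the agents connected to a given Determined master IP.
list_agents() {
  det -m "http://$1:8080" agent list
}
```

Usage: `list_agents "$WSLIP"`.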
Click **Launch JupyterLab** to spin up a web-based Python IDE for notebooks, code, and data:
![](https://boxofcables.dev/content/images/2022/06/image-19.png)
And our available CUDA GPU will be automatically assigned. You can see how it is provisioned and visible in the Determined dashboard:
![](https://boxofcables.dev/content/images/2022/06/1-2.png)
![](https://boxofcables.dev/content/images/2022/06/Screenshot-2022-06-21-011700-1.png)
![](https://boxofcables.dev/content/images/2022/06/Screenshot-2022-06-21-011710-1.png)
And now we have a CUDA-accelerated JupyterLab Python AI/ML IDE:
![](https://boxofcables.dev/content/images/2022/06/image-20.png)
We can even start up additional CPU-only Determined worker agents:
docker run -d -v /var/run/docker.sock:/var/run/docker.sock --name determined-agent-2 -e DET_MASTER_HOST=$WSLIP -e DET_MASTER_PORT=8080 determinedai/determined-agent:latest
Note that the container name was tweaked to `determined-agent-2`; the image is the same.
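If you want several CPU-only agents, the run command generalizes to a loop over unique container names (a sketch; the container name is the only thing that must differ between agents):

```shell
# Start CPU-only agents determined-agent-2 .. determined-agent-<count>,
# all pointed at the same master.
start_cpu_agents() {  # usage: start_cpu_agents <count> <master-ip>
  i=2
  while [ "$i" -le "$1" ]; do
    docker run -d -v /var/run/docker.sock:/var/run/docker.sock \
      --name "determined-agent-$i" \
      -e DET_MASTER_HOST="$2" -e DET_MASTER_PORT=8080 \
      determinedai/determined-agent:latest
    i=$((i + 1))
  done
}
```

Usage: `start_cpu_agents 3 "$WSLIP"`.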
And see those resources available in the Determined web dashboard:
![](https://boxofcables.dev/content/images/2022/06/image-21.png)
Notes
- When stopping `determined-agent`, be sure to stop `determined-fluent` too.
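A teardown sketch for the whole lab, including the determined-fluent helper container the agent spawns (container names match the ones used in this guide):

```shell
# Stop and remove every container this guide created, ignoring any
# that are already gone.
teardown_lab() {
  for c in determined-agent determined-fluent determined-master determined-db; do
    docker rm -f "$c" >/dev/null 2>&1 || true
  done
  echo "lab stopped"
}
```

Usage: `teardown_lab`.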