Docker Install Nvidia Driver

2/13/2022 by admin
  • Sep 13, 2019: Since NVIDIA GPU support is now in docker-ce, there is no need to force the repo to 'Bionic' to get compatibility with the NVIDIA Docker setup. (However, you will have to force 'ubuntu18.04' for the nvidia-container-toolkit install, since NVIDIA doesn't officially support 19.04. We'll take care of that later.) Install docker-ce.
  • Dec 01, 2018: So to clarify, do the nvidia/cuda Docker images come with NVIDIA drivers pre-installed? If not, what set of Dockerfile commands will download and install the NVIDIA drivers into that Docker image?
  • Then, continue to install the Docker packages with these commands. (Our nvidia-docker did not work for version 1.4.0; the solution was to downgrade the nvidia-docker packages and disable requirement checking (NVIDIA_DISABLE_REQUIRE) in the "docker run" commands.)

In this tutorial, I’m going to walk through why RHEL Docker is different, why we can’t just install nvidia-docker 2.0 in this environment, and why that isn’t really a bad thing. Note: For those of you who aren’t interested in the background, and just want to install nvidia-docker 2 on RHEL, you can jump to the section here.

Compose services can define GPU device reservations if the Docker host contains such devices and the Docker Daemon is set accordingly. For this, make sure to install the prerequisites if you have not already done so.

The examples in the following sections focus specifically on providing service containers access to GPU devices with Docker Compose. You can use either docker-compose or docker compose commands.

Use of the service runtime property from the Compose v2.3 format (legacy)

Docker Compose v1.27.0+ switched to using the Compose Specification schema, which is a combination of all properties from the 2.x and 3.x versions. This re-enabled the use of service properties such as runtime to provide GPU access to service containers. However, this does not allow control over specific properties of the GPU devices.
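As a minimal sketch of that legacy approach (the image, service name, and command are only illustrative, and the nvidia runtime must already be registered with the Docker daemon):

    services:
      test:
        image: nvidia/cuda:10.2-base
        command: nvidia-smi
        runtime: nvidia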

Enabling GPU access to service containers

Docker Compose v1.28.0+ allows you to define GPU reservations using the device structure defined in the Compose Specification. This provides more granular control over a GPU reservation, as custom values can be set for the following device properties:

  • capabilities - value specified as a list of strings (e.g. capabilities: [gpu]). You must set this field in the Compose file; otherwise, it returns an error on service deployment.
  • count - value specified as an int or the value all, representing the number of GPU devices that should be reserved (provided the host holds that number of GPUs).
  • device_ids - value specified as a list of strings representing GPU device IDs from the host. You can find the device ID in the output of nvidia-smi on the host.
  • driver - value specified as a string (e.g. driver: 'nvidia').
  • options - key-value pairs representing driver-specific options.

Note

You must set the capabilities field. Otherwise, it returns an error on service deployment.

count and device_ids are mutually exclusive. You must only define one field at a time.

For more information on these properties, see the deploy section in the Compose Specification.

Example of a Compose file for running a service with access to 1 GPU device:
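A minimal sketch of such a file (the nvidia/cuda image and the test service name are only illustrative):

    services:
      test:
        image: nvidia/cuda:10.2-base
        command: nvidia-smi
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]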

Run with Docker Compose:
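For example, from the directory containing the Compose file (either command form mentioned above works):

    docker compose up

With the sketch above, the test service simply prints the output of nvidia-smi for the reserved GPU and exits.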

If neither count nor device_ids is set, all GPUs available on the host are used by default.

On machines hosting multiple GPUs, the device_ids field can be set to target specific GPU devices, and count can be used to limit the number of GPU devices assigned to a service container. If count exceeds the number of GPUs available on the host, the deployment will error out.

To enable access only to GPU-0 and GPU-3 devices:
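A sketch of the corresponding reservation, reusing the illustrative service from the example above (the IDs '0' and '3' must match the device IDs reported by nvidia-smi on the host):

    services:
      test:
        image: nvidia/cuda:10.2-base
        command: nvidia-smi
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  device_ids: ['0', '3']
                  capabilities: [gpu]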

The document refers to the installation of NVIDIA CUDA drivers and NVIDIA Docker on each node containing GPUs. For using the NVIDIA GPU Operator instead, see the Cluster Installation documentation.

If you are using DGX OS, the NVIDIA prerequisites are already installed and you may skip this document.

On each machine with GPUs, run the following steps.

Step 1: Install the NVIDIA CUDA Toolkit

Run:
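The command in question is nvidia-smi (the same check used later in this document):

    nvidia-smi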

If the command succeeds, verify that the NVIDIA driver version is 410.104 or later and that the CUDA version is 10 or later.

If the command is not successful, you must install the CUDA Toolkit. Follow the instructions here to install it. When the installation is finished, you must reboot your computer.

If the machine is a DGX A100 then, depending on your operating system, you may have to install the NVIDIA Fabric Manager.

  • If the operating system is DGX OS, no installation is required.
  • For other operating systems:
    • Run: nvidia-smi and get the NVIDIA driver version (it must be 450 or later).
    • Run: sudo apt search fabricmanager to find a Fabric Manager package with the same version and install it (a sketch of these commands follows below).
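A sketch of these commands on an Ubuntu-based system; the package name nvidia-fabricmanager-450 is only an example and must match the driver branch reported by nvidia-smi:

    nvidia-smi --query-gpu=driver_version --format=csv,noheader   # confirm the driver branch, e.g. 450.x
    sudo apt search fabricmanager                                  # list available Fabric Manager packages
    sudo apt install nvidia-fabricmanager-450                      # example package name; match your driver branch
    sudo systemctl enable --now nvidia-fabricmanager               # start the Fabric Manager service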

Important

NVIDIA Fabric Manager does not work on CoreOS. If you are working with OpenShift, use RHEL instead, or use the NVIDIA GPU Operator rather than installing NVIDIA software on each machine.

Step 2: Install Docker

Install Docker by following the steps here: https://docs.docker.com/engine/install/. Specifically, you can use a convenience script provided in the document:
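That script can be fetched and run as follows (always review a downloaded script before executing it with root privileges):

    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh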

Step 3: NVIDIA Container Toolkit (previously named NVIDIA Docker)

To install NVIDIA Docker on Debian-based distributions (such as Ubuntu), run the following:
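A sketch of the legacy repository setup and package install this refers to (the repository layout may have changed since this was written; the NVIDIA Container Toolkit guide linked below has the current steps):

    distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
        sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update && sudo apt-get install -y nvidia-docker2
    sudo systemctl restart docker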

For RHEL-based distributions, run:
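A comparable sketch for RHEL-based systems, under the same caveat that the repository layout may have changed:

    distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
        sudo tee /etc/yum.repos.d/nvidia-docker.repo
    sudo yum install -y nvidia-docker2
    sudo systemctl restart docker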

For a detailed review of the above instructions, see the NVIDIA Container Toolkit installation instructions.

Warning

Kubernetes does not currently support the NVIDIA container runtime, which is the successor of NVIDIA Docker/NVIDIA container toolkit.

Step 4: Make NVIDIA Docker the default docker runtime

Set the NVIDIA runtime as the default Docker runtime on your node. Edit the docker daemon config file at /etc/docker/daemon.json and add the default-runtime key as follows:
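A sketch of the resulting /etc/docker/daemon.json (the runtimes entry is normally added by the nvidia-docker2 package; the path shown assumes a default installation):

    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }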

Then run the following again:
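That is, restart the Docker daemon (as in the earlier steps) so the new default runtime takes effect:

    sudo systemctl restart docker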