Concepts

Edit This Page

Device Plugins

Starting in version 1.8, Kubernetes provides a device plugin framework for vendors to advertise their resources to the kubelet without changing Kubernetes core code. Instead of writing custom Kubernetes code, vendors can implement a device plugin that can be deployed manually or as a DaemonSet. The targeted devices include GPUs, High-performance NICs, FPGAs, InfiniBand, and other similar computing resources that may require vendor specific initialization and setup.

Device plugin registration

The device plugins feature is gated by the DevicePlugins feature gate which is disabled by default before 1.10. When the device plugins feature is enabled, the kubelet exports a Registration gRPC service:

service Registration {
	rpc Register(RegisterRequest) returns (Empty) {}
}

A device plugin can register itself with the kubelet through this gRPC service. During the registration, the device plugin needs to send:

Following a successful registration, the device plugin sends the kubelet the list of devices it manages, and the kubelet is then in charge of advertising those resources to the API server as part of the kubelet node status update. For example, after a device plugin registers vendor-domain/foo with the kubelet and reports two healthy devices on a node, the node status is updated to advertise 2 vendor-domain/foo.

Then, users can request devices in a Container specification as they request other types of resources, with the following limitations:

Suppose a Kubernetes cluster is running a device plugin that advertises resource vendor-domain/resource on certain nodes, here is an example user pod requesting this resource:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container-1
      image: k8s.gcr.io/pause:2.0
      resources:
        limits:
          vendor-domain/resource: 2 # requesting 2 vendor-domain/resource

Device plugin implementation

The general workflow of a device plugin includes the following steps:

  service DevicePlugin {
        // ListAndWatch returns a stream of List of Devices
        // Whenever a Device state change or a Device disappears, ListAndWatch
        // returns the new list
        rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}

        // Allocate is called during container creation so that the Device
        // Plugin can run device specific operations and instruct Kubelet
        // of the steps to make the Device available in the container
        rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
  }

A device plugin is expected to detect kubelet restarts and re-register itself with the new kubelet instance. In the current implementation, a new kubelet instance deletes all the existing Unix sockets under /var/lib/kubelet/device-plugins when it starts. A device plugin can monitor the deletion of its Unix socket and re-register itself upon such an event.

Device plugin deployment

A device plugin can be deployed manually or as a DaemonSet. Being deployed as a DaemonSet has the benefit that Kubernetes can restart the device plugin if it fails. Otherwise, an extra mechanism is needed to recover from device plugin failures. The canonical directory /var/lib/kubelet/device-plugins requires privileged access, so a device plugin must run in a privileged security context. If a device plugin is running as a DaemonSet, /var/lib/kubelet/device-plugins must be mounted as a Volume in the plugin’s PodSpec.

Kubernetes device plugin support is still in alpha. As development continues, its API version can change in incompatible ways. We recommend that device plugin developers do the following:

If you enable the DevicePlugins feature and run device plugins on nodes that need to be upgraded to a Kubernetes release with a newer device plugin API version, upgrade your device plugins to support both versions before upgrading these nodes to ensure the continuous functioning of the device allocations during the upgrade.

Monitoring Device Plugin Resources

In order to monitor resources provided by device plugins, monitoring agents need to be able to discover the set of devices that are in-use on the node and obtain metadata to describe which container the metric should be associated with. Prometheus metrics exposed by device monitoring agents should follow the Kubernetes Instrumentation Guidelines, which requires identifying containers using pod, namespace, and container prometheus labels.
The kubelet provides a gRPC service to enable discovery of in-use devices, and to provide metadata for these devices:

// PodResources is a service provided by the kubelet that provides information about the
// node resources consumed by pods and containers on the node
service PodResources {
    rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {}
}

The gRPC service is served over a unix socket at /var/lib/kubelet/pod-resources/kubelet.sock. Monitoring agents for device plugin resources can be deployed as a daemon, or as a DaemonSet. The cannonical directory /var/lib/kubelet/pod-resources requires privileged access, so monitoring agents must run in a privileged security context. If a device monitoring agent is running as a DaemonSet, /var/lib/kubelet/pod-resources must be mounted as a Volume in the plugin’s PodSpec.

Support for the “PodResources service” is still in alpha.

Examples

For examples of device plugin implementations, see:

FEATURE STATE: Kubernetes v1.13 beta
This feature is currently in a beta state, meaning:

  • The version names contain beta (e.g. v2beta3).
  • Code is well tested. Enabling the feature is considered safe. Enabled by default.
  • Support for the overall feature will not be dropped, though details may change.
  • The schema and/or semantics of objects may change in incompatible ways in a subsequent beta or stable release. When this happens, we will provide instructions for migrating to the next version. This may require deleting, editing, and re-creating API objects. The editing process may require some thought. This may require downtime for applications that rely on the feature.
  • Recommended for only non-business-critical uses because of potential for incompatible changes in subsequent releases. If you have multiple clusters that can be upgraded independently, you may be able to relax this restriction.
  • Please do try our beta features and give feedback on them! After they exit beta, it may not be practical for us to make more changes.

Feedback