Install the DRA Driver for Edera Zones

This guide installs the DRA Driver for Edera Zones, claims a Zone from a Pod, and verifies the allocation end-to-end. By the end you’ll have:

  • The driver running as a DaemonSet on every Edera-runtime node.
  • The edera-zone DeviceClass available cluster-wide.
  • A working Pod that consumes a Zone through a ResourceClaimTemplate.

For the underlying design—what DRA is, how the driver represents Zone capacity, and why an extra device appears per claim—see DRA Driver for Edera Zones.

Prerequisites

Before starting:

  • Kubernetes 1.33 or later. The chart is capability-aware and selects between resource.k8s.io/v1 (GA in 1.34), v1beta2 (1.33), and v1beta1 (1.32) based on what the cluster offers. The ResourceClaimTemplate example below is provided in both v1 and v1beta1 variants; use the one that matches the API version your cluster serves.
  • Edera runtime installed on the target nodes. The driver does not install or manage the runtime. See Install Edera.
  • The edera RuntimeClass. Verify with kubectl get runtimeclass edera. If missing, see Install Edera—apply the Edera RuntimeClass.
  • Helm 3.7 or later (for OCI registry support).
  • kubectl 1.32 or later.

Verify the cluster has the DRA API available:

kubectl api-resources --api-group=resource.k8s.io

Expected output (abridged, on a 1.34 cluster):

NAME                     APIVERSION           NAMESPACED   KIND
deviceclasses            resource.k8s.io/v1   false        DeviceClass
resourceclaims           resource.k8s.io/v1   true         ResourceClaim
resourceclaimtemplates   resource.k8s.io/v1   true         ResourceClaimTemplate
resourceslices           resource.k8s.io/v1   false        ResourceSlice

If you see no rows, the cluster is older than 1.32 or DRA is disabled at the API server.
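
To see which DRA API version to target, the list of served versions can be reduced to a single answer. The sketch below is illustrative only: the function name pick_dra_version is hypothetical, and the preference order (v1, then v1beta2, then v1beta1) simply follows the versions named in the prerequisites. The chart's own selection logic may differ.

```shell
# Print the newest resource.k8s.io API version found on stdin.
# Assumed preference order: v1, v1beta2, v1beta1 (the versions named in this guide).
pick_dra_version() {
  served=$(cat)
  for v in v1 v1beta2 v1beta1; do
    # -x matches whole lines, -F treats the pattern as a literal string.
    if printf '%s\n' "$served" | grep -qxF "resource.k8s.io/$v"; then
      printf '%s\n' "resource.k8s.io/$v"
      return 0
    fi
  done
  echo "no resource.k8s.io version served" >&2
  return 1
}

# Usage against a live cluster:
#   kubectl api-versions | pick_dra_version
```

A non-zero exit status corresponds to the "no rows" case described above.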

Install the chart

Install the chart from the Edera OCI registry into the edera-system namespace. The edera-system namespace is reserved for cluster-administrative Edera components—keep tenant workloads out of it.

helm install edera-kube-agent \
  oci://ghcr.io/edera-dev/charts/edera-kube-agent \
  --namespace edera-system \
  --create-namespace
ℹ️

Supply chain verification. Released images are published with build provenance attestations. Verify before installing into a production cluster:

gh attestation verify oci://ghcr.io/edera-dev/edera-kube-agent:v1.0.0 \
  --owner edera-dev

Verify the driver pod is running on every Edera node:

kubectl -n edera-system get pod -o wide

Expected output (abridged; -o wide prints additional columns such as IP and NODE, omitted here):

NAME                       READY   STATUS    RESTARTS   AGE
edera-kube-agent-xxxxx     1/1     Running   0          30s

One pod per node is expected—the driver is a DaemonSet. Verify the DeviceClass and a ResourceSlice per node:

kubectl get deviceclass edera-zone
kubectl get resourceslice

Expected output:

NAME         AGE
edera-zone   45s

NAME                                            NODE            DRIVER           POOL            AGE
<node>-zone.edera.dev-<suffix>                  <node>          zone.edera.dev   <node>          45s

If kubectl get resourceslice returns No resources found, see Troubleshooting.
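
On a larger cluster it can be easier to check for missing slices than to read the full listing. The following is a minimal sketch, not part of the chart: missing_slices is a hypothetical helper, and node-a and node-b are placeholder node names. It reports every expected node that has no ResourceSlice from the zone.edera.dev driver.

```shell
# List nodes that have no ResourceSlice published by zone.edera.dev.
#   $1    : space-separated node names expected to run the driver
#   stdin : one "<node> <driver>" pair per line (whitespace-separated)
missing_slices() {
  slices=$(cat)
  for node in $1; do
    printf '%s\n' "$slices" |
      awk -v n="$node" '$1 == n && $2 == "zone.edera.dev" { found = 1 } END { exit !found }' ||
      printf '%s\n' "$node"
  done
}

# Usage against a live cluster (node names are placeholders):
#   kubectl get resourceslice --no-headers \
#     -o custom-columns=NODE:.spec.nodeName,DRIVER:.spec.driver \
#     | missing_slices "node-a node-b"
```

Empty output means every listed node has published a slice.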

Claim a Zone from a Pod

The driver is installed. Now create a ResourceClaimTemplate and a Pod that consumes it.

Create the namespace and template

kubectl create namespace edera-test1

Create a ResourceClaimTemplate that asks for one device from the edera-zone class. Use the variant that matches the API version your cluster serves.

For resource.k8s.io/v1:

kubectl -n edera-test1 apply -f - <<EOF
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: edera-zone
spec:
  spec:
    devices:
      requests:
        - name: zone
          exactly:
            deviceClassName: edera-zone
EOF

For resource.k8s.io/v1beta1:

kubectl -n edera-test1 apply -f - <<EOF
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: edera-zone
spec:
  spec:
    devices:
      requests:
        - name: zone
          deviceClassName: edera-zone
          allocationMode: ExactCount
          count: 1
EOF

Run a Pod that consumes the template

kubectl -n edera-test1 apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod0
  labels:
    app: pod
spec:
  runtimeClassName: edera
  containers:
    - name: ctr0
      image: ubuntu:22.04
      command: ["bash", "-c"]
      args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
      resources:
        claims:
          - name: zone
  resourceClaims:
    - name: zone
      resourceClaimTemplateName: edera-zone
EOF

Wait for the Pod to reach Running:

kubectl -n edera-test1 get pod pod0 -w
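
In a script, watching with -w is awkward because it never exits. As an alternative sketch, kubectl wait can block until the Pod is Ready (a stricter condition than Running) or a timeout expires. The function name wait_for_pod and the KUBECTL override variable are hypothetical conveniences for this example, not part of the driver or chart.

```shell
# Block until a Pod reports the Ready condition, or fail after a timeout.
#   $1: namespace   $2: Pod name   $3: timeout (optional, default 120s)
# KUBECTL is a hypothetical override for this sketch; it defaults to kubectl.
wait_for_pod() {
  ns=$1
  pod=$2
  timeout=${3:-120s}
  "${KUBECTL:-kubectl}" wait "pod/$pod" --namespace "$ns" \
    --for=condition=Ready --timeout="$timeout"
}

# Usage:
#   wait_for_pod edera-test1 pod0
```

A non-zero exit status means the Pod did not become Ready within the timeout; the Troubleshooting section covers the common causes.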

Verify the allocation

Inspect the ResourceClaim that Kubernetes generated for the Pod from the template:

kubectl -n edera-test1 get resourceclaim

Expected output:

NAME              STATE                AGE
pod0-zone-xxxxx   allocated,reserved   12s

The allocated,reserved state means the scheduler has allocated a specific device on a specific node to the claim and reserved the claim for the Pod.

Inspect which device on which node received the claim:

kubectl -n edera-test1 get resourceclaim -o jsonpath='{.items[*].status.allocation.devices.results[*]}{"\n"}'

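You should see the request name (zone), the driver (zone.edera.dev), the pool, and the device name. The pool is named after the node, as in the ResourceSlice listing shown after install.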

The node’s ResourceSlice now lists the claimed device alongside a newly published unclaimed device that represents the node’s remaining Zone capacity:

kubectl get resourceslice -o jsonpath='{range .items[*].spec.devices[*]}{.name}{"\t"}{.capacity.memory.value}{"\n"}{end}'

Expected output (two devices—the claimed one and the new unclaimed one):

device-xxxxx    20625540Ki
device-yyyyy    19576968Ki
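
The raw Ki values are hard to compare by eye. As a small sketch, the tab-separated output can be piped through awk to show GiB instead. The helper name to_gib is hypothetical, and the conversion assumes the driver reports memory with a Ki suffix as in the sample output; lines in any other unit pass through unchanged.

```shell
# Convert "<device>\t<N>Ki" lines to GiB (1 GiB = 1048576 Ki).
# Assumption: capacities use the Ki suffix; other lines are printed unchanged.
to_gib() {
  awk -F '\t' '
    $2 ~ /^[0-9]+Ki$/ {
      sub(/Ki$/, "", $2)
      printf "%s\t%.2fGiB\n", $1, $2 / 1048576
      next
    }
    { print }
  '
}

# Usage against a live cluster:
#   kubectl get resourceslice -o jsonpath='{range .items[*].spec.devices[*]}{.name}{"\t"}{.capacity.memory.value}{"\n"}{end}' | to_gib
```

With the sample values above, 20625540Ki and 19576968Ki come out as roughly 19.67GiB and 18.67GiB.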

This re-publication-on-claim behavior is the driver’s current model. See the technical overview for the background and the planned migration to DRA Consumable Capacity.

Clean up the demo workload

kubectl delete namespace edera-test1

Configuration

The chart defaults are tuned for a typical single-cluster install. The values most operators tune are below. For the full surface, run helm show values oci://ghcr.io/edera-dev/charts/edera-kube-agent.

Value | Default | Purpose
image.repository | ghcr.io/edera-dev/edera-kube-agent | Container image. Override to pull from a mirrored registry.
image.tag | Chart appVersion | Image tag. Override to pin a specific build.
image.pullPolicy | IfNotPresent | Standard Kubernetes pull policy.
imagePullSecrets | [] | Secrets for pulling from a private registry.
kubeletPlugin.nodeSelector | {} | Restrict the DaemonSet to Edera-runtime nodes on mixed-runtime clusters.
kubeletPlugin.tolerations | [] | Tolerate taints applied to Edera nodes.
kubeletPlugin.priorityClassName | system-node-critical | Ensures the driver is scheduled before tenant workloads.
kubeletPlugin.verbosity | 2 | klog verbosity level (0 to 10).
kubeletPlugin.minimumDeviceSize | 315M | The driver stops publishing unclaimed devices once free Zone capacity falls below this.
kubeletPlugin.containers.plugin.resources | {} | Standard requests / limits. Recommended in production.
kubeletPlugin.containers.plugin.healthcheckPort | 9440 | Port for /healthz and /readyz. Set to 0 to disable.
kubeletPlugin.containers.plugin.metricsPort | 8443 | Port for the metrics endpoint. Served over HTTPS with required authentication. Set to 0 to disable.
ℹ️
On mixed-runtime clusters, set kubeletPlugin.nodeSelector so the DaemonSet only lands on nodes that actually have the Edera runtime installed. Without it, the driver pod will enter CrashLoopBackOff on non-Edera nodes—it cannot reach the local protect-daemon socket—producing noisy alerts. There is no scheduling impact (the driver never gets far enough to publish a ResourceSlice), but the alert volume on a mixed fleet is a real operational cost.
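
One way to apply the node selector is through helm upgrade. The sketch below is an assumption-laden example, not a recommended configuration: the label runtime=edera is hypothetical (use whatever label identifies your Edera nodes), and the function name and HELM override variable exist only for this illustration.

```shell
# Pin the driver DaemonSet to labeled nodes.
# Assumption: Edera nodes carry the hypothetical label runtime=edera, applied with
#   kubectl label node <node> runtime=edera
# HELM is a hypothetical override for this sketch; it defaults to helm.
pin_driver_to_edera_nodes() {
  "${HELM:-helm}" upgrade edera-kube-agent \
    oci://ghcr.io/edera-dev/charts/edera-kube-agent \
    --namespace edera-system \
    --reuse-values \
    --set kubeletPlugin.nodeSelector.runtime=edera
}

# Usage:
#   pin_driver_to_edera_nodes
```

A label key that contains dots would need its dots escaped in the --set path, which is why the example uses a plain key; a values file avoids the escaping entirely.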

Uninstall

Remove the chart:

helm uninstall edera-kube-agent --namespace edera-system
⚠️
Uninstall leaves the ResourceSlice and the edera-system namespace in place. Helm only removes resources it created—the ResourceSlice is owned by the Node rather than the chart, and Helm does not delete namespaces created with --create-namespace. Both must be cleaned up manually.

After uninstall, the driver pod and DeviceClass are gone, but a stale ResourceSlice for zone.edera.dev remains in the cluster. The scheduler still sees it as available capacity, but no driver is registered to handle prepare/unprepare calls—Pods that claim against it will hang at NodePrepareResources. Delete the slice explicitly.

The chart does not label its ResourceSlice objects, and ResourceSlice does not currently support field selectors. Filter on .spec.driver from the API output:

# Emit one name per line: kubectl takes resource names as separate arguments, not a comma-separated list.
kubectl get resourceslice -o json \
  | jq -r '.items[] | select(.spec.driver == "zone.edera.dev") | .metadata.name' \
  | xargs -r kubectl delete resourceslice

Once the slice is deleted, any Pods that previously claimed against zone.edera.dev will continue to hang—at the scheduling phase instead of NodePrepareResources—because no devices satisfy their ResourceClaim. Delete those Pods to clear them.

Finally, remove the namespace once you’ve confirmed no other Edera components share it:

kubectl delete namespace edera-system

Troubleshooting

Pod stays Pending with no scheduling errors

Check the ResourceClaim state—if it’s pending rather than allocated,reserved, the scheduler couldn’t find a matching device:

kubectl -n <namespace> get resourceclaim

Common causes:

  • The driver isn’t running on any node that satisfies the Pod’s other constraints. Confirm kubectl -n edera-system get pod -o wide shows a Running driver on the expected node.
  • The node already has all its Zone capacity claimed and free memory is below minimumDeviceSize. Confirm with kubectl get resourceslice -o yaml.
  • The Pod’s runtimeClassName is missing or wrong. The Pod needs runtimeClassName: edera.
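
When a namespace holds many claims, filtering the listing narrows the search. This is a minimal sketch under the assumption that the default kubectl output keeps the NAME, STATE, AGE column order shown earlier; unallocated_claims is a hypothetical helper name.

```shell
# Print claims whose STATE column is anything other than allocated,reserved.
# stdin: "kubectl get resourceclaim --no-headers" output (NAME STATE AGE).
unallocated_claims() {
  awk 'NF >= 2 && $2 != "allocated,reserved" { print $1 "\t" $2 }'
}

# Usage against a live cluster:
#   kubectl -n <namespace> get resourceclaim --no-headers | unallocated_claims
```

Each line of output names a claim and its current state, which points at the Pod to inspect next.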

Driver pod is CrashLoopBackOff

Check the driver logs:

kubectl -n edera-system logs -l app.kubernetes.io/component=kubeletplugin --tail=100

Common causes:

  • exec format error—the image architecture doesn’t match the node architecture. Verify with docker inspect <image> --format '{{.Architecture}}' and pull or build a matching arch.
  • Kubernetes version too old—the driver requires 1.33 or later. Verify with kubectl version.
  • API not enabled—confirm kubectl api-resources --api-group=resource.k8s.io lists resourceslices. If empty, DRA is disabled at the API server.
  • Can’t reach the protect-daemon socket—the driver landed on a node without the Edera runtime installed. Set kubeletPlugin.nodeSelector so the DaemonSet only targets Edera nodes.
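
For the architecture check, the node side can be read from the Kubernetes API and compared with the image architecture. The sketch below is illustrative: arch_mismatch is a hypothetical helper, and amd64 stands in for whatever architecture the image inspection reported.

```shell
# Report nodes whose architecture differs from the image architecture.
#   $1    : image architecture, for example amd64 or arm64
#   stdin : one "<node> <arch>" pair per line (whitespace-separated)
arch_mismatch() {
  awk -v want="$1" 'NF >= 2 && $2 != want { print $1 " is " $2 ", image is " want }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers \
#     -o custom-columns=NODE:.metadata.name,ARCH:.status.nodeInfo.architecture \
#     | arch_mismatch amd64
```

Empty output means every node matches the image architecture, so the exec format error has another cause.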

kubectl get resourceslice returns nothing after install

Confirm the driver pod is Running and check its logs for the message "Publishing initial resources". If the logs show a startup error, the driver hasn’t reached the point of publishing. See “Driver pod is CrashLoopBackOff” above.

NodePrepareResources errors in the kubelet log

The kubelet calls the driver over a local gRPC socket. If the kubelet log shows connection errors:

  • Confirm the driver pod is Running on the same node as the Pod that’s trying to start.
  • Confirm the priorityClassName: system-node-critical default is in place—the driver needs to be scheduled before tenant workloads that depend on it.

Additional notes

Tested with edera-kube-agent v1.0.0.
