Install the DRA Driver for Edera Zones
This guide installs the DRA Driver for Edera Zones, claims a Zone from a Pod, and verifies the allocation end-to-end. By the end you’ll have:
- The driver running as a DaemonSet on every Edera-runtime node.
- The edera-zone DeviceClass available cluster-wide.
- A working Pod that consumes a Zone through a ResourceClaimTemplate.
For the underlying design—what DRA is, how the driver represents Zone capacity, and why an extra device appears per claim—see DRA Driver for Edera Zones.
Prerequisites
Before starting:
- Kubernetes 1.33 or later. The chart is capability-aware and selects between resource.k8s.io/v1 (GA in 1.34), v1beta2 (1.33), and v1beta1 (1.32) based on what the cluster offers. The ResourceClaimTemplate example below is provided in both v1 and v1beta1 variants; pick the one that matches your cluster.
- Edera runtime installed on the target nodes. The driver does not install or manage the runtime. See Install Edera.
- The edera RuntimeClass. Verify with kubectl get runtimeclass edera. If missing, see Install Edera (apply the Edera RuntimeClass).
- Helm 3.7 or later (for OCI registry support).
- kubectl 1.32 or later.
Verify the cluster has the DRA API available:
kubectl api-resources --api-group=resource.k8s.io
Expected output (abridged, on a 1.34 cluster):
NAME APIVERSION NAMESPACED KIND
deviceclasses resource.k8s.io/v1 false DeviceClass
resourceclaims resource.k8s.io/v1 true ResourceClaim
resourceclaimtemplates resource.k8s.io/v1 true ResourceClaimTemplate
resourceslices resource.k8s.io/v1 false ResourceSlice
If you see no rows, the cluster is older than 1.32 or DRA is disabled at the API server.
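The check above can also be scripted when you need to decide which ResourceClaimTemplate variant to apply. The following is a minimal sketch, not part of the chart: pick_dra_version is a hypothetical helper name, and it assumes the table layout that kubectl api-resources prints, with the API version appearing as a whitespace-delimited field on the resourceclaimtemplates row.

```shell
# pick_dra_version: read `kubectl api-resources --api-group=resource.k8s.io`
# output on stdin and print the newest DRA API version listed for
# resourceclaimtemplates. Hypothetical helper, not shipped with the chart.
pick_dra_version() {
  local input version
  input=$(cat)
  # Check from newest to oldest; the first version found wins.
  for version in v1 v1beta2 v1beta1; do
    # Compare whole fields so "resource.k8s.io/v1" never matches
    # "resource.k8s.io/v1beta1".
    if printf '%s\n' "$input" | awk -v want="resource.k8s.io/$version" '
         $1 == "resourceclaimtemplates" {
           for (i = 2; i <= NF; i++) if ($i == want) found = 1
         }
         END { exit (found ? 0 : 1) }'; then
      printf 'resource.k8s.io/%s\n' "$version"
      return 0
    fi
  done
  echo "no DRA API found for resourceclaimtemplates" >&2
  return 1
}

# Usage against a live cluster:
#   kubectl api-resources --api-group=resource.k8s.io | pick_dra_version
```

A non-zero exit status corresponds to the "no rows" case described above.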
Install the chart
Install the chart from the Edera OCI registry into the edera-system namespace. The edera-system namespace is reserved for cluster-administrative Edera components—keep tenant workloads out of it.
helm install edera-kube-agent \
oci://ghcr.io/edera-dev/charts/edera-kube-agent \
--namespace edera-system \
--create-namespace
Supply chain verification. Released images are published with build provenance attestations. Verify before installing into a production cluster:
gh attestation verify oci://ghcr.io/edera-dev/edera-kube-agent:v1.0.0 \
--owner edera-dev
Verify the driver pod is running on every Edera node:
kubectl -n edera-system get pod -o wide
Expected output:
NAME READY STATUS RESTARTS AGE
edera-kube-agent-xxxxx 1/1 Running 0 30s
One pod per node is expected because the driver is a DaemonSet. Verify the DeviceClass and a ResourceSlice per node:
kubectl get deviceclass edera-zone
kubectl get resourceslice
Expected output:
NAME AGE
edera-zone 45s
NAME NODE DRIVER POOL AGE
<node>-zone.edera.dev-<suffix> <node> zone.edera.dev <node> 45s
If kubectl get resourceslice returns No resources found, see Troubleshooting.
Claim a Zone from a Pod
The driver is installed. Now create a ResourceClaimTemplate and a Pod that consumes it.
Create the namespace and template
kubectl create namespace edera-test1
Create a ResourceClaimTemplate that asks for one device from the edera-zone class. Use the variant that matches the DRA API version your cluster serves:
For clusters serving resource.k8s.io/v1:
kubectl -n edera-test1 apply -f - <<EOF
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: edera-zone
spec:
  spec:
    devices:
      requests:
      - name: zone
        exactly:
          deviceClassName: edera-zone
EOF
For clusters serving resource.k8s.io/v1beta1:
kubectl -n edera-test1 apply -f - <<EOF
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: edera-zone
spec:
  spec:
    devices:
      requests:
      - name: zone
        deviceClassName: edera-zone
        allocationMode: ExactCount
        count: 1
EOF
Run a Pod that consumes the template
kubectl -n edera-test1 apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pod0
  labels:
    app: pod
spec:
  runtimeClassName: edera
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: zone
  resourceClaims:
  - name: zone
    resourceClaimTemplateName: edera-zone
EOF
Wait for the Pod to reach Running:
kubectl -n edera-test1 get pod pod0 -w
Verify the allocation
Inspect the ResourceClaim the scheduler created:
kubectl -n edera-test1 get resourceclaim
Expected output:
NAME STATE AGE
pod0-zone-xxxxx allocated,reserved 12s
The allocated,reserved state means the scheduler bound the claim to a specific device on a specific node.
Inspect which device on which node received the claim:
kubectl -n edera-test1 get resourceclaim -o jsonpath='{.items[*].status.allocation.devices.results[*]}{"\n"}'
You should see the device name, the driver (zone.edera.dev), and the node.
The node’s ResourceSlice now lists the claimed device alongside a newly published unclaimed device that represents the node’s remaining Zone capacity:
kubectl get resourceslice -o jsonpath='{range .items[*].spec.devices[*]}{.name}{"\t"}{.capacity.memory.value}{"\n"}{end}'
Expected output (two devices: the claimed one and the new unclaimed one):
device-xxxxx 20625540Ki
device-yyyyy 19576968Ki
This re-publication-on-claim behavior is the driver’s current model. See the technical overview for the background and the planned migration to DRA Consumable Capacity.
Clean up the demo workload
kubectl delete namespace edera-test1
Configuration
The chart defaults are tuned for a typical single-cluster install. The values most operators tune are below. For the full surface, run helm show values oci://ghcr.io/edera-dev/charts/edera-kube-agent.
| Value | Default | Purpose |
|---|---|---|
| image.repository | ghcr.io/edera-dev/edera-kube-agent | Container image. Override to pull from a mirrored registry. |
| image.tag | Chart appVersion | Image tag. Override to pin a specific build. |
| image.pullPolicy | IfNotPresent | Standard Kubernetes pull policy. |
| imagePullSecrets | [] | Secrets for pulling from a private registry. |
| kubeletPlugin.nodeSelector | {} | Restrict the DaemonSet to Edera-runtime nodes on mixed-runtime clusters. |
| kubeletPlugin.tolerations | [] | Tolerate taints applied to Edera nodes. |
| kubeletPlugin.priorityClassName | system-node-critical | Ensures the driver is scheduled before tenant workloads. |
| kubeletPlugin.verbosity | 2 | klog verbosity level (0 to 10). |
| kubeletPlugin.minimumDeviceSize | 315M | The driver stops publishing unclaimed devices once free Zone capacity falls below this. |
| kubeletPlugin.containers.plugin.resources | {} | Standard requests / limits. Recommended in production. |
| kubeletPlugin.containers.plugin.healthcheckPort | 9440 | Port for /healthz and /readyz. Set to 0 to disable. |
| kubeletPlugin.containers.plugin.metricsPort | 8443 | Port for the metrics endpoint. Served over HTTPS with required authentication. Set to 0 to disable. |
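Dotted value names in the table map to nested keys in a Helm values file. As a minimal sketch, the snippet below sets kubeletPlugin.nodeSelector. The label example.com/edera-runtime, the file name edera-values.yaml, and the function name apply_edera_values are all hypothetical; substitute whatever label your Edera nodes actually carry.

```shell
# Write a values file that restricts the DaemonSet to labeled nodes.
# ASSUMPTION: "example.com/edera-runtime" is a placeholder label, not one
# the chart or the Edera installer is known to apply.
cat > edera-values.yaml <<'EOF'
kubeletPlugin:
  nodeSelector:
    example.com/edera-runtime: "true"
EOF

# Apply the values to the release. Wrapped in a function so the file can be
# reviewed before anything touches the cluster.
apply_edera_values() {
  helm upgrade --install edera-kube-agent \
    oci://ghcr.io/edera-dev/charts/edera-kube-agent \
    --namespace edera-system \
    -f edera-values.yaml
}
```

Run apply_edera_values once the file looks right. The same file can carry any other value from the table, nested the same way.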
Set kubeletPlugin.nodeSelector so the DaemonSet only lands on nodes that actually have the Edera runtime installed. Without it, the driver pod enters CrashLoopBackOff on non-Edera nodes because it cannot reach the local protect-daemon socket, producing noisy alerts. There is no scheduling impact (the driver never gets far enough to publish a ResourceSlice), but the alert volume on a mixed fleet is a real operational cost.
Uninstall
Remove the chart:
helm uninstall edera-kube-agent --namespace edera-system
Uninstalling leaves the ResourceSlice and the edera-system namespace in place. Helm only removes resources it created: the ResourceSlice is owned by the Node rather than the chart, and Helm does not delete namespaces created with --create-namespace. Both must be cleaned up manually.
After uninstall, the driver pod and DeviceClass are gone, but a stale ResourceSlice for zone.edera.dev remains in the cluster. The scheduler still sees it as available capacity, but no driver is registered to handle prepare/unprepare calls, so Pods that claim against it will hang at NodePrepareResources. Delete the slice explicitly.
The chart does not label its ResourceSlice objects, and ResourceSlice does not currently support field selectors. Filter on .spec.driver from the API output:
kubectl get resourceslice -o json \
| jq -r '.items[] | select(.spec.driver == "zone.edera.dev") | .metadata.name' \
| xargs -r kubectl delete resourceslice
Once the slice is deleted, any Pods that previously claimed against zone.edera.dev will continue to hang, now at the scheduling phase instead of NodePrepareResources, because no devices satisfy their ResourceClaim. Delete those Pods to clear them.
Finally, remove the namespace once you’ve confirmed no other Edera components share it:
kubectl delete namespace edera-system
Troubleshooting
Pod stays Pending with no scheduling errors
Check the ResourceClaim state—if it’s pending rather than allocated,reserved, the scheduler couldn’t find a matching device:
kubectl -n <namespace> get resourceclaim
Common causes:
- The driver isn’t running on any node that satisfies the Pod’s other constraints. Confirm kubectl -n edera-system get pod -o wide shows a Running driver on the expected node.
- The node already has all its Zone capacity claimed and free memory is below minimumDeviceSize. Confirm with kubectl get resourceslice -o yaml.
- The Pod’s runtimeClassName is missing or wrong. The Pod needs runtimeClassName: edera.
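When a namespace holds many claims, the state check above can be narrowed to just the ones that need attention. This is a minimal sketch: unallocated_claims is a hypothetical helper, and it assumes the three-column NAME STATE AGE layout shown earlier in this guide (it would need adjusting for output that includes a NAMESPACE column).

```shell
# unallocated_claims: read `kubectl get resourceclaim` table output on stdin
# and print the NAME of every claim whose STATE is not "allocated,reserved".
# Hypothetical helper; assumes columns NAME STATE AGE with a header row.
unallocated_claims() {
  awk 'NR > 1 && $2 != "allocated,reserved" { print $1 }'
}

# Usage against a live cluster:
#   kubectl -n <namespace> get resourceclaim | unallocated_claims
```

An empty result means every claim in the namespace was allocated and reserved, so the Pending Pod is likely blocked by something other than DRA.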
Driver pod is CrashLoopBackOff
Check the driver logs:
kubectl -n edera-system logs -l app.kubernetes.io/component=kubeletplugin --tail=100
Common causes:
- exec format error: the image architecture doesn’t match the node architecture. Verify with docker inspect <image> --format '{{.Architecture}}' and pull or build a matching arch.
- Kubernetes version too old: the driver requires 1.33 or later. Verify with kubectl version.
- API not enabled: confirm kubectl api-resources --api-group=resource.k8s.io lists resourceslices. If empty, DRA is disabled at the API server.
- Can’t reach the protect-daemon socket: the driver landed on a node without the Edera runtime installed. Set kubeletPlugin.nodeSelector so the DaemonSet only targets Edera nodes.
kubectl get resourceslice returns nothing after install
Confirm the driver pod is Running and check its logs for Publishing initial resources. If the logs show a startup error, the driver hasn’t reached the point of publishing. See “Driver pod is CrashLoopBackOff” above.
NodePrepareResources errors in the kubelet log
The kubelet calls the driver over a local gRPC socket. If the kubelet log shows connection errors:
- Confirm the driver pod is Running on the same node as the Pod that’s trying to start.
- Confirm the priorityClassName: system-node-critical default is in place. The driver needs to be scheduled before tenant workloads that depend on it.
Additional notes
Tested with edera-kube-agent v1.0.0.