Velero → Commvault Kubernetes

Velero ↔ Commvault Kubernetes: from integration to migration.

Kubernetes backup is load-bearing — the day you need a namespace back is the worst day to find a gap, so we never flip a switch. Velero installs alongside Commvault Kubernetes Protection first, each namespace is backed up by both under dual-write, and only then does Velero become operational primary — tier by tier, each step reversible in minutes, with no forced re-snapshot and no flag day.

For most regulated estates the honest end state is two-tier, not a full Commvault retirement: Velero owns daily operations while Commvault is retained for cross-cluster DR orchestration and governance e-discovery. We say so up front, because the rest only matters if you can trust it.

The idea

Two snapshotters, one CSI driver, zero collisions.

The topology that makes this zero-downtime is parallel protection in which neither tool garbage-collects the other's snapshots. Velero runs in every cluster with per-namespace Schedule CRs writing to a separate Object-Lock bucket, while Commvault's iDataAgent keeps calling the same CSI driver via its own Subclient. Each tool creates its own VolumeSnapshot objects under distinct VolumeSnapshotClass names whose deletionPolicy is pinned to Retain, so neither prunes the other. Commvault stays the system of record through dual-write; only after a production-scale restore drill passes per namespace does Velero become primary. For regulated estates Commvault is retained, not retired.
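A minimal sketch of the two snapshot classes, assuming an AWS EBS CSI driver and hypothetical class names (`velero-retain`, `commvault-retain`). The `velero.io/csi-volumesnapshot-class` label is how Velero's CSI support selects a class for a driver. How the Commvault side is pointed at its own class depends on your Commvault configuration and is not shown.

```python
import json


def snapshot_class(name: str, driver: str, velero_managed: bool) -> dict:
    """Build a VolumeSnapshotClass manifest with deletionPolicy pinned to Retain."""
    labels = {}
    if velero_managed:
        # Velero's CSI support picks the class carrying this label for a driver.
        labels["velero.io/csi-volumesnapshot-class"] = "true"
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshotClass",
        "metadata": {"name": name, "labels": labels},
        "driver": driver,
        "deletionPolicy": "Retain",
    }


def collisions(classes: list) -> list:
    """Return human-readable problems: shared class names or a non-Retain policy."""
    problems = []
    names = [c["metadata"]["name"] for c in classes]
    if len(set(names)) != len(names):
        problems.append("duplicate VolumeSnapshotClass name")
    for c in classes:
        if c["deletionPolicy"] != "Retain":
            problems.append(f"{c['metadata']['name']}: deletionPolicy is not Retain")
    return problems


# Hypothetical class names; the driver is only an example.
CLASSES = [
    snapshot_class("velero-retain", "ebs.csi.aws.com", velero_managed=True),
    snapshot_class("commvault-retain", "ebs.csi.aws.com", velero_managed=False),
]

if __name__ == "__main__":
    print(json.dumps(CLASSES, indent=2))
    print(collisions(CLASSES) or "no collisions")
```

Running a check like `collisions` in CI before each wave is one way to catch a class that has drifted back to `Delete`.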

The phases

Six phases. Each one reversible until the final evidence window closes.

0

Baseline & inventory

We document every namespace-to-Plan mapping, RPO/RTO per namespace, the StorageClass-to-CSI-driver matrix, quiesce-hook needs per stateful workload, and which namespaces carry long-retention obligations. Read-only, via the CommServe REST API and kubectl.

Users see: No user impact.

Rollback: N/A
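One way to hold the inventory is a flat record per namespace. The sketch below is illustrative only: the field names, the sample namespaces, and the helper functions are assumptions, not the output format of the CommServe REST API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceRecord:
    namespace: str
    commvault_plan: str
    tier: int                 # 0 = most critical, 3 = least critical
    rpo_minutes: int
    rto_minutes: int
    storage_class: str
    csi_driver: str
    needs_quiesce_hook: bool
    long_retention: bool      # carries a long-retention obligation


def migration_waves(records):
    """Group namespaces into waves, least critical tier first (Tier 3 to Tier 0)."""
    waves = {}
    for r in records:
        waves.setdefault(r.tier, []).append(r.namespace)
    return [(tier, sorted(waves[tier])) for tier in sorted(waves, reverse=True)]


def retirement_blockers(records):
    """Namespaces whose long-retention obligation keeps Commvault in place."""
    return sorted(r.namespace for r in records if r.long_retention)


# Hypothetical sample data.
INVENTORY = [
    NamespaceRecord("payments", "plan-gold", 0, 15, 60, "gp3", "ebs.csi.aws.com", True, True),
    NamespaceRecord("ledger", "plan-gold", 0, 15, 60, "gp3", "ebs.csi.aws.com", True, True),
    NamespaceRecord("search", "plan-silver", 2, 240, 480, "gp3", "ebs.csi.aws.com", False, False),
    NamespaceRecord("batch-reports", "plan-bronze", 3, 1440, 2880, "gp3", "ebs.csi.aws.com", False, False),
]

if __name__ == "__main__":
    print(migration_waves(INVENTORY))
    print(retirement_blockers(INVENTORY))
```

A non-empty `retirement_blockers` result is the early signal that the estate will most likely stop at Phase 4.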

1

Stand up Velero per cluster

Velero is GitOps-installed in every protected cluster with the node-agent DaemonSet, pointing at a separate Object-Lock bucket — never the Commvault library. A canary namespace gets a Schedule, runs one Backup, and is restored end-to-end into a non-prod cluster. Commvault is untouched.

Users see: No impact on production users.

Rollback: helm uninstall velero — no namespace depends on it yet.

2

Point net-new namespaces at Velero

Every namespace created from here on ships a Velero Schedule via its Helm chart and an ArgoCD ApplicationSet, and is excluded from the Commvault Subclient by label. Existing namespaces are untouched.

Users see: None.

Rollback: Add the namespace back to the Commvault Subclient and delete the Velero Schedule. Under 15 minutes per namespace.
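As a rough illustration of what a net-new namespace ships, the sketch below builds a labelled Namespace and a Velero Schedule as plain dictionaries. The label key `backup.example.com/engine` is hypothetical (the exclusion label is whatever your Commvault Subclient rule matches), and the cron expression, TTL, and the `default` storage location name are placeholder values.

```python
import json

# Hypothetical label key used to exclude the namespace from the Commvault Subclient.
ENGINE_LABEL = "backup.example.com/engine"


def namespace_manifest(name: str) -> dict:
    """Namespace carrying the label that marks it as Velero-protected."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": {ENGINE_LABEL: "velero"}},
    }


def velero_schedule(namespace: str, cron: str = "0 2 * * *",
                    ttl_hours: int = 720, storage_location: str = "default") -> dict:
    """Velero Schedule that backs up a single namespace on a cron cadence."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Schedules live in the namespace where Velero itself is installed.
        "metadata": {"name": f"{namespace}-daily", "namespace": "velero"},
        "spec": {
            "schedule": cron,
            "template": {
                "includedNamespaces": [namespace],
                "storageLocation": storage_location,
                "snapshotVolumes": True,
                "ttl": f"{ttl_hours}h0m0s",
            },
        },
    }


if __name__ == "__main__":
    for manifest in (namespace_manifest("orders"), velero_schedule("orders")):
        print(json.dumps(manifest, indent=2))
```

In a Helm chart the same structure would be templated as YAML; the dictionary form here is only to make the fields explicit.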

3

Migrate existing namespaces, one tier at a time

In waves from Tier 3 to Tier 0, each namespace runs dual-write — both tools back it up with Retain deletion policy on both VolumeSnapshotClasses — for at least two RPO cycles plus a restore drill from each side. We diff the restored object graphs before tightening Commvault's scope.

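Users see: None. Both systems only take backups, so a failure in either one has no effect on running workloads; it would surface only at restore time, which is why each side gets its own restore drill.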

Rollback: Reactivate the Commvault Subclient include rule; the Plan picks the namespace back up next run. Under 15 minutes.
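The exit check for dual-write can be sketched as a set comparison plus a gate. This is a simplified illustration: it compares objects by identity (apiVersion, kind, namespace, name) only, not by content, and the function names are hypothetical.

```python
def object_keys(objects):
    """Reduce a list of Kubernetes objects to comparable identity keys."""
    return {
        (
            o["apiVersion"],
            o["kind"],
            o["metadata"].get("namespace", ""),
            o["metadata"]["name"],
        )
        for o in objects
    }


def diff_object_graphs(velero_restored, commvault_restored):
    """Report objects that only one of the two restores brought back."""
    a = object_keys(velero_restored)
    b = object_keys(commvault_restored)
    return {"only_in_velero": sorted(a - b), "only_in_commvault": sorted(b - a)}


def dual_write_gate(rpo_cycles_completed, velero_drill_passed,
                    commvault_drill_passed, graph_diff):
    """True only when the namespace may leave dual-write."""
    return bool(
        rpo_cycles_completed >= 2
        and velero_drill_passed
        and commvault_drill_passed
        and not graph_diff["only_in_velero"]
        and not graph_diff["only_in_commvault"]
    )


if __name__ == "__main__":
    deploy = {"apiVersion": "apps/v1", "kind": "Deployment",
              "metadata": {"namespace": "orders", "name": "api"}}
    cm = {"apiVersion": "v1", "kind": "ConfigMap",
          "metadata": {"namespace": "orders", "name": "settings"}}
    diff = diff_object_graphs([deploy, cm], [deploy])
    print(diff)
    print(dual_write_gate(2, True, True, diff))
```

A real drill would also compare object contents and volume data; identity-level diffing is just the cheapest first signal that one side is missing something.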

4

Cut Velero to primary; Commvault secondary

Velero becomes operational primary. Commvault is either reduced to a monthly long-retention copy with WORM Lock for e-discovery (topology 1.3, the likely final state) or drained toward retirement (topology 1.4). Operational restores go to Velero; long-retention and e-discovery restores go to Commvault.

Users see: None.

Rollback: Re-enable the Commvault Plan at full cadence; Velero keeps running. Under 15 minutes.
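The routing rule after cutover is simple enough to write down. The sketch below assumes three hypothetical request types and shows the Velero Restore object an operational restore would create; the backup and namespace names are placeholders.

```python
def route_restore(request_type: str) -> str:
    """Pick the system that serves a restore request after Phase 4."""
    routes = {
        "operational": "velero",
        "long-retention": "commvault",
        "e-discovery": "commvault",
    }
    if request_type not in routes:
        raise ValueError(f"unknown restore request type: {request_type!r}")
    return routes[request_type]


def velero_restore(backup_name: str, namespace: str, target_namespace: str = "") -> dict:
    """Velero Restore manifest for one namespace, optionally remapped for a drill."""
    spec = {"backupName": backup_name, "includedNamespaces": [namespace]}
    if target_namespace and target_namespace != namespace:
        spec["namespaceMapping"] = {namespace: target_namespace}
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": f"{backup_name}-restore", "namespace": "velero"},
        "spec": spec,
    }


if __name__ == "__main__":
    print(route_restore("operational"))
    print(velero_restore("orders-daily-20240101", "orders", "orders-drill"))
```

Restoring into a mapped namespace such as `orders-drill` lets a drill run without touching the live workload.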

5

Retire Commvault K8s (only if 1.4 chosen)

Only if no long-retention obligation remains: all K8s Subclients stop, the CommServe catalog is exported to immutable storage, a 30-day read-only evidence window opens, then the contract changes and MediaAgent capacity is reclaimed. Most regulated estates do not reach this phase — they stop at Phase 4.

Users see: None.

Rollback: Reactivate Subclients within the read-only evidence window. After it closes, rollback is out of scope.

Feature parity

Where Velero matches Commvault — and where it honestly does not.

Capability | Velero | Commvault Kubernetes | Parity
Kubernetes namespace / PV backup | Velero Backup / Schedule CRs to a BSL | Commvault K8s Plan + namespace Subclient (iDataAgent) | At parity
Kubernetes backup (CSI snapshots) | Velero CSI path (VolumeSnapshotLocation, DataUpload v2alpha1) | Commvault iDataAgent CSI Subclient | At parity
Cluster-config / object-graph capture | Velero captures CRDs, RBAC, ConfigMaps, Secrets via the K8s API | Commvault etcd + manifest capture | At parity
Deduplication | Kopia content-defined chunking (Kopia integrated in Velero 1.10, default uploader since 1.12) | Commvault MediaAgent dedup | At parity
Encryption | Bucket-side SSE-KMS + Kopia repo encryption | Commvault copy encryption | At parity
Immutability (Object Lock / WORM) | BSL to S3 Object Lock Compliance, honored by Kopia retention | Commvault WORM Lock + Air Gap Protect | At parity
DR orchestration (cross-cluster) | None; custom Argo Workflows / runbook code | Commvault RecoveryAir DRaaS (boot order, IP/DNS remap) | SaaS only
Ransomware anomaly detection | None first-party; roll your own from Kopia content stats + Prometheus | Commvault Threatwise / Command Center anomaly | SaaS only
E-discovery / governance | Restore-then-search; no index | Commvault Activate full-text e-discovery + chain-of-custody | SaaS only
Vendor air-gap copy | Object Lock + cross-region + separate AWS account; your IAM is the boundary | Commvault Air Gap Protect (vendor-attested IAM boundary) | Partial
Retention / GFS | Per-Schedule ttl (a separate Schedule per tier) | Commvault Storage Policy GFS retention | At parity
Restore testing | velero restore create; Kopia content verify; SRE-run drills | Commvault console restore verification | At parity
App-team self-service restore | kubectl create -f restore.yaml, RBAC-scoped per namespace | CommServe console role required | Partial
Deployment model | Self-hosted controller pods + node-agent DaemonSet + object store | CommServe + MediaAgent SaaS-managed estate | SaaS only
Cost model | Cluster compute + object storage | Per-protected-VM/TB SKU + MediaAgent capacity | Partial
Compliance (SEC 17a-4 / HIPAA / PCI v4) | Object Lock Compliance meets SEC 17a-4(f) | Commvault WORM + vendor-attested reports | Partial
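The "separate Schedule per tier" approach to GFS retention from the table can be sketched as a small table of cadences and TTLs. The tier names, cron expressions, and retention periods below are placeholder assumptions; each resulting entry would feed the `spec.schedule` and `spec.template.ttl` fields of one Velero Schedule.

```python
# (tier name, cron expression, retention in days); all values are illustrative.
GFS_TIERS = [
    ("daily", "0 2 * * *", 14),      # every day at 02:00, kept two weeks
    ("weekly", "0 3 * * 0", 56),     # Sundays at 03:00, kept eight weeks
    ("monthly", "0 4 1 * *", 365),   # first of the month at 04:00, kept one year
]


def gfs_schedule_specs(namespace: str) -> list:
    """One schedule spec per retention tier for a namespace."""
    return [
        {
            "name": f"{namespace}-{tier}",
            "schedule": cron,
            # Velero expects the TTL as a Go-style duration string.
            "ttl": f"{days * 24}h0m0s",
        }
        for tier, cron, days in GFS_TIERS
    ]


if __name__ == "__main__":
    for spec in gfs_schedule_specs("orders"):
        print(spec)
```

The trade-off is that every tier takes its own backup, whereas a GFS policy in a single product typically promotes one copy between tiers.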

What we're honest about

The gaps that keep Commvault in the picture.

No DR orchestration — RecoveryAir has no OSS equal

Velero Restore is one CR per cluster. Commvault RecoveryAir orchestrates cross-cluster failover: power-on order, network reconfig, IP and DNS remap, app-aware boot ordering, scriptable runbooks. Replacing it means custom Argo Workflows or Terraform you build, drill, and maintain — a separate workstream, not a checkbox. If RecoveryAir runs a live DR runbook, Commvault stays.

No anomaly detection or vendor e-discovery

Commvault Threatwise flags unusual encryption rates as a ransomware indicator; Commvault Activate gives full-text e-discovery with chain-of-custody across backup content. Velero offers neither first-party — the OSS path is roll-your-own from Kopia content stats and Prometheus, and restore-then-search for legal review. A genuine gap for security and legal teams.
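To give a sense of what "roll your own" involves, here is a deliberately small sketch of one possible detector. It assumes you already export a per-backup "new bytes uploaded" figure (for example to Prometheus); how that number is obtained from Kopia is not shown, and the threshold and minimum history length are arbitrary starting points. A sudden jump in new data is a plausible ransomware indicator because encrypted files stop deduplicating.

```python
from statistics import median


def is_anomalous(history, latest, threshold=5.0, min_points=5):
    """Flag `latest` if it sits more than `threshold` MADs from the history median.

    history: new bytes uploaded by previous backups of the same namespace.
    latest:  new bytes uploaded by the backup being evaluated.
    """
    if len(history) < min_points:
        return False  # not enough data to judge
    med = median(history)
    # Median absolute deviation: a spread estimate that tolerates outliers.
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        # Perfectly flat history: any change at all stands out.
        return latest != med
    return abs(latest - med) / mad > threshold


if __name__ == "__main__":
    history = [100, 104, 98, 101, 99, 103, 97]
    print(is_anomalous(history, 102))
    print(is_anomalous(history, 900))
```

This covers only the detection rule. Alert routing, tuning per namespace, and false-positive handling are the parts you would own and maintain.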

Long-retention keeps Commvault for governance

For estates under SEC 17a-4, HIPAA, or PCI v4, the honest end state is two-tier (topology 1.3): Velero owns daily operations while Commvault retains the long-tail WORM copy and serves as auditor-facing system of record. Most regulated customers stop at Phase 4 and never retire Commvault — and we tell you which case you are in during inventory.

Self-hosting means you own the boundary and the uptime

Velero's air-gap is Object Lock Compliance plus cross-region plus a separate AWS account — your IAM is the boundary, not a vendor's attested one. Two snapshotters can also race on one CSI driver if misconfigured. We mitigate with pinned Retain deletion policies, SSE-KMS on the bucket, a pinned Velero version, and one aggregated dashboard — managed, not just installed.

Why this beats a flag day

Reversible per phase. Commvault stays until Velero earns it.

Every namespace-level phase (2 through 4) rolls back in under 15 minutes: reactivate the Commvault Subclient include rule and the Plan picks the namespace back up at its next run, no rebuild required. Nothing decommissions Commvault until Velero has run as primary through a 30-day soak with at least two passing restore drills, at least one of them in the DR cluster, plus SRE and Compliance sign-off. A flag-day cutover gives you neither the soak nor the rollback; this gives you both.
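The decommission criteria above translate directly into a gate. The sketch below is one possible encoding; the `Drill` type and the `"primary"` / `"dr"` cluster labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Drill:
    passed: bool
    cluster: str  # "primary" or "dr"


def may_decommission(primary_since: date, today: date, drills,
                     sre_signoff: bool, compliance_signoff: bool) -> bool:
    """True only when every decommission criterion is met."""
    soak_days = (today - primary_since).days
    passing = [d for d in drills if d.passed]
    return bool(
        soak_days >= 30                                   # 30-day soak as primary
        and len(passing) >= 2                             # at least two passing drills
        and any(d.cluster == "dr" for d in passing)       # one of them in the DR cluster
        and sre_signoff
        and compliance_signoff
    )


if __name__ == "__main__":
    drills = [Drill(True, "primary"), Drill(True, "dr")]
    print(may_decommission(date(2024, 1, 1), date(2024, 1, 31), drills, True, True))
```

Keeping the gate in code makes it auditable: the inputs that justified a decommission decision can be logged alongside the result.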

See which namespaces move to Velero — and what stays on Commvault.

A 30-minute call with a senior Kubernetes backup engineer. We map your namespaces by tier and CSI coverage, scope any RecoveryAir DR replacement as its own workstream, and tell you honestly whether your compliance obligations let Commvault retire or keep it for governance — before you commit to anything.

Map my migration →