Velero → Commvault Kubernetes

Velero ↔ Commvault Kubernetes: integration to migration path.

Kubernetes backup is load-bearing — the day you need a namespace back is the worst day to find a gap, so we never flip a switch. Velero installs alongside Commvault Kubernetes Protection first, each namespace is backed up by both under dual-write, and only then does Velero become operational primary — tier by tier, each step reversible in minutes, with no forced re-snapshot and no flag day.

For most regulated estates the honest end state is two-tier, not a full Commvault retirement: Velero owns daily operations while Commvault is retained for cross-cluster DR orchestration and governance e-discovery. We say so up front, because the rest only matters if you can trust it.

Map my migration with an engineer →

The idea

Two snapshotters, one CSI driver, zero collisions.

The topology that makes this zero-downtime is parallel protection in which neither tool garbage-collects the other's snapshots. Velero runs in every cluster with per-namespace Schedule CRs writing to a separate Object-Lock bucket, while Commvault's iDataAgent keeps calling the same CSI driver via its own Subclient. Each tool creates its own VolumeSnapshot objects under a distinct VolumeSnapshotClass whose deletionPolicy is pinned to Retain, so neither prunes the other. Commvault stays the system of record through dual-write; Velero becomes primary only after a production-scale restore drill passes per namespace, and for regulated estates Commvault is retained, not retired.
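As a rough illustration of the "distinct classes, both on Retain" rule, the sketch below builds one VolumeSnapshotClass manifest per tool and checks the two invariants. The class names and the CSI driver name are placeholders we chose for the example, not values from any real estate; the `velero.io/csi-volumesnapshot-class` label is the one Velero uses to pick its class.

```python
"""Sketch: one VolumeSnapshotClass per backup tool, both pinned to Retain."""
import json


def snapshot_class(name, driver, labels=None):
    """Build a VolumeSnapshotClass manifest with deletionPolicy: Retain."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshotClass",
        "metadata": {"name": name, "labels": dict(labels or {})},
        "driver": driver,
        "deletionPolicy": "Retain",
    }


def check_retain_and_distinct(classes):
    """Raise ValueError if two classes share a name or any is not Retain."""
    names = [c["metadata"]["name"] for c in classes]
    if len(set(names)) != len(names):
        raise ValueError(f"duplicate VolumeSnapshotClass names: {names}")
    for c in classes:
        if c["deletionPolicy"] != "Retain":
            raise ValueError(f"{c['metadata']['name']} is not pinned to Retain")


DRIVER = "ebs.csi.aws.com"  # assumption: substitute the cluster's CSI driver

classes = [
    # Velero selects its class by this label.
    snapshot_class("velero-retain", DRIVER,
                   {"velero.io/csi-volumesnapshot-class": "true"}),
    # Hypothetical name for the class the Commvault Subclient points at.
    snapshot_class("commvault-retain", DRIVER),
]
check_retain_and_distinct(classes)
manifests_json = json.dumps(classes, indent=2)  # kubectl accepts JSON manifests
```

Running the same check in CI against the live classes is one way to catch a drifted deletion policy before it matters.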

The phases

Six steps. Each one reversible.

Baseline & inventory

We document every namespace-to-Plan mapping, RPO/RTO per namespace, the StorageClass-to-CSI-driver matrix, quiesce-hook needs per stateful workload, and which namespaces carry long-retention obligations. Read-only, via the CommServe REST API and kubectl.

Users see: No user impact.

Rollback: N/A
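One possible shape for the inventory record, sketched under our own assumptions: the field names are illustrative, and how the values are pulled from the CommServe REST API or kubectl is deliberately left out.

```python
"""Sketch: a per-namespace inventory record and two simple queries over it."""
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceRecord:
    namespace: str
    commvault_plan: str       # namespace-to-Plan mapping
    tier: int                 # 0 = most critical, 3 = least critical
    rpo_minutes: int
    rto_minutes: int
    storage_class: str
    csi_driver: str
    needs_quiesce_hook: bool  # stateful workload needing pre/post hooks
    long_retention: bool      # carries a long-retention obligation


def retirement_blockers(records):
    """Namespaces whose obligations keep Commvault in the picture."""
    return sorted(r.namespace for r in records if r.long_retention)


def csi_matrix(records):
    """StorageClass-to-CSI-driver matrix derived from the inventory."""
    return {r.storage_class: r.csi_driver for r in records}


# Illustrative data only.
inventory = [
    NamespaceRecord("ledger", "plan-gold", 0, 15, 60, "gp3", "ebs.csi.aws.com", True, True),
    NamespaceRecord("docs", "plan-bronze", 3, 1440, 2880, "gp3", "ebs.csi.aws.com", False, False),
]
```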

Stand up Velero per cluster

Velero is GitOps-installed in every protected cluster with the node-agent DaemonSet, pointing at a separate Object-Lock bucket — never the Commvault library. A canary namespace gets a Schedule, runs one Backup, and is restored end-to-end into a non-prod cluster. Commvault is untouched.

Users see: None for production users.

Rollback: helm uninstall velero — no namespace depends on it yet.
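For the canary step, a Schedule might be generated along these lines. This is a minimal sketch: the `velero` install namespace, the BackupStorageLocation name, the cron expression, and the ttl are all assumptions for the example.

```python
"""Sketch: build a Velero Schedule manifest for a canary namespace."""
import json


def velero_schedule(namespace, cron, ttl, storage_location):
    """Return a velero.io/v1 Schedule that backs up a single namespace."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumption: Velero is installed in the "velero" namespace.
        "metadata": {"name": f"{namespace}-daily", "namespace": "velero"},
        "spec": {
            "schedule": cron,
            "template": {
                "includedNamespaces": [namespace],
                "storageLocation": storage_location,
                "ttl": ttl,
            },
        },
    }


# Hypothetical values: a 02:00 daily backup kept for 72 hours.
canary = velero_schedule("canary", "0 2 * * *", "72h0m0s", "object-lock-bsl")
canary_json = json.dumps(canary, indent=2)  # can be fed to kubectl apply -f -
```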

Point net-new namespaces at Velero

Every namespace created from here on ships a Velero Schedule via its Helm chart and an Argo CD ApplicationSet, and is excluded from the Commvault Subclient by label. Existing namespaces are untouched.

Users see: None.

Rollback: Add the namespace back to the Commvault Subclient and delete the Velero Schedule. Under 15 minutes per namespace.
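The label-based split can be thought of as a pure function over namespace labels. In the sketch below the label key and value are hypothetical, and the Commvault side is represented only as a list of names; how that list is pushed into the Subclient is out of scope here.

```python
"""Sketch: decide which tool owns each namespace from a single label."""

ENGINE_LABEL = "backup.example.com/engine"  # hypothetical label key


def split_scope(namespaces):
    """Return (velero_namespaces, commvault_namespaces), each sorted by name.

    `namespaces` maps a namespace name to its labels. Anything not
    explicitly labelled for Velero stays in Commvault's scope.
    """
    velero, commvault = [], []
    for name, labels in sorted(namespaces.items()):
        if labels.get(ENGINE_LABEL) == "velero":
            velero.append(name)
        else:
            commvault.append(name)
    return velero, commvault


# Illustrative data: one net-new namespace, two existing ones.
cluster = {
    "new-api": {ENGINE_LABEL: "velero"},
    "payments": {},
    "ledger": {"team": "finance"},
}
velero_ns, commvault_ns = split_scope(cluster)
```

Defaulting unlabelled namespaces to Commvault keeps existing namespaces untouched, and rollback is just removing the label.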

Migrate existing namespaces, one tier at a time

In waves from Tier 3 to Tier 0, each namespace runs dual-write — both tools back it up with Retain deletion policy on both VolumeSnapshotClasses — for at least two RPO cycles plus a restore drill from each side. We diff the restored object graphs before tightening Commvault's scope.

Users see: None. Both systems only take backups, so a failure in either one stays invisible until a restore is attempted, which is why each side gets its own restore drill.

Rollback: Reactivate the Commvault Subclient include rule; the Plan picks the namespace back up next run. Under 15 minutes.
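Diffing the restored object graphs could look something like the following sketch. It assumes both restores have been exported as lists of Kubernetes objects (for example from `kubectl get -o json`); the set of metadata fields treated as volatile is our own choice and would need tuning per estate.

```python
"""Sketch: compare two restored object graphs, ignoring volatile fields."""
import hashlib
import json

# Fields that legitimately differ between two restores of the same object.
VOLATILE_METADATA = ("uid", "resourceVersion", "creationTimestamp",
                     "managedFields", "generation")


def object_key(obj):
    meta = obj["metadata"]
    return (obj["kind"], meta.get("namespace", ""), meta["name"])


def fingerprint(obj):
    """Hash an object after dropping status and volatile metadata."""
    clean = {k: v for k, v in obj.items() if k != "status"}
    clean["metadata"] = {k: v for k, v in obj.get("metadata", {}).items()
                         if k not in VOLATILE_METADATA}
    blob = json.dumps(clean, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def diff_graphs(velero_objs, commvault_objs):
    """Return objects present on one side only, plus objects that differ."""
    v = {object_key(o): fingerprint(o) for o in velero_objs}
    c = {object_key(o): fingerprint(o) for o in commvault_objs}
    return {
        "only_velero": sorted(v.keys() - c.keys()),
        "only_commvault": sorted(c.keys() - v.keys()),
        "changed": sorted(k for k in v.keys() & c.keys() if v[k] != c[k]),
    }
```

An empty result on all three keys is the signal we would look for before tightening Commvault's scope for that namespace.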

Cut Velero to primary; Commvault secondary

Velero becomes operational primary. Commvault is either reduced to a monthly long-retention copy with WORM Lock for e-discovery (topology 1.3, the likely final state) or drained toward retirement (topology 1.4). Operational restores go to Velero; long-retention and e-discovery restores go to Commvault.

Users see: None.

Rollback: Re-enable the Commvault Plan at full cadence; Velero keeps running. Under 15 minutes.

Retire Commvault K8s (only if 1.4 chosen)

Only if no long-retention obligation remains: all K8s Subclients stop, the CommServe catalog is exported to immutable storage, a 30-day read-only evidence window opens, then the contract changes and MediaAgent capacity is reclaimed. Most regulated estates do not reach this phase; they stop at Phase 4, with Velero as primary and Commvault as secondary.

Users see: None.

Rollback: Reactivate Subclients within the read-only evidence window. After it closes, rollback is out of scope.

Feature parity

Where Velero matches Commvault — and where it honestly does not.

Capability	Velero	Commvault Kubernetes	Parity
Kubernetes namespace / PV backup	Velero Backup / Schedule CRs to a BSL	Commvault K8s Plan + namespace Subclient (iDataAgent)	At parity
Kubernetes backup (CSI snapshots)	Velero CSI path (VolumeSnapshotClass, DataUpload v2alpha1)	Commvault iDataAgent CSI Subclient	At parity
Cluster-config / object-graph capture	Velero captures CRDs, RBAC, ConfigMaps, Secrets via the K8s API	Commvault etcd + manifest capture	At parity
Deduplication	Kopia content-defined chunking (Kopia uploader introduced in Velero 1.10, the default in later releases)	Commvault MediaAgent dedup	At parity
Encryption	Bucket-side SSE-KMS + Kopia repo encryption	Commvault copy encryption	At parity
Immutability (Object Lock / WORM)	BSL to S3 Object Lock Compliance, honored by Kopia retention	Commvault WORM Lock + Air Gap Protect	At parity
DR orchestration (cross-cluster)	None; custom Argo Workflows / runbook code	Commvault RecoveryAir DRaaS (boot order, IP/DNS remap)	Commvault only
Ransomware anomaly detection	None first-party; roll your own from Kopia content stats + Prometheus	Commvault Threatwise / Command Center anomaly	Commvault only
E-discovery / governance	Restore-then-search; no index	Commvault Activate full-text e-discovery + chain-of-custody	Commvault only
Vendor air-gap copy	Object Lock + cross-region + separate AWS account; your IAM is the boundary	Commvault Air Gap Protect (vendor-attested IAM boundary)	Partial
Retention / GFS	Per-Schedule ttl (a separate Schedule per tier)	Commvault Storage Policy GFS retention	At parity
Restore testing	velero restore; Kopia content verify; SRE-run drills	Commvault console restore verification	At parity
App-team self-service restore	kubectl create -f restore.yaml; Restore CRs live in the Velero namespace, so per-namespace scoping needs an admission policy or wrapper	CommServe console role required	Partial
Deployment model	Self-hosted controller pods + node-agent DaemonSet + object store	CommServe + MediaAgent SaaS-managed estate	Differs
Cost model	Cluster compute + object storage	Per-protected-VM/TB SKU + MediaAgent capacity	Partial
Compliance (SEC 17a-4 / HIPAA / PCI DSS v4)	S3 Object Lock in Compliance mode supports SEC 17a-4(f) WORM retention; you assemble the audit evidence	Commvault WORM + vendor-attested reports	Partial
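The Retention / GFS row above relies on one Schedule per tier, each with its own ttl. A minimal sketch of that pattern follows; the cron expressions, retention lengths, and the `velero` namespace are assumptions picked for illustration, not a recommended policy.

```python
"""Sketch: approximate GFS retention with one Velero Schedule per tier."""

# tier name -> (cron expression, retention in days); illustrative values only
GFS_TIERS = {
    "daily": ("0 1 * * *", 14),
    "weekly": ("0 2 * * 0", 56),
    "monthly": ("0 3 1 * *", 365),
}


def gfs_schedules(namespace, tiers=GFS_TIERS):
    """Return one velero.io/v1 Schedule manifest per retention tier."""
    manifests = []
    for tier, (cron, days) in tiers.items():
        manifests.append({
            "apiVersion": "velero.io/v1",
            "kind": "Schedule",
            "metadata": {"name": f"{namespace}-{tier}", "namespace": "velero"},
            "spec": {
                "schedule": cron,
                "template": {
                    "includedNamespaces": [namespace],
                    # Velero expects a duration string for ttl.
                    "ttl": f"{days * 24}h0m0s",
                },
            },
        })
    return manifests


ledger_schedules = gfs_schedules("ledger")
```

Each backup expires on its own ttl, so the three Schedules together behave like a son/father/grandfather rotation without a central storage policy.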

What we're honest about

The gaps that keep Commvault in the picture.

No DR orchestration — RecoveryAir has no OSS equal

Velero Restore is one CR per cluster. Commvault RecoveryAir orchestrates cross-cluster failover: power-on order, network reconfig, IP and DNS remap, app-aware boot ordering, scriptable runbooks. Replacing it means custom Argo Workflows or Terraform you build, drill, and maintain — a separate workstream, not a checkbox. If RecoveryAir runs a live DR runbook, Commvault stays.

No anomaly detection or vendor e-discovery

Commvault Threatwise flags unusual encryption rates as a ransomware indicator; Commvault Activate gives full-text e-discovery with chain-of-custody across backup content. Velero offers neither first-party — the OSS path is roll-your-own from Kopia content stats and Prometheus, and restore-then-search for legal review. A genuine gap for security and legal teams.
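To give a sense of what "roll-your-own" means in practice, here is a deliberately simple sketch: a z-score test on the fraction of data that changed in each backup. The input series is hypothetical (you would derive it from whatever repository or Prometheus metrics you export), and a real detector would need far more care than this.

```python
"""Sketch: flag a backup whose change rate is far above recent history."""
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True if `latest` sits more than `threshold` standard
    deviations above the mean of `history`.

    `history` and `latest` are changed-bytes ratios in [0, 1], one per
    backup run. With fewer than two samples there is no baseline, so
    nothing is flagged.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest > mu
    return (latest - mu) / sigma > threshold


# Illustrative numbers: a namespace that normally changes 2-3% per day.
baseline = [0.02, 0.03, 0.025, 0.02, 0.03]
normal_day = is_anomalous(baseline, 0.028)
mass_rewrite = is_anomalous(baseline, 0.60)
```

A sudden jump in changed data is only a hint of mass encryption, not proof, which is part of why we call this a genuine gap rather than a solved problem.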

Long-retention keeps Commvault for governance

For estates under SEC 17a-4, HIPAA, or PCI DSS v4, the honest end state is two-tier (topology 1.3): Velero owns daily operations while Commvault retains the long-tail WORM copy and serves as auditor-facing system of record. Most regulated customers stop at Phase 4 (Velero primary, Commvault secondary) and never retire Commvault, and we tell you which case you are in during inventory.

Self-hosting means you own the boundary and the uptime

Velero's air-gap is Object Lock Compliance plus cross-region plus a separate AWS account — your IAM is the boundary, not a vendor's attested one. Two snapshotters can also race on one CSI driver if misconfigured. We mitigate with pinned Retain deletion policies, SSE-KMS on the bucket, a pinned Velero version, and one aggregated dashboard — managed, not just installed.

Why this beats a flag day

Reversible per phase. Commvault stays until Velero earns it.

Every namespace phase rolls back in under 15 minutes — reactivate the Commvault Subclient include rule and the Plan picks the namespace back up at its next run, no rebuild required. Nothing decommissions Commvault until Velero has run as primary through a 30-day soak with at least two passing restore drills, one of them in the DR cluster, plus SRE and Compliance sign-off. A flag-day cutover gives you neither the soak nor the rollback; this gives you both.
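The decommission gate described above can be written down as a small check. This is a sketch under our own assumptions about how drills and sign-offs are recorded; the names and the "dr" cluster identifier are hypothetical.

```python
"""Sketch: gate Commvault decommissioning on soak, drills, and sign-off."""
from dataclasses import dataclass
from datetime import date

REQUIRED_SIGNOFFS = {"SRE", "Compliance"}
SOAK_DAYS = 30
MIN_PASSING_DRILLS = 2


@dataclass(frozen=True)
class Drill:
    cluster: str
    passed: bool


def may_decommission(primary_since, today, drills, signoffs, dr_cluster="dr"):
    """True only if every condition of the gate holds."""
    soaked = (today - primary_since).days >= SOAK_DAYS
    passing = [d for d in drills if d.passed]
    enough_drills = len(passing) >= MIN_PASSING_DRILLS
    dr_covered = any(d.cluster == dr_cluster for d in passing)
    signed_off = REQUIRED_SIGNOFFS <= set(signoffs)
    return soaked and enough_drills and dr_covered and signed_off


# Illustrative run: 31 days as primary, one prod drill and one DR drill passed.
ok = may_decommission(
    primary_since=date(2024, 1, 1),
    today=date(2024, 2, 1),
    drills=[Drill("prod", True), Drill("dr", True)],
    signoffs=["SRE", "Compliance"],
)
```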

See which namespaces move to Velero — and what stays on Commvault.

A 30-minute call with a senior Kubernetes backup engineer. We map your namespaces by tier and CSI coverage, scope any RecoveryAir DR replacement as its own workstream, and tell you honestly whether your compliance obligations let Commvault retire or keep it for governance — before you commit to anything.

Map my migration →