cert-manager + Smallstep → Venafi

cert-manager + Smallstep ↔ Venafi: integration to migration path.

cert-manager deploys alongside Venafi first, as a thin K8s-native client through the Venafi issuer. Step-CA then takes over in-cluster issuance once it is cross-signed and trusted. The cutover is an issuerRef change per workload against a dual-trust chain, so there is no flag day and no forced re-credentialing.

The honest boundary, stated up front: the two stacks overlap only in the Kubernetes quadrant. Cross-CA inventory, ITSM approvals on the F5/Citrix/IIS/Java estate, and Outagedetection ML stay on Venafi. This is a partial migration scoped to the cluster tier — not Venafi retirement.

The idea

Take the cluster corner. Keep Venafi for the rest.

Two things make this zero-downtime: the cross-sign and cert-manager's pluggable issuer model. Step-CA stands up HA with an HSM-backed key and its intermediate cross-signed by your existing internal root, so currently deployed workloads validate Step-CA leaves during the transition. Because the issuer is selected by the issuerRef stanza (name, kind and group) on a Certificate resource, each workload flips with a few-line PR and reverts the same way. Venafi stays the enterprise inventory and workflow plane for everything outside the cluster at every phase; only the K8s issuance path moves.
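As a rough illustration of how small the per-workload change is, the sketch below models a Certificate manifest as a Python dict and swaps only `spec.issuerRef`. All resource names are hypothetical. The Venafi-side reference is a placeholder, since the real kind and group depend on which Venafi issuer you deploy; the Step-CA side assumes the step-issuer `StepIssuer` kind in the `certmanager.step.sm` group.

```python
import copy

# Placeholder reference for the Venafi-backed issuer (assumption: the actual
# kind/group depend on the Venafi issuer variant installed in your cluster).
VENAFI_REF = {"name": "venafi-cluster-tier", "kind": "ClusterIssuer", "group": "cert-manager.io"}

# Reference for a step-issuer resource (hypothetical name "step-ca").
STEP_REF = {"name": "step-ca", "kind": "StepIssuer", "group": "certmanager.step.sm"}

# Hypothetical Certificate manifest, still issued through Venafi.
certificate = {
    "apiVersion": "cert-manager.io/v1",
    "kind": "Certificate",
    "metadata": {"name": "payments-tls", "namespace": "payments"},
    "spec": {
        "secretName": "payments-tls",
        "dnsNames": ["payments.payments.svc.cluster.local"],
        "issuerRef": dict(VENAFI_REF),
    },
}


def with_issuer(cert, issuer_ref):
    """Return a copy of the manifest with only spec.issuerRef replaced."""
    flipped = copy.deepcopy(cert)
    flipped["spec"]["issuerRef"] = dict(issuer_ref)
    return flipped


cutover = with_issuer(certificate, STEP_REF)    # the migration PR
rollback = with_issuer(cutover, VENAFI_REF)     # the revert PR
```

Everything outside `issuerRef` (SANs, Secret name, namespace) is identical before and after, which is why the revert is the same diff in reverse.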

The phases

Six steps. Each one reversible.

Phase 0: Baseline & inventory

Every cert is tagged by workload tier — K8s in-cluster, K8s ingress, F5, Citrix, IIS, Java keystore, network appliance — with its backend CA driver, Policy Folder or Zone, validity, SAN shape and owner team, cross-referenced against 90 days of Outagedetection events. Read-only.

Users see: No user impact.

Rollback: N/A
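A minimal sketch of what one inventory record and a per-tier tally could look like. The field names, tier labels and sample rows are all hypothetical, chosen only to mirror the attributes listed above; a real baseline would be exported from Venafi rather than typed in.

```python
from collections import Counter
from dataclasses import dataclass

# Tiers that are candidates for cert-manager management (assumed labels).
CLUSTER_TIERS = {"k8s-in-cluster", "k8s-ingress"}


@dataclass(frozen=True)
class CertRecord:
    common_name: str
    tier: str            # e.g. "k8s-in-cluster", "f5", "iis", "java-keystore"
    backend_ca: str      # backend CA driver
    policy_folder: str   # Policy Folder or Zone
    validity_days: int
    owner_team: str


def summarize(records):
    """Count certificates per tier and how many fall in the cluster tier."""
    by_tier = Counter(r.tier for r in records)
    in_scope = sum(n for tier, n in by_tier.items() if tier in CLUSTER_TIERS)
    return by_tier, in_scope


# Made-up sample rows for illustration only.
sample = [
    CertRecord("api.payments.svc", "k8s-in-cluster", "adcs", "Cluster/Prod", 365, "payments"),
    CertRecord("shop.example.com", "k8s-ingress", "digicert", "Public/Web", 397, "web"),
    CertRecord("vip-01.corp.example", "f5", "adcs", "Network/LB", 730, "netops"),
]
by_tier, in_scope = summarize(sample)
```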

Phase 1: cert-manager via the Venafi Issuer

cert-manager installs in the in-scope clusters with a Venafi Enhanced Issuer (VEI) scoped to a dedicated cluster-tier Policy Folder. A canary namespace issues certificates from Certificate resources end to end, still backed by Venafi. Production is untouched and there is no spend reduction yet; this is a K8s-native stepping stone.

Users see: None — Venafi still issues every cert.

Rollback: Delete the issuer and the canary Certificate resources. Under 15 minutes.

Phase 2: Move K8s TLS onto cert-manager + VEI

All in-cluster TLS (ingress, mesh sidecars, workload consumers) is declared as Certificate resources issued by VEI and renewed by the cert-manager reconciler. Manual VCert flows and one-off ticketed issuance are retired. trust-manager distributes the managed root as ConfigMaps.

Users see: None — same SANs, same chain.

Rollback: Re-mount the previous Secret. Under 15 minutes.

Phase 3: Stand up Step-CA, cross-signed

Step-CA goes live HA — at least three replicas, external Postgres, HSM-backed signing key — with its intermediate cross-signed by your existing internal root and chained to a new offline Step-CA root. trust-manager publishes the union of both roots. No Certificate CRD references Step-CA yet.

Users see: None — a bigger trust store, no behaviour change.

Rollback: Pull the new root from the Bundle; decommission Step-CA. Under 30 minutes — no workload depends on it.
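The dual-root trust bundle and its rollback can be sketched as a trust-manager Bundle built from two source ConfigMaps. This assumes the `trust.cert-manager.io/v1alpha1` Bundle shape with `configMap` sources; the Bundle name, ConfigMap names and keys are hypothetical.

```python
import copy


def build_bundle(name, sources, target_key="ca-bundle.pem"):
    """Build a Bundle manifest from (configmap_name, key) source pairs."""
    return {
        "apiVersion": "trust.cert-manager.io/v1alpha1",
        "kind": "Bundle",
        "metadata": {"name": name},
        "spec": {
            "sources": [{"configMap": {"name": n, "key": k}} for n, k in sources],
            "target": {"configMap": {"key": target_key}},
        },
    }


def drop_source(bundle, configmap_name):
    """Return a copy of the Bundle without the named ConfigMap source."""
    pruned = copy.deepcopy(bundle)
    pruned["spec"]["sources"] = [
        s for s in pruned["spec"]["sources"]
        if s["configMap"]["name"] != configmap_name
    ]
    return pruned


# Phase 3: publish the union of the existing root and the new Step-CA root.
dual_trust = build_bundle(
    "cluster-trust",
    [("internal-root", "root.pem"), ("step-ca-root", "root.pem")],
)

# Phase 3 rollback: pull the new root, leaving the original trust store.
rolled_back = drop_source(dual_trust, "step-ca-root")
```

Because no workload references Step-CA at this point, removing the second source changes only the size of the trust store, not which chains validate.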

Phase 4: Cut canary workloads to step-issuer

Canary namespaces flip their Certificate issuerRef to StepIssuer, so issuance happens entirely in-cluster against Step-CA. The chain still validates through the cross-sign, leaf validity drops to short-lived lifetimes where the consumers support it, and per-cert Venafi cost on the canary falls to zero.

Users see: None — the chain validates via the cross-sign.

Rollback: Flip issuerRef back to the Venafi issuer; next renewal reverts. Under 15 minutes per namespace.

Phase 5: Wave the cluster tier; Venafi retained

All in-cluster Certificate resources move to Step-CA in waves (dev → staging → prod). Public-internet ingress stays on a publicly trusted CA via cert-manager ACME, not Step-CA. Venafi Discovery keeps observing inventory, and the F5, Citrix, IIS, Java and network estate is untouched.

Users see: None — same SANs and usages, fresher leaves.

Rollback: Re-enable issuance on the Policy Folder; flip issuerRef back to VEI. Under 30 minutes per wave.

Feature parity

What moves, what stays on Venafi.

| Capability | cert-manager + Smallstep | Venafi | Parity |
|---|---|---|---|
| ACME issuance | Step-CA ACME provisioner + cert-manager acme Issuer (HTTP-01 / DNS-01 / tls-alpn-01, EAB) | Venafi TPP/Cloud ACME via Issuing Template + EAB | At parity |
| Private CA | Step-CA online intermediate + offline root | Venafi orchestrates downstream private CAs (ADCS, OpenSSL, AWS PCA) | At parity |
| Publicly-trusted root | None; Step-CA chain is private | Venafi drives DigiCert/Sectigo/Entrust public roots under WebTrust | SaaS only |
| Code signing | Step-CA can mint signing EKU but no enterprise custody | Venafi Code Sign Protect (workflow + HSM custody) | SaaS only |
| Certificate inventory / discovery | cert-manager sees its own Certificate state only | Venafi Network / Onboard Discovery across F5/Citrix/IIS/Java/network gear | SaaS only |
| HSM support | Step-CA PKCS#11 (YubiHSM 2 / CloudHSM / Azure Managed HSM / GCP KMS-HSM) | Venafi connects HSM-backed downstream CAs | At parity |
| Short-lived certs | Step-CA 24h to 7d certs, no CRL/OCSP; ACME ARI early-renew | Issuing Template low validity, but per-cert cost discourages high churn | Partial |
| cert-manager integration | Native step-issuer external Issuer | Venafi Enhanced Issuer (VEI) kind: Issuer | At parity |
| RBAC | K8s RBAC + Step-CA provisioner claims | Venafi TPP roles (MRAO/RAO/Operator/Approver/Auditor) | Partial |
| Workflow approvals | Kyverno/Gatekeeper admission (policy, not workflow); PR review | Venafi ITSM-integrated approvals (ServiceNow/Jira), SoD, evidence trail | SaaS only |
| Anomaly / outage detection | Prometheus metrics + alerts on instrumented certs | Venafi Outagedetection cross-estate ML | SaaS only |
| Deployment & HA | Step-CA self-hosted HA (3+ replicas, external Postgres, HSM) | Venafi vendor-operated SaaS / on-prem TPP | Partial |
| Cost model | Self-hosted compute + ops; zero per-cert license in-cluster | Per-machine-identity licensing | At parity |
| Compliance (WebTrust / SOC 2 / FIPS) | You operate + attest; FIPS iff HSM validated; CP/CPS is yours | Venafi vendor SOC 2 / ISO 27001 inherited as evidence | SaaS only |

What we're honest about

The caveats most vendors leave out.

This is not Venafi retirement

The two stacks overlap in exactly one quadrant: Kubernetes and cloud-native workloads. cert-manager plus Smallstep replaces the K8s corner cleanly, but it cannot replace cross-CA enterprise inventory, ITSM-driven approvals on F5, Citrix, IIS, Java and network gear, or Outagedetection ML. Any plan promising full Venafi removal breaks the audit. We state the cluster-tier scope in Phase 0 and at every steering review.

Public trust and code signing stay on Venafi

Step-CA's chain is private — public-internet ingress must keep routing through cert-manager ACME to Sectigo, DigiCert or Let's Encrypt, never Step-CA, enforced by Kyverno admission. Publicly-trusted code signing (Authenticode, Apple Developer ID, kernel-mode) requires HSM custody and OS-pinned roots, so it stays on Venafi Code Sign Protect. We make the public-vs-private split a hard admission rule.
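In a cluster this rule would live in a Kyverno policy. The Python sketch below only illustrates the decision logic such a policy might encode: a Certificate that targets a Step-CA issuer may carry internal names only. The internal suffix list and the sample specs are assumptions for illustration.

```python
# Issuer kinds provided by step-issuer.
STEP_KINDS = {"StepIssuer", "StepClusterIssuer"}

# Hypothetical internal DNS suffixes; anything else is treated as public.
INTERNAL_SUFFIXES = (".svc", ".svc.cluster.local", ".corp.example")


def public_names_on_step(cert_spec, internal_suffixes=INTERNAL_SUFFIXES):
    """Return the dnsNames that would violate the public/private split.

    An empty list means the Certificate spec is admissible.
    """
    if cert_spec["issuerRef"].get("kind") not in STEP_KINDS:
        return []  # not a Step-CA issuer, so this rule does not apply
    return [
        name for name in cert_spec.get("dnsNames", [])
        if not name.endswith(tuple(internal_suffixes))
    ]


internal_ok = {
    "issuerRef": {"name": "step-ca", "kind": "StepIssuer", "group": "certmanager.step.sm"},
    "dnsNames": ["orders.orders.svc.cluster.local"],
}
public_bad = {
    "issuerRef": {"name": "step-ca", "kind": "StepIssuer", "group": "certmanager.step.sm"},
    "dnsNames": ["orders.orders.svc.cluster.local", "shop.example.com"],
}
```

Returning the offending names rather than a bare boolean makes the rejection message actionable for the team that opened the PR.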

You lose single-pane inventory and ITSM approvals in-cluster

cert-manager sees only its own Certificate state — there is no cross-CA Discovery layer, and Kyverno is an admission policy, not a ServiceNow-style approval workflow with segregation of duties. We run topology 1.3, feeding leaf metadata back to Venafi via WebSDK so the security org keeps visibility, and we get the auditor to sign off on cluster-tier zero-touch issuance before Phase 5.

You now own uptime, the HSM and the cross-sign clock

Once VEI is removed from the path, a Step-CA outage is a self-inflicted cluster cert outage with no managed-service backstop — so HA, an HSM activation runbook and cached-cert grace are mandatory. The cross-sign window must outlast the soak plus a 90-day buffer or consumers trusting only the old root fail; we track that expiry as a P0 item. Outagedetection's cross-estate ML has no OSS equivalent.
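The cross-sign clock reduces to simple date arithmetic. The sketch below computes how many days of slack remain between the cross-signed intermediate's expiry and the end of the soak plus the 90-day buffer; the dates are invented for illustration, and a negative result is the condition to treat as a P0.

```python
from datetime import date, timedelta


def cross_sign_slack(cross_sign_not_after, soak_end, buffer_days=90):
    """Days by which the cross-sign outlasts soak end plus buffer.

    Negative means the cross-signed intermediate expires too early.
    """
    required_until = soak_end + timedelta(days=buffer_days)
    return (cross_sign_not_after - required_until).days


# Hypothetical dates: the soak ends 30 June 2025.
soak_end = date(2025, 6, 30)
healthy = cross_sign_slack(date(2026, 1, 1), soak_end)
too_short = cross_sign_slack(date(2025, 9, 1), soak_end)
```

A check like this could run on a schedule and alert well before the slack reaches zero, since re-issuing a cross-sign typically involves the offline root.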

Why this beats a flag day

Reversible in minutes, retired only after a long soak.

No phase forces an outage. Each wave rolls back in under 30 minutes — flipping issuerRef back to the Venafi issuer reverts on the next renewal, and within the cross-sign window no chain change is needed at all — while canary namespaces revert in under 15 minutes. A wave only counts as migrated after at least 30 consecutive days at full cluster scale on Step-CA with renewal success at or above 99.9%. Venafi is never retired: only the cluster-tier per-cert cost reaches zero, while it keeps owning the legacy and cross-CA estate indefinitely.
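The soak gate can be expressed as a streak check over daily renewal counts. This is a sketch under two assumptions not stated above: the streak is measured up to the most recent day, and a day with no renewal attempts counts as passing.

```python
def wave_migrated(daily_stats, min_days=30, success_per_mille=999):
    """Check the soak gate for one wave.

    daily_stats is a chronological list of (attempted, succeeded) renewal
    counts, one tuple per day. The wave passes when the most recent
    min_days days all met the success threshold (999 per 1000 = 99.9%).
    """
    streak = 0
    for attempted, succeeded in daily_stats:
        # Integer comparison avoids floating-point edge cases at exactly 99.9%.
        if succeeded * 1000 >= attempted * success_per_mille:
            streak += 1
        else:
            streak = 0  # any failing day restarts the soak
    return streak >= min_days


clean_soak = [(1000, 1000)] * 30
short_soak = [(1000, 1000)] * 29
broken_soak = [(1000, 1000)] * 15 + [(1000, 998)] + [(1000, 1000)] * 14
```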

See how much of the cluster tier migrates cleanly.

A call with a senior platform and PKI engineer. We classify your certs by tier, size your HSM and cross-sign window honestly, and tell you exactly how much of the K8s estate moves to cert-manager + Step-CA — and how much must stay on Venafi.
