OpenTelemetry Collector → Datadog Agent
OpenTelemetry Collector ↔ Datadog Agent: from integration to migration.
Your collection tier is load-bearing, so we never flip a switch. The OpenTelemetry Collector deploys alongside the Datadog Agent first — both run, with no pipeline overlap — and only then does the Collector take over collection per cluster while Datadog stays the backend. No app-code redeploy, no flag day, and every phase rolls back in minutes.
At first this is an agent-tier swap, not a backend swap: the Collector ships straight into Datadog through the Datadog exporter, so the Datadog backend, dashboards and monitors stay intact. We name the honest gaps (Live Processes, NPM, CWS, DBM) up front.
The idea
Swap the agent, keep the backend.
The framing that makes this zero-outage: the Collector and the Datadog Agent are both collection-tier processes, and replacing one with the other does not require leaving Datadog. The Collector's Datadog exporter translates OTel resource attributes into Datadog tags in-process and ships directly to Datadog intake, so storage, indexing, dashboards and monitors are untouched. On top of that, the Collector adds upstream OTTL transforms, tail-based sampling and one config language across the fleet. Because backend migration is a separable workstream, the same Collector fleet can later fan out to an OSS backend with a one-line exporter change. Each cluster cuts over independently, and each cutover is reversible.
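To make the in-process translation concrete, here is a minimal Python sketch of its most visible part: mapping the OpenTelemetry semantic-convention resource attributes `service.name`, `deployment.environment` and `service.version` onto Datadog's Unified Service Tagging keys `service`, `env` and `version`. It is an illustration under assumptions, not the Datadog exporter's actual implementation, which handles many more attributes; the function name `to_datadog_tags` and the preference for the newer `deployment.environment.name` key are this sketch's own choices.

```python
"""Illustrative sketch only: how OTel resource attributes line up with
Datadog Unified Service Tagging. This is NOT the Datadog exporter's code."""

# (candidate OTel keys in priority order, Datadog tag name).
# Preferring deployment.environment.name over the older
# deployment.environment key is an assumption of this sketch.
UST_MAPPING = [
    (("service.name",), "service"),
    (("deployment.environment.name", "deployment.environment"), "env"),
    (("service.version",), "version"),
]


def to_datadog_tags(resource_attrs: dict) -> list:
    """Return 'key:value' Datadog tags for the UST-relevant attributes."""
    tags = []
    for otel_keys, dd_key in UST_MAPPING:
        for otel_key in otel_keys:
            if otel_key in resource_attrs:
                tags.append(f"{dd_key}:{resource_attrs[otel_key]}")
                break  # first matching key wins; later fallbacks are ignored
    return tags


if __name__ == "__main__":
    attrs = {
        "service.name": "checkout",
        "deployment.environment": "prod",
        "service.version": "1.4.2",
        "k8s.pod.name": "checkout-7d9f",  # not a UST key, so not mapped here
    }
    print(to_datadog_tags(attrs))  # ['service:checkout', 'env:prod', 'version:1.4.2']
```

Because these three tags are what Datadog dashboards and monitors typically filter on, getting them right is what lets existing views keep working after the agent swap.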
The phases
Seven steps. Each one reversible.
Phase 0: Baseline & inventory
We document every host's Datadog Agent version and enabled features, custom conf.d integrations, the APM tracer language of each service, and log, metric and trace volumes. This phase is read-only, and we explicitly classify each SaaS-only subagent as keep (thin Agent), replace (OSS workstream) or abandon.
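The subagent classification can be written down as an explicit rule, which is what keeps it from happening by accident. The Python sketch below is hypothetical: the `Decision` values, the `classify` helper and the subagent-to-tool pairings in `OSS_CANDIDATES` are assumptions for illustration, drawing only on OSS tools named elsewhere on this page.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    KEEP = "keep (thin Agent)"
    REPLACE = "replace (OSS workstream)"
    ABANDON = "abandon"


# Candidate OSS replacements (hypothetical pairings for illustration).
OSS_CANDIDATES = {
    "CWS": "Falco or Tetragon",
    "NPM": "Cilium Hubble",
    "DBM": "pg_stat_statements + Grafana",
}


def classify(subagent: str, in_real_use: bool, oss_acceptable: bool) -> tuple[Decision, Optional[str]]:
    """Hypothetical Phase 0 rule: unused -> abandon; used with an acceptable
    OSS candidate -> replace; otherwise keep it on a thin Agent."""
    if not in_real_use:
        return Decision.ABANDON, None
    if oss_acceptable and subagent in OSS_CANDIDATES:
        return Decision.REPLACE, OSS_CANDIDATES[subagent]
    return Decision.KEEP, None


if __name__ == "__main__":
    # Hypothetical inventory: subagent -> (in real use?, OSS replacement acceptable?)
    inventory = {
        "Live Processes": (False, False),
        "NPM": (True, True),
        "DBM": (True, False),
    }
    for name, (used, oss_ok) in inventory.items():
        decision, tool = classify(name, used, oss_ok)
        print(f"{name}: {decision.value}" + (f" -> {tool}" if tool else ""))
```

A subagent with no OSS candidate (USM, for example) falls through to keep whenever it is in real use, which is the conservative default.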
Phase 1: Stand up the Collector in canary
An OpenTelemetry Collector is deployed to one canary cluster as a DaemonSet-plus-gateway pair with a single Datadog exporter, wired to a subset of receivers. The Collector and the Datadog Agent both run, with no pipeline overlap: no signal is collected by both, so nothing is ingested twice.
Phase 2: Collector takes over net-new signals
Every net-new app and cluster onboards Collector-first via OTLP, and the Datadog Agent on those nodes runs thin: logs and APM are disabled, and it retains only the SaaS-only subagents marked keep in Phase 0. Existing workloads are untouched.
Phase 3: Migrate existing sources, per cluster
Cluster by cluster, in waves ordered from lowest to highest blast radius, the Collector takes over logs, metrics and node signals while the equivalent Agent subsystems are disabled one at a time, with a 24-hour soak after each. We validate facets, monitors and volume in Datadog.
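The wave rule (low-to-high blast radius, one Agent subsystem at a time, 24 hours of soak between changes) is easy to turn into a schedule. This Python sketch assumes a hypothetical integer blast-radius score per cluster, treats clusters with the same score as one wave, and uses an assumed subsystem order of logs, metrics, then node signals; `Cluster`, `Step` and `plan_waves` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

SOAK = timedelta(hours=24)
# Order in which Agent subsystems are switched off (an assumption; the page
# names logs, metrics and node signals without fixing an order).
SUBSYSTEMS = ("logs", "metrics", "node")


@dataclass(frozen=True)
class Cluster:
    name: str
    blast_radius: int  # hypothetical score: higher = more impact if it breaks


@dataclass(frozen=True)
class Step:
    cluster: str
    subsystem: str
    disable_at: datetime
    soak_ends: datetime


def plan_waves(clusters: list, start: datetime) -> list:
    """Clusters sharing a blast-radius score form one wave; waves run from
    lowest to highest score, and each Agent subsystem is disabled only after
    the previous change has soaked for 24h."""
    waves = defaultdict(list)
    for c in clusters:
        waves[c.blast_radius].append(c.name)

    steps = []
    t = start
    for radius in sorted(waves):                # low-to-high blast radius
        for subsystem in SUBSYSTEMS:            # one subsystem at a time
            for name in sorted(waves[radius]):  # a wave's clusters move together
                steps.append(Step(name, subsystem, t, t + SOAK))
            t += SOAK                           # next change waits out the soak
    return steps


if __name__ == "__main__":
    fleet = [Cluster("prod-eu", 3), Cluster("dev", 1), Cluster("staging", 2), Cluster("dev-2", 1)]
    for s in plan_waves(fleet, datetime(2024, 1, 1)):
        print(s.disable_at.date(), s.cluster, s.subsystem)
```

With four clusters in three waves, the plan has twelve steps and the last change lands eight soak periods after the start. A real plan would also pause on any failed validation, which this sketch leaves out.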
Phase 4: Dual-export proving, OSS in parallel
The Collector exports to both Datadog and an OSS backend, with every active dashboard and monitor reproduced OSS-side against the same stream. Tail sampling and batching both run before the fan-out, so both backends receive identical data and every comparison is like-for-like. Where parity is impossible (Watchdog, Log Patterns, DBM, NPM), we declare the gap.
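A minimal sketch of the dual-export traces pipeline, built as a Python dict and printed as JSON (which YAML parsers generally accept, since JSON is essentially a subset of YAML 1.2). The ordering is the point: processors in a Collector pipeline run in sequence, and the result then fans out to every exporter listed, so `tail_sampling` and `batch` see each trace once and both backends receive the same output. The OSS endpoint, the `otlphttp/signoz` name and the two example sampling policies are assumptions; the API key is read from a `DD_API_KEY` environment variable via the Collector's `${env:...}` syntax.

```python
import json

# Hypothetical endpoint for the OSS backend; substitute your own.
OSS_ENDPOINT = "https://signoz.example.internal:4318"

config = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "processors": {
        "tail_sampling": {
            "decision_wait": "10s",
            "policies": [
                # Example policies only: keep every error trace, plus a
                # 10% probabilistic sample of all traces.
                {"name": "errors", "type": "status_code",
                 "status_code": {"status_codes": ["ERROR"]}},
                {"name": "baseline", "type": "probabilistic",
                 "probabilistic": {"sampling_percentage": 10}},
            ],
        },
        "batch": {},
    },
    "exporters": {
        "datadog": {"api": {"key": "${env:DD_API_KEY}"}},
        "otlphttp/signoz": {"endpoint": OSS_ENDPOINT},
    },
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                # Processors run in order, then the pipeline fans out to every
                # listed exporter: both backends get the same sampled batches.
                "processors": ["tail_sampling", "batch"],
                "exporters": ["datadog", "otlphttp/signoz"],
            }
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

One caution, from Datadog's own OpenTelemetry guidance rather than from this page: if you rely on Datadog APM trace metrics, sampling before the Datadog exporter can skew them, and Datadog documents a `datadog` connector for computing those metrics before sampling.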
Phase 5: Cut Datadog intake
The Datadog exporter comes out of the Collector pipelines, so the Collector exports to the OSS backend only. The thin Agent keeps shipping only the retained SaaS-only signals, and Datadog dashboards and monitors go quiet for services that have cut over.
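Cutting Datadog intake, and rolling it back, is a config diff rather than an infrastructure change. This Python sketch (the helper name `without_exporter` and the example config are hypothetical) removes one exporter from every pipeline and from the `exporters` section of a Collector config held as a dict, and refuses to leave any pipeline without an exporter, since the Collector rejects such a pipeline at startup.

```python
import copy


def without_exporter(config: dict, exporter: str) -> dict:
    """Return a copy of a Collector config dict with `exporter` removed from
    every pipeline and from the top-level exporters section."""
    new = copy.deepcopy(config)  # the original stays intact for rollback
    new.get("exporters", {}).pop(exporter, None)
    for name, pipeline in new.get("service", {}).get("pipelines", {}).items():
        remaining = [e for e in pipeline.get("exporters", []) if e != exporter]
        if not remaining:
            # A pipeline with no exporters fails Collector config validation.
            raise ValueError(f"pipeline {name!r} would have no exporters left")
        pipeline["exporters"] = remaining
    return new


if __name__ == "__main__":
    dual = {
        "exporters": {"datadog": {}, "otlphttp/signoz": {}},
        "service": {"pipelines": {
            "logs": {"receivers": ["otlp"], "exporters": ["datadog", "otlphttp/signoz"]},
            "metrics": {"receivers": ["otlp"], "exporters": ["datadog", "otlphttp/signoz"]},
        }},
    }
    oss_only = without_exporter(dual, "datadog")
    print(oss_only["service"]["pipelines"]["logs"]["exporters"])  # ['otlphttp/signoz']
```

Because the function works on a deep copy, the dual-export config is never mutated, and rolling back is simply re-applying it.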
Phase 6: Datadog Agent retirement (or thin-Agent permanence)
Either full retirement — no Agent, with SaaS-only features replaced by OSS workstreams or abandoned — or thin-Agent permanence, where the Agent stays only for retained subagents and the contract is reduced to those SKUs.
Feature parity
What moves cleanly, and what doesn't.
| Capability | OpenTelemetry Collector | Datadog Agent | Parity |
|---|---|---|---|
| Logs/metrics/traces receive | receivers: otlp, prometheus, filelog, hostmetrics, kubeletstats (≥80 contrib) | Autodiscovery-driven integration checks (≥600 integrations) | At parity |
| Integration breadth | prometheus + filelog + dedicated receivers; BYO Prom exporter for long-tail | ≥600 prebuilt DD integration checks + dashboards | SaaS only |
| Upstream transform | transform processor OTTL (replace_pattern, delete_key), all signals | logs_config.processing_rules (exclude/include/mask, logs only) | Partial |
| Tail-based sampling | tail_sampling processor (gateway tier, by outcome) | Head/ingest-control only (apm_config.error_sample_rate) | OSS only |
| k8s enrichment | k8sattributes processor (API-server pod/ns/node) | Agent tagger (kubelet + Cluster Agent pod/ns/node tags) | At parity |
| Exemplars / semconv | OTel semconv + histogram exemplar propagation (128-bit) | Unified Service Tagging (service/env/version) | At parity |
| Self-telemetry | localhost:8888/metrics (otelcol_exporter_*) scrapable anywhere | datadog.agent.* into DD (vendor-locked) | At parity |
| Config-as-code | Single YAML, offline otelcol validate, OTel Builder distros | datadog.yaml + conf.d/*.yaml (needs DD account) | Partial |
| Live Processes / Containers | None (no processreceiver parity) | DD process/container subagents | SaaS only |
| Runtime security / flow | None (Falco/Tetragon/Hubble separate workstreams) | DD CWS (eBPF), CSPM, NPM, USM subagents | SaaS only |
| Database monitoring | postgresqlreceiver / mysqlreceiver (metrics only) | DD DBM dbm-agent (per-query plan capture) | Partial |
| Backend portability | Exporter swap [datadog]→[otlphttp/signoz], one line | DD-Agent proprietary protobuf → DD only | OSS only |
| Compliance boundary | Self-managed fleet in your SSP (patch/TLS/key custody) | DD-managed vendor SOC 2 boundary | Partial |
What we're honest about
The caveats most vendors leave out.
SaaS-only subagents cannot be replicated
Live Processes, Live Containers, NPM, USM, CWS, CSPM and DBM ship proprietary protobuf payloads from dedicated Datadog Agent subprocesses, and the Collector has no equivalent for them. You either keep a thin Datadog Agent permanently for those, or replace them OSS-side with Falco, Tetragon, Cilium Hubble or pganalyze, and we make that call explicitly in Phase 0, never by accident.
Integration breadth is a real long tail
Datadog's 600-plus prebuilt integration checks have no one-to-one Collector equivalent. The Collector relies on the prometheus and filelog receivers plus a smaller dedicated set; long-tail services like Cassandra JMX or Couchbase need a bring-your-own Prometheus exporter and lose the prebuilt Datadog dashboard. We map each enabled integration to a receiver, a Prom exporter, or 'abandoned.'
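Mapping each enabled integration to a receiver, a Prometheus exporter, or 'abandoned' is a small lookup. In this hypothetical Python sketch, `DEDICATED_RECEIVERS` is a deliberately short, illustrative map from Datadog check names to Collector contrib receiver names, and `prom_exporter_available` stands in for whatever exporters your team has confirmed it can run; neither is a complete or authoritative list.

```python
from typing import Optional

# Illustrative and deliberately incomplete: Datadog check name -> Collector
# contrib receiver. Verify against your Collector distribution before use.
DEDICATED_RECEIVERS = {
    "postgres": "postgresql",
    "mysql": "mysql",
    "redisdb": "redis",
    "nginx": "nginx",
}


def map_integration(check: str, prom_exporter_available: set) -> tuple[str, Optional[str]]:
    """Classify one enabled Datadog integration as 'receiver', 'prometheus'
    (bring-your-own exporter scraped by the prometheus receiver) or 'abandoned'."""
    if check in DEDICATED_RECEIVERS:
        return "receiver", DEDICATED_RECEIVERS[check]
    if check in prom_exporter_available:
        return "prometheus", "prometheus"  # scrape the BYO exporter
    return "abandoned", None


if __name__ == "__main__":
    enabled = ["postgres", "cassandra", "couchbase", "legacy_app"]
    byo_exporters = {"cassandra", "couchbase"}  # hypothetical team decision
    for check in enabled:
        print(check, map_integration(check, byo_exporters))
```

The 'prometheus' outcome restores the data but not the prebuilt Datadog dashboard, so those entries usually also need a dashboard rebuild on the new backend.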
Database Monitoring drops to metrics-only
Datadog DBM's per-query plan capture against pg_stat_activity and SQL Server DMVs has no Collector parity — the postgresql and mysql receivers collect metrics only. If DBM is doing real work, that is a thin-Agent retain or a pg_stat_statements-plus-Grafana workstream, not a clean swap.
The Collector fleet is now inside your compliance boundary
What sat inside a managed vendor's SOC 2 boundary becomes your responsibility: patching, TLS, key rotation and a reproducible build all move into your SSP. That is exactly why we run the fleet for you: gateway HA with anti-affinity, a persistent sending queue backed by the file_storage extension, mTLS on the gateway tier, and a tested DR runbook. Managed, not just installed.
Why this beats a flag day
Reversible at every step.
Every phase is a Helm value or Collector YAML change with a rollback under 15 minutes, because the Datadog Agent stays installed as the break-glass collector of record: re-enable a disabled subsystem and restart the Agent, or re-add the Datadog exporter, and dashboards repopulate on the next intake. We never uninstall the Agent or drop Datadog SKUs until the OSS backend has proven parity through a green soak of at least 30 days and on-call coverage of real incidents. The soak gate is the point: the new fleet earns its keep before any bridge is burned.
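The soak gate can be stated as a single predicate. This is a hypothetical Python sketch: `SoakStatus`, `may_retire_agent`, and reading "on-call coverage of real incidents" as at least one real incident worked through the OSS backend are assumptions; only the 30-day minimum comes from this page.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

MIN_GREEN_SOAK = timedelta(days=30)


@dataclass
class SoakStatus:
    green_since: Optional[date]  # start of the current unbroken green run; None if red
    real_incidents_worked: int   # real incidents on-call handled via the OSS backend


def may_retire_agent(status: SoakStatus, today: date) -> bool:
    """Hypothetical gate: uninstall the Agent or drop SKUs only after at least
    30 consecutive green days AND at least one real incident worked OSS-side."""
    if status.green_since is None:
        return False  # any red resets the run; nothing to measure yet
    soaked = today - status.green_since >= MIN_GREEN_SOAK
    return soaked and status.real_incidents_worked >= 1


if __name__ == "__main__":
    status = SoakStatus(green_since=date(2024, 1, 1), real_incidents_worked=2)
    print(may_retire_agent(status, date(2024, 1, 31)))
```

Tracking `green_since` as the start of an unbroken run, rather than a count of green days, means a single red day restarts the 30-day clock.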
See whether your agent tier swaps cleanly.
A 30-minute call with a senior observability engineer. We inventory your Datadog Agent features, classify which SaaS-only subagents — NPM, CWS, DBM, Live Processes — are in real use, and tell you honestly which need a thin Agent or an OSS workstream. Before you commit.