Loki + Grafana → Datadog Logs
Loki + Grafana ↔ Datadog Logs: from integration to migration.
Logging is load-bearing for every incident, so we never flip a switch. Loki + Grafana deploys alongside Datadog first — one OpenTelemetry Collector tees the same log line to both backends — and only then does Loki take over indexed search in phases. No flag day, no re-credentialing, and every phase rolls back in minutes.
Datadog stays the indexed-search source of record until Phase 4. The honest exceptions — Watchdog, Sensitive Data Scanner, Cloud SIEM — we name up front, because the rest only matters if you can trust it.
The idea
Tee one Collector to both backends first.
The trick that makes this zero-outage: an OpenTelemetry Collector gateway sits between your apps and your backends, and its logs pipeline fans out to both the Datadog exporter and the Loki exporter from a single source. Apps send OTLP once; the same parsed, redacted line lands in both Datadog and Loki. That single switchable point of control lets Loki parallel-run at full fidelity while Datadog still owns indexed search — so you move dashboards, monitors and the source of record one layer at a time, each reversible, never betting an outage on a cutover.
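As a minimal sketch of that tee, the Python below builds a Collector configuration as a plain dictionary and prints it as JSON, which a YAML parser also accepts. It assumes the contrib `datadog` and `loki` exporters; the endpoint, site, `DD_API_KEY` variable name and the `batch` processor are placeholders for illustration, not a recommended production config.

```python
import json


def build_tee_config(loki_endpoint: str, datadog_site: str) -> dict:
    """Return a Collector config whose single logs pipeline fans out to two exporters."""
    return {
        # Apps send OTLP once, over gRPC or HTTP.
        "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
        "processors": {"batch": {}},
        "exporters": {
            # Assumed exporter settings; the API key is read from the environment.
            "datadog": {"api": {"site": datadog_site, "key": "${env:DD_API_KEY}"}},
            "loki": {"endpoint": loki_endpoint},
        },
        "service": {
            "pipelines": {
                "logs": {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    # The tee: one pipeline, two destinations.
                    "exporters": ["datadog", "loki"],
                }
            }
        },
    }


config = build_tee_config("http://loki:3100/loki/api/v1/push", "datadoghq.com")
print(json.dumps(config, indent=2))
```

In this sketch, rolling the tee back means removing one name from the `exporters` list and reloading the Collector.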
The phases
Seven phases, numbered 0 to 6. Each one reversible.
Baseline & inventory
We map every Datadog log source by intake path, GB/day and EPS, plus the Pipelines, indexed facets, monitors, dashboards and retention tiers in play. Read-only — nothing is touched, and we pull a 90-day volume sample by service and env for Loki sizing.
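The sizing arithmetic behind that volume sample can be sketched as follows. The row shape `(service, env, day, bytes)` and the sample values are hypothetical; a real export would come from your own usage data.

```python
from collections import defaultdict


def daily_volume_gb(rows):
    """Aggregate (service, env, day, bytes) rows into mean and peak GB/day per service and env."""
    per_day = defaultdict(lambda: defaultdict(int))
    for service, env, day, nbytes in rows:
        per_day[(service, env)][day] += nbytes

    summary = {}
    for key, days in per_day.items():
        volumes = [b / 1e9 for b in days.values()]  # decimal gigabytes
        summary[key] = {
            "mean_gb": sum(volumes) / len(volumes),
            "peak_gb": max(volumes),
        }
    return summary


# Made-up sample rows for illustration only.
sample = [
    ("checkout", "prod", "2024-01-01", 40_000_000_000),
    ("checkout", "prod", "2024-01-01", 20_000_000_000),
    ("checkout", "prod", "2024-01-02", 100_000_000_000),
]
summary = daily_volume_gb(sample)
print(summary)
```

Sizing against the peak rather than the mean is the more conservative choice, since ingest limits are hit on the busiest day.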
Loki goes live; tee a canary
Loki and Grafana stand up in your cloud on S3-backed storage, and an OpenTelemetry Collector gateway tees one canary service to both Datadog and Loki. Datadog stays the source of record.
Dual-export every service
Every service ships to both backends through the Collector, or via a Promtail/Alloy sidecar alongside the Datadog Agent on legacy hosts. Loki parallel-runs at full fidelity while Datadog remains authoritative.
Rebuild dashboards, monitors, pipelines on OSS
Every Datadog log dashboard gets a Grafana twin in git, every log-alert monitor a Loki ruler or Grafana unified-alert equivalent, and every Pipeline processor is reimplemented as OTTL or VRL upstream of the fan-out. Additive only.
Switch source of record to Loki
Loki becomes the indexed-search source of record and on-call defaults to Grafana. Datadog indexed retention is cut, moved to Flex, or reduced to a filtered sample through the Collector.
Retire Datadog-specific features (or accept the gap)
For each Datadog-only feature (Watchdog, Log Patterns, Sensitive Data Scanner, Cloud SIEM), we replace it OSS-side, swap in a narrower SaaS, or record it as an accepted gap with a compensating control. The Datadog sample shrinks toward zero.
Retire Datadog Logs
The Datadog log exporter comes out of the Collector, indexes are set to size zero, and the Logs SKU is dropped at the next renewal. Archives are retained per your compliance hold in your own Object Lock buckets.
Feature parity
What moves cleanly, and what doesn't.
| Capability | Loki + Grafana | Datadog Logs | Parity |
|---|---|---|---|
| Log signal coverage | Loki distributor /loki/api/v1/push ingest | Datadog Log intake http-intake.logs.<site>/api/v2/logs | At parity |
| Collection / agent | Promtail / Grafana Alloy / OTel Collector loki exporter | Datadog Agent logs: block / Lambda Forwarder | At parity |
| Query language | Loki LogQL (\| json, count_over_time, rate) | Datadog log search syntax (faceted, intake-indexed) | At parity |
| Dashboards-as-code | Grafana dashboard JSON via Terraform grafana_dashboard / Grizzly jsonnet | Datadog datadog_dashboard TF + datadog-sync-cli | At parity |
| Alerting | Loki ruler recording/alerting rules + Grafana unified alerting | Datadog log-alert Monitors | At parity |
| Live tail | Grafana Explore Live tailing / /loki/api/v1/tail WebSocket | Datadog Live Tail | At parity |
| RBAC + multi-tenancy | X-Scope-OrgID header + per-tenant limits_config overrides | Datadog orgs (separate accounts), parent-child views | At parity |
| Retention / storage tiers | S3 lifecycle Standard→IA→Glacier IR via compactor | Datadog Flex Logs + Archives (S3-backed) | Partial |
| Log-to-metric | Loki ruler recording rules → Prometheus remote-write | Datadog Generate Metrics from Logs (billable custom metric) | At parity |
| Anomaly / AIOps | None (no Watchdog equivalent; per-signal only) | Datadog Watchdog (zero-config cross-signal) | SaaS only |
| Pattern clustering | Grafana 11 log patterns / LogQL \| pattern extraction | Datadog Log Patterns (auto-clustering) | Partial |
| Intake PII redaction | None first-party (redact in Collector/Vector tier) | Datadog Sensitive Data Scanner (intake-side) | SaaS only |
| SIEM detection content | None (Wazuh/OpenSearch is a separate workstream) | Datadog Cloud SIEM (Sigma rules, ATT&CK) | SaaS only |
| Compliance attestations | Self-hosted; controls in your SSP (Object Lock, KMS) | Datadog SOC 2 / ISO / FedRAMP boundary | Partial |
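To make the ingest and multi-tenancy rows concrete, here is a small sketch that builds a request for Loki's `/loki/api/v1/push` endpoint with an `X-Scope-OrgID` tenant header. The base URL, tenant name and labels are placeholders, and the request is only constructed, not sent.

```python
import json
import time
import urllib.request


def build_push_request(base_url, tenant, labels, line, ts_ns=None):
    """Build (but do not send) a Loki push request carrying one log line."""
    ts_ns = time.time_ns() if ts_ns is None else ts_ns
    payload = {
        "streams": [
            # Loki expects the timestamp as a string of nanoseconds since the epoch.
            {"stream": labels, "values": [[str(ts_ns), line]]}
        ]
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Scope-OrgID": tenant,  # selects the tenant in multi-tenant mode
        },
        method="POST",
    )


req = build_push_request(
    "http://loki:3100",
    "team-payments",
    {"service": "checkout", "env": "prod"},
    "order placed",
    ts_ns=1,
)
print(req.full_url, req.get_method())
```

Passing `req` to `urllib.request.urlopen` would send it. Keeping the label set small, as here, matters because every distinct label combination creates a separate stream.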
What we're honest about
The caveats most vendors leave out.
Watchdog AIOps has no OSS parity
Datadog Watchdog runs zero-config anomaly detection across logs, metrics and APM at whole-tenant scope. There is no general-purpose open-source replacement: per-signal anomaly detection via predict_linear() or an isolation forest is possible, but never zero-config. Plan to accept the gap or replace it with a narrower-scope SaaS, and we will tell you which up front.
Intake-time PII scanning moves upstream
Loki has no first-party Sensitive Data Scanner equivalent. Both backends assume you redact upstream, so we move PII redaction into the Collector or Vector tier — Presidio is the closest parity for managed rule packs. It is a real workstream, not a checkbox, and we scope it before Phase 2.
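As a rough illustration of what upstream redaction does to a line before the fan-out, the sketch below masks two kinds of value with regular expressions. The patterns and placeholder tokens are assumptions chosen for this example; it is not Presidio and not a substitute for a managed rule pack.

```python
import re

# Illustrative patterns only; real rule packs need far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Sixteen digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def redact(line: str) -> str:
    """Replace every match of each pattern with a labeled placeholder."""
    for name, pattern in PATTERNS.items():
        line = pattern.sub(f"[REDACTED_{name}]", line)
    return line


print(redact("user=ana@example.com card=4111 1111 1111 1111 status=ok"))
# prints: user=[REDACTED_EMAIL] card=[REDACTED_CARD] status=ok
```

Because the redaction runs before the tee, both backends receive the same masked line.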
Pattern clustering is a functional gap
Datadog Log Patterns auto-clusters semantically similar lines. Grafana 11's log patterns feature comes close but is not equivalent, and LogQL's pattern extraction is a manual primitive, not auto-clustering. We treat this as an honest gap rather than pretend the UX is identical.
Self-hosting means you own the math
Loki forces the cardinality conversation upfront, you own retention tiering on S3, and Cloud SIEM detection content is a separate Wazuh/OpenSearch workstream, not a logging-migration concern. At more than 2 TB/day indexed, self-hosted Loki typically wins on TCO; below that, Datadog often does. We size it honestly against your contracted rate.
Why this beats a flag day
Reversible at every step.
Every phase up to retirement is a Collector or agent config flip with an under-15-minute rollback window while the parallel pipeline is live — revert a pod spec, drop a filter, switch a pager destination back. We never cancel the Datadog Logs contract until Loki has held source-of-record on-call for a minimum 30-day green soak. The soak gate is the point: you only burn the bridge once the new path has proven itself in production.
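The soak gate can be expressed as a simple date check. This sketch assumes that any red day restarts the soak from the following day; that reset rule, the function name and the parameters are illustrative, not a description of a specific tool.

```python
from datetime import date, timedelta


def soak_gate_passed(cutover, incident_days, today, soak_days=30):
    """Return True once soak_days have elapsed since cutover or the last red day."""
    # Assumption: a red day on or after cutover restarts the soak the next day.
    restarts = [d + timedelta(days=1) for d in incident_days if d >= cutover]
    start = max([cutover] + restarts)
    return (today - start).days >= soak_days


cutover = date(2024, 1, 1)
print(soak_gate_passed(cutover, [], date(2024, 1, 31)))                   # True
print(soak_gate_passed(cutover, [date(2024, 1, 10)], date(2024, 1, 31)))  # False
```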
See whether your log estate migrates cleanly.
A 30-minute call with a senior observability engineer. We map your Datadog Pipelines, facets and monitors, size Loki against your real volume, and tell you honestly which Datadog-only features have no OSS parity — before you commit to anything.
Map my migration →