Splunk → Graylog

Graylog ↔ Splunk: integration to migration path.

Splunk is your authoritative log surface, so we never cut over on a single day. Graylog stands up alongside Splunk first, one high-volume source class is dual-shipped to prove parity, and only then does each source class move across in waves — turning verbose, low-signal data from a metered cost into a free tier you control. No flag day, no detection blackout, and every wave rolls back in minutes.

The honest exception: the SPL-heavy, high-value tail — firewall, EDR, critical auth, anything ITSI touches — may stay on a smaller Splunk by design. We decide that per source class with the evidence in front of us.

Map my migration with an engineer →

The idea

Dual-ship to prove parity. Move source classes last.

The topology that makes this zero-blackout is dual-shipping at the agent or collector: every source tees to both SIEMs at once, so Graylog is evaluated against exactly the same data Splunk sees without disturbing a single analyst. The shipper carries a shared event ID for cross-tool dedup, per-sink buffers stop one outage starving the other, and Splunk keeps owning every dashboard, scheduled search and premium app unchanged. Only once a source class is parity-proven and its detections are trusted do we cut its Splunk leg, move it to Graylog, then iterate — each class independently, each reversible.
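As a rough illustration of the tee described above, here is a minimal Python sketch. It is not the configuration of any real shipper: the `Sink` class, the `tee` function, the buffer size and the use of a UUID as the shared event ID are all assumptions made for the example. It shows one event ID stamped once and each destination buffering independently, so an outage on one side does not block the other.

```python
import uuid
from collections import deque


class Sink:
    """One destination with its own bounded buffer (hypothetical stand-in for a shipper output)."""

    def __init__(self, name, max_buffer=10_000):
        self.name = name
        self.buffer = deque(maxlen=max_buffer)  # oldest events drop first when full
        self.healthy = True
        self.delivered = []

    def enqueue(self, event):
        self.buffer.append(event)

    def flush(self):
        # Deliver only while the sink is up; otherwise events stay buffered.
        while self.healthy and self.buffer:
            self.delivered.append(self.buffer.popleft())


def tee(raw_event, sinks):
    """Stamp one shared event ID, then hand an independent copy to every sink."""
    event = dict(raw_event, event_id=str(uuid.uuid4()))
    for sink in sinks:
        sink.enqueue(dict(event))
        sink.flush()
    return event["event_id"]


splunk, graylog = Sink("splunk"), Sink("graylog")
graylog.healthy = False  # simulate a Graylog outage
first_id = tee({"msg": "GET /health 200"}, [splunk, graylog])
graylog.healthy = True   # Graylog recovers and drains its own buffer
second_id = tee({"msg": "GET /login 302"}, [splunk, graylog])
```

In this sketch Splunk receives both events immediately, while Graylog catches up from its own buffer once it is healthy again, with the same event IDs available for cross-tool dedup.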

The phases

Seven steps. Each one reversible.

Baseline & inventory

We document what each Splunk surface actually depends on — sourcetype, GB per day, parse rules, every scheduled search, dashboard, alert, lookup and accelerated data model — cross-referenced against real usage so dead content is flagged. Read-only.
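One way to picture the usage cross-reference is the short Python sketch below. It assumes the inventory has already been exported to a list of records with a `last_run` date; the record shape, the `flag_dead_content` name and the 90-day idle threshold are illustrative assumptions, not part of the engagement itself.

```python
from datetime import date, timedelta


def flag_dead_content(saved_searches, today, max_idle_days=90):
    """Return names of searches with no recorded run inside the idle window."""
    cutoff = today - timedelta(days=max_idle_days)
    dead = []
    for search in saved_searches:
        last_run = search.get("last_run")  # None means never observed running
        if last_run is None or last_run < cutoff:
            dead.append(search["name"])
    return sorted(dead)


# Hypothetical inventory rows for illustration only.
inventory = [
    {"name": "dns_nxdomain_spike", "sourcetype": "dns", "last_run": date(2024, 5, 30)},
    {"name": "old_proxy_report", "sourcetype": "proxy", "last_run": date(2023, 11, 2)},
    {"name": "never_scheduled", "sourcetype": "web", "last_run": None},
]
dead = flag_dead_content(inventory, today=date(2024, 6, 1))
```

Content flagged this way would be a candidate for review before anyone spends time porting it.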

Users see: No user impact.

Rollback: N/A

Stand up Graylog; mirror a low-value source

A Graylog HA cluster (OpenSearch plus a MongoDB replica set) goes live, and one high-volume, low-detection-value source class — web access or DNS — is dual-shipped so we can validate parse correctness against Splunk. Splunk is untouched.
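Parse validation can be thought of as a field-by-field comparison of the same event on both sides. The Python sketch below assumes events have been exported from each tool as dictionaries carrying the shared `event_id`; the function name, field names and sample values are hypothetical.

```python
def parse_parity(splunk_events, graylog_events, fields):
    """Compare parsed fields for events present on both sides, keyed by event_id."""
    graylog_by_id = {e["event_id"]: e for e in graylog_events}
    matched, mismatches, missing = 0, [], []
    for s in splunk_events:
        g = graylog_by_id.get(s["event_id"])
        if g is None:
            missing.append(s["event_id"])
            continue
        # A field differs if the value or its type differs (e.g. "404" vs 404).
        diff = [f for f in fields if s.get(f) != g.get(f)]
        if diff:
            mismatches.append((s["event_id"], diff))
        else:
            matched += 1
    return {"matched": matched, "mismatches": mismatches, "missing_in_graylog": missing}


splunk_events = [
    {"event_id": "a1", "status": "200", "src_ip": "10.0.0.5"},
    {"event_id": "a2", "status": "404", "src_ip": "10.0.0.9"},
    {"event_id": "a3", "status": "500", "src_ip": "10.0.0.7"},
]
graylog_events = [
    {"event_id": "a1", "status": "200", "src_ip": "10.0.0.5"},
    {"event_id": "a2", "status": 404, "src_ip": "10.0.0.9"},
]
report = parse_parity(splunk_events, graylog_events, fields=["status", "src_ip"])
```

The sample deliberately includes a string-versus-integer mismatch and a missing event, the two kinds of drift a parity check of this shape would surface.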

Users see: None.

Rollback: Disable the shipper tee. Under 15 minutes if config is in config-management.

Rebuild detections in shadow mode

Every saved search and dashboard for that source class gets a Graylog equivalent running in shadow — both fire, Graylog alerts to a non-prod channel while Splunk stays primary. Simple filters port 1:1; complex SPL is flagged for human rewrite.
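A simple way to reason about shadow mode is to compare which detections fired in each tool over the same time buckets. The Python sketch below assumes alert firings have been collected as `(detection, time_bucket)` pairs from both sides; the function name, the bucket format and the agreement ratio are illustrative assumptions rather than a prescribed metric.

```python
def shadow_agreement(splunk_fires, graylog_fires):
    """Compare sets of (detection, time_bucket) pairs fired by each tool."""
    splunk_set, graylog_set = set(splunk_fires), set(graylog_fires)
    both = splunk_set & graylog_set
    union = splunk_set | graylog_set
    return {
        "both": sorted(both),
        "splunk_only": sorted(splunk_set - graylog_set),   # possible missed detections
        "graylog_only": sorted(graylog_set - splunk_set),  # possible extra noise
        "agreement": len(both) / len(union) if union else 1.0,
    }


splunk_fires = [("brute_force", "10:00"), ("dns_tunnel", "10:05")]
graylog_fires = [("brute_force", "10:00")]
result = shadow_agreement(splunk_fires, graylog_fires)
```

Anything landing in `splunk_only` would point at a Graylog rule that still needs work before the source class is promoted.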

Users see: None.

Rollback: Delete the Event Definitions and dashboards. Under 15 minutes.

Promote Graylog for the mirrored source

Once detections are trusted, the Splunk leg for that source class is cut at the shipper and Graylog becomes the sole search surface. A 30-day read-only window stays open in Splunk for fallback.

Users see: Analysts use Graylog (or Grafana) for the migrated class. Communicated at least 14 days ahead.

Rollback: Re-enable the shipper's Splunk output. Under 15 minutes; Splunk historical search continues against the 30-day window.

Iterate per source class, in waves

The mirror, shadow-rebuild and promote steps repeat per source class: low-value high-volume first (DNS, web, S3, CDN), medium-value next, high-value last. Wave N+1 can begin while wave N finishes, inside a deliberate rollback budget.
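The ordering rule can be sketched in a few lines of Python. The example assumes each source class has been tagged with a detection-value tier and a daily volume during the inventory; the tier labels, the `plan_waves` name and the sample volumes are hypothetical.

```python
VALUE_ORDER = {"low": 0, "medium": 1, "high": 2}


def plan_waves(source_classes):
    """Group source classes into waves: lowest detection value first, largest volume first within a tier."""
    ordered = sorted(
        source_classes,
        key=lambda s: (VALUE_ORDER[s["value"]], -s["gb_per_day"]),
    )
    waves = {}
    for source in ordered:
        waves.setdefault(source["value"], []).append(source["name"])
    return [waves[tier] for tier in ("low", "medium", "high") if tier in waves]


# Illustrative figures only.
sources = [
    {"name": "firewall", "value": "high", "gb_per_day": 40},
    {"name": "dns", "value": "low", "gb_per_day": 120},
    {"name": "web", "value": "low", "gb_per_day": 300},
    {"name": "vpn", "value": "medium", "gb_per_day": 15},
]
waves = plan_waves(sources)
```

Sorting by volume inside a tier is a design choice for the sketch: it puts the classes with the most metered ingest at the front of each wave.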

Users see: Incremental; one analyst-comms event per migrated class.

Rollback: Per-wave, per-source-class. Under 15 minutes per cutover if configs are versioned.

Address the high-value tail

Firewall, EDR and critical-auth sources get a content-feasibility audit. Each class is either migrated — accepting reduced sophistication for SPL-heavy patterns — or honestly kept on a smaller Splunk. We do not fake-port to keep a clean narrative.

Users see: Controlled migration with retraining, or a permanent two-tool reality ops must staff for.

Rollback: Per-source-class. Choosing not to migrate is an explicit decision, not a rollback.

Retire Splunk or settle the partition

Full retirement: Splunk runs read-only for a 30 to 90 day evidence window, evidence exports to Object-Locked S3, then licences lapse and indexers decommission. Or the partition becomes permanent, documented, with audit scope updated.

Users see: Full retirement: SPL muscle memory ends. Partition: a documented two-tool support model.

Rollback: Only within the evidence window; after contract termination it is out of scope.

Feature parity

Where Graylog matches Splunk, and where it doesn't.

Capability	Graylog	Splunk	Parity
Structured log ingest	GELF (UDP/TCP/HTTP) plus Beats, Syslog and Raw inputs	Splunk HEC plus Universal Forwarder (S2S 9997)	At parity
Agent / data collection	Graylog Sidecar managing Filebeat / Winlogbeat / NXLog	Splunk Universal Forwarder / Heavy Forwarder	At parity
Schema-on-read parsing	Extractors plus Pipeline Rules (UI-edited, no re-index)	props.conf / transforms.conf plus Ingest Actions / Edge Processor	Partial
Search / query language	Lucene field-aware operators plus UI aggregations	SPL (tstats, transaction, streamstats, DM acceleration)	SaaS only
Alerting / notification	Alerts & Events (Aggregation, Filter & Aggregation, Correlation)	Scheduled searches plus Notable Events / adaptive response	Partial
Sigma rule ingestion	Graylog Sigma importer to Event Definitions	sigma-cli to SPL outside the platform	OSS only
Lookup / enrichment	Data Adapters (HTTP JSONPath, CSV, DNS, Threat Intel) plus Lookup Tables	lookup / KV Store	At parity
RBAC plus multi-tenancy	Streams plus Index Sets plus Roles plus Teams (Teams is Enterprise)	Roles plus srchIndexesAllowed per-index / eventtype RBAC	Partial
Retention / archive	Index Sets per-stream rotation plus Archive (or DIY S3 lifecycle)	Per-index retention plus SmartStore plus frozen-to-S3 Object Lock	At parity
Deployment model & HA	Self-hosted Graylog plus OpenSearch plus MongoDB replica set	Splunk Cloud managed scaling or self-hosted indexer cluster	Partial
Cost model	No ingest licence (Graylog Open, SSPL); compute and storage only	Per-GB ingest (Enterprise) or workload-based SVC (Cloud)	OSS only
App / content catalogue	Illuminate	Splunkbase (~1500 apps, premium TAs, ITSI)	SaaS only

What we're honest about

The caveats most vendors leave out.

Graylog is log management, not a SIEM peer

This path is Graylog Open against Splunk as a log-aggregation and search platform — not against Splunk ES. Graylog has growing detection features but no RBA, ESCU or correlation-search equivalent. If ES is in play, this is the wrong path and we will say so.

SPL does not fully translate

Fifteen years of tstats, transaction, streamstats, eventstats, sub-searches and macros sit at roughly 50 to 65% auto-portable, 25% manually portable, and 10 to 20% not portable at all. The complex queries that catch the real things are the ones that resist. We classify every search and never promise full portability.
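A first-pass classification can be as simple as scanning each search for the constructs named above. The Python sketch below is a rough heuristic under stated assumptions: the regular expressions are simplified, only subsearches that begin with `search` are detected, and the `classify_spl` name and sample queries are hypothetical. It flags searches for human review and does not decide portability on its own.

```python
import re

# Constructs named in the text; patterns are deliberately simple approximations.
COMPLEX_CONSTRUCTS = {
    "tstats": re.compile(r"\btstats\b"),
    "transaction": re.compile(r"\btransaction\b"),
    "streamstats": re.compile(r"\bstreamstats\b"),
    "eventstats": re.compile(r"\beventstats\b"),
    "subsearch": re.compile(r"\[\s*search\b"),
    "macro": re.compile(r"`[^`]+`"),
}


def classify_spl(query):
    """List the complex constructs found and flag the search for human review if any appear."""
    found = [name for name, pattern in COMPLEX_CONSTRUCTS.items() if pattern.search(query)]
    return {"constructs": found, "needs_human_review": bool(found)}


simple = classify_spl("index=web status=500 | stats count by host")
stateful = classify_spl("index=auth | transaction user maxspan=5m | search eventcount>3")
mixed = classify_spl("`auth_index` | streamstats count by user")
```

A plain `stats` filter passes through untouched, while anything using stateful or nested constructs is routed to an engineer.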

Splunkbase breadth has no Graylog catalogue

Around 1500 Splunkbase apps — premium TAs with field extraction and CIM mapping for hundreds of vendor products — have no Graylog peer; Illuminate covers a fraction. Long-tail vendors mean hand-rolled Pipeline Rules, and ITSI-served sources stay on Splunk entirely.

Self-hosting moves uptime and ops onto you

A Splunk Cloud outage is a vendor problem with an SLA; a Graylog outage is yours at 2 a.m., and losing MongoDB loses your config. We run HA Graylog plus OpenSearch plus a MongoDB replica set with daily dumps, a tested restore, and a break-glass revert to Splunk-only — managed, not just installed.

Why this beats a flag day

Reversible per wave, soaked before you cancel.

Every source-class cutover rolls back in under 15 minutes by re-enabling the shipper's Splunk output, so a bad wave is a single config reload, not an incident. And we never cancel the Splunk contract on a hunch: each migrated class soaks for at least 30 consecutive days with Graylog as its sole search surface, with no detection regression and ticket volume trending to zero, before we move on, and Splunk holds a read-only evidence window before any licence lapses. You are never betting the SOC on one big cutover.
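The soak criterion can be expressed as a small gate, sketched here in Python. It assumes one record per day of regression counts and ticket counts for the migrated class; the function names and the way "trending to zero" is approximated (second half of the window no higher than the first half) are assumptions for illustration.

```python
def clean_streak(daily_regressions):
    """Count the most recent consecutive days with zero detection regressions."""
    streak = 0
    for regressions in reversed(daily_regressions):
        if regressions:
            break
        streak += 1
    return streak


def soak_passed(daily_regressions, daily_tickets, required_days=30):
    """Pass only after a full clean streak with ticket volume not rising across the window."""
    if clean_streak(daily_regressions) < required_days:
        return False
    window = daily_tickets[-required_days:]
    half = required_days // 2
    return sum(window[half:]) <= sum(window[:half])


# One early regression, then 30 clean days with tickets falling.
ok = soak_passed([1] + [0] * 30, [5] + [4] * 15 + [1] * 15)
# A regression 10 days ago resets the streak.
not_ok = soak_passed([0] * 20 + [1] + [0] * 10, [2] * 31)
```

Because the streak is counted backwards from the most recent day, any regression restarts the 30-day clock rather than being averaged away.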

See whether your Splunk content migrates cleanly.

A call with a senior detection engineer. We inventory your sources by ingest cost and your searches by SPL construct, separate the auto-portable content from the high-value tail that stays on Splunk, and tell you honestly whether this is full retirement or cost reduction.

Map my migration →