DataStream Stats
Synopsis
Polls the local DataStream stats database for completed partitions and republishes route, target, device, pipeline, device-resource, and queue metrics through standard pipelines and routes. Output field naming is selectable between ECS, Cribl, and Prometheus conventions.
Schema
```yaml
- id: <numeric>
  name: <string>
  description: <string>
  type: stats
  tags: <string[]>
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    poll_interval: <numeric>
    workers: <numeric>
    field_format: <string>
```
Configuration
The following fields are used to define the device:
Device
| Field | Required | Default | Description |
|---|---|---|---|
| `id` | Y | - | Unique numeric identifier |
| `name` | Y | - | Device name |
| `description` | N | - | Optional description |
| `type` | Y | - | Must be `stats` |
| `tags` | N | - | Array of labels for categorization |
| `pipelines` | N | - | Array of preprocessing pipeline references |
| `status` | N | `true` | Enable/disable the device |
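The device-level fields can be combined as in the following sketch. It defines a device that is kept in the configuration but switched off through `status`. The `id`, `name`, `description`, and `tags` values are hypothetical. The sketch also assumes that omitting `properties` leaves every polling and output property at its documented default.

```yaml
# Hypothetical values: id, name, description, and tags are illustrative only.
- id: 2
  name: stats_staging
  description: Stats publisher kept for reference
  type: stats
  tags:
    - internal
    - metrics
  status: false   # device is defined but disabled
```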
Polling
| Field | Required | Default | Description |
|---|---|---|---|
| `poll_interval` | N | `60` | How often to scan for completed partitions, in seconds. Must be greater than 0. |
| `workers` | N | `1` | Number of worker goroutines that drain the partition queue |
Output
| Field | Required | Default | Description |
|---|---|---|---|
| `field_format` | N | `cribl` | Output field naming convention. One of `ecs`, `cribl`, `prometheus`. |
Details
What Gets Emitted
Each polling cycle emits records in one of six `input_type` categories:
- `route`: per-route counters including events and bytes in/out, latency, errors, and drops
- `target`: per-target counters with the same shape as route records, plus a target identifier
- `device`: per-device counters tracking events and bytes processed by each device
- `pipeline`: per-pipeline `execution_count` and `execution_time` aggregates
- `device_resource`: host resource usage with a `resource_type` of `volume`, `cpu`, or `memory`, plus an identifier and total/utilized sizes
- `queue`: queue depth and discard counters, including `pending_files`, `pending_bytes`, `discarded_files`, and `discarded_bytes`
Output Formats
Three field naming conventions are supported:
- `ecs`: flat snake_case internal representation using ECS-aligned field names such as `route_name`, `target_name`, `latency_ns`, and `error_count`
- `cribl`: Cribl Stream internal-metrics naming, with `cribl_*` system fields (`cribl_route`, `cribl_output`, `cribl_host`) and underscore-prefixed custom fields (`_time`, `_type`, `_events_in`, `_error_count`)
- `prometheus`: Prometheus base-unit naming with `_total`, `_nanoseconds`, and `_bytes` suffixes (`events_in_total`, `latency_nanoseconds`, `bytes_in_total`)
Polling Cadence
`poll_interval` must be greater than 0; the device rejects any configuration that violates this constraint. At each interval, the device scans for completed partitions and enqueues them for processing by `workers` goroutines (one by default).
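As an illustration of the constraint, the following entry sets `poll_interval` to 0 and would fail validation. Under the Lifecycle rules that follow, the device would then report `ConnectionStateErrorConfiguration`. The `id` and `name` values are hypothetical.

```yaml
# Hypothetical id and name; this entry is rejected at validation.
- id: 3
  name: stats_invalid
  type: stats
  properties:
    poll_interval: 0   # invalid: must be greater than 0
```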
Lifecycle
The device has no external network connectivity and no remote heartbeat. On start, it validates that the stats database directory exists and that `poll_interval` is valid. If validation succeeds, the device reports `ConnectionStateConnected`; if the directory is absent or `poll_interval` is invalid, it reports `ConnectionStateErrorConfiguration`. If 120 seconds pass without an internal heartbeat, the device reports `ConnectionStateErrorHeartbeat` and stops the collector.
Restart Triggers
Changes to `poll_interval`, `workers`, or `field_format` trigger a clean collector restart. Changes to other configuration fields do not restart the collector.
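The annotated entry below marks which fields restart the collector when edited. The `id`, `name`, `description`, and `tags` values are hypothetical.

```yaml
# Hypothetical values; comments mark the restart behavior of each field.
- id: 1
  name: datastream_stats
  description: Internal metrics publisher   # editing this does not restart the collector
  type: stats
  tags:
    - internal                              # editing tags does not restart the collector
  properties:
    poll_interval: 60                       # changing this restarts the collector
    workers: 1                              # changing this restarts the collector
    field_format: cribl                     # changing this restarts the collector
```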
Examples
The following examples cover commonly used configurations.
Basic Configuration
Creating a basic statistics publisher with default polling and Cribl output naming...
Device polls every 60 seconds and emits route records with Cribl field naming...
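A minimal sketch of the basic configuration, based on the Schema above, follows. The `id` and `name` values are hypothetical. The sketch assumes that omitting `properties` leaves the defaults in place (`poll_interval: 60`, `workers: 1`, `field_format: cribl`).

```yaml
# Hypothetical id and name; all properties fall back to their defaults.
- id: 1
  name: datastream_stats
  type: stats
```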
ECS Format
Configuring flat ECS field naming for the emitted records...
Route records use ECS-aligned snake_case field names...
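A sketch of the ECS variant follows. Only `field_format` differs from the defaults, and the `id` and `name` values are hypothetical.

```yaml
# Hypothetical id and name; poll_interval and workers keep their defaults (60 and 1).
- id: 1
  name: datastream_stats_ecs
  type: stats
  properties:
    field_format: ecs   # emits route_name, target_name, latency_ns, error_count, ...
```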
Cribl Format
Configuring Cribl Stream internal-metrics field naming explicitly...
Route records use Cribl system fields with underscore-prefixed custom fields...
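A sketch of the Cribl variant follows. It sets `field_format` explicitly even though `cribl` is already the default, and the `id` and `name` values are hypothetical.

```yaml
# Hypothetical id and name; cribl is the default, set explicitly for clarity.
- id: 1
  name: datastream_stats_cribl
  type: stats
  properties:
    field_format: cribl   # emits cribl_route, cribl_output, cribl_host, _time, _type, ...
```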
Prometheus Format
Configuring Prometheus base-unit field naming for the emitted records...
Route records use Prometheus naming conventions with...
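A sketch of the Prometheus variant follows. The `id` and `name` values are hypothetical, and the field names in the comment are those listed under Output Formats.

```yaml
# Hypothetical id and name; poll_interval and workers keep their defaults.
- id: 1
  name: datastream_stats_prom
  type: stats
  properties:
    field_format: prometheus   # emits events_in_total, latency_nanoseconds, bytes_in_total, ...
```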
High-Volume Polling
Reducing the poll interval and increasing workers for higher-throughput metric collection...
Four workers drain the partition queue in parallel; partitions are scanned every 30 seconds...
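A sketch of the high-volume setup follows, using the 30-second interval and four workers described above. The `id` and `name` values are hypothetical, and `field_format` is left at its `cribl` default.

```yaml
# Hypothetical id and name; field_format keeps its default (cribl).
- id: 1
  name: datastream_stats_fast
  type: stats
  properties:
    poll_interval: 30   # scan for completed partitions every 30 seconds
    workers: 4          # four goroutines drain the partition queue in parallel
```

Both `poll_interval` and `workers` are restart triggers, so applying this change to a running device restarts the collector.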