DataStream Statistics

Pull

Synopsis

Polls the local DataStream stats database for completed partitions and republishes route, target, device, pipeline, device-resource, and queue metrics through standard pipelines and routes. Output field naming follows one of three selectable conventions: ECS, Cribl, or Prometheus.

Schema

- id: <numeric>
  name: <string>
  description: <string>
  type: stats
  tags: <string[]>
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    poll_interval: <numeric>
    workers: <numeric>
    field_format: <string>

Configuration

The following fields are used to define the device:

Device

| Field | Required | Default | Description |
|---|---|---|---|
| id | Y | - | Unique numeric identifier |
| name | Y | - | Device name |
| description | N | - | Optional description |
| type | Y | - | Must be stats |
| tags | N | - | Array of labels for categorization |
| pipelines | N | - | Array of preprocessing pipeline references |
| status | N | true | Enable/disable the device |

Polling

| Field | Required | Default | Description |
|---|---|---|---|
| poll_interval | N | 60 | How often to scan for completed partitions, in seconds. Must be greater than 0. |
| workers | N | 1 | Number of worker goroutines that drain the partition queue |

Output

| Field | Required | Default | Description |
|---|---|---|---|
| field_format | N | cribl | Output field naming convention. One of ecs, cribl, prometheus. |
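The defaults and constraints in the tables above can be sketched as a small loader. This is an illustrative sketch only, not the device's actual code: `StatsProperties` and `load_properties` are hypothetical names, and only the constraints this page states (a positive `poll_interval`, a `field_format` from the listed set) are checked.

```python
from dataclasses import dataclass

# Allowed values as listed in the Output table.
ALLOWED_FORMATS = ("ecs", "cribl", "prometheus")


@dataclass
class StatsProperties:
    # Hypothetical container; defaults mirror the Polling and Output tables.
    poll_interval: int = 60
    workers: int = 1
    field_format: str = "cribl"


def load_properties(raw: dict) -> StatsProperties:
    """Apply the documented defaults, then check the documented constraints."""
    props = StatsProperties(**raw)
    if props.poll_interval <= 0:
        raise ValueError("poll_interval must be greater than 0")
    if props.field_format not in ALLOWED_FORMATS:
        raise ValueError(f"field_format must be one of {ALLOWED_FORMATS}")
    return props
```

Calling `load_properties({})` yields the documented defaults (60 seconds, 1 worker, cribl naming).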

Details

What Gets Emitted

Each polling cycle emits records in one of six input_type categories:

  • route — per-route counters including events and bytes in/out, latency, errors, and drops
  • target — per-target counters with the same shape as route records, plus a target identifier
  • device — per-device counters tracking events and bytes processed by each device
  • pipeline — per-pipeline execution_count and execution_time aggregates
  • device_resource — host resource usage with resource_type of volume, cpu, or memory, plus identifier and total/utilized sizes
  • queue — queue depth and discard counters including pending_files, pending_bytes, discarded_files, and discarded_bytes

Output Formats

Three field naming conventions are supported:

  • ecs — flat snake_case internal representation using ECS-aligned field names such as route_name, target_name, latency_ns, and error_count
  • cribl — Cribl Stream internal-metrics naming with cribl_* system fields (cribl_route, cribl_output, cribl_host) and underscore-prefixed custom fields (_time, _type, _events_in, _error_count)
  • prometheus — Prometheus base-unit naming with _total, _nanoseconds, and _bytes suffixes (events_in_total, latency_nanoseconds, bytes_in_total)
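To make the relationship between the three conventions concrete, the sketch below renames an ECS-style route record into the other two formats. The mappings are inferred solely from the route examples on this page; other record categories may use different names, and `rename_route_record` is a hypothetical helper, not part of the product.

```python
# Field-name mappings inferred from the route examples in this document only.
ECS_TO_CRIBL = {
    "timestamp": "_time",
    "input_type": "_input_type",
    "type": "_type",
    "route_name": "cribl_route",
    "director_name": "cribl_host",
    "events_in": "_events_in",
    "events_out": "_events_out",
    "latency_ns": "_latency_ns",
    "error_count": "_error_count",
    "dropped_count": "_dropped_count",
}

ECS_TO_PROMETHEUS = {
    "timestamp": "timestamp_ms",
    "type": "direction",
    "route_name": "route",
    "director_name": "instance",
    "events_in": "events_in_total",
    "events_out": "events_out_total",
    "bytes_in": "bytes_in_total",
    "bytes_out": "bytes_out_total",
    "latency_ns": "latency_nanoseconds",
    "error_count": "errors_total",
    "dropped_count": "dropped_total",
}


def rename_route_record(record: dict, field_format: str) -> dict:
    """Rename an ECS-style route record; unmapped fields pass through unchanged."""
    if field_format == "ecs":
        return dict(record)
    if field_format == "cribl":
        return {ECS_TO_CRIBL.get(k, k): v for k, v in record.items()}
    if field_format == "prometheus":
        out = {ECS_TO_PROMETHEUS.get(k, k): v for k, v in record.items()}
        if "timestamp" in record:
            # The Prometheus example carries milliseconds rather than seconds.
            out["timestamp_ms"] = record["timestamp"] * 1000
        return out
    raise ValueError(f"unknown field_format: {field_format}")
```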

Polling Cadence

poll_interval must be greater than 0; the device rejects any configuration that violates this constraint. At each interval the device scans for completed partitions and enqueues them for processing by the worker goroutines, whose count is set by workers.
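One polling cycle of this scan-and-drain pattern might look like the sketch below. It is a generic queue-and-worker illustration under stated assumptions: Python threads stand in for goroutines, and `drain_partitions` and `handle` are hypothetical names. The device's real implementation is not shown on this page.

```python
import queue
import threading


def drain_partitions(partitions, workers, handle):
    """Enqueue the scanned partitions and let `workers` threads drain the queue."""
    pending = queue.Queue()
    for partition in partitions:
        pending.put(partition)

    def worker():
        # Each worker pulls partitions until the queue is empty, then exits.
        while True:
            try:
                item = pending.get_nowait()
            except queue.Empty:
                return
            handle(item)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A caller would repeat this once per `poll_interval` seconds, passing whatever completed partitions the scan found.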

Lifecycle

The device has no external network connectivity or remote heartbeat. On start, it validates that the stats database directory exists. If validation succeeds, the device reports ConnectionStateConnected. If the directory is absent or poll_interval is invalid, it reports ConnectionStateErrorConfiguration. If 120 seconds pass without an internal heartbeat, the device reports ConnectionStateErrorHeartbeat and stops the collector.
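The state transitions described above can be summarized in a short sketch. The state names come from this page; the function names and the exact boundary handling at 120 seconds are assumptions made for illustration.

```python
import os

# Heartbeat timeout stated in the Lifecycle section.
HEARTBEAT_TIMEOUT_SECONDS = 120


def startup_state(stats_dir: str, poll_interval: int) -> str:
    """State reported on start, based on directory and poll_interval checks."""
    if poll_interval <= 0 or not os.path.isdir(stats_dir):
        return "ConnectionStateErrorConfiguration"
    return "ConnectionStateConnected"


def heartbeat_state(last_heartbeat: float, now: float) -> str:
    """State reported while running; timestamps are in seconds."""
    # Assumption: exactly 120 seconds of silence already counts as a timeout.
    if now - last_heartbeat >= HEARTBEAT_TIMEOUT_SECONDS:
        return "ConnectionStateErrorHeartbeat"
    return "ConnectionStateConnected"
```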

Restart Triggers

Changes to poll_interval, workers, or field_format trigger a clean collector restart. Changes to other configuration fields do not restart the collector.
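A minimal sketch of this rule compares only the three restart-triggering properties between the old and new configuration. `needs_restart` is a hypothetical helper; treating an omitted property as equal to its documented default is an assumption.

```python
# Properties whose change restarts the collector, with their documented defaults.
RESTART_DEFAULTS = {
    "poll_interval": 60,
    "workers": 1,
    "field_format": "cribl",
}


def needs_restart(old_props: dict, new_props: dict) -> bool:
    """Return True if any restart-triggering property differs."""
    return any(
        old_props.get(field, default) != new_props.get(field, default)
        for field, default in RESTART_DEFAULTS.items()
    )
```

For example, changing `workers` from the default to 4 returns True, while a change to any other key returns False.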

Examples

The following are commonly used configuration types.

Basic Configuration

Creating a basic statistics publisher with default polling and Cribl output naming...

devices:
  - id: 1
    name: stats-publisher
    type: stats

Device polls every 60 seconds and emits route records with Cribl field naming...

{
  "_time": 1746009600,
  "_input_type": "route",
  "_type": "in",
  "cribl_route": "main-route",
  "cribl_host": "director-01",
  "_events_in": 14820,
  "_events_out": 14820,
  "bytes_in": 9437184,
  "bytes_out": 9437184,
  "_latency_ns": 412000,
  "_error_count": 0,
  "_dropped_count": 0
}

ECS Format

Configuring flat ECS field naming for the emitted records...

devices:
  - id: 2
    name: stats-ecs
    type: stats
    properties:
      field_format: ecs

Route records use ECS-aligned snake_case field names...

{
  "timestamp": 1746009600,
  "input_type": "route",
  "type": "in",
  "route_name": "main-route",
  "director_name": "director-01",
  "events_in": 14820,
  "events_out": 14820,
  "bytes_in": 9437184,
  "bytes_out": 9437184,
  "latency_ns": 412000,
  "error_count": 0,
  "dropped_count": 0
}

Cribl Format

Configuring Cribl Stream internal-metrics field naming explicitly...

devices:
  - id: 3
    name: stats-cribl
    type: stats
    properties:
      field_format: cribl

Route records use Cribl system fields with underscore-prefixed custom fields...

{
  "_time": 1746009600,
  "_input_type": "route",
  "_type": "in",
  "cribl_route": "main-route",
  "cribl_host": "director-01",
  "_events_in": 14820,
  "_events_out": 14820,
  "bytes_in": 9437184,
  "bytes_out": 9437184,
  "_latency_ns": 412000,
  "_error_count": 0,
  "_dropped_count": 0
}

Prometheus Format

Configuring Prometheus base-unit field naming for the emitted records...

devices:
  - id: 4
    name: stats-prometheus
    type: stats
    properties:
      field_format: prometheus

Route records use Prometheus naming conventions with _total, _nanoseconds, and _bytes suffixes...

{
  "timestamp_ms": 1746009600000,
  "input_type": "route",
  "direction": "in",
  "route": "main-route",
  "instance": "director-01",
  "events_in_total": 14820,
  "events_out_total": 14820,
  "bytes_in_total": 9437184,
  "bytes_out_total": 9437184,
  "latency_nanoseconds": 412000,
  "errors_total": 0,
  "dropped_total": 0
}

High-Volume Polling

Reducing poll interval and increasing workers for higher-throughput metric collection...

devices:
  - id: 5
    name: stats-highvolume
    type: stats
    properties:
      poll_interval: 30
      workers: 4
      field_format: ecs

Four workers drain the partition queue in parallel; partitions are scanned every 30 seconds...