Version: 1.4.0

Devices: Overview

Devices are configurable components that serve as data collection points in your environment. They represent various input mechanisms for receiving, processing, and forwarding log data and security events. Each device is defined using a standardized YAML configuration format that specifies its behavior, connection parameters, and processing options.

DataStream uses devices as an abstraction layer to manage telemetry. As such, they decouple data sources from DataStream's pipelines.

note

Each device type provides specific configuration options, which are detailed in its respective section.

Definitions

Devices operate on the following principles:

  1. Unified Configuration Structure: All devices share a common configuration framework with device-specific properties.
  2. Data Collection: Devices receive data through network connections, APIs, or direct system access.
  3. Pipeline Integration: Devices can link to preprocessing pipelines for data transformation.
  4. Stateful Operation: Devices maintain their operational state and can be enabled or disabled.
note

Devices enable:

Authentication: Basic authentication, API keys, HMAC signing, and client certificates.
Encryption: TLS/SSL, SNMPv3 privacy, and custom encryption.

They also provide access control and audit logging.

Configuration

All devices share the following base configuration fields:

| Field | Required | Description |
|---|---|---|
| id | Y | Unique numeric identifier |
| name | Y | Device name |
| description | - | Optional description of the device's purpose |
| type | Y | Device type identifier (e.g., http, syslog, tcp) |
| tags | - | Array of labels for categorization |
| pipelines | - | Array of preprocessing pipeline references |
| status | - | Boolean flag to enable/disable the device (default: true) |

tip

Each device type provides specific options detailed in its respective section.

Use the id of the device to refer to it in your configurations.

Example:

devices:
  - id: 1
    name: http_logs
    type: http
    properties:
      port: 8080
      content_type: "application/json"

This is an HTTP device listening on port 8080, and it expects the incoming data to be in JSON format.
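
The optional base fields can be added to the same kind of definition. The following is a minimal sketch, assuming the field layout of the example above. The description text, the tag values, and the pipeline name normalize_json are hypothetical placeholders, not names defined by DataStream, and the exact form of a pipeline reference may differ in your version.

```yaml
devices:
  - id: 2
    name: http_logs_staging
    # Optional free-text description of the device's purpose
    description: "Receives JSON application logs from the staging environment"
    type: http
    # Optional labels for categorization (hypothetical values)
    tags:
      - staging
      - application
    # Optional preprocessing pipeline references (hypothetical name)
    pipelines:
      - normalize_json
    # Disables the device; the default is true when the field is omitted
    status: false
    properties:
      port: 8081
      content_type: "application/json"
```

Setting status to false keeps the definition in place while the device is disabled, which can be useful when staging a configuration before enabling it.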

Deployment

Devices enable handling and processing of data through the following means:

Pipelines: These parse and enrich messages with custom rules. They transform data through field mapping and type conversion, correlate and aggregate events, extract custom fields, and tag, filter, and route events based on message content. They can generate alerts when specified conditions are met, and they support templating for output formatting as well as built-in and custom functions.

Queues: These provide persistent message storage, configurable batch processing, automatic checkpoint recovery, and rotation and cleanup.

Devices offer various deployment options for different purposes:

  • Standalone - Sources are connected directly to Director, bypassing any intermediate mechanism.

    Sources → Director

  • High Availability - Sources connect either to a load balancer, which distributes the work across multiple processes:

    Sources → Load Balancer → Director

    -or- to a Director cluster through a virtual IP (VIP) address:

    Sources → Director Cluster

Device Types

The system supports the following device types:

  • Network-Based - These devices listen for incoming network connections:

    • HTTP: Accepts JSON data via HTTP/HTTPS POST requests with authentication options
    • TCP: Receives messages over TCP connections with framing and TLS support
    • UDP: Collects datagram-based messages with high throughput capabilities
    • Syslog: Specialized for syslog format messages with forwarding capabilities
  • Cloud Integration - These devices connect to cloud services:

    • Azure Monitor: Collects logs from Azure Log Analytics workspaces
  • Security-Specific - These devices integrate with security products:

    • eStreamer: Connects to eStreamer servers for security event collection
  • System Integration - These devices interact with operating systems:

    • Windows: Collects Windows events, performance data, and metrics

Use Cases

Devices can be used in the following scenarios:

  • Infrastructure monitoring: Provides system performance metrics, event logs, resource utilization, and service availability information.

  • Security operations: Enables security event monitoring, threat detection, and compliance monitoring, and provides audit trails.

  • Application telemetry: Provides application logs and performance metrics, and enables error tracking and user activity monitoring.

  • Network monitoring: Provides network device logs and SNMP data, and enables traffic analysis and connection tracking.

  • Performance: Improves telemetry performance through a multi-worker architecture, dynamic scaling, socket reuse optimization, buffer management, and resource monitoring.

Implementation Strategies

Listening

Data is received from listeners using various network protocols with two types of collection:

Push-based: Sources send event data to listeners such as Syslog (via UDP/TCP), SNMP Traps, HTTP/HTTPS, TCP/UDP, and eStreamer.

Pull-based: Data is fetched from Kafka topics, Microsoft Sentinel, REST APIs, database queries, and other custom integration types.

Monitoring

For monitoring operating systems, Director uses a unified agent-based approach with two types of deployment:

Managed (Traditional): The agent is installed and managed by system administrators, which provides a persistent installation on the target system. Local data is buffered in the event of network issues. Director supports Windows, Linux, macOS, Solaris, and AIX.

Auto-managed (Agentless): The agent is automatically deployed and managed, so no manual installation is required. Auto-managed agents provide local data buffering, network resilience, and performance optimization. This deployment type is self-healing: the agent is automatically redeployed if its process terminates. It also supports remote credential management. Deployment uses WinRM for Windows and SSH for Linux, macOS, Solaris, and AIX.

Both approaches provide local data processing, store-and-forward capability against connectivity issues, real-time metrics and events, and native OS monitoring. The key difference is deployment and lifecycle management, not functionality.

Layered Collectors

Configure multiple devices to handle different aspects of data collection:

  • External-facing HTTP endpoints for application logs
  • Internal TCP/UDP listeners for network device logs
  • Specialized connectors for cloud and security products
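
A layered setup of this kind might be sketched as follows. This is an illustration under stated assumptions rather than a reference configuration: the type identifiers http, tcp, and syslog come from the base field table, but the device names, tag values, and port numbers are hypothetical, and the port property on the tcp and syslog devices is assumed by analogy with the HTTP example. Cloud and security connectors are left out because their type identifiers and properties are not covered in this overview.

```yaml
devices:
  # External-facing HTTP endpoint for application logs
  - id: 10
    name: app_logs_http
    type: http
    tags: [external, application]
    properties:
      port: 8080
      content_type: "application/json"

  # Internal TCP listener for network device logs
  - id: 11
    name: network_logs_tcp
    type: tcp
    tags: [internal, network]
    properties:
      port: 5140   # assumed property name; check the TCP device section

  # Internal syslog listener for network device logs
  - id: 12
    name: network_logs_syslog
    type: syslog
    tags: [internal, network]
    properties:
      port: 5514   # assumed property name; check the Syslog device section
```

Giving each layer its own device, with tags that record its role, keeps external and internal collection points separately configurable and easy to disable independently through the status field.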

Pipeline Integration

Enhance device functionality by attaching preprocessing pipelines:

  • Filtering unwanted events
  • Normalizing data formats
  • Enriching events with additional context
  • Transforming raw data into structured formats
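
As a sketch of how such pipelines might be attached, the device below lists one pipeline reference per task through the pipelines base field. All four pipeline names are hypothetical, the form of a pipeline reference is an assumption, and device-specific properties are omitted for brevity.

```yaml
devices:
  - id: 20
    name: firewall_syslog
    type: syslog
    pipelines:
      - drop_debug_events       # hypothetical: filters unwanted events
      - normalize_timestamps    # hypothetical: normalizes data formats
      - enrich_asset_context    # hypothetical: enriches events with additional context
      - parse_firewall_fields   # hypothetical: transforms raw data into structured fields
```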