# Sample

## Synopsis
Reduces data volume by sampling log entries based on configurable rules.
## Schema

```yaml
- sample:
    rules: <rule[]>
    exclude_filters: <string[]>
    tag: <string>
    description: <text>
    if: <script>
    ignore_failure: <boolean>
    ignore_missing: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>
```
## Configuration
The following fields are used to define the processor:
| Field | Required | Default | Description |
|---|---|---|---|
| `rules` | N | - | List of sampling rules with filters and rates |
| `exclude_filters` | N | - | List of conditions that exclude matching events from sampling |
| `description` | N | - | Explanatory note |
| `if` | N | - | Condition to run |
| `ignore_failure` | N | `false` | Continue processing if sampling fails |
| `ignore_missing` | N | `false` | Skip if referenced fields don't exist |
| `on_failure` | N | - | Error handling processors |
| `on_success` | N | - | Success handling processors |
| `tag` | N | - | Identifier |
### Sampling Rule Object

Each rule in the `rules` array is an object with the following properties:

| Field | Required | Default | Description |
|---|---|---|---|
| `filter` | Y | - | Condition that determines which events this rule applies to |
| `sampling_rate` | Y | 2 | Keep 1 event for every N events (integer or string template) |
## Details
The processor reduces data volume through systematic sampling of log entries based on configurable rules: it keeps one event for every N matching events, where N is the specified sampling rate. This cuts volume efficiently while keeping the retained set statistically representative.
The processor adds a metadata field `_vmetric.sampled` to sampled events, showing the sampling rate that was applied (e.g. `10:1`). This is useful for adjusting statistics during analysis to account for sampling; for example, with a `10:1` rate, multiply observed counts by 10 to estimate the original event volume.
Rule-based sampling provides fine-grained control over which types of events are sampled and at what rates. This helps balance data volume with analytical needs by keeping all critical events while sampling high-volume, routine events.
Sampling inherently discards data, so use it with caution for critical events. Always use `exclude_filters` to preserve important events like errors, alerts, or security incidents that require 100% preservation regardless of volume.
## Examples
### Basic

Applying simple 1:10 sampling keeps every 10th event, reducing volume by 90%.
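A minimal sketch of what this might look like; the catch-all filter expression (`true`) is illustrative, as the exact condition syntax is an assumption here:

```yaml
- sample:
    tag: basic_sampling
    rules:
      # Hypothetical catch-all filter; actual condition syntax may differ.
      - filter: "true"
        sampling_rate: 10   # keep 1 in 10 events (90% reduction)
```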
### Conditional

Applying different sampling rates based on log level keeps all error/critical logs, 50% of warnings, 10% of info, and 1% of debug logs.
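A sketch under assumed conventions; the `log.level` field name and the comparison syntax in the filter expressions are illustrative:

```yaml
- sample:
    tag: level_based_sampling
    exclude_filters:
      # Assumed field name and syntax; errors and criticals bypass sampling.
      - "log.level == 'error' || log.level == 'critical'"
    rules:
      - filter: "log.level == 'warn'"
        sampling_rate: 2      # keep 1 in 2 (50%)
      - filter: "log.level == 'info'"
        sampling_rate: 10     # keep 1 in 10 (10%)
      - filter: "log.level == 'debug'"
        sampling_rate: 100    # keep 1 in 100 (1%)
```

Listing the error and critical levels under `exclude_filters` rather than as a rule guarantees they bypass sampling entirely, per the guidance above.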
### Dynamic

Using field values to determine the sampling rate applies configurable rates from system settings.
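A sketch assuming the string-template form of `sampling_rate` can reference a field on the event; the `{{ ... }}` template syntax and the `config.sampling_rate` field name are both assumptions:

```yaml
- sample:
    tag: dynamic_sampling
    rules:
      # Hypothetical template: read the rate from a configuration field
      # carried on the event instead of hard-coding it.
      - filter: "log.level == 'info'"
        sampling_rate: "{{ config.sampling_rate }}"
```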
### Service-Based

Sampling differently based on service tailors rates to each service's characteristics while preserving error data.
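A sketch assuming `service.name` and `log.level` field names and the filter syntax shown; all are illustrative:

```yaml
- sample:
    tag: service_sampling
    exclude_filters:
      - "log.level == 'error'"   # preserve error data from every service
    rules:
      - filter: "service.name == 'payments'"
        sampling_rate: 2         # business-critical service: keep 50%
      - filter: "service.name == 'frontend'"
        sampling_rate: 20        # high-volume, routine traffic: keep 5%
```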
### Complex

Comprehensive sampling keeps relevant data while significantly reducing traffic.
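A sketch combining the pieces above; the guard condition, the `url.path` and `event.category` field names, and the filter syntax are all assumptions, not confirmed behavior:

```yaml
- sample:
    tag: comprehensive_sampling
    description: Keep critical data in full while cutting routine volume
    ignore_failure: true
    exclude_filters:
      - "log.level == 'error' || log.level == 'critical'"
      - "event.category == 'security'"   # security incidents: 100% retention
    rules:
      - filter: "url.path == '/healthz'"
        sampling_rate: 1000              # health checks: keep 0.1%
      - filter: "log.level == 'debug'"
        sampling_rate: 100               # debug noise: keep 1%
      - filter: "true"                   # catch-all for remaining events
        sampling_rate: 10                # everything else: keep 10%
```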