Version: 1.4.0

A Local Pipeline

Synopsis

This tutorial walks through the process of creating, validating, and testing a DataStream pipeline using the CSV processor.

Scenario

Define a pipeline configuration that will

  • parse prepared CSV test data, and
  • output the extracted fields in JSON format.

Validation and testing of the pipeline will be done via Director's pipeline mode.

Setup and Trial

First, create a file named csv-consumer.yml in your config directory. You will use this file to define your pipeline.

Configure Your Pipeline

Create a YAML configuration file that defines your pipeline structure and processors:

csv-consumer.yml
pipelines:
  name: csv_processing_pipeline
  description: "Process CSV data from log entries"
  processors:
    - csv:
        field: "message"
        target_fields: ["timestamp", "level", "component", "details"]
        separator: ","
        trim: true
        empty_value: "unknown"
        ignore_failure: false
    - set:
        field: "processed_at"
        value: "{{ now() }}"

Copy this configuration to the file, and save it.
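
To make the csv processor settings above more concrete, here is a rough Python sketch of what they are expected to do to an event. It is not Director's implementation: the function name parse_csv_field and the handling of empty or missing columns are assumptions made for illustration, based only on the option names in the configuration.

```python
import csv
import io


def parse_csv_field(event, field, target_fields, separator=",",
                    trim=True, empty_value="unknown"):
    """Approximate a CSV processor: split event[field] into named fields."""
    # csv.reader also copes with quoted values that contain the separator.
    reader = csv.reader(io.StringIO(event[field]), delimiter=separator)
    row = next(reader, [])
    result = dict(event)  # copy, so the input event is left untouched
    for index, name in enumerate(target_fields):
        value = row[index] if index < len(row) else ""
        if trim:
            value = value.strip()
        # Assumption: empty or missing columns fall back to empty_value.
        result[name] = value if value else empty_value
    return result


event = {
    "message": "2024-01-15T10:30:00Z,INFO,auth-service,User login successful",
    "@timestamp": "2024-01-15T10:30:00.000Z",
}
parsed = parse_csv_field(
    event, "message", ["timestamp", "level", "component", "details"]
)
print(parsed["level"], parsed["component"])  # INFO auth-service
```

The original message field is kept alongside the extracted fields, which matches the expected output shown later in this tutorial.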

Prepare Test Data

Create sample input data to use for validating your pipeline. To do this, create a file named test-data.json and place it in your working directory:

test-data.json
{
  "message": "2024-01-15T10:30:00Z,INFO,auth-service,User login successful",
  "@timestamp": "2024-01-15T10:30:00.000Z"
}
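
Before handing the test data to Director, it can help to confirm that the file is valid JSON and that the message field carries one column per target field. The sketch below is an optional pre-check, not part of Director; the helper name check_test_event is hypothetical.

```python
import csv
import io
import json

TARGET_FIELDS = ["timestamp", "level", "component", "details"]


def check_test_event(raw_json, field="message",
                     expected_columns=len(TARGET_FIELDS)):
    """Return a list of problems found in a JSON test event (empty if none)."""
    try:
        event = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(event, dict):
        return ["top-level value is not a JSON object"]
    if field not in event:
        return [f"missing field: {field}"]
    # Count the comma-separated columns in the field to be parsed.
    columns = next(csv.reader(io.StringIO(str(event[field]))), [])
    if len(columns) != expected_columns:
        return [f"expected {expected_columns} columns, found {len(columns)}"]
    return []


sample = """{
  "message": "2024-01-15T10:30:00Z,INFO,auth-service,User login successful",
  "@timestamp": "2024-01-15T10:30:00.000Z"
}"""
print(check_test_event(sample))  # []
```

To check the file on disk, pass its contents instead of the inline sample, for example check_test_event(open("test-data.json").read()).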

Validate Pipeline Syntax

Use Director to make sure that:

  • There are no syntax errors [✓]
  • Pipeline configuration is valid [✓]
  • All processors are correctly configured [✓]

vmetric-director -pipeline -path csv-consumer.yml -validate

Note: The validator checks for syntactic correctness, required field presence, reference integrity, and logical consistency.

Test Pipeline Processing

Run your pipeline in test mode with sample data:

vmetric-director -pipeline -path csv-consumer.yml -input test-data.json -test

Verify Output

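Check that your pipeline produces output like the following. The processed_at value is set by now() when the pipeline runs, so it will differ from the one shown here: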

{
  "message": "2024-01-15T10:30:00Z,INFO,auth-service,User login successful",
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "component": "auth-service",
  "details": "User login successful",
  "processed_at": "2024-01-15T10:30:15.123Z",
  "@timestamp": "2024-01-15T10:30:00.000Z"
}
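
Because processed_at changes on every run, a byte-for-byte comparison against the expected output will fail. The following sketch shows one way to compare the two events while ignoring that field. It assumes you have captured the pipeline's output as JSON yourself; the helper name matches_expected and the stand-in actual value are hypothetical.

```python
import json


def matches_expected(actual, expected, volatile=("processed_at",)):
    """Compare two events while ignoring fields whose values change per run."""
    def stable(event):
        return {k: v for k, v in event.items() if k not in volatile}

    # Volatile fields must still be present, even though their values differ.
    present = all(name in actual for name in volatile)
    return present and stable(actual) == stable(expected)


expected = json.loads("""{
  "message": "2024-01-15T10:30:00Z,INFO,auth-service,User login successful",
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "component": "auth-service",
  "details": "User login successful",
  "processed_at": "2024-01-15T10:30:15.123Z",
  "@timestamp": "2024-01-15T10:30:00.000Z"
}""")

# Stand-in for a real run: same event, different processing time.
actual = dict(expected, processed_at="2025-06-01T08:00:00.000Z")
print(matches_expected(actual, expected))  # True
```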