A Local Pipeline
Synopsis
This tutorial walks through the process of creating, validating, and testing a DataStream pipeline using the CSV processor.
Scenario
Define a pipeline configuration that will:
- parse prepared CSV test data, and
- output the extracted fields in JSON format.
You will validate and test the pipeline using Director's pipeline mode.
Setup and Trial
First, create a file named csv-consumer.yml in your config directory. You will use this file to define your pipeline.
Configure Your Pipeline
The following YAML configuration defines your pipeline structure and processors:
pipelines:
  name: csv_processing_pipeline
  description: "Process CSV data from log entries"
  processors:
    - csv:
        field: "message"
        target_fields: ["timestamp", "level", "component", "details"]
        separator: ","
        trim: true
        empty_value: "unknown"
        ignore_failure: false
    - set:
        field: "processed_at"
        value: "{{ now() }}"
Copy this configuration into csv-consumer.yml and save the file.
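To picture what the csv processor settings above are meant to do, here is a minimal Python sketch. It is not Director's implementation. It assumes, based only on the option names, that the source field is split on the separator, each value is trimmed when trim is true, the values are assigned to target_fields in order, and empty columns are replaced with empty_value. The function name split_csv_message is hypothetical.

```python
import csv


def split_csv_message(event, field="message",
                      target_fields=("timestamp", "level", "component", "details"),
                      separator=",", trim=True, empty_value="unknown"):
    """Illustrative stand-in for the csv processor options; not Director's code."""
    # Parse the single CSV line held in the source field.
    values = next(csv.reader([event[field]], delimiter=separator))
    result = dict(event)  # keep the original fields, including the source field
    for name, value in zip(target_fields, values):
        if trim:
            value = value.strip()
        # Assumption: an empty column is replaced by empty_value.
        result[name] = value if value else empty_value
    return result


# A variant of the tutorial's message with padding and an empty column.
event = {"message": "2024-01-15T10:30:00Z, INFO ,,User login successful"}
parsed = split_csv_message(event)
print(parsed["level"])      # INFO
print(parsed["component"])  # unknown
```

The sketch only mirrors the options shown in the configuration. How Director handles rows with more or fewer columns than target_fields is not covered here.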
Prepare Test Data
Create sample input data for validating your pipeline. To do this, create a file named test-data.json and place it in your working directory:
{
  "message": "2024-01-15T10:30:00Z,INFO,auth-service,User login successful",
  "@timestamp": "2024-01-15T10:30:00.000Z"
}
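If you prefer to generate the test file rather than type it by hand, the following Python sketch writes the same event and reads it back to confirm that it is valid JSON. The helper name write_test_event is hypothetical, and the default path simply follows the file name used in this tutorial.

```python
import json
from pathlib import Path


def write_test_event(path="test-data.json"):
    """Write the tutorial's sample event and return it as parsed from disk."""
    event = {
        "message": "2024-01-15T10:30:00Z,INFO,auth-service,User login successful",
        "@timestamp": "2024-01-15T10:30:00.000Z",
    }
    target = Path(path)
    target.write_text(json.dumps(event, indent=2) + "\n", encoding="utf-8")
    # Reading the file back raises an error if the content is not valid JSON.
    return json.loads(target.read_text(encoding="utf-8"))
```

Calling write_test_event() with no arguments creates test-data.json in the current working directory.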
Validate Pipeline Syntax
Use Director to make sure that:
- There are no syntax errors [✓]
- Pipeline configuration is valid [✓]
- All processors are correctly configured [✓]
vmetric-director -pipeline -path csv-consumer.yml -validate
The validator checks for syntactic correctness, required field presence, reference integrity, and logical consistency.
Test Pipeline Processing
Run your pipeline in test mode with sample data:
vmetric-director -pipeline -path csv-consumer.yml -input test-data.json -test
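To run the validation and test steps from a script, you could wrap the two commands shown in this tutorial as in the Python sketch below. It uses only the flags that appear above. The helper names are hypothetical, and the idea that a non-zero exit code signals failure is an assumption, not documented Director behavior.

```python
import subprocess


def director_command(config, input_file=None, validate=False):
    """Build the argument list for the two commands used in this tutorial."""
    cmd = ["vmetric-director", "-pipeline", "-path", config]
    if validate:
        cmd.append("-validate")
    else:
        if input_file is None:
            raise ValueError("test mode needs an input file")
        cmd += ["-input", input_file, "-test"]
    return cmd


def run_director(cmd):
    """Run a command and return its exit code and standard output."""
    # Assumption: a non-zero return code means validation or the test failed.
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return completed.returncode, completed.stdout


print(" ".join(director_command("csv-consumer.yml", validate=True)))
print(" ".join(director_command("csv-consumer.yml", "test-data.json")))
```

The two print calls only display the commands. Pass either list to run_director on a machine where vmetric-director is installed to execute it.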
Verify Output
Check that your pipeline produces output like the following. The processed_at value is set at run time by now(), so it will differ from the sample shown here:
{
  "message": "2024-01-15T10:30:00Z,INFO,auth-service,User login successful",
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "component": "auth-service",
  "details": "User login successful",
  "processed_at": "2024-01-15T10:30:15.123Z",
  "@timestamp": "2024-01-15T10:30:00.000Z"
}
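Because processed_at is filled in from now(), a byte-for-byte comparison against the sample output will not match. The Python sketch below compares every other field and only checks that processed_at is present. It assumes you have captured the pipeline output as a single JSON object in a string. The name matches_expected is hypothetical.

```python
import json

EXPECTED = {
    "message": "2024-01-15T10:30:00Z,INFO,auth-service,User login successful",
    "timestamp": "2024-01-15T10:30:00Z",
    "level": "INFO",
    "component": "auth-service",
    "details": "User login successful",
    "@timestamp": "2024-01-15T10:30:00.000Z",
}


def matches_expected(output_text, expected=EXPECTED, volatile=("processed_at",)):
    """Compare pipeline output with the expected fields, ignoring volatile values."""
    actual = json.loads(output_text)
    # Volatile fields must exist, but their values are not compared.
    if any(key not in actual for key in volatile):
        return False
    stable = {k: v for k, v in actual.items() if k not in volatile}
    return stable == expected
```

For example, matches_expected(captured_output) returns True when the extracted fields match the sample, whatever time the test was run.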