Routes: Overview
Routes define how incoming data is filtered, processed, and directed to specific destinations. They act as traffic controllers, determining what data goes through which pipelines and ultimately reaches which targets.
In the following chapters we will detail their structure and behavior.
Basics
Routes are typically made up of filtering expressions and sequences of processors used to shape the format and flow of the data streams. As such, they curate data through a rich and complex evaluation process, and forward that data to consumers downstream.
Components
It should be obvious from their name that routes fundamentally select and forward data. Therefore, their behavior is characterized by the following elements:
Conditions | Optional filter expressions that match specific events |
Pipelines | Optional processing schemes applied to select events |
Targets | One or more destinations to forward the processed data |
Filtering
Selecting events typically involves evaluating boolean expressions such as
device.type == 'syslog'
dataset.name == 'firewall_logs'
log.severity > 4
!(source.ip == '10.0.0.1' || source.ip == '10.0.0.2')
etc. which are applied to previously normalized data.
Data Flow
Forwarding data involves controlling the flow, and there are several design patterns:
Pipelines
Routes can include zero or more pipelines. Multiple pipelines will process data sequentially. A pipeline's processors can override routing by using the reroute
processor. The final
processor terminates the pipeline's execution.
Evaluation
Routes are evaluated in order, and there are some critical aspect to take into consideration:
- Route conditions are evaluated before the pipelines begin their work
- The first route that has a match processes the event
- The pipelines can then modify the flow or the direction of the data
No need to mention, multiple targets receive identical copies of the same data, so the above do not impact who receives what.
To optimize performance, in addition to minimizing the number of routes and avoiding unnecessary pipeline duplications, filter as early as possible—using clear, specific and effective expressions—and always consider the order of evaluation carefully.
Best Practices
As usual, a number of guidelines are in order when designing, using, and maintaining routes:
Use clear, descriptive names that bear the purpose, e.g. apache_to_analytics
, and keep them consistent across similar routes.
When filtering, start with the most specific routes, and use precise and effective conditions, avoiding overlapping ones.
When assigning pipelines, use a single one when possible, and document multi-pipeline routes. Duplicate data only when this is necessary.
As for targets, collate similar ones, documenting any mirroring requirements, and consider the target capacity and throughput.