Pre-processing

This is the stage before routing. Pipelines attached to a source prepare the data for downstream work such as indexing or analysis.

Pre-processing helps organizations reduce their costs by optimizing their data ingestion before the data reaches the routing stage.

Key Concepts

The purpose of this stage is to render the data suitable for routing operations. It involves the following:

Structural optimization

In this phase, unnecessary fields are eliminated with remove, and relevant ones are re-mapped with rename for downstream handling. String values are cleaned with regex-based pattern replacement using gsub, and events are filtered out conditionally using drop.
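As a minimal sketch of these four processors together, assuming an Elasticsearch-style YAML processor layout (the field names, condition syntax, and option names here are illustrative, not documented options):

```yaml
processors:
  # Filter out events that add no downstream value
  - drop:
      if: log.level == 'debug'
  # Eliminate fields that are never consumed downstream
  - remove:
      field: ["_internal_id", "raw_payload"]
  # Re-map a source-specific field to the name consumers expect
  - rename:
      field: src_ip
      target_field: source.ip
  # Regex-based pattern replacement: strip trailing whitespace
  - gsub:
      field: message
      pattern: '\s+$'
      replacement: ""
```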

Normalization

Next is the phase of converting diverse log formats into uniform structures:

Fields are standardized with normalize for conversion between the ECS, CIM, ASIM, CEF, LEEF and CSL formats. They are then restructured with dot_nester to flatten or regroup nested objects. Finally, they are parsed to extract structured data with csv, kv, or grok.
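A hedged sketch of this sequence, again assuming a YAML processor layout; the schema option on normalize and the parsing options on kv are assumptions chosen for illustration:

```yaml
processors:
  # Convert the event to a common schema (ECS shown as an example target)
  - normalize:
      format: ecs
  # Restructure nested objects under a single dotted-key hierarchy
  - dot_nester:
      field: http
  # Extract structured fields from a raw key=value payload
  - kv:
      field: message
      field_split: " "
      value_split: "="
```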

The values are formatted for uniform casing with uppercase and lowercase. Text data is transformed using gsub for regex-based substitutions. Fields are broken into arrays with split, and array elements are combined with join.
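For instance, a value-formatting pass might look like this (field names and separators are illustrative):

```yaml
processors:
  # Standardize casing so later comparisons are case-insensitive
  - lowercase:
      field: user.name
  # Regex-based substitution: collapse repeated slashes in a path
  - gsub:
      field: url.path
      pattern: '/{2,}'
      replacement: "/"
  # Break a comma-separated field into an array of elements...
  - split:
      field: tags
      separator: ","
  # ...or combine array elements back into a single delimited string
  - join:
      field: tags
      separator: ";"
```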

Enrichment

This is followed by layering context onto the events: location data is added with geoip for geographic information, data is correlated with lookup tables using enrich, URI components are extracted with uri_parts, and user agent information is parsed with user_agent.
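A sketch of an enrichment chain, with the same caveat that the layout is assumed and the lookup table name and enrich options are hypothetical:

```yaml
processors:
  # Add geographic context derived from the source address
  - geoip:
      field: source.ip
      target_field: source.geo
  # Correlate against a lookup table (table name is hypothetical)
  - enrich:
      lookup_table: asset_inventory
      field: host.name
  # Decompose the full URL into scheme, domain, path, query, etc.
  - uri_parts:
      field: url.original
  # Expand the raw user agent string into browser/OS fields
  - user_agent:
      field: user_agent.original
```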

Best Practices

For a robust and streamlined operation, always keep the following in mind:

  • Sequencing - Place critical transformations (removal, dropping) first, group related operations together, and perform enrichment last
tip

Start with the essential processors, and add complexity incrementally while monitoring the impact on performance.

  • Performance - Use efficient regex patterns in gsub and grok, cache the lookup results with enrich, and monitor pipeline throughput
warning

Complex processing chains can impact throughput. Monitor resource usage, and adjust pipeline complexity accordingly.

  • Error handling - Anticipate potential failures such as missing fields or malformed data, validate the transformations, and use conditional processing with ignore_failure to handle them gracefully (see the sketch after this list)
  • Maintenance - Document the pipeline configurations, and review and update the enrichment data
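Putting the sequencing and error-handling advice together, a full pipeline might be ordered like this. All field names, the grok pattern, and the option syntax are illustrative assumptions, not documented configuration:

```yaml
processors:
  # 1. Critical transformations first: shed events and fields early
  - drop:
      if: log.level == 'debug'
  - remove:
      field: raw_payload
      ignore_failure: true   # tolerate events that never had this field
  # 2. Related parsing operations grouped together
  - grok:
      field: message
      patterns: ['%{COMMONAPACHELOG}']
      ignore_failure: true   # malformed lines pass through unparsed
  # 3. Enrichment last, once the fields it reads are stable
  - geoip:
      field: source.ip
      ignore_failure: true   # skip events with no resolvable address
```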