Skip to main content

Enrich

Enrich

Synopsis

Enriches documents by performing SQL queries against lookup tables loaded from CSV files or datasets. It can then join the data with externally referenced information to add context and semantic depth.

Schema

enrich:
- lookup_table: <string[]>
- dataset_name: <string[]>
- query: <string>
- type: <string>
- description: <text>
- if: <script>
- tag: <string>
- on_success: <processor[]>
- on_failure: <processor[]>
- override: <boolean>
- ignore_missing: <boolean>
- ignore_failure: <boolean>

Configuration

FieldRequiredDefaultDescription
lookup_tableN-Array of paths to CSV files containing lookup data
dataset_nameN-Array of dataset names to load lookup data from
queryY-SQL or KQL query to execute for enrichment
typeN-Query language type (kusto for KQL, empty for SQLite)
descriptionN-Explanatory note
ifN-Condition to run
ignore_failureNfalseContinue processing if enrichment fails
ignore_missingNfalseSkip if fields don't exist
on_failureN-Processors to run on failure
on_successN-Processors to run on success
overrideNfalseIf true, replaces existing document with query results
tagN-Identifier for logging

Details

Information gathering from external sources involves

  • loading lookup data from CSV files or datasets which may involve using SQL queries, including SQLite and Kusto Query Language (KQL),
  • extracting and manipulating JSON data, overriding fields, and
  • resolving conflicts.
warning

KQL queries are automatically converted to SQLite syntax, and some complex KQL features might not be supported.

The processor provides the following special tables and fields for queries:

_ingest
Contains the current document's data
_message
JSON string of the entire document
lookup_table
Contains data loaded from CSV files

CSV files must have headers in the first row as the column names become field names in queries. Headers are case-sensitive, and empty values are treated as NULL.

warning

As large CSV files are loaded directly into memory, the impact of using very large lookup tables must be carefully considered.

Examples

CSV Lookup

Enriching logs with data from a CSV table...

{
"event_code": "1234"
}
enrich:
- lookup_table: ["event_codes.csv"]
- query: |
SELECT
_ingest.*,
lookup_table.description as event_description
FROM _ingest
LEFT JOIN lookup_table
ON _ingest.event_code = lookup_table.code

adds description:

{
"event_code": "1234",
"event_description": "System startup"
}

KQL Queries

Processing logs using Kusto Query Language...

{
"DeviceVendor": "Check Point",
"DeviceProduct": "VPN-1 & FireWall-1",
"SourceIP": "192.168.1.100"
}
enrich:
- type: "kusto"
- query: |
_ingest
| where DeviceVendor == "Check Point"
| extend isInternal = SourceIP startswith "192.168."
| project DeviceVendor, DeviceProduct, SourceIP, isInternal

adds computed fields:

{
"DeviceVendor": "Check Point",
"DeviceProduct": "VPN-1 & FireWall-1",
"SourceIP": "192.168.1.100",
"isInternal": true
}

Multiple Tables

Joining data from multiple lookup tables...

{
"user_id": "U123",
"department_id": "D456"
}
enrich:
- lookup_table:
- "users.csv"
- "departments.csv"
- query: |
SELECT
_ingest.*,
u.name as user_name,
d.name as department_name
FROM _ingest
LEFT JOIN lookup_table u ON _ingest.user_id = u.id
LEFT JOIN lookup_table d ON _ingest.department_id = d.id

combines information from both tables:

{
"user_id": "U123",
"department_id": "D456",
"user_name": "John Doe",
"department_name": "IT Support"
}

Override Mode

Replacing the document with query results...

{
"raw_data": "lots of fields",
"event_type": "security"
}
enrich:
- override: true
- query: |
SELECT event_type,
'processed' as status
FROM _ingest
WHERE event_type = 'security'

keeps only the selected fields:

{
"event_type": "security",
"status": "processed"
}