
Normalization

Normalization is a critical pipeline stage that converts log data from diverse sources into a consistent format, so that events from different logging systems can be analyzed together.

Log Format Standards

The processor supports several widely used log formats:

Generic Formats

| Format | Notation | Key Identifier | Layout Characteristics | Example Fields |
|---|---|---|---|---|
| Elastic Common Schema (ECS) | Dot notation, lowercase | @timestamp | Hierarchical structure | source.ip, network.direction |
| Splunk Common Information Model (CIM) | Underscores, lowercase | _time | Flat structure | src_ip, network_direction |
| Advanced Security Information Model (ASIM) | PascalCase | TimeGenerated | Explicit names | SourceIp, NetworkDirection |

Security-specific Formats

| Format | Description | Key Identifier | Example Fields |
|---|---|---|---|
| Common Event Format (CEF) | ArcSight's standard format | rt (receiptTime) | networkUser, sourceAddress |
| Log Event Extended Format (LEEF) | IBM QRadar's format | devTime | networkUser, srcAddr |
| Common Security Log (CSL) | Microsoft Sentinel's format | TimeGenerated | NetworkUser, SourceAddress |

Format Detection

The processor can automatically detect the source format from characteristic fields. For example:

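| Context | Field | Detected Format |
|---|---|---|
| Timestamp | @timestamp | ECS |
| Timestamp | _time | CIM |
| Timestamp | TimeGenerated | ASIM/CSL |
| Security | rt | CEF |
| Security | devTime | LEEF |
| CSL detection | TimeGenerated + LogSeverity | CSL |
| CSL detection | TimeGenerated only | ASIM |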
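As a rough illustration, the detection rules in this table could be expressed as a Python function like the one below. This is a sketch, not the processor's implementation: the function name detect_format and the order in which fields are checked are assumptions. Checking for LogSeverity only after TimeGenerated is found reflects the two CSL detection rows of the table.

```python
from typing import Any, Mapping, Optional


def detect_format(record: Mapping[str, Any]) -> Optional[str]:
    """Guess a record's source format from its characteristic top-level fields.

    Hypothetical sketch: the order of checks is an assumption, not documented
    processor behavior.
    """
    if "@timestamp" in record:
        return "ecs"
    if "_time" in record:
        return "cim"
    if "rt" in record:
        return "cef"
    if "devTime" in record:
        return "leef"
    if "TimeGenerated" in record:
        # CSL and ASIM share TimeGenerated; LogSeverity tips it toward CSL.
        return "csl" if "LogSeverity" in record else "asim"
    return None  # no characteristic field found


print(detect_format({"@timestamp": "2024-05-01T12:00:00Z"}))  # ecs
print(detect_format({"TimeGenerated": "2024-05-01T12:00:00Z", "LogSeverity": 5}))  # csl
print(detect_format({"TimeGenerated": "2024-05-01T12:00:00Z"}))  # asim
```

Because CSL and ASIM both carry TimeGenerated, LogSeverity is the only signal in this sketch that separates them; a record with none of the characteristic fields returns None.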

Conversion

Casing and Delimiters

Each format follows specific naming conventions:

| Format | Example Fields |
|---|---|
| ECS | source.ip, event.severity |
| CIM | src_ip, event_severity |
| ASIM | SourceIp, EventSeverity |
| CEF | sourceAddress, eventSeverity |
| LEEF | srcAddr, evtSev |
| CSL | SourceIP, EventSeverity |
Warning: Complex format conversions may impact performance.
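The casing rules for the generic formats can be sketched with simple string transforms. The helper names below are hypothetical, and the sketch assumes ECS-style dotted input. Casing alone is not enough, though: it turns source.ip into source_ip, while CIM uses src_ip, so abbreviated names still need an explicit field mapping.

```python
def ecs_to_cim_name(name: str) -> str:
    """Dot-separated lowercase (ECS) -> underscore-separated lowercase (CIM)."""
    return name.replace(".", "_")


def ecs_to_asim_name(name: str) -> str:
    """Dot-separated lowercase (ECS) -> PascalCase (ASIM)."""
    return "".join(part[:1].upper() + part[1:] for part in name.split(".") if part)


def ecs_to_camel_name(name: str) -> str:
    """Dot-separated lowercase (ECS) -> camelCase, as in CEF's eventSeverity."""
    pascal = ecs_to_asim_name(name)
    return pascal[:1].lower() + pascal[1:]


for field in ("event.severity", "source.ip"):
    print(field, ecs_to_cim_name(field), ecs_to_asim_name(field), ecs_to_camel_name(field))
# event.severity event_severity EventSeverity eventSeverity
# source.ip source_ip SourceIp sourceIp
```

The event.severity row matches the CIM, ASIM, and CEF examples above, while the source.ip row shows where pure casing diverges from the real CIM (src_ip) and CEF (sourceAddress) names.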

Field Mapping

The following table maps common network fields across formats:

| Format | Source IP | Destination IP | Direction |
|---|---|---|---|
| ECS | source.ip | destination.ip | network.direction |
| CIM | src_ip | dest_ip | network_direction |
| ASIM | SourceIp | DstIp | NetworkDirection |
| CEF | src | dst | networkDirection |
| LEEF | srcAddr | dstAddr | netDir |
| CSL | SourceIp | DestinationIp | NetworkDirection |
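One way to apply this table is to treat each column as a shared concept and rename keys through it. The sketch below is hypothetical (FIELD_MAP and rename_fields are illustrative names, not part of the processor), assumes flat records with ECS names written as dotted keys, and copies unmapped fields through unchanged.

```python
from typing import Any, Dict

# Field names copied from the mapping table; ECS uses flattened dotted keys here.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "ecs": {"Source IP": "source.ip", "Destination IP": "destination.ip", "Direction": "network.direction"},
    "cim": {"Source IP": "src_ip", "Destination IP": "dest_ip", "Direction": "network_direction"},
    "asim": {"Source IP": "SourceIp", "Destination IP": "DstIp", "Direction": "NetworkDirection"},
    "cef": {"Source IP": "src", "Destination IP": "dst", "Direction": "networkDirection"},
    "leef": {"Source IP": "srcAddr", "Destination IP": "dstAddr", "Direction": "netDir"},
    "csl": {"Source IP": "SourceIp", "Destination IP": "DestinationIp", "Direction": "NetworkDirection"},
}


def rename_fields(record: Dict[str, Any], source: str, target: str) -> Dict[str, Any]:
    """Rename mapped network fields from source-format names to target-format names."""
    target_names = FIELD_MAP[target]
    # Reverse lookup: source-format field name -> shared concept (table column).
    concept_of = {name: concept for concept, name in FIELD_MAP[source].items()}
    out: Dict[str, Any] = {}
    for key, value in record.items():
        concept = concept_of.get(key)
        out[target_names[concept] if concept is not None else key] = value
    return out


cef_record = {"src": "10.0.0.5", "dst": "192.0.2.7", "networkDirection": "outbound", "act": "allow"}
print(rename_fields(cef_record, "cef", "ecs"))
# {'source.ip': '10.0.0.5', 'destination.ip': '192.0.2.7', 'network.direction': 'outbound', 'act': 'allow'}
```

Routing every conversion through a shared concept keeps the table to one row per format instead of one mapping per format pair.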

Configuration

Basic

Convert from ECS to ASIM format:

normalize:
- source_format: ecs
- target_format: asim

Field-specific

Convert a specific network field:

normalize:
- field: network_data
- source_format: cef
- target_format: ecs
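Because ECS is hierarchical, converting a single field into ECS also means expanding dotted names into nested objects. The following sketch assumes that the converted result replaces the original field in place; CEF_TO_ECS, set_dotted, and normalize_field are hypothetical names, and the mapping covers only the three network fields from the Field Mapping table.

```python
from typing import Any, Dict

# Hypothetical CEF -> ECS names for the network fields in the Field Mapping table.
CEF_TO_ECS = {"src": "source.ip", "dst": "destination.ip", "networkDirection": "network.direction"}


def set_dotted(obj: Dict[str, Any], dotted: str, value: Any) -> None:
    """Write value at a dotted path, creating intermediate objects (ECS nesting)."""
    *parents, leaf = dotted.split(".")
    node = obj
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def normalize_field(event: Dict[str, Any], field: str) -> Dict[str, Any]:
    """Convert only event[field] from CEF names to nested ECS; other fields are untouched."""
    if field not in event:
        return dict(event)
    converted: Dict[str, Any] = {}
    for key, value in event[field].items():
        set_dotted(converted, CEF_TO_ECS.get(key, key), value)
    result = dict(event)  # shallow copy so the input event is not modified
    result[field] = converted
    return result


event = {
    "message": "conn accepted",
    "network_data": {"src": "10.0.0.5", "dst": "192.0.2.7", "networkDirection": "inbound"},
}
print(normalize_field(event, "network_data"))
```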

Auto-detection

Let the processor detect the source format:

normalize:
- target_format: cim

Best Practices

For data integrity, always validate transformed logs against originals, keep original fields when possible for debugging, and document format-specific transformations.
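A simple way to validate transformed logs against originals is a round-trip check: convert a record, convert it back, and compare. The sketch below rests on assumptions (flat records with dotted ECS keys, and a hypothetical ECS-to-ASIM mapping limited to the Field Mapping table); a failed check flags cases such as name collisions where the conversion silently dropped data.

```python
from typing import Any, Dict

# Illustrative subset of ECS -> ASIM names from the Field Mapping table.
ECS_TO_ASIM = {"source.ip": "SourceIp", "destination.ip": "DstIp", "network.direction": "NetworkDirection"}
ASIM_TO_ECS = {asim: ecs for ecs, asim in ECS_TO_ASIM.items()}


def convert(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename keys via mapping; later keys overwrite earlier ones on collision."""
    return {mapping.get(key, key): value for key, value in record.items()}


def validate_round_trip(original: Dict[str, Any]) -> bool:
    """True if converting to ASIM and back reproduces the original record exactly."""
    return convert(convert(original, ECS_TO_ASIM), ASIM_TO_ECS) == original


print(validate_round_trip({"source.ip": "10.0.0.5", "network.direction": "outbound", "host": "fw1"}))  # True
print(validate_round_trip({"source.ip": "10.0.0.5", "SourceIp": "stale"}))  # False
```

In the second record, a SourceIp key already exists, so the converted source.ip value is overwritten and the round trip no longer matches.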

For performance, normalize early in the pipeline, cache lookup results where possible, and monitor transformation overhead.
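Field-name conversion is a natural candidate for caching, since the same names recur in nearly every event. The sketch below memoizes a hypothetical PascalCase conversion with Python's functools.lru_cache; it illustrates one reading of "cache results for lookup" rather than the processor's own caching.

```python
from functools import lru_cache
from typing import Any, Dict


@lru_cache(maxsize=4096)
def to_pascal_case(name: str) -> str:
    """Convert a dotted or underscored field name to PascalCase; results are memoized."""
    return "".join(part[:1].upper() + part[1:] for part in name.replace("_", ".").split(".") if part)


def normalize_keys(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename every key; repeated names are served from the cache."""
    return {to_pascal_case(key): value for key, value in record.items()}


for _ in range(1000):
    normalize_keys({"event.severity": 3, "network.direction": "inbound"})
print(to_pascal_case.cache_info())  # CacheInfo(hits=1998, misses=2, maxsize=4096, currsize=2)
```

Only the first occurrence of each name pays for the conversion; the bounded maxsize keeps memory in check if field names are unexpectedly high-cardinality.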

For error handling, use ignore_failure, implement fallback mechanisms, and test with diverse log samples.