Enforce Schema
Synopsis
Validates log entries against predefined schemas and enforces data structure compliance using Avro or Parquet schema definitions.
Schema
- enforce_schema:
    description: <text>
    schema: <string>
    schema_type: <string>
    if: <script>
    ignore_failure: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>
    tag: <string>
Configuration
The following fields are used to define the processor:
Field | Required | Default | Description |
---|---|---|---|
schema | Y | - | Schema definition string or reference |
schema_type | N | "parquet" | Schema format type ("avro" or "parquet") |
description | N | - | Explanatory notes |
if | N | - | Condition to run |
ignore_failure | N | false | See Handling Failures |
on_failure | N | - | See Handling Failures |
on_success | N | - | See Handling Success |
tag | N | - | Identifier |
Details
The processor validates log entries against schema definitions to ensure data structure compliance. It supports two schema formats:
- Avro schemas: JSON-based schema definitions that provide rich data type validation and evolution support
- Parquet schemas: Column-oriented schema definitions optimized for analytics and big data processing
The processor caches compiled schemas, keyed by a hash of the schema content, so that a schema already seen does not need to be compiled again. During validation, the log entry is automatically transformed to match the schema requirements, including:
- Type coercion: Converting compatible types to match schema expectations
- Field validation: Ensuring required fields are present
- Structure enforcement: Organizing nested data according to schema hierarchy
- Default value assignment: Adding missing fields with default values where defined
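The four transformations listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SCHEMA` dictionary layout and the `enforce` function are hypothetical and do not reflect the processor's actual schema format or internals, which accept Avro or Parquet definitions.

```python
# Hypothetical, simplified schema: field name -> spec.
# This is NOT the Avro or Parquet format the processor accepts.
SCHEMA = {
    "host": {"type": str, "required": True},
    "status": {"type": int, "required": True},
    "latency_ms": {"type": float, "required": False, "default": 0.0},
    # A spec with "fields" describes a nested record.
    "http": {"fields": {"method": {"type": str, "required": False, "default": "GET"}}},
}


def enforce(entry: dict, schema: dict) -> dict:
    """Return a new entry that contains only schema fields, coerced and defaulted."""
    out = {}
    for name, spec in schema.items():
        if "fields" in spec:
            # Structure enforcement: rebuild the nested record from its own spec.
            child = entry.get(name)
            out[name] = enforce(child if isinstance(child, dict) else {}, spec["fields"])
            continue
        if name not in entry or entry[name] is None:
            if "default" in spec:
                # Default value assignment.
                out[name] = spec["default"]
            elif spec.get("required"):
                # Field validation: a required field is absent.
                raise ValueError(f"missing required field: {name}")
            continue
        try:
            # Type coercion: convert compatible values, e.g. "200" -> 200.
            out[name] = spec["type"](entry[name])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"cannot coerce {name!r} to {spec['type'].__name__}") from exc
    return out
```

In this sketch, fields that are not named in the schema are dropped from the result, which mirrors the kind of modification the processor may make to the original entry.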
Schema enforcement is particularly useful in data pipelines where downstream systems require strict data contracts and consistent field types.
Schema validation can modify the original log entry structure. Fields that don't match the schema may be transformed, removed, or have their types changed to ensure compliance.
Use schema caching effectively by keeping schema definitions consistent. The processor uses content hashing to cache compiled schemas, so identical schema strings will reuse cached versions.
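To see why consistent schema strings matter, consider a minimal sketch of a content-hash cache. The names `get_compiled`, `compile_schema`, and `stats` are hypothetical, SHA-256 is an assumed hash function, and JSON parsing stands in for whatever compilation the processor actually performs.

```python
import hashlib
import json

_cache = {}               # hash of schema text -> compiled schema
stats = {"compiles": 0}   # counts how often the expensive step runs


def compile_schema(schema_text: str) -> dict:
    """Stand-in for an expensive compile step; here it only parses JSON."""
    stats["compiles"] += 1
    return json.loads(schema_text)


def get_compiled(schema_text: str) -> dict:
    """Return the compiled schema, compiling only on a cache miss."""
    # The cache key depends on the exact bytes of the schema string.
    key = hashlib.sha256(schema_text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = compile_schema(schema_text)
    return _cache[key]
```

Under these assumptions, two schema strings that differ only in whitespace hash to different keys and are compiled separately, so reusing one canonical schema string across processors avoids redundant work.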
Examples
Basic Parquet Schema
Enforce a simple Parquet schema with automatic type conversion.
Avro Schema Validation
Use an Avro schema for rich validation, ensuring proper data types.
Nested Structure Validation
Validate complex nested structures with proper type enforcement.
Default Values
A schema with default values automatically adds missing fields.
Array Validation
Validate arrays with typed elements, with automatic type conversion.
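Element-wise conversion of a typed array might look like the following sketch. The `coerce_array` helper is hypothetical and only illustrates the idea; how the processor treats elements that cannot be converted is not specified here, so raising an error is an assumption.

```python
def coerce_array(values: list, item_type: type) -> list:
    """Convert every element to item_type, reporting the first element that fails."""
    coerced = []
    for index, value in enumerate(values):
        try:
            coerced.append(item_type(value))
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"element {index} ({value!r}) is not a valid {item_type.__name__}"
            ) from exc
    return coerced
```

For example, `coerce_array(["1", 2, "3"], int)` yields a list of three integers, while a non-numeric string raises an error that names the offending position.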
Union Types
Handle union types for flexible fields, choosing the best matching type.
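One plausible reading of "best matching type" is sketched below: prefer a union branch whose type already matches the value, and only then fall back to the first branch the value can be converted to. This two-pass rule, the `BRANCHES` table, and `resolve_union` are assumptions made for illustration, not the processor's documented behavior. The branch names follow Avro's primitive type names.

```python
# Hypothetical mapping from Avro primitive names to (Python type, converter).
BRANCHES = {
    "null": (type(None), lambda value: None),
    "long": (int, int),
    "double": (float, float),
    "string": (str, str),
}


def resolve_union(value, union: list):
    """Return (branch_name, converted_value) for the chosen union branch."""
    # Pass 1: a branch whose type matches exactly wins without conversion.
    for name in union:
        py_type, _ = BRANCHES[name]
        if type(value) is py_type:
            return name, value
    # Pass 2: otherwise take the first non-null branch that can convert the value.
    for name in union:
        if name == "null":
            continue
        _, convert = BRANCHES[name]
        try:
            return name, convert(value)
        except (TypeError, ValueError):
            continue
    raise ValueError(f"{value!r} matches no branch of {union}")
```

With the union `["null", "long", "string"]`, the string `"42"` stays a string because an exact match exists. With `["null", "long"]`, the same value falls through to conversion and becomes the integer 42.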
Error Handling
Handle validation errors gracefully by adding error information.
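The idea of annotating an entry instead of discarding it can be sketched as follows. The `validate_or_annotate` helper and the `schema_error` field name are hypothetical. In the processor itself this role would be played by whatever processors are configured under `on_failure`.

```python
def validate_or_annotate(entry: dict, required: list) -> dict:
    """Return the entry unchanged if valid, otherwise a copy with error details."""
    missing = [name for name in required if name not in entry]
    if not missing:
        return entry
    # Keep the original data and attach a description of what failed.
    annotated = dict(entry)
    annotated["schema_error"] = {
        "message": "schema validation failed",
        "missing_fields": missing,
    }
    return annotated
```

Keeping the failed entry with an attached error record lets a later stage route or inspect it, rather than losing the data at the validation step.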
Conditional Schema Enforcement
Apply schemas conditionally, based on event characteristics.