Version: 1.3.0

Clean

Mutate String Processing

Removes unwanted characters from string fields using configurable cleaning modes and character sets.

Schema

- clean:
    field: <ident>
    target_field: <ident>
    mode: <string>
    chars: <string>
    keep_chars: <string>
    description: <text>
    if: <script>
    ignore_failure: <boolean>
    ignore_missing: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>
    tag: <string>

Configuration

The following fields are used to define the processor:

| Field | Required | Default | Description |
|---|---|---|---|
| field | Y | - | Source field to clean |
| target_field | N | field | Target field to store cleaned result |
| mode | N | custom | Cleaning mode: alphanumeric, numeric, alpha, custom |
| chars | N | Common delimiters | Characters to remove (used with custom mode) |
| keep_chars | N | - | Additional characters to preserve in predefined modes |
| description | N | - | Explanatory note |
| if | N | - | Condition to run processor |
| ignore_failure | N | false | Continue if processor fails |
| ignore_missing | N | false | Continue if source field doesn't exist |
| on_failure | N | - | Processors to run on failure |
| on_success | N | - | Processors to run on success |
| tag | N | - | Processor identifier |
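
To make the interaction of field, target_field, and ignore_missing concrete, here is a minimal Python sketch. It is not the processor's implementation: apply_clean and digits_only are hypothetical helpers, and raising KeyError for a missing field is an assumption standing in for whatever failure the processor actually reports.

```python
from typing import Any, Callable, Dict, Optional


def digits_only(text: str) -> str:
    """Stand-in cleaner for this sketch: keep ASCII digits only."""
    return "".join(c for c in text if c in "0123456789")


def apply_clean(event: Dict[str, Any], field: str,
                cleaner: Callable[[str], str],
                target_field: Optional[str] = None,
                ignore_missing: bool = False) -> Dict[str, Any]:
    """Clean event[field] and store the result in target_field (default: field)."""
    if field not in event:
        if ignore_missing:
            # Assumption: a missing source field is skipped silently.
            return event
        # Assumption: without ignore_missing, a missing field is a failure.
        raise KeyError(f"source field not found: {field}")
    event[target_field or field] = cleaner(str(event[field]))
    return event


event = {"phone_number": "+1 (555) 123-4567"}
apply_clean(event, "phone_number", digits_only, target_field="phone_digits")
print(event)
```

When target_field is omitted, the sketch overwrites the source field, matching the table's default of field.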

Details

The processor operates on string fields and arrays of strings; non-string values are converted to strings before processing. Each element of a string array is cleaned individually. As the examples below show, unwanted characters are removed wherever they occur in the value, not only at its beginning and end.
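
The array and conversion behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the processor's code: clean_value and strip_non_alnum are hypothetical names, and using str() for the conversion of non-string values is an assumption about how that conversion behaves.

```python
from typing import Any, Callable, List, Union


def strip_non_alnum(text: str) -> str:
    """Stand-in cleaner for this sketch: keep ASCII letters and digits only."""
    return "".join(c for c in text if c.isascii() and c.isalnum())


def clean_value(value: Any,
                cleaner: Callable[[str], str] = strip_non_alnum
                ) -> Union[str, List[str]]:
    """Clean a scalar, or each element of a list, after converting to str."""
    if isinstance(value, (list, tuple)):
        # Each array element is converted and cleaned on its own.
        return [cleaner(str(item)) for item in value]
    # Assumption: non-string scalars are stringified with str() first.
    return cleaner(str(value))


print(clean_value(["[web-server]", "(critical)"]))
print(clean_value(404))
```

Note that the stand-in cleaner also drops the hyphen in "[web-server]"; preserving it would require something like keep_chars.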

Unicode input is handled in all cleaning modes. In custom mode, when chars is not set, the processor removes common delimiters and special characters, including quotes, brackets, and various punctuation marks.

The implementation processes values character by character, which keeps it suitable for high-volume log environments.

Typical uses are data sanitization (removing special characters from user input), phone number normalization (extracting only the digits from a formatted number), username cleaning (removing invalid characters while preserving valid ones), and log message cleanup (removing formatting characters that may interfere with downstream processing).

The processor also supports identifier standardization: keep_chars lets essential characters survive a predefined mode. Together, these uses make it helpful for preparing fields for downstream processing or storage systems that require clean, standardized data.

Cleaning Modes

alphanumeric

Keeps only letters and digits, removes all other characters.

numeric

Keeps only digits (0-9), removes all other characters.

alpha

Keeps only letters (a-z, A-Z), removes all other characters.

custom

Removes characters specified in the chars field. If chars is not provided, removes common delimiters and quotes.
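
The four modes above can be approximated with a short Python sketch. It is only an illustration: the clean function is hypothetical, the predefined modes are restricted to ASCII here for simplicity, and DEFAULT_CHARS is an assumed set, since the documentation describes the default only as "common delimiters and quotes".

```python
import string
from typing import Optional

# Assumption: the real default removal set is not documented; this one is
# chosen only so the sketch has something concrete to remove.
DEFAULT_CHARS = "\"'`,;:!|[](){}<>"

# Characters each predefined mode keeps (ASCII only in this sketch).
MODE_KEEP = {
    "alphanumeric": string.ascii_letters + string.digits,
    "numeric": string.digits,
    "alpha": string.ascii_letters,
}


def clean(value: str, mode: str = "custom", chars: Optional[str] = None,
          keep_chars: str = "") -> str:
    """Return value with unwanted characters removed according to mode."""
    if mode == "custom":
        # Custom mode is a deny-list: drop every character listed in chars.
        remove = set(DEFAULT_CHARS if chars is None else chars)
        return "".join(c for c in value if c not in remove)
    if mode not in MODE_KEEP:
        raise ValueError(f"unknown mode: {mode}")
    # Predefined modes are allow-lists, extended by keep_chars.
    keep = set(MODE_KEEP[mode]) | set(keep_chars)
    return "".join(c for c in value if c in keep)


print(clean("user@123!#$%name", mode="alphanumeric"))  # prints user123name
print(clean("+1 (555) 123-4567", mode="numeric"))      # prints 15551234567
```

The design point worth noticing is the asymmetry: the predefined modes describe what to keep, while custom mode describes what to remove, which is why keep_chars applies only to the predefined modes.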

Examples

Alphanumeric

Cleaning username to keep only letters and digits...

{
  "username": "user@123!#$%name"
}

- clean:
    field: username
    mode: alphanumeric

removes special characters:

{
  "username": "user123name"
}

Numeric Extraction

Extracting only digits from formatted phone number...

{
  "phone_number": "+1 (555) 123-4567"
}

- clean:
    field: phone_number
    target_field: phone_digits
    mode: numeric

creates field with digits only:

{
  "phone_number": "+1 (555) 123-4567",
  "phone_digits": "15551234567"
}

Custom Removal

Removing specific brackets and angle brackets from log message...

{
  "log_message": "Error [404]: Resource not found {user: admin} <critical>"
}

- clean:
    field: log_message
    mode: custom
    chars: "[](){}<>"

removes specified characters:

{
  "log_message": "Error 404: Resource not found user: admin critical"
}

With Exceptions

Using alphanumeric mode while preserving specific characters...

{
  "identifier": "server-01_prod.example.com!@#"
}

- clean:
    field: identifier
    mode: alphanumeric
    keep_chars: "-_."

keeps allowed characters:

{
  "identifier": "server-01_prod.example.com"
}

Arrays

Cleaning each element in an array of tags...

{
  "tags": [
    "\"production\"",
    "[web-server]",
    "(critical)"
  ]
}

- clean:
    field: tags
    mode: custom
    chars: "[](){}\"'"

processes each array element:

{
  "tags": [
    "production",
    "web-server",
    "critical"
  ]
}

Default Mode

Using default custom mode without specifying chars...

{
  "raw_data": "\"Hello, World!\" <message> [info] {data: value}"
}

- clean:
    field: raw_data

removes common delimiters and quotes:

{
  "raw_data": "Hello World message info data value"
}

Email

Cleaning email address while preserving dots...

{
  "email_local": "[email protected]"
}

- clean:
    field: email_local
    mode: alphanumeric
    keep_chars: "."

removes special chars except dots:

{
  "email_local": "usertag.com"
}