Version: 1.2.0

Regex Replace

Synopsis

A text processor that finds and replaces text patterns using regular expressions. It supports pattern-based transformations for data cleaning, formatting, and normalization.

Schema

- regex_replace:
    field: <ident>
    regex: <string>
    replacement: <string>
    target_field: <ident>
    description: <text>
    if: <script>
    ignore_failure: <boolean>
    ignore_missing: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>
    tag: <string>

Configuration

The following fields are used to define the processor:

Field	Required	Default	Description
`field`	Y	-	Field containing the text to process
`regex`	Y	-	Regular expression pattern to match
`replacement`	Y	-	Replacement text or pattern
`target_field`	N	`field`	Field to store the modified text
`description`	N	-	Explanatory note
`if`	N	-	Condition to run
`ignore_failure`	N	`false`	Continue if regex processing fails
`ignore_missing`	N	`false`	Continue if source field doesn't exist
`on_failure`	N	-	See Handling Failures
`on_success`	N	-	See Handling Success
`tag`	N	-	Identifier
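
As an illustration of how the optional fields combine with the required ones, the sketch below writes the result to a separate field and continues when the source field is absent. The field names `raw_message` and `clean_message`, the tag value, and the description text are hypothetical and chosen only for this example.

```yaml
- regex_replace:
    field: raw_message            # hypothetical source field
    regex: "\\s+"                 # one or more whitespace characters
    replacement: " "
    target_field: clean_message   # hypothetical; omit to overwrite raw_message
    description: Collapse runs of whitespace into single spaces
    ignore_missing: true          # continue when raw_message does not exist
    tag: collapse_whitespace      # hypothetical identifier
```

Leaving out `target_field` would store the modified text back in `raw_message`, per the default listed in the table.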

Details

The processor uses regular expressions to find and replace text patterns within string fields. It supports both simple text replacement and complex pattern matching with capture groups and backreferences.

note

This processor is an alias for the gsub processor, providing the same functionality with a more descriptive name.

Regular expression patterns support full regex syntax including character classes, quantifiers, anchors, and grouping. The replacement string can include backreferences ($1, $2, etc.) to captured groups from the regex pattern.
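
A minimal sketch of backreferences combined with anchors, assuming `$1` and `$2` are substituted as described above and that the `\w` character class is available. The field name `name` and its value are hypothetical.

```yaml
# Input event (hypothetical):  { "name": "Lovelace, Ada" }
- regex_replace:
    field: name
    regex: "^(\\w+), (\\w+)$"   # group 1 = last name, group 2 = first name
    replacement: "$2 $1"        # swap the two captured groups
# Expected result under the assumptions above:  { "name": "Ada Lovelace" }
```

The `^` and `$` anchors restrict the match to values that consist of exactly two comma-separated words, so other values should pass through unchanged.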

The processor replaces every occurrence of the pattern within the text, making it suitable for comprehensive text cleaning and transformation tasks.
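
To illustrate the all-occurrences behavior, the sketch below replaces each run of digits in a single string. The field name `summary` and its value are hypothetical.

```yaml
# Input event (hypothetical):  { "summary": "ids 12, 345 and 6" }
- regex_replace:
    field: summary
    regex: "\\d+"        # each run of one or more digits is a separate match
    replacement: "#"
# Expected result, since all three matches are replaced:
#   { "summary": "ids #, # and #" }
```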

warning

Test regex patterns thoroughly to avoid unintended matches or performance issues with complex patterns.
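
One common source of unintended matches is a pattern that also fits inside a longer token. The sketch below tightens a five-digit pattern with word boundaries, assuming the engine supports the `\b` anchor as most regex engines do. The field name `note` and its value are hypothetical.

```yaml
# Input event (hypothetical):  { "note": "Order 1234567 for user 12345" }
#
# A bare "\\d{5}" would also match the first five digits of the order number,
# giving "Order XXXXX67 for user XXXXX".
- regex_replace:
    field: note
    regex: "\\b\\d{5}\\b"   # exactly five digits, not part of a longer number
    replacement: "XXXXX"
# Expected result under the assumption above:
#   { "note": "Order 1234567 for user XXXXX" }
```

Running a pattern against a few representative values, including ones that should not match, is a cheap way to catch this before deployment.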

Examples

Basic Text Replacement

Replacing simple text patterns...

{
  "message": "Error: Failed to connect to server"
}

- regex_replace:
    field: message
    regex: "Error:"
    replacement: "WARNING:"

updates the error level:

{
  "message": "WARNING: Failed to connect to server"
}

Pattern Matching with Capture Groups

Using capture groups for reformatting...

{
  "timestamp": "2024-01-15 14:30:25"
}

- regex_replace:
    field: timestamp
    regex: "(\\d{4})-(\\d{2})-(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2})"
    replacement: "$2/$3/$1 $4:$5:$6"
    target_field: formatted_timestamp

reformats the date:

{
  "timestamp": "2024-01-15 14:30:25",
  "formatted_timestamp": "01/15/2024 14:30:25"
}

Email Masking

Masking email addresses for privacy...

{
  "user_info": "Contact [email protected] for support"
}

- regex_replace:
    field: user_info
    regex: "([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})"
    replacement: "***@$2"

masks the username portion:

{
  "user_info": "Contact ***@example.com for support"
}

Log Level Normalization

Normalizing various log level formats...

{
  "log_line": "[ERR] Database connection failed",
  "log_line2": "WARN: Memory usage high"
}

- regex_replace:
    field: log_line
    regex: "\\[(ERR|ERROR)\\]"
    replacement: "[ERROR]"
- regex_replace:
    field: log_line2
    regex: "WARN:"
    replacement: "[WARNING]"

standardizes log levels:

{
  "log_line": "[ERROR] Database connection failed",
  "log_line2": "[WARNING] Memory usage high"
}

URL Path Extraction

Extracting paths from URLs...

{
  "request_url": "https://api.example.com/v1/users/123?param=value"
}

- regex_replace:
    field: request_url
    regex: "https?://[^/]+(/[^?]*)"
    replacement: "$1"
    target_field: url_path

extracts just the path:

{
  "request_url": "https://api.example.com/v1/users/123?param=value",
  "url_path": "/v1/users/123"
}

Multi-Pattern Replacement

Applying multiple regex replacements...

{
  "raw_text": "User ID: 12345, Phone: (555) 123-4567, Email: [email protected]"
}

- regex_replace:
    field: raw_text
    regex: "\\d{5}"
    replacement: "XXXXX"
- regex_replace:
    field: raw_text
    regex: "\\(\\d{3}\\) \\d{3}-\\d{4}"
    replacement: "XXX-XXX-XXXX"
- regex_replace:
    field: raw_text
    regex: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
    replacement: "***@***.***"

sanitizes sensitive information:

{
  "raw_text": "User ID: XXXXX, Phone: XXX-XXX-XXXX, Email: ***@***.***"
}
