Version: 1.3.0

Substring

Text Processing String Manipulation

Synopsis

Extracts substrings from string fields.

Schema

- substring:
    field: <ident>
    start: <integer>
    end: <integer>
    length: <integer>
    target_field: <string>
    description: <text>
    if: <script>
    ignore_failure: <boolean>
    ignore_missing: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>
    tag: <string>

Configuration

The following fields are used to define the processor:

| Field | Required | Default | Description |
|---|---|---|---|
| field | Y | - | Source field containing string to extract from |
| start | Y | - | Starting position (0-based index) |
| end | N | - | Ending position (exclusive, use with start) |
| length | N | - | Length of substring (use with start instead of end) |
| target_field | N | Same as field | Target field to store extracted substring |
| description | N | - | Explanatory note |
| if | N | - | Condition to run |
| ignore_failure | N | false | Continue processing if extraction fails |
| ignore_missing | N | false | Skip processing if referenced field doesn't exist |
| on_failure | N | - | See Handling Failures |
| on_success | N | - | See Handling Success |
| tag | N | - | Identifier |

Details

Extracts a portion of a string based on a starting position and either an ending position or a length. The processor uses zero-based indexing for precise character extraction and handles Unicode strings correctly.

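You can specify the substring using either start + end parameters (where end is exclusive) or start + length parameters. When only start is given, as in the URL Path Extraction example, the substring runs to the end of the string. Negative indices are supported for counting from the end of the string.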

note

The processor uses zero-based indexing where the first character is at position 0. When using end, it's exclusive (not included in the result). When using length, it specifies how many characters to extract.
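The two ways of bounding a range are interchangeable whenever end equals start plus length. The short Python sketch below illustrates this with ordinary string slicing, which happens to follow the same convention (zero-based positions, exclusive end). It only illustrates the indexing rule and is not the processor's implementation.

```python
message = "Hello World DataStream"

start, end = 6, 11
length = end - start  # the same range expressed as a character count

by_end = message[start:end]                 # positions 6..10; position 11 is excluded
by_length = message[start:start + length]   # 5 characters beginning at position 6

print(message[0])            # the first character sits at position 0
print(by_end)
print(by_end == by_length)
```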

The processor handles Unicode characters properly and will not split multi-byte characters. If the specified range exceeds the string boundaries, it extracts only the available portion.
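To see why counting characters rather than bytes matters, the following Python sketch compares the two on a string that contains a two-byte character. It assumes positions count Unicode code points; the text above does not say whether the processor counts code points or grapheme clusters, so treat that as an assumption of the sketch.

```python
text = "h\u00e9llo"  # "héllo": the é occupies two bytes in UTF-8

# Character-based slicing keeps the é intact.
by_character = text[0:2]

# Byte-based slicing cuts the é in half and leaves invalid UTF-8 behind.
by_byte = text.encode("utf-8")[0:2]
try:
    by_byte.decode("utf-8")
    splits_character = False
except UnicodeDecodeError:
    splits_character = True

# A range that runs past the end yields only the available portion.
clamped = text[3:100]

print(by_character, splits_character, clamped)
```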

warning

If the starting position is beyond the string length or if start > end, the processor will return an empty string. Negative values that exceed the string length will be treated as position 0.
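The boundary rules described in this section can be collected into a small reference model. The Python function below is a hypothetical sketch written for this page, not the processor's source. It assumes that a negative end is resolved the same way as a negative start, that length takes precedence if both end and length are supplied, and that a negative length is treated as zero; none of these three points is stated above.

```python
from typing import Optional


def substring(text: str, start: int, end: Optional[int] = None,
              length: Optional[int] = None) -> str:
    """Hypothetical model of the documented substring rules."""
    n = len(text)

    def resolve(index: int) -> int:
        # Negative positions count from the end of the string.
        if index < 0:
            index += n
        # Positions outside the string are clamped to its boundaries, so a
        # negative value larger than the string length becomes position 0.
        return max(0, min(index, n))

    lo = resolve(start)
    if length is not None:      # assumption: length wins over end
        hi = min(lo + max(length, 0), n)
    elif end is not None:
        hi = resolve(end)
    else:                       # only start given: run to the end of the string
        hi = n

    # start beyond the string, or start > end, yields an empty string.
    return text[lo:hi] if lo < hi else ""


print(substring("Hello World DataStream", 6, end=11))
print(substring("Hello", -3))
print(repr(substring("Hello", 10)))
print(repr(substring("Hello", 3, end=1)))
```

Checking a planned configuration against a model like this on a few sample values is a cheap way to catch off-by-one mistakes before deploying a pipeline.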

Examples

Basic Substring Extraction

Extracting the characters at positions 6 through 10 from a string...

{
  "message": "Hello World DataStream"
}

- substring:
    field: message
    start: 6
    end: 11
    target_field: extracted

extracts "World":

{
  "message": "Hello World DataStream",
  "extracted": "World"
}

Length-based Extraction

Extracting 10 characters starting from position 11...

{
  "log_entry": "2024-01-15 DataStream processing started"
}

- substring:
    field: log_entry
    start: 11
    length: 10
    target_field: component

extracts component name:

{
  "log_entry": "2024-01-15 DataStream processing started",
  "component": "DataStream"
}

Prefix Extraction

Extracting the first 14 characters as a prefix...

{
  "transaction_id": "TXN-2024-01-15-14-30-45-ABC123"
}

- substring:
    field: transaction_id
    start: 0
    length: 14
    target_field: date_prefix

extracts date portion:

{
  "transaction_id": "TXN-2024-01-15-14-30-45-ABC123",
  "date_prefix": "TXN-2024-01-15"
}

In-place Trimming

Trimming string to specific length...

{
  "long_description": "This is a very long description that needs to be shortened for display purposes"
}

- substring:
    field: long_description
    start: 0
    length: 30

trims to 30 characters:

{
  "long_description": "This is a very long descriptio"
}
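Because target_field defaults to the source field, omitting it overwrites the original value, while setting it leaves the original untouched. The Python sketch below models that difference on a plain dictionary. The helper apply_substring and the field name short_description are hypothetical names introduced for illustration, and the sketch assumes a non-negative start.

```python
from typing import Optional


def apply_substring(event: dict, field: str, start: int, length: int,
                    target_field: Optional[str] = None) -> dict:
    """Hypothetical model: write the extracted text to target_field, or back to field."""
    result = dict(event)                 # leave the caller's event unchanged
    destination = target_field or field  # default: same as the source field
    result[destination] = event[field][start:start + length]
    return result


event = {
    "long_description": "This is a very long description that needs to be "
                        "shortened for display purposes"
}

in_place = apply_substring(event, "long_description", 0, 30)
copied = apply_substring(event, "long_description", 0, 30,
                         target_field="short_description")

print(in_place)
print(sorted(copied))
```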

URL Path Extraction

Extracting path from URL...

{
  "full_url": "https://api.example.com/v1/users/profile"
}

- substring:
    field: full_url
    start: 23
    target_field: api_path

extracts API path:

{
  "full_url": "https://api.example.com/v1/users/profile",
  "api_path": "/v1/users/profile"
}
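A fixed start offset like the one in this example only works while the scheme and host keep the same length. One way to derive or double-check the offset for a sample value is to locate the first slash after the scheme, as in the Python sketch below. This is a helper for working out the number to put in the configuration, not something the processor does.

```python
full_url = "https://api.example.com/v1/users/profile"

# Skip past "https://" so the slashes in the scheme are not matched.
scheme_end = full_url.index("://") + len("://")

# The path begins at the first "/" after the host.
start = full_url.index("/", scheme_end)

api_path = full_url[start:]
print(start, api_path)
```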