Version: 1.3.0

Substring

Text Processing, String Manipulation

Synopsis

Extracts substrings from string fields.

Schema

- substring:
    field: <ident>
    start: <integer>
    end: <integer>
    length: <integer>
    target_field: <string>
    description: <text>
    if: <script>
    ignore_failure: <boolean>
    ignore_missing: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>
    tag: <string>

Configuration

The following fields are used to define the processor:

Field	Required	Default	Description
`field`	Y	-	Source field containing string to extract from
`start`	Y	-	Starting position (0-based index)
`end`	N	-	Ending position (exclusive, use with start)
`length`	N	-	Length of substring (use with start instead of end)
`target_field`	N	Same as `field`	Target field to store extracted substring
`description`	N	-	Explanatory note
`if`	N	-	Condition to run
`ignore_failure`	N	`false`	Continue processing if extraction fails
`ignore_missing`	N	`false`	Skip processing if referenced field doesn't exist
`on_failure`	N	-	See Handling Failures
`on_success`	N	-	See Handling Success
`tag`	N	-	Identifier
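The following minimal Python sketch shows how `field`, `target_field`, and `ignore_missing` interact. It models an event as a dict. The function name `apply_substring` is hypothetical, and so is its failure behavior (raising `KeyError` for a missing field). Only the defaults come from the table above: `target_field` falls back to `field`, and `ignore_missing` defaults to `false`. Range handling is reduced to plain slicing.

```python
from typing import Any, Dict, Optional


def apply_substring(
    event: Dict[str, Any],
    field: str,
    start: int,
    end: Optional[int] = None,
    target_field: Optional[str] = None,
    ignore_missing: bool = False,
) -> Dict[str, Any]:
    """Hypothetical model of the processor's field handling (not the real implementation)."""
    if field not in event:
        if ignore_missing:
            # ignore_missing: true -> leave the event untouched.
            return event
        # Assumption: a missing source field is treated as a failure.
        raise KeyError(f"field {field!r} not found")
    # target_field defaults to field, so omitting it overwrites the source in place.
    destination = target_field if target_field is not None else field
    event[destination] = event[field][start:end]
    return event


event = {"message": "Hello World DataStream"}
apply_substring(event, "message", 6, 11, target_field="extracted")
print(event)  # {'message': 'Hello World DataStream', 'extracted': 'World'}
```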

Details

Extracts a portion of a string based on a starting position and either an ending position or a length. Indexing is zero-based and counts characters rather than bytes, so Unicode strings are handled correctly.

You can specify the substring using either start + end (where end is exclusive) or start + length. If neither end nor length is set, the substring runs to the end of the string. Negative indices count back from the end of the string.
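The two ways of specifying a range, plus negative indices, fit in a few lines of Python. This is a sketch built on several assumptions:

- the name `substring` is hypothetical;
- supplying both `end` and `length` is treated as an error;
- a negative `start` is resolved against the string length before `length` is applied;
- positions are Python `str` indices (Unicode code points).

```python
from typing import Optional


def substring(text: str, start: int, end: Optional[int] = None,
              length: Optional[int] = None) -> str:
    """Hypothetical model of the start/end/length rules (not the real implementation)."""
    if end is not None and length is not None:
        # Assumption: end and length are alternatives, so both together is an error.
        raise ValueError("set either end or length, not both")
    if start < 0:
        # Negative start counts back from the end; overshooting clamps to 0.
        start = max(len(text) + start, 0)
    if length is not None:
        if length < 0:
            raise ValueError("length must be non-negative")
        end = start + length  # length is converted to an exclusive end
    # With end=None the slice runs to the end of the string. Python slicing also
    # clamps out-of-range bounds and returns "" when start >= end.
    return text[start:end]


print(substring("Hello World DataStream", 6, end=11))                        # World
print(substring("2024-01-15 DataStream processing started", 11, length=10))  # DataStream
print(substring("Hello World DataStream", -10))                              # DataStream
```

Converting length to an exclusive end keeps a single slicing path for both forms.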

note

The processor uses zero-based indexing, so the first character is at position 0. The end position is exclusive (the character at end is not included in the result), while length specifies how many characters to extract.

The processor handles Unicode characters properly and will not split multi-byte characters. If the specified range exceeds the string boundaries, it extracts only the available portion.
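Python makes the difference between character-based and byte-based slicing easy to see. A `str` is indexed by Unicode code points, while `bytes` is indexed by raw UTF-8 bytes. This example only illustrates why byte offsets can split a multi-byte character. The processor's exact unit (code points or grapheme clusters) is not stated here, so code points are an assumption.

```python
text = "caf\u00e9 au lait"  # "café au lait" with a precomposed é

# Slicing the str counts code points, so "é" stays intact.
by_chars = text[0:4]

# Slicing the UTF-8 bytes counts bytes; "é" is two bytes (0xC3 0xA9),
# so a 4-byte cut keeps only the first byte of it.
raw = text.encode("utf-8")[0:4]
by_bytes = raw.decode("utf-8", errors="replace")

print(by_chars)                 # café
print(by_bytes == "caf\ufffd")  # True: the dangling byte decodes to U+FFFD
```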

warning

If the starting position is beyond the string length or if start > end, the processor will return an empty string. Negative values that exceed the string length will be treated as position 0.
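These boundary rules match Python's own slice semantics, so plain slicing is a convenient reference model for checking expected results. This is an analogy, not a statement about how the processor is implemented. The sample string is arbitrary.

```python
text = "DataStream"  # 10 characters

cases = {
    "start beyond length": text[15:20],              # ''
    "start > end": text[6:2],                        # ''
    "range past the end": text[4:100],               # 'Stream' (only the available portion)
    "negative start past the beginning": text[-50:4],  # 'Data' (treated as position 0)
}

for label, result in cases.items():
    print(f"{label}: {result!r}")
```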

Examples

Basic Substring Extraction

Extracting the characters at positions 6 through 10 (start 6, end 11 exclusive)...

{
  "message": "Hello World DataStream"
}

- substring:
    field: message
    start: 6
    end: 11
    target_field: extracted

extracts "World":

{
  "message": "Hello World DataStream",
  "extracted": "World"
}

Length-based Extraction

Extracting 10 characters starting from position 11...

{
  "log_entry": "2024-01-15 DataStream processing started"
}

- substring:
    field: log_entry
    start: 11
    length: 10
    target_field: component

extracts the component name:

{
  "log_entry": "2024-01-15 DataStream processing started",
  "component": "DataStream"
}

Prefix Extraction

Extracting the first 14 characters as a prefix...

{
  "transaction_id": "TXN-2024-01-15-14-30-45-ABC123"
}

- substring:
    field: transaction_id
    start: 0
    length: 14
    target_field: date_prefix

extracts the ID prefix, including the date:

{
  "transaction_id": "TXN-2024-01-15-14-30-45-ABC123",
  "date_prefix": "TXN-2024-01-15"
}

In-place Trimming

Trimming a string to a fixed length in place (without target_field, the result overwrites the source field)...

{
  "long_description": "This is a very long description that needs to be shortened for display purposes"
}

- substring:
    field: long_description
    start: 0
    length: 30

trims to 30 characters:

{
  "long_description": "This is a very long descriptio"
}

URL Path Extraction

Extracting the path from a URL (with neither end nor length set, extraction continues to the end of the string)...

{
  "full_url": "https://api.example.com/v1/users/profile"
}

- substring:
    field: full_url
    start: 23
    target_field: api_path

extracts API path:

{
  "full_url": "https://api.example.com/v1/users/profile",
  "api_path": "/v1/users/profile"
}
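Because start is a fixed integer, it is easy to be off by one when counting characters by hand. A quick way to double-check an offset before writing the configuration is to compute it in Python. The variable names below are illustrative only.

```python
url = "https://api.example.com/v1/users/profile"

# The path begins at the first "/" after the "https://" prefix.
path_start = url.index("/", len("https://"))
print(path_start)        # 23
print(url[path_start:])  # /v1/users/profile

log_entry = "2024-01-15 DataStream processing started"
# Offset and length for the Length-based Extraction example.
print(log_entry.index("DataStream"), len("DataStream"))  # 11 10
```

The computed value is then hard-coded into the configuration. This check only helps when the prefix has the same width in every event.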
