Version: 1.2.0

Trim Last

Text Processing, String Manipulation, Data Cleaning

Synopsis

Removes a specified number of characters, or any of a set of predefined keywords, from the end of strings, providing precise control over suffix removal for data cleaning and normalization tasks.

Schema

- trim_last:
    field: <ident>
    count: <integer>
    keywords: <string[]>
    target_field: <ident>
    description: <text>
    if: <script>
    ignore_failure: <boolean>
    ignore_missing: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>
    tag: <string>

Configuration

The following fields are used to define the processor:

| Field | Required | Default | Description |
|---|---|---|---|
| field | Y | - | Field containing the string(s) to process |
| count | N | - | Number of characters to remove from end |
| keywords | N | - | Keywords to remove from end |
| target_field | N | field | Field to store the trimmed result |
| description | N | - | Explanatory note |
| if | N | - | Condition to run |
| ignore_failure | N | false | Continue if trimming fails |
| ignore_missing | N | false | Continue if source field doesn't exist |
| on_failure | N | - | See Handling Failures |
| on_success | N | - | See Handling Success |
| tag | N | - | Identifier |

Details

The processor supports two trimming modes: character count-based trimming and keyword-based trimming. Both modes can be used together, in which case keyword trimming is applied first and character trimming is applied to the result.
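The ordering described above can be modeled with a short Python sketch. This is only an illustration of the stated order (keywords first, then character count), not the processor's implementation; the function name `trim_last` and the choice to stop at the first matching keyword are assumptions.

```python
from typing import Sequence


def trim_last(value: str, count: int = 0, keywords: Sequence[str] = ()) -> str:
    """Hypothetical model: keyword trimming first, then character trimming."""
    # Step 1: remove the first keyword that matches the end of the string
    # (assumption: at most one keyword is removed per string).
    for keyword in keywords:
        if keyword and value.endswith(keyword):
            value = value[: len(value) - len(keyword)]
            break
    # Step 2: drop `count` characters from whatever remains.
    if count > 0:
        value = value[:-count] if count < len(value) else ""
    return value


print(trim_last("report.csv.bak", count=4, keywords=[".bak"]))  # report
```

In this model, ".bak" is removed first, leaving "report.csv", and the count of 4 then removes ".csv".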

Note: The processor supports both single strings and string arrays, applying the trimming operation to each string element.
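As a rough illustration of the element-wise behavior, the following Python sketch applies one trimming function either to a single string or to every element of a list. The helper names `apply_trim` and `strip_slash` are hypothetical and exist only for this example.

```python
from typing import Callable, List, Union


def apply_trim(
    value: Union[str, List[str]], trim: Callable[[str], str]
) -> Union[str, List[str]]:
    """Apply `trim` to a single string, or to each element of a list."""
    if isinstance(value, list):
        return [trim(item) for item in value]
    return trim(value)


def strip_slash(text: str) -> str:
    """Hypothetical trimming step: remove one trailing slash if present."""
    return text[:-1] if text.endswith("/") else text


urls = ["https://api.example.com/users/", "https://api.example.com/orders/"]
print(apply_trim(urls, strip_slash))
print(apply_trim("https://api.example.com/products/", strip_slash))
```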

Character count trimming removes the specified number of characters from the end of each string. If the count exceeds the string length, the entire string is removed, resulting in an empty string.
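The character count rule, including the case where the count exceeds the string length, can be sketched in Python as follows. The function name `trim_last_chars` is hypothetical, and leaving the string unchanged for a non-positive count is an assumption that the text above does not cover.

```python
def trim_last_chars(value: str, count: int) -> str:
    """Drop the last `count` characters; an oversized count yields ''."""
    if count <= 0:
        # Assumption: a zero or negative count leaves the string unchanged.
        return value
    if count >= len(value):
        # The count covers the whole string, so nothing is left.
        return ""
    return value[:-count]


print(trim_last_chars("document.pdf.tmp", 4))  # document.pdf
print(repr(trim_last_chars("abc", 10)))        # ''
```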

Keyword trimming removes matching suffixes from the end of strings. Multiple keywords can be specified, and each is checked sequentially for suffix matches.
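A minimal Python sketch of sequential suffix matching is shown below. It assumes that checking stops at the first keyword that matches and that only one suffix is removed per string; the text above does not state this, so treat both as assumptions. The function name `trim_last_keyword` is hypothetical.

```python
from typing import Sequence


def trim_last_keyword(value: str, keywords: Sequence[str]) -> str:
    """Remove the first keyword in `keywords` that is a suffix of `value`."""
    for keyword in keywords:
        # Skip empty keywords; every string trivially ends with "".
        if keyword and value.endswith(keyword):
            # Assumption: stop after the first match.
            return value[: len(value) - len(keyword)]
    return value  # no keyword matched, so the string is unchanged


print(trim_last_keyword("logs_2024.tar.gz", [".tar.gz", ".zip"]))  # logs_2024
print(trim_last_keyword("report.pdf", [".zip"]))                   # report.pdf
```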

Warning: Ensure the count parameter is a valid numeric value to avoid processing errors.

Examples

Character Count Trimming

Removing last characters from strings...

{
  "filename": "document.pdf.tmp",
  "log_entry": "Connection established successfully."
}

- trim_last:
    field: filename
    count: "4"
    target_field: clean_filename
- trim_last:
    field: log_entry
    count: "1"
    target_field: no_period

removes the suffixes:

{
  "filename": "document.pdf.tmp",
  "log_entry": "Connection established successfully.",
  "clean_filename": "document.pdf",
  "no_period": "Connection established successfully"
}

Keyword Trimming

Removing specific keywords from end...

{
  "temp_file": "backup_data.txt.bak",
  "archive_file": "logs_2024.tar.gz"
}

- trim_last:
    field: temp_file
    keywords: [".bak"]
    target_field: original_file
- trim_last:
    field: archive_file
    keywords: [".tar.gz", ".zip"]
    target_field: base_name

removes the file extensions:

{
  "temp_file": "backup_data.txt.bak",
  "archive_file": "logs_2024.tar.gz",
  "original_file": "backup_data.txt",
  "base_name": "logs_2024"
}

Array Processing

Processing string arrays...

{
  "urls": [
    "https://api.example.com/users/",
    "https://api.example.com/orders/",
    "https://api.example.com/products/"
  ]
}

- trim_last:
    field: urls
    keywords: ["/"]
    target_field: clean_urls

removes trailing slashes:

{
  "urls": [
    "https://api.example.com/users/",
    "https://api.example.com/orders/",
    "https://api.example.com/products/"
  ],
  "clean_urls": [
    "https://api.example.com/users",
    "https://api.example.com/orders",
    "https://api.example.com/products"
  ]
}

File Extension Removal

Removing various file extensions...

{
  "documents": [
    "report.pdf",
    "data.xlsx",
    "image.jpg",
    "archive.tar.gz"
  ]
}

- trim_last:
    field: documents
    keywords: [".pdf", ".xlsx", ".jpg", ".tar.gz", ".zip"]

removes all file extensions:

{
  "documents": [
    "report",
    "data",
    "image",
    "archive"
  ]
}

Combined Trimming

Using both keywords and character count...

{
  "log_message": "Database connection timeout error "
}

- trim_last:
    field: log_message
    count: "3"
    keywords: ["error"]
    target_field: clean_message

applies both trimming methods:

{
  "log_message": "Database connection timeout error ",
  "clean_message": "Database connection timeout"
}

Conditional Trimming

Trimming based on conditions...

{
  "request_path": "/api/v1/users.json",
  "format": "json"
}

- trim_last:
    field: request_path
    keywords: [".json"]
    if: "format == 'json'"
    target_field: clean_path

applies trimming when condition matches:

{
  "request_path": "/api/v1/users.json",
  "format": "json",
  "clean_path": "/api/v1/users"
}