Version: 1.2.0

Trim Last

Text Processing, String Manipulation, Data Cleaning

Synopsis

A text processor that removes a specified number of characters, or any of a set of predefined keywords, from the end of strings. It gives precise control over suffix removal for data cleaning and normalization tasks.

Schema

- trim_last:
    field: <ident>
    count: <integer>
    keywords: <string[]>
    target_field: <ident>
    description: <text>
    if: <script>
    ignore_failure: <boolean>
    ignore_missing: <boolean>
    on_failure: <processor[]>
    on_success: <processor[]>
    tag: <string>

Configuration

The following fields are used to define the processor:

Field	Required	Default	Description
`field`	Y	-	Field containing the string(s) to process
`count`	N	-	Number of characters to remove from end
`keywords`	N	-	Keywords to remove from end
`target_field`	N	`field`	Field to store the trimmed result
`description`	N	-	Explanatory note
`if`	N	-	Condition to run
`ignore_failure`	N	`false`	Continue if trimming fails
`ignore_missing`	N	`false`	Continue if source field doesn't exist
`on_failure`	N	-	See Handling Failures
`on_success`	N	-	See Handling Success
`tag`	N	-	Identifier

Details

The processor supports two trimming modes: character count-based trimming and keyword-based trimming. Both modes can be used in the same processor. In that case keyword trimming runs first, and character count trimming is then applied to the result.
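The order of the two modes matters whenever both are configured. The following Python sketch is only an illustrative model of the behavior described above, not the processor's implementation; the function names are hypothetical, and it handles a single keyword for brevity.

```python
def keywords_then_count(value: str, keyword: str, count: int) -> str:
    """Documented order: strip the keyword suffix, then drop `count` characters."""
    value = value.removesuffix(keyword)  # requires Python 3.9+
    return value[: max(len(value) - count, 0)]


def count_then_keywords(value: str, keyword: str, count: int) -> str:
    """Reverse order, shown only for contrast."""
    value = value[: max(len(value) - count, 0)]
    return value.removesuffix(keyword)


print(keywords_then_count("report_final.csv.bak", ".bak", 4))  # report_final
print(count_then_keywords("report_final.csv.bak", ".bak", 4))  # report_final.csv
```

In the reverse order the keyword no longer matches, because the count step has already consumed the suffix it was meant to find.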

Note: The processor accepts both single strings and arrays of strings. For an array, the trimming operation is applied to each string element.
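The string-or-array handling can be pictured as a small dispatch step. This is a sketch under the assumption that arrays contain only strings; `apply_to_strings` and `drop_trailing_slash` are hypothetical names used for illustration.

```python
from typing import Callable, List, Union

StringOrList = Union[str, List[str]]


def apply_to_strings(value: StringOrList, trim: Callable[[str], str]) -> StringOrList:
    """Apply `trim` to a single string, or to every element of a list of strings."""
    if isinstance(value, str):
        return trim(value)
    return [trim(item) for item in value]


def drop_trailing_slash(text: str) -> str:
    """Example trimming function: remove one trailing slash if present."""
    return text[:-1] if text.endswith("/") else text


print(apply_to_strings("https://api.example.com/users/", drop_trailing_slash))
print(apply_to_strings(["a/", "b"], drop_trailing_slash))
```

The shape of the input is preserved: a string yields a string, and a list yields a list of the same length.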

Character count trimming removes the specified number of characters from the end of each string. If the count exceeds the string length, the entire string is removed, resulting in an empty string.
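As a rough model of count-based trimming, including the case where the count exceeds the string length, consider the Python sketch below. `trim_count` is a hypothetical helper, and treating a zero or negative count as a no-op is an assumption, since the text above does not cover it.

```python
def trim_count(value: str, count: int) -> str:
    """Remove the last `count` characters; an oversized count yields ''."""
    if count <= 0:
        # Assumption: a zero or negative count leaves the string unchanged.
        return value
    # Clamp at zero so a count larger than the string gives an empty string.
    return value[: max(len(value) - count, 0)]


print(trim_count("document.pdf.tmp", 4))  # document.pdf
print(repr(trim_count("abc", 10)))        # ''
```

The explicit guard avoids a common slicing pitfall: in Python, `value[:-0]` is the same as `value[:0]` and would return an empty string.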

Keyword trimming removes matching suffixes from the end of strings. Multiple keywords can be specified, and each is checked sequentially for suffix matches.
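Keyword trimming can be modeled as a loop over the keyword list. The Python sketch below assumes that checking stops at the first keyword that matches; the text above does not say whether later keywords are still applied, so treat that as an assumption. `trim_keywords` is a hypothetical name.

```python
from typing import Sequence


def trim_keywords(value: str, keywords: Sequence[str]) -> str:
    """Remove the first keyword (in list order) that is a suffix of `value`."""
    for keyword in keywords:
        # Skip empty keywords; they would match every string.
        if keyword and value.endswith(keyword):
            return value[: len(value) - len(keyword)]
    return value  # no keyword matched, string is unchanged


print(trim_keywords("logs_2024.tar.gz", [".tar.gz", ".zip"]))  # logs_2024
print(trim_keywords("logs_2024.tar.gz", [".gz", ".tar.gz"]))   # logs_2024.tar
```

Under this model, overlapping keywords should be listed from longest to shortest, as the second call shows.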

Warning: Ensure the `count` parameter contains a valid numeric value to avoid processing errors.
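One way to catch an invalid count before it reaches a pipeline is to validate it up front. The Python sketch below is a hypothetical pre-check, not part of the processor. It assumes that both integers and quoted digit strings (as used in the examples on this page) are acceptable, and that negative values are not.

```python
def parse_count(raw: object) -> int:
    """Convert a configured count (int or digit string) to a non-negative int."""
    if isinstance(raw, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise ValueError(f"count must be a whole number, got {raw!r}")
    if isinstance(raw, int):
        count = raw
    elif isinstance(raw, str) and raw.strip().isascii() and raw.strip().isdigit():
        count = int(raw.strip())
    else:
        raise ValueError(f"count must be a whole number, got {raw!r}")
    if count < 0:
        raise ValueError(f"count must not be negative, got {count}")
    return count


print(parse_count("4"))  # 4
print(parse_count(1))    # 1
```

A value such as `"four"` or `"1.5"` raises `ValueError` in this sketch, so the problem surfaces at configuration time rather than during processing.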

Examples

Character Count Trimming

Removing last characters from strings...

{
  "filename": "document.pdf.tmp",
  "log_entry": "Connection established successfully."
}

- trim_last:
    field: filename
    count: "4"
    target_field: clean_filename
- trim_last:
    field: log_entry
    count: "1"
    target_field: no_period

removes the suffixes:

{
  "filename": "document.pdf.tmp",
  "log_entry": "Connection established successfully.",
  "clean_filename": "document.pdf",
  "no_period": "Connection established successfully"
}

Keyword Trimming

Removing specific keywords from end...

{
  "temp_file": "backup_data.txt.bak",
  "archive_file": "logs_2024.tar.gz"
}

- trim_last:
    field: temp_file
    keywords: [".bak"]
    target_field: original_file
- trim_last:
    field: archive_file
    keywords: [".tar.gz", ".zip"]
    target_field: base_name

removes the file extensions:

{
  "temp_file": "backup_data.txt.bak",
  "archive_file": "logs_2024.tar.gz",
  "original_file": "backup_data.txt",
  "base_name": "logs_2024"
}

Array Processing

Processing string arrays...

{
  "urls": [
    "https://api.example.com/users/",
    "https://api.example.com/orders/",
    "https://api.example.com/products/"
  ]
}

- trim_last:
    field: urls
    keywords: ["/"]
    target_field: clean_urls

removes trailing slashes:

{
  "urls": [
    "https://api.example.com/users/",
    "https://api.example.com/orders/",
    "https://api.example.com/products/"
  ],
  "clean_urls": [
    "https://api.example.com/users",
    "https://api.example.com/orders",
    "https://api.example.com/products"
  ]
}

File Extension Removal

Removing various file extensions...

{
  "documents": [
    "report.pdf",
    "data.xlsx",
    "image.jpg",
    "archive.tar.gz"
  ]
}

- trim_last:
    field: documents
    keywords: [".pdf", ".xlsx", ".jpg", ".tar.gz", ".zip"]

removes all file extensions:

{
  "documents": [
    "report",
    "data",
    "image",
    "archive"
  ]
}

Combined Trimming

Using both keywords and character count...

{
  "log_message": "Database connection timeout   error"
}

- trim_last:
    field: log_message
    count: "3"
    keywords: ["error"]
    target_field: clean_message

removes the keyword first, then the last three characters:

{
  "log_message": "Database connection timeout   error",
  "clean_message": "Database connection timeout"
}

Conditional Trimming

Trimming based on conditions...

{
  "request_path": "/api/v1/users.json",
  "format": "json"
}

- trim_last:
    field: request_path
    keywords: [".json"]
    if: "format == 'json'"
    target_field: clean_path

applies trimming when condition matches:

{
  "request_path": "/api/v1/users.json",
  "format": "json",
  "clean_path": "/api/v1/users"
}
