HTML Strip
Synopsis
Removes HTML tags from text fields while preserving the content between the tags.
Schema
html_strip:
- field: <ident>
- description: <text>
- if: <script>
- ignore_failure: <boolean>
- ignore_missing: <boolean>
- on_failure: <processor[]>
- on_success: <processor[]>
- tag: <string>
- target_field: <ident>
Configuration
Field | Required | Default | Description |
---|---|---|---|
field | Y | - | Source field containing HTML content |
description | N | - | Documentation note |
if | N | - | Condition that must be true for the processor to run |
ignore_failure | N | false | Ignore errors raised by this processor |
ignore_missing | N | false | Skip processing if the source field is missing |
on_failure | N | - | Error handling processors |
on_success | N | - | Success handling processors |
tag | N | - | Identifier for logging |
target_field | N | field | Output field for the stripped text; when omitted, the source field is overwritten |
Details
The processor keeps the result readable by preserving whitespace and text flow: it removes the markup tags but leaves the order of the text and the spacing between elements intact. This makes it useful for extracting plain text from HTML-formatted fields.
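The general technique can be sketched in Python with the standard library's html.parser module: tag tokens (and their attributes) are dropped, and text nodes are concatenated in document order, so whitespace already present in the text survives. This is a minimal illustration under that assumption, not the processor's actual implementation; TextCollector and strip_html are hypothetical names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps text nodes and discards tags together with their attributes."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags, in document order.
        self.parts.append(data)


def strip_html(markup):
    """Hypothetical helper: return only the text content of markup."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered at the end
    return "".join(collector.parts)


print(strip_html("<p>Hello, <b>world</b>!</p>"))        # → Hello, world!
print(strip_html("<ul><li>one</li> <li>two</li></ul>"))  # → one two
```

Because the default convert_charrefs=True setting decodes references such as &amp;, this sketch also unescapes entities; whether the processor does the same is not stated on this page.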
Typical uses include extracting text from user-submitted emails or web content to prepare it for display formatting, text analysis, search indexing, or natural language processing.
The processor expects valid HTML input. Malformed HTML may produce unexpected results.
Examples
Simple HTML
Removing tags from HTML content reveals the text content.
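A minimal Python model of the simple case, assuming a document is a flat dict and that, with target_field unset, the stripped text replaces the value of field (the default shown in the Configuration table). apply_html_strip and the message field are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps text nodes and discards tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


def apply_html_strip(doc, field, target_field=None):
    """Hypothetical model: target_field defaults to field (in-place update)."""
    doc[target_field or field] = strip_html(doc[field])
    return doc


event = {"message": "<p>Welcome to <strong>our site</strong>.</p>"}
print(apply_html_strip(event, "message"))
# → {'message': 'Welcome to our site.'}
```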
Complex HTML
Handling nested tags and attributes yields clean text content.
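The same hypothetical strip_html sketch handles nesting because it only keeps text nodes: attribute values such as href or title never reach the output. The input markup below is invented for illustration and is not taken from the processor's own examples.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps text nodes; tags and attribute values are discarded."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


html = (
    '<div class="post" id="p1">'
    '<p>Read <a href="https://example.com/docs" title="Docs">'
    "the <em>guide</em></a> first.</p>"
    "</div>"
)
print(strip_html(html))  # → Read the guide first.
```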
Keep Original
Writing the stripped text to a separate target_field preserves the original HTML in field.
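A minimal Python model of target_field, assuming documents are flat dicts: when target_field differs from field, the source value is left untouched and the plain text is written alongside it. apply_html_strip, body, and body_text are hypothetical names.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps text nodes and discards tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


def apply_html_strip(doc, field, target_field=None):
    """Hypothetical model: write to target_field, falling back to field."""
    doc[target_field or field] = strip_html(doc[field])
    return doc


event = {"body": "<p>Order <b>#123</b> shipped</p>"}
apply_html_strip(event, "body", target_field="body_text")
print(event)
# → {'body': '<p>Order <b>#123</b> shipped</p>', 'body_text': 'Order #123 shipped'}
```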
Conditionals
Stripping HTML only when needed: when the if condition is false, the processor is skipped and execution continues.
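The if option can be modeled in Python as a predicate evaluated before the processor runs. This is a hypothetical sketch: contains_markup stands in for whatever script the real if expression would hold, and documents are assumed to be flat dicts.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps text nodes and discards tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


def apply_html_strip(doc, field, target_field=None, condition=None):
    """Hypothetical model: the processor runs only when condition(doc) is true."""
    if condition is not None and not condition(doc):
        return doc  # condition false: processor skipped, pipeline continues
    doc[target_field or field] = strip_html(doc[field])
    return doc


def contains_markup(doc):
    """Hypothetical condition standing in for an if expression."""
    return "<" in doc.get("content", "")


print(apply_html_strip({"content": "<i>Hi</i> there"}, "content", condition=contains_markup))
# → {'content': 'Hi there'}
print(apply_html_strip({"content": "plain text"}, "content", condition=contains_markup))
# → {'content': 'plain text'}
```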
Error Handling
Handling missing or invalid fields lets execution continue instead of stopping the pipeline.
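A hypothetical Python model of ignore_missing and on_failure, assuming that a missing field or a non-string value counts as a failure. ProcessorError, record_error, and the error messages are invented for this sketch and are not the processor's real error output.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps text nodes and discards tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


class ProcessorError(Exception):
    """Hypothetical error type used only in this sketch."""


def apply_html_strip(doc, field, target_field=None, ignore_missing=False, on_failure=None):
    """Hypothetical model of ignore_missing and on_failure."""
    if field not in doc:
        if ignore_missing:
            return doc  # missing field tolerated: nothing to strip
        error = ProcessorError(f"field [{field}] is missing")
    elif not isinstance(doc[field], str):
        error = ProcessorError(f"field [{field}] is not a string")
    else:
        doc[target_field or field] = strip_html(doc[field])
        return doc
    if on_failure is None:
        raise error  # no handler: the failure propagates
    on_failure(doc, error)  # handler decides what to record
    return doc


def record_error(doc, error):
    """Hypothetical on_failure handler: annotate the document."""
    doc["error"] = str(error)


print(apply_html_strip({"title": "x"}, "body", ignore_missing=True))
# → {'title': 'x'}
print(apply_html_strip({"body": 42}, "body", on_failure=record_error))
# → {'body': 42, 'error': 'field [body] is not a string'}
```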