HTML Strip

Mutate Elastic Compatible

Synopsis

Removes HTML tags from text fields while preserving the content between the tags.

Schema

html_strip:
- field: <ident>
- description: <text>
- if: <script>
- ignore_failure: <boolean>
- ignore_missing: <boolean>
- on_failure: <processor[]>
- on_success: <processor[]>
- tag: <string>
- target_field: <ident>

Configuration

| Field | Required | Default | Description |
|---|---|---|---|
| field | Y | - | Source field containing HTML content |
| description | N | - | Documentation note |
| if | N | - | Conditional expression |
| ignore_failure | N | false | Skip processing errors |
| ignore_missing | N | false | Skip if input field missing |
| on_failure | N | - | Error handling processors |
| on_success | N | - | Success handling processors |
| tag | N | - | Identifier for logging |
| target_field | N | field | Output field for stripped text |
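
As a rough illustration of how the optional settings combine, the sketch below strips one field into another and labels the step. Only the option names come from the configuration table; the field names `body_html` and `body_text`, the description text, and the tag value are hypothetical.

```yaml
html_strip:
- field: body_html          # hypothetical source field holding HTML
- target_field: body_text   # hypothetical output field; defaults to `field` when omitted
- description: "Strip markup from the email body"
- tag: strip_email_body     # hypothetical identifier for logging
- ignore_missing: true      # skip documents that have no body_html
```

Omitting `target_field` would overwrite `body_html` with the stripped text, since the default output is the source field itself.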

Details

The processor maintains text readability by preserving whitespace and text flow, making it useful for extracting plain text content from HTML-formatted fields.

Note: Only the markup tags are removed; the order of the text and the spacing between elements are preserved.

Typical uses include extracting text from user-submitted emails or web content before display, textual analysis, search indexing, or natural language processing.

Warning: The processor expects valid HTML input. Malformed HTML may produce unexpected results.
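
Given that caveat, one cautious pattern is to write the stripped text to a separate field and flag documents that fail. The sketch below assumes the processor reports a failure for input it cannot handle, which this page does not confirm. The field names `raw_html`, `clean_text`, and `strip_status`, and the status value, are hypothetical.

```yaml
html_strip:
- field: raw_html            # hypothetical source field
- target_field: clean_text   # hypothetical output field; leaves the original HTML untouched
- on_failure:
  - set:
    - field: strip_status    # hypothetical status field
    - value: "html_strip_failed"
```

Keeping the original in `raw_html` means a document with unexpected output can still be inspected or reprocessed later.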

Examples

Simple HTML

Removing tags from HTML content...

{
"message": "<div>Hello <strong>World</strong>!</div>"
}

html_strip:
- field: message

reveals the text content:

{
"message": "Hello World!"
}

Complex HTML

Handling nested tags and attributes...

{
"html": "<div class='content'><h1>Title</h1><p>Text with <em>emphasis</em> and <a href='#'>links</a></p></div>"
}

html_strip:
- field: html
- target_field: text

yields clean text content:

{
"html": "<div class='content'><h1>Title</h1><p>Text with <em>emphasis</em> and <a href='#'>links</a></p></div>",
"text": "Title Text with emphasis and links"
}

Keep Original

Separating the original and stripped versions...

{
"content": "<p>First paragraph</p><p>Second paragraph</p>"
}

html_strip:
- field: content
- target_field: plain_text

preserves the original HTML:

{
"content": "<p>First paragraph</p><p>Second paragraph</p>",
"plain_text": "First paragraph Second paragraph"
}

Conditionals

Stripping HTML only when needed...

{
"message": "<span>Important notice</span>",
"should_strip": true
}

html_strip:
- field: message
- if: "ctx.should_strip == true"

removes the tags because the condition is true:

{
"message": "Important notice",
"should_strip": true
}

Error Handling

Handling missing or invalid fields...

{
"other_field": "value"
}

html_strip:
- field: html_content
- ignore_missing: true
- on_failure:
  - set:
    - field: processing_status
    - value: "field_missing"

continues execution:

{
"other_field": "value",
"processing_status": "field_missing"
}