HTML Strip

Filter Elastic Compatible

Synopsis

Removes HTML tags from text fields while preserving the content between the tags.

Schema

html_strip:
  - field: <ident>
  - description: <text>
  - if: <script>
  - ignore_failure: <boolean>
  - ignore_missing: <boolean>
  - on_failure: <processor[]>
  - on_success: <processor[]>
  - tag: <string>
  - target_field: <ident>

Configuration

Field	Required	Default	Description
`field`	Y	-	Source field containing HTML content
`description`	N	-	Documentation note
`if`	N	-	Conditional expression
`ignore_failure`	N	`false`	Continue processing if the processor fails
`ignore_missing`	N	`false`	Skip the processor if `field` is missing
`on_failure`	N	-	Error handling processors
`on_success`	N	-	Success handling processors
`tag`	N	-	Identifier for logging
`target_field`	N	`field`	Output field for stripped text
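
As a sketch of how the optional settings combine, the fragment below labels the processor with `tag` and `description`, writes the result to a separate field, and lets the pipeline continue when the input is absent or an error occurs. The field names `body_html` and `body_text` and the tag value are hypothetical and only serve as illustration.

```yaml
html_strip:
  - field: body_html            # hypothetical source field holding HTML
  - target_field: body_text     # hypothetical output field; body_html is left untouched
  - tag: strip_body_html        # hypothetical identifier for logging
  - description: Strip markup from the message body
  - ignore_missing: true        # skip when body_html is absent
  - ignore_failure: true        # do not stop the pipeline on a processing error
```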

Details

The processor maintains text readability by preserving whitespace and text flow, making it useful for extracting plain text content from HTML-formatted fields.

note

Removing the markup tags does not change the order of the text or the spacing between elements.

Typical uses include extracting text from user-submitted emails or web content ahead of display formatting, textual analysis, search indexing, or natural language processing.

warning

The processor expects valid HTML input. Malformed HTML may produce unexpected results.
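
Because malformed input may produce unexpected results, one cautious pattern is to write the stripped text to a separate field, so the original stays available for inspection, and to record a status if the processor fails. The following is a sketch using only the options listed in the schema; the field names and the status value are hypothetical, and this page does not specify whether malformed HTML triggers a failure or simply yields odd output.

```yaml
html_strip:
  - field: raw_html                  # hypothetical field with untrusted HTML
  - target_field: stripped_text      # keep raw_html intact for later inspection
  - on_failure:
      - set:
          - field: strip_status      # hypothetical status field
          - value: "html_strip_failed"
```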

Examples

Simple HTML

Removing tags from HTML content...

{
  "message": "<div>Hello <strong>World</strong>!</div>"
}

html_strip:
  - field: message

reveals the text content:

{
  "message": "Hello World!"
}

Complex HTML

Handling nested tags and attributes...

{
  "html": "<div class='content'><h1>Title</h1><p>Text with <em>emphasis</em> and <a href='#'>links</a></p></div>"
}

html_strip:
  - field: html
  - target_field: text

yields clean text content:

{
  "html": "<div class='content'><h1>Title</h1><p>Text with <em>emphasis</em> and <a href='#'>links</a></p></div>",
  "text": "Title Text with emphasis and links"
}

Keep Original

Separating the original and stripped versions...

{
  "content": "<p>First paragraph</p><p>Second paragraph</p>"
}

html_strip:
  - field: content
  - target_field: plain_text

preserves the original HTML:

{
  "content": "<p>First paragraph</p><p>Second paragraph</p>",
  "plain_text": "First paragraph Second paragraph"
}

Conditionals

Stripping HTML only when needed...

{
  "message": "<span>Important notice</span>",
  "should_strip": true
}

html_strip:
  - field: message
  - if: "ctx.should_strip == true"

strips the tags only when the condition holds:

{
  "message": "Important notice",
  "should_strip": true
}

Error Handling

Handling missing or invalid fields...

{
  "other_field": "value"
}

html_strip:
  - field: html_content
  - ignore_missing: true
  - on_failure:
      - set:
          - field: processing_status
          - value: "field_missing"

continues execution:

{
  "other_field": "value",
  "processing_status": "field_missing"
}
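
The `on_success` option listed in the schema can presumably be used in the same way as `on_failure`. The sketch below assumes it runs the nested processors after the HTML has been stripped; the status value `html_stripped` is hypothetical.

```yaml
html_strip:
  - field: content
  - target_field: plain_text
  - on_success:
      - set:
          - field: processing_status
          - value: "html_stripped"   # hypothetical marker for successfully stripped documents
```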
