All Implemented Interfaces: IAttributeSequenceHandler, ICDATASectionHandler, ICommentHandler, IDocTypeHandler, IDocumentHandler, IElementHandler, IMarkupHandler, IProcessingInstructionHandler, ITextHandler, IXMLDeclarationHandler
public final class TextOutputMarkupHandler extends AbstractMarkupHandler
Implementation of IMarkupHandler that writes received parsing events as text output, ignoring all events except text events. As a result, this handler effectively strips away all markup tags and other structures such as comments, CDATA, etc.
Note that, as with most handlers, this class is not thread-safe. Also, instances of this class should not be reused across parsing operations.
Sample usage:
final Writer writer = new StringWriter();
final IMarkupHandler handler = new TextOutputMarkupHandler(writer);
parser.parse(document, handler);
return writer.toString();
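To make the stripping behavior concrete without depending on the library, the following is a minimal sketch of the idea. `TextOnlySink`, `onText` and `onMarkup` are hypothetical names that are not part of attoparser, and the events are fired by hand instead of by a parser. It assumes the handler simply forwards each reported text range to the writer; this page does not show the real implementation.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in, NOT the attoparser class: it forwards text events
// to a Writer and silently drops every other event.
final class TextOnlySink {

    private final Writer writer;

    TextOnlySink(final Writer writer) {
        this.writer = writer;
    }

    // Text event: write exactly len chars of the buffer, starting at offset.
    void onText(final char[] buffer, final int offset, final int len) {
        try {
            writer.write(buffer, offset, len);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Any markup event (element, comment, CDATA, ...): intentionally ignored.
    void onMarkup(final char[] buffer, final int offset, final int len) {
        // no output
    }

    // Fires, by hand, the events a parser might report for "<b>Hi</b> there".
    static String demo() {
        final char[] doc = "<b>Hi</b> there".toCharArray();
        final StringWriter out = new StringWriter();
        final TextOnlySink sink = new TextOnlySink(out);
        sink.onMarkup(doc, 0, 3);  // <b>
        sink.onText(doc, 3, 2);    // Hi
        sink.onMarkup(doc, 5, 4);  // </b>
        sink.onText(doc, 9, 6);    // " there"
        return out.toString();
    }

    public static void main(final String[] args) {
        System.out.println(demo()); // prints "Hi there"
    }
}
```

Only the two text ranges reach the writer, so the output contains no tags.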
Constructor | Description |
---|---|
TextOutputMarkupHandler(java.io.Writer writer) | Creates a new instance of this handler. |
Modifier and Type | Method | Description |
---|---|---|
void | handleText(char[] buffer, int offset, int len, int line, int col) | Called when a text artifact is found. |
Methods inherited from class AbstractMarkupHandler: handleAttribute, handleAutoCloseElementEnd, handleAutoCloseElementStart, handleAutoOpenElementEnd, handleAutoOpenElementStart, handleCDATASection, handleCloseElementEnd, handleCloseElementStart, handleComment, handleDocType, handleDocumentEnd, handleDocumentStart, handleInnerWhiteSpace, handleOpenElementEnd, handleOpenElementStart, handleProcessingInstruction, handleStandaloneElementEnd, handleStandaloneElementStart, handleUnmatchedCloseElementEnd, handleUnmatchedCloseElementStart, handleXmlDeclaration, setParseConfiguration, setParseSelection, setParseStatus
public TextOutputMarkupHandler(java.io.Writer writer)
Creates a new instance of this handler.
Parameters:
writer - the writer to which output will be written.
public void handleText(char[] buffer, int offset, int len, int line, int col) throws ParseException
Description copied from interface: ITextHandler
Called when a text artifact is found.
A sequence of chars is considered to be text when it contains no structures of any kind. In markup parsers, for example, this means the sequence contains no tags (a.k.a. elements), DOCTYPEs, processing instructions, etc.
Text sequences might include any number of new line and/or control characters.
Text artifacts are reported using the document buffer directly. This buffer should not be considered immutable, so reported texts should be copied if they need to be stored (either by copying len chars from the buffer char[] starting at offset, or by creating a String from it using the same offset and length).
Implementations of this handler should never modify the document buffer.
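The copying advice above can be sketched as follows. `TextCollector` and `onText` are hypothetical names, not attoparser types; the method only mirrors the (buffer, offset, len) part of the `handleText` contract, and the buffer reuse is simulated by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical collector, NOT part of attoparser: it stores a copy of each
// reported text range instead of keeping a reference to the shared buffer.
final class TextCollector {

    private final List<String> texts = new ArrayList<>();

    // Same parameter shape as handleText; line and col are omitted for brevity.
    void onText(final char[] buffer, final int offset, final int len) {
        // new String(char[], int, int) copies the range, so later changes
        // to the buffer do not affect the stored value.
        texts.add(new String(buffer, offset, len));
    }

    List<String> texts() {
        return texts;
    }

    public static void main(final String[] args) {
        final char[] buffer = "<p>Hello</p>".toCharArray();
        final TextCollector collector = new TextCollector();
        collector.onText(buffer, 3, 5);        // the range holding "Hello"

        // Simulate the buffer being overwritten after the event returns.
        Arrays.fill(buffer, 'x');

        System.out.println(collector.texts()); // the stored copy is unaffected
    }
}
```

`Arrays.copyOfRange(buffer, offset, offset + len)` would work equally well when a `char[]` copy is preferred over a `String`.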
Specified by: handleText in interface ITextHandler
Overrides: handleText in class AbstractMarkupHandler
Parameters:
buffer - the document buffer (not copied).
offset - the offset (position in buffer) where the text artifact starts.
len - the length (in chars) of the text artifact, starting at offset.
line - the line in the original document where this text artifact starts.
col - the column in the original document where this text artifact starts.
Throws:
ParseException - if any exceptions occur during handling.
Copyright © 2018 The ATTOPARSER team. All rights reserved.