public interface IAttoHandler
Common interface for all handler implementations. An object implementing this interface has to be provided to IAttoParser objects in order to parse a document.
Event Handling
At its most basic, a handler processes four events:
- handleDocumentStart(int, int): triggered at the beginning of document parsing.
- handleDocumentEnd(int, int): triggered at the end of document parsing.
- handleText(char[], int, int, int, int): for texts inside the document being parsed, containing no instructions or metainformation of any kind.
- handleStructure(char[], int, int, int, int): for all kinds of directives, instructions, metainformation or formatting data inside the document.
For example, a markup-specialized parser (HTML and XML) will consider tags (a.k.a. elements), DOCTYPE clauses, etc. as structures.
Although document parsing events are, at their most basic level, only divided into texts and structures, some implementations of this interface may specialize events further, for example by differentiating between opening and closing elements, attributes, etc.
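The order in which the four basic events fire can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyEventDriver` and `RecordingHandler` are hypothetical names, the toy driver is not the library's parser and only handles single-line input, and `IAttoHandler` and `AttoParseException` are local stand-ins copied from the signatures on this page (with the exception assumed to be checked) so that the code compiles without the library.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-ins mirroring the signatures documented on this page, so the
// sketch compiles on its own; real code would use the library's types instead.
class AttoParseException extends Exception {
    AttoParseException(String message) { super(message); }
}

interface IAttoHandler {
    void handleDocumentStart(int line, int col) throws AttoParseException;
    void handleDocumentEnd(int line, int col) throws AttoParseException;
    void handleText(char[] buffer, int offset, int len, int line, int col) throws AttoParseException;
    void handleStructure(char[] buffer, int offset, int len, int line, int col) throws AttoParseException;
}

// Hypothetical handler that records every event as a readable string.
class RecordingHandler implements IAttoHandler {
    final List<String> events = new ArrayList<>();

    public void handleDocumentStart(int line, int col) {
        events.add("start@" + line + ":" + col);
    }

    public void handleDocumentEnd(int line, int col) {
        events.add("end@" + line + ":" + col);
    }

    public void handleText(char[] buffer, int offset, int len, int line, int col) {
        events.add("text[" + new String(buffer, offset, len) + "]@" + line + ":" + col);
    }

    public void handleStructure(char[] buffer, int offset, int len, int line, int col) {
        events.add("structure[" + new String(buffer, offset, len) + "]@" + line + ":" + col);
    }
}

// Hypothetical toy driver (NOT the library's parser): in a single-line document,
// every "<...>" run is reported as a structure and everything else as text.
class ToyEventDriver {
    static void parse(String document, IAttoHandler handler) throws AttoParseException {
        char[] buffer = document.toCharArray();
        handler.handleDocumentStart(1, 1);
        int i = 0;
        while (i < buffer.length) {
            int start = i;
            if (buffer[i] == '<') {
                while (i < buffer.length && buffer[i] != '>') { i++; }
                if (i == buffer.length) {
                    throw new AttoParseException("unclosed structure at column " + (start + 1));
                }
                i++; // include the closing '>'
                handler.handleStructure(buffer, start, i - start, 1, start + 1);
            } else {
                while (i < buffer.length && buffer[i] != '<') { i++; }
                handler.handleText(buffer, start, i - start, 1, start + 1);
            }
        }
        // End position is one past the last char: a convention of this toy only.
        handler.handleDocumentEnd(1, buffer.length + 1);
    }

    // Convenience wrapper: runs the toy driver and returns the recorded events.
    static List<String> events(String document) {
        RecordingHandler handler = new RecordingHandler();
        try {
            parse(document, handler);
        } catch (AttoParseException e) {
            throw new IllegalArgumentException(e);
        }
        return handler.events;
    }
}
```

Calling `ToyEventDriver.events("<p>Hi</p>")` yields one start event, a structure for `<p>`, a text for `Hi`, a structure for `</p>`, and one end event, in that order.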
Event features
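Most attohandler events have two important features:
- They report their artifact as a region of the document buffer (a char[] buffer plus offset and len) rather than as a copy, so artifacts must be copied if they need to be stored.
- They report the position (line and col) in the original document at which the artifact starts.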
Provided Handlers
Several IAttoHandler implementations with diverse levels of detail are provided out-of-the-box:
- AbstractAttoHandler: basic implementation only differentiating between text and structures.
- AbstractBasicMarkupAttoHandler: markup-specialized (XML and HTML) abstract handler able to differentiate among different types of markup structures: elements, comments, CDATA, DOCTYPE, etc.
- AbstractDetailedMarkupAttoHandler: markup-specialized (XML and HTML) abstract handler able not only to differentiate among different types of markup structures, but also to report lower-level detail inside elements (name, attributes, inner whitespace) and DOCTYPE clauses.
- AbstractStandardMarkupAttoHandler: higher-level markup-specialized (XML and HTML) abstract handler that offers an interface more similar to the standard SAX ContentHandlers (use of Strings instead of char[]s, attribute maps, etc.).
- AbstractDetailedXmlAttoHandler: XML-specialized abstract handler equivalent to AbstractDetailedMarkupAttoHandler but only allowing XML markup.
- AbstractStandardXmlAttoHandler: XML-specialized abstract handler equivalent to AbstractStandardMarkupAttoHandler but only allowing XML markup.
- DOMXmlAttoHandler: handler implementation (non-abstract) for building an attoDOM tree (DOM node trees based on classes from the org.attoparser.markup.dom package) from XML markup.
Creating handler implementations
The usual way to create an IAttoHandler implementation for parsing documents is to extend one of the provided abstract implementations (see above) and provide an implementation for the methods that are relevant for parsing.
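The extend-and-override pattern described above might look like the following sketch. It makes several assumptions: `NoOpHandler` is a hypothetical base class standing in for the provided abstract handlers (whose exact method sets are not listed on this page), `TextCollectingHandler` is an illustrative subclass, and `IAttoHandler` and `AttoParseException` are local stand-ins copied from the signatures documented here so the code compiles without the library.

```java
// Local stand-ins mirroring the signatures documented on this page; real code
// would use the library's own types instead.
class AttoParseException extends Exception {
    AttoParseException(String message) { super(message); }
}

interface IAttoHandler {
    void handleDocumentStart(int line, int col) throws AttoParseException;
    void handleDocumentEnd(int line, int col) throws AttoParseException;
    void handleText(char[] buffer, int offset, int len, int line, int col) throws AttoParseException;
    void handleStructure(char[] buffer, int offset, int len, int line, int col) throws AttoParseException;
}

// Hypothetical base class playing the role of a provided abstract handler:
// every event is a no-op, so subclasses override only what they need.
abstract class NoOpHandler implements IAttoHandler {
    public void handleDocumentStart(int line, int col) { }
    public void handleDocumentEnd(int line, int col) { }
    public void handleText(char[] buffer, int offset, int len, int line, int col) { }
    public void handleStructure(char[] buffer, int offset, int len, int line, int col) { }
}

// Overrides only the event relevant to this use case: collecting plain text.
class TextCollectingHandler extends NoOpHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleText(char[] buffer, int offset, int len, int line, int col) {
        // append(char[], int, int) copies the chars out of the shared buffer.
        text.append(buffer, offset, len);
    }

    String collectedText() {
        return text.toString();
    }
}
```

With the real library, the subclass would extend one of the abstract handlers listed above and be passed to a parser; here the events can simply be invoked by hand to see the effect.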
*Handling interfaces
Specific IAttoHandler implementations (abstract or concrete) usually aggregate event features by implementing the *Handling interfaces that define these features.
Thread safety
Unless otherwise specified, implementations of this interface are not thread-safe.
Method Summary | |
---|---|
void | handleDocumentEnd(int line, int col): Called at the end of document parsing. |
void | handleDocumentStart(int line, int col): Called at the beginning of document parsing. |
void | handleStructure(char[] buffer, int offset, int len, int line, int col): Called when a structure artifact is found. |
void | handleText(char[] buffer, int offset, int len, int line, int col): Called when a text artifact is found. |
Method Detail |
---|
void handleDocumentStart(int line, int col) throws AttoParseException
Called at the beginning of document parsing.
Parameters:
line - the line of the document where parsing starts (usually number 1)
col - the column of the document where parsing starts (usually number 1)
Throws:
AttoParseException
void handleDocumentEnd(int line, int col) throws AttoParseException
Called at the end of document parsing.
Parameters:
line - the line of the document where parsing ends (usually the last one)
col - the column of the document where parsing ends (usually the last one)
Throws:
AttoParseException
void handleText(char[] buffer, int offset, int len, int line, int col) throws AttoParseException
Called when a text artifact is found.
A sequence of chars is considered to be text when no structures of any kind are contained inside it. In markup parsers, for example, this means no tags (a.k.a. elements), DOCTYPEs, processing instructions, etc. are contained in the sequence.
Text sequences might include any number of new line and/or control characters.
Text artifacts are reported using the document buffer directly, and this buffer should not be considered immutable. Reported texts should therefore be copied if they need to be stored, either by copying len chars from the buffer char[] starting at offset or by creating a String from it using the same offset and len.
Implementations of this handler should never modify the document buffer.
Parameters:
buffer - the document buffer (not copied)
offset - the offset (position in buffer) where the text artifact starts.
len - the length (in chars) of the text artifact, starting at offset.
line - the line in the original document where this text artifact starts.
col - the column in the original document where this text artifact starts.
Throws:
AttoParseException
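The two copying options mentioned above can be sketched with standard JDK calls. The class and method names (`BufferCopySketch`, `copyChars`, `copyString`) are hypothetical helpers introduced only for illustration; `Arrays.copyOfRange` and the `String(char[], int, int)` constructor are the JDK's own.

```java
import java.util.Arrays;

// Hypothetical helper class showing two ways to keep an artifact after the
// event returns, given the (buffer, offset, len) triple passed to the handler.
class BufferCopySketch {

    // Copies len chars starting at offset into a fresh, independent char[].
    static char[] copyChars(char[] buffer, int offset, int len) {
        return Arrays.copyOfRange(buffer, offset, offset + len);
    }

    // Creates a String from the same region; the String constructor copies
    // the chars, so later changes to the buffer do not affect the result.
    static String copyString(char[] buffer, int offset, int len) {
        return new String(buffer, offset, len);
    }
}
```

Either copy stays valid even if the parser later overwrites or reuses the buffer, which a stored reference to the buffer itself would not.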
void handleStructure(char[] buffer, int offset, int len, int line, int col) throws AttoParseException
Called when a structure artifact is found.
Depending on the specific IAttoParser implementation being used, "structure" might have a different meaning. In markup-oriented parsers (like the provided default implementation, MarkupAttoParser), structures like tags (a.k.a. elements), DOCTYPEs, XML Declarations, processing instructions, etc. are reported using this event handler.
Lower-level IAttoHandler implementations will usually provide a finer-grained differentiation among the different types of structures (see for example AbstractBasicMarkupAttoHandler or AbstractDetailedMarkupAttoHandler).
Structure artifacts are reported using the document buffer directly, and this buffer should not be considered immutable. Reported structures should therefore be copied if they need to be stored, either by copying len chars from the buffer char[] starting at offset or by creating a String from it using the same offset and len.
Implementations of this handler should never modify the document buffer.
Parameters:
buffer - the document buffer (not copied)
offset - the offset (position in buffer) where the structure artifact starts.
len - the length (in chars) of the structure artifact, starting at offset.
line - the line in the original document where this structure artifact starts.
col - the column in the original document where this structure artifact starts.
Throws:
AttoParseException
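As a rough illustration of what a finer-grained differentiation of structures could involve, the sketch below inspects a structure in place, reading the buffer without copying or modifying it. This is an assumption-laden toy, not the logic used by the library's markup handlers: `StructureKindSketch`, its `Kind` enum, and the prefix rules are all hypothetical and deliberately simplistic.

```java
// Hypothetical classifier: decides what kind of markup structure a
// (buffer, offset, len) region holds by looking only at its first chars.
class StructureKindSketch {

    enum Kind { COMMENT, CLOSE_ELEMENT, OPEN_ELEMENT, OTHER }

    static Kind classify(char[] buffer, int offset, int len) {
        if (startsWith(buffer, offset, len, "<!--")) {
            return Kind.COMMENT;
        }
        if (startsWith(buffer, offset, len, "</")) {
            return Kind.CLOSE_ELEMENT;
        }
        if (len >= 2 && buffer[offset] == '<' && Character.isLetter(buffer[offset + 1])) {
            return Kind.OPEN_ELEMENT;
        }
        // DOCTYPEs, processing instructions, etc. are not told apart here.
        return Kind.OTHER;
    }

    // Compares the start of the region against prefix, reading the buffer
    // in place (no copy is made and the buffer is never written to).
    private static boolean startsWith(char[] buffer, int offset, int len, String prefix) {
        if (len < prefix.length()) {
            return false;
        }
        for (int i = 0; i < prefix.length(); i++) {
            if (buffer[offset + i] != prefix.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

A handler's handleStructure implementation could call such a helper with the arguments it receives and dispatch to more specific methods, which is the general idea behind the more specialized handlers mentioned above.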