org.attoparser
Interface IAttoHandler

All Known Implementing Classes:
AbstractAttoHandler, AbstractBasicMarkupAttoHandler, AbstractDetailedMarkupAttoHandler, AbstractStandardMarkupAttoHandler, DOMMarkupAttoHandler, DuplicatingBasicMarkupAttoHandler, DuplicatingDetailedMarkupAttoHandler, TracingBasicMarkupAttoHandler, TracingDetailedMarkupAttoHandler, TracingStandardMarkupAttoHandler

public interface IAttoHandler

Common interface for all handler implementations. An object implementing this interface must be provided to an IAttoParser object in order to parse a document.

Event Handling

At its most basic, a handler processes four events: document start, document end, text and structure.

For example, a markup-specialized parser (HTML and XML) will consider tags (a.k.a. elements), DOCTYPE clauses, etc. as structures.

Although document parsing events at their most basic level are divided only into texts and structures, some implementations of this interface may specialize events further, for example by differentiating between opening and closing elements, attributes, etc.

Event features

Most attohandler events have two important features:

Provided Handlers

Several IAttoHandler implementations with diverse levels of detail are provided out-of-the-box:

Creating handler implementations

The usual way to create an IAttoHandler implementation for parsing documents is to extend one of the provided abstract implementations (see above) and provide an implementation for the methods that are relevant for parsing.
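
As a rough illustration of what a handler implementation looks like, the sketch below implements the four methods documented on this page directly. It is only a sketch under stated assumptions: the IAttoHandler and AttoParseException declarations are local stand-ins that mirror the signatures shown here so that the example compiles without the library, CountingHandler and CountingHandlerDemo are hypothetical names, and the parser calls (including the line and column values) are simulated by hand. In real code one would more likely extend one of the provided abstract implementations, whose methods are not shown on this page.

```java
// Local stand-ins mirroring the signatures documented on this page, so that
// the sketch compiles on its own. In real code, use the library types instead.
class AttoParseException extends Exception {
    AttoParseException(String message) { super(message); }
}

interface IAttoHandler {
    void handleDocumentStart() throws AttoParseException;
    void handleDocumentEnd() throws AttoParseException;
    void handleText(char[] buffer, int offset, int len, int line, int col)
            throws AttoParseException;
    void handleStructure(char[] buffer, int offset, int len, int line, int col)
            throws AttoParseException;
}

// Hypothetical handler: counts events and the number of chars reported as text.
class CountingHandler implements IAttoHandler {
    int textEvents;
    int structureEvents;
    int textChars;
    boolean finished;

    @Override
    public void handleDocumentStart() {
        // Reset state so the same instance can be reused for a later parse.
        textEvents = 0;
        structureEvents = 0;
        textChars = 0;
        finished = false;
    }

    @Override
    public void handleDocumentEnd() {
        finished = true;
    }

    @Override
    public void handleText(char[] buffer, int offset, int len, int line, int col) {
        textEvents++;
        textChars += len;
    }

    @Override
    public void handleStructure(char[] buffer, int offset, int len, int line, int col) {
        structureEvents++;
    }
}

class CountingHandlerDemo {
    // Simulates the events a markup parser might report for "<p>Hi</p>".
    // The line/col values are illustrative and assume 1-based positions.
    static CountingHandler run() {
        char[] doc = "<p>Hi</p>".toCharArray();
        CountingHandler handler = new CountingHandler();
        handler.handleDocumentStart();
        handler.handleStructure(doc, 0, 3, 1, 1); // <p>
        handler.handleText(doc, 3, 2, 1, 4);      // Hi
        handler.handleStructure(doc, 5, 4, 1, 6); // </p>
        handler.handleDocumentEnd();
        return handler;
    }

    public static void main(String[] args) {
        CountingHandler h = run();
        System.out.println(h.structureEvents + " structures, "
                + h.textEvents + " text events, " + h.textChars + " text chars");
    }
}
```

The handler keeps only counters, so it never needs to retain a reference to the document buffer.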

*Handling interfaces

Specific IAttoHandler implementations (abstract or concrete) usually aggregate event features by implementing the *Handling interfaces that define those features.

Thread safety

Unless otherwise specified, implementations of this interface are not thread-safe.
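
One common way to live with a handler that is not thread-safe is to create a separate instance for each parsing task instead of sharing one across threads. The following is a minimal sketch of that idea, not something prescribed by the library: LengthHandler and PerTaskHandlerDemo are hypothetical names, the handler is shown without an implements clause so that the example compiles on its own, and the parser call is simulated by invoking handleText directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stateful handler: the unsynchronized counter makes it unsafe
// to share between threads.
class LengthHandler {
    int total;

    public void handleText(char[] buffer, int offset, int len, int line, int col) {
        total += len;
    }
}

class PerTaskHandlerDemo {
    // Processes each document in its own task, with its own handler instance.
    static List<Integer> run(List<String> documents) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String document : documents) {
                Callable<Integer> task = () -> {
                    // One handler per task; it is never visible to other threads.
                    LengthHandler handler = new LengthHandler();
                    char[] buffer = document.toCharArray();
                    // Stand-in for a real parse: report the whole document as text.
                    handler.handleText(buffer, 0, buffer.length, 1, 1);
                    return handler.total;
                };
                futures.add(pool.submit(task));
            }
            List<Integer> totals = new ArrayList<>();
            for (Future<Integer> future : futures) {
                totals.add(future.get());
            }
            return totals;
        } catch (Exception e) {
            throw new IllegalStateException("parsing task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> documents = new ArrayList<>();
        documents.add("ab");
        documents.add("cde");
        System.out.println(run(documents));
    }
}
```

Results are collected in submission order, so each total lines up with the document it was computed from.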

Since:
1.0
Author:
Daniel Fernández

Method Summary
 void handleDocumentEnd()
           Called at the end of document parsing.
 void handleDocumentStart()
           Called at the beginning of document parsing.
 void handleStructure(char[] buffer, int offset, int len, int line, int col)
           Called when a structure artifact is found.
 void handleText(char[] buffer, int offset, int len, int line, int col)
           Called when a text artifact is found.
 

Method Detail

handleDocumentStart

void handleDocumentStart()
                         throws AttoParseException

Called at the beginning of document parsing.

Throws:
AttoParseException

handleDocumentEnd

void handleDocumentEnd()
                       throws AttoParseException

Called at the end of document parsing.

Throws:
AttoParseException

handleText

void handleText(char[] buffer,
                int offset,
                int len,
                int line,
                int col)
                throws AttoParseException

Called when a text artifact is found.

A sequence of chars is considered to be text when it contains no structures of any kind. In markup parsers, for example, this means the sequence contains no tags (a.k.a. elements), DOCTYPEs, processing instructions, etc.

Text sequences might include any number of new line and/or control characters.

Text artifacts are reported using the document buffer directly, and this buffer should not be considered immutable. Reported texts should therefore be copied if they need to be stored, either by copying len chars from the buffer char[] starting at offset or by creating a String from the same range.

Implementations of this handler should never modify the document buffer.
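
The two copying options described above can be sketched as follows. This is an illustrative example only: TextCollector and TextCollectorDemo are hypothetical names, the class is shown without an implements clause so that it compiles on its own, and the buffer reuse is simulated by overwriting the array after the call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical collector that stores copies of reported text, never the buffer.
class TextCollector {
    final List<String> texts = new ArrayList<>();
    char[] lastChars = new char[0];

    public void handleText(char[] buffer, int offset, int len, int line, int col) {
        // Option 1: create a String; this constructor copies the len chars.
        texts.add(new String(buffer, offset, len));
        // Option 2: copy the same range into a fresh char[].
        lastChars = Arrays.copyOfRange(buffer, offset, offset + len);
    }
}

class TextCollectorDemo {
    static TextCollector run() {
        char[] buffer = "Hello world".toCharArray();
        TextCollector collector = new TextCollector();
        collector.handleText(buffer, 0, 5, 1, 1);
        // Simulate the buffer contents changing after the event was reported.
        Arrays.fill(buffer, 'x');
        return collector;
    }

    public static void main(String[] args) {
        TextCollector collector = run();
        // Both copies still hold the reported text.
        System.out.println(collector.texts.get(0));
        System.out.println(new String(collector.lastChars));
    }
}
```

Had the collector stored the buffer reference together with offset and len instead, it would read back the overwritten contents after the buffer changed.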

Parameters:
buffer - the document buffer (not copied)
offset - the offset (position in buffer) where the text artifact starts.
len - the length (in chars) of the text artifact, starting in offset.
line - the line in the original document where this text artifact starts.
col - the column in the original document where this text artifact starts.
Throws:
AttoParseException

handleStructure

void handleStructure(char[] buffer,
                     int offset,
                     int len,
                     int line,
                     int col)
                     throws AttoParseException

Called when a structure artifact is found.

Depending on the specific IAttoParser implementation being used, "structure" might have a different meaning. In markup-oriented parsers (like the provided default MarkupAttoParser implementation), structures such as tags (a.k.a. elements), DOCTYPEs, XML Declarations, processing instructions, etc. are reported using this event handler.

Lower-level IAttoHandler implementations will usually provide a finer-grained differentiation among the different types of structures (see for example AbstractBasicMarkupAttoHandler or AbstractDetailedMarkupAttoHandler).

Structure artifacts are reported using the document buffer directly, and this buffer should not be considered immutable. Reported structures should therefore be copied if they need to be stored, either by copying len chars from the buffer char[] starting at offset or by creating a String from the same range.

Implementations of this handler should never modify the document buffer.
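
To show how the position parameters might be used together with a copied structure, here is a small sketch that logs each reported structure with its line and column. The names StructureLog and StructureLogDemo are hypothetical, the class omits the implements clause so that it compiles on its own, and the events (including the assumption that line and col are 1-based) are simulated by hand rather than produced by a real parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler fragment: records "line:col structure" entries.
class StructureLog {
    final List<String> entries = new ArrayList<>();

    public void handleStructure(char[] buffer, int offset, int len, int line, int col) {
        // Copy the structure out of the buffer before storing it.
        String structure = new String(buffer, offset, len);
        entries.add(line + ":" + col + " " + structure);
    }
}

class StructureLogDemo {
    // Simulates the structure events a markup parser might report.
    static StructureLog run() {
        char[] doc = "<!DOCTYPE html>\n<p>Hi</p>".toCharArray();
        StructureLog log = new StructureLog();
        log.handleStructure(doc, 0, 15, 1, 1); // <!DOCTYPE html>
        log.handleStructure(doc, 16, 3, 2, 1); // <p>
        log.handleStructure(doc, 21, 4, 2, 6); // </p>
        return log;
    }

    public static void main(String[] args) {
        for (String entry : run().entries) {
            System.out.println(entry);
        }
    }
}
```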

Parameters:
buffer - the document buffer (not copied)
offset - the offset (position in buffer) where the structure artifact starts.
len - the length (in chars) of the structure artifact, starting in offset.
line - the line in the original document where this structure artifact starts.
col - the column in the original document where this structure artifact starts.
Throws:
AttoParseException


Copyright © 2012 The ATTOPARSER team. All Rights Reserved.