All Implemented Interfaces: java.io.Serializable, Cloneable
public final class ParseConfiguration
extends Object
implements java.io.Serializable, Cloneable
Models a series of parsing configurations that can be applied during document parsing by MarkupParser and its variants SimpleMarkupParser and DOMMarkupParser.
Among others, the parameters that can be configured are:
The htmlConfiguration() and xmlConfiguration() static methods act as starting points for configuration. Once one of these pre-initialized configurations has been created, it can be fine-tuned for the user's needs.
Note that these configuration objects are mutable, so they should not be modified once they have been passed to a parser to initialize it.
Instances of this class can be cloned, so creating a variant of an already-tuned configuration is easy.
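The tune-then-clone workflow described above can be sketched as follows. This is only an illustration: `SketchConfig` is a hypothetical stand-in (not part of the library) that mimics two of the documented accessors and `clone()`, so that the snippet compiles without the parser on the classpath. With the real class, the starting point would be htmlConfiguration() or xmlConfiguration().

```java
// Hypothetical stand-in for ParseConfiguration: only two flags and clone().
final class SketchConfig implements Cloneable {
    private boolean caseSensitive;
    private boolean textSplittable;

    // Assumed preset: mirrors the documented HTML rule that case sensitivity is off.
    static SketchConfig htmlLike() {
        SketchConfig config = new SketchConfig();
        config.caseSensitive = false;
        config.textSplittable = false;
        return config;
    }

    boolean isCaseSensitive() { return caseSensitive; }
    void setCaseSensitive(boolean caseSensitive) { this.caseSensitive = caseSensitive; }
    boolean isTextSplittable() { return textSplittable; }
    void setTextSplittable(boolean textSplittable) { this.textSplittable = textSplittable; }

    @Override
    public SketchConfig clone() throws CloneNotSupportedException {
        // All fields are primitives, so the shallow copy is fully independent.
        return (SketchConfig) super.clone();
    }
}

final class ConfigVariantDemo {
    // Returns a variant of 'base' with text splitting enabled; 'base' is not modified.
    static SketchConfig splittableVariantOf(SketchConfig base) {
        try {
            SketchConfig variant = base.clone();
            variant.setTextSplittable(true);
            return variant;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException("clone() unexpectedly unsupported", e);
        }
    }

    public static void main(String[] args) {
        SketchConfig base = SketchConfig.htmlLike();
        SketchConfig variant = splittableVariantOf(base);
        System.out.println(base.isTextSplittable() + " " + variant.isTextSplittable());
    }
}
```

Cloning before tuning keeps the configuration already handed to a parser untouched, which is the point of the mutability note above.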
Modifier and Type | Class | Description |
---|---|---|
static class | ParseConfiguration.ElementBalancing | Enumeration representing the possible actions to be taken with regard to element balancing. |
static class | ParseConfiguration.ParsingMode | Enumeration used for determining the parsing mode, which will affect the parser's behaviour. |
static class | ParseConfiguration.PrologParseConfiguration | Class encapsulating the configuration parameters used for parsing and validating the "prolog" section of a markup document. |
static class | ParseConfiguration.PrologPresence | Enumeration used for determining whether an element in the document prolog (DOCTYPE, XML Declaration) or the prolog itself should be allowed, required or even forbidden. |
static class | ParseConfiguration.UniqueRootElementPresence | Enumeration used for determining the behaviour the parser should have with respect to the presence and number of root elements in the parsed document. |
Modifier and Type | Method | Description |
---|---|---|
ParseConfiguration | clone() | |
ParseConfiguration.ElementBalancing | getElementBalancing() | Returns the level of element balancing required for the document being parsed, enabling auto-closing of elements if needed. |
ParseConfiguration.ParsingMode | getMode() | Returns the parsing mode to be used. |
ParseConfiguration.PrologParseConfiguration | getPrologParseConfiguration() | Returns the ParseConfiguration.PrologParseConfiguration object determining the way in which prolog (XML Declaration, DOCTYPE) will be dealt with during parsing. |
ParseConfiguration.UniqueRootElementPresence | getUniqueRootElementPresence() | This value determines whether it will be required that the document has a unique root element. |
static ParseConfiguration | htmlConfiguration() | Returns an instance of ParseConfiguration containing a valid configuration set for most HTML scenarios. |
boolean | isCaseSensitive() | Returns whether validations performed on the parsed document should be case sensitive or not (e.g. attribute names, document root element name, element open vs close elements, etc.) |
boolean | isNoUnmatchedCloseElementsRequired() | Returns whether unmatched close elements (those not matching any equivalent open elements) are allowed or not. |
boolean | isTextSplittable() | Returns whether text fragments in markup can be split into more than one text node if they occupy more than an entire buffer in size. |
boolean | isUniqueAttributesInElementRequired() | Returns whether attributes should never appear duplicated in elements. |
boolean | isXmlWellFormedAttributeValuesRequired() | Returns whether element attributes will be required to be well-formed from the XML standpoint. |
void | setCaseSensitive(boolean caseSensitive) | Specifies whether validations performed on the parsed document should be case sensitive or not (e.g. attribute names, document root element name, element open vs close elements, etc.) |
void | setElementBalancing(ParseConfiguration.ElementBalancing elementBalancing) | Specifies the level of element balancing required for the document being parsed, enabling auto-closing of elements if needed. |
void | setMode(ParseConfiguration.ParsingMode mode) | Specifies the parsing mode to be used. |
void | setNoUnmatchedCloseElementsRequired(boolean noUnmatchedCloseElementsRequired) | Specifies whether unmatched close elements (those not matching any equivalent open elements) are allowed or not. |
void | setTextSplittable(boolean textSplittable) | Specifies whether text fragments in markup can be split into more than one text node if they occupy more than an entire buffer in size. |
void | setUniqueAttributesInElementRequired(boolean uniqueAttributesInElementRequired) | Specifies whether attributes should never appear duplicated in elements. |
void | setUniqueRootElementPresence(ParseConfiguration.UniqueRootElementPresence uniqueRootElementPresence) | This value determines whether it will be required that the document has a unique root element. |
void | setXmlWellFormedAttributeValuesRequired(boolean xmlWellFormedAttributeValuesRequired) | Specifies whether element attributes will be required to be well-formed from the XML standpoint. |
static ParseConfiguration | xmlConfiguration() | Returns an instance of ParseConfiguration containing a valid configuration set for most XML scenarios. |
public static ParseConfiguration htmlConfiguration()
Returns an instance of ParseConfiguration containing a valid configuration set for most HTML scenarios.
ParseConfiguration.ParsingMode.HTML
ParseConfiguration.ElementBalancing.AUTO_CLOSE
ParseConfiguration.UniqueRootElementPresence.NOT_VALIDATED
public static ParseConfiguration xmlConfiguration()
Returns an instance of ParseConfiguration containing a valid configuration set for most XML scenarios.
ParseConfiguration.ParsingMode.XML
ParseConfiguration.ElementBalancing.REQUIRE_BALANCED
ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE
ParseConfiguration.PrologPresence.ALLOWED
ParseConfiguration.PrologPresence.ALLOWED
ParseConfiguration.PrologPresence.ALLOWED
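The two presets differ, among other things, in element balancing: AUTO_CLOSE for HTML and REQUIRE_BALANCED for XML. The following sketch illustrates the general idea with a simple stack. It is not the parser's actual algorithm: the `BalancingSketch` class, the tag strings and the event names are all hypothetical, and HTML-specific rules (optional tags, void elements) are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of two balancing strategies; not library code.
final class BalancingSketch {
    enum Balancing { REQUIRE_BALANCED, AUTO_CLOSE }

    // Takes tags such as "<ul>" or "</ul>" and returns the events that would be reported.
    static List<String> balance(List<String> tags, Balancing mode) {
        Deque<String> open = new ArrayDeque<>();   // stack of currently open elements
        List<String> events = new ArrayList<>();
        for (String tag : tags) {
            if (!tag.startsWith("</")) {
                String name = tag.substring(1, tag.length() - 1);
                open.push(name);
                events.add("open " + name);
                continue;
            }
            String name = tag.substring(2, tag.length() - 1);
            if (mode == Balancing.REQUIRE_BALANCED) {
                // The close element must match the innermost open element.
                if (!name.equals(open.peek())) {
                    throw new IllegalArgumentException("Unbalanced close element: " + name);
                }
                open.pop();
                events.add("close " + name);
            } else if (open.contains(name)) {
                // Force-close every element still open inside the one being closed.
                while (!open.peek().equals(name)) {
                    events.add("auto-close " + open.pop());
                }
                open.pop();
                events.add("close " + name);
            } else {
                // Simplification: unmatched close elements are only reported here.
                events.add("unmatched-close " + name);
            }
        }
        if (mode == Balancing.REQUIRE_BALANCED && !open.isEmpty()) {
            throw new IllegalArgumentException("Unclosed element: " + open.peek());
        }
        while (!open.isEmpty()) {
            events.add("auto-close " + open.pop());
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> fragment = Arrays.asList("<ul>", "<li>", "</ul>");
        System.out.println(balance(fragment, Balancing.AUTO_CLOSE));
    }
}
```

For the fragment `<ul><li></ul>`, the sketch reports `open ul`, `open li`, `auto-close li`, `close ul` in AUTO_CLOSE mode, and throws an exception in REQUIRE_BALANCED mode.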
public ParseConfiguration.ParsingMode getMode()
Return the parsing mode to be used. Can be XML or HTML.
Depending on the selected mode, parsers will behave differently, given that HTML has some specific rules which are not XML-compatible (such as void elements like <meta>, which may appear unclosed).
public void setMode(ParseConfiguration.ParsingMode mode)
Specify the parsing mode to be used. Can be XML or HTML.
Depending on the selected mode, parsers will behave differently, given that HTML has some specific rules which are not XML-compatible (such as void elements like <meta>, which may appear unclosed).
mode - the parsing mode to be used.
public boolean isCaseSensitive()
Returns whether validations performed on the parsed document should be case sensitive or not (e.g. attribute names, document root element name, element open vs close elements, etc.)
HTML requires this parameter to be false. Default for XML is true.
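As a rough illustration of what the flag means for one of the validations mentioned above (open vs close element names), consider the following sketch. The `CaseSensitivitySketch` class and its `namesMatch` method are hypothetical and only model the comparison; they are not the parser's implementation.

```java
// Hypothetical helper: compares an open element name with a close element name.
final class CaseSensitivitySketch {
    static boolean namesMatch(String openName, String closeName, boolean caseSensitive) {
        // With case sensitivity off (as HTML requires), "DIV" and "div" are the same element.
        return caseSensitive
                ? openName.equals(closeName)
                : openName.equalsIgnoreCase(closeName);
    }

    public static void main(String[] args) {
        System.out.println(namesMatch("DIV", "div", false)); // HTML-like comparison
        System.out.println(namesMatch("DIV", "div", true));  // XML-like comparison
    }
}
```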
public void setCaseSensitive(boolean caseSensitive)
Specify whether validations performed on the parsed document should be case sensitive or not (e.g. attribute names, document root element name, element open vs close elements, etc.)
HTML requires this parameter to be false. Default for XML is true.
caseSensitive - whether validations should be case sensitive or not.
public boolean isTextSplittable()
Returns whether text fragments in markup can be split into more than one text node if they occupy more than an entire buffer in size.
Default is false.
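The observable effect of this flag can be sketched as below. How the parser actually manages its buffer is not described here, so the `TextSplitSketch` class, the `textNodes` method and the explicit `bufferSize` parameter are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of text splitting; not the parser's buffer handling.
final class TextSplitSketch {
    // Returns the text nodes that would be reported for one text fragment.
    static List<String> textNodes(String text, int bufferSize, boolean textSplittable) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        List<String> nodes = new ArrayList<>();
        if (!textSplittable || text.length() <= bufferSize) {
            nodes.add(text); // reported as a single node
            return nodes;
        }
        // Split into consecutive chunks of at most bufferSize characters.
        for (int start = 0; start < text.length(); start += bufferSize) {
            int end = Math.min(text.length(), start + bufferSize);
            nodes.add(text.substring(start, end));
        }
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(textNodes("abcdefgh", 3, true));
        System.out.println(textNodes("abcdefgh", 3, false));
    }
}
```

Code consuming text events with splitting enabled should therefore be prepared to concatenate consecutive text nodes.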
public void setTextSplittable(boolean textSplittable)
Specify whether text fragments in markup can be split into more than one text node if they occupy more than an entire buffer in size.
Default is false.
textSplittable - whether text fragments can be split or not.
public ParseConfiguration.ElementBalancing getElementBalancing()
Returns the level of element balancing required at the document being parsed, enabling auto-closing of elements if needed.
Possible values are:
- ParseConfiguration.ElementBalancing.NO_BALANCING: Do not perform element balancing checks at all. Events will be reported as they appear. There is no guarantee that a DOM tree can be built from the fired events though.
- ParseConfiguration.ElementBalancing.REQUIRE_BALANCED: Require that elements are already correctly balanced in markup, throwing an exception if not. Note that when in HTML mode, this does not require the specification of optional tags such as <tbody>. Also note that this will automatically consider the setNoUnmatchedCloseElementsRequired(boolean) flag to be set to true.
- ParseConfiguration.ElementBalancing.AUTO_OPEN_CLOSE: Auto open and close elements, which includes both those elements that, according to the HTML spec (when in HTML mode) have optional start or end tags (see http://www.w3.org/html/wg/drafts/html/master/syntax.html#optional-tags) and those that simply are unclosed at the moment a parent element needs to be closed (so their closing is forced). As an example of optional tags, the HTML5 spec establishes that <html>, <body> and <tbody> are optional, and that an <li> will close any currently open <li> elements. This is not really ill-formed code, but something allowed by the spec. All of these will be reported as auto-* events by the parser.
- ParseConfiguration.ElementBalancing.AUTO_CLOSE: Equivalent to ParseConfiguration.ElementBalancing.AUTO_OPEN_CLOSE but not performing any auto-open operations, so that processing of HTML fragments is possible (no <html> or <body> elements are automatically added).
public void setElementBalancing(ParseConfiguration.ElementBalancing elementBalancing)
Specify the level of element balancing required at the document being parsed, enabling auto-closing of elements if needed.
Possible values are:
- ParseConfiguration.ElementBalancing.NO_BALANCING: Do not perform element balancing checks at all. Events will be reported as they appear. There is no guarantee that a DOM tree can be built from the fired events though.
- ParseConfiguration.ElementBalancing.REQUIRE_BALANCED: Require that elements are already correctly balanced in markup, throwing an exception if not. Note that when in HTML mode, this does not require the specification of optional tags such as <tbody>. Also note that this will automatically consider the setNoUnmatchedCloseElementsRequired(boolean) flag to be set to true.
- ParseConfiguration.ElementBalancing.AUTO_OPEN_CLOSE: Auto open and close elements, which includes both those elements that, according to the HTML spec (when in HTML mode) have optional start or end tags (see http://www.w3.org/html/wg/drafts/html/master/syntax.html#optional-tags) and those that simply are unclosed at the moment a parent element needs to be closed (so their closing is forced). As an example of optional tags, the HTML5 spec establishes that <html>, <body> and <tbody> are optional, and that an <li> will close any currently open <li> elements. This is not really ill-formed code, but something allowed by the spec. All of these will be reported as auto-* events by the parser.
- ParseConfiguration.ElementBalancing.AUTO_CLOSE: Equivalent to ParseConfiguration.ElementBalancing.AUTO_OPEN_CLOSE but not performing any auto-open operations, so that processing of HTML fragments is possible (no <html> or <body> elements are automatically added).
elementBalancing - the level of element balancing.
public ParseConfiguration.PrologParseConfiguration getPrologParseConfiguration()
Returns the ParseConfiguration.PrologParseConfiguration object determining the way in which prolog (XML Declaration, DOCTYPE) will be dealt with during parsing.
public boolean isNoUnmatchedCloseElementsRequired()
Returns whether unmatched close elements (those not matching any equivalent open elements) are allowed or not.
public void setNoUnmatchedCloseElementsRequired(boolean noUnmatchedCloseElementsRequired)
Specify whether unmatched close elements (those not matching any equivalent open elements) are allowed or not.
noUnmatchedCloseElementsRequired - whether unmatched close elements will be allowed (false) or not (true).
public boolean isXmlWellFormedAttributeValuesRequired()
Returns whether element attributes will be required to be well-formed from the XML standpoint. This means:
public void setXmlWellFormedAttributeValuesRequired(boolean xmlWellFormedAttributeValuesRequired)
Specify whether element attributes will be required to be well-formed from the XML standpoint. This means:
xmlWellFormedAttributeValuesRequired - whether attributes should be XML-well-formed or not.
public boolean isUniqueAttributesInElementRequired()
Returns whether attributes should never appear duplicated in elements.
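A check of this kind can be sketched with a set of already-seen names. The `UniqueAttributesSketch` class and `firstDuplicate` method are hypothetical, and the sketch compares names exactly; whether the real check is case-insensitive when isCaseSensitive() is false is not stated here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical duplicate-attribute check; not the parser's implementation.
final class UniqueAttributesSketch {
    // Returns the first attribute name that appears twice, or null if all are unique.
    static String firstDuplicate(List<String> attributeNames) {
        Set<String> seen = new HashSet<>();
        for (String name : attributeNames) {
            if (!seen.add(name)) { // add() returns false when the name was already present
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstDuplicate(Arrays.asList("id", "class", "id")));
    }
}
```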
public void setUniqueAttributesInElementRequired(boolean uniqueAttributesInElementRequired)
Specify whether attributes should never appear duplicated in elements.
uniqueAttributesInElementRequired - whether attributes should never appear duplicated in elements.
public ParseConfiguration.UniqueRootElementPresence getUniqueRootElementPresence()
This value determines whether it will be required that the document has a unique root element.
If set to ParseConfiguration.UniqueRootElementPresence.REQUIRED_ALWAYS, then a document with more than one element at the root level will never be considered valid. And if ParseConfiguration.PrologParseConfiguration.isValidateProlog() is true and there is a DOCTYPE clause, it will be checked that the root name established at the DOCTYPE clause is the same as the document's root element.
If set to ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE, then:
- If ParseConfiguration.PrologParseConfiguration.isValidateProlog() is false, multiple document root elements will be allowed.
- If ParseConfiguration.PrologParseConfiguration.isValidateProlog() is true:
If set to ParseConfiguration.UniqueRootElementPresence.NOT_VALIDATED, then nothing will be checked regarding the name of the root element(s).
Default value is ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE.
public void setUniqueRootElementPresence(ParseConfiguration.UniqueRootElementPresence uniqueRootElementPresence)
This value determines whether it will be required that the document has a unique root element.
If set to ParseConfiguration.UniqueRootElementPresence.REQUIRED_ALWAYS, then a document with more than one element at the root level will never be considered valid. And if ParseConfiguration.PrologParseConfiguration.isValidateProlog() is true and there is a DOCTYPE clause, it will be checked that the root name established at the DOCTYPE clause is the same as the document's root element.
If set to ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE, then:
- If ParseConfiguration.PrologParseConfiguration.isValidateProlog() is false, multiple document root elements will be allowed.
- If ParseConfiguration.PrologParseConfiguration.isValidateProlog() is true:
If set to ParseConfiguration.UniqueRootElementPresence.NOT_VALIDATED, then nothing will be checked regarding the name of the root element(s).
Default value is ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE.
uniqueRootElementPresence - the configuration value for validating the presence of a unique root element.
public ParseConfiguration clone() throws CloneNotSupportedException
Overrides: clone in class Object
Throws: CloneNotSupportedException
Copyright © 2018 The ATTOPARSER team. All rights reserved.