javax.swing.text.html.parser
public class Parser extends Object implements DTDConstants
A simple error-tolerant HTML parser that uses a DTD document to access data on the possible tokens, arguments and syntax.
The parser reads HTML content from a Reader and calls notification methods (which a subclass should override) when it encounters tags or data.
Some HTML elements need no opening or closing tags. The parser also invokes the tag-handling methods when such tags are not written explicitly and must be inferred from the information stored in the DTD. For example, parsing the document
<table><tr><td>a<td>b<td>c</tr>
will invoke exactly the same handling methods, in the same order
(and with the same parameters), as parsing the document:
<html><head></head><body><table><tbody><tr><td>a</td><td>b</td><td>c</td></tr></tbody></table></body></html>
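The callback order described above can be observed with a small logging subclass. The following is a minimal sketch, not part of this class: `CallbackLogger` and `trace` are hypothetical names, and it assumes the OpenJDK class of the same name, where constructing a `ParserDelegator` loads the default `html32` DTD as a side effect. The implied tags actually reported depend on the DTD in use, so they may differ from the expansion shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;
import javax.swing.text.html.parser.TagElement;

/** Hypothetical helper: records each callback the parser makes, in order. */
class CallbackLogger extends Parser {
    final List<String> events = new ArrayList<>();

    CallbackLogger(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleStartTag(TagElement tag) {
        events.add("start:" + tag.getElement().getName());
    }

    @Override
    protected void handleEndTag(TagElement tag) {
        events.add("end:" + tag.getElement().getName());
    }

    @Override
    protected void handleText(char[] text) {
        events.add("text:" + new String(text));
    }

    /** Parses the given HTML and returns the recorded callbacks. */
    static List<String> trace(String html) {
        try {
            // Assumption: on OpenJDK this constructor populates the "html32" DTD.
            new ParserDelegator();
            CallbackLogger logger = new CallbackLogger(DTD.getDTD("html32"));
            logger.parse(new StringReader(html));
            return logger.events;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `CallbackLogger.trace("<table><tr><td>a<td>b<td>c</tr>")` returns the events in the order the parser reported them, including those for tags it inferred.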
Field Summary | |
---|---|
protected DTD | dtd
The document type definition (DTD) that will be used to parse the documents. |
protected boolean | strict
The value of this field determines whether or not the Parser will be
strict in enforcing SGML compatibility. |
Constructor Summary | |
---|---|
Parser(DTD a_dtd)
Creates a new parser that uses the given DTD to access data on the
possible tokens, arguments and syntax. |
Method Summary | |
---|---|
protected void | endTag(boolean omitted)
Called when an HTML end (closing) tag is found, or when
the parser concludes that one should be present at the
current position. |
protected void | error(String msg)
Invokes the error handler. |
protected void | error(String msg, String invalid)
Invokes the error handler. |
protected void | error(String parm1, String parm2, String parm3)
Invokes the error handler. |
protected void | error(String parm1, String parm2, String parm3, String parm4)
Invokes the error handler. |
protected void | flushAttributes()
In this implementation, this is never called and returns without action. |
protected SimpleAttributeSet | getAttributes()
Get the attributes of the current tag. |
protected int | getCurrentLine()
Get the number of the document line being parsed. |
protected int | getCurrentPos()
Get the current position in the document being parsed. |
protected void | handleComment(char[] comment)
Handle HTML comment. |
protected void | handleEmptyTag(TagElement tag)
Handle the tag with no content, like <br>. |
protected void | handleEndTag(TagElement tag)
Called when an HTML closing tag (like </table>)
is found, or when the parser concludes that one should be present
at the current position. |
protected void | handleEOFInComment()
Called when the HTML content terminates
without closing an HTML comment. |
protected void | handleError(int line, String message) |
protected void | handleStartTag(TagElement tag)
Called when an HTML opening tag (like <table>)
is found, or when the parser concludes that one should be present
at the current position. |
protected void | handleText(char[] text)
Handle the text section. |
protected void | handleTitle(char[] title)
Handle HTML <title> tag. |
protected TagElement | makeTag(Element element)
Constructs the tag from the given element. |
protected TagElement | makeTag(Element element, boolean isSupposed)
Constructs the tag from the given element. |
protected void | markFirstTime(Element element)
This is called when the tag representing the given element
occurs for the first time in the document. |
void | parse(Reader reader)
Parse the HTML text, calling various methods in response to the
occurrence of the corresponding HTML constructs. |
String | parseDTDMarkup()
Parses a DTD markup declaration. |
protected boolean | parseMarkupDeclarations(StringBuffer strBuff)
Parse DTD document declarations. |
protected void | startTag(TagElement tag)
Called when an HTML opening tag (like <table>)
is found, or when the parser concludes that one should be present
at the current position. |
Parameters: a_dtd A DTD to use.
Parameters: omitted True if the tag is not actually present in the document, but is supposed by the parser (like </html> at the end of the document).
Returns: The attribute set, representing the attributes of the current tag.
Returns: The current line.
Returns: The current position.
Parameters: comment The comment being handled
Parameters: tag The tag being handled.
Throws: javax.swing.text.ChangedCharSetException
Parameters: tag The tag being handled
Parameters: tag The tag being handled
For non-preformatted sections, the parser replaces \t, \r and \n with spaces and then collapses multiple spaces into a single space. Additionally, all whitespace around tags is discarded.
For preformatted text (inside TEXTAREA and PRE), the parser preserves all tabs and spaces, but removes one bounding \r, \n or \r\n, if present. Additionally, it replaces each occurrence of \r or \r\n with a single \n.
Parameters: text A section text.
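The two whitespace rules above can be illustrated independently of the parser. This is a rough sketch under stated assumptions, not the parser's actual code: `WhitespaceRules` and both method names are hypothetical, the discarding of whitespace around tags is not modeled, and "one bounding" line break is read here as one at each end of the text.

```java
/** Hypothetical illustration of the whitespace rules described above. */
final class WhitespaceRules {
    private WhitespaceRules() {
    }

    /** Non-preformatted text: tabs and line breaks become spaces, then runs of spaces collapse. */
    static String normalizeFlowText(String text) {
        return text.replaceAll("[\\t\\r\\n]", " ")
                   .replaceAll(" {2,}", " ");
    }

    /** Preformatted text: drop one bounding line break per end, then unify line breaks to \n. */
    static String normalizePreformatted(String text) {
        String trimmed = text
                .replaceFirst("\\A(\\r\\n|\\r|\\n)", "")   // one leading line break, if any
                .replaceFirst("(\\r\\n|\\r|\\n)\\z", "");  // one trailing line break, if any
        // Remaining \r\n pairs and lone \r become \n; tabs and spaces are untouched.
        return trimmed.replaceAll("\\r\\n?", "\n");
    }
}
```

For instance, `normalizeFlowText("a\t\tb\r\nc")` yields `"a b c"`, while `normalizePreformatted("\r\n  x\ty\r\nz\r")` keeps the indentation and the tab and yields `"  x\ty\nz"`.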
Parameters: title The title text.
Parameters: element the base element of the tag.
Returns: the tag
Parameters: element The base javax.swing.text.html.parser.Element of the tag. isSupposed True if the tag is not actually present in the HTML input, but the parser supposes that it should occur at the current location.
Returns: the tag
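A handler can tell supposed tags from written ones through `TagElement.fictional()`, which reflects the `isSupposed` flag passed to `makeTag`. The sketch below separates the two groups; `ExplicitTagCollector` and `run` are hypothetical names, and obtaining the DTD through `ParserDelegator` is an assumption about the OpenJDK implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;
import javax.swing.text.html.parser.TagElement;

/** Hypothetical helper: sorts opening tags into written and supposed ones. */
class ExplicitTagCollector extends Parser {
    final List<String> explicit = new ArrayList<>();
    final List<String> implied = new ArrayList<>();

    ExplicitTagCollector(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleStartTag(TagElement tag) {
        String name = tag.getElement().getName();
        // fictional() is true for tags the parser created with isSupposed = true.
        if (tag.fictional()) {
            implied.add(name);
        } else {
            explicit.add(name);
        }
    }

    /** Parses the given HTML and returns the populated collector. */
    static ExplicitTagCollector run(String html) {
        try {
            // Assumption: on OpenJDK this constructor populates the "html32" DTD.
            new ParserDelegator();
            ExplicitTagCollector collector = new ExplicitTagCollector(DTD.getDTD("html32"));
            collector.parse(new StringReader(html));
            return collector;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After `ExplicitTagCollector.run("<p>hi</p>")`, the `explicit` list holds the tags present in the input, and any enclosing tags the parser inferred end up in `implied`.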
Parameters: element
Parameters: reader The reader to read the source HTML from.
Throws: IOException If the reader throws one.
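Because the parser is error-tolerant, malformed input is reported through `handleError` rather than by an exception; only a failing reader makes `parse` throw. A minimal sketch of collecting those reports follows. `ErrorCollector`, `collect` and `collectFromString` are hypothetical names, the DTD lookup through `ParserDelegator` assumes OpenJDK, and which inputs produce an error message is implementation-specific.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: gathers the parser's error reports with line numbers. */
class ErrorCollector extends Parser {
    final List<String> errors = new ArrayList<>();

    ErrorCollector(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleError(int line, String message) {
        errors.add("line " + line + ": " + message);
    }

    /** Parses everything from the reader, closes it, and returns the reports. */
    static List<String> collect(Reader source) throws IOException {
        // Assumption: on OpenJDK this constructor populates the "html32" DTD.
        new ParserDelegator();
        ErrorCollector parser = new ErrorCollector(DTD.getDTD("html32"));
        try (Reader in = source) {
            parser.parse(in); // an IOException from the reader propagates to the caller
        }
        return parser.errors;
    }

    /** Convenience wrapper for in-memory input. */
    static List<String> collectFromString(String html) {
        try {
            return collect(new StringReader(html));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned list may be empty even for sloppy markup, since the parser silently repairs many omissions by supposing the missing tags.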
Returns: null.
Throws: java.io.IOException
Parameters: strBuff
Returns: true if this is a valid DTD markup declaration.
Throws: IOException
Parameters: tag The tag