javax.swing.text.html.parser
Class Parser
- DTDConstants
A simple error-tolerant HTML parser that uses a DTD document
to look up the possible tokens, arguments and syntax.
The parser reads HTML content from a Reader and calls various
notification methods (which should be overridden in a subclass)
when tags or data are encountered.
Some HTML elements need no opening or closing tags. This parser
also invokes the tag handling methods when the tags are not
explicitly present and must be supposed from the information
stored in the DTD.
For example, parsing the document
<table><tr><td>a<td>b<td>c</tr>
will invoke the handling methods in exactly the same order
(and with the same parameters) as if parsing the document:
<html><head></head><body><table><tbody><tr><td>a</td><td>b</td><td>c</td></tr></tbody></table></body></html>
(every tag that is absent from the first document is supposed by the
parser). The parser also supports obsolete elements of HTML syntax.
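The callback model described above can be sketched with a small subclass that records what the parser reports. This is a minimal sketch, not a definitive recipe: RecordingParser and its fields are hypothetical names, and obtaining the DTD by constructing a ParserDelegator and then calling DTD.getDTD("html32") is an assumption that relies on how the reference (OpenJDK) implementation populates its default DTD; other implementations may need a different route. The exact set of supposed tags that are reported can also differ between implementations.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;
import javax.swing.text.html.parser.TagElement;

/** Hypothetical subclass that records every callback it receives. */
class RecordingParser extends Parser {
    final List<String> events = new ArrayList<>();
    final List<String> texts = new ArrayList<>();

    RecordingParser(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleStartTag(TagElement tag) {
        // Called for explicit and for supposed opening tags alike.
        events.add("<" + tag.getElement().getName() + ">");
    }

    @Override
    protected void handleEndTag(TagElement tag) {
        events.add("</" + tag.getElement().getName() + ">");
    }

    @Override
    protected void handleText(char[] text) {
        String s = new String(text);
        texts.add(s);
        events.add("text:" + s);
    }

    /** Parses the given HTML and returns the recorder with its events. */
    static RecordingParser run(String html) {
        try {
            // ASSUMPTION: on the reference implementation, constructing a
            // ParserDelegator loads the default DTD into the object that
            // DTD.getDTD("html32") returns afterwards.
            new ParserDelegator();
            DTD dtd = DTD.getDTD("html32");
            RecordingParser parser = new RecordingParser(dtd);
            parser.parse(new StringReader(html));
            return parser;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RecordingParser p = run("<table><tr><td>a<td>b<td>c</tr>");
        for (String event : p.events) {
            System.out.println(event);
        }
    }
}
```

Running main prints one line per callback, which makes it easy to see which tags the parser supplied on its own.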
protected DTD | dtd - The document type definition (DTD) that will be used to parse the documents.
|
protected boolean | strict - The value of this field determines whether or not the Parser will be
strict in enforcing SGML compatibility.
|
ANY , CDATA , CONREF , CURRENT , DEFAULT , EMPTY , ENDTAG , ENTITIES , ENTITY , FIXED , GENERAL , ID , IDREF , IDREFS , IMPLIED , MD , MODEL , MS , NAME , NAMES , NMTOKEN , NMTOKENS , NOTATION , NUMBER , NUMBERS , NUTOKEN , NUTOKENS , PARAMETER , PI , PUBLIC , RCDATA , REQUIRED , SDATA , STARTTAG , SYSTEM |
Parser(DTD a_dtd) - Creates a new parser that uses the given DTD to access data on the
possible tokens, arguments and syntax.
|
protected void | endTag(boolean omitted) - Called when an HTML end (closing) tag is found, or when
the parser concludes that one should be present at the
current position.
|
protected void | error(String msg) - Invokes the error handler.
|
protected void | error(String msg, String invalid) - Invokes the error handler.
|
protected void | error(String parm1, String parm2, String parm3) - Invokes the error handler.
|
protected void | error(String parm1, String parm2, String parm3, String parm4) - Invokes the error handler.
|
protected void | flushAttributes() - In this implementation, this method is never called and returns without action.
|
protected SimpleAttributeSet | getAttributes() - Get the attributes of the current tag.
|
protected int | getCurrentLine() - Get the number of the document line being parsed.
|
protected int | getCurrentPos() - Get the current position in the document being parsed.
|
protected void | handleComment(char[] comment) - Handle HTML comment.
|
protected void | handleEOFInComment() - This is additionally called when the HTML content terminates
without closing an HTML comment.
|
protected void | handleEmptyTag(TagElement tag) - Handle the tag with no content, like <br>.
|
protected void | handleEndTag(TagElement tag) - Called when an HTML closing tag (like </table>)
is found, or when the parser concludes that one should be present
at the current position.
|
protected void | handleError(int line, String message)
|
protected void | handleStartTag(TagElement tag) - Called when an HTML opening tag (like <table>)
is found, or when the parser concludes that one should be present
at the current position.
|
protected void | handleText(char[] text) - Handle the text section.
|
protected void | handleTitle(char[] title) - Handle HTML <title> tag.
|
protected TagElement | makeTag(Element element) - Constructs the tag from the given element.
|
protected TagElement | makeTag(Element element, boolean isSupposed) - Constructs the tag from the given element.
|
protected void | markFirstTime(Element element) - Called when the tag representing the given element
occurs for the first time in the document.
|
void | parse(Reader reader) - Parse the HTML text, calling various methods in response to the
occurrence of the corresponding HTML constructions.
|
String | parseDTDMarkup() - Parses DTD markup declaration.
|
protected boolean | parseMarkupDeclarations(StringBuffer strBuff) - Parse DTD document declarations.
|
protected void | startTag(TagElement tag) - Called when an HTML opening tag (like <table>)
is found, or when the parser concludes that one should be present
at the current position.
|
clone , equals , getClass , finalize , hashCode , notify , notifyAll , toString , wait , wait , wait |
dtd
protected DTD dtd
The document type definition (DTD) that will be used to parse the documents.
strict
protected boolean strict
The value of this field determines whether or not the Parser will be
strict in enforcing SGML compatibility. The default value is false,
meaning that the parser should do its best to extract at least
some information even from incorrectly written HTML input.
Parser
public Parser(DTD a_dtd)
Creates a new parser that uses the given DTD to access data on the
possible tokens, arguments and syntax. There is no single-step way
to get a default DTD; you must either refer to the implementation-specific
packages, write your own DTD, or obtain a working parser instance
some other way, for example by calling HTMLEditorKit.getParser().
endTag
protected void endTag(boolean omitted)
Called when an HTML end (closing) tag is found, or when
the parser concludes that one should be present at the
current position. The method is called immediately
before calling handleEndTag().
omitted
- True if the tag is not actually present in the document,
but is supposed by the parser (like </html> at the end of the
document).
error
protected void error(String msg)
Invokes the error handler. The default method in this implementation
ultimately delegates the call to handleError, also providing the number of the
current line.
error
protected void error(String msg,
String invalid)
Invokes the error handler. The default method in this implementation
ultimately delegates the call to error(msg + ": '" + invalid + "'").
error
protected void error(String parm1,
String parm2,
String parm3)
Invokes the error handler. The default method in this implementation
ultimately delegates the call to error(parm1 + " " + parm2 + " " + parm3).
error
protected void error(String parm1,
String parm2,
String parm3,
String parm4)
Invokes the error handler. The default method in this implementation
ultimately delegates the call to
error(parm1 + " " + parm2 + " " + parm3 + " " + parm4).
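The delegation between the error overloads can be illustrated with a stand-alone model. This is only a sketch of the behaviour documented above, not the Parser class itself: ErrorChainSketch, its currentLine field (a stand-in for getCurrentLine()) and the "line: message" format used to store reports are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone model of the documented error(...) delegation; not the real Parser. */
class ErrorChainSketch {
    final List<String> reported = new ArrayList<>();
    int currentLine = 1; // hypothetical stand-in for getCurrentLine()

    /** Final sink of every overload; a real subclass would override handleError. */
    void handleError(int line, String message) {
        reported.add(line + ": " + message);
    }

    void error(String msg) {
        // Adds the current line number and hands over to the handler.
        handleError(currentLine, msg);
    }

    void error(String msg, String invalid) {
        error(msg + ": '" + invalid + "'");
    }

    void error(String parm1, String parm2, String parm3) {
        error(parm1 + " " + parm2 + " " + parm3);
    }

    void error(String parm1, String parm2, String parm3, String parm4) {
        error(parm1 + " " + parm2 + " " + parm3 + " " + parm4);
    }

    /** Reports two sample errors on line 7 and returns what the handler saw. */
    static List<String> demo() {
        ErrorChainSketch sketch = new ErrorChainSketch();
        sketch.currentLine = 7;
        sketch.error("Unknown tag", "blink");
        sketch.error("a", "b", "c");
        return sketch.reported;
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

Because every overload ends in the single-argument error, a subclass that only wants to collect messages needs to override handleError alone.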
flushAttributes
protected void flushAttributes()
In this implementation, this method is never called and returns without action.
getAttributes
protected SimpleAttributeSet getAttributes()
Get the attributes of the current tag.
- The attribute set, representing the attributes of the current tag.
getCurrentLine
protected int getCurrentLine()
Get the number of the document line being parsed.
getCurrentPos
protected int getCurrentPos()
Get the current position in the document being parsed.
handleComment
protected void handleComment(char[] comment)
Handle HTML comment. The default method returns without action.
comment
- The comment being handled
handleEOFInComment
protected void handleEOFInComment()
This is additionally called when the HTML content terminates
without closing an HTML comment. This can only happen if the
HTML document contains errors (for example, the closing --> is
missing). The default method calls the error handler.
handleEmptyTag
protected void handleEmptyTag(TagElement tag)
throws ChangedCharSetException
Handle the tag with no content, like <br>. The method is
called for the elements that, in accordance with the current DTD,
have empty content.
tag
- The tag being handled.
handleEndTag
protected void handleEndTag(TagElement tag)
Called when an HTML closing tag (like </table>)
is found, or when the parser concludes that one should be present
at the current position.
tag
- The tag being handled
handleStartTag
protected void handleStartTag(TagElement tag)
Called when an HTML opening tag (like <table>)
is found, or when the parser concludes that one should be present
at the current position.
tag
- The tag being handled
handleText
protected void handleText(char[] text)
Handle the text section.
For non-preformatted sections, the parser replaces
\t, \r and \n by spaces and then multiple spaces
by a single space. Additionally, all whitespace around
tags is discarded.
For pre-formatted text (inside TEXTAREA and PRE), the parser preserves
all tabs and spaces, but removes
one bounding \r, \n or \r\n,
if it is present. Additionally, it replaces each occurrence of \r or \r\n
by a single \n.
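The two whitespace rules can be sketched as plain string functions. This is an illustration of the rules as described, not the parser's own code: TextNormalizationSketch and both method names are hypothetical, "whitespace around tags is discarded" is interpreted here as trimming both ends of a text section, and "one bounding" line break is assumed to mean at most one at each end.

```java
/** Hypothetical stand-alone sketch of the documented whitespace rules. */
class TextNormalizationSketch {

    /** Rule for ordinary (non-preformatted) text sections. */
    static String normalize(String text) {
        // Tabs and line breaks become spaces first.
        String spaced = text.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ');
        // Then runs of spaces collapse to one space.
        String collapsed = spaced.replaceAll(" {2,}", " ");
        // ASSUMPTION: whitespace next to the surrounding tags is dropped by trimming.
        return collapsed.trim();
    }

    /** Rule for text inside TEXTAREA and PRE. */
    static String normalizePreformatted(String text) {
        String s = text;
        // Remove at most one leading line break (\r\n counts as one).
        if (s.startsWith("\r\n")) {
            s = s.substring(2);
        } else if (s.startsWith("\r") || s.startsWith("\n")) {
            s = s.substring(1);
        }
        // ASSUMPTION: the same applies to one trailing line break.
        if (s.endsWith("\r\n")) {
            s = s.substring(0, s.length() - 2);
        } else if (s.endsWith("\r") || s.endsWith("\n")) {
            s = s.substring(0, s.length() - 1);
        }
        // Remaining \r\n and lone \r become \n; tabs and spaces are kept.
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  a\t\tb\r\n c  ") + "]");
        System.out.println("[" + normalizePreformatted("\r\n  x\ty\r\nz\r") + "]");
    }
}
```

The \r\n replacement runs before the lone \r replacement so that a Windows line break yields one \n rather than two.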
handleTitle
protected void handleTitle(char[] title)
Handle HTML <title> tag. This method is invoked after
both the opening and the closing title tags have been processed.
The passed argument contains the concatenation of all
title text sections.
makeTag
protected TagElement makeTag(Element element)
Constructs the tag from the given element. In this implementation,
this is defined, but never called.
element
- the base element of the tag.
makeTag
protected TagElement makeTag(Element element,
boolean isSupposed)
Constructs the tag from the given element.
element
- the tag base Element
isSupposed
- true if the tag is not actually present in the
HTML input, but the parser supposes that it should occur at
the current location.
markFirstTime
protected void markFirstTime(Element element)
Called when the tag representing the given element
occurs for the first time in the document.
parse
public void parse(Reader reader)
throws IOException
Parse the HTML text, calling various methods in response to the
occurrence of the corresponding HTML constructions.
reader
- The reader to read the source HTML from.
parseMarkupDeclarations
protected boolean parseMarkupDeclarations(StringBuffer strBuff)
throws IOException
Parse DTD document declarations. Currently only parses the document
type declaration markup.
- true if this is a valid DTD markup declaration.
startTag
protected void startTag(TagElement tag)
throws ChangedCharSetException
Called when an HTML opening tag (like <table>)
is found, or when the parser concludes that one should be present
at the current position. The method is called immediately before
calling handleStartTag().
Parser.java -- HTML parser
Copyright (C) 2005 Free Software Foundation, Inc.
This file is part of GNU Classpath.
GNU Classpath is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
GNU Classpath is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with GNU Classpath; see the file COPYING. If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA.
Linking this library statically or dynamically with other modules is
making a combined work based on this library. Thus, the terms and
conditions of the GNU General Public License cover the whole
combination.
As a special exception, the copyright holders of this library give you
permission to link this library with independent modules to produce an
executable, regardless of the license terms of these independent
modules, and to copy and distribute the resulting executable under
terms of your choice, provided that you also meet, for each linked
independent module, the terms and conditions of the license of that
module. An independent module is a module which is not derived from
or based on this library. If you modify this library, you may extend
this exception to your version of the library, but you are not
obligated to do so. If you do not wish to do so, delete this
exception statement from your version.