java.io
Class StreamTokenizer
This class parses streams of characters into tokens. A number of
flags can be set to control the parsing, as described under the
various method headings.
static int | TT_EOF - A constant indicating that the end of the stream has been read.
|
static int | TT_EOL - A constant indicating that the end of the line has been read.
|
static int | TT_NUMBER - A constant indicating that a number token has been read.
|
static int | TT_WORD - A constant indicating that a word token has been read.
|
double | nval - The numeric value associated with number tokens.
|
String | sval - The String associated with word and string tokens.
|
int | ttype - Contains the type of the token read resulting from a call to nextToken.
The rules are as follows:
- For a token consisting of a single ordinary character, this is the
value of that character.
- For a quoted string, this is the value of the quote character
- For a word, this is TT_WORD
- For a number, this is TT_NUMBER
- For the end of the line, this is TT_EOL
- For the end of the stream, this is TT_EOF
|
void | commentChar(int ch) - This method sets the comment attribute on the specified
character.
|
void | eolIsSignificant(boolean flag) - This method sets a flag that indicates whether or not the end of line
sequence terminates and is a token.
|
int | lineno() - This method returns the current line number.
|
void | lowerCaseMode(boolean flag) - This method sets a flag that indicates whether or not alphabetic
tokens that are returned should be converted to lower case.
|
int | nextToken() - This method reads the next token from the stream.
|
void | ordinaryChar(int ch) - This method makes the specified character an ordinary character.
|
void | ordinaryChars(int low, int hi) - This method makes all the characters in the specified range, range
terminators included, ordinary.
|
void | parseNumbers() - This method sets the numeric attribute on the characters '0' - '9' and
the characters '.' and '-'.
|
void | pushBack() - Puts the current token back into the StreamTokenizer so
nextToken will return the same value on the next call.
|
void | quoteChar(int ch) - This method sets the quote attribute on the specified character.
|
void | resetSyntax() - This method removes all attributes (whitespace, alphabetic, numeric,
quote, and comment) from all characters.
|
void | slashSlashComments(boolean flag) - This method sets a flag that indicates whether or not "C++" language style
comments ("//" comments through EOL ) are handled by the parser.
|
void | slashStarComments(boolean flag) - This method sets a flag that indicates whether or not "C" language style
comments (with nesting not allowed) are handled by the parser.
|
String | toString() - This method returns the current token value as a String in
the form "Token[x], line n", where 'n' is the current line number and
'x' is determined as described under the method heading.
|
void | whitespaceChars(int low, int hi) - This method sets the whitespace attribute for all characters in the
specified range, range terminators included.
|
void | wordChars(int low, int hi) - This method sets the alphabetic attribute for all characters in the
specified range, range terminators included.
|
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TT_EOF
public static final int TT_EOF
A constant indicating that the end of the stream has been read.
TT_EOL
public static final int TT_EOL
A constant indicating that the end of the line has been read.
TT_NUMBER
public static final int TT_NUMBER
A constant indicating that a number token has been read.
TT_WORD
public static final int TT_WORD
A constant indicating that a word token has been read.
nval
public double nval
The numeric value associated with number tokens.
sval
public String sval
The String associated with word and string tokens.
ttype
public int ttype
Contains the type of the token read resulting from a call to nextToken.
The rules are as follows:
- For a token consisting of a single ordinary character, this is the
value of that character.
- For a quoted string, this is the value of the quote character
- For a word, this is TT_WORD
- For a number, this is TT_NUMBER
- For the end of the line, this is TT_EOL
- For the end of the stream, this is TT_EOF
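The ttype rules above can be illustrated with a switch over the field. This is a minimal sketch, assuming a standard JDK implementation of java.io.StreamTokenizer with its default syntax; the class TtypeDemo and the method describe are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; only StreamTokenizer itself comes from java.io.
final class TtypeDemo {
    /** Labels each token of the input according to the ttype rules. */
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + st.nval);
                        break;
                    case '"':
                    case '\'':
                        // For a quoted string, ttype is the quote character.
                        out.add("QUOTED:" + st.sval);
                        break;
                    default:
                        // A single ordinary character is its own token type.
                        out.add("CHAR:" + (char) st.ttype);
                        break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("total = 42 'ok'"));
    }
}
```

TT_EOL does not appear in the switch because end of line is not significant by default.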
StreamTokenizer
public StreamTokenizer(InputStream is)
Deprecated since JDK 1.1.
This constructor reads bytes from an InputStream and tokenizes
them. For details on how it operates by default, see
StreamTokenizer(Reader).
is
- The InputStream
to read from
StreamTokenizer
public StreamTokenizer(Reader r)
This constructor initializes a new StreamTokenizer to read
characters from a Reader and parse them. The char values
have their high bits masked so that each value is treated as a character
in the range of 0x0000 to 0x00FF.
This constructor sets up the parsing table to parse the stream in the
following manner:
- The values 'A' through 'Z', 'a' through 'z' and 0xA0 through 0xFF
are initialized as alphabetic
- The values 0x00 through 0x20 are initialized as whitespace
- The values '\'' and '"' are initialized as quote characters
- '/' is a comment character
- Numbers will be parsed
- EOL is not treated as significant
- C and C++ (//) comments are not recognized
r
- The Reader
to read chars from
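One consequence of the default table is that '/' starts a comment, so everything after it on the line is dropped. The sketch below shows this, assuming a standard JDK implementation; DefaultSyntaxDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class showing the default parsing table in action.
final class DefaultSyntaxDemo {
    /** Returns the text of each token produced with the default syntax. */
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else if (st.ttype == StreamTokenizer.TT_WORD
                        || st.ttype == '"' || st.ttype == '\'') {
                    out.add(st.sval); // words and quoted strings use sval
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // '/' is a comment character by default, so "/ 2" is skipped.
        System.out.println(tokens("rate 8 / 2"));
        System.out.println(tokens("say \"hi there\""));
    }
}
```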
commentChar
public void commentChar(int ch)
This method sets the comment attribute on the specified
character. Other attributes for the character are cleared.
ch
- The character to set the comment attribute for, passed as an int
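As an illustration, '#' can be turned into a line-comment character. This is a sketch only, assuming a standard JDK implementation; CommentCharDemo and words are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: treats '#' as the start of a comment.
final class CommentCharDemo {
    /** Collects the word tokens, skipping anything after '#' on a line. */
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.commentChar('#');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("alpha # note\nbeta"));
    }
}
```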
eolIsSignificant
public void eolIsSignificant(boolean flag)
This method sets a flag that indicates whether or not the end of line
sequence terminates and is a token. This defaults to false.
flag
- true
if EOL is significant, false
otherwise
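The effect of the flag can be seen by tokenizing the same input both ways. A minimal sketch, assuming a standard JDK implementation; EolDemo, tokens and the "<EOL>" marker are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class comparing both settings of eolIsSignificant.
final class EolDemo {
    /** Lists words, marking each TT_EOL token as "<EOL>". */
    static List<String> tokens(String input, boolean eolSignificant) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.eolIsSignificant(eolSignificant);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_EOL) {
                    out.add("<EOL>");
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a b\nc\n", true));
        System.out.println(tokens("a b\nc\n", false));
    }
}
```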
lineno
public int lineno()
This method returns the current line number. Note that if the
pushBack()
method is called, it has no effect on the
line number returned by this method.
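A typical use of lineno() is attaching a line number to each token, for example in error messages. The sketch below assumes a standard JDK implementation; LinenoDemo and lineNumbers are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class recording lineno() after each token.
final class LinenoDemo {
    /** Returns the line number reported after each token is read. */
    static List<Integer> lineNumbers(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<Integer> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                out.add(st.lineno());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lineNumbers("a\nb\nc"));
    }
}
```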
lowerCaseMode
public void lowerCaseMode(boolean flag)
This method sets a flag that indicates whether or not alphabetic
tokens that are returned should be converted to lower case.
flag
- true
to convert to lower case,
false
otherwise
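Lower case mode affects word tokens only; quoted strings keep their case. A minimal sketch, assuming a standard JDK implementation; LowerCaseDemo and values are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for lowerCaseMode(true).
final class LowerCaseDemo {
    /** Collects sval for every word and quoted-string token. */
    static List<String> values(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.lowerCaseMode(true);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == '\'') {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // Words are lowered; the quoted string is left unchanged.
        System.out.println(values("Hello WORLD 'Quoted'"));
    }
}
```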
nextToken
public int nextToken()
throws IOException
This method reads the next token from the stream. It sets the
ttype variable to the appropriate token type and
returns it. It can also set sval or nval
as described below. The parsing strategy is as follows:
- Skip any whitespace characters.
- If a numeric character is encountered, attempt to parse a numeric
value. Leading '-' characters indicate a numeric only if followed by
another non-'-' numeric. The value of the numeric token is terminated
by either the first non-numeric encountered, or the second occurrence of
'-' or '.'. The token type returned is TT_NUMBER and nval
is set to the value parsed.
- If an alphabetic character is parsed, all subsequent characters
are read until the first non-alphabetic or non-numeric character is
encountered. The token type returned is TT_WORD and the value parsed
is stored in sval. If lower case mode is set, the token
stored in sval is converted to lower case. The end of line
sequence terminates a word only if EOL significance has been turned on.
The start of a comment also terminates a word. Any character with a
non-alphabetic and non-numeric attribute (such as white space, a quote,
or a comment) is treated as non-alphabetic and terminates the word.
- If a comment character is parsed, then all remaining characters on
the current line are skipped and another token is parsed. Any EOL or
EOF's encountered are not discarded, but rather terminate the comment.
- If a quote character is parsed, then all characters up to the
second occurrence of the same quote character are parsed into a
String. This String is stored as sval, but is not converted to
lower case, even if lower case mode is enabled. The token type returned
is the value of the quote character encountered. Any escape sequences
(\b (backspace), \t (HTAB), \n (linefeed), \f (form feed), \r
(carriage return), \" (double quote), \' (single quote), \\
(backslash), \XXX (octal escape)) are converted to the appropriate
char values. Invalid escape sequences are left untranslated.
Unicode escapes such as '\u0000' are not recognized.
- If the C++ comment sequence "//" is encountered, and the parser
is configured to handle that sequence, then the remainder of the line
is skipped and another token is read exactly as if a character with
the comment attribute was encountered.
- If the C comment sequence "/*" is encountered, and the parser
is configured to handle that sequence, then all characters up to and
including the comment terminator sequence are discarded and another
token is parsed.
- If all cases above are not met, then the character is an ordinary
character that is parsed as a token by itself. The char encountered
is returned as the token type.
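The number and word rules above interact in ways that are easy to miss: digits continue a word once it has started, and a '-' only begins a number when a numeric character follows it. The following sketch shows both cases, assuming a standard JDK implementation; NextTokenDemo and describe are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class exercising the word and number rules of nextToken.
final class NextTokenDemo {
    /** Labels each token as WORD, NUMBER, or a single ordinary CHAR. */
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER:" + st.nval);
                } else {
                    out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // "abc123" stays one word; "-12.5" is a number; the lone "-" is ordinary.
        System.out.println(describe("abc123 -12.5 - y"));
    }
}
```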
ordinaryChar
public void ordinaryChar(int ch)
This method makes the specified character an ordinary character. This
means that none of the attributes (whitespace, alphabetic, numeric,
quote, or comment) will be set on this character. This character will
parse as its own token.
ch
- The character to make ordinary, passed as an int
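A common reason to call ordinaryChar is to turn the default comment character '/' back into a plain token, for example when reading arithmetic. A minimal sketch, assuming a standard JDK implementation; OrdinaryCharDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: makes '/' an ordinary character.
final class OrdinaryCharDemo {
    /** Returns numbers as text and ordinary characters as one-char strings. */
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/'); // clears the default comment attribute on '/'
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("8 / 2"));
    }
}
```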
ordinaryChars
public void ordinaryChars(int low,
int hi)
This method makes all the characters in the specified range, range
terminators included, ordinary. This means that none of the attributes
(whitespace, alphabetic, numeric, quote, or comment) will be set on
any of the characters in the range. This makes each character in this
range parse as its own token.
low
- The low end of the range of values to make ordinary
hi
- The high end of the range of values to make ordinary
parseNumbers
public void parseNumbers()
This method sets the numeric attribute on the characters '0' - '9' and
the characters '.' and '-'.
When this method is used, the result of giving other attributes
(whitespace, quote, or comment) to the numeric characters may
vary depending on the implementation. For example, if
parseNumbers() and then whitespaceChars('1', '1') are called,
this implementation reads "121" as 2, while some other implementation
will read it as 21.
pushBack
public void pushBack()
Puts the current token back into the StreamTokenizer so
nextToken
will return the same value on the next call.
May cause the lineno method to return an incorrect value
if lineno is called before the next call to nextToken.
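pushBack is typically used for one-token lookahead: read a token, inspect it, and put it back for the code that actually consumes it. The sketch below assumes a standard JDK implementation; PushBackDemo and readWithPushBack are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class showing that pushBack replays the current token.
final class PushBackDemo {
    /** Reads a word, pushes it back, reads it again, then reads the next one. */
    static List<String> readWithPushBack(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            st.nextToken();   // reads the first word
            out.add(st.sval);
            st.pushBack();    // the next call returns the same token again
            st.nextToken();
            out.add(st.sval);
            st.nextToken();   // now the tokenizer moves on
            out.add(st.sval);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readWithPushBack("one two"));
    }
}
```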
quoteChar
public void quoteChar(int ch)
This method sets the quote attribute on the specified character.
Other attributes for the character are cleared.
ch
- The character to set the quote attribute for, passed as an int.
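Any character can serve as a string delimiter. As a sketch, the example below uses '|' as an additional quote character, assuming a standard JDK implementation; QuoteCharDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: uses '|' as a quote character.
final class QuoteCharDemo {
    /** Labels tokens delimited by '|' as QUOTED and plain words as WORD. */
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.quoteChar('|');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '|') {
                    // ttype is the quote character; the contents are in sval.
                    out.add("QUOTED:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("|hello world| tail"));
    }
}
```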
resetSyntax
public void resetSyntax()
This method removes all attributes (whitespace, alphabetic, numeric,
quote, and comment) from all characters. It is equivalent to calling
ordinaryChars(0x00, 0xFF)
.
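resetSyntax is the usual starting point for a custom table. The sketch below rebuilds a whitespace-separated syntax in which digits and '-' are plain word characters, because number parsing is not re-enabled. It assumes a standard JDK implementation; ResetSyntaxDemo and describe are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class building a syntax table from scratch.
final class ResetSyntaxDemo {
    /** Splits on whitespace only; every other character is a word character. */
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();                // every character becomes ordinary
        st.whitespaceChars(0x00, 0x20);  // restore whitespace
        st.wordChars(0x21, 0xFF);        // everything else is alphabetic
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER:" + st.nval);
                } else {
                    out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        // parseNumbers() was not called again, so "12" is read as a word.
        System.out.println(describe("a-b 12"));
    }
}
```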
slashSlashComments
public void slashSlashComments(boolean flag)
This method sets a flag that indicates whether or not "C++" language style
comments ("//" comments through EOL ) are handled by the parser.
If this is true, commented-out sequences are skipped and
ignored by the parser. This defaults to false.
flag
- true
to recognize and handle "C++" style
comments, false
otherwise
slashStarComments
public void slashStarComments(boolean flag)
This method sets a flag that indicates whether or not "C" language style
comments (with nesting not allowed) are handled by the parser.
If this is true, commented-out sequences are skipped and
ignored by the parser. This defaults to false.
flag
- true
to recognize and handle "C" style comments,
false
otherwise
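Both comment flags can be enabled together. In the sketch below, '/' is first made ordinary so that a lone slash would be returned as a token instead of starting a default comment; that step is a choice of this example, not a requirement. A standard JDK implementation is assumed, and CommentFlagsDemo and words are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class enabling "//" and "/* */" comment handling.
final class CommentFlagsDemo {
    /** Collects the word tokens that remain after comments are skipped. */
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');         // drop the default single-'/' comment
        st.slashSlashComments(true);  // skip "//" through end of line
        st.slashStarComments(true);   // skip "/* ... */"
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("a /* skip */ b // rest\nc"));
    }
}
```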
toString
public String toString()
This method returns the current token value as a String in
the form "Token[x], line n", where 'n' is the current line number and
'x' is determined as follows:
- If no token has been read, then 'x' is "NOTHING" and 'n' is 0
- If ttype is TT_EOF, then 'x' is "EOF"
- If ttype is TT_EOL, then 'x' is "EOL"
- If ttype is TT_WORD, then 'x' is sval
- If ttype is TT_NUMBER, then 'x' is "n=strnval" where
'strnval' is String.valueOf(nval).
- If ttype is a quote character, then 'x' is sval
- For all other cases, 'x' is ttype
- Overrides toString in class Object
whitespaceChars
public void whitespaceChars(int low,
int hi)
This method sets the whitespace attribute for all characters in the
specified range, range terminators included.
low
- The low end of the range of values to set the whitespace
attribute for
hi
- The high end of the range of values to set the whitespace
attribute for
wordChars
public void wordChars(int low,
int hi)
This method sets the alphabetic attribute for all characters in the
specified range, range terminators included.
low
- The low end of the range of values to set the alphabetic
attribute for
hi
- The high end of the range of values to set the alphabetic
attribute for
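A frequent use of wordChars is to admit '_' into identifiers, since it is an ordinary character by default. A minimal sketch, assuming a standard JDK implementation; WordCharsDemo and tokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: optionally marks '_' as alphabetic.
final class WordCharsDemo {
    /** Tokenizes the input with or without '_' as a word character. */
    static List<String> tokens(String input, boolean underscoreIsWordChar) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (underscoreIsWordChar) {
            st.wordChars('_', '_'); // a one-character range
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("my_var", false)); // '_' splits the word
        System.out.println(tokens("my_var", true));  // one word
    }
}
```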
StreamTokenizer.java -- parses streams of characters into tokens
Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003 Free Software Foundation
This file is part of GNU Classpath.
GNU Classpath is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
GNU Classpath is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with GNU Classpath; see the file COPYING. If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA.
Linking this library statically or dynamically with other modules is
making a combined work based on this library. Thus, the terms and
conditions of the GNU General Public License cover the whole
combination.
As a special exception, the copyright holders of this library give you
permission to link this library with independent modules to produce an
executable, regardless of the license terms of these independent
modules, and to copy and distribute the resulting executable under
terms of your choice, provided that you also meet, for each linked
independent module, the terms and conditions of the license of that
module. An independent module is a module which is not derived from
or based on this library. If you modify this library, you may extend
this exception to your version of the library, but you are not
obligated to do so. If you do not wish to do so, delete this
exception statement from your version.