java.io

Class StreamTokenizer


public class StreamTokenizer
extends Object

This class parses streams of characters into tokens. Many flags control the parsing; each is described under the method that sets it.

Field Summary

static int
TT_EOF
A constant indicating that the end of the stream has been read.
static int
TT_EOL
A constant indicating that the end of the line has been read.
static int
TT_NUMBER
A constant indicating that a number token has been read.
static int
TT_WORD
A constant indicating that a word token has been read.
double
nval
The numeric value associated with number tokens.
String
sval
The String associated with word and string tokens.
int
ttype
Contains the type of the token read resulting from a call to nextToken. The rules are as follows:
  • For a token consisting of a single ordinary character, this is the value of that character.
  • For a quoted string, this is the value of the quote character.
  • For a word, this is TT_WORD
  • For a number, this is TT_NUMBER
  • For the end of the line, this is TT_EOL
  • For the end of the stream, this is TT_EOF

Constructor Summary

StreamTokenizer(InputStream is)
Deprecated. Since JDK 1.1.
StreamTokenizer(Reader r)
This constructor initializes a new StreamTokenizer to read characters from a Reader and parse them.

Method Summary

void
commentChar(int ch)
This method sets the comment attribute on the specified character.
void
eolIsSignificant(boolean flag)
This method sets a flag that indicates whether or not the end of line sequence terminates a token and is itself returned as a token.
int
lineno()
This method returns the current line number.
void
lowerCaseMode(boolean flag)
This method sets a flag that indicates whether or not alphabetic tokens that are returned should be converted to lower case.
int
nextToken()
This method reads the next token from the stream.
void
ordinaryChar(int ch)
This method makes the specified character an ordinary character.
void
ordinaryChars(int low, int hi)
This method makes all the characters in the specified range, range terminators included, ordinary.
void
parseNumbers()
This method sets the numeric attribute on the characters '0' - '9' and the characters '.' and '-'.
void
pushBack()
Puts the current token back into the StreamTokenizer so nextToken will return the same value on the next call.
void
quoteChar(int ch)
This method sets the quote attribute on the specified character.
void
resetSyntax()
This method removes all attributes (whitespace, alphabetic, numeric, quote, and comment) from all characters.
void
slashSlashComments(boolean flag)
This method sets a flag that indicates whether or not "C++" language style comments ("//" comments through EOL) are handled by the parser.
void
slashStarComments(boolean flag)
This method sets a flag that indicates whether or not "C" language style comments (with nesting not allowed) are handled by the parser.
String
toString()
This method returns the current token value as a String in the form "Token[x], line n", where 'n' is the current line number and 'x' depends on the token type.
void
whitespaceChars(int low, int hi)
This method sets the whitespace attribute for all characters in the specified range, range terminators included.
void
wordChars(int low, int hi)
This method sets the alphabetic attribute for all characters in the specified range, range terminators included.

Methods inherited from class java.lang.Object

clone, equals, getClass, finalize, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details

TT_EOF

public static final int TT_EOF
A constant indicating that the end of the stream has been read.
Field Value:
-1

TT_EOL

public static final int TT_EOL
A constant indicating that the end of the line has been read.
Field Value:
10

TT_NUMBER

public static final int TT_NUMBER
A constant indicating that a number token has been read.
Field Value:
-2

TT_WORD

public static final int TT_WORD
A constant indicating that a word token has been read.
Field Value:
-3

nval

public double nval
The numeric value associated with number tokens.

sval

public String sval
The String associated with word and string tokens.

ttype

public int ttype
Contains the type of the token read resulting from a call to nextToken. The rules are as follows:
  • For a token consisting of a single ordinary character, this is the value of that character.
  • For a quoted string, this is the value of the quote character.
  • For a word, this is TT_WORD
  • For a number, this is TT_NUMBER
  • For the end of the line, this is TT_EOL
  • For the end of the stream, this is TT_EOF
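
The rules above can be sketched as a dispatch on ttype. This is a minimal illustration, not part of the library: the class name TtypeDemo and the label strings are hypothetical, and the default syntax table is assumed.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class TtypeDemo {
    /** Labels each token according to the value left in ttype. */
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + st.nval);
                        break;
                    case StreamTokenizer.TT_EOL:
                        out.add("EOL");
                        break;
                    case '"':
                    case '\'':
                        // For a quoted string, ttype is the quote character itself.
                        out.add("QUOTED:" + st.sval);
                        break;
                    default:
                        // A single ordinary character: ttype is that character.
                        out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("width = 42 \"hello\""));
    }
}
```

Note that TT_EOL is only ever returned when eolIsSignificant(true) has been called, so the EOL branch is unused with the default settings.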

Constructor Details

StreamTokenizer

public StreamTokenizer(InputStream is)

Deprecated. Since JDK 1.1.

This constructor reads bytes from an InputStream and tokenizes them. For details on how this constructor operates by default, see StreamTokenizer(Reader).
Parameters:
is - The InputStream to read from

StreamTokenizer

public StreamTokenizer(Reader r)
This constructor initializes a new StreamTokenizer to read characters from a Reader and parse them. The char values have their high bits masked so that each value is treated as a character in the range 0x0000 to 0x00FF.

This constructor sets up the parsing table to parse the stream in the following manner:

  • The values 'A' through 'Z', 'a' through 'z' and 0xA0 through 0xFF are initialized as alphabetic
  • The values 0x00 through 0x20 are initialized as whitespace
  • The values '\'' and '"' are initialized as quote characters
  • '/' is a comment character
  • Numbers will be parsed
  • EOL is not treated as significant
  • C (/* */) and C++ (//) comments are not recognized
Parameters:
r - The Reader to read chars from
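
One consequence of these defaults that is easy to miss is that '/' starts a comment, so anything after it on the line is dropped. A minimal sketch follows; the class name DefaultSyntaxDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class DefaultSyntaxDemo {
    /** Collects tokens as strings using only the constructor's default syntax table. */
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // The '/' and everything after it on the line is skipped as a comment.
        System.out.println(tokens("rate 10 / 2"));
    }
}
```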

Method Details

commentChar

public void commentChar(int ch)
This method sets the comment attribute on the specified character. Other attributes for the character are cleared.
Parameters:
ch - The character to set the comment attribute for, passed as an int
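
As an illustration, the sketch below marks '#' as a comment character so that script-style comments are skipped through the end of the line. The class name HashCommentDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class HashCommentDemo {
    /** Collects words and numbers, skipping '#' comments. */
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.commentChar('#'); // '#' now starts a comment that runs to end of line
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("x 1 # note\ny 2"));
    }
}
```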

eolIsSignificant

public void eolIsSignificant(boolean flag)
This method sets a flag that indicates whether or not the end of line sequence terminates a token and is itself returned as a token. This defaults to false.
Parameters:
flag - true if EOL is significant, false otherwise
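
The effect of the flag can be seen by tokenizing the same input twice. This is a minimal sketch; the class name EolDemo and the "EOL" label are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class EolDemo {
    /** Collects words, plus an "EOL" marker whenever TT_EOL is returned. */
    static List<String> tokens(String input, boolean significant) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.eolIsSignificant(significant);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_EOL) {
                    out.add("EOL");
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a b\nc", true));
        System.out.println(tokens("a b\nc", false));
    }
}
```

With the flag off, the line break is skipped like any other whitespace and never appears as a token.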

lineno

public int lineno()
This method returns the current line number. Note that if the pushBack() method is called, it has no effect on the line number returned by this method.
Returns:
The current line number
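
A typical use is attaching a line number to each token for error messages. The sketch below assumes line numbering starts at 1; the class name LinenoDemo and the "word@line" format are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class LinenoDemo {
    /** Pairs each word with the line number reported right after it was read. */
    static List<String> wordsWithLines(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval + "@" + st.lineno());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // The blank third line still advances the line counter.
        System.out.println(wordsWithLines("a\nb\n\nc"));
    }
}
```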

lowerCaseMode

public void lowerCaseMode(boolean flag)
This method sets a flag that indicates whether or not alphabetic tokens that are returned should be converted to lower case.
Parameters:
flag - true to convert to lower case, false otherwise
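
The sketch below illustrates that lower case mode applies to word tokens only; quoted strings keep their case. The class name LowerCaseDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class LowerCaseDemo {
    /** Collects sval for every word or quoted-string token with lower case mode on. */
    static List<String> svals(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.lowerCaseMode(true);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.sval != null) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // The word is lowered; the quoted string is left untouched.
        System.out.println(svals("Hello \"World\""));
    }
}
```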

nextToken

public int nextToken()
            throws IOException
This method reads the next token from the stream. It sets the ttype variable to the appropriate token type and returns it. It also can set sval or nval as described below. The parsing strategy is as follows:
  • Skip any whitespace characters.
  • If a numeric character is encountered, attempt to parse a numeric value. Leading '-' characters indicate a numeric only if followed by another non-'-' numeric. The value of the numeric token is terminated by either the first non-numeric encountered, or the second occurrence of '-' or '.'. The token type returned is TT_NUMBER and nval is set to the value parsed.
  • If an alphabetic character is parsed, all subsequent characters are read until the first character that is neither alphabetic nor numeric is encountered. The token type returned is TT_WORD and the value parsed is stored in sval. If lower case mode is set, the token stored in sval is converted to lower case. The end of line sequence terminates a word only if EOL significance has been turned on. The start of a comment also terminates a word. Any character whose attribute is neither alphabetic nor numeric (such as white space, a quote, or a comment) is treated as non-alphabetic and terminates the word.
  • If a comment character is parsed, then all remaining characters on the current line are skipped and another token is parsed. An EOL or EOF that is encountered is not discarded; it terminates the comment.
  • If a quote character is parsed, then all characters up to the second occurrence of the same quote character are parsed into a String. This String is stored as sval, but is not converted to lower case, even if lower case mode is enabled. The token type returned is the value of the quote character encountered. Any escape sequences (\b (backspace), \t (HTAB), \n (linefeed), \f (form feed), \r (carriage return), \" (double quote), \' (single quote), \\ (backslash), \XXX (octal escape)) are converted to the appropriate char values. Invalid escape sequences are left in untranslated. Unicode escapes (such as '\u0000') are not recognized.
  • If the C++ comment sequence "//" is encountered, and the parser is configured to handle that sequence, then the remainder of the line is skipped and another token is read exactly as if a character with the comment attribute was encountered.
  • If the C comment sequence "/*" is encountered, and the parser is configured to handle that sequence, then all characters up to and including the comment terminator sequence are discarded and another token is parsed.
  • If none of the cases above applies, then the character is an ordinary character that is parsed as a token by itself. The char encountered is returned as the token type.
Returns:
The token type
Throws:
IOException - If an I/O error occurs
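
Two of the rules above, the handling of a leading '-' and escape translation inside quoted strings, can be sketched as follows. The class name NextTokenDemo and the label strings are hypothetical, and the default syntax table is assumed.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class NextTokenDemo {
    /** Labels numbers, double-quoted strings, words, and ordinary characters. */
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER:" + st.nval);
                } else if (type == '"') {
                    out.add("QUOTED:" + st.sval);
                } else if (type == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else {
                    out.add("CHAR:" + (char) type);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // "-5" is a number; the lone "-" is not followed by a digit, so it is
        // returned as an ordinary character; the two characters \t inside the
        // quotes become a single tab in sval.
        System.out.println(describe("-5 - \"tab\\there\""));
    }
}
```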

ordinaryChar

public void ordinaryChar(int ch)
This method makes the specified character an ordinary character. This means that none of the attributes (whitespace, alphabetic, numeric, quote, or comment) will be set on this character. This character will parse as its own token.
Parameters:
ch - The character to make ordinary, passed as an int
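
For example, '-' carries the numeric attribute by default, so it does not terminate a word; making it ordinary splits such input into separate tokens. A minimal sketch, with the hypothetical class name OrdinaryCharDemo:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class OrdinaryCharDemo {
    /** Tokenizes input, optionally after making '-' an ordinary character. */
    static List<String> tokens(String input, boolean minusOrdinary) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (minusOrdinary) {
            st.ordinaryChar('-');
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a-b", false)); // one word
        System.out.println(tokens("a-b", true));  // word, '-', word
    }
}
```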

ordinaryChars

public void ordinaryChars(int low,
                          int hi)
This method makes all the characters in the specified range, range terminators included, ordinary. This means that none of the attributes (whitespace, alphabetic, numeric, quote, or comment) will be set on any of the characters in the range. This makes each character in this range parse as its own token.
Parameters:
low - The low end of the range of values to make ordinary
hi - The high end of the range of values to make ordinary

parseNumbers

public void parseNumbers()
This method sets the numeric attribute on the characters '0' - '9' and the characters '.' and '-'. When this method is used, the result of giving other attributes (whitespace, quote, or comment) to the numeric characters may vary depending on the implementation. For example, if parseNumbers() and then whitespaceChars('1', '1') are called, this implementation reads "121" as 2, while some other implementation will read it as 21.

pushBack

public void pushBack()
Puts the current token back into the StreamTokenizer so nextToken will return the same value on the next call. May cause the lineno method to return an incorrect value if lineno is called before the next call to nextToken.
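
pushBack is mostly used for one-token lookahead. The sketch below reads numbers and attaches a unit only when the following token is a word; otherwise that token is pushed back for the next iteration. The class name PushBackDemo and the number-plus-unit format are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class PushBackDemo {
    /** Reads numbers, appending a unit when the token after a number is a word. */
    static List<String> parse(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype != StreamTokenizer.TT_NUMBER) {
                    continue; // this sketch ignores anything that is not a number
                }
                double value = st.nval; // save before looking ahead
                if (st.nextToken() == StreamTokenizer.TT_WORD) {
                    out.add(value + st.sval);
                } else {
                    st.pushBack(); // not a unit: let the loop see this token again
                    out.add(String.valueOf(value));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("10 px 20"));
    }
}
```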

quoteChar

public void quoteChar(int ch)
This method sets the quote attribute on the specified character. Other attributes for the character are cleared.
Parameters:
ch - The character to set the quote attribute for, passed as an int.
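
As an illustration, the sketch below makes the backtick a quote character so that a backtick-delimited phrase is returned as one token. The class name QuoteCharDemo and the label strings are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class QuoteCharDemo {
    /** Labels backtick-quoted strings and plain words. */
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.quoteChar('`');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '`') {
                    out.add("QUOTED:" + st.sval); // ttype is the quote character
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("`two words` x"));
    }
}
```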

resetSyntax

public void resetSyntax()
This method removes all attributes (whitespace, alphabetic, numeric, quote, and comment) from all characters. It is equivalent to calling ordinaryChars(0x00, 0xFF).
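
resetSyntax is the usual starting point for building a syntax table from scratch. The sketch below treats digits as word characters and never calls parseNumbers(), so a token such as 007 keeps its leading zeros. The class name ResetSyntaxDemo and the chosen character classes are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ResetSyntaxDemo {
    /** Tokenizes identifiers made of letters, digits, and '_' as plain words. */
    static List<String> words(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();             // every character is now ordinary
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.wordChars('0', '9');       // digits are word characters, not numbers
        st.wordChars('_', '_');
        st.whitespaceChars(0, ' ');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("v2_final 007"));
    }
}
```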

slashSlashComments

public void slashSlashComments(boolean flag)
This method sets a flag that indicates whether or not "C++" language style comments ("//" comments through EOL) are handled by the parser. If this is true, commented-out sequences are skipped and ignored by the parser. This defaults to false.
Parameters:
flag - true to recognize and handle "C++" style comments, false otherwise

slashStarComments

public void slashStarComments(boolean flag)
This method sets a flag that indicates whether or not "C" language style comments (with nesting not allowed) are handled by the parser. If this is true, commented-out sequences are skipped and ignored by the parser. This defaults to false.
Parameters:
flag - true to recognize and handle "C" style comments, false otherwise
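
The sketch below compares slashSlashComments and slashStarComments enabled together against the defaults. The class name SlashCommentsDemo is hypothetical. With both flags off, '/' still acts as a single-line comment character by default, so the text after the "/*" is lost through the end of the line.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SlashCommentsDemo {
    /** Collects words, with or without C and C++ comment recognition. */
    static List<String> words(String input, boolean recognizeComments) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.slashSlashComments(recognizeComments);
        st.slashStarComments(recognizeComments);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "a // one\nb /* two */ c";
        System.out.println(words(src, true));  // the block comment ends at "*/"
        System.out.println(words(src, false)); // '/' skips the rest of each line
    }
}
```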

toString

public String toString()
This method returns the current token value as a String in the form "Token[x], line n", where 'n' is the current line number and 'x' is determined as follows:

  • If no token has been read, then 'x' is "NOTHING" and 'n' is 0
  • If ttype is TT_EOF, then 'x' is "EOF"
  • If ttype is TT_EOL, then 'x' is "EOL"
  • If ttype is TT_WORD, then 'x' is sval
  • If ttype is TT_NUMBER, then 'x' is "n=strnval" where 'strnval' is String.valueOf(nval).
  • If ttype is a quote character, then 'x' is sval
  • For all other cases, 'x' is ttype
Overrides:
toString in class Object

whitespaceChars

public void whitespaceChars(int low,
                            int hi)
This method sets the whitespace attribute for all characters in the specified range, range terminators included.
Parameters:
low - The low end of the range of values to set the whitespace attribute for
hi - The high end of the range of values to set the whitespace attribute for
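
For example, treating ',' as whitespace is a quick way to read a comma-separated list of numbers. A minimal sketch, with the hypothetical class name WhitespaceDemo:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class WhitespaceDemo {
    /** Sums the numbers in a comma-separated list. */
    static double sum(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.whitespaceChars(',', ','); // a one-character range: just the comma
        double total = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    total += st.nval;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum("1,2,3"));
    }
}
```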

wordChars

public void wordChars(int low,
                      int hi)
This method sets the alphabetic attribute for all characters in the specified range, range terminators included.
Parameters:
low - The low end of the range of values to set the alphabetic attribute for
hi - The high end of the range of values to set the alphabetic attribute for
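
A common use is adding '_' to the alphabetic set so that identifiers are not split at underscores. A minimal sketch, with the hypothetical class name WordCharsDemo:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class WordCharsDemo {
    /** Collects words, optionally treating '_' as an alphabetic character. */
    static List<String> words(String input, boolean underscoreIsWordChar) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        if (underscoreIsWordChar) {
            st.wordChars('_', '_');
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("snake_case", false)); // '_' is ordinary, two words
        System.out.println(words("snake_case", true));  // one word
    }
}
```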

StreamTokenizer.java -- parses streams of characters into tokens Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003 Free Software Foundation This file is part of GNU Classpath. GNU Classpath is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. GNU Classpath is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with GNU Classpath; see the file COPYING. If not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Linking this library statically or dynamically with other modules is making a combined work based on this library. Thus, the terms and conditions of the GNU General Public License cover the whole combination. As a special exception, the copyright holders of this library give you permission to link this library with independent modules to produce an executable, regardless of the license terms of these independent modules, and to copy and distribute the resulting executable under terms of your choice, provided that you also meet, for each linked independent module, the terms and conditions of the license of that module. An independent module is a module which is not derived from or based on this library. If you modify this library, you may extend this exception to your version of the library, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version.