Package org.apache.regexp
Class RECompiler
- java.lang.Object
-
- org.apache.regexp.RECompiler
-
- Direct Known Subclasses:
REDebugCompiler
public class RECompiler extends java.lang.Object
A regular expression compiler class. This class compiles a pattern string into a regular expression program interpretable by the RE evaluator class. The 'recompile' command line tool uses this compiler to pre-compile regular expressions for use with RE. For a description of the syntax accepted by RECompiler and what you can do with regular expressions, see the documentation for the RE matcher class.
- Version:
- $Id: RECompiler.java 518156 2007-03-14 14:31:26Z vgritsenko $
- Author:
- Jonathan Locke, Michael McCallum
- See Also:
RE, recompile
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
(package private) class
RECompiler.RERange
Local, nested class for maintaining character ranges for character classes.
-
Field Summary
Fields
Modifier and Type | Field | Description
(package private) int
bracketMin
(package private) int
bracketOpt
(package private) static int
bracketUnbounded
(package private) static int
ESC_BACKREF
(package private) static int
ESC_CLASS
(package private) static int
ESC_COMPLEX
(package private) static int
ESC_MASK
(package private) static java.util.Hashtable
hashPOSIX
(package private) int
idx
(package private) char[]
instruction
(package private) int
len
(package private) int
lenInstruction
(package private) static int
NODE_NORMAL
(package private) static int
NODE_NULLABLE
(package private) static int
NODE_TOPLEVEL
(package private) int
parens
(package private) java.lang.String
pattern
-
Constructor Summary
Constructors
Constructor | Description
RECompiler()
Constructor.
-
Method Summary
Modifier and Type | Method | Description
(package private) int
atom()
Absorb an atomic character string.
(package private) void
bracket()
Match a bracket {m,n} expression and put the results in the bracket member variables.
(package private) int
branch(int[] flags)
Compile the body of one branch of an or operator (implements concatenation).
(package private) int
characterClass()
Compile a character class.
(package private) int
closure(int[] flags)
Compile a possibly closured terminal.
REProgram
compile(java.lang.String pattern)
Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
(package private) void
emit(char c)
Emit a single character into the program stream.
(package private) void
ensure(int n)
Ensures that n more characters can fit in the program buffer.
(package private) int
escape()
Match an escape sequence.
(package private) int
expr(int[] flags)
Compile an expression with possible parens around it.
(package private) void
internalError()
Throws a new internal error exception.
(package private) int
node(char opcode, int opdata)
Adds a new node.
(package private) void
nodeInsert(char opcode, int opdata, int insertAt)
Inserts a node with a given opcode and opdata at insertAt.
(package private) void
setNextOfEnd(int node, int pointTo)
Appends a node to the end of a node chain.
(package private) void
syntaxError(java.lang.String s)
Throws a new syntax error exception.
(package private) int
terminal(int[] flags)
Match a terminal node.
-
-
-
Field Detail
-
instruction
char[] instruction
-
lenInstruction
int lenInstruction
-
pattern
java.lang.String pattern
-
len
int len
-
idx
int idx
-
parens
int parens
-
NODE_NORMAL
static final int NODE_NORMAL
- See Also:
- Constant Field Values
-
NODE_NULLABLE
static final int NODE_NULLABLE
- See Also:
- Constant Field Values
-
NODE_TOPLEVEL
static final int NODE_TOPLEVEL
- See Also:
- Constant Field Values
-
ESC_MASK
static final int ESC_MASK
- See Also:
- Constant Field Values
-
ESC_BACKREF
static final int ESC_BACKREF
- See Also:
- Constant Field Values
-
ESC_COMPLEX
static final int ESC_COMPLEX
- See Also:
- Constant Field Values
-
ESC_CLASS
static final int ESC_CLASS
- See Also:
- Constant Field Values
-
bracketUnbounded
static final int bracketUnbounded
- See Also:
- Constant Field Values
-
bracketMin
int bracketMin
-
bracketOpt
int bracketOpt
-
hashPOSIX
static final java.util.Hashtable hashPOSIX
-
-
Method Detail
-
ensure
void ensure(int n)
Ensures that n more characters can fit in the program buffer. If they cannot, the buffer size is doubled until they can.
- Parameters:
n
- Number of additional characters to ensure will fit.
-
emit
void emit(char c)
Emit a single character into the program stream.
- Parameters:
c
- Character to add
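The growth policy described for ensure, together with emit, can be sketched as follows. This is a minimal illustration under stated assumptions, not the RECompiler source: the class name ProgramBufferSketch and the initial capacity of 4 are hypothetical and chosen for the demo; only the field names instruction and lenInstruction are taken from the field list of this class.

```java
// Hypothetical stand-in for the program buffer; not the real RECompiler.
final class ProgramBufferSketch {
    char[] instruction = new char[4]; // assumed small initial capacity
    int lenInstruction = 0;           // number of characters emitted so far

    // Make room for n more characters, doubling the capacity until they fit.
    void ensure(int n) {
        int capacity = instruction.length;
        if (lenInstruction + n <= capacity) {
            return; // already enough room
        }
        while (lenInstruction + n > capacity) {
            capacity *= 2;
        }
        char[] grown = new char[capacity];
        System.arraycopy(instruction, 0, grown, 0, lenInstruction);
        instruction = grown;
    }

    // Append one character, growing the buffer first if needed.
    void emit(char c) {
        ensure(1);
        instruction[lenInstruction++] = c;
    }
}
```

Doubling keeps the number of reallocations logarithmic in the final program size, which is presumably why the documented policy doubles rather than growing by a fixed amount.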
-
nodeInsert
void nodeInsert(char opcode, int opdata, int insertAt)
Inserts a node with a given opcode and opdata at insertAt. The node's relative next pointer is initialized to 0.
- Parameters:
opcode
- Opcode for new node
opdata
- Opdata for new node (only the low 16 bits are currently used)
insertAt
- Index at which to insert the new node in the program
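As a rough sketch of what inserting a node into a flat program array involves: the tail of the program is shifted right to open a gap, and the new node is written into it with a zero next pointer. The three-slot node layout (opcode, opdata, relative next), the class name NodeInsertSketch, and the fixed capacity are assumptions made for illustration; this page does not specify the actual layout.

```java
// Hypothetical illustration; the node layout below is assumed, not documented here.
final class NodeInsertSketch {
    static final int NODE_SIZE = 3; // assumed slots: opcode, opdata, relative next

    char[] program = new char[64];  // fixed capacity keeps the sketch short
    int length = 0;                 // number of slots in use

    void nodeInsert(char opcode, int opdata, int insertAt) {
        // Shift everything from insertAt onward right by one node to open a gap.
        System.arraycopy(program, insertAt, program, insertAt + NODE_SIZE, length - insertAt);
        program[insertAt] = opcode;
        program[insertAt + 1] = (char) opdata; // the cast keeps only the low 16 bits
        program[insertAt + 2] = 0;             // relative next pointer starts at 0
        length += NODE_SIZE;
    }
}
```

The cast to char is what makes "only the low 16 bits are currently used" hold in this sketch, since a Java char is a 16-bit value.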
-
setNextOfEnd
void setNextOfEnd(int node, int pointTo)
Appends a node to the end of a node chain.
- Parameters:
node
- Start of node chain to traverse
pointTo
- Node to have the tail of the chain point to
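One way to picture the chain traversal: starting at node, follow relative next pointers until a node with a next pointer of 0 is reached, then make that tail point to pointTo. The sketch below assumes a three-slot node layout (opcode, opdata, relative next), forward-only pointers, and a fixed-size array; the class name NodeChainSketch and the constants are hypothetical, and the real method may handle cases this sketch ignores.

```java
// Hypothetical illustration of tail-linking with relative next pointers.
final class NodeChainSketch {
    static final int OFFSET_OPDATA = 1; // assumed slot for opdata
    static final int OFFSET_NEXT = 2;   // assumed slot for the relative next pointer
    static final int NODE_SIZE = 3;     // assumed node size in slots

    char[] program = new char[64];      // fixed capacity keeps the sketch short
    int length = 0;

    // Append a node at the end of the program and return its index.
    int node(char opcode, int opdata) {
        int index = length;
        program[index] = opcode;
        program[index + OFFSET_OPDATA] = (char) opdata;
        program[index + OFFSET_NEXT] = 0; // 0 marks the end of a chain
        length += NODE_SIZE;
        return index;
    }

    // Walk the chain that starts at node and link its tail to pointTo.
    void setNextOfEnd(int node, int pointTo) {
        int next = program[node + OFFSET_NEXT];
        while (next != 0) {
            node += next; // pointers are relative to the current node
            next = program[node + OFFSET_NEXT];
        }
        program[node + OFFSET_NEXT] = (char) (pointTo - node);
    }
}
```

Because the pointers are relative, a chain stays valid when the whole chain moves by the same offset, which fits a design where nodes can be inserted earlier in the program.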
-
node
int node(char opcode, int opdata)
Adds a new node.
- Parameters:
opcode
- Opcode for node
opdata
- Opdata for node (only the low 16 bits are currently used)
- Returns:
- Index of new node in program
-
internalError
void internalError() throws java.lang.Error
Throws a new internal error exception.
- Throws:
java.lang.Error
- Thrown in the event of an internal error.
-
syntaxError
void syntaxError(java.lang.String s) throws RESyntaxException
Throws a new syntax error exception.
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
bracket
void bracket() throws RESyntaxException
Match a bracket {m,n} expression and put the results in the bracket member variables.
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
escape
int escape() throws RESyntaxException
Match an escape sequence. Handles quoted chars and octal escapes as well as normal escape characters. Always advances the input stream by the right amount. This code "understands" the subtle difference between an octal escape and a backref. You can access the type of ESC_CLASS or ESC_COMPLEX or ESC_BACKREF by looking at pattern[idx - 1].
- Returns:
- ESC_* code or character if simple escape
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
characterClass
int characterClass() throws RESyntaxException
Compile a character class.
- Returns:
- Index of class node
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
atom
int atom() throws RESyntaxException
Absorb an atomic character string. This method is a little tricky because it can un-include the last character of the string if a closure operator follows. This is correct because *, + and ? have higher precedence than concatenation (thus ABC* means AB(C*) and NOT (ABC)*).
- Returns:
- Index of new atom node
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
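The precedence rule described for atom (ABC* means AB(C*), not (ABC)*) can be checked with a small demo. Note the assumption: this uses java.util.regex from the JDK, which is a different engine from org.apache.regexp, only because it is available everywhere and applies the same precedence between closure operators and concatenation. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Demonstrates closure-versus-concatenation precedence with the JDK regex engine.
final class ClosurePrecedenceDemo {
    // True if the whole input matches the regular expression.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The star binds to C only, so zero or more C's may follow "AB".
        System.out.println(matchesWhole("ABC*", "AB"));       // true
        System.out.println(matchesWhole("ABC*", "ABCCC"));    // true
        System.out.println(matchesWhole("ABC*", "ABCABC"));   // false
        // Explicit grouping is needed to repeat the whole string.
        System.out.println(matchesWhole("(ABC)*", "ABCABC")); // true
    }
}
```

This is why an atom that has greedily absorbed "ABC" must give back the final C when a closure operator follows: the operator applies to that last character alone.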
-
terminal
int terminal(int[] flags) throws RESyntaxException
Match a terminal node.
- Parameters:
flags
- Flags
- Returns:
- Index of terminal node (closeable)
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
closure
int closure(int[] flags) throws RESyntaxException
Compile a possibly closured terminal.
- Parameters:
flags
- Flags passed by reference
- Returns:
- Index of closured node
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
branch
int branch(int[] flags) throws RESyntaxException
Compile the body of one branch of an or operator (implements concatenation).
- Parameters:
flags
- Flags passed by reference
- Returns:
- Pointer to first node in the branch
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
expr
int expr(int[] flags) throws RESyntaxException
Compile an expression with possible parens around it. Paren matching is done at this level so we can tie the branch tails together.
- Parameters:
flags
- Flag value passed by reference
- Returns:
- Node index of expression in instruction array
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
-
compile
public REProgram compile(java.lang.String pattern) throws RESyntaxException
Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
- Parameters:
pattern
- Regular expression pattern to compile (see RECompiler class for details).
- Returns:
- A compiled regular expression program.
- Throws:
RESyntaxException
- Thrown if the regular expression has invalid syntax.
- See Also:
RECompiler, RE
-
-