Class RECompiler

  • Direct Known Subclasses:
    REDebugCompiler

    public class RECompiler
    extends java.lang.Object
    A regular expression compiler class. This class compiles a pattern string into a regular expression program interpretable by the RE evaluator class. The 'recompile' command line tool uses this compiler to pre-compile regular expressions for use with RE. For a description of the syntax accepted by RECompiler and what you can do with regular expressions, see the documentation for the RE matcher class.
    Version:
    $Id: RECompiler.java 518156 2007-03-14 14:31:26Z vgritsenko $
    Author:
    Jonathan Locke, Michael McCallum
    See Also:
    RE, recompile
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      (package private) class  RECompiler.RERange
      Local, nested class for maintaining character ranges for character classes.
    • Constructor Summary

      Constructors 
      Constructor Description
      RECompiler()
      Constructor.
    • Method Summary

      Modifier and Type Method Description
      (package private) int atom()
      Absorb an atomic character string.
      (package private) void bracket()
      Matches a bracket expression {m,n} and stores the results in the bracket member variables.
      (package private) int branch​(int[] flags)
      Compile body of one branch of an or operator (implements concatenation)
      (package private) int characterClass()
      Compile a character class
      (package private) int closure​(int[] flags)
      Compile a possibly closured terminal
      REProgram compile​(java.lang.String pattern)
      Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
      (package private) void emit​(char c)
      Emit a single character into the program stream.
      (package private) void ensure​(int n)
      Ensures that n more characters can fit in the program buffer.
      (package private) int escape()
      Match an escape sequence.
      (package private) int expr​(int[] flags)
      Compile an expression with possible parens around it.
      (package private) void internalError()
      Throws a new internal error exception
      (package private) int node​(char opcode, int opdata)
      Adds a new node
      (package private) void nodeInsert​(char opcode, int opdata, int insertAt)
      Inserts a node with a given opcode and opdata at insertAt.
      (package private) void setNextOfEnd​(int node, int pointTo)
      Appends a node to the end of a node chain
      (package private) void syntaxError​(java.lang.String s)
      Throws a new syntax error exception
      (package private) int terminal​(int[] flags)
      Match a terminal node.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • RECompiler

        public RECompiler()
        Constructor. Creates (initially empty) storage for a regular expression program.
    • Method Detail

      • ensure

        void ensure​(int n)
        Ensures that n more characters can fit in the program buffer. If n more can't fit, then the size is doubled until it can.
        Parameters:
        n - Number of additional characters to ensure will fit.
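
The doubling strategy described for ensure can be sketched as follows. This is a minimal illustration, not the actual RECompiler source: the class name ProgramBuffer, the field names, and the tiny initial capacity are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical stand-in for the compiler's program buffer (not the real RECompiler fields).
final class ProgramBuffer {
    char[] instruction = new char[4]; // deliberately small initial capacity
    int length = 0;                   // number of characters currently in use

    // Doubles the capacity until n more characters fit.
    void ensure(int n) {
        int capacity = instruction.length;
        if (length + n > capacity) {
            while (length + n > capacity) {
                capacity *= 2;
            }
            instruction = Arrays.copyOf(instruction, capacity);
        }
    }

    // Appends one character, growing the buffer first if needed.
    void emit(char c) {
        ensure(1);
        instruction[length++] = c;
    }
}
```

Doubling keeps the amortized cost of each emit constant, at the price of some unused space at the end of the array.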
      • emit

        void emit​(char c)
        Emit a single character into the program stream.
        Parameters:
        c - Character to add
      • nodeInsert

        void nodeInsert​(char opcode,
                        int opdata,
                        int insertAt)
        Inserts a node with a given opcode and opdata at insertAt. The node relative next pointer is initialized to 0.
        Parameters:
        opcode - Opcode for new node
        opdata - Opdata for new node (only the low 16 bits are currently used)
        insertAt - Index at which to insert the new node in the program
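
A rough sketch of what inserting a node into a flat program array might look like. The three-character node layout (opcode, opdata, relative next pointer), the fixed-size array, and the class name NodeInsertSketch are assumptions for illustration only.

```java
// Hypothetical node layout, assumed for illustration: three chars per node.
final class NodeInsertSketch {
    static final int NODE_SIZE = 3;    // opcode, opdata, relative next pointer
    char[] instruction = new char[64]; // fixed size keeps the sketch short
    int length = 0;

    // Appends a node at the end of the program and returns its index.
    int node(char opcode, int opdata) {
        instruction[length] = opcode;
        instruction[length + 1] = (char) opdata; // the cast keeps only the low 16 bits
        instruction[length + 2] = 0;             // relative next pointer starts at 0
        int index = length;
        length += NODE_SIZE;
        return index;
    }

    // Opens a NODE_SIZE gap at insertAt, then writes the new node into it.
    void nodeInsert(char opcode, int opdata, int insertAt) {
        System.arraycopy(instruction, insertAt, instruction,
                         insertAt + NODE_SIZE, length - insertAt);
        instruction[insertAt] = opcode;
        instruction[insertAt + 1] = (char) opdata;
        instruction[insertAt + 2] = 0;
        length += NODE_SIZE;
    }
}
```

In this sketch every node at or after insertAt moves up by NODE_SIZE, so any index the caller holds into that region must be adjusted after the call.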
      • setNextOfEnd

        void setNextOfEnd​(int node,
                          int pointTo)
        Appends a node to the end of a node chain
        Parameters:
        node - Start of node chain to traverse
        pointTo - Node to have the tail of the chain point to
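
One way to picture the chain traversal: follow each node's relative next pointer until a node with a zero pointer is found, then store the offset to pointTo there. The sketch below assumes a three-character node with the next pointer in the last slot, and forward-only pointers; both are simplifications chosen for the example, and NodeChainSketch is a hypothetical name.

```java
final class NodeChainSketch {
    static final int OFFSET_NEXT = 2; // assumed slot of the relative next pointer within a node

    // Walks the chain that starts at node and points its last node at pointTo.
    static void setNextOfEnd(char[] instruction, int node, int pointTo) {
        int next = instruction[node + OFFSET_NEXT];
        while (next != 0) { // a zero offset marks the tail of the chain in this sketch
            node += next;
            next = instruction[node + OFFSET_NEXT];
        }
        // Stored as an offset relative to the tail node (forward pointers only here).
        instruction[node + OFFSET_NEXT] = (char) (pointTo - node);
    }
}
```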
      • node

        int node​(char opcode,
                 int opdata)
        Adds a new node
        Parameters:
        opcode - Opcode for node
        opdata - Opdata for node (only the low 16 bits are currently used)
        Returns:
        Index of new node in program
      • internalError

        void internalError()
                    throws java.lang.Error
        Throws a new internal error exception
        Throws:
        java.lang.Error - Thrown in the event of an internal error.
      • syntaxError

        void syntaxError​(java.lang.String s)
                  throws RESyntaxException
        Throws a new syntax error exception
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • bracket

        void bracket()
              throws RESyntaxException
        Matches a bracket expression {m,n} and stores the results in the bracket member variables.
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
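
As an illustration of bracket parsing, the sketch below reads the forms {m}, {m,} and {m,n}. It returns the results in an array instead of member variables, and the UNBOUNDED marker, the class name, and the exception type are assumptions of the example, not part of RECompiler.

```java
final class BracketSketch {
    static final int UNBOUNDED = -1; // hypothetical marker for the open-ended form "{m,}"

    // Parses a bracket expression starting at idx.
    // Returns {min, max, index just past the closing brace}.
    static int[] parse(String pattern, int idx) {
        if (idx >= pattern.length() || pattern.charAt(idx) != '{') {
            throw new IllegalArgumentException("Expected '{'");
        }
        idx++;
        int start = idx;
        while (idx < pattern.length() && Character.isDigit(pattern.charAt(idx))) {
            idx++;
        }
        if (idx == start) {
            throw new IllegalArgumentException("Expected a digit after '{'");
        }
        int min = Integer.parseInt(pattern.substring(start, idx));
        int max = min; // "{m}" means exactly m repetitions
        if (idx < pattern.length() && pattern.charAt(idx) == ',') {
            idx++;
            start = idx;
            while (idx < pattern.length() && Character.isDigit(pattern.charAt(idx))) {
                idx++;
            }
            max = (idx == start) ? UNBOUNDED
                                 : Integer.parseInt(pattern.substring(start, idx));
            if (max != UNBOUNDED && max < min) {
                throw new IllegalArgumentException("Maximum is less than minimum");
            }
        }
        if (idx >= pattern.length() || pattern.charAt(idx) != '}') {
            throw new IllegalArgumentException("Expected '}'");
        }
        return new int[] { min, max, idx + 1 };
    }
}
```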
      • escape

        int escape()
            throws RESyntaxException
        Match an escape sequence. Handles quoted chars and octal escapes as well as normal escape characters. Always advances the input stream by the right amount. This code "understands" the subtle difference between an octal escape and a backref. You can access the type of ESC_CLASS or ESC_COMPLEX or ESC_BACKREF by looking at pattern[idx - 1].
        Returns:
        ESC_* code or character if simple escape
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
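
The text above does not spell out how an octal escape is told apart from a backreference. The sketch below assumes one plausible rule: a leading 0, or an octal digit followed by another octal digit, starts an octal escape of up to three digits, while a lone nonzero digit is a backreference. The rule, the ESC_BACKREF value, and the method name are assumptions for illustration.

```java
final class EscapeSketch {
    static final int ESC_BACKREF = -1; // hypothetical marker value, chosen for this sketch

    static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    // idx points at the digit that follows the backslash.
    // Returns the character value of an octal escape, or ESC_BACKREF.
    static int digitEscape(String pattern, int idx) {
        char first = pattern.charAt(idx++);
        boolean digitFollows = idx < pattern.length() && isOctal(pattern.charAt(idx));
        if (first == '0' || (isOctal(first) && digitFollows)) {
            int val = first - '0';
            // Absorb up to two more octal digits.
            for (int i = 0; i < 2 && idx < pattern.length()
                    && isOctal(pattern.charAt(idx)); i++) {
                val = (val << 3) + (pattern.charAt(idx++) - '0');
            }
            return val;
        }
        return ESC_BACKREF; // a single nonzero digit refers to a capture group
    }
}
```

Under this rule `\101` yields the character code 65 ('A'), whereas `\1` followed by a non-digit is treated as a reference to the first group.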
      • characterClass

        int characterClass()
                    throws RESyntaxException
        Compile a character class
        Returns:
        Index of class node
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • atom

        int atom()
          throws RESyntaxException
        Absorb an atomic character string. This method is a little tricky because it can un-include the last character of the string if a closure operator follows. This is correct because *, + and ? have higher precedence than concatenation (thus ABC* means AB(C*) and NOT (ABC)*).
        Returns:
        Index of new atom node
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
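
The "un-include" step can be shown with a small scanner that finds where a literal run ends. It is only a sketch: the set of special characters in isLiteral and the names AtomSketch and atomEnd are assumptions, and escapes are not handled.

```java
final class AtomSketch {
    // Returns the exclusive end index of the literal run that starts at idx.
    static int atomEnd(String pattern, int idx) {
        int start = idx;
        while (idx < pattern.length() && isLiteral(pattern.charAt(idx))) {
            idx++;
        }
        boolean closureFollows = idx < pattern.length()
                && "*+?".indexOf(pattern.charAt(idx)) >= 0;
        if (closureFollows && idx - start > 1) {
            idx--; // give the last character back so the closure binds to it alone
        }
        return idx;
    }

    // Assumed set of special characters; anything else counts as a literal.
    static boolean isLiteral(char c) {
        return "*+?|()[]{}\\.^$".indexOf(c) < 0;
    }
}
```

For the pattern ABC* the run ends after AB, leaving C to be compiled together with the * that follows it.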
      • terminal

        int terminal​(int[] flags)
              throws RESyntaxException
        Match a terminal node.
        Parameters:
        flags - Flags passed by reference
        Returns:
        Index of terminal node (closeable)
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • closure

        int closure​(int[] flags)
             throws RESyntaxException
        Compile a possibly closured terminal
        Parameters:
        flags - Flags passed by reference
        Returns:
        Index of closured node
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • branch

        int branch​(int[] flags)
            throws RESyntaxException
        Compile body of one branch of an or operator (implements concatenation)
        Parameters:
        flags - Flags passed by reference
        Returns:
        Pointer to first node in the branch
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
      • expr

        int expr​(int[] flags)
          throws RESyntaxException
        Compile an expression with possible parens around it. Paren matching is done at this level so we can tie the branch tails together.
        Parameters:
        flags - Flag value passed by reference
        Returns:
        Node index of expression in instruction array
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
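
The method names expr, branch, closure and terminal suggest a recursive-descent layering in which each level handles one precedence tier. The sketch below mirrors that layering on a tiny grammar and returns a parenthesized string instead of emitting program nodes. It is an assumed simplification (no escapes, character classes or bracket expressions), not the RECompiler implementation.

```java
final class PrecedenceSketch {
    private final String p;
    private int idx = 0;

    private PrecedenceSketch(String pattern) {
        p = pattern;
    }

    // Parses the whole pattern and returns it with grouping made explicit.
    static String parse(String pattern) {
        PrecedenceSketch s = new PrecedenceSketch(pattern);
        String out = s.expr();
        if (s.idx != pattern.length()) {
            throw new IllegalArgumentException("Unexpected '" + pattern.charAt(s.idx) + "'");
        }
        return out;
    }

    // expr := branch ('|' branch)*
    private String expr() {
        StringBuilder sb = new StringBuilder(branch());
        boolean alternation = false;
        while (idx < p.length() && p.charAt(idx) == '|') {
            idx++;
            sb.append('|').append(branch());
            alternation = true;
        }
        return alternation ? "(" + sb + ")" : sb.toString();
    }

    // branch := closure*   (concatenation)
    private String branch() {
        StringBuilder sb = new StringBuilder();
        while (idx < p.length() && p.charAt(idx) != '|' && p.charAt(idx) != ')') {
            sb.append(closure());
        }
        return sb.toString();
    }

    // closure := terminal ('*' | '+' | '?')?
    private String closure() {
        String t = terminal();
        if (idx < p.length() && "*+?".indexOf(p.charAt(idx)) >= 0) {
            return "(" + t + p.charAt(idx++) + ")";
        }
        return t;
    }

    // terminal := '(' expr ')' | single character
    private String terminal() {
        char c = p.charAt(idx++);
        if (c != '(') {
            return String.valueOf(c);
        }
        String inner = expr();
        if (idx >= p.length() || p.charAt(idx) != ')') {
            throw new IllegalArgumentException("Missing ')'");
        }
        idx++; // consume ')'
        return "(" + inner + ")";
    }
}
```

Because the closing parenthesis is consumed in the same call that saw the opening one, the caller knows where every branch of the group ends, which is what lets a compiler tie the branch tails together at that level.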
      • compile

        public REProgram compile​(java.lang.String pattern)
                          throws RESyntaxException
        Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
        Parameters:
        pattern - Regular expression pattern to compile (see the RE class for details of the accepted syntax).
        Returns:
        A compiled regular expression program.
        Throws:
        RESyntaxException - Thrown if the regular expression has invalid syntax.
        See Also:
        RECompiler, RE