NorKen Technologies, Inc.


Regular Expressions

A regular expression is a pattern that describes the format of a string. In ProGrammar Grammar Definition Language, regular expressions are delimited by single quotes. Because regular expressions are grammar terms, they may occur anywhere in a production rule where a term is legal. For example:

s ::= '[a-z]+' | some_other_symbol ;

Regular expressions consist of a combination of "regular" characters, which are taken literally, and "meta" characters, which have special meaning within the expression. A regular expression composed only of regular characters is equivalent to a literal; e.g., the term 'foo bar' is equivalent to the literal term "foo bar".
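To make the equivalence concrete, the following sketch uses Python's `re` module as a stand-in for the ProGrammar matcher. This is an assumption for illustration only: the two engines share the basic syntax described on this page but are not the same engine. The helper `matches` is hypothetical; it uses `re.fullmatch`, so a pattern must account for the whole string rather than just a prefix.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

# 'foo bar' contains only regular characters, so it accepts exactly one string.
print(matches("foo bar", "foo bar"))   # True
print(matches("foo bar", "foo  bar"))  # False: the extra space is not allowed
print(matches("foo bar", "foobar"))    # False: the space is required
```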

All letters and digits are interpreted literally, whereas many punctuation characters have special interpretations. For example, the period matches any single character.

'...' matches any three characters
'a.b' matches strings "aab", "abb", "axb", "a2b", etc.
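The period examples above behave the same way in Python's `re` module, used here only as an illustrative stand-in (the `matches` helper is hypothetical). One difference is worth knowing: Python's `.` does not match a newline unless the `re.DOTALL` flag is given, whereas ProGrammar's period excludes only NULL.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

print(matches("...", "abc"))   # True: any three characters
print(matches("...", "ab"))    # False: only two characters
print(matches("a.b", "a2b"))   # True: the middle character is free
print(matches("a.b", "axxb"))  # False: '.' matches exactly one character
```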

In order to match a literal period, the period character must be preceded by a backslash, which escapes its meta-interpretation.

'\..' matches a literal period, followed by any character
'a\.b' matches string "a.b" only
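In Python, patterns that contain backslashes are usually written as raw strings (`r"..."`) so that the backslash reaches the regex engine unchanged. A sketch of the escaped-period examples, again using Python's `re` as a stand-in and a hypothetical `matches` helper:

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"a\.b", "a.b"))  # True
print(matches(r"a\.b", "axb"))  # False: '\.' accepts only a literal period
print(matches(r"\..", ".x"))    # True: a literal period, then any character
print(matches(r"\..", "xx"))    # False: the first character must be '.'
```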

The following characters have special meaning within regular expressions:

. A period matches any single character, except for NULL ('\0').
' ' Single quotes delimit a regular expression term.
" " Double quotes declare a literal string within the expression. This string must be matched exactly, following all the ordinary rules for literal strings. Note that all special characters lose their meta-interpretations within the literal string.
( ) Parentheses group one or more regular expressions together as a single expression.
* An asterisk matches zero or more occurrences of the expression that precedes it. For example, 'a*' matches the strings "a", "aaaaaa", and "" (empty string).
+ A plus sign matches one or more occurrences of the expression that precedes it. For example, 'a+' matches the strings "a" and "aaaaaa", but not "" (the empty string), since at least one occurrence is required.
? A question mark matches exactly zero or one occurrence of the preceding expression. For example, 'a?' matches the strings "a" and "", and only those strings.
[ ] Square brackets delimit a character list, which matches any single character in the list. For example, regular expression '[0123456789]' matches any single digit. Within a character list, the following additional meta-characters are defined:

- The dash indicates a range of matching character values. For example, '[0-9]' matches any single digit, and '[a-z]' matches any single lower-case letter. The dash is interpreted literally when it's the first or last character in the list.
^ When a caret is the first character in a character list it's interpreted as a negation symbol, which matches any character that is not in the list. For example, '[^abc]' matches any character except for 'a', 'b' or 'c'.

All other meta-characters, except for the backslash, lose their special interpretations when included in a character list, and are taken literally.

Because the right square bracket delimits the end of the character list, it must be escaped by a backslash when included as part of the list; e.g., '[a-z\]]'.

\ The backslash is the escape character, which overrides any special meaning associated with the character that follows it. For example, '\[' is interpreted as a literal "[" (left square bracket) character, not the beginning of a character list. Standard C escape-sequences are also recognized (e.g., '\n' is interpreted as a newline).
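The sketch below exercises each meta-character from the list above with Python's `re` module. The correspondences are assumptions drawn from the descriptions on this page, not a statement of how ProGrammar is implemented, and `matches` is a hypothetical helper. Two entries need a translation rather than a direct copy: Python has no double-quoted literal syntax, so `re.escape()` builds the equivalent all-literal subpattern, and Python's `.` excludes newline rather than NULL, so `[^\x00]` is the closer counterpart of ProGrammar's period.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

# Repetition operators apply to the expression immediately before them.
assert matches("a*", "") and matches("a*", "aaaaaa")
assert matches("a+", "a") and not matches("a+", "")
assert matches("a?", "") and matches("a?", "a") and not matches("a?", "aa")

# Parentheses group an expression so an operator applies to all of it.
assert matches("(ab)+", "ababab") and not matches("(ab)+", "aba")

# Character lists, ranges, and negation.
assert matches("[0-9]", "7") and not matches("[0-9]", "x")
assert matches("[^abc]", "d") and not matches("[^abc]", "a")
assert matches("[-+]", "-") and matches("[+-]", "-")  # dash first or last is literal
assert matches(r"[a-z\]]", "]")                        # escaped right bracket
assert matches("[.*?]", "*")                           # meta-characters are literal in a list

# A C-style escape sequence inside the pattern.
assert matches(r"a\nb", "a\nb")

# Python's '.' excludes newline; '[^\x00]' excludes only NULL instead.
assert not matches(".", "\n")
assert matches(r"[^\x00]", "\n") and not matches(r"[^\x00]", "\x00")

# Stand-in for a double-quoted literal: escape every special character.
assert matches(re.escape("a+b"), "a+b") and not matches(re.escape("a+b"), "aab")
```

Because every check is an `assert`, running the block silently succeeds; any mismatch stops it with an `AssertionError` pointing at the failing line.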


Examples of Regular Expressions

alpha ::= '[a-zA-Z]+';
numeric ::= '[0-9]+';
alphanumeric ::= '[a-zA-Z0-9]+';
identifier ::= '[a-zA-Z_][a-zA-Z0-9_]*';
hex_number ::= '0[xX][a-fA-F0-9]+';
octal_number ::= '0[oO][0-7]+';
real ::= '-?(([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?|([0-9]+))';
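To experiment with the example terms outside a grammar, the sketch below copies them into a Python dictionary and reports which ones match a whole sample string. The `TERMS` table and `classify` function are hypothetical illustrations, not part of ProGrammar, and Python's `re` is assumed to treat these simple patterns the same way ProGrammar does. The octal pattern uses `[0-7]`, since 8 is not an octal digit.

```python
import re

# Python translations of the example terms (assumed equivalent for these patterns).
TERMS = {
    "alpha":        r"[a-zA-Z]+",
    "numeric":      r"[0-9]+",
    "alphanumeric": r"[a-zA-Z0-9]+",
    "identifier":   r"[a-zA-Z_][a-zA-Z0-9_]*",
    "hex_number":   r"0[xX][a-fA-F0-9]+",
    "octal_number": r"0[oO][0-7]+",  # octal digits run from 0 to 7
    "real":         r"-?(([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?|([0-9]+))",
}

def classify(text: str) -> list[str]:
    """Return the names of all example terms that match the whole of `text`."""
    return [name for name, pattern in TERMS.items() if re.fullmatch(pattern, text)]

print(classify("abc"))      # ['alpha', 'alphanumeric', 'identifier']
print(classify("0x1F"))     # ['alphanumeric', 'hex_number']
print(classify("-3.5e10"))  # ['real']
print(classify("42"))       # ['numeric', 'alphanumeric', 'real']
```

Several terms can match the same string; this sketch simply lists every match and makes no attempt to model how a parser would choose among them. Note also that, as written, `real` accepts an exponent only on a number containing a decimal point, so it does not match "42e3".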



For comments or questions about this site, please contact
webmaster@programmar.com
Copyright © 1998-2008 NorKen Technologies, Inc. All rights reserved.