NorKen Technologies, Inc.


GDL Tutorial - The Phonebook Example

The Phonebook grammar parses text files that contain a list of names and phone numbers, such as:

    Adams, John Q. (900) 555-1234
    Smith, John Q. (800) 765-4321
    Doe, John      (999) 999-9999
    Doe, Jane J.   (000) 012-3456

A grammar is made up of a list of production rules. Each rule consists of the name of a symbol, the production symbol "::=", the definition of the symbol, and a terminating semicolon. Named symbols are also called "nonterminal" symbols, since they are defined in terms of other symbols.

The basic approach for defining a grammar is to start with a high-level description of the input, then incrementally define each symbol in terms of other symbols, until the lowest level symbols are defined by terminals. Terminals are primitive types that the parser recognizes implicitly and already "knows" how to parse. Examples include literal strings, regular expressions, and any of the predefined types such as "alpha" and "numeric". Once all of the symbols in a grammar have been defined in terms of other symbols or terminal types, the grammar definition is "complete" and may be used for parsing.

The following grammar describes the syntax of the phone book file shown above.


    PhoneBook ::= { PhoneBookLine };

    PhoneBookLine ::= Name PhoneNumber;

    Name ::=    Lastname
                ","
                Firstname
                [MiddleInit "."]  // optional term
                ;

    Lastname    ::= alpha;  // one or more letters
    Firstname   ::= alpha;  // one or more letters
    MiddleInit  ::= alpha;  // one or more letters

    PhoneNumber ::= "(" AreaCode ")"
                    PhonePrefix
                    "-"
                    PhoneSuffix ;
			
    AreaCode    ::= numeric<3>;	// exactly three digits
    PhonePrefix ::= numeric<3>;	// exactly three digits
    PhoneSuffix ::= numeric<4>;	// exactly four digits
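
As a rough cross-check of what this grammar accepts, the sketch below restates it as a single Python regular expression. It is not ProGrammar output or part of its API: the names LINE and parse_phonebook are hypothetical, and the sketch assumes that whitespace between terms is skipped and that letters and digits are ASCII.

```python
import re

# Hypothetical regex restatement of the PhoneBook grammar (assumptions:
# whitespace between terms is skipped; letters and digits are ASCII).
LINE = re.compile(
    r"""
    (?P<Lastname>[A-Za-z]+) \s* , \s*                   # Lastname and comma
    (?P<Firstname>[A-Za-z]+) \s*                        # Firstname
    (?: (?P<MiddleInit>[A-Za-z]+) \s* \. \s* )?         # optional initial and period
    \( \s* (?P<AreaCode>[0-9]{3}) \s* \) \s*            # area code in parentheses
    (?P<PhonePrefix>[0-9]{3}) \s* - \s*                 # three-digit prefix and dash
    (?P<PhoneSuffix>[0-9]{4}) \s*                       # four-digit suffix
    """,
    re.VERBOSE,
)


def parse_phonebook(text):
    """Return one dict per PhoneBookLine; raise ValueError on a bad line."""
    entries = []
    pos = len(text) - len(text.lstrip())  # skip leading whitespace
    while pos < len(text):
        match = LINE.match(text, pos)
        if match is None:
            raise ValueError(f"no PhoneBookLine at offset {pos}")
        entries.append(match.groupdict())
        pos = match.end()
    return entries
```

Calling parse_phonebook on the sample file yields one dictionary per line, keyed by the symbol names used in the grammar; MiddleInit is None for lines without a middle initial.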


The first production rule:

    PhoneBook ::= { PhoneBookLine };

uses a repeater construct (delimited by '{' and '}') to indicate that multiple occurrences of PhoneBookLine may exist in the input. PhoneBookLine is subsequently defined as a conjunction of two other symbols:

    PhoneBookLine ::= Name PhoneNumber;

Because these names are separated by whitespace, they are considered conjunctive. Conjunction is analogous to a logical AND, indicating that both terms occur in the input, in the order they are listed. In this case, each occurrence of PhoneBookLine consists of an occurrence of Name, followed by an occurrence of PhoneNumber.
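
The repeater and conjunction constructs can be pictured as small parser combinators. The Python sketch below is illustrative only and says nothing about how the ProGrammar parse engine is implemented; literal, seq, and repeat are hypothetical helper names. Each parser takes the input text and a position and returns the new position, or None on failure. The sketch assumes a repeater also accepts zero occurrences, which the text above does not state.

```python
def literal(expected):
    """Match the exact string `expected` at the current position."""
    def parse(text, pos):
        return pos + len(expected) if text.startswith(expected, pos) else None
    return parse


def seq(*parsers):
    """Conjunction: every parser must match, in the order listed."""
    def parse(text, pos):
        for parser in parsers:
            pos = parser(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def repeat(parser):
    """Repeater: apply `parser` for as long as it keeps matching.

    Assumption: zero occurrences are accepted.
    """
    def parse(text, pos):
        while True:
            nxt = parser(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or no progress
                return pos
            pos = nxt
    return parse
```

For example, repeat(seq(literal("a"), literal("b"))) applied to "ababx" at position 0 consumes the two "ab" pairs and stops at position 4.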

In the next production:

    Name ::= Lastname
             ","
             Firstname
             [MiddleInit "."]   // optional term
             ;

the two quoted strings ("," and ".") are called literals. Literals must match characters from the input exactly. The square brackets around the last term indicate that the enclosed conjunction of MiddleInit and "." is optional.
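
Because the brackets enclose a conjunction, the initial and the period are optional only as a pair. The Python regular expression below is a rough analogy, not GDL; MIDDLE is a hypothetical name and ASCII letters are assumed. It shows that an initial without the literal "." does not satisfy the optional term, so nothing is consumed.

```python
import re

# Rough regex analogue of [MiddleInit "."]: the group is optional as a whole.
MIDDLE = re.compile(r"(?:(?P<MiddleInit>[A-Za-z]+)\.)?")

with_period = MIDDLE.match("Q. (900) 555-1234")
without_period = MIDDLE.match("Q (900) 555-1234")

# With the period present, the initial is captured and two characters are consumed.
print(with_period.group("MiddleInit"), with_period.end())
# Without the period, the optional term matches nothing at all.
print(without_period.group("MiddleInit"), without_period.end())
```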

The next set of productions defines three symbols:

    Lastname    ::= alpha;   // one or more letters
    Firstname   ::= alpha;   // one or more letters
    MiddleInit  ::= alpha;   // one or more letters

Each of these symbols is defined as alpha. The keyword alpha names a predefined type, meaning that the parse engine already knows how to parse it. The predefined types in ProGrammar are:

Predefined type   Contains characters of type...
---------------   ------------------------------
alpha             upper- and lower-case letters
alpha_            upper- and lower-case letters; the underscore ('_')
alphanumeric      upper- and lower-case letters; digits
alnumblank        upper- and lower-case letters; digits; whitespace
identifier        upper- and lower-case letters; digits; the underscore ('_'). The first character cannot be a digit.
numeric           digits
quotedstring      any string of characters enclosed by quotation marks
whitespace        spaces, tabs, newlines, carriage-returns
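
One way to read this table is as a set of character classes. The Python sketch below maps each predefined type to an approximate regular expression; it is an interpretation of the table, not ProGrammar's definition. The names PREDEFINED and matches are hypothetical, and the sketch assumes ASCII letters and digits, double quotation marks for quotedstring, and no escape sequences inside quoted strings.

```python
import re

# Approximate regex equivalents of the predefined types (assumptions: ASCII
# only; quotedstring uses double quotes and has no escape sequences).
PREDEFINED = {
    "alpha":        r"[A-Za-z]+",
    "alpha_":       r"[A-Za-z_]+",
    "alphanumeric": r"[A-Za-z0-9]+",
    "alnumblank":   r"[A-Za-z0-9 \t\n\r]+",
    "identifier":   r"[A-Za-z_][A-Za-z0-9_]*",
    "numeric":      r"[0-9]+",
    "quotedstring": r'"[^"]*"',
    "whitespace":   r"[ \t\n\r]+",
}


def matches(type_name, text):
    """Return True if all of `text` fits the named predefined type."""
    return re.fullmatch(PREDEFINED[type_name], text) is not None
```

Under these assumptions, matches("alpha", "Smith") is True, while matches("identifier", "1x") is False because the first character is a digit.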

The predefined type numeric is used in the following production rules:

    AreaCode    ::= numeric<3>;	// exactly three digits
    PhonePrefix ::= numeric<3>;	// exactly three digits
    PhoneSuffix ::= numeric<4>;	// exactly four digits

Note the use of length constraints, denoted by "<" and ">". This construct limits the minimum and maximum length, in character positions, of the term that precedes it. The generalized notation for a length constraint is:

    any-term < min-length, max-length >

In the preceding production rules, the length constraints are interpreted as follows:

    numeric<3>    exactly three digits
    numeric<4>    exactly four digits

By default, the minimum length of a term is one, and there is no maximum length. There are several usage variations for length constraints, as shown in the following examples:

Usage             Min Length   Max Length
---------------   ----------   ----------
numeric <3, 4>    3            4
numeric <3, >     3            unbounded
numeric <3>       3            3
numeric < , 4>    0            4
numeric < , >     0            unbounded
numeric           1            unbounded
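
The rules in this table can be written down as a small function. The Python sketch below is one reading of the table, not ProGrammar code; length_bounds is a hypothetical name, and None stands for "unbounded". It takes the constraint text, including the angle brackets, and returns the minimum and maximum length.

```python
def length_bounds(constraint=None):
    """Return (min_length, max_length) for a constraint such as "<3, 4>".

    None as the maximum means unbounded. Passing no constraint gives the
    default for a bare term: a minimum of one and no maximum.
    """
    if constraint is None:
        return (1, None)
    inner = constraint.strip()
    if not (inner.startswith("<") and inner.endswith(">")):
        raise ValueError(f"not a length constraint: {constraint!r}")
    inner = inner[1:-1]
    if "," not in inner:
        exact = int(inner)  # <n> means exactly n
        return (exact, exact)
    low, high = (part.strip() for part in inner.split(",", 1))
    # An omitted minimum is 0 and an omitted maximum is unbounded.
    return (int(low) if low else 0, int(high) if high else None)
```

For instance, length_bounds("<3>") gives (3, 3) and length_bounds("< , 4>") gives (0, 4), matching the corresponding rows of the table.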

The following table summarizes the GDL constructs discussed in this example:

GDL Construct       Notation          Description
-----------------   ---------------   -----------
Repeater            { repeat-term }   Indicates that a term may have multiple successive occurrences in the input.
Literal             "some string"     A value that must match the input exactly in order to parse successfully.
Conjunction         implicit          Operates like a logical AND. All terms in a conjunction must match the input, in the order they are listed, for the parse to succeed.
Optional Term       [ ]               Indicates that the enclosed term is optional in the input.
Length Constraint   < min, max >      Limits the minimum and maximum length, in character positions, of a term.
Predefined Type     type name         Any of the predefined types, including alpha, numeric, and alphanumeric.






Copyright © 1998-2008 NorKen Technologies, Inc. All rights reserved.