NorKen Technologies, Inc.
A parser breaks data into smaller elements, according to a set of rules that describe its structure. ProGrammar simplifies the task of building parsers.

What is a Parser?

A parser breaks data into smaller elements, according to a set of rules that describe its structure. Most data can be decomposed to some degree. For example, a phone number consists of an area code, a prefix and a suffix, and a mailing address consists of a street address, city, state, country and zip code.

Consider the following data:


(800) 555-1234
(123) 555-4321
(999) 888-7777

Because of the way these items are formatted, we recognize them as a list of phone numbers. The structure of these items may be described informally as:

"A phone number consists of a three-digit area code, enclosed by parentheses, followed by a three-digit prefix, followed by a dash, followed by a four-digit suffix."
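
Before turning to a grammar, the informal description can be tried out directly as a pattern. The following is a minimal Python sketch, not part of ProGrammar; the names PHONE_RE and split_phone are invented for illustration, and the single space after the closing parenthesis is an assumption taken from the sample data.

```python
import re

# Assumption: exactly one space follows the closing parenthesis, as in the sample data.
PHONE_RE = re.compile(
    r"\((?P<area_code>[0-9]{3})\) (?P<prefix>[0-9]{3})-(?P<suffix>[0-9]{4})"
)

def split_phone(text):
    """Return the parts of a phone number as a dict, or None if the text does not match."""
    match = PHONE_RE.fullmatch(text)
    return match.groupdict() if match else None

for line in ["(800) 555-1234", "(123) 555-4321", "(999) 888-7777"]:
    print(split_phone(line))
```

A pattern like this works for one flat format, but it does not name its parts as reusable rules, which is what a grammar adds.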

This description can be expressed more formally as a grammar. A grammar is a set of rules that describe the structure, or syntax, of a particular type of data. The following grammar describes the syntax of phone numbers:


phone_number ::= "(" area_code ")" prefix "-" suffix;
area_code ::= numeric<3>;
prefix ::= numeric<3>;
suffix ::= numeric<4>;

Each rule in the grammar, known as a production rule, describes the composition of a named symbol. The "::=" notation may be interpreted as "is composed of". Hence, the first production rule states that a phone_number is composed of a left parenthesis, followed by an area_code, followed by a right parenthesis, and so on. The next rule states that an area_code is composed of exactly three digits. Note how closely the grammar corresponds to the informal description of phone numbers.
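
To make the "is composed of" reading concrete, each production rule can be written by hand as one small function. This is only a sketch in Python of what the rules say, not how ProGrammar works internally. The helper names (numeric, literal, skip_spaces, is_phone_number) are hypothetical, and skipping spaces between symbols is an assumption, since the grammar itself does not mention the space that appears in the sample data.

```python
def numeric(text, pos, count):
    """Stand-in for numeric<count>: consume exactly `count` ASCII digits."""
    chunk = text[pos:pos + count]
    if len(chunk) != count or not all(c in "0123456789" for c in chunk):
        raise ValueError(f"expected {count} digits at position {pos}")
    return pos + count

def literal(text, pos, expected):
    """Consume a quoted literal such as "(" or "-"."""
    if not text.startswith(expected, pos):
        raise ValueError(f"expected {expected!r} at position {pos}")
    return pos + len(expected)

def skip_spaces(text, pos):
    # Assumption: spaces between symbols are ignored; the grammar does not say.
    while pos < len(text) and text[pos] == " ":
        pos += 1
    return pos

def area_code(text, pos):   # area_code ::= numeric<3>;
    return numeric(text, pos, 3)

def prefix(text, pos):      # prefix ::= numeric<3>;
    return numeric(text, pos, 3)

def suffix(text, pos):      # suffix ::= numeric<4>;
    return numeric(text, pos, 4)

def phone_number(text, pos=0):
    # phone_number ::= "(" area_code ")" prefix "-" suffix;
    pos = literal(text, pos, "(")
    pos = area_code(text, pos)
    pos = literal(text, pos, ")")
    pos = skip_spaces(text, pos)
    pos = prefix(text, pos)
    pos = literal(text, pos, "-")
    return suffix(text, pos)

def is_phone_number(text):
    """True if the whole text matches the phone_number rule."""
    try:
        return phone_number(text) == len(text)
    except ValueError:
        return False

print(is_phone_number("(800) 555-1234"))
print(is_phone_number("(80) 555-1234"))
```

Each function body reads almost word for word like the rule it implements, which is the same correspondence the grammar has with the informal description.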

Once the syntax of a data source has been described by grammar rules, a parser can use the grammar to parse the data source; that is, to break data elements such as phone numbers into smaller elements, such as area codes.

The output of the parser is a parse tree. The parse tree expresses the hierarchical structure of the input data. For example, when the phone number "(800) 555-1234" is parsed using the grammar shown above, the resulting parse tree has a phone_number node at its root, with child nodes for the area_code (800), the prefix (555) and the suffix (1234).

Parsing is the process of matching grammar symbols to elements in the input data, according to the rules of the grammar. The resulting parse tree is a mapping of grammar symbols to data elements. Each node in the tree has a label, which is the name of a grammar symbol; and a value, which is an element from the input data.
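
The idea of a tree whose nodes carry a label and a value can be sketched in a few lines of Python. This is an illustrative toy, not ProGrammar's API: the Node class, the GRAMMAR table and its ("lit", "sym", "num") item kinds are all invented here, "num" stands in for numeric<n>, and spaces between items are skipped by assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                     # name of the grammar symbol
    value: str                                     # input text the symbol matched
    children: list = field(default_factory=list)

# The phone number grammar as data. Item kinds (hypothetical):
#   ("lit", text)  a literal that must appear in the input
#   ("sym", name)  a reference to another rule
#   ("num", n)     exactly n digits, standing in for numeric<n>
GRAMMAR = {
    "phone_number": [("lit", "("), ("sym", "area_code"), ("lit", ")"),
                     ("sym", "prefix"), ("lit", "-"), ("sym", "suffix")],
    "area_code": [("num", 3)],
    "prefix": [("num", 3)],
    "suffix": [("num", 4)],
}

def skip_spaces(text, pos):
    # Assumption: spaces between items are not significant.
    while pos < len(text) and text[pos] == " ":
        pos += 1
    return pos

def parse(symbol, text, pos=0):
    """Match `symbol` at `pos`; return (Node, new_pos) or raise ValueError."""
    pos = skip_spaces(text, pos)
    start = pos
    children = []
    for kind, arg in GRAMMAR[symbol]:
        pos = skip_spaces(text, pos)
        if kind == "lit":
            if not text.startswith(arg, pos):
                raise ValueError(f"expected {arg!r} at position {pos}")
            pos += len(arg)
        elif kind == "num":
            chunk = text[pos:pos + arg]
            if len(chunk) != arg or not all(c in "0123456789" for c in chunk):
                raise ValueError(f"expected {arg} digits at position {pos}")
            pos += arg
        else:  # "sym": descend into the referenced rule
            child, pos = parse(arg, text, pos)
            children.append(child)
    return Node(symbol, text[start:pos], children), pos

def show(node, depth=0):
    """Return the tree as a list of indented 'label: value' lines."""
    lines = ["  " * depth + f"{node.label}: {node.value}"]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines

tree, end = parse("phone_number", "(800) 555-1234")
print("\n".join(show(tree)))
```

Run on "(800) 555-1234", the sketch prints the root phone_number with the full input as its value, followed by indented lines for area_code: 800, prefix: 555 and suffix: 1234. In this toy, literals such as "(" are matched but not stored as nodes; a real parser may choose differently.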

While parsers have traditionally been used in the construction of compilers, they're also quite useful in routine programming tasks, such as reading comma-delimited files, extracting data from formatted reports, and verifying the correctness of data formats. In fact, as applications continue to become more information-centric, the need for robust parsing technologies continues to grow. Common uses for parsers include:

  • Programming languages (Java, Basic)
  • Markup languages (HTML, XML)
  • Industry standard formats (IDL, ODL)
  • File formats (RTF, PostScript)
  • Database languages (SQL)
  • Modeling languages (VRML)
  • Command-line processing
  • Scripting languages
  • Special purpose proprietary languages
  • Protocols (HTTP, Internet RFCs)
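
Even a routine task such as reading a comma-delimited file benefits from a real parser rather than ad hoc string splitting. The short Python sketch below uses the standard csv module; the sample data is made up for illustration, with a quoted name that contains a comma.

```python
import csv
import io

# Hypothetical sample data; the quoted name contains a comma.
data = 'name,phone\n"Doe, Jane",(800) 555-1234\n'

# The csv parser applies the quoting rules of the format.
rows = list(csv.reader(io.StringIO(data)))

# A naive split treats every comma as a delimiter.
naive = data.splitlines()[1].split(",")

print(rows[1])
print(naive)
```

The parser returns two fields for the data row, while the naive split returns three, because it breaks the quoted name at its embedded comma.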


Related Topics

How do I build a Parser using ProGrammar?
What grammars are already available?



For comments or questions about this site, please contact
webmaster@programmar.com
Copyright © 1998-2008 NorKen Technologies, Inc. All rights reserved.