2007-04-22

Writing a parser: ADL tokens

ADL has only four kinds of token:
Word
A word starts with a letter, followed by zero or more letters or digits.
Examples: x, abc, f2, else.
Integer
An integer is a sequence of one or more digits.
Examples: 1, 42, 3141592654.
String
A string is a sequence of characters enclosed in double quotes.
Examples: "x", "abc", "Quid pro quo.".
Symbol
A symbol is one of the following sequences:
+ - * / ( ) , := == < > <> <= >=
To identify tokens, we'll use an enum called TokenType:
namespace TC.Adl
{
    public enum TokenType
    {
        None = 0,
        Word,
        Integer,
        String,
        Symbol
    }
}
(None is the enum's default value, 0. No real token should ever have this type.)
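
To make the four rules concrete, here is a rough sketch that classifies a complete piece of text as one of the token types. It is not the tokenizer and not part of the library: the TokenRules class, its Classify method and the TokenRulesSketch namespace are names I made up for illustration. It also assumes that "letter" means an ASCII letter and that a string cannot contain a quote character, neither of which is settled above.

```csharp
using System;
using System.Text.RegularExpressions;

namespace TC.Adl.TokenRulesSketch
{
    // Copy of the TokenType enum, repeated so this sketch compiles on its own.
    public enum TokenType { None = 0, Word, Integer, String, Symbol }

    // Hypothetical helper for illustration only.
    public static class TokenRules
    {
        // The symbols listed in the Symbol rule.
        static readonly string[] Symbols =
        {
            "+", "-", "*", "/", "(", ")", ",",
            ":=", "==", "<", ">", "<>", "<=", ">="
        };

        // Returns the token type that matches the whole text,
        // or TokenType.None if no rule matches.
        public static TokenType Classify(string text)
        {
            if (string.IsNullOrEmpty(text))
                return TokenType.None;

            // Word: a letter, then zero or more letters or digits.
            // Assumption: only ASCII letters count as letters.
            if (Regex.IsMatch(text, @"\A[A-Za-z][A-Za-z0-9]*\z"))
                return TokenType.Word;

            // Integer: one or more digits.
            if (Regex.IsMatch(text, @"\A[0-9]+\z"))
                return TokenType.Integer;

            // String: characters enclosed in double quotes.
            // Assumption: no quote characters inside the string.
            if (Regex.IsMatch(text, "\\A\"[^\"]*\"\\z"))
                return TokenType.String;

            // Symbol: one of the fixed sequences.
            if (Array.IndexOf(Symbols, text) >= 0)
                return TokenType.Symbol;

            return TokenType.None;
        }
    }
}
```

With these rules, TokenRules.Classify("f2") gives TokenType.Word, while "2f" matches nothing and gives TokenType.None. Note that "else" is just a Word here; at the token level nothing marks it as a keyword.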

A token has a type (of type TokenType) and a value (of type string). The value is the sequence of characters that represents the token.

using System;
using System.Collections.Generic;
using System.Text;

namespace TC.Adl
{
    public class Token
    {
        public Token(TokenType type, string value)
        {
            fType = type;
            fValue = value;
        }

        readonly TokenType fType;
        public TokenType Type { get { return fType; } }

        readonly string fValue;
        public string Value { get { return fValue; } }

        public bool Equals(TokenType type, string value)
        {
            return fType == type && fValue == value;
        }
    }
}
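
The two-argument Equals is handy when you want to check for one specific token, such as the := symbol. Here is a small sketch of that kind of check. The ParserSketch class, its IsAssignment method and the TokenUsageSketch namespace are hypothetical names for illustration; Token and TokenType are repeated in condensed form only so that the block compiles on its own.

```csharp
using System;

namespace TC.Adl.TokenUsageSketch
{
    // Condensed copies of the types from this post.
    public enum TokenType { None = 0, Word, Integer, String, Symbol }

    public class Token
    {
        readonly TokenType fType;
        readonly string fValue;

        public Token(TokenType type, string value)
        {
            fType = type;
            fValue = value;
        }

        public TokenType Type { get { return fType; } }
        public string Value { get { return fValue; } }

        public bool Equals(TokenType type, string value)
        {
            return fType == type && fValue == value;
        }
    }

    // Hypothetical example of code that consumes tokens.
    public static class ParserSketch
    {
        // True only for the assignment symbol: both the type
        // and the value have to match.
        public static bool IsAssignment(Token token)
        {
            return token.Equals(TokenType.Symbol, ":=");
        }
    }
}
```

Because this Equals takes two arguments, it is an overload next to Object.Equals(object) rather than an override of it, so comparing two Token instances with each other still uses reference equality.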

You may have noticed that the code has no comments and no argument validation. That is only to keep the listings simple and easy to understand at first sight. The code I'm writing in Visual Studio is fully commented and has all the necessary argument validation. I'll release the entire library afterwards.

Next time, we'll write the tokenizer.
