Scheming in Perl

Mayukh Bose

How to write your own little language in perl. We will implement a tiny scheme-based interpreter in perl.

Themes

Adjust presentation theme to suit projector:
Black (default) - White - League - Sky - Beige - Simple
Serif - Blood - Night - Moon - Solarized

Why?

  • Just because!
  • It makes a person a better programmer when one understand what's going on under the covers.
  • Gives you an insight about other programming languages as well.

Language Basics

  • We have a source file that goes through a
         scanner → lexer → parser
    and is interpreted or compiled.
  • We will examine the functions of each bit briefly in the next few slides.
  • For now, we have a picture of how the various bits fit together.

The Scanner

  • Fetches one character at a time from a source file.
  • Keeps track of where it is in the file (line and column number).
  • May need to be able to back up one character (we'll see why in a bit).
  • Largely ignorant of a language's syntax.

The Lexer

  • Breaks the source text into tokens.
  • Needs to understand the language syntax somewhat.
  • Needs to maintain state.
  • E.g. Lexing the statements:
    ``` cx = 15.23 * ax + 10; print cx; ```
  • Returns token and token type.

The Lexer - Part II

Reads a character and decides what action to take:

  • Whitespace or comment character? Skip characters.
  • Is is a letter? Then it is an identifier or keyword. Continue to read more letters or numbers until a white space or symbol is reached.
  • Is it a digit? Then it is a number. Continue to read more digits or decimal point until a non-digit is read. Determine if integer or real number.
  • Is it a double quote (")? Then it is a string. Continue to read more characters until another " is reached.
continued...

The Lexer - Part III

  • Is it a symbol? See if you can read more to see if it is < vs. <= or + vs. ++ etc. Identify the type of symbol read in (assignment, addition, less than etc.)


Of course, this is a simple representation and we haven't taken other considerations, such as escape characters (e.g. \), line continuation characters (e.g. \ or 1, 2 etc.), complex numbers etc. into account. Again, a lot of these depend on the language features that we are designing.

The Parser

  • Calls the lexer to get one token at a time.
  • Really understands the language syntax rules and builds a representation of the code, based on tokens and token types.
  • For example:
    ``` cx = 15.23 * ax + 10; ```
    Checks types and syntax and makes sure this expression makes sense.
  • Builds a tree to evaluate the expression.

The Parser - II

  • Parser: Give me a token.
  • Lexer: 'cx', which is an identifier.
  • Parser: cx is declared as numeric. Next token.
  • Lexer: '=', which is a symbol.
    • Parser: Ok, something is assigning to cx. Next token.
    • Lexer: '15.23', which is a float.
    • Parser: Ok, float is assigning to numeric. Next token.
    • Lexer: '*', which is a symbol.
      • Parser: Ok, wants me to multiply something. Next token.

The Parser - III

      • Lexer: Next token is 'ax', which is an identifier.
        • Parser: Ok, ax is numeric. Next token.
        • Lexer: Next token is ';', which is end of statement.
      • Parser: Ok, I'll get the value of 'ax' from symbol table.
    • Parser: Ok, I'll now multiply 15.23 and ax.
  • Parser: Ok, I'll now assign the result of the multiplication to 'cx'.


As you can see, the parser recursively descends into each stage. It also checks types and syntax rules as it goes.

Symbol Table

  • Keeps track of symbols (variables and function names), and the types and values that they represent. Usually implemented as a hash table.
  • For some languages, it needs to be able to distinguish between locally defined symbols and those defined by an earlier routine. (FORTRAN and C are exceptions.)

Designing your own language

Why design your own?

  • You get to decide the syntax and features.
  • Therefore, you get to decide the complexity, or lack thereof.
  • Perl is a very hard language to lex and parse. Sigils and special variables make it harder.
  • Have to consider things like operator precedence, 'if' at beginning or end of statement, state of special $ and % variables etc.
  • By contrast, scheme is a lot easier to lex and parse.

Scheme Basics

  • Based off of LISP, one of the oldest programming languages aside from FORTRAN (Note the ALL CAPS :))
  • Language with (lots (of parenthesis)).
  • Simple syntax.
  • Special characters are only '(', ')', and whitespace. Makes it easy to lex.
  • No operator precedence. Order of evaluation is enforced by parenthesis.

Scheme Basics - II

  • Only two types: atom and list. Atom = scalar in perl. List = array in perl.
  • In a list, first symbol of the list determines what to do with the rest of the list.
    ``` (+ 1 2 35) (display (+ 1 2 (* 4 65))) ```
  • In an atom, we just need to determine if this is a constant or an identifier. If it is an identifier, we look at the symbol table for its value.

tinyScheme

  • Only implements a subset of scheme, just for learning purposes. Very little error checking present.
  • Only numeric type is implemented.
  • Only a few functions are implemented, but enough to get some work done. More functions can be easily added as needed.
  • One goal was to not use any perl modules that do not come with a standard distribution and to keep it simple.
  • NOW ON TO THE IMPLEMENTATION!!!

Scanner for tinyScheme

Scanner?? We don't need no steenkin' scanner for our language. We'll just pass it straight to the lexer to sort it out.

Lexer for tinyScheme

  • Replace '(' and ')' in the string with ' ( ' and ' ) '.
  • Split string by whitespace into an array and return that array.
  • That's all there is to it (the beauty of simple syntax and very few special characters.)

Parser for tinyScheme

  • Go through array items.
    • If item is '(', create a new array and read more tokens until you hit ')', then return the array created.
    • Otherwise, simply return the item read in.
  • Basically creates arrays of arrays.
  • Order of evaluation is already determined due to parenthesis.

Evaluator for tinyScheme

  • Takes two arguments (expression and symbol table object.)
  • Recursive (see recursive)
  • First thing it does is check if the expression is a list or an atom.
  • If atom, check if it is a numeric constant or a token.
    • If numeric constant, return its value.
    • If token, look up its value in the symbol table object and return that.
  • If list, see next slide.

Evaluator for tinyScheme

  • For a list, the first item determines the action to take with the rest of the items in the list.
  • We just pop the first item off the list and examine it and determine what action to take.
  • There are 6 cases which must be handled differently (these are called special forms).There is also a general form. So, we have a total of 7 different forms.
  • We will see how to handle each in turn.

Down arrow

Evaluator for tinyScheme

Special Form #1

If the first item is 'quote', simply return the list unevaluated.


                                          (quote 1 2 ax)
                                      

Evaluator for tinyScheme

Special Form #2

If the first item is 'define', then it is of the form:


                                          (define var expression)
                                      
Pop the var and expression from the array. Recursively evaluate expression and then set var in the symbol table to the result of the expression.

Evaluator for tinyScheme

Special Form #3

If the first item is 'set!', then it is of the form:


                                          (set! var expression)
                                      
Similar to handling define, except that it doesn't create a new variable in the symbol table, only replaces an existing symbol.

Also note that ! has no special meaning in scheme. Therefore the "!" in "set!" is merely part of the keyword.

Evaluator for tinyScheme

Special Form #4

If the first item is 'if', then it is of the form:


                                          (if condition true_exp false_exp)
                                      
Pop the condition, true_exp and false_exp from the array. Then, recursively evaluate the value of condition and depending on the return value, evaluate true_exp or false_exp.

Evaluator for tinyScheme

Special Form #5

If the first item is 'begin', then it is of the form:


                                          (begin exp exp ...)
                                      
Simply read the rest of the array items in a for loop and evaluate each one, then return the value of the last one.

Evaluator for tinyScheme

Special Form #6

If the first item is 'lambda', then it is of the form:


                                          (lambda (args) exp)
                                      
Return an anonymous subroutine that creates a new symbol table based on the args and evaluates the expression.

Evaluator for tinyScheme

General Form

If the first item is none of the above, then it is of the form:


                                          (proc_name arg1 arg2 ...)
                                      
Evaluate this recursively (which will fetch it out of the symbol table) and pass it the arguments.

Evaluator for tinyScheme

And that's all there is for the evaluator!
Up arrow

Symbol Table for tinyScheme

This is implemented as a linked list of objects, each of which has a couple of member variables and methods.

  • One member variable is a hash and this keeps track of symbols and their values.
  • Another member variable is a reference to an object's parent.
  • The object has an add() method, which simply adds/updates its hash member variable.
  • The object has a find() method to find the value corresponding to a symbol. (see below ↓)

Symbol Table for tinyScheme

The find() method works like this:

  • First, look at the hash member variable to see if we can find the symbol in it.
  • If not found in own hash member variable, see if the object has a parent and recursively call the parent's find() method.
  • If the object has no parent, we've reached the end of the symbol table and not found the symbol. Throw an error in this case.
  • This is how we can get local variables to override global variables and implement a hierarchy of variable scopes.

tinyScheme Globals

  • Implement a hash of some basic global functions for scheme (such as +, -, *, /, and, or, not, <, <= etc.). Not much error checking here.
  • Not all functions need to be implemented in perl. For instance, with <, =, or and not, we can implement the rest of the comparison operators in tinyScheme.
    • > can be implemented as (not <)
    • <= can be implemented as (< or =)
    • >= can be implemented as (not <) etc.

tinyScheme Interpreter

  • Creates a global symbol table object and copies the scheme globals into it.
  • Reads a line from standard input.
  • Calls the parser to convert it to an array of arrays.
  • Calls the evaluator and passes it the array and global symbol table object. Prints out the returned value.
  • That's all there is to it!

Does it actually work?

  • Simple arithmetic.
  • Define variable and use in expression.
  • Define a simple function.
  • Define a function that calls a previously declared function.
  • Define a recursive function (to simulate looping).
  • Equation solver using Newton-Raphson.
  • Implement comparison operators in tinyScheme.

What does it not do?

  • Need to add more functions to implement more of scheme as needed.
  • Uses 1 and 0 instead of true and false constants (hey, my language, my rules!). Can be easily fixed to do that, if needed.
  • Does NOT do tail recursion, so can blow stack.
  • No string type, though that can be easily added.
  • No error checking, no comments, but it works!

Links and References

Questions/Comments?