An informal language specification

Intro

I am developing an interpreter to close some gaps in my CS education. Using C I’ve started putting together a small object-oriented language. The language was inspired by wondering what would happen if I replaced the standard Eval-Apply loop in a Lisp with a Eval-Send loop that uses the type of a receiver to decide how to evaluate a call. This gives me a kind of SmallTalk with a Lispy syntax. Hence the language is christened HIST(Homo-Iconic SmallTalk).

At the moment I’m working on a lexer&parser to make it easy for me to load data structures into memory from text representations. Re-learning C and hand building a parser using TDD has lead me to realized that HIST needs more of a specification than, ‘Scheme but with Apply changed’. Putting it into spec is almost entirely about helping me find my next few test cases so I can start interpreting the data structures and get HIST to a REPL.

Spec

Data

All data must can be represented using a literal, and the print and lex functions are one another’s inverses. I intend only to build those types I NEED to build an interpreter,

Scalars

Numbers

For the moment we will borrow C’s number tools, and define our numbers as either Long Ints, or doubles written just as in C. (The lexer will need to

Chars

See strings. All a char is in our system is a single char length string. (This is intended to make utf-8 support easier)

Atom

Special self-evaluating objects, any string prefixed with a :. They answer all messages with themselves and test equality using the global function eq?

Bytes

This may be a later thing, but the notion of using < > to surround a number to indicate that the object has byte semantics and not math semantics seems like a good idea for

Collections

Strings, Lists & Hashes literals should be the minimum needed, as methods are defined as lists, and Hashes should be good for the symbol table. I will eventually need some way to define literal arrays, to make bootstrapping easier.

The last priority here would be syntactic sugar to display Objects that represent collections.

Strings

Lists are bounded by " ", or ' ', with those literals escaped by \. Under the hood I think I will handle them as lists of Chars in the first implementation, but I’d like to intern and have an immutable string type much like literally every other dynamic language ever.

Lists

Lists are opened with an unannotated ( each element is a white-space separated literal, and may be any HIST object. Lists are closed with ). I intend to implement this using a linked list structure, such that one could manipulate a list with all the standard Lisp functions as methods in the list type.

Hashes

Hashes syntax is borrowed from clojure. The unannotated ‘{‘ opens the hash, and the hash has the enforced structure of key1 val1 key2 val2. It is strongly recommended that hashes be written like so

{ key1 val1
  key2 val2
  key3 ( Some
         composite)}

Object Literals

Borrow from the hash syntax but prefix the opening brace with @ followed by the type-symbol like so.

@class-name{ 
  :values { Key1 val1
            key2 val2}
  :visible-methods { name docstring }}

Read as “A class name, with values … and methods …”

Evaluation

Quoting

An essential technique in lisps, the symbol quote and the special quote annotation “`()” on a list prevent the list from being evaluates and interpreted as a message send. Essentially quote is object that responds to all messages with the identity.

Eval

(receiver args...). When encountering a structure like this eval will essentially do the following pseudo-lisp

( define eval [expression]
  ( let (( receiver (expression :head)
        ( args (expression :tail)) 
    (if ( has-class? receiver) 
        ( let ( class (receiver :class))
           ( if (is-fn? class)
                ( apply receiver message)
                ( let ( msg (msg :head)
                        msg-args (msg :tail))
                  (apply ((class msg) receiver  msg-args))))
        ( do-literals receiver args))))

This is super sloppy but get’s the general idea across.

There seem to basically be three patterns for how an object can interpret a message.
1. If it never looks at the message it is universally self-evaluating. The only type I expect to be likely to do this is the Atom type.
2. If it treats the entire message as a single argument, parsing it as a list it is a function. These behave exactly like we would expect a lisp function to behave.
3. If the object changes its behavior based the contents of args it is a classic OOP dynamic object.

Be the first to leave a comment. Don’t be shy.

Join the Discussion

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>