Introduction to Rabbit-vm

A first script

Data types

rvm construction

Glossary

A first script

Commandline as Program

Rabbit-vm (vm) takes whatever appears on its command line as a program, runs it and exits. The canonical sanity check is to add 3 and 4 and print the result; if that yields 7, chances are that other numbers and operations work, too. Any values still on the stack when the program finishes are discarded, so to see results we must print them explicitly or use the atom 'Stacks.print'.

Bash treats some characters specially, for example '( ) *', so they must be quoted. In the following examples, commands are mostly quoted with '...', whether necessary or not.

rabbit 3 4 + iprint lf 
7
rabbit 1 2 3 4 
rabbit 1 2 3 4 Stacks.print

4 DS   1 2 3 4 
0 SS   
9 RS   
rabbit '('1 2 3 4')'lprint lf 
(1 2 3 4 ) 
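The evaluation model above can be sketched in Python. This is a hypothetical illustration, not the vm's code. The command-line words are read as one flat postfix program and integers are pushed onto a data stack. The few atoms modelled here (`+`, `*`, `iprint`, `lf`) pop their arguments. The leftover stack is returned only so it can be inspected; the real vm simply discards it. The names `run_command_line` and `run_to_string` are made up for this sketch.

```python
import io
import sys


def _binary(op):
    """Wrap a two-argument function as a stack atom with effect x y -- op(x, y)."""
    def atom(ds, out):
        y = ds.pop()
        x = ds.pop()
        ds.append(op(x, y))
    return atom


# Hypothetical subset of atoms; each takes the data stack and an output stream.
ATOMS = {
    "+": _binary(lambda x, y: x + y),
    "*": _binary(lambda x, y: x * y),
    "iprint": lambda ds, out: out.write(str(ds.pop())),  # i --  print integer
    "lf": lambda ds, out: out.write("\n"),               #  --   print newline
}


def run_command_line(words, out=sys.stdout):
    """Run the words as one flat postfix program and return what is left on ds."""
    ds = []                                  # data stack, top at the end
    for word in words:
        try:
            ds.append(int(word))             # numbers are pushed
        except ValueError:
            ATOMS[word](ds, out)             # anything else must be a known atom
    return ds                                # the vm discards this; kept for inspection


def run_to_string(words):
    """Convenience wrapper: capture printed output instead of writing to stdout."""
    buf = io.StringIO()
    leftover = run_command_line(words, buf)
    return buf.getvalue(), leftover


if __name__ == "__main__":
    # Equivalent of: rabbit 3 4 + iprint lf
    run_command_line(sys.argv[1:] or "3 4 + iprint lf".split())
```

With `1 2 3 4` the sketch prints nothing and leaves `[1, 2, 3, 4]` behind. This matches the silent `rabbit 1 2 3 4` run above.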

Strings as Lists

Strings are represented as lists of integers. Only plain ASCII characters are supported. Strings may be delimited by " or ', and strings may be nested: a string delimited by one kind of quote may contain the other.

"string 'in' string"
'string "in" string'
{docstrings are strings delimited with curly brackets.}

Escape sequences like \n or \t are not recognized by the built-in parser.

Docstrings are strings delimited by '{' and '}'. They are represented as lists with the symbol 'docstring' as their first element. Unlike comments, they are used when the documentation should be available in the running vm.

rabbit '  "Hello world!" print lf  '
Hello world!
rabbit '  (72 101 108 108 111 10)print  '
Hello
rabbit '  "Hello" lprint lf  '
(72 101 108 108 111 ) 
rabbit '  {Hello} lprint  '
(docstring 72 101 108 108 111 ) 
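The representation in these examples can be mimicked in Python. The function names below are hypothetical, and the sketch relies only on what the examples show:

- a string becomes the list of its ASCII codes;
- a docstring also starts with the symbol docstring, modelled here as a Python string;
- `print` turns the codes back into characters;
- `lprint` shows the raw list.

```python
def string_to_list(text):
    """Encode a string as its ASCII codes, e.g. Hello -> [72, 101, 108, 108, 111]."""
    codes = [ord(ch) for ch in text]
    if any(code > 127 for code in codes):
        raise ValueError("only plain ASCII is supported")
    return codes


def docstring_to_list(text):
    """Encode a docstring: the symbol docstring followed by the character codes."""
    return ["docstring"] + string_to_list(text)


def print_string(codes):
    """What print shows: the integers interpreted as characters."""
    return "".join(chr(code) for code in codes)


def lprint(item):
    """What lprint shows: the raw list, every element followed by a space."""
    if isinstance(item, list):
        return "(" + "".join(lprint(element) + " " for element in item) + ")"
    return str(item)


if __name__ == "__main__":
    print(lprint(string_to_list("Hello")))        # (72 101 108 108 111 )
    print(lprint(docstring_to_list("Hello")))     # (docstring 72 101 108 108 111 )
    print(print_string([72, 101, 108, 108, 111, 10]), end="")  # Hello
```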

Comments

Comments start with ';' and run to the end of the line. They are ignored.

rabbit ' 3 4 + iprint lf    ; add 3 and 4 and print result  '
7

Documentation is built in.

Documentation of atoms is built into the vm. Later on, docstrings are used to document definitions in a running vm.

rabbit '   `dup Symbol.docstring print lf  '
x -- x x
rabbit '   `zap dup Symbol.name print spc spc "{"print Symbol.docstring print "}" print lf  '
zap  {x -- }

A first definition to get documentation strings.

The word define creates a new definition from a quoted symbol name, a docstring and a quotation:

`name {some docstring} (the quotation) define
rabbit '   `docstring-of {print docstring} ( dup Symbol.name print spc spc "{"print Symbol.docstring print "}" print lf )define  `nip docstring-of '
nip  {x y -- y}

A first rabbit script.

Create a rabbit script: local/snake.rabbit

; snake.rabbit
 
 
`docstring-of {u -- // print docstring of word u} (
	        dup
	        Symbol.name print spc spc
	        "{" print Symbol.docstring print "}" print lf
        )define
        
`doc { --  //  hiccup symbol and print it`s docstring} (
	        hiccup
                docstring-of
        )define
 
`docs {us -- // print docstring of symbols} (
                ( docstring-of )each
        )define
 

There are three ways to use the script: cat it into the command line, read and parse the file, or load it.

rabbit "   ` cat local/snake.rabbit ` "  ' `zap docstring-of '  
zap  {x -- }
rabbit '   "local/snake.rabbit" File.fetch   "snake" Parse.rabbit i   `File.fetch docstring-of  '
File.fetch  {s --  // read file s into sourcebuffer}
rabbit '   "local/snake.rabbit" load   `load docstring-of      '
load  {s --  // read file into sourcebuffer and execute.}

A first bash script to simplify invocation.

Create a bash script: snake

#! /bin/bash
rabbit '  "local/snake.rabbit" load  ' "$@"
snake ' `dup docstring-of '
dup  {x -- x x}
snake   doc leap   doc over
leap  {x y z -- x y z x}
over  {x y -- x y x}

Summary of words used so far.

snake '  ( ` hiccup load Parse.rabbit i Symbol.name print lprint iprint spc each docstring-of doc )docs  '
`  { -- u  // put    next item from running quotation on stack.}
hiccup  { -- x  // hiccup next item from calling quotation on stack}
load  {s --  // read file into sourcebuffer and execute.}
Parse.rabbit  {s -- l  // parse sourcebuffer.  s is name of source for errmess.}
i  {q -- // apply quotation}
Symbol.name  {u -- s}
print  {l --  // print string}
lprint  {l --  // print list raw}
iprint  {i --  // print integer}
spc  { --   // print space}
each  {l p --  // apply p to each element of l put on stack}
docstring-of  {u -- // print docstring of word u}
doc  { --  //  hiccup symbol and print it`s docstring}

Data types

Integers

The current implementation provides 62-bit signed integers.
There is currently no overflow checking and no exact definition of maxint. The aim is to provide at least 32-bit signed integers.
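The nominal range of a 62-bit signed integer is easy to compute. This Python sketch assumes the 62 bits are used as an ordinary two's-complement field, which the text does not state. Because there is no overflow checking and maxint is not defined, treat the bounds as indicative only. `fits_int62` is a hypothetical helper.

```python
BITS = 62                              # width stated in the text
INT62_MIN = -(1 << (BITS - 1))         # -2**61
INT62_MAX = (1 << (BITS - 1)) - 1      #  2**61 - 1
INT32_MIN = -(1 << 31)                 # the intended minimum guaranteed range
INT32_MAX = (1 << 31) - 1


def fits_int62(n):
    """True if n lies in the nominal 62-bit two's-complement range."""
    return INT62_MIN <= n <= INT62_MAX


if __name__ == "__main__":
    print(INT62_MIN, INT62_MAX)    # -2305843009213693952 2305843009213693951
```

Every 32-bit signed value fits comfortably inside this range, which is consistent with the stated aim.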

Symbols

Except for the special characters (see Parser built-in), all printable ASCII characters, i.e. those between space (32) and delete (127), may be used in symbol names.

When the vm sees a symbol for the first time, it is interned as if defined like this:

`myname {} ( nodef myname ) define

If such a symbol is called, 'nodef' raises a not-defined error.
Symbols may be redefined. In fact, the first use of define on a symbol is already a redefinition.

rabbit  '  `myname Symbol.def lprint  '
(nodef myname ) 
rabbit  '  `myname {} () define   `myname Symbol.def lprint  '
(nop ) 
rabbit  '  `myname {} (3 4 +) define   `myname Symbol.def lprint  '
(3 4 + ) 

Symbols are stored in a symbol table. The current implementation has 5 slots for each symbol:

 Symbol.def {u -- q  // return quotation of symbol u}
 Symbol.docstring {u -- s}
 Symbol.name {u -- s}
 Symbol.source {u -- s i  // filename and line}
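The interning rule above can be sketched as a dictionary of records. A fresh symbol behaves as if it had been defined as (nodef name). The `Symbol` class and the `intern_symbol`/`define` functions are hypothetical Python stand-ins for the vm's symbol table. Their four fields mirror the accessors listed above; the fifth slot is not described in the text and is left out. Source positions are not tracked.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    """One symbol-table entry; fields mirror Symbol.def/.docstring/.name/.source."""
    name: str
    docstring: str = ""
    definition: list = field(default_factory=list)
    source: tuple = ("", 0)            # (filename, line); not tracked in this sketch


SYMBOLS = {}                           # hypothetical symbol table: name -> Symbol


def intern_symbol(name):
    """Return the symbol for name, creating it as (nodef name) on first sight."""
    symbol = SYMBOLS.get(name)
    if symbol is None:
        symbol = Symbol(name, definition=["nodef", name])
        SYMBOLS[name] = symbol
    return symbol


def define(name, docstring, quotation):
    """Sketch of `name {docstring} (quotation) define: always a redefinition."""
    symbol = intern_symbol(name)       # the symbol already exists after interning
    symbol.docstring = docstring
    # An empty body is stored as (nop), matching the (nop ) example output.
    symbol.definition = list(quotation) or ["nop"]
    return symbol
```

Because `define` always goes through `intern_symbol` first, every definition overwrites an existing entry. This matches the remark that the first define is already a redefinition.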

Lists

Lists are singly linked lists of arbitrarily nested structure. Lists are garbage collected.
Lists are denoted with round brackets. Square brackets are ordinary symbol characters: they form symbol names or parts of them. Curly braces are (doc)string delimiters.
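A singly linked list is a chain of cells. Each cell holds one value and a reference to the rest of the list. The Python sketch below (class and function names are hypothetical) shows the property that matters most for a stack language. Putting an element in front of a list creates one new cell pointing at the existing list, which is shared rather than copied. Python's own garbage collector stands in for the vm's here.

```python
class Cell:
    """One list cell: a value (int, symbol or nested list) and the rest of the list."""
    __slots__ = ("head", "tail")

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail               # None plays the role of nil, the empty list


def from_python(items):
    """Build a linked list from a (possibly nested) Python list."""
    result = None
    for item in reversed(items):
        if isinstance(item, list):
            item = from_python(item)   # nested lists become nested chains
        result = Cell(item, result)
    return result


def to_python(cell):
    """Convert a linked list back into nested Python lists."""
    items = []
    while cell is not None:
        head = cell.head
        if head is None or isinstance(head, Cell):
            head = to_python(head)     # nil or a nested list
        items.append(head)
        cell = cell.tail
    return items


if __name__ == "__main__":
    rest = from_python([2, 3])
    longer = Cell(1, rest)             # cons 1 onto (2 3) without copying it
    print(to_python(longer), longer.tail is rest)   # [1, 2, 3] True
```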

rvm construction

Rabbit-vm is a stack machine that works as an AST interpreter and has a built-in parser.

Parser built-in

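The parser recognizes symbols, numbers and lists. Strings are returned as lists of integers. Comments are ignored. Tokens are separated by whitespace, but no whitespace is required before or after special characters. The special case is '^': if a word begins with '^', two symbols are returned, the leading run of '^' and the rest of the word. A word consisting only of '^' characters stays a single symbol, and '^' inside a word has no special meaning.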

 special characters:
        ( ) | : # ` , " " ' ' { }
rabbit   '( a b c 1 2 3 a1 2a a-14 a_14 A [ / . .. [..]  ) lprint '
(a b c 1 2 3 a1 2a a-14 a_14 A [ / . .. [..] ) 
rabbit   '( special:`|#, :name,name *name+name) lprint '
(special : ` | # , : name , name *name+name ) 
rabbit   '( otto ^otto ^^otto ^^^otto ^^otto^^speaking^^) lprint  '
(otto ^ otto ^^ otto ^^^ otto ^^ otto^^speaking^^ ) 
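The tokenizing rules above can be approximated in Python. This is a hypothetical sketch reverse-engineered from the three examples, not the vm's parser. It rests on these assumptions:

- strings and docstrings are left out;
- a comment starts only where a new token could start;
- a word counts as an integer when it is an optional minus sign followed by digits.

Running `parse` and `lprint` on the three example inputs reproduces the outputs shown.

```python
import re

SPECIAL = set("()|:#`,")               # string/docstring delimiters omitted here
INTEGER = re.compile(r"-?[0-9]+")


def atom(word):
    """Turn a word into an int if it looks like one, else keep it as a symbol."""
    return int(word) if INTEGER.fullmatch(word) else word


def tokenize(src):
    """Split source text into integers, symbols and special characters."""
    tokens = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c.isspace():
            i += 1
        elif c == ";":                         # comment runs to end of line
            while i < n and src[i] != "\n":
                i += 1
        elif c in SPECIAL:                     # special chars need no whitespace
            tokens.append(c)
            i += 1
        else:                                  # a word ends at whitespace or a special
            j = i
            while j < n and not src[j].isspace() and src[j] not in SPECIAL:
                j += 1
            word, i = src[i:j], j
            rest = word.lstrip("^")
            if rest and rest != word:          # leading ^ run becomes its own symbol
                tokens.append(word[: len(word) - len(rest)])
                tokens.append(atom(rest))
            else:                              # plain word, or a word made only of ^
                tokens.append(atom(word))
    return tokens


def parse(src):
    """Group the tokens into nested lists at ( and )."""
    stack = [[]]
    for tok in tokenize(src):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    return stack[0]


def lprint(item):
    """Format like lprint: every element followed by a space."""
    if isinstance(item, list):
        return "(" + "".join(lprint(e) + " " for e in item) + ")"
    return str(item)


if __name__ == "__main__":
    print(lprint(parse("( otto ^otto ^^otto ^^^otto ^^otto^^speaking^^)")[0]))
```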

Stack machine

Rabbit-vm has three stacks: the data stack ds, the software stack ss (or upstack) and the return stack rs. In Forth, the return stack is used for loop counters and other local variables. The vm uses the return stack for dip operations, so a third stack comes in handy. Working with the upstack sometimes feels like throwing a ball up and catching it at the right place when it falls down again.

rabbit '   15 up 3 4 + down * iprint lf'
105
rabbit '   1 2 3 4 5  ( dup * )dip2 Stacks.print  '

5 DS   1 2 9 4 5 
0 SS   
9 RS   
rabbit '   1 2 3 4 5  ^^* Stacks.print  '

4 DS   1 6 4 5 
0 SS   
9 RS   
snake '   ( ^ ^^ up down cpup cpdown uzap consup unconsdown )docs  '
^  {x -- x  // hiccup next word from instruction stream and dip it}
^^  {x y -- x y  // hiccup word and dip2 it}
up  {x --    //  [  -- x]    move x up to upstack}
down  {  -- x  //  [x --  ]    move x down from upstack}
cpup  {x -- x  //  [  -- x]    copy x to upstack}
cpdown  {  -- x  //  [x -- x]    copy x from upstack}
uzap  {  --    //  [x --  ]    zap at upstack}
consup  {x --    //  [l -- l]    cons x to list at upstack}
unconsdown  {  -- x  //  [l -- l]    uncons x from list at upstack}
snake ' doc Stacks.size   Stacks.size triple lprint '
Stacks.size  { -- i i i  // size of ds ss rs}
(256 256 1048576 ) 
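The upstack words can be modelled with two Python lists. In this hypothetical sketch, the methods follow the stack comments listed above, with the ds effect first and the upstack effect in brackets. `dip2` takes a Python callable instead of a quotation. It keeps the two set-aside values in local variables, where the vm uses its return stack. The demo replays `15 up 3 4 + down *` and the `( dup * )dip2` example.

```python
import operator


class Stacks:
    """Data stack ds and upstack ss; the top of each stack is the end of its list."""

    def __init__(self, *items):
        self.ds = list(items)
        self.ss = []

    def push(self, *items):
        self.ds.extend(items)

    def up(self):           # x --      [  -- x]
        self.ss.append(self.ds.pop())

    def down(self):         #   -- x    [x --  ]
        self.ds.append(self.ss.pop())

    def cpup(self):         # x -- x    [  -- x]
        self.ss.append(self.ds[-1])

    def cpdown(self):       #   -- x    [x -- x]
        self.ds.append(self.ss[-1])

    def uzap(self):         #   --      [x --  ]
        self.ss.pop()

    def consup(self):       # x --      [l -- l]  x becomes the new head of l
        self.ss[-1] = [self.ds.pop()] + self.ss[-1]

    def unconsdown(self):   #   -- x    [l -- l]  head of l moves down to ds
        head, *rest = self.ss[-1]
        self.ss[-1] = rest
        self.ds.append(head)

    def binary(self, op):   # x y -- op(x, y)
        y = self.ds.pop()
        x = self.ds.pop()
        self.ds.append(op(x, y))

    def dup(self):          # x -- x x
        self.ds.append(self.ds[-1])

    def dip2(self, quotation):   # x y -- x y   run quotation below the top two
        y = self.ds.pop()
        x = self.ds.pop()
        quotation(self)
        self.ds.extend([x, y])


def square(s):              # the quotation ( dup * )
    s.dup()
    s.binary(operator.mul)


if __name__ == "__main__":
    s = Stacks(15)          # 15 up 3 4 + down *
    s.up()
    s.push(3, 4)
    s.binary(operator.add)
    s.down()
    s.binary(operator.mul)
    print(s.ds)             # [105]
    t = Stacks(1, 2, 3, 4, 5)
    t.dip2(square)          # 1 2 3 4 5 ( dup * )dip2
    print(t.ds)             # [1, 2, 9, 4, 5]
```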

AST interpreter

All programs are represented as singly linked lists. The vm walks through them, with the instruction pointer stepping from the head to the tail of the currently running list. There is no compilation or optimization step between parser and vm. The vm exits when the program read from the command line ends.

The current implementation is written in fasm assembly language. Simple atoms need a single machine instruction, followed by exec to move on:

 swap:
        xchg tos, sos   ; swap top of stack and second of stack
        exec            ; move to next instruction 

Moving to the next instruction needs 2 sequential 64-bit reads from memory and 3 branches.


macro exec { ; (  // continue executing prog:l )
	local .loop, .call_atom, .recur_on_list, .just_cons, .done

        jmp .done
    .loop:
    	mov a,    [prog]	; this quotation-element-value: int|sym|lst
	mov prog, [prog + ws]	; next quotation-element
        nosym? al, .just_cons
	cmp eax, dsp32
	jg .recur_on_list	; symbol points into list space   -->  call definition
    .call_atom:
	jmp a			; symbol points below list space  -->  call atom

    .recur_on_list:
	call f.execute
        jmp .done
        
    .just_cons:
        shiftup a               ; push quotation element to stack
    .done:
        notzero? prog, .loop    ; quotation has more elements?
	pop prog                ; pop calling quotation
	unlst prog
	ret                     ; return to calling atom
}


f.execute: ; ( a:l -- // exec l on stack )
	check_rs_overflow
	lst prog                ; push running quotation
	push prog
	mov prog, a             ; continue with new called quotation
the_real_exec:
        exec
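The control flow of exec and f.execute can be restated in Python to make the three cases explicit. In this hypothetical sketch a quotation is a Python list, and the instruction pointer is an index into it rather than a pointer to the next cell. The three cases are:

- integers and lists are pushed (the `.just_cons` case);
- atoms are Python functions called directly (`.call_atom`);
- a definition saves the caller's quotation and position before starting (`.recur_on_list`).

The vm does this with `call f.execute` and the machine stack. Here the return stack is an explicit list, which makes the saved state visible. The atom set and the definitions are made up for illustration.

```python
import operator


def _binary(op):
    """Atom with stack effect x y -- op(x, y)."""
    def atom(ds):
        y = ds.pop()
        x = ds.pop()
        ds.append(op(x, y))
    return atom


def _swap(ds):
    ds[-1], ds[-2] = ds[-2], ds[-1]


ATOMS = {                                  # hypothetical built-ins
    "+": _binary(operator.add),
    "*": _binary(operator.mul),
    "dup": lambda ds: ds.append(ds[-1]),
    "swap": _swap,
}


def run(program, definitions):
    """Execute a quotation; definitions maps symbol names to quotations."""
    ds = []                    # data stack
    rs = []                    # return stack: (quotation, index) of each caller
    prog, ip = program, 0      # running quotation and instruction pointer
    while True:
        if ip == len(prog):                # quotation has no more elements
            if not rs:
                return ds                  # top-level program done: the vm would exit
            prog, ip = rs.pop()            # pop calling quotation and resume it
            continue
        item = prog[ip]
        ip += 1                            # step to the next element first, as exec does
        if isinstance(item, str):          # a symbol
            if item in ATOMS:
                ATOMS[item](ds)            # like .call_atom
            else:
                rs.append((prog, ip))      # like .recur_on_list: save the caller
                prog, ip = definitions[item], 0
        else:
            ds.append(item)                # like .just_cons: push int or list


if __name__ == "__main__":
    defs = {"square": ["dup", "*"],
            "sum-of-squares": ["square", "swap", "square", "+"]}
    print(run([3, 4, "sum-of-squares"], defs))   # [25]
```

Each element costs one step to read the value, one step to advance the pointer, and a type dispatch. This loosely mirrors the two reads and three branches counted for the assembly version.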


Glossary

Atom
A built-in function.
Definition
A function defined at runtime.
Function
A function from stack to stack.
It takes zero or more parameters from the stack, possibly does something with them, and returns zero or more results to the stack.
Homoiconic
Code is data.
Code == data == list == stack == program == quotation == function == definition.
The same list of tokens can be data, program, definition or stack.
Quotation
A list intended to be executed as function.
Symbol
An interned identifier, as in Lisp.
Token
Integer, symbol or list.
Characters are represented as integers. Strings are represented as lists of integers. False and true are 0 and 1. Nil is the empty list. Special characters are represented as symbols.
Word
Atom or Definition.

2019-12-01 12:34:39 http://sts-q.bitbucket.io