Introduction to Rabbit-vm
A first script
Commandline as Program
Rabbit-vm (vm) takes whatever arrives on its command line as a program, runs it, and exits.
The canonical sanity check is to add 3 and 4 and print the result. Chances are, if that works
out to 7, other numbers and other operations will work, too.
Any values that remain on the stack when program execution finishes are discarded.
So, to see results, we must print them explicitly or use the atom 'Stacks.print'.
Bash interprets some characters on the command line before rabbit sees them.
Therefore we need to quote some characters: '( ) *'.
In the following we quote all commands with '...', whether necessary or not.
rabbit 1 2 3 4 Stacks.print
rabbit '('1 2 3 4')'lprint lf
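To see what quoting does before rabbit ever runs, the following Python sketch uses the standard-library shlex.split, which follows POSIX shell quoting rules, to show the words the shell hands over for the second command above. It only illustrates shell quoting; how rabbit joins these words into a program is not described in this text and is not modeled here.

```python
import shlex

# The command line exactly as typed in the example above.
command = "rabbit '('1 2 3 4')'lprint lf"

# shlex.split applies POSIX quoting rules: quoted and unquoted pieces
# that touch each other are joined into a single word.
words = shlex.split(command)
print(words)  # ['rabbit', '(1', '2', '3', '4)lprint', 'lf']
```

The quotes themselves are gone by the time the program starts; only the protected characters '(' and ')' remain.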
Strings as Lists
Strings are represented as lists. Only plain ASCII characters are supported.
String delimiters may be " or ' and strings may be nested.
"string 'in' string"
'string "in" string'
{docstrings are strings delimited with curly brackets.}
String escape characters like \n or \t are not recognized by the built-in parser.
Docstrings are strings that are delimited by '{' and '}'.
Docstrings are represented as lists with the symbol 'docstring' as first element.
Unlike comments, they are used when the documentation should be available in the running vm.
rabbit ' "Hello world!" print lf '
rabbit ' (72 101 108 108 111 10)print '
rabbit ' "Hello" lprint lf '
rabbit ' {Hello} lprint '
(docstring 72 101 108 108 111 )
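The representation described above can be modeled in a few lines of Python. This is only a sketch of the data layout: the function names to_string, to_docstring and from_string are made up for illustration, and the symbol 'docstring' is stood in for by a Python string.

```python
def to_string(text):
    """Model a rabbit string: a list of ASCII codes (non-ASCII input raises an error)."""
    return list(text.encode("ascii"))

def to_docstring(text):
    """Model a docstring: the symbol 'docstring' followed by the character codes."""
    return ["docstring"] + to_string(text)

def from_string(codes):
    """Turn a list of character codes back into text."""
    return "".join(chr(code) for code in codes)

print(to_string("Hello"))                                # [72, 101, 108, 108, 111]
print(to_docstring("Hello"))                             # ['docstring', 72, 101, 108, 108, 111]
print(repr(from_string([72, 101, 108, 108, 111, 10])))   # 'Hello\n'
```

The last line shows why the list (72 101 108 108 111 10) prints as Hello followed by a line feed: 10 is the ASCII code for newline.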
Comments
Comments start with ';' and run to the end of the line. They are ignored.
rabbit ' 3 4 + iprint lf ; add 3 and 4 and print result '
Documentation is built in.
Documentation of atoms is built into vm. Later on, docstrings are used to provide documentation
of definitions in a running vm.
rabbit ' `dup Symbol.docstring print lf '
rabbit ' `zap dup Symbol.name print spc spc "{"print Symbol.docstring print "}" print lf '
A first definition to get documentation strings.
The word define creates new definitions.
`name {some docstring} (the quotation) define
rabbit ' `docstring-of {print docstring} ( dup Symbol.name print spc spc "{"print Symbol.docstring print "}" print lf )define `nip docstring-of '
A first rabbit script.
Create a rabbit script: local/snake.rabbit
; snake.rabbit
`docstring-of {u -- // print docstring of word u} (
dup
Symbol.name print spc spc
"{" print Symbol.docstring print "}" print lf
)define
`doc { -- // hiccup symbol and print it`s docstring} (
hiccup
docstring-of
)define
`docs {us -- // print docstring of symbols} (
( docstring-of )each
)define
We can cat our script into the command line, fetch the file and parse it, or load it.
rabbit " ` cat local/snake.rabbit ` " ' `zap docstring-of '
rabbit ' "local/snake.rabbit" File.fetch "snake" Parse.rabbit i `File.fetch docstring-of '
File.fetch {s -- // read file s into sourcebuffer}
rabbit ' "local/snake.rabbit" load `load docstring-of '
load {s -- // read file into sourcebuffer and execute.}
A first bash script to simplify invocation.
Create a bash script: snake
#! /bin/bash
rabbit ' "local/snake.rabbit" load ' "$@"
snake ' `dup docstring-of '
leap {x y z -- x y z x}
over {x y -- x y x}
Summary of words used so far.
snake ' ( ` hiccup load Parse.rabbit i Symbol.name print lprint iprint spc each docstring-of doc )docs '
` { -- u // put next item from running quotation on stack.}
hiccup { -- x // hiccup next item from calling quotation on stack}
load {s -- // read file into sourcebuffer and execute.}
Parse.rabbit {s -- l // parse sourcebuffer. s is name of source for errmess.}
i {q -- // apply quotation}
Symbol.name {u -- s}
print {l -- // print string}
lprint {l -- // print list raw}
iprint {i -- // print integer}
spc { -- // print space}
each {l p -- // apply p to each element of l put on stack}
docstring-of {u -- // print docstring of word u}
doc { -- // hiccup symbol and print it`s docstring}
Data types
Integers
The current implementation provides 62-bit signed integers.
There is currently no overflow checking and no exact definition of maxint.
The idea is to provide at least 32-bit signed integers.
Symbols
Apart from the special characters (see: Parser),
all ASCII characters between space (32) and delete (127) may be used in symbol names.
When vm sees a symbol for the first time, it is interned as if defined like:
`myname {} ( nodef myname ) define
If such a symbol is called, 'nodef' raises a not-defined error.
Symbols may be redefined. The first use of define on a symbol is actually a redefinition.
rabbit ' `myname Symbol.def lprint '
rabbit ' `myname {} () define `myname Symbol.def lprint '
rabbit ' `myname {} (3 4 +) define `myname Symbol.def lprint '
Symbols are stored in a symbol table.
The current implementation has 5 slots for each symbol:
Symbol.def {u -- q // return quotation of symbol u}
Symbol.docstring {u -- s}
Symbol.name {u -- s}
Symbol.source {u -- s i // filename and line}
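The interning behavior can be sketched in Python as follows. This is a model under assumptions, not the actual symbol table: the class names Symbol and SymbolTable and the method names are hypothetical, and only the slots that have an accessor listed above are modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Symbol:
    name: str                                        # Symbol.name
    definition: list = field(default_factory=list)   # Symbol.def
    docstring: str = ""                              # Symbol.docstring
    source: tuple = ("", 0)                          # Symbol.source: (filename, line)

class SymbolTable:
    def __init__(self):
        self.symbols = {}

    def intern(self, name):
        """Return the symbol for name, creating it the first time it is seen."""
        if name not in self.symbols:
            # Mirrors: `myname {} ( nodef myname ) define
            self.symbols[name] = Symbol(name, ["nodef", name])
        return self.symbols[name]

    def define(self, name, docstring, quotation):
        """Every define is a redefine, because intern has already created the symbol."""
        symbol = self.intern(name)
        symbol.docstring = docstring
        symbol.definition = quotation
        return symbol

table = SymbolTable()
print(table.intern("myname").definition)   # ['nodef', 'myname']
table.define("myname", "", [3, 4, "+"])
print(table.intern("myname").definition)   # [3, 4, '+']
```

Giving every fresh symbol a default quotation means the interpreter never needs a separate "is this defined?" check: calling an undefined word simply runs nodef.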
Lists
Lists are singly linked lists of arbitrarily nested structure.
Lists are garbage collected.
Lists are denoted with round brackets. Square brackets are (part of) symbol names. Curly braces
are string delimiters.
rvm construction
Rabbit-vm is a stack machine, working as an AST interpreter, with a built-in parser.
Parser built-in
The parser sees symbols, numbers and lists.
Strings are returned as lists of integers.
Comments are ignored.
Tokens are separated by whitespace.
No whitespace is required between special characters.
One special case is '^': if one or more ^ come at the beginning of a word, two symbols are returned, the leading carets and the rest of the word.
special characters:
( ) | : # ` , " " ' ' { }
rabbit '( a b c 1 2 3 a1 2a a-14 a_14 A [ / . .. [..] ) lprint '
(a b c 1 2 3 a1 2a a-14 a_14 A [ / . .. [..] )
rabbit '( special:`|#, :name,name *name+name) lprint '
(special : ` | # , : name , name *name+name )
rabbit '( otto ^otto ^^otto ^^^otto ^^otto^^speaking^^) lprint '
(otto ^ otto ^^ otto ^^^ otto ^^ otto^^speaking^^ )
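The splitting rules can be approximated with a small Python tokenizer. It is a sketch reconstructed only from the three examples above, so treat every rule in it as an assumption: it ignores strings, docstrings and comments, returns a flat token sequence instead of nested lists, and the function names are invented for illustration.

```python
# Special characters that always form a token of their own.
# The quote and brace delimiters are left out of this sketch.
SPECIAL = set("()|:#`,")

def split_word(word):
    """Split one whitespace-delimited word at every special character."""
    tokens, current = [], ""
    for ch in word:
        if ch in SPECIAL:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

def split_carets(token):
    """Return leading carets and the rest as two tokens, if both parts exist."""
    rest = token.lstrip("^")
    if rest and rest != token:
        return [token[:len(token) - len(rest)], rest]
    return [token]

def tokenize(source):
    tokens = []
    for word in source.split():
        for token in split_word(word):
            tokens.extend(split_carets(token))
    return tokens

print(" ".join(tokenize("special:`|#, :name,name *name+name")))
# special : ` | # , : name , name *name+name
print(" ".join(tokenize("otto ^otto ^^otto ^^^otto ^^otto^^speaking^^")))
# otto ^ otto ^^ otto ^^^ otto ^^ otto^^speaking^^
```

Carets inside or at the end of a word are left alone, which is why otto^^speaking^^ stays a single symbol.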
Stack machine
Rabbit-vm has three stacks: the data stack ds, the software stack ss (or upstack), and the return stack rs.
In Forth, the return stack is also used for loop counters and other local variables.
vm already uses the return stack for dip operations.
Therefore a separate third stack comes in handy for temporary values.
Sometimes working with upstack feels like throwing a ball up and catching it at the right place when
it falls down again.
rabbit ' 15 up 3 4 + down * iprint lf'
rabbit ' 1 2 3 4 5 ( dup * )dip2 Stacks.print '
rabbit ' 1 2 3 4 5 ^^* Stacks.print '
snake ' ( ^ ^^ up down cpup cpdown uzap consup unconsdown )docs '
^ {x -- x // hiccup next word from instruction stream and dip it}
^^ {x y -- x y // hiccup word and dip2 it}
up {x -- // [ -- x] move x up to upstack}
down { -- x // [x -- ] move x down from upstack}
cpup {x -- x // [ -- x] copy x to upstack}
cpdown { -- x // [x -- x] copy x from upstack}
uzap { -- // [x -- ] zap at upstack}
consup {x -- // [l -- l] cons x to list at upstack}
unconsdown { -- x // [l -- l] uncons x from list at upstack}
snake ' doc Stacks.size Stacks.size triple lprint '
Stacks.size { -- i i i // size of ds ss rs}
(256 256 1048576 )
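The interplay between data stack and upstack can be followed step by step in a small Python model. It is a sketch that covers only integers, + and *, and the upstack words up, down, cpup, cpdown and uzap as their stack comments describe them; the function name run is made up, and the return stack is not modeled.

```python
def run(program):
    """Run a whitespace-separated program; return (data stack, upstack)."""
    ds, us = [], []                     # data stack and upstack
    for word in program.split():
        if word == "up":                # x --    // [ -- x]
            us.append(ds.pop())
        elif word == "down":            # -- x    // [x -- ]
            ds.append(us.pop())
        elif word == "cpup":            # x -- x  // [ -- x]
            us.append(ds[-1])
        elif word == "cpdown":          # -- x    // [x -- x]
            ds.append(us[-1])
        elif word == "uzap":            # --      // [x -- ]
            us.pop()
        elif word == "+":
            y, x = ds.pop(), ds.pop()
            ds.append(x + y)
        elif word == "*":
            y, x = ds.pop(), ds.pop()
            ds.append(x * y)
        else:
            ds.append(int(word))        # anything else is taken as an integer
    return ds, us

print(run("15 up 3 4 + down *"))        # ([105], [])
```

The 15 is thrown up, 3 and 4 are added undisturbed, and the 15 is caught again just in time for the multiplication: 7 * 15 = 105.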
AST interpreter
All programs are represented as singly linked lists. vm moves through a list with the
instruction pointer stepping from the head to the tail of the currently running program.
There is no compilation or optimization between parser and vm. vm exits when the program
read from the command line ends.
The current implementation is written in fasm assembly language. Simple atoms need
one machine instruction:
swap:
xchg tos, sos ; swap top of stack and second of stack
exec ; move to next instruction
Moving to the next instruction needs 2 sequential 64-bit reads from memory and 3 branches.
macro exec { ; ( // continue executing prog:l )
local .loop, .call_atom, .recur_on_list, .just_cons, .done
jmp .done
.loop:
mov a, [prog] ; this quotation-element-value: int|sym|lst
mov prog, [prog + ws] ; next quotation-element
nosym? al, .just_cons
cmp eax, dsp32
jg .recur_on_list ; symbol points into list space --> call definition
.call_atom:
jmp a ; symbol points below list space --> call atom
.recur_on_list:
call f.execute
jmp .done
.just_cons:
shiftup a ; push quotation element to stack
.done:
notzero? prog, .loop ; quotation has more elements?
pop prog ; pop calling quotation
unlst prog
ret ; return to calling atom
}
f.execute: ; ( a:l -- // exec l on stack )
check_rs_overflow
lst prog ; push running quotation
push prog
mov prog, a ; continue with new called quotation
the_real_exec:
exec
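The control flow of the exec macro can be restated in Python. This is a simplified model, not a translation: symbols are Python strings looked up in a dictionary (the assembly instead tells atoms from definitions by comparing the symbol's address with the start of list space), an explicit list stands in for the machine return stack, and the names to_list, execute and words are invented for illustration.

```python
def to_list(items):
    """Build a singly linked list of (head, tail) cells; None is the empty list."""
    cell = None
    for item in reversed(items):
        cell = (item, cell)
    return cell

def execute(prog, words, ds):
    """Walk the linked list prog, using ds as the data stack."""
    rs = []                                # return stack of suspended quotations
    while True:
        while prog is not None:            # quotation has more elements?
            a, prog = prog                 # read this element, step to the next
            if not isinstance(a, str):     # not a symbol: push it on the stack
                ds.append(a)
                continue
            target = words[a]
            if callable(target):           # atom: run built-in code
                target(ds)
            else:                          # definition: run its quotation
                rs.append(prog)            # push running quotation
                prog = target              # continue with the called quotation
        if not rs:
            return ds                      # top-level program ended
        prog = rs.pop()                    # pop calling quotation

def add(ds):
    y, x = ds.pop(), ds.pop()
    ds.append(x + y)

def mul(ds):
    y, x = ds.pop(), ds.pop()
    ds.append(x * y)

def dup(ds):
    ds.append(ds[-1])

words = {"+": add, "*": mul, "dup": dup, "square": to_list(["dup", "*"])}
print(execute(to_list([3, 4, "+", "square"]), words, []))   # [49]
```

Each step reads one cell (head and tail) and then decides between pushing, calling an atom, or entering a definition, which corresponds to the two memory reads and the branches mentioned above.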
Glossary
- Atom
- A built-in function.
- Definition
- A function defined at runtime.
- Function
- A function from stack to stack.
It takes zero or more parameters from the stack, may do something with them, and returns zero or more results to the stack.
- Homoiconic
- Code is data.
Code == data == list == stack == program == quotation == function == definition.
The same list of tokens can be data, program, definition or stack.
- Quotation
- A list intended to be executed as function.
- Symbol
- An interned identifier, as in Lisp.
- Token
- Integer, symbol or list.
Characters are represented as integers.
Strings are represented as lists of integers.
False and true are 0 and 1. Nil is the empty list.
Special characters are represented as symbols.
- Word
- Atom or Definition.
2019-12-01 12:34:39
http://sts-q.bitbucket.io