I mentioned on the Scratch forums that parsing is hard, and someone asked me the following about tosh:
> Lol, are you using some sort of parser generator or just making it from scratch? pun intended
The answer is complicated, and seemed like a good excuse for a longer post here.
As I've said before, tosh is a difficult language to parse.
In particular, the syntax highlighter has to use the same parser as the language itself. Most editors highlight code with a simpler system based on regular expressions and/or state machines. But I want each block to appear in the same colour it has in Scratch. Which block a line represents, and therefore which colour it gets, is only known once the line has been parsed, so I need a full parser.
The code editor pane uses the excellent CodeMirror library. To implement syntax highlighting, tosh provides a custom mode to CodeMirror. CodeMirror hands the mode one line at a time; I split that line into tokens and return the colour of each one.
So this informs the architecture of the parser: it must operate on a single line at a time, and it must use a tokenizer.
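To make those two constraints concrete, here is a minimal Python sketch of a tokenizer that only ever sees one line. It is not tosh's actual tokenizer (tosh is written in JavaScript); the token kinds and regular expressions are assumptions chosen for illustration. Using regular expressions at this stage is fine: they only split the line into tokens, and in tosh the colour of each token comes from the parser afterwards.

```python
import re
from typing import List, Tuple

# Hypothetical token kinds, for illustration only; tosh's real tokenizer differs.
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("string", r'"(?:[^"\\]|\\.)*"'),
    ("symbol", r"[-+*/<>=()\[\]{}]"),
    ("word", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("space", r"\s+"),
]
# One combined pattern; the name of the group that matched is the token kind.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(line: str) -> List[Tuple[str, str]]:
    """Split a single line into (kind, text) pairs, dropping whitespace."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            # Unknown character: emit it as an error token so the
            # highlighter can still mark it and carry on with the line.
            tokens.append(("error", line[pos]))
            pos += 1
            continue
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize('say "Hello!" for 2 secs')` returns `[("word", "say"), ("string", '"Hello!"'), ("word", "for"), ("number", "2"), ("word", "secs")]`. Because the function takes one line and keeps no state between calls, it fits a model where the editor asks for the tokens of each line independently.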
Here's what it looks like at the moment:
The language is defined using a context-free grammar, split into two parts:
- The core grammar is written by hand, and contains rules for things like arithmetic.
- This is then augmented with automatically generated rules that define the rest of the Scratch blocks.
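The split between hand-written and generated rules can be sketched in Python as follows. This is an illustration under stated assumptions, not tosh's grammar: the rule representation, the non-terminal names (`Sum`, `NumberArg`, `Block`, …) and the helper names are all hypothetical, and the block specs borrow the Scratch 2 style of `%n` for number inputs and `%s` for text inputs.

```python
from typing import Dict, List, Tuple

# A rule is (left-hand side, right-hand side symbols).
Rule = Tuple[str, Tuple[str, ...]]

# Hand-written core grammar (hypothetical): arithmetic with the usual precedence.
# Capitalised names are non-terminals; "<number>" and "<string>" stand for token
# kinds; any other lowercase symbol is a literal word.
CORE_RULES: List[Rule] = [
    ("NumberArg", ("Sum",)),
    ("Sum", ("Sum", "+", "Product")),
    ("Sum", ("Sum", "-", "Product")),
    ("Sum", ("Product",)),
    ("Product", ("Product", "*", "Atom")),
    ("Product", ("Product", "/", "Atom")),
    ("Product", ("Atom",)),
    ("Atom", ("<number>",)),
    ("Atom", ("(", "Sum", ")")),
    ("StringArg", ("<string>",)),
    ("StringArg", ("NumberArg",)),
]

# Assumed mapping from Scratch-2-style input slots to core grammar symbols.
SLOT_SYMBOLS: Dict[str, str] = {"%n": "NumberArg", "%s": "StringArg"}


def rule_from_spec(spec: str) -> Rule:
    """Turn a block spec such as 'say %s for %n secs' into a grammar rule."""
    rhs = tuple(SLOT_SYMBOLS.get(part, part) for part in spec.split())
    return ("Block", rhs)


def build_grammar(specs: List[str]) -> List[Rule]:
    """Return the core rules plus one generated rule per block spec."""
    return CORE_RULES + [rule_from_spec(spec) for spec in specs]
```

With this sketch, `build_grammar(["say %s for %n secs", "move %n steps"])` ends with `("Block", ("say", "StringArg", "for", "NumberArg", "secs"))` and `("Block", ("move", "NumberArg", "steps"))`. The appeal of the split is that the tricky, recursive parts (like operator precedence) are written once by hand, while the long, regular list of blocks needs no hand-written rules at all.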