everything is programming ♪♫

String syntax


May 29, 2015 • 4 min read • Filed under tosh

When I first wrote about tosh, I mentioned that design is difficult.

As it turns out, designing a programming language is hard. Even when you're only designing the syntax, and not also the semantics. There are so many little details! And every one of them matters, because they all affect how nice the language is to use.

I'm going to talk about one such decision: strings.

Tosh needs a way to delimit text inputs (strings), since otherwise it can't tell where a text input ends. Take the following example:

say join Hello, name

Is name supposed to be a text input or a variable reporter? We can't tell without knowing what variables are defined.
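To make the ambiguity concrete, here is a minimal Python sketch. It is not how tosh works: `classify_words` is a hypothetical helper, and it assumes the only way to tell the two cases apart is to look each word up in the set of defined variables.

```python
def classify_words(words, variables):
    """Label each word as a variable reporter or a text input.

    Without delimiters, the only way to decide is to consult the set of
    defined variables (a hypothetical stand-in for the project's state).
    """
    return [(w, "variable" if w in variables else "text") for w in words]


args = ["Hello,", "name"]

# The same source text means two different things,
# depending on whether a variable called "name" exists.
with_variable = classify_words(args, {"name"})
without_variable = classify_words(args, set())

print(with_variable)
print(without_variable)
```

The same line of code gets two different readings, which is exactly what a reader (or a parser) shouldn't have to deal with.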

So for tosh I have chosen to delimit string literals. This makes the code easier to read--you don't have to know what variables are defined in order to understand it--as well as easier to parse.

Nathan seemed to disagree with me on this point:

...you shouldn't need to delimit anything, including string literals.

He's often right, which makes me uneasy...

Anyway: what syntax should tosh use to delimit strings?

At first I used single- or double-quoted strings, in imitation of Python and many other programming languages.

Then I read A Short Quiz About Language Design, which made me think more carefully about the decision. James Hague points out that quotes are a bad choice of delimiter, since quote characters are often used in English text.

Of course, if we try to write a double quote inside a string, Python will think it marks the end of the string! (I'm using Python as an example in what follows, but a lot of languages work the same way.)

"He asked "What is it?", and waited expectantly."

It wants to interpret 'What is it?' as Python code, and so the input will fail to parse.

We can use backslash escaping to avoid this. Putting a backslash in front of the quote makes Python interpret it as part of the string, instead of its end:

"He asked \"What is it?\", and waited expectantly."

But the backslashes look ugly.

We can mitigate this by using single-quoted strings:

'He asked "What is it?", and waited expectantly.'

But eventually we'll want to use both kinds of quote character, so we'll need backslashes again:

'She replied, "It\'s a boy!"'

We're going to need escaping at some point, whatever delimiter character we use. Otherwise you can never write that character in a string! And since tosh aims to be compatible with all Scratch projects, it needs to support writing any character, including backslashes.

(Fortunately Scratch disallows newlines in text inputs, so the only things we need to escape are the delimiter and the escape character, e.g. quote " and backslash \.)
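As a rough illustration of that rule, here is a Python sketch that writes a text input as a quoted literal, escaping only the delimiter and the escape character. The function name `quote` and the decision to raise an error on newlines are my assumptions for the example, not tosh's actual code.

```python
def quote(text, delimiter='"', escape="\\"):
    """Wrap text in delimiters, escaping only the delimiter and the escape character."""
    if "\n" in text:
        # Assumption: newlines can't occur in a Scratch text input,
        # so this sketch simply rejects them.
        raise ValueError("newlines are not allowed in text inputs")
    out = []
    for ch in text:
        if ch == delimiter or ch == escape:
            out.append(escape)  # prefix the special character with a backslash
        out.append(ch)
    return delimiter + "".join(out) + delimiter


print(quote('She replied, "It\'s a boy!"'))
# prints: "She replied, \"It's a boy!\""
```

Note that the apostrophe passes through untouched: with a double-quote delimiter, only " and \ ever need a backslash.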

Still: if I chose a different delimiter, we wouldn't need to use escaping so often.

Options

Hague suggests the pipe character |, since it's rarely used in English text.

scratchblocks uses square brackets [ ] to delimit text inputs, so I could also use that.

Backslash-escaping has the disadvantage that it makes the backslash character special, in addition to the delimiter. To write a backslash, you must backslash-escape it: "\\". Could I use a different escaping method instead, such as doubling-up the delimiter?
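To show what doubling-up the delimiter would mean in practice, here is a small Python sketch using pipes. The helpers `quote_doubled` and `unquote_doubled` are hypothetical names for illustration. This is one possible reading of the scheme, not something tosh implements.

```python
def quote_doubled(text, delimiter="|"):
    """Write text as a literal, doubling any delimiter that appears inside it."""
    return delimiter + text.replace(delimiter, delimiter * 2) + delimiter


def unquote_doubled(source, delimiter="|"):
    """Read a literal written by quote_doubled back into plain text."""
    if len(source) < 2 or source[0] != delimiter or source[-1] != delimiter:
        raise ValueError("literal must start and end with the delimiter")
    body = source[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == delimiter:
            # Inside the literal, a delimiter is only valid as a doubled pair.
            if body[i + 1:i + 2] != delimiter:
                raise ValueError("unescaped delimiter inside literal")
            out.append(delimiter)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


print(quote_doubled("a|b\\c"))
```

With this scheme the backslash is no longer special, so `a|b\c` is written as `|a||b\c|`. The cost is that a string containing just one pipe has to be written as `||||`.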

Here are all the options I came up with:

code

It was hard to choose. Summarising the options:

  • Quotes: these are "traditional", so they'll be familiar to professional coders.

    And since a goal of tosh is to help beginners transition from Scratch to text-based coding, using quotes rather than a more unusual syntax would avoid them having to learn another syntax for strings later on.

  • Square brackets: this is what scratchblocks uses, so it would be familiar to forum users. I expect most of tosh's initial audience will be active forum users.

    I like this option: square brackets are sensible and rarely used in English. Combined with backslash-escaping, they'd still have familiar rules for programmers.

  • Pipes: These look nice, and are also rarely used. But they're more awkward to type: will beginners be able to type them quickly?

    In addition, doubling the delimiter looks confusing when the opening and closing symbols are the same.

I studied a variety of projects by advanced Scratch users, to see if either of the characters | or ] was less common than the other (and therefore more suitable). But in my small sample neither option was significantly better.
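A count like this is easy to script. The sketch below is only an illustration of the idea: `count_delimiters` is a hypothetical helper, and the sample strings are made up for the example rather than taken from any real project.

```python
from collections import Counter


def count_delimiters(strings, candidates=('"', "'", "|", "[", "]")):
    """Count how often each candidate delimiter appears across some text inputs."""
    counts = Counter()
    for text in strings:
        for ch in text:
            if ch in candidates:
                counts[ch] += 1
    # Report every candidate, including those that never appeared.
    return {c: counts[c] for c in candidates}


# Made-up sample inputs, purely for illustration.
sample = ["Hello!", 'He said "hi"', "a | b", "list[1]"]
totals = count_delimiters(sample)
print(totals)
```

The fewer times a character turns up in real text inputs, the less often anyone would have to escape it.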

None of the options make escaping easy to discover. But I could add helpful error messages to mitigate this.

I have design principles for tosh which can help with making choices like this. But like most design problems, it comes down to a tradeoff between different factors. Should I favour:

  • aesthetics?
  • usability?
  • similarity to scratchblocks?
  • similarity to Python?

In the end, I'm going to stick with my original choice: quoted strings with backslash escaping.

New programmers are going to have to learn how escaping works eventually. Using something unusual instead of backslash-escaping will upset experienced programmers, without helping anyone.

I'm using quoted strings for similar reasons. Helping beginners learn to use Python-like strings is more important than similarity to scratchblocks. Superficial similarity would actually be bad, since they behave subtly differently! I'm already trying hard to communicate that tosh is not scratchblocks.

I think there's a principle of interface design: similar things should look similar; things that behave differently should look different. (This is related to the principle of least astonishment--maybe a corollary?) So making tosh look different to scratchblocks is beneficial, since they do behave differently.

Aftermath

So after all that, I didn't change my mind. But thinking through all the design choices was still useful.

And I came up with a couple of subtle improvements:

  • Allow only valid escape characters.

    Some languages allow you to write "foo\bar" (but they may warn you). This is just confusing: should "foo\\bar" parse the same or differently? It's hard to say.

    In tosh, this is a syntax error: you must write "foo\\bar".

  • Use syntax-highlighted backslashes.

    code

    This reduces emphasis on the backslash, so the content of the string is easier to see. And it should help beginners understand how escaping works.

    I'm highlighting only the backslash and not the escaped character, because the backslash is part of the representation; while the character itself is part of the content.

    (Admittedly the backslash colour needs work.)
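The first improvement, allowing only valid escape characters, can be sketched as a strict reader for quoted strings. This is a minimal Python illustration under my own assumptions (the name `unquote` and the use of `SyntaxError` are hypothetical), not tosh's actual parser.

```python
def unquote(source, delimiter='"', escape="\\"):
    """Read a quoted literal, rejecting any escape other than \\" and \\\\."""
    if len(source) < 2 or source[0] != delimiter or source[-1] != delimiter:
        raise SyntaxError("string must start and end with the delimiter")
    body = source[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == escape:
            following = body[i + 1:i + 2]  # empty if the backslash is last
            if following not in (delimiter, escape):
                raise SyntaxError("invalid escape: " + escape + following)
            out.append(following)
            i += 2
        elif ch == delimiter:
            raise SyntaxError("unescaped delimiter inside string")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(unquote(r'"foo\\bar"'))  # the escaped backslash becomes a single backslash

try:
    unquote(r'"foo\bar"')
except SyntaxError as error:
    print("rejected:", error)
```

Because there are only two valid escapes, every backslash in the source has exactly one meaning, and anything else is reported rather than guessed at.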

I hope this was an interesting insight into some of the thinking behind tosh's design. If you disagree with any of the above, let me know!