Wednesday, June 04, 2008

Multi-line Strings in JavaScript

Warning: uber-geekery ahead...

In searching for an answer to the question "Does VB.NET support multi-line String literals" (short answer: no; slightly longer answer: you can do a gross hack in Visual Basic 9), I came across a blog post by a guy named Jovan Milosevic talking about Multiline strings in JavaScript.

Apparently, the JavaScript implementations of most (all?) modern web browsers support the following construct for multi-line String literals:

alert('foobar' == 'foo bar'); // false
alert('foobar' == 'foobar'); // true
alert('foobar' == 'foo\
bar'); // true!

This Jovan fellow writes:

Simply put, you can use backslash character to ignore end of line in JavaScript strings. This directly contradicts my JavaScript books. For example, David Flanagan's [JavaScript: The Definitive Guide], my JavaScript bible for almost a decade and counting, specifically says: "backslash escape cannot be used before a line break to continue a string." This is clearly incorrect. (I am so proud to have found an error in that book, I want to let everyone know.) I tested this technique in all browsers I could think of, and it works in all of them.

JavaScript (or, perhaps more accurately, ECMAScript) is, perhaps unreasonably, my favorite language. I've actually read through the spec, and I never realized you could do this. So, with Jovan's claim that "This is clearly incorrect. (I am so proud to have found an error in that book, I want to let everyone know.)" hanging out there, I wanted to investigate a bit.

My testing in both Firefox and Rhino showed that both support these multi-line string literals.

But, sure enough, as best I can tell, the spec (ECMAScript 3rd edition) does not allow this construct for string literals. Here are the relevant grammar productions from the spec:

StringLiteral ::
    " DoubleStringCharacters_opt "
    ' SingleStringCharacters_opt '

DoubleStringCharacters ::
    DoubleStringCharacter DoubleStringCharacters_opt

SingleStringCharacters ::
    SingleStringCharacter SingleStringCharacters_opt

DoubleStringCharacter ::
    SourceCharacter but not double-quote " or backslash \ or LineTerminator
    \ EscapeSequence

SingleStringCharacter ::
    SourceCharacter but not single-quote ' or backslash \ or LineTerminator
    \ EscapeSequence

EscapeSequence ::
    CharacterEscapeSequence
    0 [lookahead ∉ DecimalDigit]
    HexEscapeSequence
    UnicodeEscapeSequence

CharacterEscapeSequence ::
    SingleEscapeCharacter
    NonEscapeCharacter

SingleEscapeCharacter :: one of
    ' " \ b f n r t v

NonEscapeCharacter ::
    SourceCharacter but not EscapeCharacter or LineTerminator

EscapeCharacter ::
    SingleEscapeCharacter
    DecimalDigit
    x
    u

HexEscapeSequence ::
    x HexDigit HexDigit

UnicodeEscapeSequence ::
    u HexDigit HexDigit HexDigit HexDigit
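
To convince myself of what those productions accept, here is a rough sketch that turns them into a regular expression and runs it over raw source text. The helper name isEs3StringLiteral is my own invention, and this is only an approximation of the grammar above (it ignores the octal escapes from the spec's compatibility annex), not anything an actual engine does.

```javascript
// Rough check: does `source` (raw source text, quotes included) match the
// ES3 StringLiteral production? isEs3StringLiteral is a hypothetical helper.
function isEs3StringLiteral(source) {
    var lineTerminators = '\\n\\r\\u2028\\u2029';
    var hex = '[0-9a-fA-F]';

    // EscapeSequence: CharacterEscapeSequence (anything except a digit, x, u
    // or a line terminator), 0 not followed by a digit, xHH, or uHHHH.
    var escapeSequence = '(?:[^0-9xu' + lineTerminators + ']' +
        '|0(?![0-9])' +
        '|x' + hex + '{2}' +
        '|u' + hex + '{4})';

    // A quoted body: ordinary characters, or a backslash plus EscapeSequence.
    function body(quote) {
        return quote +
            '(?:[^' + quote + '\\\\' + lineTerminators + ']' +
            '|\\\\' + escapeSequence + ')*' +
            quote;
    }

    var pattern = new RegExp('^(?:' + body('"') + '|' + body("'") + ')$');
    return pattern.test(source);
}

console.log(isEs3StringLiteral("'foobar'"));      // true
console.log(isEs3StringLiteral("'foo\\nbar'"));   // true  (backslash, then n)
console.log(isEs3StringLiteral("'foo\\\nbar'"));  // false (backslash, then a real line feed)
```

The last call is the interesting one: a backslash followed by an actual line terminator matches none of the EscapeSequence alternatives, so by this reading of the grammar it is not a string literal at all.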

And, at the end of section 7.8.4, there is the explicit note:

A 'LineTerminator' character cannot appear in a string literal, even if preceded by a backslash \. The correct way to cause a line terminator character to be part of the string value of a string literal is to use an escape sequence such as \n or \u000A.
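
A small sketch of the difference, assuming an engine that accepts the backslash-newline form (as the browsers and Rhino mentioned above do). The variable names are just for illustration.

```javascript
// The escape sequences the spec's note recommends put a real line feed
// into the string value.
var viaEscape = 'foo\nbar';
var viaUnicode = 'foo\u000Abar';

// The backslash-newline form: the line break is in the source only,
// not in the resulting string value.
var viaContinuation = 'foo\
bar';

console.log(viaEscape === viaUnicode);      // true
console.log(viaEscape.length);              // 7
console.log(viaContinuation === 'foobar');  // true
console.log(viaContinuation.length);        // 6
```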

So, where do these Rhino-and-browser-implementation-supported multi-line string literals come from?

Well, to be fully accurate, ECMAScript and JavaScript are not identical. To quote the Mozilla Developer Center wiki: "ECMAScript is the scripting language that forms the basis of JavaScript. JavaScript is a superset of ECMAScript, i.e. JavaScript is ECMAScript plus some additional features."

It took a bit of sleuthing to figure out whether these multi-line string literals are part of those "additional features". The grammar Mozilla provides (for JavaScript 1.4) doesn't bother to define String, the Core JavaScript 1.5 Reference doesn't bother to define string literals beyond 'stringText' and "stringText", and the Core JavaScript Guide also doesn't mention anything about multi-line string literals.

I was eventually able to find a link to "JScript Deviations from ES3" (the differences between ECMAScript 3rd edition and Microsoft's JavaScript implementation) which provided the following:

2.3 String Literals: §7.8.4
JScript allows C-style escaped newlines in string literals; according to ES3 this would be a syntax error for unterminated string constant

var s = "this is a \
        multiline string";

IE: taken as a single string.
FF: same as IE
Opera: same as IE
Safari: same as IE

IE eats away the '\' and the character following it. The s.length will evaluate to 34 (this includes the leading blanks in the second line). FF, Opera and Safari emulate IE.
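
The arithmetic behind that 34 can be checked with a quick sketch. I'm assuming the second line of the example is indented by exactly eight blanks, since that is what makes the numbers work: the text alone is 26 characters, and 26 + 8 = 34.

```javascript
// Assumption: the continuation line starts with exactly eight blanks.
var s = "this is a \
        multiline string";

var withoutIndent = "this is a multiline string";

console.log(withoutIndent.length);  // 26
console.log(s.length);              // 34: the eight leading blanks are kept
console.log(s.indexOf("\n"));       // -1: the line break itself is dropped
```

So the backslash and the line break vanish, but any indentation on the following line quietly ends up inside the string.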

Ah ha. (<smarm>It's Microsoft's fault! Why wasn't that obvious from the start?!</smarm>).

So, it's still a little fuzzy to me whether one can say definitively that multi-line string literals are officially part of the JavaScript language. They definitely are not part of ECMAScript (3rd edition). They definitely are part of JScript. And, I suppose, they are perfectly valid to use in JavaScript code that will run under implementations that intend to mirror this JScript behavior.
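
If you'd rather not lean on that behavior, a literal can be split across source lines using nothing beyond what the ES3 grammar allows. A minimal sketch of two common approaches (the variable names are mine):

```javascript
// Plain concatenation: each piece is an ordinary single-line literal.
var concatenated = 'this is a ' +
    'multiline string';

// Array join: handy when there are many pieces.
var joined = [
    'this is a ',
    'multiline string'
].join('');

console.log(concatenated === joined);  // true
console.log(concatenated.length);      // 26
```

Neither form picks up stray indentation from the source, which the backslash form does.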

But I would caution Jovan not to be quite so "proud" about David Flanagan's book being "incorrect". The support of multi-line string literals in JavaScript is not at all clear-cut.


sleepnova said...

It doesn't escape the line break of a string, it escapes the line break of the source! The line break is not included in the resulting string.

Glenn Nilsson said...

Thank you, I just wondered the same thing. Where did it come from since it's not part of any standard...