25.01.12

SpiderMonkey no longer supports sharp variables

Tags: , , , , — Jeff @ 10:17

ECMAScript object literals

ECMAScript, the standard underlying the JavaScript language, can represent simple objects and arrays using literals.

var arr = [1, 2, 3];
var obj = { property: "ohai" };

But it can’t represent all objects and arrays.

Cyclic and non-tree objects

ECMAScript literals can’t represent circular objects and arrays which (perhaps at some nesting distance) have properties referring to themselves (in other words, the objects form cyclic graphs). Nor can they (faithfully) represent objects which contain some other object multiple times (in other words, the objects form a directed acyclic graph which is not also a tree).

var arr = [1, "overwritten", 3];
arr[1] = arr; // cyclic
var obj = { property: "ohai", nest: {} };
obj.nest.parent = obj; // cyclic
obj.secondCopy = obj.nest; // non-tree, obj.nest is repeated

Sharp variables

SpiderMonkey historically supported extension syntax to represent such graphs under the name sharp variables. Sharp variables were inspired by Common Lisp syntax, and they enabled naming an object or array literal before it had been fully evaluated (even, to a limited extent, interacting with it). Netscape proposed sharp variables for inclusion in ES3, but the proposal was rejected as being too domain-specific and being arguably ugly. Since then the extension has lingered in SpiderMonkey but has seen very little use.

// Identical semantics using sharp variables
var arr = #1=[1, #1#, 3]; // #n= names an object being created, #n# refers to it
var obj = #1={ property: "ohai", nest: #2={ parent: #1# }, secondCopy: #2# };

No other ECMAScript implementer has since shown interest in implementing sharp variables. And with renewed efforts to evolve ECMAScript syntax, special characters like # are increasingly precious. Thus we’ve decided it’s time to remove sharp variable support from SpiderMonkey.

One benefit to removing sharp variables is that we can remove a good chunk of rarely-used code (and attack surface: sharp variables have been a source of some number of likely vulnerabilities) from SpiderMonkey. A syntax-removal patch added 79 lines and removed 1112 lines, including tests; not including tests, it added 42 lines and removed 677 lines. A subsequent patch to remove generation of sharp variable syntax from object decompilation added 65 lines and removed 128 lines. Removing sharp variables will also permit some simplifications now that evaluating a literal can’t have side effects beyond those in any nested property initializers.

Alternatives to sharp variables

Sharp variables may have been sometimes convenient, but they were mere syntactic sugar. It should be simple to convert any use of sharp variables to an equivalent sequence of property additions. If you were sufficiently aware of sharp variables to use them to represent non-tree objects, I trust I don’t have to explain how to do this.

Somewhat more interesting are the cases where decompiling an object produced sharp variable syntax, as when decompiling a cyclic object during debugging. (It’s worth noting in passing that decompilation will not infinitely recur: instead, it’ll bottom out with an empty object or an omitted property.) Jason Orendorff has written a sharps mini-library implementing decompilation of cyclic and non-tree objects which may be useful for this task.

When to expect this change

The sharp variable documentation on MDN has long noted that sharp variables were deprecated and would likely to be removed; a few months ago that warning was upgraded to a firm statement that they would be removed. The sharp variable removal patch that landed yesterday completes the process. The removal will first appear in either today or tomorrow’s nightly; in a week’s time it’ll make its way into the aurora branch, then the beta branch, and finally into Firefox 12. Versions of Firefox prior to 12 will not be affected by this removal, including the extended-support release Firefox 10.

06.06.11

I feel the need…the need for JSON parsing correctness and speed!

JSON and SpiderMonkey

JSON is a handy serialization format for passing data between servers and browsers and between independent, cooperating web pages. It’s increasingly the format of choice for website APIs.

ECMAScript 5 (the standard underlying JavaScript) includes built-in support for producing and parsing JSON. SpiderMonkey has included such support since before ES5 added it.

SpiderMonkey’s support, because it predated ES5, hasn’t always agreed with ES5. Also, because JSON support was added before it became ubiquitous on the web, it wasn’t written with raw speed in mind.

Improving JSON.parse

We’ve now improved JSON parsing in Firefox 5 to be fast and fully conformant with ES5. For awhile we’ve made improvements to JSON by piecemeal change. This worked for small bug fixes, and it probably would have worked to fix the remaining conformance bugs. But performance is different: to improve performance we needed to parse in a fundamentally different way. It was time for a rewrite.

What parsing bugs got fixed?

The bugs the new parser fixes are quite small and generally shouldn’t affect sites, in part because other browsers overwhelmingly don’t have these bugs. We’ve had no compatibility reports for these fixes in the month and a half they’ve been in the tree:

  • The number syntax is properly stricter:
    • Octal numbers are now syntax errors.
    • Numbers containing a decimal point must now include a fractional component (i.e. 1. is no longer accepted).
  • JSON.parse("this") now throws a SyntaxError rather than evaluate to true, due to a mistake reusing our keyword parser. (Hysterically, because we used our JSON parser to optimize eval in certain cases, this change means that eval("(this)") will no longer evaluate to true.)
  • Strings can’t contain tab characters: JSON.parse('"\t"') now properly throws a SyntaxError.

This list of changes should be complete, but it’s possible I’ve missed others. Parsing might be a solved problem in the compiler literature, but it’s still pretty complicated. I could have missed lurking bugs in the old parser, and it’s possible (although I think less likely) that I’ve introduced bugs in the new parser.

What about speed?

The new parser is much faster than the old one. Exactly how fast depends on the data you’re parsing. For example, on Opera’s simple parse test, I get around 156000 times/second in Firefox 4, but in Firefox 5 with the new JSON parser I get around 339000 times/second (bigger is better). On a second testcase, Kraken’s JSON.parse test (json-parse-financial, to be precise), I get a 4.0 time of around 140ms and a 5.0 time of around 100ms (smaller is better). (In both cases I’m comparing builds containing far more JavaScript changes than just the new parser, to be sure. But I’m pretty sure the bulk of the performance improvements in these two cases are due to the new parser.) The new JSON parser puts us solidly in the center of the browser pack.

It’ll only get better in the future as we wring even more speed out of SpiderMonkey. After all, on the same system used to generate the above numbers, IE gets around 510000 times/second. I expect further speedup will happen during more generalized performance improvements: improving the speed of defining new properties, improving the speed with which objects are allocated, improving the speed of creating a property name from a string, and so on. As we perform such streamlining, we’ll parse JSON even faster.

Side benefit: better error messages

The parser rewrite also gives JSON.parse better error messages. With the old parser it would have been difficult to provide useful feedback, but in the new parser it’s easy to briefly describe the reason for syntax errors.

js> JSON.parse('{ foo: 17 }'); // unquoted property name
(old) typein:1: SyntaxError: JSON.parse
(new) typein:1: SyntaxError: JSON.parse: expected property name or '}'

We can definitely do more here, perhaps by including context for the error from the provided string, but this is nevertheless a marked improvement over the old parser’s error messages.

Bottom line

JSON.parse in Firefox 5 is faster, follows the spec, and tells you what went wrong if you give it bad data. ’nuff said.

06.03.11

JavaScript change in Firefox 5 (not 4), and in other browsers: regular expressions can’t be called like functions

Callable regular expressions

Way back in the day when Netscape implemented regular expressions in JavaScript, it made them callable. If you slapped an argument list after a regular expression, it’d act as if you called RegExp.prototype.exec on it with the provided arguments.

var r = /abc/, res;

res = r("abc");
assert(res.length === 1);

res = r("def");
assert(res === null);

Why? Beats me. I’d have thought .exec was easy enough to type and clearer to boot, myself. Hopefully readers familiar with the history can explain in comments.

Problems

Callable regular expressions present one immediate problem to a “naive” implementation: their behavior with typeof. According to ECMAScript, the typeof for any object which is callable should be "function", and Netscape and Mozilla for a long time faithfully implemented this. This tended to cause much confusion in practice, so browsers that implemented callable regular expressions eventually changed typeof to arguably “lie” for regular expressions and return "object". In SpiderMonkey the “fix” was an utterly inelegant hack which distinguished callables as either regular expressions or not, to determine typeof behavior.

Past this, callable regular expressions complicate implementing callability and optimizations of it. Implementations supporting getters and setters (once purely as an extension, now standardized in ES5) must consider the case where the getter or setter is a regular expression and do something appropriate. And of course they must handle regular old calls, qualified (/a/()) and unqualified (({ p: /a/ }).p()) both. Mozilla’s had a solid trickle of bugs involving callable regular expressions, almost always filed as a result of Jesse‘s evil fuzzers (and not due to actual sites breaking).

It’s also hard to justify callable regular expressions as an extension. While ECMAScript explicitly permits extensions, it generally prefers extensions to be new methods or properties of existing objects. Regular expression callability is neither of these: instead it’s adding an internal hook to regular expressions to make them callable. This might not technically be contrary to the spec, but it goes against its spirit.

Regular expressions won’t be callable in Firefox 5

No one’s ever really used callable regular expressions. They’re non-standard, not all browsers implement them, and they unnecessarily complicate implementations. So, in concert with other browser engines like WebKit, we’re making regular expressions non-callable in Firefox 5. (Regular expressions are callable in Firefox 4, but of course don’t rely on this.)

You can experiment with a version of Firefox with these changes by downloading a TraceMonkey nightly build. Trunk’s still locked down for Firefox 4, so it won’t pick up the change until Firefox 4 branches and trunk reopens for changes targeted at the next release. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

03.02.11

Working on the JS engine, Episode IV

A testcase submitted to us today:

([][(![]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[+!+[]]+(!![]+[])[+[]]][([][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(![]+[])[!+[]+!+[]]]()+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+([][(![]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[+!+[]]+(!![]+[])[+[]]][([][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(![]+[])[!+[]+!+[]]]()+[])[!+[]+!+[]]

The result according to ES3, plus a common implementation-specific behavior, is the string "job".

The result according to ES5, plus a common implementation-specific behavior, is a thrown TypeError.

24.01.11

New ES5 strict mode requirement: function statements not at top level of a program or function are prohibited

Function statements in ECMAScript

What’s the effect of this program according to ECMAScript?

function foo() { }

If you said that it defines a function as a property of the global object, congratulations! You’ve mastered a basic part of JavaScript syntax.

Let’s go a little trickier: what is the effect of the function defined in this program according to ECMAScript?

function foo()
{
  return g;
  function g() { }
}

This function, when called, defines a local variable g whose value is the specified function. Then it returns that function as the value of that variable. If you knew this as well, give yourself a gold star.

Now let’s try something even harder: what’s the effect of these programs?

if (true)
  function bar() { }
function g() { }
function foo()
{
  if (true)
    function g() { }
  return g;
}

Shenanigans!

Trick question! They fail to run due to syntax errors.

ECMAScript permits function statements in exactly two places: directly within the list of statements that make up a program, and directly within the list of statements that make up the contents of a function body. These are the first two examples. (A function statement also looks like an expression, but if it appears in expression context it’s a function expression, not a function statement.) Engines which permit a function statement anywhere else — as the child of a block statement enclosed by curly braces, as the child of a loop or condition, as the child of a with, or as the child of a case or default in a switch statement — do so by extending ES5.

Spec requirements aside, what are the semantics of extensionland function statements?

Now you’re just messing with me

Which semantics?

Browsers all implement extensionland function statements differently, with different semantics. Use them just so and they’ll work the same way across browsers. Use them in any way where the function statement conditionally executes, or where you start capturing the binding for the function in different locations, and you’ll find any semblance of cross-browser compatibility disappears. This example by Rich Dougherty, used with permission, demonstrates some of the incompatibilities (and I wonder whether function statements in with might present more):

var result = [];
result.push(f());
function f() { return 1; }
result.push(f());
if (1)
{
  result.push(f());
  function f() { return 2; }
  result.push(f());
}
result.push(f());
function y()
{
  result.push(g());
  function g() { return 3; }
  result.push(g());
  if (1)
  {
    result.push(g());
    function g() { return 4; }
    result.push(g());
  }
  result.push(g());
}
y();
print(result);

Results in different browsers vary a fair bit, although there’s a little more consensus on behavior now than at the time this example was originally written:

Browser Output
Firefox 1.5 and 2 1,1,1,2,2,3,3,3,3,3
Firefox 4 1,1,1,2,2,3,3,3,4,4
Opera 2,2,2,2,2,4,4,4,4,4
Internet Explorer 7 2,2,2,2,2,4,4,4,4,4
Safari 3 1,1,2,2,2,3,3,4,4,4
Safari 4 2,2,2,2,2,4,4,4,4,4
Chrome 2,2,2,2,2,4,4,4,4,4

Why not specify semantics?

Blindly specifying some particular behavior won’t work. Many sites these days (and different browser-specific implementations of those sites) rely on engine-specific behavior with user-agent-conditioned code. Changing browser behavior breaks that pretty hard. Specification will break any browsers not already implementing it at time of specification.

A way forward

The next version of ECMAScript would like to specify semantics for this case — quite possibly semantics not implemented by any browser. How to do it, if implementations irreconcilably disagree? The solution comes in two parts. First, “ES6” will require affirmative opt-in to enable new syntax and semantics, including for currently-nonstandard function statements. Second, in anticipation of that change, the ECMA committee recommends that non-standard function statements be forbidden in strict mode code, to open up a future path down which ES6 can walk.

To permit ES6 to standardize semantics, the ECMA committee recommends forbidding non-standard function statements in strict mode code. Thus these examples are syntax errors:

"use strict";
{
  function foo() { }
}
"use strict";
if (true)
  function bar() { }
"use strict";
with (obj)
  function foo() { }
"use strict";
for (;;)
  function foo() { }
"use strict";
switch (v)
{
  case 10:
    function bar() { }
  default:
    function baz() { }
}

Both Firefox and WebKit now implement this restriction, and other engines will follow as they too implement strict mode.

Conclusion

In order for future versions of ECMAScript to be able to define semantics for extensionland functions, strict mode “clears the deck” and forbids them entirely. Instead, assign functions to variables, a la var f = function() { };. Semantics for this are completely defined and compatibly implemented across browsers.

You can experiment with a version of Firefox with these changes by downloading a nightly build. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

« NewerOlder »