03.02.11

Working on the JS engine, Episode IV

A testcase submitted to us today:

([][(![]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[+!+[]]+(!![]+[])[+[]]][([][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(![]+[])[!+[]+!+[]]]()+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+([][(![]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[+!+[]]+(!![]+[])[+[]]][([][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(![]+[])[!+[]+!+[]]]()+[])[!+[]+!+[]]

The result according to ES3, plus a common implementation-specific behavior, is the string "job".

The result according to ES5, plus a common implementation-specific behavior, is a thrown TypeError.

24.01.11

New ES5 strict mode requirement: function statements not at top level of a program or function are prohibited

Function statements in ECMAScript

What’s the effect of this program according to ECMAScript?

function foo() { }

If you said that it defines a function as a property of the global object, congratulations! You’ve mastered a basic part of JavaScript syntax.

Let’s go a little trickier: what is the effect of the function defined in this program according to ECMAScript?

function foo()
{
  return g;
  function g() { }
}

This function, when called, defines a local variable g whose value is the specified function. Then it returns that function as the value of that variable. If you knew this as well, give yourself a gold star.

Now let’s try something even harder: what’s the effect of these programs?

if (true)
  function bar() { }
function g() { }
function foo()
{
  if (true)
    function g() { }
  return g;
}

Shenanigans!

Trick question! They fail to run due to syntax errors.

ECMAScript permits function statements in exactly two places: directly within the list of statements that make up a program, and directly within the list of statements that make up the contents of a function body. These are the first two examples. (A function statement also looks like an expression, but if it appears in expression context it’s a function expression, not a function statement.) Engines which permit a function statement anywhere else — as the child of a block statement enclosed by curly braces, as the child of a loop or condition, as the child of a with, or as the child of a case or default in a switch statement — do so by extending ES5.

Spec requirements aside, what are the semantics of extensionland function statements?

Now you’re just messing with me

Which semantics?

Browsers all implement extensionland function statements differently, with different semantics. Use them just so and they’ll work the same way across browsers. Use them in any way where the function statement conditionally executes, or where you start capturing the binding for the function in different locations, and you’ll find any semblance of cross-browser compatibility disappears. This example by Rich Dougherty, used with permission, demonstrates some of the incompatibilities (and I wonder whether function statements in with might present more):

var result = [];
result.push(f());
function f() { return 1; }
result.push(f());
if (1)
{
  result.push(f());
  function f() { return 2; }
  result.push(f());
}
result.push(f());
function y()
{
  result.push(g());
  function g() { return 3; }
  result.push(g());
  if (1)
  {
    result.push(g());
    function g() { return 4; }
    result.push(g());
  }
  result.push(g());
}
y();
print(result);

Results in different browsers vary a fair bit, although there’s a little more consensus on behavior now than at the time this example was originally written:

Browser Output
Firefox 1.5 and 2 1,1,1,2,2,3,3,3,3,3
Firefox 4 1,1,1,2,2,3,3,3,4,4
Opera 2,2,2,2,2,4,4,4,4,4
Internet Explorer 7 2,2,2,2,2,4,4,4,4,4
Safari 3 1,1,2,2,2,3,3,4,4,4
Safari 4 2,2,2,2,2,4,4,4,4,4
Chrome 2,2,2,2,2,4,4,4,4,4

Why not specify semantics?

Blindly specifying some particular behavior won’t work. Many sites these days (and different browser-specific implementations of those sites) rely on engine-specific behavior with user-agent-conditioned code. Changing browser behavior breaks that pretty hard. Specification will break any browsers not already implementing it at time of specification.

A way forward

The next version of ECMAScript would like to specify semantics for this case — quite possibly semantics not implemented by any browser. How to do it, if implementations irreconcilably disagree? The solution comes in two parts. First, “ES6” will require affirmative opt-in to enable new syntax and semantics, including for currently-nonstandard function statements. Second, in anticipation of that change, the ECMA committee recommends that non-standard function statements be forbidden in strict mode code, to open up a future path down which ES6 can walk.

To permit ES6 to standardize semantics, the ECMA committee recommends forbidding non-standard function statements in strict mode code. Thus these examples are syntax errors:

"use strict";
{
  function foo() { }
}
"use strict";
if (true)
  function bar() { }
"use strict";
with (obj)
  function foo() { }
"use strict";
for (;;)
  function foo() { }
"use strict";
switch (v)
{
  case 10:
    function bar() { }
  default:
    function baz() { }
}

Both Firefox and WebKit now implement this restriction, and other engines will follow as they too implement strict mode.

Conclusion

In order for future versions of ECMAScript to be able to define semantics for extensionland functions, strict mode “clears the deck” and forbids them entirely. Instead, assign functions to variables, a la var f = function() { };. Semantics for this are completely defined and compatibly implemented across browsers.

You can experiment with a version of Firefox with these changes by downloading a nightly build. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

10.01.11

New ES5 strict mode support: new vars created by strict mode eval code are local to that code only

tl;dr

Ideally you shouldn’t use eval, because it inhibits many optimizations and makes code run slower. new Function(argname1, argname2, ..., code) doesn’t inhibit optimizations, so it’s a better way to dynamically generate code. (This may require passing in arguments instead of using names for local variables. That’s the price you pay to play nice with compilers that could optimize but for eval.)

Nevertheless, it’s still possible to use eval in ES5. eval of normal code behaves as it always has. But eval of strict mode code behaves differently: any variables created by the code being evaluated affect only that code, not the enclosing scope. (The enclosing scope’s variables are still visible in the eval code if they aren’t shadowed.) Firefox now correctly binds variables created by strict mode eval code, completing the last major component of strict mode (though bugs remain). These programs demonstrate the idea:

var x = 2, y = 3;
print(eval("var x = 9; x"));               // prints 9
print(x);                                  // prints 9
print(eval("'use strict'; var x = 5; x")); // prints 5
print(eval("'use strict'; var x = 7; x")); // prints 7
print(eval("'use strict'; y"));            // prints 3
print(x);                                  // prints 9
"use strict";
var x = 2, y = 3;
// NB: Strictness propagates into eval code evaluated by a
//     direct call to eval — a call occurring through an
//     expression of the form eval(...).
print(eval("var x = 5; x")); // prints 5
print(eval("var x = 7; x")); // prints 7
print(eval("y"));            // prints 3
print(x);                    // prints 2

This partially defangs eval. But even strict mode eval inhibits optimizations, so you are still better off avoiding it.

eval is a double-edged sword

eval is one of the most powerful parts of JavaScript: it enables runtime code generation. You can compile code to perform specific operations, avoiding unnecessary general-purpose overhead — a powerful concept. (But you’d be better off using new Function(argname1, argname2, ..., code), which doesn’t inhibit optimizations and still enables code generation, at loss of the ability to capture the local scope. Code using eval may see considerable speedups: for example, roc’s CPU emulator sped up ~14% switching from eval to Function. Less beefy code won’t see that magnitude of win, yet why give up performance when you have a ready alternative?)

Yet at the same time, eval is too powerful. As inline assembly is to C or C++ (at least without the information gcc‘s asm syntax requires), so is eval to JavaScript. In both instances a powerful construct inhibits many optimizations. Even if you don’t care about optimizations or performance, eval‘s ability to introduce and delete bindings makes code that uses it much harder to reason about.

eval arbitrarily mutates variables

At its simplest, eval can change the value of any variable:

function test(code)
{
  var v = 1;
  eval(code);
  return v;
}
assert(test("v = 2") === 2);

Thus you can’t reorder or constant-fold assignments past eval: eval forces everything to be “written to disk” so that the eval code can observe it, and likewise it forces everything to be read back “from disk” when needed next. Without costly analysis you can’t store v in a register across the call.

eval can insert bindings after compilation

eval‘s ability to add bindings is worse. This can make it impossible to say what a name refers to until runtime:

var v;
function test(code)
{
  eval(code);
  return v;
}

Does the v in the return statement mean the global variable? You can’t know without knowing the code eval will compile and run. If that code is "var v = 17;" it refers to a new variable. If that code is "/* psych! */" it refers to the global variable. eval in a function will deoptimize any name in that function which refers to a variable in an enclosing scope. (And don’t forget that the name test itself is in an enclosing scope: if the function returned test instead of v, you couldn’t say whether that test referred to the enclosing function or to a new variable without knowing code.)

eval can remove bindings added after compilation

You can also delete bindings introduced by eval (but not any other variables):

var v = 42;
function test(code)
{
  eval(code);
  function f(code2)
  {
    eval(code2);
    return function g(code3) { eval(code3); return v; };
  }
  return f;
}
var f = test("var v = 17;");
var g = f("var v = 8675309;");
assert(g("/* nada */") === 8675309);
assert(g("var v = 5;") === 5);
assert(g("delete v") === 17);
assert(g("delete v") === 42);
assert(g("delete v") === 42); // can't delete non-eval var (thankfully)

So not only can you not be sure what binding a name refers to given eval, you can’t even be sure what binding it refers to over time! (Throw generators into the game and you also have to account for a scope without a binding containing that binding even after a function has “returned”.)

eval can affect enclosing scopes

Worst, none of these complications (and I’ve listed only a few) are limited to purely local variables. eval can affect any variable it can see at runtime, whether in its immediate function or in any enclosing function or globally. eval is the fruit of the poisonous tree: it taints not just the scope containing it, but all scopes containing it.

Save us, ES5!

ES5 brings some relief from this madness: strict mode eval can no longer introduce or delete bindings. (Normal eval remains unchanged.) Deleting a binding is impossible in strict mode because delete name is a syntax error. And instead of introducing bindings in the calling scope, eval of strict mode code introduces bindings for that code only:

var x = 2, y = 3;
print(eval("var x = 9; x"));               // prints 9
print(x);                                  // prints 9
print(eval("'use strict'; var x = 5; x")); // prints 5
print(eval("'use strict'; var x = 7; x")); // prints 7
print(eval("'use strict'; y"));            // prints 3
print(x);                                  // prints 9

This works best if you have strict mode all the way down, so that eval can never affect the bindings of any scope (and so you don’t need "use strict" at the start of every eval code):

"use strict";
var x = 2, y = 3;
// NB: Strictness propagates into eval code evaluated by a
//     direct call to eval — a call occurring through an
//     expression of the form eval(...).
print(eval("var x = 5; x")); // prints 5
print(eval("var x = 7; x")); // prints 7
print(eval("y"));            // prints 3
print(x);                    // prints 2

Names in strict mode code can thus be associated without having to worry about eval in strict mode code altering bindings, preserving additional optimization opportunities.

Firefox now correctly implements strict mode eval code binding semantics (modulo bugs, of course).

So if I write strict mode code, should I use eval?

eval‘s worst aspects are gone in strict mode, but using it still isn’t a good idea. It can still change variables in ways the JavaScript compiler can’t detect, so strict mode eval still generally forces every variable to be saved before it occurs and to be reloaded when needed. This deoptimization is unavoidable if runtime code generation can affect dynamically-determined local variables. It’s still better to use Function than to use eval.

Also, as a temporary SpiderMonkey-specific concern, we don’t perform many of the binding optimizations strict mode eval enables. Binding semantics might (I haven’t tested, and it’s entirely possible the extra work is unnoticeable in practice) slow down strict eval compared to normal eval. Strict mode eval performance in SpiderMonkey won’t be much better than that of regular eval, and it might be slightly worse. We’ll fix this over time, but for now don’t expect strict mode eval to improve performance. (If you really need performance, don’t use eval.)

Conclusion

eval is powerful — arguably too powerful. ES5’s strict mode blunts eval‘s sharpest corners to simplify it and permit typical optimizations in code using it. But while strict mode eval is better than regular eval, Function is still the best way to generate code at runtime. If you must use eval, consider using strict mode eval for a simpler binding model and eventual performance benefits.

You can experiment with a version of Firefox with these changes by downloading a nightly build. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

09.01.11

New ES5 requirement: getters and setters in object literals must not conflict with each other or with data properties, even outside strict mode

Conflicting properties in object literals

Object literals in ECMAScript can contain the same property multiple times:

var obj = { prop: 42, prop: 17 };

How does this behave? The object, when fully initialized, has that property with its last assigned value:

var obj = { prop: 17 }; // same effect

The expression 42 is still evaluated in source order, but that value isn’t found in the final object when construction and initialization completes.

Are conflicting properties desirable?

Duplicating property names is at best innocuous, but at worst it’s the source of bugs. Repeated assignment of the same side effect-free expression is aesthetically unpleasing but harmless. But what if that expression has side effects? Or what if the two expressions are ever made to differ? (This needn’t be purely human error. For example, a conflict might be the result of a bad merge of your changes with changes made by others.) What if a developer accidentally changes the first instance of a property but doesn’t notice the second? You can see how this might cause bugs.

Compatibility

ES5 generally avoids breaking compatibility with ES3. For the sake of existing code, duplicate property names are a syntax error only in strict mode code.

function good() { return { p: 1, p: 2 }; } // okay
function bad() { "use strict"; return { p: 1, p: 2 }; } // ERROR

What about getters and setters?

ES5 standardizes syntax for getters and setters in object literals. Using getters and setters you can write properties which lazily compute their values only when asked. You can also write properties which post-process values assigned to them: to validate them, to transform them at time of assignment, and so on. Getters and setters are new in ES5, so they don’t present compatibility concerns.

Conflicts with accessors are worse than conflicts with data properties. What if a setter and a data property conflict? Properties in an initializer don’t invoke setters, so a conflicting data property might blow away an accessor pair entirely! Also, since getters and setters quite often involve side effects, or reliance on object structure and internals, errant fixes of one of a pair of getters or setters are likely to cause worse problems than conflicting data properties.

Therefore ES5 prohibits conflicting property getters and setters, either with each other or with existing data properties. You can’t have both an accessor and a data property, and you can’t have multiple getters or multiple setters for the same property. This applies even outside strict mode!

/* syntax errors in any code */
({ p: 1, get p() { } });
({ get p() { }, p: 1 });
({ p: 1, set p(v) { } });
({ set p(v) { }, p: 1 });
({ get p() { }, get p() { } });
({ set p(v) { }, set p(v) { } });
({ get p() { }, set p(v) { }, get p() { } });
({ set p(v) { }, get p() { }, set p(v) { } });

/* syntax error only in strict mode code */
function fail() { "use strict"; ({ p: 1, p: 2 }); }

SpiderMonkey and Firefox no longer permit conflicts involving accessor properties in object literals

Firefox 4 nightlies now reject any property-name conflicts in object literals. The only exception is when the object literal is outside strict mode and all assignments are for data properties. Previously we implemented accessor conflict detection only in strict mode, but now Firefox 4 fully conforms to the ES5 specification when parsing object literals. (While I’m here let me give a brief hat-tip to the ECMAScript 5 Conformance Suite for revealing this mistake, the result of spec misreading by multiple SpiderMonkey hackers.)

If you ever have conflicting properties in an object literal, odds are they were a mistake. If you’ve done this only with data properties, no sweat now — but you’ll have to fix that if you ever opt your code into strict mode. If you’ve done this with accessor properties (previously a non-standard, implementation-specific feature), you’ll need to change your code to eliminate the conflict. Conflicts are reported as syntax errors (but note the bug that syntax errors aren’t reported for JavaScript XPCOM components), and they should be easy to fix.

Conclusion

Object literals containing the same property multiple times are bug-prone, as only one of the properties will actually be respected when the code executes. That mistake can’t be fixed for data properties in normal code, but ES5 can prohibit new conflicts involving accessor properties; Firefox now properly treats such conflicts as syntax errors. You can experiment with a version of Firefox with these changes by downloading a nightly build. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

08.09.10

SpiderMonkey JSON change: trailing commas no longer accepted

Historically, SpiderMonkey’s JSON implementation has accepted input containing trailing commas:

JSON.parse('[1, 2, 3, ]');
JSON.parse('{ "1": 2, }");

We did so because the JSON RFC permitted implementations to accept extensions, and trailing commas are nice for humans to be able to use. The down side is that accepting extra syntax like this makes interoperability harder: implementations which don’t implement the same extension, for reasons every bit as valid as those of implementations allowing the extension, are disadvantaged. ES5 weighed these concerns and chose to precisely specify permissible JSON syntax, putting everyone on the same page: trailing commas are not permitted. Therefore, the examples above should throw a SyntaxError per ES5.

SpiderMonkey has now been changed to conform to ES5 on this point: trailing commas are syntax errors. If you still need to accept trailing commas, you should use a custom implementation that accepts them — but best would be for you to adjust the processes that produce JSON strings including trailing commas to not include them. (If you are an extension or are in privileged code, for the moment you can use nsIJSON.legacyDecode to continue to accept trailing commas. However, note that we have added it only to accommodate legacy code in the process of being updated to no longer generate faulty input, and it will be removed sometime in the future.)

You can experiment with a version of Firefox with this change by downloading a TraceMonkey branch nightly; this change should also make its way into mozilla-central nightlies shortly, if you’d rather stick to trunk builds. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

« NewerOlder »