08.09.10

New ES5 strict mode support: now with poison pills!

tl;dr

Don’t try to use the arguments or caller properties of functions created in strict mode code. Don’t try to use the callee or caller properties of arguments objects corresponding to invocations of functions in strict mode code. Don’t try to use the caller property of a function if it might be called from strict mode code. You are in for an unpleasant surprise (a thrown TypeError) if you do.

function strict() { "use strict"; return arguments; }
function outer() { "use strict"; return inner(); }
function inner() { return inner.caller; }

strict.caller;    // !!! BAD IDEA
strict.arguments; // !!! BAD IDEA
strict().caller;  // !!! BAD IDEA
strict().callee;  // !!! BAD IDEA
outer();          // !!! BAD IDEA

Really, it’s best not to access the caller of a function, the current function (except by naming it), or the arguments for a given function (except via arguments or by use of the named parameter) at all.
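As a minimal sketch of what "naming it" looks like in practice: a named function expression lets a function refer to itself without arguments.callee, and it keeps working in strict mode. The names factorial, fact, and viaCallee are purely illustrative.

```javascript
// A named function expression: the name "fact" is visible only inside the
// function body, so recursion needs neither arguments.callee nor a variable
// in the enclosing scope.
var factorial = function fact(n) {
  "use strict";
  return n <= 1 ? 1 : n * fact(n - 1);
};

// The strict-mode attempt to get the same thing through arguments.callee
// throws instead; this helper reports the name of the error.
function viaCallee() {
  "use strict";
  try {
    return arguments.callee;
  } catch (e) {
    return e.name;
  }
}

var fiveFactorial = factorial(5); // 120
var calleeResult = viaCallee();   // "TypeError"
```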

ES5 strict mode: self-limitation, not wish fulfillment

ES5 introduces the curious concept of strict mode. Strict mode, whose name and concept derive from the similar feature in Perl, deliberately reduces the things you can do in JavaScript. Instead of a feature, it’s really more the absence of several features within the scope of the strict-annotated code: with, particularly intrusive forms of eval, silent failure of writes to non-writable properties, silent failure of deletions of non-configurable properties, implicit creation of global-object properties, and so on. The goal of these removals is to simplify both the reasoning necessary to understand such code and its implementation in JavaScript engines: to sand away some rough edges in the language.
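To make a few of those removals concrete, here is a small sketch of three operations that fail silently (or quietly succeed) in non-strict code but throw in strict code. The helper errorNameOf and the variable names are assumptions made up for this illustration.

```javascript
// Each function below opts in to strict mode by itself, so the results do
// not depend on whether the surrounding code is strict.

// Returns the name of the error thrown by fn, or "no error".
function errorNameOf(fn) {
  try {
    fn();
    return "no error";
  } catch (e) {
    return e.name;
  }
}

// Writing to a non-writable property: silent in non-strict code.
var writeResult = errorNameOf(function () {
  "use strict";
  var frozen = Object.freeze({ x: 1 });
  frozen.x = 2;
}); // "TypeError"

// Deleting a non-configurable property: likewise no longer silent.
var deleteResult = errorNameOf(function () {
  "use strict";
  delete Object.prototype;
}); // "TypeError"

// Assigning to an undeclared name: no implicit global property is created.
var implicitGlobalResult = errorNameOf(function () {
  "use strict";
  undeclaredNameForThisSketch = 1;
}); // "ReferenceError"
```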

Magical stack-introspective properties of functions and arguments

Consider this code, and note the expected behavior (expected per the web, but not as part of ES3):

function getSelf() { return arguments.callee; }
assertEq(getSelf(), getSelf); // arguments.callee is the enclosing function

function outer()
{
  function inner() { return arguments.callee.caller; } // inner.caller === outer
  return inner();
}
assertEq(outer(), outer); // fun.caller === nearest function in stack that called fun, or null

function args2()
{
  return args2.arguments;
}
assertEq(args2(17)[0], 17); // fun.arguments === arguments object for nearest call to fun in stack, or null

Real-world JavaScript implementations take many shortcuts for top performance. These shortcuts are not (supposed to be) observable, except by timing the relevant functionality. Two common optimizations are function inlining and avoiding creating an arguments object. The above “features” play havoc with both of these optimizations (as well as others, one of which will be the subject of a forthcoming post).

Inlining a function should conceptually be equivalent to splatting the function’s contents in that location in the calling function and doing some α-renaming to ensure no names in the splatted body conflict with the surrounding code. The ability to access the calling function defeats this: there’s no function being invoked any more, so what does it even mean to ask for the function’s caller? (Don’t simply say you’d hard-code the surrounding function: how do you know which property lookups in the inlined code will occur upon precisely the function being called, looking for precisely the caller property?) It is also possible to access a function’s arguments through fun.arguments. While the “proper” behavior here is more obvious, implementing it would be a large hassle: either the arguments would have to be created when the function was inlined (in the general case where you can’t be sure the function will never be used this way), or you’d have to inline code in such a way as to be able to “work backward” to the argument values.

Speaking of arguments: by offering access to the corresponding function via arguments.callee, an arguments object has the same problems as fun.caller. It also presents one further problem: in some (mostly old) engines, arguments.caller provides access to the variables declared within the calling function when it was most recently called. (If you’re thinking security/integrity/optimization hazard, well, you now know why engines no longer support it.)

In sum, these features are bad for optimization. Further, they’re a form of dynamic scoping, which is already considered bad style in many other languages.

Per ES5, SpiderMonkey no longer supports this stack-inspecting magic when it interacts with strict mode

As of the most recent Firefox nightly, SpiderMonkey now rejects code like that given above when it occurs in strict mode (more or less). (The properties in question are now generally implemented through a so-called “poison pill” mechanism, an immutable accessor property which throws a TypeError when retrieved or set.) The specific scenarios which we reject are as follows.

First, attempts to access the caller or arguments (except by directly naming the object) of a strict mode function throw a TypeError, because these properties are poison pills:

function strict()
{
  "use strict";
  strict.caller;    // !!! TypeError
  strict.arguments; // !!! TypeError
  return arguments; // direct name: perfectly cromulent
}
strict();

Second, attempts to access the enclosing function or caller variables via the arguments of a strict mode function throw a TypeError. These properties too are poison pills:

function strict()
{
  "use strict";
  arguments.callee; // !!! TypeError
  arguments.caller; // !!! TypeError
}
strict();

Third (and most trickily, because non-strict code is affected), attempts to access a function’s caller when that caller is in strict mode will throw a TypeError. This isn’t a poison pill, because if the "use strict" directive weren’t there it would still “work”:

function outer()
{
  "use strict";
  return inner();
}
function inner()
{
  return inner.caller; // !!! TypeError
}
outer();

But if there’s no strict mode in sight, nothing will throw exceptions, and what worked before will still work:

function fun()
{
  assertEq(fun.caller, null); // global scope
  assertEq(fun.arguments, arguments);
  assertEq(arguments.callee, fun);
  arguments.caller; // won't throw, won't do anything special
  return arguments;
}
fun();
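The “poison pill” mechanism described earlier in this section can be approximated in ordinary script. The following is only a sketch under that assumption, not SpiderMonkey's actual implementation; poison, pill, and thrownName are hypothetical names.

```javascript
// Hypothetical helper: installs an accessor property whose getter and
// setter both throw a TypeError, and which cannot be reconfigured later.
function poison(obj, name) {
  function thrower() {
    throw new TypeError("'" + name + "' may not be accessed here");
  }
  Object.defineProperty(obj, name, {
    get: thrower,
    set: thrower,
    enumerable: false,
    configurable: false
  });
  return obj;
}

// Returns the name of the error thrown by fn, or "no error".
function thrownName(fn) {
  try {
    fn();
    return "no error";
  } catch (e) {
    return e.name;
  }
}

var pill = poison({}, "caller");

var onRead = thrownName(function () { return pill.caller; });  // "TypeError"
var onWrite = thrownName(function () { pill.caller = null; }); // "TypeError"
```

Because the property is non-configurable, a later Object.defineProperty call that tries to replace it with a plain value is rejected as well, which is what makes the pill "immutable".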

Conclusion

With these changes, which are required by ES5, stack inspection is slowly going the way of the dodo. Don’t use it or rely on it! Even if you never use strict mode, beware the third change, for it still might affect you if you provide methods for other code to call. (But don’t expect to be able to avoid strict mode forever: I expect all JavaScript libraries will adopt strict mode in short order, given its benefits.)

(For those curious about new Error().stack, we’re still considering what to do about it. Regrettably, we may need to kill it for information-privacy reasons too, or at least censor it to something less useful. Nothing’s certain yet; stay tuned for further announcements should we make changes.)

You can experiment with a version of Firefox with these changes by downloading a TraceMonkey branch nightly; these changes should also make their way into mozilla-central nightlies shortly, if you’d rather stick to trunk builds. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

07.09.10

Now in SpiderMonkey and Firefox: ES5’s Function.prototype.bind

This is just a brief note to point out that, as of the August 29th Firefox nightly (and I think as of the latest beta, but don’t quote me), SpiderMonkey (and Firefox) now implements ES5’s new Function.prototype.bind method, which provides native support for creating functions bound to a pre-specified this value:

var property = 42;
var obj =
  {
    property: 17,
    method: function() { return this.property; }
  };

var bound = obj.method.bind(obj);
assertEq(bound(), 17);

…or with pre-specified leading arguments:

function multiply()
{
  var product = 1;
  for (var i = 0, sz = arguments.length; i < sz; i++)
    product *= arguments[i];
  return product;
}

var productTimesFive = multiply.bind(null /* this */, 5);
assertEq(productTimesFive(4, 3, 2, 1), 120);

…and, in a feature present only in the ES5 bind implementation (and not in any of the numerous precursors), bound functions even work with new:

function Vector()
{
  var args = arguments;
  this.length = arguments.length;
  this.get = function(i) { return args[i]; };
  this.set = function(i, v) { args[i] = v; };
}

var PartialVector = Vector.bind(null /* this, ignored with new */, 3, 7);

var threeSevenTen = new PartialVector(10); // new Vector(3, 7, 10)
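A smaller sketch of the same idea makes the result easier to check. Point and PointAtX1 are hypothetical names used only here: the bound this value is ignored under new, the pre-specified arguments are prepended, and the new object inherits from the original constructor's prototype.

```javascript
// Hypothetical constructor used only for this sketch.
function Point(x, y) {
  this.x = x;
  this.y = y;
}

// Pre-specify the first argument; the bound this (null) is ignored by new.
var PointAtX1 = Point.bind(null /* this, ignored with new */, 1);

var p = new PointAtX1(2); // behaves like new Point(1, 2)

var coordinates = [p.x, p.y];          // [1, 2]
var isPoint = p instanceof Point;      // true
var isBound = p instanceof PointAtX1;  // true: instanceof defers to Point
```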

For more information, see the article on Function.prototype.bind on MDC. For the utmost information, see the ES5 specification for the method.

As always, you can experiment with a version of Firefox with Function.prototype.bind by downloading a nightly from nightly.mozilla.org. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

22.08.10

Incompatible ES5 change: literal getter and setter functions must now have exactly zero or one arguments

ECMAScript accessor syntax in SpiderMonkey

For quite some time SpiderMonkey and Mozilla-based browsers have supported user-defined getter and setter functions (collectively, accessors), both programmatically and syntactically. The syntaxes for accessors were once legion, but SpiderMonkey has pared them back almost to the syntax recently codified in ES5 (and added new syntax where required by ES5).

// All valid in ES5
var a = { get x() { } };
var b = { get "y"() { } };
var c = { get 2() { } };

var e = { set x(v) { } };
var f = { set "y"(v) { } };
var g = { set 2(v) { } };

SpiderMonkey has historically parsed literal accessors using a slightly-tweaked version of its function parsing code. Therefore, as previously explained, SpiderMonkey would accept essentially anything which could follow function in a function expression as valid accessor syntax in object literals.

ES5 requires accessors have exact numbers of arguments

A consequence of parsing accessors using generalized function parsing is that SpiderMonkey accepted some nonsensicalities, such as no-argument setters or multiple-argument getters or setters:

var o1 = { get p(a, b, c, d, e, f, g) { /* why have any arguments? */ } };
var o2 = { set p() { /* to what value? */ } };
var o3 = { set p(a, b, c) { /* why more than one? */ } };

ES5 accessor syntax sensibly deems such constructs errors: a conforming ES5 implementation would reject all of the above statements.

SpiderMonkey is changing to follow ES5: getters require no arguments, setters require one argument

SpiderMonkey has now been changed to follow ES5. There seemed little to no gain in continuing to support bizarre numbers of arguments when the spec counseled otherwise, and any code which does end up broken is easily fixed.
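One way to see the rule in action, sketched here with a hypothetical parses helper, is to hand candidate source text to the Function constructor, which compiles the text without running it, and watch for a SyntaxError.

```javascript
// Hypothetical helper: reports whether source parses as a function body.
// The Function constructor only compiles the code here; it is never called.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError)
      return false;
    throw e;
  }
}

var results = {
  getterNoArgs:   parses("var o = { get p() { } };"),        // true
  setterOneArg:   parses("var o = { set p(v) { } };"),       // true
  getterWithArgs: parses("var o = { get p(a, b) { } };"),    // false
  setterNoArgs:   parses("var o = { set p() { } };"),        // false
  setterManyArgs: parses("var o = { set p(a, b, c) { } };")  // false
};
```

Code broken by the change is fixed by deleting the unused getter parameters, or by giving the setter exactly one parameter.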

As always, you can experiment with a version of Firefox with these changes to accessor syntax by downloading a nightly from nightly.mozilla.org. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

07.05.10

SpiderMonkey change du jour: the special __parent__ property has been removed

tl;dr

The special __parent__ property has been removed from SpiderMonkey and is no longer in nightly builds. If you don’t use __parent__ or don’t know what the property does, you didn’t miss much.

If you use __parent__, you have a couple replacements. If you were using it to determine the global object for another object, use Components.utils.getGlobalForObject instead. If you were using it only to test its value against an expected value, use nsIDOMWindowUtils.getParent instead (but do note that its semantics are not absolutely identical to those of __parent__). If you were using it from unprivileged web scripts as a potential vector for security exploits, I feel your pain and will take no steps to assuage it. If you were using it some other way, comment and we’ll figure something out for your use case.

If you think you understood the __parent__ property (you probably don’t), or if you’re interested in the nitty-gritty details of JavaScript semantics, read on for the details of this esoteric property and the reasons for its removal.

Scoping in JavaScript

In the following example code, when the function g is invoked, what stores the v variable?

function f(a)
{
  var v = a;
  function g() { return v; }
  return g;
}

var fun = f(2);
fun();

The variable is accessed in an enclosing scope, created when the enclosing function was invoked. In the ECMAScript standard, the location of such variables is an object, stored as an internal [[Scope]] property of the function being called. In ES3 this object was a standard JavaScript object; ES5 tightens semantics slightly and uses a simpler structure, but the idea’s the same.
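A short sketch shows that this scope is created afresh on every invocation of the enclosing function, so closures from different calls do not share variables. The names makeCounter, first, and second are illustrative only.

```javascript
function makeCounter()
{
  var count = 0; // stored in the scope created by this particular call
  return function() { return ++count; };
}

var first = makeCounter();
var second = makeCounter();

// first and second each captured a different count variable.
var observed = [first(), first(), second()]; // [1, 2, 1]
```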

Meet the __parent__

It’s not possible to access the object stored as [[Scope]] in ECMAScript proper, but it has been possible to access it in SpiderMonkey and Mozilla-based browsers. The magical __parent__ property can often provide access to this value:

this.toString = function() { return "global"; }
var q = 17;
function foo() { return q; }
print(foo.__parent__); // prints "global"

“Often”? When does __parent__ not expose this value?

__parent__ doesn’t always reflect [[Scope]] because ECMAScript requires that certain objects not be made available to scripts. Among such objects are what SpiderMonkey refers to as With objects, Call objects, and Block objects.

With objects

SpiderMonkey creates With objects to handle the esoteric, non-lexical name lookup required by the semantics of the with statement in JavaScript. These semantics require that a name, depending on the runtime object used in the with, refer to a property of that object or to a variable in an enclosing scope. It’s impossible to know in general which will be the case before runtime, and therefore it’s impossible to speed up a lot of script that runs inside a with. (Incidentally, this is why you should never ever ever [ever] use with in performance-critical code. Use a two-letter abbreviation variable to save typing, if verbosity is your concern.) We can’t simply use the with block’s object as the scope, because if we miss on a lookup there we want to fall back to the normal scope; therefore we introduce a With object. Here’s an example of a situation where a With object is created:

with ({ toString: function() { return "with object"; } })
{
  print(function() { return toString; }.__parent__); // prints "with object"
}

__parent__ doesn’t actually give you the With object, because it’s a carefully-tuned internal value. With objects have behavior and functionality optimized for their particular purpose, and if we simply exposed them to scripts it would probably be possible to do Very Bad Things. Consequently, if you try to access __parent__ in a situation where you “should” get a With object, we instead give you back the object where the With object begins its search: the object used for the with block. This may be what a developer superficially familiar with __parent__ might expect, but it is nevertheless a lie.
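The runtime dependence of name lookup inside with can be seen in a small sketch. The function is built with the Function constructor so that its body is non-strict even if the surrounding code is strict (with is a syntax error in strict mode); lookupX and the strings are assumptions for illustration.

```javascript
// The same identifier x resolves differently depending on the object
// supplied at runtime, which is why it cannot be resolved ahead of time.
var lookupX = new Function("obj",
  "var x = 'enclosing scope';" +
  "with (obj) { return x; }");

var fromObject = lookupX({ x: "with object" }); // "with object"
var fromScope = lookupX({});                    // "enclosing scope"
```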

Call objects

SpiderMonkey’s Call objects represent the local variables of a function call. Therefore, Call objects correspond to the [[Scope]] of functions. Returning to the original example, reproduced below, v is stored in a Call object, which is the [[Scope]] of the function g:

function f(a)
{
  var v = a;
  function g() { return v; }
  return g;
}

var fun = f(2);
print(fun.__parent__);

When you attempt to access g’s __parent__ property, SpiderMonkey censors the [[Scope]] and returns null, because ECMAScript requires that Call objects not be exposed. This case, which is far more common than the with case, completely prevents access to [[Scope]] (rather than exposing a half-representation of the value). This situation is even more pervasive in light of the modern JavaScript encapsulation practice of enclosing libraries in closures for pseudo-namespacing purposes. In many libraries, __parent__ on many functions of interest gives no value whatsoever.

Block objects

One of the first things every JavaScript developer learns is that JavaScript variables are not scoped to blocks. Language tutorials usually emphasize this point, because it differs from most other languages (and in particular, it differs from the C-ish family of languages, the syntax of which JavaScript borrows to a fair degree). The conventional wisdom is that names in functions are always scoped to enclosing functions or to the global scope. If you want a locally-defined name, you have to wrap it up in a function.

The conventional wisdom is wrong.

Here’s proof: what does this example print?

var v = "global";
function test(frob)
{
  try
  {
    if (frob === 0)
      return v + " try";
    if (frob === 1)
      throw "local";
  }
  catch (v)
  {
    return v + " catch";
  }
  finally
  {
    if (frob === 2)
      return v + " finally";
  }
  return v + " after finally";
}

print("not throwing, in try:          " + test(0));
print("throwing, in catch:            " + test(1));
print("not throwing, in finally:      " + test(2));
print("not throwing, after finally:   " + test(3));

Behavior depends on whether the binding for v as introduced by the catch is scoped to the function or to something else. JavaScript specifies that v, referring to the value potentially thrown while executing the try block, is scoped only to the catch block. Thus v in the catch refers to the thrown value; all other uses of v are outside the catch block and refer to the global variable v. Therefore the output is this:

not throwing, in try:          global try
throwing, in catch:            local catch
not throwing, in finally:      global finally
not throwing, after finally:   global after finally

Any function which referred to v inside the test function would have a Call object as its [[Scope]] — except for a function defined inside the catch block. Such a function at that location would have to capture the thrown value, so there must be something else on the scope chain before the Call object. SpiderMonkey refers to these objects as Block objects, because they implement traditional block-level scoping.
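A sketch of that situation, with illustrative names: a function created inside the catch block captures the thrown value, while a function created outside it sees only the function-level variable.

```javascript
function capture()
{
  var v = "function-level";
  var getters = {};
  try
  {
    throw "thrown";
  }
  catch (v)
  {
    // This function's scope chain includes the catch block's binding of v.
    getters.insideCatch = function() { return v; };
  }
  // This function sees only the function-level v.
  getters.outsideCatch = function() { return v; };
  return getters;
}

var getters = capture();
var seenInside = getters.insideCatch();   // "thrown"
var seenOutside = getters.outsideCatch(); // "function-level"
```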

Block objects have the same issues as Call objects, so SpiderMonkey censors them to null as well. For optimization purposes we’d like to “boil them away” whenever possible, so as not to create an obnoxious little one-property object when we catch an exception and define a function that captures it. Exposing such an object directly prevents these optimizations, and would likely expose some security vulnerabilities.

But __parent__’s issues don’t stop merely at those induced by complex language semantics. Even perfectly simple code (simpler even than nested functions) has potential for unpredictability.

If certain optimizations are implemented, __parent__ may be a lie

Consider this script in the global scope:

function fun() { return 17; }

print(fun.__parent__);

Does fun’s [[Scope]] need to be the global object?

“It’s OK to cheat if you never get caught.” (Smalltalk implementer maxim)

ECMAScript provides no way for script to access [[Scope]]. It’s a construct used to ease specification, and it need not even exist in implementations. The function fun never uses any values from its enclosing scope, so there’s no reason that [[Scope]] must be the global object. An implementation that carried around [[Scope]] with each function might simply make [[Scope]] be null, which could potentially speed up some garbage collection algorithms. Similarly, a function which refers only to never-modified variables in enclosing scopes might simply copy those variables’ values and, again, make [[Scope]] lie. Should a specification-artifact internal property constrain implementations looking to improve performance, or code size, or any number of other measures of desirability? Given where JavaScript use is heading now, the answer must be no.

For certain types of functions, there are very good reasons, even in simple cases, to make [[Scope]] a lie. SpiderMonkey implements some optimizations along these lines, but those optimizations don’t affect the value we store for [[Scope]] at present. If it were advantageous to change that, we would do so in a heartbeat. Therefore, while __parent__ may have a reliable value now, it might not in the future.

__parent__ is available even where you think it isn’t

Oddly enough, __parent__, despite its representing [[Scope]], isn’t just applicable to function objects. Instead, its value is generalized to all objects, in a way even more difficult to describe than that above. (Very roughly, __parent__ on non-function objects corresponds to the object which would be the __parent__ of a function created in the same context. I think. Mostly the value is used for security checks; improper setting of __parent__ is a common cause of implementation-caused XSS vulnerabilities.) This generalization and its semantics have even less support in the specification, so its meaning is even less clear than for functions.

__parent__ doesn’t expose anything useful

__parent__, beyond having unintuitive behavior, doesn’t tell the developer anything he’d care to know. Why would you need to access the enclosing scope of a function, as an object, in the first place? The very definition of the problem is obscure; it is almost a prerequisite to have read the ECMA-262 specification to be able to ask the question.

In the searching of various codebases that we’ve done, we’ve found only these use cases for __parent__:

Testing
A small number of Mozilla tests use __parent__ to verify proper scoping of objects. Since the value as we use it has security implications, this use case is reasonable — but it has extremely limited applicability. This use case can be supported by creating an alternate means of accessing an object’s parent, a means exposed through XPCOM, not through JavaScript itself, and certainly not by polluting every object with the functionality. We have implemented nsIDOMWindowUtils.getParent(obj) to support this testing-oriented use case. Beware: this method doesn’t censor like __parent__ does! Tread cautiously when using it where a With, Call, or Block object might be exposed, and don’t use the returned object except in immediate strict equality (===) checks. (NB: we’ve implemented it this way out of expediency; if it turns out to be too great a responsibility for typical testers, we’ll likely change it to censor.)
Determining the global context (global object) where an object was created
Since __parent__ often (for any function not nested in another function) is simply the global object, some people have used __parent__ to retrieve the global object associated with an arbitrary object. By walking from __parent__ to __parent__ you can often reach the global object this way. However, since the __parent__ value is sometimes a lie, it’s not as simple as var global = obj; while (global.__parent__) global = global.__parent__;. Instead, you have to be careful to start from an object with a known __parent__, one guaranteed not to be a nested function or similar. The easiest way to do this is to first walk up the prototype chain to eventually bottom out at Object.prototype, then to walk the parent chain. This method is baroque and non-obvious, and it requires careful tiptoeing around incorrect __parent__ values. If you know the object whose global you want to retrieve is a DOM node, the fully-standard node.ownerDocument.defaultView property also provides global access. Otherwise, the use case would be better addressed by exposing a method somewhere to directly retrieve the corresponding global object. We have recently added such a method to support this use case: Components.utils.getGlobalForObject(obj).
Being evil to Firefox
A fair number of security bugs reported against SpiderMonkey in recent years have worked by successfully confusing the engine into assigning a bad parent to an object. This incorrect security context can then be used as a vector to run code in the context of another website (resulting in an XSS hole, possibly usable against any website), or in some cases, in the context of the browser “chrome” itself (providing the capability for arbitrary code execution — the worst-case scenario as far as exploits are concerned). __parent__ is immutable and can’t itself be used to confuse the engine (at least in the absence of bugs, and we have no way to demonstrate such). However, it provides greater visibility into the engine, and it can sometimes expose values to script which wouldn’t (and shouldn’t) be accessible any other way.
Being evil to sandboxing mechanisms
__parent__ exposes security holes in some websites as well as in the browser directly. Such websites are those which purport to “safely” execute scripts provided by other users, through sandboxing or other mechanisms. For example, Facebook uses FBJS, a script-rewriting plus runtime-check mechanism, to allow applications to include interactivity. Google’s Caja system provides a rigorous capabilities-providing system (also through script-rewriting plus runtime checks) to do the same, and it too is used in public sites these days. One requirement of such systems is that the global object must never be directly exposed: if you have the global object, you have unfettered access to the document, you have eval, you have Function, and you are utterly and thoroughly hosed. __parent__ provides easy access to this, so all these systems must censor access to __parent__, both statically (obj.__parent__) and dynamically (var p = "__parent__"; obj[p]). Not everyone knows, or remembers, that this is necessary. Removing __parent__ reduces attack surface in JavaScript sandboxes.

The first two use cases are better addressed in other ways. The last two are holes best removed entirely.

__parent__ is being removed

__parent__ is being removed from SpiderMonkey. The value it purports to expose is esoteric and has semantics defined by a specification which JavaScript developers shouldn’t really have to read to understand it. The identification of that value is non-intuitive. Sometimes the value it exposes is a lie. There are plausible reasons why it might be made to lie if potential performance-improving ideas were implemented. It’s exposed even in places where it makes little sense. The use cases for it can be and are better served in other ways — or explicitly not served, when it serves as a vector for security vulnerabilities. In sum __parent__ doesn’t pass muster, so it’s being removed.

You can experiment with a version of Firefox without support for __parent__ by downloading a nightly from nightly.mozilla.org. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.) The next release is many months away, which should provide plenty of time for extension developers to update their extensions to not use __parent__, if by chance they had managed to discover it and use it.

05.05.10

New Mozilla developer feature: Components.utils.getGlobalForObject(obj)

Suppose you’re an extension developer implementing some sort of event listener-like interface corresponding to browser windows. You’d like listeners to stick around as long as the original browser window is open, so when the browser window’s unload event fires when the browser window is closed, you want to remove the listener. Further, for simplicity, you’d like to be able to reuse the same interface as the DOM uses: EventTarget.addEventListener(eventName, listener, bubbles). But if you do that, how do you know what browser window to associate with a listener? One possibility is that you could associate event listeners with windows if you could determine the global object that corresponded to the event listener in question. (Assume arguendo that you control all use of listeners, so every listener is straightforwardly created in the window with which it’s associated — no shenanigans passing listeners across windows.) This isn’t the only situation where one might wish to know the global object, but it does happen to be a somewhat common one.

Is it possible to determine the global object corresponding to an arbitrary object? You can, through a convoluted sequence of actions involving prototype hopping using Object.getPrototypeOf and walking the “scope chain” for the object (using an obscure Mozilla-specific feature whose details I omit). More simply, if the object in question is a DOM node, you could use node.ownerDocument.defaultView, which, while quite understandable, is still DOM-specific. But wouldn’t it be much better if there were some simpler, universal way to determine this value, rather than skirting the edges of feature intentionality?

Firefox nightlies now implement support for a new method available to extensions and other privileged code: Components.utils.getGlobalForObject(obj). It’s designed to support the specific need of determining the window where an object was created. (XPCOM components of course have a global object that isn’t a window, and that global object will be returned by the method in those circumstances.) Its functionality needs little explanation:

/* in the global scope */
var global = this;
var obj = {};
function foo() { }
assertEq(Components.utils.getGlobalForObject(foo), global);
assertEq(Components.utils.getGlobalForObject(obj), global);

If you’re using the previously-noted method involving prototype- and scope-chain-hopping, you should change your code to use Components.utils.getGlobalForObject(obj) instead. This new method is much simpler and clearer, and upcoming changes mean that the old method will no longer work in future nightlies and releases. The next release is still several months out, so you should have plenty of time to adjust.

As always, you can experiment with a bleeding-edge version of Firefox that supports Components.utils.getGlobalForObject(obj) by downloading a nightly from nightly.mozilla.org. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

A last note: the method as currently implemented in nightlies suffers from a small bug when used with objects from unprivileged code — objects from scripts in web pages, that sort of thing. It’s fixed in the TraceMonkey repository, and the adjustment should make its way into the mozilla-central repository (and thus into nightlies) in short order. If you only use the method on objects from privileged scripts, I don’t believe you’ll encounter any problems.
