06.04.10

More changes coming to SpiderMonkey: the magical __count__ property is being removed

Jeff @ 17:17

Meet __count__

SpiderMonkey has for some time included a little-known, little-publicized, and little-used property named __count__ on all objects. This property reports the number of enumerable properties stored directly on the object. For example,

assertEqual({ 1: 1 }.__count__, 1);
assertEqual([].__count__, 0);
assertEqual([1].__count__, 1);
assertEqual([1, /* hole */, 2, 3].__count__, 3);

It’s sort of a convenient way to check property counts, avoiding a verbose for (var p in o) loop. For example, you could use it to determine how many mappings you had in a hash.

Unfortunately, __count__ has a number of problems.

What’s wrong with __count__

First, and most notably for web developers, __count__ is non-standard. To the best of my knowledge, no other JavaScript engine supports it. Developers must write scripts under the assumption it doesn’t exist, so it provides no brevity bonus. (I recognize extensions and Mozilla-based applications are a special case. For the further reasons below, it’s still worth removing even if some code could assume its existence.)

Second, the special __count__ property contributes to the problem that you can’t actually use an object as a string-keyed hash. You have to be careful that your string-valued keys don’t conflict with special properties, because special properties can’t be overwritten with a custom property. This breaks the nicest feature of using objects for hashes: you can’t just use normal [] property access to set and retrieve mappings. If someone inserts a mapping of, say, "__count__" to "special", the association with "special" won’t be preserved; obj.__count__ = 5 is actually a no-op. (NB: Even ignoring __count__, other special properties prevent this from working, but we’re slowly working toward getting rid of them. [Some much more slowly than others, I hasten to note! You don’t need to worry about __proto__ being removed any time in the near future, although you should use Object.getPrototypeOf(obj) with a compatibility shim to determine obj’s prototype in new code.]) A further wrinkle is that __count__ is implemented in such a way that an object literal with "__count__" as a named property functions differently from an object to which the property is later added by assignment:

var o = { __count__: 17 };
assertEqual(o.__count__, 17);

// ...but...
var o2 = {};
o2.__count__ = 17;
assertEqual(o2.__count__, 17); // fails: __count__ is 0
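One common way to sidestep collisions with special property names is to prefix every key before it touches the underlying object. The following is only a minimal sketch of that idea: StringMap and the "$" prefix are hypothetical names chosen for illustration, not anything SpiderMonkey provides.

```javascript
// Hypothetical helper for illustration: a string-keyed map that prefixes
// every key, so no key can collide with a special property name.
function StringMap()
{
  this.items = {};
}

// "$" is an arbitrary prefix chosen for this sketch.
StringMap.prototype.set = function (key, value)
{
  this.items["$" + key] = value;
};

StringMap.prototype.has = function (key)
{
  return Object.prototype.hasOwnProperty.call(this.items, "$" + key);
};

StringMap.prototype.get = function (key)
{
  return this.has(key) ? this.items["$" + key] : undefined;
};

var map = new StringMap();
map.set("__count__", "special");
map.set("__proto__", "also special");
```

Because the stored property names are "$__count__" and "$__proto__", the associations survive no matter which special properties the engine happens to support.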

Third, in supporting __count__ we’ve incurred special-case deoptimization code in SpiderMonkey’s script-to-bytecode compiler. This extra complication, for a feature not often used, does nothing for code readability, complexity, or quality.

Fourth, __count__ doesn’t work the way you might think it works: it’s not uniformly fast for all objects. Property access generally has a syntactic assumption of constant-time speediness. This assumption isn’t valid in languages with getters and setters, but since it’s usually the case, it’s not a horribly inaccurate one. Thus, one might expect that evaluating obj.__count__ is an uncomplicated O(1) operation which doesn’t allocate memory and just looks up the size of an idealized hash table. It might be possible to make that true, but in fact it never has been true: generally, computing __count__ is O(n) in the number of properties on the object. Further, because __count__ reuses the same enumeration mechanism as for-in loops, it usually requires a memory allocation, which can be slow. __count__ has no asymptotic advantage over manual enumeration of the object’s properties.
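To make the point about getters concrete, here is a small sketch of a property read that looks constant-time but isn’t. It is not how SpiderMonkey implements __count__; the table object and its size property are hypothetical, and the sketch assumes an engine with ES5’s Object.defineProperty.

```javascript
// Hypothetical illustration: reading table.size looks like a plain
// property access, but the getter walks every enumerable property,
// so each read is O(n) in the number of properties.
var table = { a: 1, b: 2, c: 3 };

Object.defineProperty(table, "size", {
  enumerable: false, // keep "size" itself out of the count
  get: function ()
  {
    var n = 0;
    for (var p in this)
    {
      if (Object.prototype.hasOwnProperty.call(this, p))
        n++;
    }
    return n;
  }
});
```

Nothing at the call site table.size hints at the loop hiding behind it, which is exactly the trap described above.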

In sum, __count__ has problems that mean it doesn’t give you much more than a for (var p in o) loop would. If that loop were placed in a function, it would be almost identical in code size to use of the property — and it would have the advantage of being completely cross-browser.

__count__ is being removed

We have removed support for __count__ from SpiderMonkey. As a consequence __count__ will also be removed from the next version of Firefox based on trunk Mozilla code. (And, of course, future versions of other Mozilla-based products like SeaMonkey will pick the change up when they produce releases based on trunk Mozilla code.) For the above reasons __count__ doesn’t make much sense to keep around, and it imposes real development costs. You should have no difficulty updating your code to implement alternative functionality to __count__. Here’s one example of how you might do this:

function count(o)
{
  var n = 0;
  for (var p in o)
  {
    // Count only properties directly on o, not inherited ones.
    if (Object.prototype.hasOwnProperty.call(o, p))
      n++;
  }
  return n;
}

assertEqual(count({ 1: 1 }), 1);
assertEqual(count([]), 0);
assertEqual(count([1]), 1);
assertEqual(count([1, /* hole */, 2, 3]), 3);
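If you can assume an engine that implements ES5’s Object.keys, which returns an array of an object’s own enumerable property names, a shorter variant is possible. This is a sketch under that assumption; countKeys is a hypothetical name, and older engines would need the loop-based count above.

```javascript
// Assumes an ES5 engine that provides Object.keys.
// Object.keys lists only own enumerable properties, so inherited
// properties and array holes are not counted.
function countKeys(o)
{
  return Object.keys(o).length;
}
```

Like the loop, this is still O(n) in the number of properties: it only hides the enumeration, it doesn’t avoid it.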

If you use __count__ and need to test changes to remove that use, you can experiment with a version of Firefox with support for __count__ removed by downloading a nightly from nightly.mozilla.org. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

28.02.10

ECMA-262 quote of the day

If comparefn is not undefined and is not a consistent comparison function for the elements of this array (see below), the behaviour of sort is implementation-defined.

[…]

A function comparefn is a consistent comparison function for a set of values S if all of the requirements below are met for all values a, b, and c (possibly the same value) in the set S: The notation a <CF b means comparefn(a,b) < 0; a =CF b means comparefn(a,b) = 0 (of either sign); and a >CF b means comparefn(a,b) > 0.

  • Calling comparefn(a,b) always returns the same value v when given a specific pair of values a and b as its two arguments. Furthermore, Type(v) is Number, and v is not NaN. Note that this implies that exactly one of a <CF b, a =CF b, and a >CF b will be true for a given pair of a and b.
  • Calling comparefn(a,b) does not modify the this object.
  • a =CF a (reflexivity)
  • If a =CF b, then b =CF a (symmetry)
  • If a =CF b and b =CF c, then a =CF c (transitivity of =CF)
  • If a <CF b and b <CF c, then a <CF c (transitivity of <CF)
  • If a >CF b and b >CF c, then a >CF c (transitivity of >CF)
ECMA-262 3rd edition or ECMA-262 5th edition, 15.4.4.11 Array.prototype.sort (comparefn)
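
As a rough illustration of what the quoted requirements demand, the sketch below brute-force checks a comparison function over a small sample of values. isConsistentOn, byNumber, and neverEqual are hypothetical names. The check can only find violations within the sample, and it doesn’t test the requirements that repeated calls return the same value or that the this object is left unmodified.

```javascript
// Hypothetical helper: checks the Number/NaN, reflexivity, symmetry, and
// transitivity requirements for every combination of the sample values.
function isConsistentOn(comparefn, values)
{
  // Reduce a comparison result to -1, 0, or 1; null marks an invalid result.
  function sign(a, b)
  {
    var v = comparefn(a, b);
    if (typeof v !== "number" || v !== v)
      return null; // not a Number, or NaN
    return v < 0 ? -1 : v > 0 ? 1 : 0;
  }

  for (var i = 0; i < values.length; i++)
  {
    for (var j = 0; j < values.length; j++)
    {
      var ab = sign(values[i], values[j]);
      if (ab === null)
        return false;
      if (i === j && ab !== 0)
        return false; // reflexivity
      if (ab === 0 && sign(values[j], values[i]) !== 0)
        return false; // symmetry
      for (var k = 0; k < values.length; k++)
      {
        var bc = sign(values[j], values[k]);
        if (ab === bc && sign(values[i], values[k]) !== ab)
          return false; // transitivity of =CF, <CF, and >CF
      }
    }
  }
  return true;
}

// Consistent for finite numbers.
function byNumber(a, b) { return a - b; }

// Inconsistent: never returns 0, so a =CF a fails.
function neverEqual(a, b) { return a < b ? -1 : 1; }
```

A comparator like neverEqual is a common mistake; per the quote, passing it to sort leaves the result implementation-defined.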

26.02.10

Working on the JS engine, Episode III: offered without comment

#define WORKAROUND(cx_, saver_)                                               \
    Workaround WORKAROUND_PASTE(w_, __LINE__)((cx_), (saver_))
#define WORKAROUND_PASTE(a_, b_) WORKAROUND_PASTE2(a_, b_)
#define WORKAROUND_PASTE2(a_, b_) a_ ## b_ /* having fun yet? */
#define SAVER(cx_, saver_)                                                    \
    AutoValueRooter saver_(cx_);                                              \
    WORKAROUND((cx_), (saver_));

08.02.10

Brief talk on ES5 and Mozilla support for it

Jeff @ 12:32

I gave a three-minute not-actually-lightning-talk-but-let’s-call-it-that-anyway on ECMA-262 5th edition, what’s in it, and the state of Mozilla’s support for it at the Mozilla weekly meeting this week. It’s probably old hat if you’ve been following the standard closely, but if you haven’t it gives a short and sweet overview of what’s new; there’s a three-minute video of the actual talk on the meeting page (start at around 7:00 into the complete video). If you’re strapped for time, view the slides and turn off stylesheets (View > Page Style > No Style in Firefox) to see notes on what roughly accompanied each slide.

15.01.10

More ES5 backwards-incompatible changes: regular expressions now evaluate to a new object, not the same object, each time they’re encountered

(preemptive clarification: coming in Firefox 3.7 and not Firefox 3.6, which is to say, a good half year away from now rather than Real Soon Now)

Disjunction: is /foo/ the same object, or a new object, each time it’s evaluated in ES3?

According to ECMA-262 3rd edition, what should this code print?

function getRegEx() { return /regex/; }
print("getRegEx() === getRegEx(): " + (getRegEx() === getRegEx()));

The answer depends upon this question: when a JavaScript regular expression literal is evaluated, does it create a new RegExp object each time, or does it evaluate to the exact same RegExp object each time it’s evaluated? Let’s look at a few examples and make a guess.

I sense a pattern

var tests =
  [
   function getNull() { return null; },
   function getNumber() { return 1; },
   function getString() { return "a"; },
   function getBoolean() { return false; },
   function getObject() { return {}; },
   function getArray() { return []; },
   function getFunction() { return function f(){}; }
  ];

for (var i = 0, sz = tests.length; i < sz; i++)
{
  var t = tests[i];
  print(t.name + "() === " + t.name + "(): " + (t() === t()));
}

If you test that code, you’ll see that the first four results are true, and the rest are false, all per ECMA-262 3rd edition. (Okay, technically, and bizarrely, ES3 permitted either result for the function case, but no browser ever implemented a result of true; ES5 acknowledges reality and mandates that the result be false.) The first four functions return primitive values; the last three return objects. There’s only a single instance of any primitive value — or, alternately, you might say, equality doesn’t distinguish between different instances of the same primitive. Therefore it doesn’t really matter whether primitive literals evaluate to new instances or the same instance. On the other hand, objects compare equal only if they’re the same object. Since the object cases didn’t compare identically, they must be new objects each time. This makes sense: if this were not the case, what would happen in the following example?

function makePoint(x, y)
{
  var pt = {};
  pt.x = x;
  pt.y = y;
  return pt;
}

var pt1 = makePoint(1, 2);
var pt2 = makePoint(3, 4);

It would be complete nonsense if the object literal above evaluated to the same object every time it was encountered; the next two lines would blow away the previous point, and we would have pt1.x === 3 && pt1.y === 4.

Plausible assertion: regular expression literals evaluate to new objects when encountered?

Returning to the original question, then, what does ES3 say this code should print?

function getRegEx() { return /regex/; }
print("getRegEx() === getRegEx(): " + (getRegEx() === getRegEx()));

A regular expression is an object. If you don’t want to get weird property-poisoning of the sort just suggested, regular expression literals must evaluate to different objects each time they’re encountered, right?

Alternative: ES3 says /foo/ is the same object every time

Wrong. According to ES3, there’s only a single object for each regular expression literal that’s returned each time the literal is encountered:

A regular expression literal is an input element that is converted to a RegExp object (section 15.10) when it is scanned. The object is created before evaluation of the containing program or function begins. Evaluation of the literal produces a reference to that object; it does not create a new object.

ECMA-262, 3rd ed. 7.8.5 Regular Expression Literals

This was originally a dubious optimization in the standard to avoid the “costly” creation of a regular expression object every time a literal would be encountered. It’s perhaps a little surprising that the same object is returned each time, but does it make a difference in real programs not written to demonstrate the quirk? Often it doesn’t matter. As a simple example, if (/^\d+$/.test(str)) { /* ... */ } executes identically either way, assuming RegExp.prototype.test is unmodified. The RegExp never escapes, and its use doesn’t depend on mutable state, so creating new objects each time doesn’t make a difference (other than negligibly, in speed).

Sometimes, however, the shared-object misoptimization does matter meaningfully: when a RegExp with mutable state is used in ways that depend on that state. Most regular expressions don’t store any state, so if the same RegExp object is used twice it’s no big deal. However, it can matter a lot for regular expressions specified with the global flag:

var s = "abcddeeefffffgggggggghhhhhhhhhhhhh";
function next(s)
{
  var r = /(.)\1*/g;
  r.exec(s);
  return r.lastIndex;
}

var r = [];
for (var i = 0; i < 8; i++)
  r.push(next(s));
print(r.join(", "));

Each time a regular expression with the global flag is used, its lastIndex property is updated with the index of the location in the matched string where matching should resume when the regular expression is next used. Thus, in this example we have mutable state, and if next is called multiple times we have uses which will depend on that mutable state. Let’s see what happens in engines which implemented regular expression literals per ES3. If you download the Firefox 3.6 release candidate and test the above code in it (adjusting the implied print to alert), the printed result will be this:

1, 2, 3, 5, 8, 13, 21, 34

ES5: an escape to sanity

Is ES3’s behavior what you’d expect? No, it isn’t. In fact, ES3’s behavior, which Mozilla and SpiderMonkey implement, is the second-most duplicated bug filed against Mozilla’s JavaScript engine. SpiderMonkey and (strangely enough) V8 are the only notable JavaScript engines out there that implement ES3’s behavior. ES3’s behavior is rarely what web developers expect, and it doesn’t provide any real value, so ES5 is changing to the behavior you’d expect: evaluating a regular expression literal creates a new object every time.

Starting with Firefox 3.7, Firefox will implement what ES5 specifies. Download a Firefox nightly from nightly.mozilla.org and test it out as above (use the profile manager if you want to keep your current Firefox settings and install untouched). Instead of the Fibonacci sequence you’ll get this:

1, 1, 1, 1, 1, 1, 1, 1

The bottom line

Starting with Firefox 3.7, evaluating a regular expression literal like /foo/ will create a new RegExp object, just as evaluating {} or [] currently creates a new object or array. The optimization ES3 specified has resulted in clear developer confusion and was misguided and inconsistent with respect to other object literal syntax in JavaScript.

Again, as with my previous post, we doubt this change will affect many scripts (in this case, except for the better). The fact that few browsers implemented ES3’s semantics means that most sites have to cope with either choice of semantics, so the semantics in ES5, implemented by Mozilla for Firefox 3.7, are likely already handled. Still, it’s possible that this change might break some sites (particularly those which include browser-specific code), so we’re giving a heads-up as early as possible.
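
If you’d rather not depend on either semantics, one way to cope is to say explicitly which behavior you want. The sketch below reuses the earlier string; nextFresh, nextShared, shared, and collect are hypothetical names. It relies on two facts: the RegExp constructor creates a new object on every call when given a string pattern, and a literal evaluated only once at top level yields a single object under both ES3 and ES5.

```javascript
var s = "abcddeeefffffgggggggghhhhhhhhhhhhh";

// A fresh RegExp on every call, under either literal semantics,
// because the RegExp constructor always creates a new object here.
function nextFresh(s)
{
  var r = new RegExp("(.)\\1*", "g");
  r.exec(s);
  return r.lastIndex;
}

// One RegExp shared by every call, under either literal semantics,
// because this literal is evaluated exactly once.
var shared = /(.)\1*/g;
function nextShared(s)
{
  shared.exec(s);
  return shared.lastIndex;
}

// Call fn eight times on s and join the lastIndex values.
function collect(fn)
{
  var r = [];
  for (var i = 0; i < 8; i++)
    r.push(fn(s));
  return r.join(", ");
}
```

Here collect(nextFresh) yields eight 1s, while collect(nextShared) walks through the string and yields the Fibonacci-like sequence, regardless of which edition the engine follows for literals.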
