26.02.11

The proper way to call parseInt (tl;dr: parseInt(str, radix))

Introduction

Allen Wirfs-Brock recently discussed the impedance mismatch when functions accepting optional arguments are incompatibly combined, considering in particular combining parseInt and Array.prototype.map. In doing so he makes this comment:

The most common usage of parseInt passes only a single argument

Code-quality systems like JSLint routinely warn about parseInt without an explicit radix. Most uses might well pass only a single argument, but I could easily imagine this going the other way.

This raises an interesting question: why do lints warn about using parseInt without a radix?

parseInt and radixes

Like much of JavaScript, parseInt tries to be helpful when called without an explicitly specified radix. It attempts to guess a suitable radix:

assertEq(parseInt("+17"), 17);
assertEq(parseInt("42"), 42);
assertEq(parseInt("-0x42"), -66);
// assertEq(parseInt("0755"), ???); // we'll get back to this

If the string (after optional leading whitespace and + or -) starts with a non-zero decimal digit, it’s parsed as decimal. But if the string begins with 0, things get wacky. If the next character is x or X, the number is parsed in base-16: hexadecimal. Last, if the next character isn’t x or X…hold that thought. I’ll return to it in a moment.

Thus the behavior of parseInt without a radix depends not just on the numeric contents of the string but also upon its internal structure, entirely separate from its contents. This alone is reason enough to always specify a radix: specify a radix 2 ≤ r ≤ 36 and it will be used, no guessing, no uncertain behavior in the face of varying strings. (Although, to be sure, there’s still a very slight wrinkle: if r === 16 and your string begins with 0x or 0X, that prefix will be skipped when determining the integer to return. But this matters only in a pretty far-out edge case: you want to parse hexadecimal strings without a prefix, and you would also want a string with a prefix to be processed as just 0.)
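A minimal sketch of what an explicit radix buys you, including the radix-16 wrinkle just mentioned. The check helper is made up for this example (it stands in for the assertEq used above); everything else is plain parseInt.

```javascript
// Hypothetical helper for this sketch: throws if the two values differ.
function check(actual, expected) {
  if (actual !== expected) {
    throw new Error("expected " + expected + ", got " + actual);
  }
}

// With an explicit radix, the string's prefix no longer selects the base.
check(parseInt("0755", 10), 755);
check(parseInt("0755", 8), 493);
check(parseInt("42", 16), 66);

// The wrinkle: radix 16 still skips a leading 0x or 0X.
check(parseInt("0x42", 16), 66);
check(parseInt("-0X42", 16), -66);

// With any other radix, "x" is not a digit, so parsing stops after the "0".
check(parseInt("0x42", 10), 0);
```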

But wait! There’s more

Beyond cuteness lies another concern. Let’s return to the leading-zero-but-not-hexadecimal case:

parseInt("0755");

In some programming languages a leading zero (that’s not part of a hexadecimal prefix) means the number is base-8: octal. So maybe JavaScript infers this to be an octal number, returning 7 × 8 × 8 + 5 × 8 + 5 === 493.

On the other hand, as I noted in Mozilla’s ES5 strict mode documentation, there’s some evidence that people use leading zeroes as alignment devices, thinking they have no semantic effect. So maybe leading zero is decimal instead.

The wonderful thing about standards is that there are so many of them to choose from

According to ES3, a leading zero with no explicit radix is either decimal or, if octal extensions have been implemented, octal. So what happens depends on who wrote the ES3 implementation, and what choice they made. But what if it’s an ES5 implementation? ES5 explicitly forbids octal and says this is interpreted as decimal. Therefore, parseInt("0755") is 755 in bog-standard ES3 implementations, 493 in ES3 implementations which have implemented octal extensions, and 755 in conforming ES5 implementations. Isn’t it great?
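The three outcomes can be sketched as a small model of the radix guess. This is an illustration under assumptions, not any engine’s actual algorithm: inferRadix, parseIntAs, and the dialect strings are all hypothetical names, and the whitespace handling only approximates what the specifications describe.

```javascript
// Hypothetical model of how parseInt(str) picks a radix when none is given.
// `dialect` is an illustrative parameter, not part of any real API:
//   "es3"        ES3 without the octal extensions
//   "es3-octal"  ES3 with the octal extensions
//   "es5"        ES5
function inferRadix(str, dialect) {
  // Skip leading whitespace, then one optional sign (approximation).
  var s = str.replace(/^\s+/, "").replace(/^[+-]/, "");
  if (/^0[xX]/.test(s)) return 16;                        // hexadecimal prefix
  if (dialect === "es3-octal" && /^0/.test(s)) return 8;  // octal extension
  return 10;                                              // everything else
}

// Parse with the radix that the given dialect would have guessed.
function parseIntAs(str, dialect) {
  return parseInt(str, inferRadix(str, dialect));
}

var plainES3 = parseIntAs("0755", "es3");
var octalES3 = parseIntAs("0755", "es3-octal");
var es5 = parseIntAs("0755", "es5");
```

Under this model plainES3 and es5 are both 755 while octalES3 is 493, matching the three cases described above.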

What do browsers actually do?

On the web everyone implements the octal extensions, so you’ll have to look hard to find an ES3 browser that doesn’t make parseInt("0755") === 493. But ES3 is old and busted, and ES5 is the new hotness. What do ES5 implementations do, especially as the change in ES5 isn’t backwards-compatible?

Surprisingly, browsers aren’t all playing chicken here, waiting to see whether they can change without breaking sites. On this particular point IE9 leads the way (in standards mode code only), implementing parseInt("0755") === 755 despite having implemented parseInt("0755") === 493 in the past. Before I saw that IE9 implemented this (although I hasten to note they have not shipped a release with it yet), I expected no browser would implement it due to the possibility of breaking sites. After seeing IE9’s example, I’m less certain. Hopefully their experience will shed light on the decision for the other browser vendors.

Conclusion

Precise details of browser and specification inconsistencies aside, the point remains that parseInt(str) tries to be cute when parsing str. That cuteness can make parseInt(str) unpredictable if inputs vary sufficiently. Avoid edge-case bugs and use parseInt(str, radix) instead.
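The parseInt-and-map combination mentioned in the introduction makes a compact illustration of this advice. The sketch below uses a hypothetical wrapper, parseDecimal, that pins the radix and ignores the extra arguments map supplies; the name and the sample inputs are assumptions made for the example.

```javascript
// Hypothetical wrapper: always decimal, and it ignores any extra arguments
// (such as the index and array that Array.prototype.map passes along).
function parseDecimal(str) {
  return parseInt(str, 10);
}

var inputs = ["10", "10", "10"];

// map calls parseInt(element, index, array), so the index becomes the radix:
// radix 0 means "guess", radix 1 is invalid, and radix 2 is binary.
var surprising = inputs.map(parseInt);

// With the radix pinned, every element parses the same way.
var intended = inputs.map(parseDecimal);
```

Here surprising holds 10, NaN, and 2, while intended holds 10 three times.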

03.02.11

Working on the JS engine, Episode IV

A testcase submitted to us today:

([][(![]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[+!+[]]+(!![]+[])[+[]]][([][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(![]+[])[!+[]+!+[]]]()+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+([][(![]+[])[!+[]+!+[]+!+[]]+(!![]+[][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]])[+!+[]+[+[]]]+(!![]+[])[+!+[]]+(!![]+[])[+[]]][([][(![]+[])[+[]]+(![]+[]+[][[]])[+!+[]+[+[]]]+(![]+[])[!+[]+!+[]]+(!![]+[])[+[]]+(!![]+[])[!+[]+!+[]+!+[]]+(!![]+[])[+!+[]]]+[])[!+[]+!+[]+!+[]]+(![]+[])[+!+[]]+(![]+[])[!+[]+!+[]]+(![]+[])[!+[]+!+[]]]()+[])[!+[]+!+[]]

The result according to ES3, plus a common implementation-specific behavior, is the string "job".

The result according to ES5, plus a common implementation-specific behavior, is a thrown TypeError.
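Decoded by hand, the testcase appears to spell out [].sort.call() and then pick letters out of the stringified result. The sketch below shows the building blocks it relies on; the variable names are made up for illustration, and the final step is wrapped in try/catch because its outcome is exactly what differs between ES3 and ES5.

```javascript
// Building blocks used throughout the testcase.
var pieces = {
  falseString: ![] + [],          // "false"
  trueString: !![] + [],          // "true"
  zero: +[],                      // 0
  one: +!+[],                     // 1
  three: !+[] + !+[] + !+[],      // 3
  ten: +!+[] + [+[]],             // "10": 1 + [0] concatenates as strings
  undefinedValue: [][[]]          // undefined: the array has no "" property
};

// Indexing into those strings yields single letters, which spell "filter".
var filterName =
  (![] + [])[+[]] +                       // "f" from "false"
  (![] + [] + [][[]])[+!+[] + [+[]]] +    // "i" from "falseundefined"[10]
  (![] + [])[!+[] + !+[]] +               // "l" from "false"
  (!![] + [])[+[]] +                      // "t" from "true"
  (!![] + [])[!+[] + !+[] + !+[]] +       // "e" from "true"
  (!![] + [])[+!+[]];                     // "r" from "true"

// The letters "o" and "c" come from stringifying a native function, which is
// the implementation-specific part: the exact text is up to the engine.
var functionText = [][filterName] + [];

// The testcase ultimately calls sort with no this value at all.
var outcome;
try {
  outcome = String([].sort.call());  // ES3: this defaults to the global object
} catch (e) {
  outcome = e.name;                  // ES5: expected to be "TypeError"
}
```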

31.01.11

Waiting for Superman

Jeff @ 02:20

Hacks (if you’re into web developer-y things: subscribe!) has a post on the latest MDN sprint, a well-attended event with many fresh faces. This bodes well for Mozilla documentation. Sheppy always has more docs to write than time to write them. Even with recent help, there’s too much for documentation writers at Mozilla-the-corporation to handle. But Mozilla-the-community can get it done if people are willing to do it.

Yet there’s the rub: if. Some people who could absolutely kill writing documentation instead write patches, curate mailing lists or forums, run IRC channels, or translate. It’s no given that there will be writers sufficient to the tasks at hand. Even if there are, what will they choose to document? Will they write about ooh, shiny! or about technologies more abbreviation than word? Docs will probably get written, but it might take a while, particularly if the change is hard to grok in short order.

Who will write critical documentation for your next great fix if Mozilla-the-corporation lacks the writers and Mozilla-the-community chooses to work on other important tasks?

You.

There is no cavalry, no Superman to save the day.

You know the technology in question, the nature of the fix. You’re the person who can best explain what you did. You don’t need to have the change explained to you in order to document it. The person uniquely suited to documenting your fix is you.

Maybe you’re not the best writer. Take a stab anyway. Write something hackish and dump it in a user sub-page, then mention it in the relevant bug when you add the dev-doc-needed keyword. Or edit the relevant page and ping a good writer for a once-over.

You know the ins and outs of the change. The expert writer doesn’t. Write up the change if you can. But if you can’t, just getting it out there, even in an unpolished state, makes it vastly easier for him to translate your expert knowledge into expert documentation.

24.01.11

New ES5 strict mode requirement: function statements not at top level of a program or function are prohibited

Function statements in ECMAScript

What’s the effect of this program according to ECMAScript?

function foo() { }

If you said that it defines a function as a property of the global object, congratulations! You’ve mastered a basic part of JavaScript syntax.

Let’s go a little trickier: what is the effect of the function defined in this program according to ECMAScript?

function foo()
{
  return g;
  function g() { }
}

This function, when called, defines a local variable g whose value is the specified function. Then it returns that function as the value of that variable. If you knew this as well, give yourself a gold star.
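A quick way to convince yourself of this is to call the function and inspect what comes back. The sketch below is the same shape as the example above, with one assumption added for illustration: g returns a marker string so the result can be checked.

```javascript
function foo()
{
  // g is already bound here, even though its definition appears below,
  // because function statements are processed when the function is entered.
  return g;
  function g() { return "hoisted"; }  // marker value added for this sketch
}

var returned = foo();
var kind = typeof returned;   // "function"
var marker = returned();      // "hoisted"
```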

Now let’s try something even harder: what’s the effect of these programs?

if (true)
  function bar() { }

function g() { }
function foo()
{
  if (true)
    function g() { }
  return g;
}

Shenanigans!

Trick question! They fail to run due to syntax errors.

ECMAScript permits function statements in exactly two places: directly within the list of statements that make up a program, and directly within the list of statements that make up the contents of a function body. These are the first two examples. (A function statement also looks like an expression, but if it appears in expression context it’s a function expression, not a function statement.) Engines which permit a function statement anywhere else — as the child of a block statement enclosed by curly braces, as the child of a loop or condition, as the child of a with, or as the child of a case or default in a switch statement — do so by extending ES5.
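The statement-versus-expression distinction in that parenthetical can be sketched as follows. The names declared, assigned, and inner are illustrative only, and the behavior noted in the comments assumes an engine that follows the specification for named function expressions.

```javascript
// Function statement: creates a binding named `declared` in the enclosing scope.
function declared() { return "statement"; }

// Function expression: it appears where an expression is expected, so the
// name `inner` is bound only inside the function itself.
var assigned = function inner() { return typeof inner; };

var seenInside = assigned();     // "function"
var seenOutside = typeof inner;  // "undefined" in a conforming engine
```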

Spec requirements aside, what are the semantics of extensionland function statements?

Now you’re just messing with me

Which semantics?

Browsers all implement extensionland function statements differently, with different semantics. Use them just so and they’ll work the same way across browsers. Use them in any way where the function statement conditionally executes, or where you start capturing the binding for the function in different locations, and you’ll find any semblance of cross-browser compatibility disappears. This example by Rich Dougherty, used with permission, demonstrates some of the incompatibilities (and I wonder whether function statements in with might present more):

var result = [];
result.push(f());
function f() { return 1; }
result.push(f());
if (1)
{
  result.push(f());
  function f() { return 2; }
  result.push(f());
}
result.push(f());
function y()
{
  result.push(g());
  function g() { return 3; }
  result.push(g());
  if (1)
  {
    result.push(g());
    function g() { return 4; }
    result.push(g());
  }
  result.push(g());
}
y();
print(result);

Results in different browsers vary a fair bit, although there’s a little more consensus on behavior now than at the time this example was originally written:

Browser                Output
Firefox 1.5 and 2      1,1,1,2,2,3,3,3,3,3
Firefox 4              1,1,1,2,2,3,3,3,4,4
Opera                  2,2,2,2,2,4,4,4,4,4
Internet Explorer 7    2,2,2,2,2,4,4,4,4,4
Safari 3               1,1,2,2,2,3,3,4,4,4
Safari 4               2,2,2,2,2,4,4,4,4,4
Chrome                 2,2,2,2,2,4,4,4,4,4

Why not specify semantics?

Blindly specifying some particular behavior won’t work. Many sites these days (and different browser-specific implementations of those sites) rely on engine-specific behavior with user-agent-conditioned code. Changing browser behavior breaks that pretty hard. Specifying any one behavior would therefore break every browser not already implementing it at the time of specification.

A way forward

The next version of ECMAScript would like to specify semantics for this case — quite possibly semantics not implemented by any browser. How to do it, if implementations irreconcilably disagree? The solution comes in two parts. First, “ES6” will require affirmative opt-in to enable new syntax and semantics, including for currently-nonstandard function statements. Second, in anticipation of that change, the ECMA committee recommends that non-standard function statements be forbidden in strict mode code, to open up a future path down which ES6 can walk.

Thus each of these examples is a syntax error in strict mode code:

"use strict";
{
  function foo() { }
}

"use strict";
if (true)
  function bar() { }

"use strict";
with (obj)
  function foo() { }

"use strict";
for (;;)
  function foo() { }

"use strict";
switch (v)
{
  case 10:
    function bar() { }
  default:
    function baz() { }
}

Both Firefox and WebKit now implement this restriction, and other engines will follow as they too implement strict mode.
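One way to probe what a given engine does is to compile a snippet as strict mode code and see whether it is rejected. The helper below is a hypothetical sketch: isStrictSyntaxError is a made-up name, it relies on new Function being available, and what it reports for any particular snippet depends on the engine under test.

```javascript
// Hypothetical helper: reports whether `source` fails to compile as a strict
// mode function body. new Function parses the code without running it.
function isStrictSyntaxError(source) {
  try {
    new Function('"use strict";\n' + source);
    return false;
  } catch (e) {
    if (e instanceof SyntaxError) return true;
    throw e;  // anything else is unexpected, so let it propagate
  }
}

// A function statement directly in a function body is always fine.
var topLevel = isStrictSyntaxError("function ok() { }");

// Function statements as the child of a condition or a loop.
var underIf = isStrictSyntaxError("if (true)\n  function bar() { }");
var underFor = isStrictSyntaxError("for (;;)\n  function foo() { }");
```

In an engine that implements the restriction, topLevel is false while underIf and underFor are true.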

Conclusion

In order for future versions of ECMAScript to be able to define semantics for extensionland functions, strict mode “clears the deck” and forbids them entirely. Instead, assign functions to variables, a la var f = function() { };. Semantics for this are completely defined and compatibly implemented across browsers.
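As a sketch of that advice, here is a conditionally chosen function written with function expressions instead of function statements under an if. The names pickGreeter and greet, and the strings they return, are invented for the example.

```javascript
function pickGreeter(formal) {
  "use strict";
  // One variable, assigned a function expression on each branch. The
  // semantics are ordinary assignment, so every engine agrees on the result.
  var greet;
  if (formal) {
    greet = function () { return "Good day"; };
  } else {
    greet = function () { return "Hi"; };
  }
  return greet;
}

var formalGreeting = pickGreeter(true)();
var casualGreeting = pickGreeter(false)();
```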

You can experiment with a version of Firefox with these changes by downloading a nightly build. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)

18.01.11

Property rights without legal enforcement

Jeff @ 07:51

Can you own something without a government enforcing that ownership right? Sometimes, yes:

[Image: A parking space cleared of snow lies empty but for a wooden chair placed in the center of it, laying claim to that space for the person who cleared it. Caption: Property takes many forms]

Some time ago I read about Boston’s system for allocating parking spaces in snowy weather: you clear it, you claim it. (See also more recent articles.) I happened to see this system in action yesterday for the first time. People not living in areas with proper winter climes might not have heard of this before, and I think it’s a nifty little system worth highlighting. It calibrates investments and incentives, thus “pricing” a scarcity to produce a more efficient allocation. It saves money, because there’s no need to pay city workers to clear spaces. And it does so with little administrative overhead: individuals overwhelmingly maintain the system. It’s a thing of beauty all around.

For more on the economics of this property system, see the article Snow Jobs at the Library of Economics and Liberty.
