20.06.15

New changes to make SpiderMonkey’s (and Firefox’s) parsing of destructuring patterns more spec-compliant

Destructuring in JavaScript

One new feature in JavaScript, introduced in ECMAScript 6 (formally ECMAScript 2015, but it’ll always be ES6 in our hearts), is destructuring. Destructuring is syntactic sugar for assigning sub-values within a single value — nested properties, iteration results, &c., to arbitrary depths — to a set of locations (names, properties, &c.).

// Declarations
var [a, b] = [1, 2]; // a = 1, b = 2
var { x: c, y: d } = { x: 42, y: 17 }; // c = 42, d = 17

function f([z]) { return z; }
print(f([8675309])); // 8675309


// Assignments
[b, f.prop] = [3, 15]; // b = 3, f.prop = 15
({ p: d } = { p: 33 }); // d = 33

function Point(x, y) { this.x = x; this.y = y; }

// Nesting's copacetic, too.
// a = 2, b = 4, c = 8, d = 16
[{ x: a, y: b }, [c, d]] = [new Point(2, 4), [8, 16]];

Ambiguities in the parsing of destructuring

One wrinkle to destructuring is its ambiguity: reading start to finish, is a “destructuring pattern” instead a literal? Until any succeeding = is observed, it’s impossible to know. And for object destructuring patterns, could the “pattern” just be a block statement? (A block statement is a list of statements inside {}, e.g. many loop bodies.)

How ES6 handles the potential parser ambiguities in destructuring

ES6 says an apparent “pattern” could be any of these possibilities: the only way to know is to completely parse the expression/statement. There are more elegant and less elegant ways to do this, although in the end they amount to the same thing.

Object destructuring patterns present somewhat less ambiguity than array patterns. In expression context, { may begin an object literal or an object destructuring pattern (just as [ does for arrays, mutatis mutandis). But in statement context, { since the dawn of JavaScript only begins a block statement, never an object literal — and now, never an object destructuring pattern.

How then to write object destructuring pattern assignments not in expression context? For some time SpiderMonkey has allowed destructuring patterns to be parenthesized, incidentally eliminating this ambiguity. But ES6 chose another path. In ES6 destructuring patterns must not be parenthesized, at any level of nesting within the pattern. And in declarative destructuring patterns (but not in destructuring assignments), declaration names also must not be parenthesized.

SpiderMonkey now adheres to ES6 in requiring no parentheses around destructuring patterns

As of several hours ago on mozilla-inbound, SpiderMonkey conforms to ES6’s parsing requirements for destructuring, with respect to parenthesization. These examples are all now syntax errors:

// Declarations
var [(a)] = [1]; // BAD, a parenthesized
var { x: (c) } = {}; // BAD, c parenthesized
var { o: ({ p: p }) } = { o: { p: 2 } }; // BAD, nested pattern parenthesized

function f([(z)]) { return z; } // BAD, z parenthesized

// Top level
({ p: a }) = { p: 42 }; // BAD, pattern parenthesized
([a]) = [5]; // BAD, pattern parenthesized

// Nested
[({ p: a }), { x: c }] = [{}, {}]; // BAD, nested pattern parenthesized

Non-array/object patterns in destructuring assignments, outside of declarations, can still be parenthesized:

// Assignments
[(b)] = [3]; // OK: parentheses allowed around non-pattern in a non-declaration assignment
({ p: (d) } = {}); // OK: ditto
[(parseInt.prop)] = [3]; // OK: parseInt.prop not a pattern, assigns parseInt.prop = 3

Conclusion

These changes shouldn’t much disrupt anyone writing JS. Parentheses around array patterns are unnecessary and are easily removed. For object patterns, instead of parenthesizing the object pattern, parenthesize the whole assignment. No big deal!

// Assignments
([b]) = [3]; // BAD: parentheses around array pattern
[b] = [3]; // GOOD

({ p: d }) = { p: 2 }; // BAD: parentheses around object pattern
({ p: d } = { p: 2 }); // GOOD

One step forward for SpiderMonkey standards compliance!

11.12.14

Introducing the JavaScript Internationalization API

(also cross-posted on the Hacks blog — comment over there if you have anything to say)

Firefox 29 issued half a year ago, so this post is long overdue. Nevertheless I wanted to pause for a second to discuss the Internationalization API first shipped on desktop in that release (and passing all tests!). Norbert Lindenberg wrote most of the implementation, and I reviewed it and now maintain it. (Work by Makoto Kato should bring this to Android soon; b2g may take longer due to some b2g-specific hurdles. Stay tuned.)

What’s internationalization?

Internationalization (i18n for short — i, eighteen characters, n) is the process of writing applications in a way that allows them to be easily adapted for audiences from varied places, using varied languages. It’s easy to get this wrong by inadvertently assuming one’s users come from one place and speak one language, especially if you don’t even know you’ve made an assumption.

function formatDate(d)
{
  // Everyone uses month/date/year...right?
  var month = d.getMonth() + 1;
  var date = d.getDate();
  var year = d.getFullYear();
  return month + "/" + date + "/" + year;
}

function formatMoney(amount)
{
  // All money is dollars with two fractional digits...right?
  return "$" + amount.toFixed(2);
}

function sortNames(names)
{
  function sortAlphabetically(a, b)
  {
    var left = a.toLowerCase(), right = b.toLowerCase();
    if (left > right)
      return 1;
    if (left === right)
      return 0;
    return -1;
  }

  // Names always sort alphabetically...right?
  names.sort(sortAlphabetically);
}

JavaScript’s historical i18n support is poor

i18n-aware formatting in traditional JS uses the various toLocaleString() methods. The resulting strings contained whatever details the implementation chose to provide: no way to pick and choose (did you need a weekday in that formatted date? is the year irrelevant?). Even if the proper details were included, the format might be wrong e.g. decimal when percentage was desired. And you couldn’t choose a locale.

As for sorting, JS provided almost no useful locale-sensitive text-comparison (collation) functions. localeCompare() existed but with a very awkward interface unsuited for use with sort. And it too didn’t permit choosing a locale or specific sort order.

These limitations are bad enough that — this surprised me greatly when I learned it! — serious web applications that need i18n capabilities (most commonly, financial sites displaying currencies) will box up the data, send it to a server, have the server perform the operation, and send it back to the client. Server roundtrips just to format amounts of money. Yeesh.

A new JS Internationalization API

The new ECMAScript Internationalization API greatly improves JavaScript’s i18n capabilities. It provides all the flourishes one could want for formatting dates and numbers and sorting text. The locale is selectable, with fallback if the requested locale is unsupported. Formatting requests can specify the particular components to include. Custom formats for percentages, significant digits, and currencies are supported. Numerous collation options are exposed for use in sorting text. And if you care about performance, the up-front work to select a locale and process options can now be done once, instead of once every time a locale-dependent operation is performed.

That said, the API is not a panacea. The API is “best effort” only. Precise outputs are almost always deliberately unspecified. An implementation could legally support only the oj locale, or it could ignore (almost all) provided formatting options. Most implementations will have high-quality support for many locales, but it’s not guaranteed (particularly on resource-constrained systems such as mobile).

Under the hood, Firefox’s implementation depends upon the International Components for Unicode library (ICU), which in turn depends upon the Unicode Common Locale Data Repository (CLDR) locale data set. Our implementation is self-hosted: most of the implementation atop ICU is written in JavaScript itself. We hit a few bumps along the way (we haven’t self-hosted anything this large before), but nothing major.

The Intl interface

The i18n API lives on the global Intl object. Intl contains three constructors: Intl.Collator, Intl.DateTimeFormat, and Intl.NumberFormat. Each constructor creates an object exposing the relevant operation, efficiently caching locale and options for the operation. Creating such an object follows this pattern:

var ctor = "Collator"; // or the others
var instance = new Intl[ctor](locales, options);

locales is a string specifying a single language tag or an arraylike object containing multiple language tags. Language tags are strings like en (English generally), de-AT (German as used in Austria), or zh-Hant-TW (Chinese as used in Taiwan, using the traditional Chinese script). Language tags can also include a “Unicode extension”, of the form -u-key1-value1-key2-value2..., where each key is an “extension key”. The various constructors interpret these specially.

options is an object whose properties (or their absence, by evaluating to undefined) determine how the formatter or collator behaves. Its exact interpretation is determined by the individual constructor.

Given locale information and options, the implementation will try to produce the closest behavior it can to the “ideal” behavior. Firefox supports 400+ locales for collation and 600+ locales for date/time and number formatting, so it’s very likely (but not guaranteed) the locales you might care about are supported.

Intl generally provides no guarantee of particular behavior. If the requested locale is unsupported, Intl allows best-effort behavior. Even if the locale is supported, behavior is not rigidly specified. Never assume that a particular set of options corresponds to a particular format. The phrasing of the overall format (encompassing all requested components) might vary across browsers, or even across browser versions. Individual components’ formats are unspecified: a short-format weekday might be “S”, “Sa”, or “Sat”. The Intl API isn’t intended to expose exactly specified behavior.

Date/time formatting

Options

The primary options properties for date/time formatting are as follows:

weekday, era
"narrow", "short", or "long". (era refers to typically longer-than-year divisions in a calendar system: BC/AD, the current Japanese emperor’s reign, or others.)
month
"2-digit", "numeric", "narrow", "short", or "long"
year
day
hour, minute, second
"2-digit" or "numeric"
timeZoneName
"short" or "long"
timeZone
Case-insensitive "UTC" will format with respect to UTC. Values like "CEST" and "America/New_York" don’t have to be supported, and they don’t currently work in Firefox.

The values don’t map to particular formats: remember, the Intl API almost never specifies exact behavior. But the intent is that "narrow", "short", and "long" produce output of corresponding size — “S” or “Sa”, “Sat”, and “Saturday”, for example. (Output may be ambiguous: Saturday and Sunday both could produce “S”.) "2-digit" and "numeric" map to two-digit number strings or full-length numeric strings: “70” and “1970”, for example.

The final used options are largely the requested options. However, if you don’t specifically request any weekday/year/month/day/hour/minute/second, then year/month/day will be added to your provided options.

Beyond these basic options are a few special options:

hour12
Specifies whether hours will be in 12-hour or 24-hour format. The default is typically locale-dependent. (Details such as whether midnight is zero-based or twelve-based and whether leading zeroes are present are also locale-dependent.)

There are also two special properties, localeMatcher (taking either "lookup" or "best fit") and formatMatcher (taking either "basic" or "best fit"), each defaulting to "best fit". These affect how the right locale and format are selected. The use cases for these are somewhat esoteric, so you should probably ignore them.

Locale-centric options

DateTimeFormat also allows formatting using customized calendaring and numbering systems. These details are effectively part of the locale, so they’re specified in the Unicode extension in the language tag.

For example, Thai as spoken in Thailand has the language tag th-TH. Recall that a Unicode extension has the format -u-key1-value1-key2-value2.... The calendaring system key is ca, and the numbering system key is nu. The Thai numbering system has the value thai, and the Chinese calendaring system has the value chinese. Thus to format dates in this overall manner, we tack a Unicode extension containing both these key/value pairs onto the end of the language tag: th-TH-u-ca-chinese-nu-thai.

For more information on the various calendaring and numbering systems, see the full DateTimeFormat documentation.

Examples

After creating a DateTimeFormat object, the next step is to use it to format dates via the handy format() function. Conveniently, this function is a bound function: you don’t have to call it on the DateTimeFormat directly. Then provide it a timestamp or Date object.

Putting it all together, here are some examples of how to create DateTimeFormat options for particular uses, with current behavior in Firefox.

var msPerDay = 24 * 60 * 60 * 1000;

// July 17, 2014 00:00:00 UTC.
var july172014 = new Date(msPerDay * (44 * 365 + 11 + 197));

Let’s format a date for English as used in the United States. Let’s include two-digit month/day/year, plus two-digit hours/minutes, and a short time zone to clarify that time. (The result would obviously be different in another time zone.)

var options =
  { year: "2-digit", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit",
    timeZoneName: "short" };
var americanDateTime =
  new Intl.DateTimeFormat("en-US", options).format;

print(americanDateTime(july172014)); // 07/16/14, 5:00 PM PDT

Or let’s do something similar for Portuguese — ideally as used in Brazil, but in a pinch Portugal works. Let’s go for a little longer format, with full year and spelled-out month, but make it UTC for portability.

var options =
  { year: "numeric", month: "long", day: "numeric",
    hour: "2-digit", minute: "2-digit",
    timeZoneName: "short", timeZone: "UTC" };
var portugueseTime =
  new Intl.DateTimeFormat(["pt-BR", "pt-PT"], options);

// 17 de julho de 2014 00:00 GMT
print(portugueseTime.format(july172014));

How about a compact, UTC-formatted weekly Swiss train schedule? We’ll try the official languages from most to least popular to choose the one that’s most likely to be readable.

var swissLocales = ["de-CH", "fr-CH", "it-CH", "rm-CH"];
var options =
  { weekday: "short",
    hour: "numeric", minute: "numeric",
    timeZone: "UTC", timeZoneName: "short" };
var swissTime =
  new Intl.DateTimeFormat(swissLocales, options).format;

print(swissTime(july172014)); // Do. 00:00 GMT

Or let’s try a date in descriptive text by a painting in a Japanese museum, using the Japanese calendar with year and era:

var jpYearEra =
  new Intl.DateTimeFormat("ja-JP-u-ca-japanese",
                          { year: "numeric", era: "long" });

print(jpYearEra.format(july172014)); // 平成26年

And for something completely different, a longer date for use in Thai as used in Thailand — but using the Thai numbering system and Chinese calendar. (Quality implementations such as Firefox’s would treat plain th-TH as th-TH-u-ca-buddhist-nu-latn, imputing Thailand’s typical Buddhist calendar system and Latin 0-9 numerals.)

var options =
  { year: "numeric", month: "long", day: "numeric" };
var thaiDate =
  new Intl.DateTimeFormat("th-TH-u-nu-thai-ca-chinese", options);

print(thaiDate.format(july172014)); // ๒๐ 6 ๓๑

Calendar and numbering system bits aside, it’s relatively simple. Just pick your components and their lengths.

Number formatting

Options

The primary options properties for number formatting are as follows:

style
"currency", "percent", or "decimal" (the default) to format a value of that kind.
currency
A three-letter currency code, e.g. USD or CHF. Required if style is "currency", otherwise meaningless.
currencyDisplay
"code", "symbol", or "name", defaulting to "symbol". "code" will use the three-letter currency code in the formatted string. "symbol" will use a currency symbol such as $ or £. "name" typically uses some sort of spelled-out version of the currency. (Firefox currently only supports "symbol", but this will be fixed soon.)
minimumIntegerDigits
An integer from 1 to 21 (inclusive), defaulting to 1. The resulting string is front-padded with zeroes until its integer component contains at least this many digits. (For example, if this value were 2, formatting 3 might produce “03”.)
minimumFractionDigits, maximumFractionDigits
Integers from 0 to 20 (inclusive). The resulting string will have at least minimumFractionDigits, and no more than maximumFractionDigits, fractional digits. The default minimum is currency-dependent (usually 2, rarely 0 or 3) if style is "currency", otherwise 0. The default maximum is 0 for percents, 3 for decimals, and currency-dependent for currencies.
minimumSignificantDigits, maximumSignificantDigits
Integers from 1 to 21 (inclusive). If present, these override the integer/fraction digit control above to determine the minimum/maximum significant figures in the formatted number string, as determined in concert with the number of decimal places required to accurately specify the number. (Note that in a multiple of 10 the significant digits may be ambiguous, as in “100” with its one, two, or three significant digits.)
useGrouping
Boolean (defaulting to true) determining whether the formatted string will contain grouping separators (e.g. “,” as English thousands separator).

NumberFormat also recognizes the esoteric, mostly ignorable localeMatcher property.

Locale-centric options

Just as DateTimeFormat supported custom numbering systems in the Unicode extension using the nu key, so too does NumberFormat. For example, the language tag for Chinese as used in China is zh-CN. The value for the Han decimal numbering system is hanidec. To format numbers for these systems, we tack a Unicode extension onto the language tag: zh-CN-u-nu-hanidec.

For complete information on specifying the various numbering systems, see the full NumberFormat documentation.

Examples

NumberFormat objects have a format function property just as DateTimeFormat objects do. And as there, the format function is a bound function that may be used in isolation from the NumberFormat.

Here are some examples of how to create NumberFormat options for particular uses, with Firefox’s behavior. First let’s format some money for use in Chinese as used in China, specifically using Han decimal numbers (instead of much more common Latin numbers). Select the "currency" style, then use the code for Chinese renminbi (yuan), grouping by default, with the usual number of fractional digits.

var hanDecimalRMBInChina =
  new Intl.NumberFormat("zh-CN-u-nu-hanidec",
                        { style: "currency", currency: "CNY" });

print(hanDecimalRMBInChina.format(1314.25)); // ¥ 一,三一四.二五

Or let’s format a United States-style gas price, with its peculiar thousandths-place 9, for use in English as used in the United States.

var gasPrice =
  new Intl.NumberFormat("en-US",
                        { style: "currency", currency: "USD",
                          minimumFractionDigits: 3 });

print(gasPrice.format(5.259)); // $5.259

Or let’s try a percentage in Arabic, meant for use in Egypt. Make sure the percentage has at least two fractional digits. (Note that this and all the other RTL examples may appear with different ordering in RTL context, e.g. ٤٣٫٨٠٪ instead of ٤٣٫٨٠٪.)

var arabicPercent =
  new Intl.NumberFormat("ar-EG",
                        { style: "percent",
                          minimumFractionDigits: 2 }).format;

print(arabicPercent(0.438)); // ٤٣٫٨٠٪

Or suppose we’re formatting for Persian as used in Afghanistan, and we want at least two integer digits and no more than two fractional digits.

var persianDecimal =
  new Intl.NumberFormat("fa-AF",
                        { minimumIntegerDigits: 2,
                          maximumFractionDigits: 2 });

print(persianDecimal.format(3.1416)); // ۰۳٫۱۴

Finally, let’s format an amount of Bahraini dinars, for Arabic as used in Bahrain. Unusually compared to most currencies, Bahraini dinars divide into thousandths (fils), so our number will have three places. (Again note that apparent visual ordering should be taken with a grain of salt.)

var bahrainiDinars =
  new Intl.NumberFormat("ar-BH",
                        { style: "currency", currency: "BHD" });

print(bahrainiDinars.format(3.17)); // د.ب.‏ ٣٫١٧٠

Collation

Options

The primary options properties for collation are as follows:

usage
"sort" or "search" (defaulting to "sort"), specifying the intended use of this Collator. (A search collator might want to consider more strings equivalent than a sort collator would.)
sensitivity
"base", "accent", "case", or "variant". This affects how sensitive the collator is to characters that have the same “base letter” but have different accents/diacritics and/or case. (Base letters are locale-dependent: “a” and “ä” have the same base letter in German but are different letters in Swedish.) "base" sensitivity considers only the base letter, ignoring modifications (so for German “a”, “A”, and “ä” are considered the same). "accent" considers the base letter and accents but ignores case (so for German “a” and “A” are the same, but “ä” differs from both). "case" considers the base letter and case but ignores accents (so for German “a” and “ä” are the same, but “A” differs from both). Finally, "variant" considers base letter, accents, and case (so for German “a”, “ä, “ä” and “A” all differ). If usage is "sort", the default is "variant"; otherwise it’s locale-dependent.
numeric
Boolean (defaulting to false) determining whether complete numbers embedded in strings are considered when sorting. For example, numeric sorting might produce "F-4 Phantom II", "F-14 Tomcat", "F-35 Lightning II"; non-numeric sorting might produce "F-14 Tomcat", "F-35 Lightning II", "F-4 Phantom II".
caseFirst
"upper", "lower", or "false" (the default). Determines how case is considered when sorting: "upper" places uppercase letters first ("B", "a", "c"), "lower" places lowercase first ("a", "c", "B"), and "false" ignores case entirely ("a", "B", "c"). (Note: Firefox currently ignores this property.)
ignorePunctuation
Boolean (defaulting to false) determining whether to ignore embedded punctuation when performing the comparison (for example, so that "biweekly" and "bi-weekly" compare equivalent).

And there’s that localeMatcher property that you can probably ignore.

Locale-centric options

The main Collator option specified as part of the locale’s Unicode extension is co, selecting the kind of sorting to perform: phone book (phonebk), dictionary (dict), and many others.

Additionally, the keys kn and kf may, optionally, duplicate the numeric and caseFirst properties of the options object. But they’re not guaranteed to be supported in the language tag, and options is much clearer than language tag components. So it’s best to only adjust these options through options.

These key-value pairs are included in the Unicode extension the same way they’ve been included for DateTimeFormat and NumberFormat; refer to those sections for how to specify these in a language tag.

Examples

Collator objects have a compare function property. This function accepts two arguments x and y and returns a number less than zero if x compares less than y, 0 if x compares equal to y, or a number greater than zero if x compares greater than y. As with the format functions, compare is a bound function that may be extracted for standalone use.

Let’s try sorting a few German surnames, for use in German as used in Germany. There are actually two different sort orders in German, phonebook and dictionary. Phonebook sort emphasizes sound, and it’s as if “ä”, “ö”, and so on were expanded to “ae”, “oe”, and so on prior to sorting.

var names =
  ["Hochberg", "Hönigswald", "Holzman"];

var germanPhonebook = new Intl.Collator("de-DE-u-co-phonebk");

// as if sorting ["Hochberg", "Hoenigswald", "Holzman"]:
//   Hochberg, Hönigswald, Holzman
print(names.sort(germanPhonebook.compare).join(", "));

Some German words conjugate with extra umlauts, so in dictionaries it’s sensible to order ignoring umlauts (except when ordering words differing only by umlauts: schon before schön).

var germanDictionary = new Intl.Collator("de-DE-u-co-dict");

// as if sorting ["Hochberg", "Honigswald", "Holzman"]:
//   Hochberg, Holzman, Hönigswald
print(names.sort(germanDictionary.compare).join(", "));

Or let’s sort a list Firefox versions with various typos (different capitalizations, random accents and diacritical marks, extra hyphenation), in English as used in the United States. We want to sort respecting version number, so do a numeric sort so that numbers in the strings are compared, not considered character-by-character.

var firefoxen =
  ["FireFøx 3.6",
   "Fire-fox 1.0",
   "Firefox 29",
   "FÍrefox 3.5",
   "Fírefox 18"];

var usVersion =
  new Intl.Collator("en-US",
                    { sensitivity: "base",
                      numeric: true,
                      ignorePunctuation: true });

// Fire-fox 1.0, FÍrefox 3.5, FireFøx 3.6, Fírefox 18, Firefox 29
print(firefoxen.sort(usVersion.compare).join(", "));

Last, let’s do some locale-aware string searching that ignores case and accents, again in English as used in the United States.

// Comparisons work with both composed and decomposed forms.
var decoratedBrowsers =
  [
   "A\u0362maya",  // A͢maya
   "CH\u035Brôme", // CH͛rôme
   "FirefÓx",
   "sAfàri",
   "o\u0323pERA",  // ọpERA
   "I\u0352E",     // I͒E
  ];

var fuzzySearch =
  new Intl.Collator("en-US",
                    { usage: "search", sensitivity: "base" });

function findBrowser(browser)
{
  function cmp(other)
  {
    return fuzzySearch.compare(browser, other) === 0;
  }
  return cmp;
}

print(decoratedBrowsers.findIndex(findBrowser("Firêfox"))); // 2
print(decoratedBrowsers.findIndex(findBrowser("Safåri")));  // 3
print(decoratedBrowsers.findIndex(findBrowser("Ãmaya")));   // 0
print(decoratedBrowsers.findIndex(findBrowser("Øpera")));   // 4
print(decoratedBrowsers.findIndex(findBrowser("Chromè")));  // 1
print(decoratedBrowsers.findIndex(findBrowser("IË")));      // 5

Odds and ends

It may be useful to determine whether support for some operation is provided for particular locales, or to determine whether a locale is supported. Intl provides supportedLocales() functions on each constructor, and resolvedOptions() functions on each prototype, to expose this information.

var navajoLocales =
  Intl.Collator.supportedLocalesOf(["nv"], { usage: "sort" });
print(navajoLocales.length > 0
      ? "Navajo collation supported"
      : "Navajo collation not supported");

var germanFakeRegion =
  new Intl.DateTimeFormat("de-XX", { timeZone: "UTC" });
var usedOptions = germanFakeRegion.resolvedOptions();
print(usedOptions.locale);   // de
print(usedOptions.timeZone); // UTC

Legacy behavior

The ES5 toLocaleString-style and localeCompare functions previously had no particular semantics, accepted no particular options, and were largely useless. So the i18n API reformulates them in terms of Intl operations. Each method now accepts additional trailing locales and options arguments, interpreted just as the Intl constructors would do. (Except that for toLocaleTimeString and toLocaleDateString, different default components are used if options aren’t provided.)

For brief use where precise behavior doesn’t matter, the old methods are fine to use. But if you need more control or are formatting or comparing many times, it’s best to use the Intl primitives directly.

Conclusion

Internationalization is a fascinating topic whose complexity is bounded only by the varied nature of human communication. The Internationalization API addresses a small but quite useful portion of that complexity, making it easier to produce locale-sensitive web applications. Go use it!

(And a special thanks to Norbert Lindenberg, Anas El Husseini, Simon Montagu, Gary Kwong, Shu-yu Guo, Ehsan Akhgari, the people of #mozilla.de, and anyone I may have forgotten [sorry!] who provided feedback on this article or assisted me in producing and critiquing the examples. The English and German examples were the limit of my knowledge, and I’d have been completely lost on the other examples without their assistance. Blame all remaining errors on me. Thanks again!)

(and to reiterate: comment on the Hacks post if you have anything to say)

12.08.13

Micro-feature from ES6, now in Firefox Aurora and Nightly: binary and octal numbers

A couple years ago when SpiderMonkey’s implementation of strict mode was completed, I observed that strict mode forbids octal number syntax. There was some evidence that novice programmers used leading zeroes as alignment devices, leading to unexpected results:

var sum = 015 + // === 13, not 15!
          197;
// sum === 210, not 212

But some users (Mozilla extensions and server-side node.js packages in particular) still want octal syntax, usually for file permissions. ES6 thus adds new octal syntax that won’t trip up novices. Hexadecimal numbers are formed with the prefix 0x or 0X followed by hexadecimal digits. Octal numbers are similarly formed using 0o or 0O followed by octal digits:

var DEFAULT_PERMS = 0o644; // kosher anywhere, including strict mode code

(Yes, it was intentional to allow the 0O prefix [zero followed by a capital O] despite its total unreadability. Consistency trumped readability in TC39, as I learned when questioning the wisdom of 0O as prefix. I think that decision is debatable, and the alternative is certainly not “nanny language design”. But I don’t much care as long as I never see it. 🙂 I recommend never using the capital version and applying a cluestick to anyone who does.)

Some developers also need binary syntax, which ECMAScript has never provided. ES6 thus adds analogous binary syntax using the letter b (lowercase or uppercase):

var FLT_SIGNBIT  = 0b10000000000000000000000000000000;
var FLT_EXPONENT = 0b01111111100000000000000000000000;
var FLT_MANTISSA = 0b00000000011111111111111111111111;

Try out both new syntaxes in Firefox Aurora or, if you’re feeling adventurous, in a Firefox nightly. Use the profile manager if you don’t want your regular Firefox browsing history touched.

If you’ve ever needed octal or binary numbers, hopefully these additions will brighten your day a little. 🙂

05.08.13

New in Firefox 23: the length property of an array can be made non-writable (but you shouldn’t do it)

Properties and their attributes

Properties of JavaScript objects include attributes for enumerability (whether the property shows up in a for-in loop on the object) and configurability (whether the property can be deleted or changed in certain ways). Getter/setter properties also include get and set attributes storing those functions, and value properties include attributes for writability and value.

Array properties’ attributes

Arrays are objects; array properties are structurally identical to properties of all other objects. But arrays have long-standing, unusual behavior concerning their elements and their lengths. These oddities cause array properties to look like other properties but behave quite differently.

The length property

The length property of an array looks like a data property but when set acts like an accessor property.

var arr = [0, 1, 2, 3];
var desc = Object.getOwnPropertyDescriptor(arr, "length");
print(desc.value); // 4
print(desc.writable); // true
print("get" in desc); // false
print("set" in desc); // false

print("0" in arr); // true
arr.length = 0;
print("0" in arr); // false (!)

In ES5 terms, the length property is a data property. But arrays have a special [[DefineOwnProperty]] hook, invoked whenever a property is added, set, or modified, that imposes special behavior on array length changes.

The element properties of arrays

Arrays’ [[DefineOwnProperty]] also imposes special behavior on array elements. Array elements also look like data properties, but if you add an element beyond the length, it’s as if a setter were called — the length grows to accommodate the element.

var arr = [0, 1, 2, 3];
var desc = Object.getOwnPropertyDescriptor(arr, "0");
print(desc.value); // 0
print(desc.writable); // true
print("get" in desc); // false
print("set" in desc); // false

print(arr.length); // 4
arr[7] = 0;
print(arr.length); // 8 (!)

Arrays are unlike any other objects, and so JS array implementations are highly customized. These customizations allow the length and elements to act as specified when modified. They also make array element access about as fast as array element accesses in languages like C++.

Object.defineProperty implementations and arrays

Customized array representations complicate Object.defineProperty. Defining array elements isn’t a problem, as increasing the length for added elements is long-standing behavior. But defining array length is problematic: if the length can be made non-writable, every place that modifies array elements must respect that.

Most engines’ initial Object.defineProperty implementations didn’t correctly support redefining array lengths. Providing a usable implementation for non-array objects was top priority; array support was secondary. SpiderMonkey’s initial Object.defineProperty implementation threw a TypeError when redefining length, stating this was “not currently supported”. Fully-correct behavior required changes to our object representation.

Earlier this year, Brian Hackett’s work in bug 827490 changed our object representation enough to implement length redefinition. I fixed bug 858381 in April to make Object.defineProperty work for array lengths. Those changes will be in Firefox 23 tomorrow.

Should you make array lengths non-writable?

You can change an array’s length without redefining it, so the only new capability is making an array length non-writable. Compatibility aside, should you make array lengths non-writable? I don’t think so.

Non-writable array length forbids certain operations:

  • You can’t change the length.
  • You can’t add an element past that length.
  • You can’t call methods that increase (e.g. Array.prototype.push) or decrease (e.g. Array.prototype.pop) the length. (These methods do sometimes modify the array, in well-specified ways that don’t change the length, before throwing.)

But these are purely restrictions. Any operation that succeeds on an array with non-writable length, succeeds on the same array with writable length. You wouldn’t do any of these things anyway, to an array whose length you’re treating as fixed. So why mark it non-writable at all? There’s no functionality-based reason for good code to have non-writable array lengths.

Fixed-length arrays’ only value is in maybe permitting optimizations dependent on immutable length. Making length immutable permits minimizing array elements’ memory use. (Arrays usually over-allocate memory to avoid O(n2) behavior when repeatedly extending the array.) But if it saves memory (this is highly allocator-sensitive), it won’t save much. Fixed-length arrays may permit bounds-check elimination in very circumscribed situations. But these are micro-optimizations you’d be hard-pressed to notice in practice.

In conclusion: I don’t think you should use non-writable array lengths. They’re required by ES5, so we’ll support them. But there’s no good reason to use them.

25.01.12

SpiderMonkey no longer supports sharp variables

Tags: , , , , — Jeff @ 10:17

ECMAScript object literals

ECMAScript, the standard underlying the JavaScript language, can represent simple objects and arrays using literals.

var arr = [1, 2, 3];
var obj = { property: "ohai" };

But it can’t represent all objects and arrays.

Cyclic and non-tree objects

ECMAScript literals can’t represent circular objects and arrays which (perhaps at some nesting distance) have properties referring to themselves (in other words, the objects form cyclic graphs). Nor can they (faithfully) represent objects which contain some other object multiple times (in other words, the objects form a directed acyclic graph which is not also a tree).

var arr = [1, "overwritten", 3];
arr[1] = arr; // cyclic
var obj = { property: "ohai", nest: {} };
obj.nest.parent = obj; // cyclic
obj.secondCopy = obj.nest; // non-tree, obj.nest is repeated

Sharp variables

SpiderMonkey historically supported extension syntax to represent such graphs under the name sharp variables. Sharp variables were inspired by Common Lisp syntax, and they enabled naming an object or array literal before it had been fully evaluated (even, to a limited extent, interacting with it). Netscape proposed sharp variables for inclusion in ES3, but the proposal was rejected as being too domain-specific and being arguably ugly. Since then the extension has lingered in SpiderMonkey but has seen very little use.

// Identical semantics using sharp variables
var arr = #1=[1, #1#, 3]; // #n= names an object being created, #n# refers to it
var obj = #1={ property: "ohai", nest: #2={ parent: #1# }, secondCopy: #2# };

No other ECMAScript implementer has since shown interest in implementing sharp variables. And with renewed efforts to evolve ECMAScript syntax, special characters like # are increasingly precious. Thus we’ve decided it’s time to remove sharp variable support from SpiderMonkey.

One benefit to removing sharp variables is that we can remove a good chunk of rarely-used code (and attack surface: sharp variables have been a source of some number of likely vulnerabilities) from SpiderMonkey. A syntax-removal patch added 79 lines and removed 1112 lines, including tests; not including tests, it added 42 lines and removed 677 lines. A subsequent patch to remove generation of sharp variable syntax from object decompilation added 65 lines and removed 128 lines. Removing sharp variables will also permit some simplifications now that evaluating a literal can’t have side effects beyond those in any nested property initializers.

Alternatives to sharp variables

Sharp variables may have been sometimes convenient, but they were mere syntactic sugar. It should be simple to convert any use of sharp variables to an equivalent sequence of property additions. If you were sufficiently aware of sharp variables to use them to represent non-tree objects, I trust I don’t have to explain how to do this.

Somewhat more interesting are the cases where decompiling an object produced sharp variable syntax, as when decompiling a cyclic object during debugging. (It’s worth noting in passing that decompilation will not infinitely recur: instead, it’ll bottom out with an empty object or an omitted property.) Jason Orendorff has written a sharps mini-library implementing decompilation of cyclic and non-tree objects which may be useful for this task.

When to expect this change

The sharp variable documentation on MDN has long noted that sharp variables were deprecated and would likely to be removed; a few months ago that warning was upgraded to a firm statement that they would be removed. The sharp variable removal patch that landed yesterday completes the process. The removal will first appear in either today or tomorrow’s nightly; in a week’s time it’ll make its way into the aurora branch, then the beta branch, and finally into Firefox 12. Versions of Firefox prior to 12 will not be affected by this removal, including the extended-support release Firefox 10.

Older »