Internationalization in SpiderMonkey
Jeff Walden
Mozilla Corporation
Problem
- formatting numbers as strings
- common on financial sites, which want numbers displayed according to a specific currency's conventions
- formatting times and dates as strings
- e.g. social sites that display timestamps
- sorting lists of data
- Bugzilla search results, etc.
The current solutions
- Number.prototype.toLocaleString, Date.prototype.toLocaleString, Date.prototype.toLocaleDateString, Date.prototype.toLocaleTimeString
- ES5 semantics for each: *handwave* (implementation-dependent, following the host environment's current locale)
- Our semantics: embedding-dependent, OS/system-specific -- uncontrollable, unreliable
- Array.prototype.sort
- takes a comparison function -- a very poor match with the this-centric to*String methods
- no input as to what sort (heh) of sort is desired: phone book order, numeric order, etc.
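The mismatch can be sketched in plain JavaScript, without Intl: by default `Array.prototype.sort` compares UTF-16 code units, and its only hook is a comparison function that must encode the entire ordering policy itself. The array names and `caseInsensitiveCompare` are illustrative, not taken from any library.

```javascript
// Default sort: elements are converted to strings and compared by UTF-16
// code units, so uppercase sorts before lowercase and digits compare
// character by character rather than numerically.
const names = ["banana", "Zebra", "apple"];
console.log([...names].sort()); // [ 'Zebra', 'apple', 'banana' ]

const items = ["item10", "item2", "item1"];
console.log([...items].sort()); // [ 'item1', 'item10', 'item2' ]

// The only customization point is a comparison function. This hand-written
// one folds case and nothing else: it knows nothing about accents, numeric
// runs, or any locale's conventions.
function caseInsensitiveCompare(a, b) {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  return x < y ? -1 : x > y ? 1 : 0;
}
console.log([...names].sort(caseInsensitiveCompare)); // [ 'apple', 'banana', 'Zebra' ]
```

Every further policy (phone book order, numeric order) would have to be written by hand in the same way.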
ECMA-402
- an Intl global property
- various constructors/methods to implement i18n subclassing
- Intl.NumberFormat
- Intl.DateTimeFormat
- Intl.Collator
- Just a start: more to come in future editions
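A minimal sketch of the three ECMA-402 constructors, assuming an engine with full ICU locale data (for example a recent Node.js build); the variable names are illustrative. `Intl.Collator.prototype.compare` is a getter that returns a bound function, so it can be passed straight to `Array.prototype.sort`.

```javascript
// Each constructor takes a locale (or list of locales) plus an options
// object; the resulting formatter/collator can be reused for many calls.
const numberFormat = new Intl.NumberFormat("en-US");
console.log(numberFormat.format(1234567.891)); // "1,234,567.891"

// With no component options, year/month/day are shown numerically.
const dateFormat = new Intl.DateTimeFormat("en-US", { timeZone: "UTC" });
console.log(dateFormat.format(new Date(Date.UTC(2013, 0, 15)))); // "1/15/2013"

// Locale-aware ordering: case is a minor difference, not a primary one.
const collator = new Intl.Collator("en");
console.log(["banana", "Zebra", "apple"].sort(collator.compare));
// [ 'apple', 'banana', 'Zebra' ]

// Numeric collation: runs of digits compare by numeric value.
const numericCollator = new Intl.Collator("en", { numeric: true });
console.log(["item10", "item2", "item1"].sort(numericCollator.compare));
// [ 'item1', 'item2', 'item10' ]
```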
Intl constructors allow behavior customization
- specifying the locale with respect to which behavior is performed
- configuring formatting along basic axes (e.g. time zone, calendar system for dates/times)
- for dates/times, selecting the components to be included in the formatted string
- hour, minute, second, month, day, year, etc.
- turning on/off various formatting choices
- number grouping (1,000 versus 1000)
- leading zero in hour numbers (4:00 versus 04:00)
- more
- ...and much more
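The customization axes above map onto constructor options. This sketch shows `useGrouping` for number grouping and the `hour` option for the leading zero; `hourField` is a hypothetical helper, and it uses `formatToParts` (added in a later ECMA-402 edition than the one discussed here) only so the check doesn't depend on ICU-version-specific spacing around "AM".

```javascript
// Number grouping on/off.
const grouped = new Intl.NumberFormat("en-US");
const ungrouped = new Intl.NumberFormat("en-US", { useGrouping: false });
console.log(grouped.format(1000));   // "1,000"
console.log(ungrouped.format(1000)); // "1000"

// Selecting date/time components, and whether the hour gets a leading zero.
const instant = new Date(Date.UTC(2013, 0, 15, 4, 0)); // 04:00 UTC

// Hypothetical helper: format `instant` in UTC with the given options and
// return only the text of the hour field.
function hourField(options) {
  const fmt = new Intl.DateTimeFormat("en-US", { timeZone: "UTC", ...options });
  return fmt.formatToParts(instant).find((part) => part.type === "hour").value;
}
console.log(hourField({ hour: "numeric", minute: "2-digit" })); // "4"
console.log(hourField({ hour: "2-digit", minute: "2-digit" })); // "04"
```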
"You know nothing"
- not an Intl expert
- not an Intl user
- writes JS, but not i18n-aware JS; doesn't particularly know what i18n users need or want
- see Norbert Lindenberg's articles on his site for discussion of the API
Concepts
- language tags
- currency codes
- time zones
Language tags
- Examples: fr, en-US, nan-Hant-TW, etc.
- Defined by BCP 47, a living collection of RFCs (currently RFC 5646 and RFC 4647)
- Basic syntax: alphanumeric subtags separated by hyphens
- Breaks down into subtags: language, script, region, variants, extensions, private use
- Strings are really the wrong representation for this
- Structured objects would be better
- But but but lowest common denominator :-(
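To show what those subtags look like, and why a structured object is nicer to work with than a string, here is a hypothetical, deliberately simplified parser for tags of the form language[-script][-region]. It is not SpiderMonkey's parser and does not cover full BCP 47 syntax.

```javascript
// Simplified subset only: real BCP 47 also allows extended language subtags,
// variants, extensions (-u-...), private use (-x-...), and grandfathered tags.
const SIMPLE_TAG = /^([a-z]{2,3})(?:-([a-z]{4}))?(?:-([a-z]{2}|[0-9]{3}))?$/i;

function parseSimpleTag(tag) {
  const match = SIMPLE_TAG.exec(tag);
  if (match === null) {
    return null; // outside the simplified subset handled here
  }
  const [, language, script, region] = match;
  return {
    // Conventional casing: lowercase language, title-case script,
    // uppercase region (regions may also be 3-digit UN M.49 codes).
    language: language.toLowerCase(),
    script: script ? script[0].toUpperCase() + script.slice(1).toLowerCase() : undefined,
    region: region ? region.toUpperCase() : undefined,
  };
}

console.log(parseSimpleTag("nan-Hant-TW"));
// { language: 'nan', script: 'Hant', region: 'TW' }
console.log(parseSimpleTag("en-US"));
// { language: 'en', script: undefined, region: 'US' }
console.log(parseSimpleTag("en-US-u-nu-arab")); // null
```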
In SpiderMonkey
- fundamental to most Intl interfaces
- aside from syntax-checking, we mostly don't inspect them
- exceptions:
- old-style (e.g. grandfathered) tags imply extra data; we map them to newer forms that make it explicit (e.g. zh-min-nan to nan)
- language fallback: en-US to en
- Intl APIs provide better controls (constructor options) for what some language tag components control, so some components must be actively removed
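Language fallback (en-US to en) follows the lookup step ECMA-402 calls BestAvailableLocale. This JavaScript sketch mirrors that step's logic; `bestAvailableLocale` and the `available` set are illustrative, and in the spec the caller first removes Unicode extension sequences and checks the engine's real locale data.

```javascript
// Repeatedly drop the last subtag until the candidate is available.
function bestAvailableLocale(availableLocales, locale) {
  let candidate = locale;
  while (true) {
    if (availableLocales.has(candidate)) {
      return candidate;
    }
    let pos = candidate.lastIndexOf("-");
    if (pos === -1) {
      return undefined; // nothing left to drop
    }
    // If the subtag that would end up last is a one-character singleton
    // (like the "x" in "en-x-foo"), drop it too: a bare singleton is meaningless.
    if (pos >= 2 && candidate[pos - 2] === "-") {
      pos -= 2;
    }
    candidate = candidate.substring(0, pos);
  }
}

const available = new Set(["en", "fr", "zh-Hant"]);
console.log(bestAvailableLocale(available, "en-US"));      // "en"
console.log(bestAvailableLocale(available, "zh-Hant-TW")); // "zh-Hant"
console.log(bestAvailableLocale(available, "de-DE"));      // undefined
```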
Currency codes
- Examples: USD, EUR, AUD
- ISO 4217 maintains the full list and publishes updates to it
- Currencies matter because they affect default formatting: $100.00 versus ¥100 (note the decimal places)
- They also show up in formatting through the desired "length" of the currency name in the formatted string: symbol ($), code (USD), or name (US dollars)
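Both currency effects are visible through `Intl.NumberFormat`. A minimal sketch, assuming full ICU data; the loop's output isn't shown because the spacing between code and amount differs across ICU versions.

```javascript
const usd = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });
const jpy = new Intl.NumberFormat("en-US", { style: "currency", currency: "JPY" });
console.log(usd.format(100)); // "$100.00"
console.log(jpy.format(100)); // "¥100"

// The default number of fraction digits comes from the currency, not the locale.
console.log(usd.resolvedOptions().maximumFractionDigits); // 2
console.log(jpy.resolvedOptions().maximumFractionDigits); // 0

// currencyDisplay selects the "length" of the currency name.
for (const currencyDisplay of ["symbol", "code", "name"]) {
  const fmt = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD", currencyDisplay });
  console.log(currencyDisplay, fmt.format(100));
}
```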
In SpiderMonkey
- Currencies matter for determining default number of decimal places
- We maintain an imported list of all currencies that don't use 2 decimal places, and consult it when computing default fraction digits
- But mostly we hand off the currency to i18n libs and don't think about it
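Conceptually, such a table amounts to a lookup that defaults to 2, which is what ECMA-402's CurrencyDigits operation specifies. The table below is a hypothetical excerpt with only a few real ISO 4217 entries; `CURRENCY_DIGITS` and `currencyDigits` are illustrative names, not SpiderMonkey's.

```javascript
// Hypothetical excerpt: currencies whose ISO 4217 minor unit is not 2.
// The real list is longer and must track ISO 4217 updates.
const CURRENCY_DIGITS = {
  BHD: 3, // Bahraini dinar
  JPY: 0, // Japanese yen
  KRW: 0, // South Korean won
  KWD: 3, // Kuwaiti dinar
  OMR: 3, // Omani rial
};

// Listed value if present, otherwise 2. Codes are upper-cased first,
// since currency codes are matched case-insensitively.
function currencyDigits(currency) {
  const code = currency.toUpperCase();
  return Object.prototype.hasOwnProperty.call(CURRENCY_DIGITS, code)
    ? CURRENCY_DIGITS[code]
    : 2;
}

console.log(currencyDigits("JPY")); // 0
console.log(currencyDigits("bhd")); // 3
console.log(currencyDigits("USD")); // 2
```

Storing only the exceptions keeps the table small, since most currencies use 2 decimal places.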
Time zones
- Maintained by IANA in the tz database (names like America/New_York)
- We'll have to update our data as IANA updates theirs
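IANA zone names are what the `timeZone` option of `Intl.DateTimeFormat` accepts in engines that support them (ECMA-402's first edition only requires "UTC"). A sketch formatting one instant in three zones; `hourIn` is a hypothetical helper, it uses the later `formatToParts` addition to extract just the hour, and the expected hours assume current tz data.

```javascript
const noonUtc = new Date(Date.UTC(2013, 0, 15, 12, 0)); // noon UTC, in January

// Hypothetical helper: the 24-hour clock hour of `noonUtc` in the given zone.
function hourIn(timeZone) {
  const fmt = new Intl.DateTimeFormat("en-US", { timeZone, hour: "numeric", hour12: false });
  return fmt.formatToParts(noonUtc).find((part) => part.type === "hour").value;
}

for (const zone of ["UTC", "America/New_York", "Asia/Tokyo"]) {
  console.log(zone, hourIn(zone));
}
// Hours 12, 7 (EST, UTC-5), and 21 (JST, UTC+9); the hour text may be zero-padded.
```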
SpiderMonkey's Intl implementation uses ICU
- International Components for Unicode: ~20-year-old collaboration on i18n libraries, used by nearly everyone
- except Microsoft, for compatibility reasons
- ICU uses CLDR data (the Unicode Common Locale Data Repository, used by everyone including Microsoft)
- ICU has a reputation for being huge; we cut down on what it builds, but there aren't many other options
- Microsoft's i18n APIs are only available on Windows
- Android system ICU isn't exposed to apps
- we'll probably always ship an embedded, slimmed-down ICU, but Linux distributions (for example) will probably contribute patches to build against a system ICU
About ICU
- Large and sprawling, but fairly well documented: a user guide, plus javadoc-style comments on APIs that feed into generated online docs
- C/C++ APIs: C APIs are stable when marked as such (which is often); C++ APIs are uniformly unstable
- ICU functions tend to return a reasonable value and report success/failure through a UErrorCode outparam
Intl in SpiderMonkey
- enabled in shell builds by default
- not yet working when cross-compiling (the ICU build compiles C/C++ utilities that process data tables into a usable format, and runs them during the build); being worked on now
- a build change is still needed to enable it fully in Firefox
- some minor changes are still needed to make --enable-intl-api/--disable-intl-api work as desired
Code walkthrough
- Intl.cpp, Intl.h
- GlobalObject.cpp
- Intl.js
- IntlData.js, make_intl_data.py
- SelfHosting.cpp
- Tests (test402, Intl) and updating them
Fini