I feel the need…the need for JSON parsing correctness and speed!

JSON and SpiderMonkey

JSON is a handy serialization format for passing data between servers and browsers and between independent, cooperating web pages. It’s increasingly the format of choice for website APIs.

ECMAScript 5 (the standard underlying JavaScript) includes built-in support for producing and parsing JSON. SpiderMonkey has included such support since before ES5 added it.

SpiderMonkey’s support, because it predated ES5, hasn’t always agreed with ES5. Also, because JSON support was added before it became ubiquitous on the web, it wasn’t written with raw speed in mind.

Improving JSON.parse

We’ve now improved JSON parsing in Firefox 5 to be fast and fully conformant with ES5. For awhile we’ve made improvements to JSON by piecemeal change. This worked for small bug fixes, and it probably would have worked to fix the remaining conformance bugs. But performance is different: to improve performance we needed to parse in a fundamentally different way. It was time for a rewrite.

What parsing bugs got fixed?

The bugs the new parser fixes are quite small and generally shouldn’t affect sites, in part because other browsers overwhelmingly don’t have these bugs. We’ve had no compatibility reports for these fixes in the month and a half they’ve been in the tree:

  • The number syntax is properly stricter:
    • Octal numbers are now syntax errors.
    • Numbers containing a decimal point must now include a fractional component (i.e. 1. is no longer accepted).
  • JSON.parse("this") now throws a SyntaxError rather than evaluate to true, due to a mistake reusing our keyword parser. (Hysterically, because we used our JSON parser to optimize eval in certain cases, this change means that eval("(this)") will no longer evaluate to true.)
  • Strings can’t contain tab characters: JSON.parse('"\t"') now properly throws a SyntaxError.

This list of changes should be complete, but it’s possible I’ve missed others. Parsing might be a solved problem in the compiler literature, but it’s still pretty complicated. I could have missed lurking bugs in the old parser, and it’s possible (although I think less likely) that I’ve introduced bugs in the new parser.

What about speed?

The new parser is much faster than the old one. Exactly how fast depends on the data you’re parsing. For example, on Opera’s simple parse test, I get around 156000 times/second in Firefox 4, but in Firefox 5 with the new JSON parser I get around 339000 times/second (bigger is better). On a second testcase, Kraken’s JSON.parse test (json-parse-financial, to be precise), I get a 4.0 time of around 140ms and a 5.0 time of around 100ms (smaller is better). (In both cases I’m comparing builds containing far more JavaScript changes than just the new parser, to be sure. But I’m pretty sure the bulk of the performance improvements in these two cases are due to the new parser.) The new JSON parser puts us solidly in the center of the browser pack.

It’ll only get better in the future as we wring even more speed out of SpiderMonkey. After all, on the same system used to generate the above numbers, IE gets around 510000 times/second. I expect further speedup will happen during more generalized performance improvements: improving the speed of defining new properties, improving the speed with which objects are allocated, improving the speed of creating a property name from a string, and so on. As we perform such streamlining, we’ll parse JSON even faster.

Side benefit: better error messages

The parser rewrite also gives JSON.parse better error messages. With the old parser it would have been difficult to provide useful feedback, but in the new parser it’s easy to briefly describe the reason for syntax errors.

js> JSON.parse('{ foo: 17 }'); // unquoted property name
(old) typein:1: SyntaxError: JSON.parse
(new) typein:1: SyntaxError: JSON.parse: expected property name or '}'

We can definitely do more here, perhaps by including context for the error from the provided string, but this is nevertheless a marked improvement over the old parser’s error messages.

Bottom line

JSON.parse in Firefox 5 is faster, follows the spec, and tells you what went wrong if you give it bad data. ’nuff said.


SpiderMonkey JSON change: trailing commas no longer accepted

Historically, SpiderMonkey’s JSON implementation has accepted input containing trailing commas:

JSON.parse('[1, 2, 3, ]');
JSON.parse('{ "1": 2, }");

We did so because the JSON RFC permitted implementations to accept extensions, and trailing commas are nice for humans to be able to use. The down side is that accepting extra syntax like this makes interoperability harder: implementations which don’t implement the same extension, for reasons every bit as valid as those of implementations allowing the extension, are disadvantaged. ES5 weighed these concerns and chose to precisely specify permissible JSON syntax, putting everyone on the same page: trailing commas are not permitted. Therefore, the examples above should throw a SyntaxError per ES5.

SpiderMonkey has now been changed to conform to ES5 on this point: trailing commas are syntax errors. If you still need to accept trailing commas, you should use a custom implementation that accepts them — but best would be for you to adjust the processes that produce JSON strings including trailing commas to not include them. (If you are an extension or are in privileged code, for the moment you can use nsIJSON.legacyDecode to continue to accept trailing commas. However, note that we have added it only to accommodate legacy code in the process of being updated to no longer generate faulty input, and it will be removed sometime in the future.)

You can experiment with a version of Firefox with this change by downloading a TraceMonkey branch nightly; this change should also make its way into mozilla-central nightlies shortly, if you’d rather stick to trunk builds. (Don’t forget to use the profile manager if you want to keep the settings you use with your primary Firefox installation pristine.)


Yet another post which planet.mozilla.org and web-tech readers may ignore

Tags: , , , — Jeff @ 17:16

I wrote another web-tech blog post:

Object and array initializers should not invoke setters when evaluated

This post is duplicative if you follow other Mozilla news sources, and if you don’t and aren’t a web developer you probably won’t have the background to understand much of the content at the link, but it’s still quite representative of some of what I’ve worked on recently if you happen to be interested and hadn’t otherwise encountered the article.