06.06.11

I feel the need…the need for JSON parsing correctness and speed!

JSON and SpiderMonkey

JSON is a handy serialization format for passing data between servers and browsers and between independent, cooperating web pages. It’s increasingly the format of choice for website APIs.

ECMAScript 5 (the standard underlying JavaScript) includes built-in support for producing and parsing JSON. SpiderMonkey has included such support since before ES5 added it.

SpiderMonkey’s support, because it predated ES5, hasn’t always agreed with ES5. Also, because JSON support was added before JSON became ubiquitous on the web, it wasn’t written with raw speed in mind.

Improving JSON.parse

We’ve now improved JSON parsing in Firefox 5 to be fast and fully conformant with ES5. For a while we’ve made improvements to JSON through piecemeal changes. This worked for small bug fixes, and it probably would have worked to fix the remaining conformance bugs. But performance is different: to improve performance we needed to parse in a fundamentally different way. It was time for a rewrite.

What parsing bugs got fixed?

The bugs the new parser fixes are quite small and generally shouldn’t affect sites, in part because other browsers overwhelmingly don’t have these bugs. We’ve had no compatibility reports for these fixes in the month and a half they’ve been in the tree:

  • The number syntax is properly stricter:
    • Octal numbers are now syntax errors.
    • Numbers containing a decimal point must now include a fractional component (i.e. 1. is no longer accepted).
  • JSON.parse("this") now throws a SyntaxError rather than evaluating to true; the old behavior came from a mistake in reusing our keyword parser. (Hysterically, because we used our JSON parser to optimize eval in certain cases, this change means that eval("(this)") will no longer evaluate to true.)
  • Strings can’t contain tab characters: JSON.parse('"\t"') now properly throws a SyntaxError.
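
The cases listed above can be checked in any engine that follows ES5. Here is a minimal sketch; the helper name rejects and the sample inputs are made up for illustration, and the exact error text will differ between engines.

```javascript
// Hypothetical helper: true if JSON.parse rejects the text with a SyntaxError.
function rejects(text) {
  try {
    JSON.parse(text);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// Sample inputs, one per rule described above.
var cases = {
  octal: '017',          // leading zero followed by more digits
  trailingPoint: '1.',   // decimal point with no fractional digits
  thisKeyword: 'this',   // not a JSON literal
  rawTab: '"\t"'         // JS turns \t into a real tab before JSON.parse sees it
};

var results = {};
for (var name in cases) {
  results[name] = rejects(cases[name]);
}

// An escaped tab inside the JSON text is still valid JSON.
var escapedTab = JSON.parse('"\\t"');
```

Note the difference between the last two inputs: '"\t"' hands the parser a raw tab character, which is rejected, while '"\\t"' hands it the two-character escape sequence, which is accepted.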

This list of changes should be complete, but it’s possible I’ve missed others. Parsing might be a solved problem in the compiler literature, but it’s still pretty complicated. I could have missed lurking bugs in the old parser, and it’s possible (although I think less likely) that I’ve introduced bugs in the new parser.

What about speed?

The new parser is much faster than the old one. Exactly how fast depends on the data you’re parsing. For example, on Opera’s simple parse test, I get around 156000 times/second in Firefox 4, but in Firefox 5 with the new JSON parser I get around 339000 times/second (bigger is better). On a second testcase, Kraken’s JSON.parse test (json-parse-financial, to be precise), I get a 4.0 time of around 140ms and a 5.0 time of around 100ms (smaller is better). (In both cases I’m comparing builds containing far more JavaScript changes than just the new parser, to be sure. But I’m pretty sure the bulk of the performance improvements in these two cases are due to the new parser.) The new JSON parser puts us solidly in the center of the browser pack.
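
For a rough idea of how a "times/second" figure like the ones above can be measured, here is a minimal timing sketch. It is not the Opera or Kraken test; the helper name parsesPerSecond, the sample data, and the 50ms window are all assumptions chosen for illustration, and numbers from it won't be comparable to the figures quoted here.

```javascript
// Hypothetical helper: run JSON.parse on `text` for at least `ms` milliseconds
// (ms must be positive) and return the observed rate in parses per second.
function parsesPerSecond(text, ms) {
  var count = 0;
  var start = Date.now();
  var elapsed = 0;
  while (elapsed < ms) {
    // Parse in batches so the clock is not read on every iteration.
    for (var i = 0; i < 1000; i++) {
      JSON.parse(text);
    }
    count += 1000;
    elapsed = Date.now() - start;
  }
  return count / (elapsed / 1000);
}

// Made-up sample data; real results depend heavily on what is being parsed.
var sample = JSON.stringify({
  name: "test",
  values: [1, 2.5, true, null],
  nested: { a: "b" }
});

var rate = parsesPerSecond(sample, 50);
```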

It’ll only get better in the future as we wring even more speed out of SpiderMonkey. After all, on the same system used to generate the above numbers, IE gets around 510000 times/second. I expect further speedup will happen during more generalized performance improvements: improving the speed of defining new properties, improving the speed with which objects are allocated, improving the speed of creating a property name from a string, and so on. As we perform such streamlining, we’ll parse JSON even faster.

Side benefit: better error messages

The parser rewrite also gives JSON.parse better error messages. With the old parser it would have been difficult to provide useful feedback, but in the new parser it’s easy to briefly describe the reason for syntax errors.

js> JSON.parse('{ foo: 17 }'); // unquoted property name
(old) typein:1: SyntaxError: JSON.parse
(new) typein:1: SyntaxError: JSON.parse: expected property name or '}'

We can definitely do more here, perhaps by including context for the error from the provided string, but this is nevertheless a marked improvement over the old parser’s error messages.
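
Code that handles untrusted JSON can surface these messages by catching the SyntaxError. A minimal sketch, assuming a made-up helper named tryParse; the message text is engine-specific, so it is only good for display or logging, not for matching against.

```javascript
// Hypothetical helper: parse JSON text and return either the value
// or the engine's description of what went wrong.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    if (e instanceof SyntaxError) {
      // The wording of e.message varies from engine to engine.
      return { ok: false, message: e.message };
    }
    throw e; // anything else is not a parse failure
  }
}

var bad = tryParse('{ foo: 17 }');     // unquoted property name
var good = tryParse('{ "foo": 17 }');  // valid JSON
```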

Bottom line

JSON.parse in Firefox 5 is faster, follows the spec, and tells you what went wrong if you give it bad data. ’nuff said.

10 Comments »

  1. The speed stuff made me go “ooh cool”.

    The better error messages made me go “OMFG YES!”

    I had a problem I eventually traced to a typo (well, the generator failed) in a giant JSON response, but it took longer than it should have to find. So I’m slightly biased in my enthusiasm for seeing improvements in error messages.

    Comment by Robert Accettura — 06.06.11 @ 17:53

  2. Just to satisfy my curiosity:

    1. You said the original parser was not written with speed in mind. Considering that a 2x speedup doesn’t seem like a lot at all. Is it possible to still speed this up considerably? Perhaps speed was not your main focus this time.

    2. Is there really any application where JSON parse speed is the bottleneck or would make a significant difference … ? It seems somewhat unlikely.

    Comment by Habakuk — 06.06.11 @ 19:27

  3. I reckon a recursive descent parser would be faster. I did some profiling and there are a couple of unpredictable switches on the parser state that I’m sure hurt, and which a recursive descent parser would avoid. But still, what you’ve done is a vast improvement, so thanks for that!

    Comment by njn — 06.06.11 @ 22:05

  4. It’s unlikely there’s significantly more speed to be wrung out of it, except by speeding up those more-general paths I mentioned in the post. The loops in the parser are tight, the instructions are carefully interwoven to minimize code size, and branches are organized such that there’s quite simple control flow which should be reasonably branch-predictable. The big complexity is in the calls escaping the parser: to add properties to objects (adding indexed properties to arrays is quite fast and doesn’t suffer), to create property names (we have to take a lock to do it, which hurts), and of course to allocate memory with our not-yet-fully-tuned garbage collector. Those aspects can be sped up, and when that happens, parsing will speed up too. But the core algorithm’s basically fully optimized. njn mentions a couple switch statements that aren’t fully predictable, which could be avoided with a series of ifs (but would they be any faster? and they’d mean some duplication of code, I think), but beyond that I’m not seeing much in the parser itself.

    You’d have to throw a fair bit of data at a JSON parser for parser speed to matter, but it’s not completely out of the question. tinderboxpushlog, say, can use fairly chunky JSON depending on tree activity level — not to the point of dominating performance, perhaps, but certainly to the point of showing up as one of the smaller fractions of overall processing. And I wouldn’t presume that current JSON uses will be representative of where future JSON uses will go, as far as overall amount goes.

    Comment by Jeff — 06.06.11 @ 23:17

  5. @Robert Accettura, you can find the location of problems in JSON much easier with validators such as this one.

    Comment by Craig — 07.06.11 @ 17:15

  6. Strings can’t contain tab characters: JSON.parse('"\t"') now properly throws a SyntaxError.

    Firefox 4 also throws a SyntaxError for JSON.parse('"\t"'). I think the other changes are for Firefox 5 or later, but this one is already implemented in Firefox 4, isn’t it?

    Comment by dynamis — 14.06.11 @ 22:17

  7. I don’t have a readily-available copy of 4.0 to check (and am too lazy to check source), but I’m pretty sure it wasn’t. We definitely had an open bug on it until I closed it after checking in the new parser, and I’m pretty sure that bug wasn’t out of date on the point. Prior to this the JSON parser didn’t change very often, and I’d been watching changes as they filtered in for a while and remember no such change.

    Comment by Jeff — 15.06.11 @ 09:59

  8. JSON parsing in FF5 has definitely become more whiny. I can no longer parse {"x":04.22}. It wants the number as 4.22 not 04.22. Why-oh-why did that bother you??????? I have absolute TONS of data pretty-formatted like that and nothing in my apps complained until FF5. And I don’t see why it should complain! I certainly don’t call that progress and this is very infuriating!

    Comment by Trian — 25.06.11 @ 13:30

  9. That’s simply not JSON. Per RFC 4627, §2.4 Numbers, “Leading zeros are not allowed.” And more specifically, concerning ECMAScript/JavaScript, a number in JSON begins with a DecimalIntegerLiteral, which is either 0 or a NonZeroDigit followed by optional DecimalDigits. That explicitly disallows leading zeroes as well.

    Beyond pure spec concerns, I find it interesting that you say that “nothing in my apps complained until FF5”. Did you and your users never use other browsers? WebKit’s Nitro JS engine rejects leading zeroes:

    > JSON.parse('{"x":04.22}')
    SyntaxError: Unable to parse JSON string
    

    Opera’s JS engine rejects leading zeroes:

    > JSON.parse('{"x":04.22}')
    SyntaxError: JSON.parse: Illegal number format (excessive leading 0): 04.22}
    

    Google’s v8 JS engine, used in Chrome, rejects leading zeroes:

    d8> JSON.parse('{"x":04.22}')
    undefined:1: SyntaxError: Unexpected token ILLEGAL
    {"x":04.22}
         ^
    

    IE is the only browser I can find which currently accepts leading zeroes. But IE alone is not reason to defect, particularly so when we are considering an interchange format whose precise details are important and must be identically respected by all parties. IE’s behavior is a bug, and I believe in due time they too will fix it, particularly once test262 contains tests for JSON numeric literal syntax. (It seems not to, now, but I looked only cursorily.)

    Comment by Jeff — 25.06.11 @ 13:56

  10. @Jeff: Yes, the app I was referring to is an intranet-wide app with all users running FF3/FF4. It only came up when the json files that the users were using as data files (think drag and drop in a web-app) started giving them errors.

    Nevertheless, so far I wasn’t that concerned about strictness and standards compliance. If it worked in my deployment environment, that was ok with me. I sort-of allowed the (browser’s) implementation to define the (JSON) standard. My bad, I know, but on the other hand this kind of standards-adherence out of the blue is a bit too much, methinks…

    Comment by Trian — 29.06.11 @ 07:55
