07.06.11

Introducing mozilla/RangedPtr.h: a smart pointer for members of buffers

— Jeff @ 15:17

Introduction

Suppose you’re implementing a method that searches for the two-character sequence -- as part of an HTML parser:

bool containsDashDash(const char* chars, size_t length)
{
  for (size_t i = 0; i < length; i++)
  {
    if (chars[i] != '-')
      continue;
    if (chars[i + 1] == '-')
      return true;
  }
  return false;
}

But your method contains a bug! Can you spot it?

The buffer bounds-checking problem

The problem with the above implementation is that the bounds-checking is off. If chars doesn’t contain -- but ends with a -, then chars[i + 1] will read past the end of chars. Sometimes that might be harmless; other times it might be an exploitable security vulnerability.
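To see the failure concretely, here’s a minimal driver of my own (not from the original post) for the containsDashDash above; the trailing dash makes the loop read one element past the end of the buffer:

// Hypothetical driver; assumes the containsDashDash definition above is in scope.
int main()
{
  const char text[] = { 'a', 'b', '-' }; // three characters, no terminating '\0'
  // At i == 2, chars[i] is '-', so the method goes on to read chars[3]: one
  // element past the end of the buffer, which is undefined behavior.
  containsDashDash(text, 3);
  return 0;
}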

The most obvious way to fix this in C++ is to pass an object which encapsulates both characters and length together, so you can’t use the characters except in accordance with the length. If you were using std::string, for example, accessing characters by [] or at() would generally assert or throw an exception for out-of-bounds access.
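As a quick sketch (mine, not the post’s), std::string::at rejects the same out-of-bounds access that the raw-pointer version performs silently:

#include <iostream>
#include <stdexcept>
#include <string>

int main()
{
  std::string chars("ab-");
  try {
    // The buggy loop would read index 3 here; at() throws instead of overreading.
    char next = chars.at(3);
    std::cout << next << '\n';
  } catch (const std::out_of_range&) {
    std::cout << "out-of-bounds access rejected\n";
  }
  return 0;
}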

For one reason or another, however, you might not want to use an encapsulated type. In a parser, for example, you probably would want to use a pointer to process the input string, because the compiler might not be able to optimize an index into the equivalent pointer.

Is there a way to get “safety” via debug assertions or similar without giving up a pointer interface?

Introducing RangedPtr

We’re talking C++, so of course the answer is yes, and of course the answer is a smart pointer class.

The Mozilla Framework Based on Templates in mozilla-central now includes a RangedPtr<T> class. It’s defined in mfbt/RangedPtr.h and can be #included from mozilla/RangedPtr.h. RangedPtr stores a pointer, and in debug builds it stores start and end pointers fixed at construction time. Operations on the smart pointer — indexing, deriving new pointers through addition or subtraction, dereferencing, &c. — assert in debug builds that they don’t exceed the range specified by the start and end pointers. Indexing and dereferencing are restricted to the half-open range [start, end); new-pointer derivation is restricted to the range [start, end] to permit sentinel pointers. (It’s possible for start == end, although you can’t really do anything with such a pointer.)

The RangedPtr interface is pretty straightforward, supporting these constructors:

#include "mozilla/RangedPtr.h"

using mozilla::RangedPtr;

int nums[] = { 1, 2, 5, 3 };
RangedPtr<int> p1(nums, nums, nums + 4);
RangedPtr<int> p2(nums, nums, 4); // short for (nums, nums, nums + 4)
RangedPtr<int> p3(nums, 4); // short for (nums, nums, nums + 4)
RangedPtr<int> p4(nums); // short for (nums, length(nums))
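To make the range rules concrete, here’s a small sketch of my own, continuing the snippet above; the commented-out lines are the ones that would trip the debug-build assertions:

// p1 points at nums[0] and has the range [nums, nums + 4).
int first = *p1;                  // fine: dereference within [start, end)
int last = p1[3];                 // fine: index of the last element in the range

RangedPtr<int> sentinel = p1 + 4; // fine: a pointer equal to end may be derived
// *sentinel;                     // would assert in debug builds: end isn't dereferenceable
// p1[4];                         // would assert in debug builds: index outside [start, end)
// p1 + 5;                        // would assert in debug builds: derivation past end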

RangedPtr<T> supports all the usual actions you’d expect from a pointer — indexing, dereferencing, addition, subtraction, assignment, equality and comparisons, and so on. All methods assert self-correctness as far as is possible in debug builds. RangedPtr<T> differs from T* only in that it doesn’t implicitly convert to T*: use the get() method to get the corresponding T*. In addition to being explicit and consistent with nsCOMPtr and nsRefPtr, this will serve as a nudge to consider changing the relevant code to use RangedPtr instead of a raw pointer. But in essence RangedPtr is a pretty easy drop-in replacement for raw pointers in buffers. For example, adjusting containsDashDash to use it to assert in-rangeness is basically a single-line change:

#include "mozilla/RangedPtr.h"

using mozilla::RangedPtr;

bool containsDashDash(const char* charsRaw, size_t length)
{
  RangedPtr<const char> chars(charsRaw, length);
  for (size_t i = 0; i < length; i++)
  {
    if (chars[i] != '-')
      continue;
    if (chars[i + 1] == '-')
      return true;
  }
  return false;
}

(And to resolve all loose ends, if you wanted containsDashDash to be correct, you’d change the loop to go from 1 rather than 0 and would check chars[i - 1] and chars[i]. Thanks go to Neil in comments for noting this.)
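Spelled out, that corrected version might look something like the following sketch (the fix just described, still using RangedPtr for debug-build range checks):

#include <stddef.h>

#include "mozilla/RangedPtr.h"

using mozilla::RangedPtr;

bool containsDashDash(const char* charsRaw, size_t length)
{
  RangedPtr<const char> chars(charsRaw, length);
  // Start at 1 and look backward so that both accesses stay inside the buffer.
  for (size_t i = 1; i < length; i++)
  {
    if (chars[i - 1] != '-')
      continue;
    if (chars[i] == '-')
      return true;
  }
  return false;
}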

A minor demerit of RangedPtr

RangedPtr is extremely lightweight and should almost always be as efficient as a raw pointer, even as it provides debug-build correctness checking. The sole exception is that, for sadmaking ABI reasons, using RangedPtr<T> as an argument to a method may be slightly less efficient than using a T* (passed-on-the-stack versus passed-in-a-register, to be precise). Most of the time the cost will be negligible, and if the method is inlined there probably won’t be any cost at all, but it’s worth pointing out as a potential concern if performance is super-critical.

Bottom line

Raw pointers into buffers bad, smart RangedPtrs into buffers good. Go forth and use RangedPtr throughout Mozilla code!

06.06.11

I feel the need…the need for JSON parsing correctness and speed!

JSON and SpiderMonkey

JSON is a handy serialization format for passing data between servers and browsers and between independent, cooperating web pages. It’s increasingly the format of choice for website APIs.

ECMAScript 5 (the standard underlying JavaScript) includes built-in support for producing and parsing JSON. SpiderMonkey has included such support since before ES5 added it.

SpiderMonkey’s support, because it predated ES5, hasn’t always agreed with ES5. Also, because JSON support was added before it became ubiquitous on the web, it wasn’t written with raw speed in mind.

Improving JSON.parse

We’ve now improved JSON parsing in Firefox 5 to be fast and fully conformant with ES5. For a while we’d been improving JSON support piecemeal. That worked for small bug fixes, and it probably would have worked to fix the remaining conformance bugs. But performance is different: to improve performance we needed to parse in a fundamentally different way. It was time for a rewrite.

What parsing bugs got fixed?

The bugs the new parser fixes are quite small and generally shouldn’t affect sites, in part because other browsers overwhelmingly don’t have these bugs. We’ve had no compatibility reports for these fixes in the month and a half they’ve been in the tree:

  • The number syntax is properly stricter:
    • Octal numbers are now syntax errors.
    • Numbers containing a decimal point must now include a fractional component (i.e. 1. is no longer accepted).
  • JSON.parse("this") now throws a SyntaxError rather than evaluating to true; the old behavior was due to a mistake in reusing our keyword parser. (Hysterically, because we used our JSON parser to optimize eval in certain cases, this change means that eval("(this)") will no longer evaluate to true.)
  • Strings can’t contain literal tab characters (the two-character escape \t is still fine): JSON.parse('"\t"') now properly throws a SyntaxError.

This list of changes should be complete, but it’s possible I’ve missed others. Parsing might be a solved problem in the compiler literature, but it’s still pretty complicated. I could have missed lurking bugs in the old parser, and it’s possible (although I think less likely) that I’ve introduced bugs in the new parser.

What about speed?

The new parser is much faster than the old one. Exactly how fast depends on the data you’re parsing. For example, on Opera’s simple parse test, I get around 156000 times/second in Firefox 4, but in Firefox 5 with the new JSON parser I get around 339000 times/second (bigger is better). On a second testcase, Kraken’s JSON.parse test (json-parse-financial, to be precise), I get a 4.0 time of around 140ms and a 5.0 time of around 100ms (smaller is better). (In both cases I’m comparing builds containing far more JavaScript changes than just the new parser, to be sure. But I’m pretty sure the bulk of the performance improvements in these two cases are due to the new parser.) The new JSON parser puts us solidly in the center of the browser pack.

It’ll only get better in the future as we wring even more speed out of SpiderMonkey. After all, on the same system used to generate the above numbers, IE gets around 510000 times/second. I expect further speedup will happen during more generalized performance improvements: improving the speed of defining new properties, improving the speed with which objects are allocated, improving the speed of creating a property name from a string, and so on. As we perform such streamlining, we’ll parse JSON even faster.

Side benefit: better error messages

The parser rewrite also gives JSON.parse better error messages. With the old parser it would have been difficult to provide useful feedback, but in the new parser it’s easy to briefly describe the reason for syntax errors.

js> JSON.parse('{ foo: 17 }'); // unquoted property name
(old) typein:1: SyntaxError: JSON.parse
(new) typein:1: SyntaxError: JSON.parse: expected property name or '}'

We can definitely do more here, perhaps by including context for the error from the provided string, but this is nevertheless a marked improvement over the old parser’s error messages.

Bottom line

JSON.parse in Firefox 5 is faster, follows the spec, and tells you what went wrong if you give it bad data. ’nuff said.

08.05.11

Glory II

On a flight from Detroit to San Jose via Chicago then San Diego (I’m a cheapskate who rarely pays more for a non-stop flight) I briefly saw a glory on Saturday evening:

[Photos: a glory over the plane wing; the glory against a field of clouds over San Diego; another view of the glory to the right of the plane wing; the glory further to the right as the plane makes a sweeping turn to the left; the glory still further right of the wing, just barely visible in the bottom left corner of the picture; one last picture of the glory as the plane continues its arc; and a brief video clip of the glory near the end of the arc.]

I wouldn’t have seen it but for an aborted landing (aborted while still several hundred feet up in the air, not sure why it happened), in which we banked left and ascended above the layer of clouds above San Diego for another pass at landing. Sometimes short delays while flying can be a good thing. The glory was only visible, if you were looking, for perhaps thirty seconds or so until the arc of flight exceeded the optical angle I could see through two windows near me.

This is the second time I’ve seen a glory while flying; the first time was on a flight from Boston to Minneapolis last year.

30.04.11

Washington, D.C., part 6: Predictions and wrapup

(Just started reading? See part 1, part 2, part 3, part 4, and part 5.)

Reading the tea leaves

In some cases it can be reasonably obvious which way the Supreme Court justices lean. Oral argument and questioning in United States v. Stevens and District of Columbia v. Heller, for example, left most observers fairly certain how the Court would ultimately rule.

In other cases the future is much more murky. This was the case for both Tapia v. United States and Microsoft v. i4i Limited Partnership, at least in my understanding of the arguments.

In the case of Tapia, I’m only really sure of Justice Scalia’s vote. Justice Sotomayor seemed to lean pretty clearly one direction, but I have no idea if she was merely feeling out the waters on her argument, vocally pushing it to her colleagues, or just testing the arguments presented to her. And I couldn’t say how she might respond to the ultimate assertion made by Tapia (rather, the lawyer who argued for her) that “Congress has spoken” and that her desired outcome (supposing she indeed desires it) had been foreclosed.

In the case of Microsoft I have even less to go upon. I consider this the more technically challenging and complex argument, both for me to understand and for the justices to approach. Much of it went over my head. I suspect more justices will be drawn to the fact that Justice Cardozo described the standard of proof in patent cases in his long-ago opinion, simply based on how much it was discussed during the argument and on the appeal of deferring to that opinion under the concept of stare decisis (that is, to generally stand by prior decisions — although when one makes exceptions, as all justices do, is key to applying the doctrine). That doesn’t bode well for Microsoft. (Particularly because, with Chief Justice Roberts’s recusal, Microsoft must count to five, while i4i only needs to count to four for its win to be upheld. In that case i4i would win the day without the case setting precedent to conclusively establish a standard of proof in patent litigation.) But I could easily be wrong.

Conclusion

I didn’t know exactly what to expect when I decided to make the trip to D.C. to go to an oral argument. Would it be worth enduring a mostly sleepless night and sitting through arguments I might well not understand, at the expense of the time and money it would take to be there? I was pretty sure the answer was yes (after buying the plane ticket to D.C. I was practically giddy with anticipation, and anyone who knows me knows how rarely I get that excited), but I didn’t know for sure.

Looking back, it was well worth the effort. Getting to see the highest court in the country in session, on matters of strong importance, even if I didn’t fully understand all that was discussed, was a priceless experience. And it was made all the better by the preparation spent reading briefs and considering the arguments presented. (I strongly recommend doing this if you ever consider visiting.) There’s also something to be said for the experience of just sitting in the line, among people of all varieties waiting to get in, each with as equal a right as yours to be there. (Well, almost equal: there’s that Supreme Court bar line, but they certainly put in the time for it. Although I have to admit I don’t immediately see a rational relationship between that investment of time, money, and labor and the ability to see arguments more easily.)

Anyway: it was definitely worth doing, and if I have reason to be in the area again in the future at an opportune time, I’ll probably try to do it again.

27.04.11

Washington, D.C., part 5: The arguments

(Just started reading? See part 1, part 2, part 3, and part 4.)

After much line-standing, it was now time to hear actual argument.

For a number of reasons, I’m not going to go into much detail. First, I was pretty sleep-deprived. While I did a reasonable job of concentrating on the arguments, my mind sometimes just wasn’t capable of keeping up with the discussion out of sheer exhaustion. This also means my memory of the arguments is spotty, and I may have forgotten particularly interesting exchanges. Second, while I read petitioner/respondent briefs for both cases, quickly skimmed the United States’s briefs in both, and read a couple amicus briefs of particular interest in Microsoft v. i4i, I’m far from expert in either area of law, so anything I say isn’t going to be the best-informed commentary. Third, if you want full detail, you can always read transcripts and listen to audio (simultaneously, even, thanks to The Oyez Project) from the oral arguments for Tapia v. United States and for Microsoft v. i4i. (Count your blessings: until Chief Justice Roberts joined the Court in 2005, transcripts and audio weren’t released until the end of the term, around the end of June. The Court then started releasing transcripts shortly after argument. And until this year, with very rare exceptions, it didn’t release oral argument audio until the end of the term, whereas now audio is released at the end of the week in which the argument occurs.)

For the most part, then, I’ll limit discussion to the impressions that stuck with me. Due to lack of time I haven’t gone over the argument transcripts, so this is all raw recollection diluted by a week’s delay in scribing.

General thoughts

The justices sit in high-backed chairs that appear very solemn, rigid, and somber. They are that — when nobody’s sitting in them and testing them. It turns out the chairs recline quite considerably, which diminishes their impressiveness a bit. 🙂 Justice Thomas and Justice Scalia sit next to each other in this iteration of the Court (the chief justice sits in the center, and the remaining justices alternate sides in order of seniority, so they sat on opposite sides before Justice Kagan joined the Court), and they had a tendency to lean back so far in the chairs that the sense of dignity the chairs conveyed was rather disrupted. 🙂 Not that it really makes any difference, of course: they are who they are. Still, it was kind of funny to see that the chairs’ veneer of gravitas wasn’t as deep as the angle they reclined.

Justice Thomas is frequently noted as remaining unusually silent on the bench. At the moment he hasn’t spoken in oral argument in just over five years — to counsel making argument, that is. From time to time he and Scalia would turn their chairs and confer with each other, as a lawyer for one side or the other continued his argument, presumably discussing the case being argued. His involvement is certainly not passive, even if it includes no exchanges with counsel making an argument. This is about what I’d read in various places before going to argument.

Justice Ginsburg is definitely the most frail-looking justice. Even taking into account her physical position, she seemed to look downward at counsel more than appeared to be necessary, and she seemed to hunch over the presumed papers in front of her when asking questions. The other justices had a physical presence of sorts which Justice Ginsburg seemed to lack. (Again, not that it makes any real difference, she being who she is.)

Some people question the importance of oral argument, suggesting that the written briefs settle the issue, the justices have already made up their minds, and so on. Among the usual arguments against this is that the justices use argument as a sounding board to figure out what their fellow justices think and to in a sense passively-aggressively make their cases to each other. If that happened here, I didn’t have the Court-reading experience to see it, or perhaps these cases were simply not clear-cut enough for it to happen in any obvious way.

Tapia v. United States

Justice Sotomayor asked a number of questions, at various points in the argument, which essentially took the point of view of a district court judge tasked with sentencing. Her questions seemed to suggest that she thought judges should have sentencing discretion to consider rehabilitation programs for the defendant’s own good. She was a trial judge at one time, as I recall, and I found it interesting that, for the first time I can recall, (dare I say it?) empathy rooted in her past experience as a trial judge seemed to strongly affect her approach to a case. I also found it interesting because, from what I can remember, her position has historically been generally pro-defendant, and (facially) this sympathy for the judge would run counter to that.

Justice Scalia, well known as an advocate of textualism, repeatedly asked how a judge could be “recognizing that imprisonment is not an appropriate means of promoting correction and rehabilitation” while lengthening a sentence in order to rehabilitate. As I recall he drew a laugh once for suggesting a judge could “recognize”…and then say he was going to give a longer sentence anyway. He seemed to find the statutory text pretty clear.

I have no strong memories of the questions and answers of the other justices (save Justice Thomas, who asked no questions), although I’m sure on reading the oral argument transcript more memories would spring to mind.

Microsoft v. i4i

The very first thing I noticed about this case was that, in the very brief recess between this case and Tapia, Chief Justice Roberts had disappeared. This was expected: he’d recused himself because, as I recall, his family owns Microsoft stock. At the same time his disappearance was so abrupt that I didn’t even see him leave, despite making half an effort to watch and see it happen (because I was wondering if the other justices would rearrange their seating around Justice Scalia, who acted as chief justice for the argument).

I found this argument more technical and peppered with citations than the previous one, which made it harder to follow. The justices spent some time talking about RCA v. Radio Engineering Laboratories, Inc., the opinion in which Justice Cardozo made perhaps the strongest statements supporting the clear and convincing standard of proof. It was more time than I’d have expected, given the other opinions that could have been discussed as well.

At one point one counsel cited an opinion written by then-Judge Alito from his time on a lower (circuit, I think?) court. I’m not certain, but I suspect it was the same side that had also cited an opinion of his in their brief. I had read briefs containing bald assertions that previous cases cut in exactly opposite directions, and, skeptic that I was at the time, I found this good for a laugh when reading that brief. Justice Alito seemed to think similarly, as he immediately made a self-deprecating comment about how he wasn’t nearly as smart at the time, or something to that effect. However, I don’t remember thinking he was seriously hinting which way he was leaning in doing so.

When oral argument in Microsoft finished (Microsoft’s counsel still had rebuttal time, as I noticed because the white light on the podium had turned on [indicating five minutes remaining] but the red light [indicating no time] had not), my mind felt overwhelmed with what I’d seen and heard. I knew I’d followed less than half of what had been said, yet at the same time I knew I’d basically achieved my goal: to know enough to be dangerous while hearing the arguments.

Next time, reading the Court’s tea leaves.
