Introducing mozilla/RangedPtr.h: a smart pointer for members of buffers

Tags: , , , , — Jeff @ 15:17


Suppose you’re implementing a method which searches for -- as part of an HTML parser:

bool containsDashDash(const char* chars, size_t length)
  for (size_t i = 0; i < length; i++)
    if (chars[i] != '-')
    if (chars[i + 1] == '-')
      return true;
  return false;

But your method contains a bug! Can you spot it?

The buffer bounds-checking problem

The problem with the above implementation is that the bounds-checking is off. If chars doesn’t contain -- but ends with a -, then chars[i + 1] will read past the end of chars. Sometimes that might be harmless; other times it might be an exploitable security vulnerability.

The most obvious way to fix this in C++ is to pass an object which encapsulates both characters and length together, so you can’t use the characters except in accordance with the length. If you were using std::string, for example, accessing characters by [] or at() would generally assert or throw an exception for out-of-bounds access.

For one reason or another, however, you might not want to use an encapsulated type. In a parser, for example, you probably would want to use a pointer to process the input string, because the compiler might not be able to optimize an index into the equivalent pointer.

Is there a way to get “safety” via debug assertions or similar without giving up a pointer interface?

Introducing RangedPtr

We’re talking C++, so of course the answer is yes, and of course the answer is a smart pointer class.

The Mozilla Framework Based on Templates in mozilla-central now includes a RangedPtr<T> class. It’s defined in mfbt/RangedPtr.h and can be #included from mozilla/RangedPtr.h. RangedPtr stores a pointer, and in debug builds it stores start and end pointers fixed at construction time. Operations on the smart pointer — indexing, deriving new pointers through addition or subtraction, dereferencing, &c. — assert in debug builds that they don’t exceed the range specified by the start and end pointers. Indexing and dereferencing are restricted to the half-open range [start, end); new-pointer derivation is restricted to the range [start, end] to permit sentinel pointers. (It’s possible for start == end, although you can’t really do anything with such a pointer.)

The RangedPtr interface is pretty straightforward, supporting these constructors:

#include "mozilla/RangedPtr.h"

int nums[] = { 1, 2, 5, 3 };
RangedPtr<int> p1(nums, nums, nums + 4);
RangedPtr<int> p2(nums, nums, 4); // short for (nums, nums, nums + 4)
RangedPtr<int> p3(nums, 4); // short for (nums, nums, nums + 4)
RangedPtr<int> p4(nums); // short for (nums, length(nums))

RangedPtr<T> supports all the usual actions you’d expect from a pointer — indexing, dereferencing, addition, subtraction, assignment, equality and comparisons, and so on. All methods assert self-correctness as far as is possible in debug builds. RangedPtr<T> differs from T* only in that it doesn’t implicitly convert to T*: use get() method to get the corresponding T*. In addition to being explicit and consistent with nsCOMPtr and nsRefPtr, this will serve as a nudge to consider changing the relevant code to use RangedPtr instead of a raw pointer. But in essence RangedPtr is a pretty easy drop-in replacement for raw pointers in buffers. For example, adjusting containsDashDash to use it to assert in-rangeness is basically a single-line change:

#include "mozilla/RangedPtr.h"

bool containsDashDash(const char* charsRaw, size_t length)
  RangedPtr<const char> chars(charsRaw, length);
  for (size_t i = 0; i < length; i++)
    if (chars[i] != '-')
    if (chars[i + 1] == '-')
      return true;
  return false;

(And to resolve all loose ends, if you wanted containsDashDash to be correct, you’d change the loop to go from 1 rather than 0 and would check chars[i - 1] and chars[i]. Thanks go to Neil in comments for noting this.)

A minor demerit of RangedPtr

RangedPtr is extremely lightweight and should almost always be as efficient as a raw pointer, even as it provides debug-build correctness checking. The sole exception is that, for sadmaking ABI reasons, using RangedPtr<T> as an argument to a method may be slightly less efficient than using a T* (passed-on-the-stack versus passed-in-a-register, to be precise). Most of the time the cost will be negligible, and if the method is inlined there probably won’t be any cost at all, but it’s worth pointing out as a potential concern if performance is super-critical.

Bottom line

Raw pointers into buffers bad, smart RangedPtrs into buffers good. Go forth and use RangedPtr throughout Mozilla code!


  1. Does this only assert in debug builds, or does it enforce bounds on release builds as well?

    Comment by Brion — 07.06.11 @ 18:21

  2. It only asserts in debug builds. In release builds it is functionally identical to the pointer value it represents. It’s designed to detect errors during the writing of a patch and during testing, not to be a hard guarantee of proper use or safe failure. We have a lot of good testers who I trust to find errors if they exist — this is just a matter of making sure those errors can’t be missed (say, because in practice the overread reads from a location with a predictable value).

    Comment by Jeff — 07.06.11 @ 20:15

  3. Does this assert fatally?

    Comment by Kyle Huey — 07.06.11 @ 21:23

  4. An assertion is by definition fatal. 😛

    Comment by Jeff — 07.06.11 @ 21:29

  5. length - 1 isn’t the complete fix, because length might be 0, but you could start the loop at 1 and use chars[i - 1].

    Comment by Neil Rashbrook — 08.06.11 @ 03:06

  6. I had a quick peek at the source, and you have gone overboard with the sanity checks; two methods actually duplicate work, and there was at least one other redundant check. Also the this->operator= stuff was confusing, but (for instance) operator--() really doesn’t have to be that complicated.

    Comment by Neil Rashbrook — 08.06.11 @ 03:48

  7. Sorry, I meant to refer to operator--() [there is of course no operator-()].

    Comment by Neil Rashbrook — 08.06.11 @ 03:54

  8. Oh, it’s not me, it’s your blog software, should I have written operator--()?

    Comment by Neil Rashbrook — 08.06.11 @ 03:55

  9. Fair point about non-zero length, although I could perhaps claim that the user shouldn’t be checking for dashes in a sequence of no characters, since I didn’t make the contract clear. :-)

    Most of the duplication I remember is amongst the constructors, but that’s unavoidable if rangeStart and rangeEnd are to be const members. You’re probably right that not explicitly invoking operator= would be clearer. I’ll probably fix that in future adjustments (I’ll need operator! and operator bool in order to use RangedPtr as the return value of JSString::getChars, for starters).

    WordPress likes to convert double-dashes into em dashes, as you’ve discovered. I only discovered when writing this post that it won’t convert if it’s inside a <code> tag, which avoids the problem but is not particularly discoverable.

    Comment by Jeff — 08.06.11 @ 08:35

RSS feed for comments on this post. TrackBack URI

Leave a comment

HTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>