Introduction
Suppose you’re implementing a method which searches for --
as part of an HTML parser:
bool containsDashDash(const char* chars, size_t length) { for (size_t i = 0; i < length; i++) { if (chars[i] != '-') continue; if (chars[i + 1] == '-') return true; } return false; }
But your method contains a bug! Can you spot it?
The buffer bounds-checking problem
The problem with the above implementation is that the bounds-checking is off. If chars
doesn’t contain --
but ends with a -
, then chars[i + 1]
will read past the end of chars
. Sometimes that might be harmless; other times it might be an exploitable security vulnerability.
The most obvious way to fix this in C++ is to pass an object which encapsulates both characters and length together, so you can’t use the characters except in accordance with the length. If you were using std::string
, for example, accessing characters by []
or at()
would generally assert or throw an exception for out-of-bounds access.
For one reason or another, however, you might not want to use an encapsulated type. In a parser, for example, you probably would want to use a pointer to process the input string, because the compiler might not be able to optimize an index into the equivalent pointer.
Is there a way to get “safety” via debug assertions or similar without giving up a pointer interface?
Introducing RangedPtr
We’re talking C++, so of course the answer is yes, and of course the answer is a smart pointer class.
The Mozilla Framework Based on Templates in mozilla-central now includes a RangedPtr<T>
class. It’s defined in mfbt/RangedPtr.h
and can be #include
d from mozilla/RangedPtr.h
. RangedPtr
stores a pointer, and in debug builds it stores start and end pointers fixed at construction time. Operations on the smart pointer — indexing, deriving new pointers through addition or subtraction, dereferencing, &c. — assert in debug builds that they don’t exceed the range specified by the start and end pointers. Indexing and dereferencing are restricted to the half-open range [start, end)
; new-pointer derivation is restricted to the range [start, end]
to permit sentinel pointers. (It’s possible for start == end
, although you can’t really do anything with such a pointer.)
The RangedPtr
interface is pretty straightforward, supporting these constructors:
#include "mozilla/RangedPtr.h" int nums[] = { 1, 2, 5, 3 }; RangedPtr<int> p1(nums, nums, nums + 4); RangedPtr<int> p2(nums, nums, 4); // short for (nums, nums, nums + 4) RangedPtr<int> p3(nums, 4); // short for (nums, nums, nums + 4) RangedPtr<int> p4(nums); // short for (nums, length(nums))
RangedPtr<T>
supports all the usual actions you’d expect from a pointer — indexing, dereferencing, addition, subtraction, assignment, equality and comparisons, and so on. All methods assert self-correctness as far as is possible in debug builds. RangedPtr<T>
differs from T*
only in that it doesn’t implicitly convert to T*
: use get()
method to get the corresponding T*
. In addition to being explicit and consistent with nsCOMPtr
and nsRefPtr
, this will serve as a nudge to consider changing the relevant code to use RangedPtr
instead of a raw pointer. But in essence RangedPtr
is a pretty easy drop-in replacement for raw pointers in buffers. For example, adjusting containsDashDash
to use it to assert in-rangeness is basically a single-line change:
#include "mozilla/RangedPtr.h" bool containsDashDash(const char* charsRaw, size_t length) { RangedPtr<const char> chars(charsRaw, length); for (size_t i = 0; i < length; i++) { if (chars[i] != '-') continue; if (chars[i + 1] == '-') return true; } return false; }
(And to resolve all loose ends, if you wanted containsDashDash
to be correct, you’d change the loop to go from 1
rather than 0
and would check chars[i - 1]
and chars[i]
. Thanks go to Neil in comments for noting this.)
A minor demerit of RangedPtr
RangedPtr
is extremely lightweight and should almost always be as efficient as a raw pointer, even as it provides debug-build correctness checking. The sole exception is that, for sadmaking ABI reasons, using RangedPtr<T>
as an argument to a method may be slightly less efficient than using a T*
(passed-on-the-stack versus passed-in-a-register, to be precise). Most of the time the cost will be negligible, and if the method is inlined there probably won’t be any cost at all, but it’s worth pointing out as a potential concern if performance is super-critical.
Bottom line
Raw pointers into buffers bad, smart RangedPtr
s into buffers good. Go forth and use RangedPtr
throughout Mozilla code!