A pitfall in C++ low-level object creation and storage, and how to avoid it

While doing a couple recent reviews, I’ve read lots of code trying to solve the same general problem. Some code wants to store an object of some type (more rarely, an object from among several types), but it can’t do so yet. Some series of operations must occur first: a data structure needing to be put into the right state, or a state machine needing to transition just so. Think of this, perhaps, as the problem of having a conditionally initialized object.

Possibly-unsatisfactory approaches

One solution is to new the object at the proper time, storing nullptr prior to that instant. But this imposes the complexity of a heap allocation and handling its potential failure.

Another solution is to use mozilla::Maybe<T>. But Maybe must track whether the T object has been constructed, even if that status is already tracked elsewhere. And Maybe is potentially twice T‘s size.

The power approach

Tthe most common solution is to placement-new the object into raw storage. (This solution is particularly common in code written by especially-knowledgeable developers.) mfbt has historically provided mozilla::AlignedStorage and mozilla::AlignedStorage2 to implement such raw storage. Each class has a Type member typedef implementing suitable aligned, adequately-sized storage. Bog-standard C++11 offers similar functionality in std::aligned_storage.

Unfortunately, this approach is extremely easy to subtly misuse. And misuse occurs regularly in Mozilla code, and it can and has triggered crashes.

A detour into the C++ object model

C++ offers very fine-grained, low-level control over memory. It’s absurdly easy to access memory representations with the right casts.

But just because it’s easy to access memory representations, doesn’t mean it’s easy to do it safely. The C++ object model generally restricts you to accessing memory according to the actual type stored there at that instant. You can cast a true float* to uint32_t*, but C++ says undefined behavior — literally any behavior at all — may occur if you read from the uint32_t*. These so-called strict aliasing violations are pernicious not just because anything could happen, but because often exactly what you wanted to happen, happens. Broken code often still “works” — until it unpredictably misbehaves, in C++’s view with absolute justification.

A dragon lays waste to men attacking it
Here be dragons (CC-BY-SA, by bagogames)

There’s a big exception to the general by-actual-type rule: the memcpy exception. (Technically it’s a handful of rules that, in concert, permit memcpy. And there are other, non-memcpy-related exceptions.) memcpy(char*, const char*, size_t) has always worked to copy C-compatible objects around without regard to types, and C++’s object model permits this by letting you safely interpret the memory of a T as chars or unsigned chars. If T is trivially copyable, then:

T t1; 
char buf[sizeof(T)];
memcpy(buf, &t1, sizeof(T)); // stash bytes away
// ...time elapses during execution...
memcpy(&t1, buf, sizeof(T)); // restore them
// t1 safely has its original value

You can safely copy a T by the aforementioned character types elsewhere, then back into a T, and things will work. And second:

T t1, t2;
memcpy(&t1, &t2, sizeof(T)); // t1 safely has t2's value

You can safely copy a T into another T, and the second T will have the first T‘s value.

A C++-compatible placement-new approach

The placement-new-into-storage approach looks like this. (Real code would almost always use something more interesting than double, but this is the gist of it.)

#include <new> // for placement new

struct ContainsLazyDouble
    // Careful: align the storage consistent with the type it'll store.
    alignas(double) char lazyData[sizeof(double)];
    bool hasDouble_;

    // Indirection through these functions, rather than directly casting
    // lazyData to double*, evades a buggy GCC -Wstrict-aliasing warning.
    void* data() { return lazyData; }
    const void* data() const { return lazyData; }

    ContainsLazyDouble() : hasDouble_(false) {}

    void init(double d) {
      new (data()) double(d);
      hasDouble_ = true;

    bool hasDouble() const { return hasDouble_; }
    double d() const {
      return *reinterpret_cast<const double*>(data());

ContainsLazyDouble c;
// c.d(); // BAD, not initialized as double
c.d(); // OK

This is safe. c.lazyData was originally char data, but we allocated a new double there, so henceforth that memory contains a double (even though it wasn’t declared as double), not chars (even though it was declared that way). The actual type stored in lazyData at that instant is properly respected.

A C++-incompatible extension of the placement-new approach

It’s safe to copy a T by the aforementioned character types. But it’s only safe to do so if 1) T is trivially copyable, and 2) the copied bytes are interpreted as T only within an actual T object. Not into a location to be (re)interpreted as T, but into a T. It’s unsafe to copy a T into a location that doesn’t at that instant contain a T, then reinterpret it as T.

So what happens if we use ContainsLazyDouble‘s implicitly-defined default copy constructor?

ContainsLazyDouble c2 = c;

This default copy constructor copies ContainsLazyDouble member by member, according to their declared types. So c.lazyData is copied as a char array that contains the object representation of c.d(). c2.lazyData therefore contains the same char array. But it doesn’t contain an actual double. It doesn’t matter that those chars encode a double: according to C++, that location does not contain a double.

Dereferencing reinterpret_cast<const double*>(data()) therefore mis-accesses an array of chars by the wrong type, triggering undefined behavior. c2.d() might seem to work if you’re lucky, but C++ doesn’t say it must work.

This is extraordinarily subtle. SpiderMonkey hackers missed this issue in their code until bug 1269319 was debugged and a (partly invalid, on other grounds) GCC compiler bug was filed. Even (more or less) understanding the spec intricacy, I missed this issue in some of the patchwork purporting to fix that bug. (Bug 1341951 provides an actual fix for one of these remaining issues.) Another SpiderMonkey hacker almost introduced another instance of this bug; fortunately I was reviewing the patch and flagged this issue.

Using AlignedStorage<N>::Type, AlignedStorage2<T>::Type, or std::aligned_storage<N>::type doesn’t avoid this problem. We mitigated the problem by deleting AlignedStorage{,2}::Type‘s copy constructors and assignment operators that would always do actual-type-unaware initialization. (Of course we can’t modify std::aligned_storage.) But only careful scrutiny prevents other code from memcpying those types. And memcpy will copy without respecting the actual type stored there at that instant, too. And in practice, developers do try to use memcpy for this when copy construction and assignment are forbidden, and reviewers can miss it.

What’s the solution to this problem?

As long as memcpy and memmove exist, this very subtle issue can’t be eradicated. There is no silver bullet.

The best solution is don’t hand-roll raw storage. This problem doesn’t exist in Maybe, mozilla::Variant, mozilla::MaybeOneOf, mozilla::Vector, and other utility classes designed to possibly hold a value. (Sometimes because we just fixed them.)

But if you must hand-roll a solution, construct an object of the actual type into your raw storage. It isn’t enough to copy the bytes of an object of the actual type into raw storage, then treat the storage as that actual type. For example, in ContainsLazyDouble, a correct copy constructor that respects C++ strict aliasing rules would be:

#include <string.h> // for memcpy

// Add this to ContainsLazyDouble:
ContainsLazyDouble(const ContainsLazyDouble& other)
  : hasDouble_(other.hasDouble_)
  if (hasDouble_)
    // The only way to allocate a free-floating T, is to
    // placement-new it, usually also invoking T's copy
    // constructor.
    new (data()) double(other.d());

    // This would also be valid, if almost pointlessly bizarre — but only
    // because double is trivially copyable.  (It wouldn't be safe
    // to do this with a type with a user-defined copy constructor, or
    // virtual functions, or that had to do anything at all to initialize
    // the new object.)
    new (data()) double; // creates an uninitialized double
    memcpy(lazyData, other.lazyData, sizeof(lazyData)); // sets to other.d()

// ...and this to the using code:
ContainsLazyDouble c2 = c; // invokes the now-safe copy constructor

The implicitly-generated copy assignment operator will usually require similar changes — or it can be = deleted.

Final considerations

AlignedStorage seems like a good idea. But it’s extremely easy to run afoul of a copy operation that doesn’t preserve the actual object type, by the default copy constructor or assignment operator or by memcpy in entirely-separate code. We’re removing AlignedStorage{,2} so these classes can’t be misused this way. (The former has just been removed from the tree — the latter has many more users and will be harder to kill.) It’s possible to use them correctly, but misuse is too easy to leave these loaded guns in the tree.

If it’s truly necessary to hand-roll a solution, you should hand-roll all of it, all the way down to the buffer of unsigned char with an alignas() attribute. Writing this correctly this is expert-level C++. But it was expert-level C++ even with the aligned-storage helper. You should have to know what you’re doing — and you shouldn’t need the std::aligned_storage crutch to do it.

Moreover, I hope that the extra complexity of hand-rolling discourages non-expert reviewers from reviewing such code. I’d feel significantly more confident Mozilla won’t repeat this mistake if I knew that every use of (for example) alignas were reviewed by (say) froydnj or me. Perhaps we can get some Mercurial tooling in place to enforce a review requirement along those lines.

In the meantime, I hope I’ve made the C++ developers out there who read this, at least somewhat aware of this pitfall, and at best competent to avoid it.


mozilla/PodOperations.h: functions for zeroing, assigning to, copying, and comparing plain old data objects

Tags: , , , , , , , — Jeff @ 13:20

Recently I introduced the new header mozilla/PodOperations.h to mfbt, moving its contents out of SpiderMonkey so for general use. This header makes various operations on memory for objects easier and safer.

The problem

Often in C or C++ one might want to set the contents of an object to zero — perhaps to initialize it:

mMetrics = new gfxFont::Metrics;
::memset(mMetrics, 0, sizeof(*mMetrics));

Or perhaps the same might need to be done for a range of objects:

memset(mTreeData.Elements(), 0, mTreeData.Length() * sizeof(mTreeData[0]));

Or perhaps one might want to set the contents of an object to those of another object:

memcpy(&e, buf, sizeof(e));

Or perhaps a range of objects must be copied:

memcpy(to + aOffset, aBuffer, aLength * sizeof(PRUnichar));

Or perhaps a range of objects must be memory-equivalence-compared:

return memcmp(k->chars(), l->chars(), k->length() * sizeof(jschar)) == 0;

What do all these cases have in common? They all require using a sizeof() operation.

The problem

C and C++, as low-level languages very much focused on the actual memory, place great importance in the size of an object. Programmers often think much less about sizes. It’s pretty easy to write code without having to think about memory. But some cases require it, and because it doesn’t happen regularly, it’s easy to make mistakes. Even experienced programmers can screw it up if they don’t think carefully.

This is particularly likely for operations on arrays of objects. If the object’s size isn’t 1, forgetting a sizeof means an array of objects might not be completely cleared, copied, or compared. This has led to Mozilla security bugs in the past. (Although, the best I can find now is bug 688877, which doesn’t use quite the same operations, and can’t be solved with these methods, but which demonstrates the same sort of issue.)

The solution

Using the prodigious magic of C++ templates, the new mfbt/PodOperations.h abstracts away the sizeof in all the above examples, implements bounds-checking assertions as appropriate, and is type-safe (doesn’t require implicit casts to void*).

  • Zeroing
    • PodZero(T* t): set the contents of *t to 0
    • PodZero(T* t, size_t count): set the contents of count elements starting at t to 0
    • PodArrayZero(T (&t)[N]): set the contents of the array t (with a compile-time size) to 0
  • Assigning
    • PodAssign(T* dst, const T* src): set the contents of *dst to *src — locations can’t overlap (no self-assignments)
  • Copying
    • PodCopy(T* dst, const T* src, size_t count): copy count elements starting at src to dst — ranges can’t overlap
  • Comparison
    • PodEqual(const T* one, const T* two, size_t len): true or false indicating whether len elements at one are memory-identical to len elements at two

Random questions

Why “Pod”?

POD is a C++ term of art abbreviation for “plain old data”. A type that’s plain old data is, roughly: a built-in type; a pointer or enum that’s represented like a built-in type; a user-defined class without any virtual methods or inheritance or user-defined constructors or destructors (including in any of its base classes), whose non-static members are themselves plain old data; or an array of a type that’s plain old data. (There are a couple other restrictions that don’t matter here and would take too long to explain anyway.)

One implication of a type being POD is that (systemic interactions aside) you can copy an object of that type using memcpy. The file and method names simply play on that. Arguably it’s not the best, clearest term in the world — especially as these methods aren’t restricted to POD types. (One intended use is for initializing classes that are non-POD, where the initial state is fully-zeroed.) But it roughly gets the job done, no better names quickly spring to mind, and renaming would have been pain without much gain.

What are all these “optimizations” in these methods?

When these operations were added to SpiderMonkey a few years ago, various people (principally Luke, if I remember right) benchmarked these operations when used in various places in SpiderMonkey. It turned out that “trivial” uses of memcmp, &c. wouldn’t always be optimally compiled by the compiler to fast, SIMD-optimizable loops. Thus we introduced special cases. Newer compilers may do better, such that we have less need for the optimizations. But the worst that happens with them is slightly slower code — not correctness bugs. If you have real-world data (inchoate fears don’t count 🙂 ) showing these optimizations aren’t needed now, file a bug and we can adapt them as needed.