While doing a couple recent reviews, I’ve read lots of code trying to solve the same general problem. Some code wants to store an object of some type (more rarely, an object from among several types), but it can’t do so yet. Some series of operations must occur first: a data structure needing to be put into the right state, or a state machine needing to transition just so. Think of this, perhaps, as the problem of having a conditionally initialized object.
Possibly-unsatisfactory approaches
One solution is to new the object at the proper time, storing nullptr prior to that instant. But this imposes the complexity of a heap allocation and handling its potential failure.
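A minimal sketch of this first approach, assuming a hypothetical Widget type and Holder class (neither comes from Mozilla code). The pointer stays null until the object can be built, and every caller of init() has to cope with allocation failure.

```cpp
#include <memory>  // for std::unique_ptr
#include <new>     // for std::nothrow

// Hypothetical payload type, used only for illustration.
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
};

// Hypothetical holder: the Widget lives on the heap, and the pointer is
// null until init() succeeds.
class Holder {
    std::unique_ptr<Widget> widget_;  // nullptr means "not yet constructed"

  public:
    bool hasWidget() const { return widget_ != nullptr; }

    // Returns false if the allocation failed, which callers must handle.
    bool init(int v) {
        widget_.reset(new (std::nothrow) Widget(v));
        return widget_ != nullptr;
    }

    // Precondition: hasWidget() is true.
    const Widget& widget() const { return *widget_; }
};
```

The extra failure path in init() is the complexity the paragraph above refers to.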
Another solution is to use mozilla::Maybe<T>. But Maybe must track whether the T object has been constructed, and that status might already be tracked elsewhere. And Maybe is potentially twice T’s size.
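mozilla::Maybe is not available outside the Mozilla tree, so this sketch uses C++17's std::optional as a stand-in (an assumption: the two are similar in spirit, not identical). It illustrates both complaints: the wrapper is strictly larger than the value it wraps, and its internal flag duplicates state that the hypothetical Tracker class already records.

```cpp
#include <optional>

// std::optional<double> stores a double plus a "has value" flag, so it
// must be strictly larger than a bare double. Alignment padding can
// round that up to twice the size.
static_assert(sizeof(std::optional<double>) > sizeof(double),
              "the wrapper pays for tracking whether a value exists");

// Hypothetical class that already tracks its state elsewhere (an enum),
// so the flag inside the optional is redundant information.
enum class State { Waiting, Ready };

struct Tracker {
    State state = State::Waiting;
    std::optional<double> value;  // also knows whether it holds a value

    void makeReady(double d) {
        value.emplace(d);
        state = State::Ready;
    }
};
```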
The power approach
The most common solution is to placement-new the object into raw storage. (This solution is particularly common in code written by especially-knowledgeable developers.) mfbt has historically provided mozilla::AlignedStorage and mozilla::AlignedStorage2 to implement such raw storage. Each class has a Type member typedef implementing suitably aligned, adequately-sized storage. Bog-standard C++11 offers similar functionality in std::aligned_storage.
Unfortunately, this approach is extremely easy to subtly misuse. Misuse occurs regularly in Mozilla code, and it can trigger, and has triggered, crashes.
A detour into the C++ object model
C++ offers very fine-grained, low-level control over memory. It’s absurdly easy to access memory representations with the right casts.
But just because it’s easy to access memory representations doesn’t mean it’s easy to do it safely. The C++ object model generally restricts you to accessing memory according to the actual type stored there at that instant. You can cast a true float* to uint32_t*, but C++ says undefined behavior (literally any behavior at all) may occur if you read from the uint32_t*. These so-called strict aliasing violations are pernicious not just because anything could happen, but because often exactly what you wanted to happen, happens. Broken code often still “works”, until it unpredictably misbehaves, in C++’s view with absolute justification.
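To make the contrast concrete, here is a small sketch with hypothetical helper names. The commented-out function is the kind of cast-and-read that C++ leaves undefined. The second function gets at the same bits by copying bytes into a real uint32_t object, which is well defined. The sketch assumes a 32-bit float, which it checks.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Undefined behavior: reads a float object through a uint32_t lvalue.
// It often appears to work, which is exactly what makes it dangerous.
//
// std::uint32_t bitsViaCast(const float& f) {
//     return *reinterpret_cast<const std::uint32_t*>(&f);
// }

// Well defined: copy the float's bytes into an actual uint32_t object.
std::uint32_t bitsViaMemcpy(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On a platform with IEEE 754 single-precision floats (another assumption), bitsViaMemcpy(1.0f) yields 0x3F800000.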
There’s a big exception to the general by-actual-type rule: the memcpy exception. (Technically it’s a handful of rules that, in concert, permit memcpy. And there are other, non-memcpy-related exceptions.) memcpy(char*, const char*, size_t) has always worked to copy C-compatible objects around without regard to types, and C++’s object model permits this by letting you safely interpret the memory of a T as chars or unsigned chars. If T is trivially copyable, then:
    T t1;
    char buf[sizeof(T)];
    memcpy(buf, &t1, sizeof(T)); // stash bytes away
    // ...time elapses during execution...
    memcpy(&t1, buf, sizeof(T)); // restore them
    // t1 safely has its original value

You can safely copy a T by the aforementioned character types elsewhere, then back into a T, and things will work. And second:

    T t1, t2;
    memcpy(&t1, &t2, sizeof(T)); // t1 safely has t2's value

You can safely copy a T into another T, and the second T will have the first T’s value.
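Both rules can be exercised in one small sketch. Point and the two function names are hypothetical, and the static_assert guards the precondition that both rules depend on.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type.
struct Point {
    int x;
    int y;
};

static_assert(std::is_trivially_copyable<Point>::value,
              "the memcpy rules only apply to trivially copyable types");

// Rule one: stash the bytes, let the object change, then restore the
// bytes into the same Point object.
void stashAndRestore(Point& p) {
    unsigned char buf[sizeof(Point)];
    std::memcpy(buf, &p, sizeof(Point));  // stash bytes away
    p.x = -1;                             // the object changes meanwhile
    p.y = -1;
    std::memcpy(&p, buf, sizeof(Point));  // restore them
}

// Rule two: copy one Point's bytes directly into another Point.
void copyInto(Point& dst, const Point& src) {
    std::memcpy(&dst, &src, sizeof(Point));
}
```

In both functions the destination of the final memcpy is an actual Point object, never a bare byte buffer that is later reinterpreted.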
A C++-compatible placement-new approach
The placement-new-into-storage approach looks like this. (Real code would almost always use something more interesting than double, but this is the gist of it.)
    #include <new> // for placement new

    struct ContainsLazyDouble
    {
        // Careful: align the storage consistent with the type it'll store.
        alignas(double) char lazyData[sizeof(double)];

        bool hasDouble_;

        // Indirection through these functions, rather than directly casting
        // lazyData to double*, evades a buggy GCC -Wstrict-aliasing warning.
        void* data() { return lazyData; }
        const void* data() const { return lazyData; }

      public:
        ContainsLazyDouble() : hasDouble_(false) {}

        void init(double d) {
            new (data()) double(d);
            hasDouble_ = true;
        }

        bool hasDouble() const { return hasDouble_; }
        double d() const {
            return *reinterpret_cast<const double*>(data());
        }
    };

    ContainsLazyDouble c;
    // c.d(); // BAD, not initialized as double
    c.init(3.141592654);
    c.d(); // OK
This is safe. c.lazyData was originally char data, but we allocated a new double there, so henceforth that memory contains a double (even though it wasn’t declared as double), not chars (even though it was declared that way). The actual type stored in lazyData at that instant is properly respected.
A C++-incompatible extension of the placement-new approach
It’s safe to copy a T by the aforementioned character types. But it’s only safe to do so if 1) T is trivially copyable, and 2) the copied bytes are interpreted as T only within an actual T object. Not into a location to be (re)interpreted as T, but into a T. It’s unsafe to copy a T into a location that doesn’t at that instant contain a T, then reinterpret it as T.
So what happens if we use ContainsLazyDouble’s implicitly-defined default copy constructor?

    ContainsLazyDouble c2 = c;

This default copy constructor copies ContainsLazyDouble member by member, according to their declared types. So c.lazyData is copied as a char array that contains the object representation of c.d(). c2.lazyData therefore contains the same char array. But it doesn’t contain an actual double. It doesn’t matter that those chars encode a double: according to C++, that location does not contain a double.

Dereferencing reinterpret_cast<const double*>(data()) therefore mis-accesses an array of chars by the wrong type, triggering undefined behavior. c2.d() might seem to work if you’re lucky, but C++ doesn’t say it must work.
This is extraordinarily subtle. SpiderMonkey hackers missed this issue in their code until bug 1269319 was debugged and a (partly invalid, on other grounds) GCC compiler bug was filed. Even (more or less) understanding the spec intricacy, I missed this issue in some of the patchwork purporting to fix that bug. (Bug 1341951 provides an actual fix for one of these remaining issues.) Another SpiderMonkey hacker almost introduced another instance of this bug; fortunately I was reviewing the patch and flagged this issue.
Using AlignedStorage<N>::Type, AlignedStorage2<T>::Type, or std::aligned_storage<N>::type doesn’t avoid this problem. We mitigated the problem by deleting AlignedStorage{,2}::Type’s copy constructors and assignment operators that would always do actual-type-unaware initialization. (Of course we can’t modify std::aligned_storage.) But only careful scrutiny prevents other code from memcpying those types. And memcpy will copy without respecting the actual type stored there at that instant, too. And in practice, developers do try to use memcpy for this when copy construction and assignment are forbidden, and reviewers can miss it.
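The mitigation can be sketched with a hypothetical RawStorage template. This is not mfbt's actual AlignedStorage2 code, only an illustration of the idea: deleting the copy operations turns an accidental bytewise copy into a compile error. As noted above, it cannot stop a determined memcpy.

```cpp
#include <type_traits>

// Hypothetical raw-storage helper, loosely modeled on the idea behind
// AlignedStorage2<T>. Not the real mfbt implementation.
template <typename T>
struct RawStorage {
    alignas(T) unsigned char bytes[sizeof(T)];

    RawStorage() = default;

    // Copying would duplicate bytes without creating a T, so forbid it.
    RawStorage(const RawStorage&) = delete;
    RawStorage& operator=(const RawStorage&) = delete;

    void* addr() { return bytes; }
    const void* addr() const { return bytes; }
};

static_assert(!std::is_copy_constructible<RawStorage<double>>::value,
              "copy construction is rejected at compile time");
static_assert(!std::is_copy_assignable<RawStorage<double>>::value,
              "copy assignment is rejected at compile time");
static_assert(alignof(RawStorage<double>) >= alignof(double),
              "storage is aligned for the stored type");
```

A class that embeds RawStorage<T> loses its implicit copy operations too, which forces its author to write them by hand and, ideally, to think about what they should do.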
What’s the solution to this problem?
As long as memcpy and memmove exist, this very subtle issue can’t be eradicated. There is no silver bullet.
The best solution is: don’t hand-roll raw storage. This problem doesn’t exist in Maybe, mozilla::Variant, mozilla::MaybeOneOf, mozilla::Vector, and other utility classes designed to possibly hold a value. (Sometimes because we just fixed them.)
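As an illustration, here is ContainsLazyDouble rebuilt on a library wrapper. The sketch uses C++17's std::optional as a stand-in for mozilla::Maybe (an assumption, made so the example compiles outside the Mozilla tree), and the class name is hypothetical. The compiler-generated copy operations are correct here because the wrapper constructs a real double in the destination.

```cpp
#include <optional>

// Hypothetical replacement for ContainsLazyDouble with no raw storage.
class LazyDoubleViaOptional {
    std::optional<double> value_;

  public:
    void init(double d) { value_.emplace(d); }
    bool hasDouble() const { return value_.has_value(); }

    // Precondition: hasDouble() is true.
    double d() const { return *value_; }
};
```

With this version, both `LazyDoubleViaOptional c2 = c;` and `c3 = c2;` are safe without any hand-written copy code.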
But if you must hand-roll a solution, construct an object of the actual type into your raw storage. It isn’t enough to copy the bytes of an object of the actual type into raw storage, then treat the storage as that actual type. For example, in ContainsLazyDouble, a correct copy constructor that respects C++ strict aliasing rules would be:
    #include <string.h> // for memcpy

    // Add this to ContainsLazyDouble:
    ContainsLazyDouble(const ContainsLazyDouble& other)
      : hasDouble_(other.hasDouble_)
    {
        if (hasDouble_) {
            // The only way to allocate a free-floating T, is to
            // placement-new it, usually also invoking T's copy
            // constructor.
            new (data()) double(other.d());

            // This would also be valid, if almost pointlessly bizarre, but only
            // because double is trivially copyable. (It wouldn't be safe
            // to do this with a type with a user-defined copy constructor, or
            // virtual functions, or that had to do anything at all to initialize
            // the new object.)
            new (data()) double; // creates an uninitialized double
            memcpy(lazyData, other.lazyData, sizeof(lazyData)); // sets to other.d()
        }
    }

    // ...and this to the using code:
    ContainsLazyDouble c2 = c; // invokes the now-safe copy constructor
The implicitly-generated copy assignment operator will usually require similar changes, or it can be = deleted.
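Here is one way the copy assignment operator might look, as a self-contained sketch rather than the code from any Mozilla patch. The class name SafeLazyDouble is hypothetical, the storage is unsigned char rather than char, and std::launder (C++17) is an extra precaution that the code above does not use.

```cpp
#include <new>  // for placement new and std::launder

class SafeLazyDouble {
    alignas(double) unsigned char lazyData[sizeof(double)];
    bool hasDouble_;

    // Only call these while a double actually lives in lazyData.
    double* ptr() { return std::launder(reinterpret_cast<double*>(lazyData)); }
    const double* ptr() const {
        return std::launder(reinterpret_cast<const double*>(lazyData));
    }

  public:
    SafeLazyDouble() : hasDouble_(false) {}

    SafeLazyDouble(const SafeLazyDouble& other) : hasDouble_(other.hasDouble_) {
        if (hasDouble_) {
            new (lazyData) double(other.d());  // construct a real double
        }
    }

    SafeLazyDouble& operator=(const SafeLazyDouble& other) {
        if (this == &other) return *this;
        if (other.hasDouble_) {
            if (hasDouble_) {
                *ptr() = other.d();                // a double already lives here
            } else {
                new (lazyData) double(other.d());  // create one first
            }
        }
        // double is trivially destructible, so dropping a value needs no
        // destructor call; a class type would need one here.
        hasDouble_ = other.hasDouble_;
        return *this;
    }

    void init(double d) {
        new (lazyData) double(d);
        hasDouble_ = true;
    }

    bool hasDouble() const { return hasDouble_; }

    // Precondition: hasDouble() is true.
    double d() const { return *ptr(); }
};
```

The assignment operator has to distinguish between a destination that already holds a double and one that does not, which is the kind of case analysis the implicit operator never performs.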
Final considerations
AlignedStorage seems like a good idea. But it’s extremely easy to run afoul of a copy operation that doesn’t preserve the actual object type, by the default copy constructor or assignment operator or by memcpy in entirely-separate code. We’re removing AlignedStorage{,2} so these classes can’t be misused this way. (The former has just been removed from the tree; the latter has many more users and will be harder to kill.) It’s possible to use them correctly, but misuse is too easy to leave these loaded guns in the tree.
If it’s truly necessary to hand-roll a solution, you should hand-roll all of it, all the way down to the buffer of unsigned char with an alignas() specifier. Writing this correctly is expert-level C++. But it was expert-level C++ even with the aligned-storage helper. You should have to know what you’re doing, and you shouldn’t need the std::aligned_storage crutch to do it.
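For a sense of what hand-rolling all of it involves once T is not trivially copyable, here is a sketch of a hypothetical LazyValue<T> template. It is an illustration under stated assumptions (C++17, std::launder as an added precaution, copy assignment simply deleted), not a vetted replacement for Maybe.

```cpp
#include <new>      // for placement new and std::launder
#include <string>
#include <utility>  // for std::forward

// Hypothetical hand-rolled holder for a lazily constructed T.
template <typename T>
class LazyValue {
    alignas(T) unsigned char storage_[sizeof(T)];
    bool constructed_ = false;

    // Only call these while a T actually lives in storage_.
    T* ptr() { return std::launder(reinterpret_cast<T*>(storage_)); }
    const T* ptr() const {
        return std::launder(reinterpret_cast<const T*>(storage_));
    }

  public:
    LazyValue() = default;

    // Copy by constructing a real T in our storage, never by copying bytes.
    LazyValue(const LazyValue& other) {
        if (other.constructed_) {
            new (storage_) T(*other.ptr());
            constructed_ = true;
        }
    }

    // Assignment is left out of this sketch; deleting it keeps a
    // bytewise version from ever being generated.
    LazyValue& operator=(const LazyValue&) = delete;

    ~LazyValue() {
        if (constructed_) {
            ptr()->~T();  // end the lifetime of the object we created
        }
    }

    // Precondition: constructed() is false.
    template <typename... Args>
    void init(Args&&... args) {
        new (storage_) T(std::forward<Args>(args)...);
        constructed_ = true;
    }

    bool constructed() const { return constructed_; }

    // Precondition: constructed() is true.
    const T& get() const { return *ptr(); }
};
```

A short usage example: `LazyValue<std::string> a; a.init("hello"); LazyValue<std::string> b = a;` copies by running std::string's copy constructor in b's storage. Even this small class needs a destructor, a copy constructor, and a decision about assignment.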
Moreover, I hope that the extra complexity of hand-rolling discourages non-expert reviewers from reviewing such code. I’d feel significantly more confident Mozilla won’t repeat this mistake if I knew that every use of (for example) alignas were reviewed by (say) froydnj or me. Perhaps we can get some Mercurial tooling in place to enforce a review requirement along those lines.
In the meantime, I hope I’ve made the C++ developers out there who read this, at least somewhat aware of this pitfall, and at best competent to avoid it.
Madness! What an insane footgun.
Comment by Robert O'Callahan — 27.02.17 @ 15:11
It’s madness, of a sort. The real problem is it’s not clear how you could eliminate it.
The aim of the C++ object model is to enable writes of distinct types to be safely reordered for optimization purposes, and similar such things. Operations on unrelated types don’t affect each other. But if we suppose that this char-write into char actually can affect other types, with no indication to C++ of the existence of such other type, no type is safe from overwriting, in the presence of a char write. When C++ requires a placement-new to notify the compiler, that’s an explicit signal that the compiler can use as input when performing alias analysis, and in knowing what might really be in that memory at any instant. Without that signal, there’s nothing to distinguish any char copy of a series of bytes, from a modification to any type in the program. And then everything must be pessimized, and performance goes in the tank.
We probably wouldn’t be in such a bad situation if there had been (in C originally, ha, ha, ha) better-typed replacements for memcpy and friends, that informed the compiler what was really being done. And a bunch of helper classes for conditionally-initialized objects of various sorts, that would handle notifying the compiler themselves. Which is a long way of saying the C++ situation is tractable in any particular examined situation, but hopeless when you have to look across the board and at how many places might unwittingly violate the rules.
Comment by Jeff — 27.02.17 @ 15:41