While doing a couple of recent reviews, I’ve read lots of code trying to solve the same general problem. Some code wants to store an object of some type (more rarely, an object from among several types), but it can’t do so yet. Some series of operations must occur first: a data structure must be put into the right state, or a state machine must transition just so. Think of this, perhaps, as the problem of having a conditionally initialized object.
Possibly-unsatisfactory approaches
One solution is to new the object at the proper time, storing nullptr prior to that instant. But this imposes the complexity of a heap allocation and handling its potential failure.
Another solution is to use mozilla::Maybe<T>. But Maybe must track whether the T object has been constructed, even if that status is already tracked elsewhere. And Maybe is potentially twice T’s size.
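To make the size cost concrete, here is a minimal sketch that measures it. It uses std::optional (C++17) as a stand-in for mozilla::Maybe, which is an assumption made only so the example is self-contained; the helper name flagOverhead is hypothetical.

```cpp
#include <cstddef>
#include <optional>

// std::optional stands in for mozilla::Maybe here (an assumption for this
// sketch). flagOverhead is a hypothetical helper, not part of any library.
// It reports how many extra bytes the wrapper needs beyond the wrapped T:
// the "is it constructed?" flag plus whatever padding alignment forces.
template <typename T>
constexpr std::size_t flagOverhead() {
  return sizeof(std::optional<T>) - sizeof(T);
}
```

For a type like double, the flag itself needs only a byte, but alignment typically pads the wrapper out to a multiple of alignof(double), which is how the wrapper can end up twice the size of the value it holds.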
The power approach
The most common solution is to placement-new the object into raw storage. (This solution is particularly common in code written by especially-knowledgeable developers.) mfbt has historically provided mozilla::AlignedStorage and mozilla::AlignedStorage2 to implement such raw storage. Each class has a Type member typedef implementing suitably aligned, adequately-sized storage. Bog-standard C++11 offers similar functionality in std::aligned_storage.
Unfortunately, this approach is extremely easy to subtly misuse. Misuse occurs regularly in Mozilla code, and it has triggered crashes.
A detour into the C++ object model
C++ offers very fine-grained, low-level control over memory. It’s absurdly easy to access memory representations with the right casts.
But just because it’s easy to access memory representations doesn’t mean it’s easy to do it safely. The C++ object model generally restricts you to accessing memory according to the actual type stored there at that instant. You can cast a true float* to uint32_t*, but C++ says undefined behavior (literally any behavior at all) may occur if you read from the uint32_t*. These so-called strict aliasing violations are pernicious not just because anything could happen, but because often exactly what you wanted to happen, happens. Broken code often still “works”, until it unpredictably misbehaves, in C++’s view with absolute justification.
There’s a big exception to the general by-actual-type rule: the memcpy exception. (Technically it’s a handful of rules that, in concert, permit memcpy. And there are other, non-memcpy-related exceptions.) memcpy(char*, const char*, size_t) has always worked to copy C-compatible objects around without regard to types, and C++’s object model permits this by letting you safely interpret the memory of a T as chars or unsigned chars. If T is trivially copyable, then two things hold. First:
    T t1;
    char buf[sizeof(T)];
    memcpy(buf, &t1, sizeof(T)); // stash bytes away
    // ...time elapses during execution...
    memcpy(&t1, buf, sizeof(T)); // restore them
    // t1 safely has its original value
You can safely copy a T by the aforementioned character types elsewhere, then back into a T, and things will work. And second:
    T t1, t2;
    memcpy(&t1, &t2, sizeof(T)); // t1 safely has t2's value
You can safely copy a T into another T, and the second T will have the first T’s value.
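Both guarantees can be sketched with a concrete type. Point, roundTrip, and copyInto are hypothetical names introduced for illustration; the static_assert spells out the trivially-copyable precondition that both guarantees depend on.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type used only for this sketch.
struct Point {
  int x;
  int y;
};
static_assert(std::is_trivially_copyable<Point>::value,
              "both memcpy guarantees require a trivially copyable type");

// First guarantee: stash the bytes in a character buffer, then restore
// them into an actual Point object.
Point roundTrip(Point original) {
  unsigned char buf[sizeof(Point)];
  std::memcpy(buf, &original, sizeof(Point));
  Point restored{0, 0};  // a real Point exists before the bytes arrive
  std::memcpy(&restored, buf, sizeof(Point));
  return restored;
}

// Second guarantee: copy one Point's bytes directly into another Point.
Point copyInto(const Point& source) {
  Point target{0, 0};
  std::memcpy(&target, &source, sizeof(Point));
  return target;
}
```

Note that in both functions the destination of the final memcpy is an object that was already a Point. That detail is what the rest of this post turns on.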
A C++-compatible placement-new approach
The placement-new-into-storage approach looks like this. (Real code would almost always use something more interesting than double, but this is the gist of it.)
    #include <new> // for placement new

    struct ContainsLazyDouble
    {
        // Careful: align the storage consistent with the type it'll store.
        alignas(double) char lazyData[sizeof(double)];
        bool hasDouble_;

        // Indirection through these functions, rather than directly casting
        // lazyData to double*, evades a buggy GCC -Wstrict-aliasing warning.
        void* data() { return lazyData; }
        const void* data() const { return lazyData; }

      public:
        ContainsLazyDouble() : hasDouble_(false) {}

        void init(double d) {
            new (data()) double(d);
            hasDouble_ = true;
        }

        bool hasDouble() const { return hasDouble_; }
        double d() const {
            return *reinterpret_cast<const double*>(data());
        }
    };

    ContainsLazyDouble c;
    // c.d(); // BAD, not initialized as double
    c.init(3.141592654);
    c.d(); // OK
This is safe. c.lazyData was originally char data, but we allocated a new double there, so henceforth that memory contains a double (even though it wasn’t declared as double), not chars (even though it was declared that way). The actual type stored in lazyData at that instant is properly respected.
A C++-incompatible extension of the placement-new approach
It’s safe to copy a T by the aforementioned character types. But it’s only safe to do so if 1) T is trivially copyable, and 2) the copied bytes are interpreted as T only within an actual T object. Not into a location to be (re)interpreted as T, but into a T. It’s unsafe to copy a T into a location that doesn’t at that instant contain a T, then reinterpret it as T.
So what happens if we use ContainsLazyDouble’s implicitly-defined default copy constructor?
ContainsLazyDouble c2 = c;
This default copy constructor copies ContainsLazyDouble member by member, according to their declared types. So c.lazyData is copied as a char array that contains the object representation of c.d(). c2.lazyData therefore contains the same char array. But it doesn’t contain an actual double. It doesn’t matter that those chars encode a double: according to C++, that location does not contain a double.
Dereferencing reinterpret_cast<const double*>(data()) therefore mis-accesses an array of chars by the wrong type, triggering undefined behavior. c2.d() might seem to work if you’re lucky, but C++ doesn’t say it must work.
This is extraordinarily subtle. SpiderMonkey hackers missed this issue in their code until bug 1269319 was debugged and a (partly invalid, on other grounds) GCC compiler bug was filed. Even (more or less) understanding the spec intricacy, I missed this issue in some of the patchwork purporting to fix that bug. (Bug 1341951 provides an actual fix for one of these remaining issues.) Another SpiderMonkey hacker almost introduced another instance of this bug; fortunately I was reviewing the patch and flagged this issue.
Using AlignedStorage<N>::Type, AlignedStorage2<T>::Type, or std::aligned_storage<N>::type doesn’t avoid this problem. We mitigated the problem by deleting AlignedStorage{,2}::Type’s copy constructors and assignment operators that would always do actual-type-unaware initialization. (Of course we can’t modify std::aligned_storage.) But only careful scrutiny prevents other code from memcpying those types. And memcpy will copy without respecting the actual type stored there at that instant, too. And in practice, developers do try to use memcpy for this when copy construction and assignment are forbidden, and reviewers can miss it.
What’s the solution to this problem?
As long as memcpy and memmove exist, this very subtle issue can’t be eradicated. There is no silver bullet.
The best solution is: don’t hand-roll raw storage. This problem doesn’t exist in Maybe, mozilla::Variant, mozilla::MaybeOneOf, mozilla::Vector, and other utility classes designed to possibly hold a value. (Sometimes because we just fixed them.)
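As a rough illustration of that advice, here is the lazily initialized double again, this time delegating storage to a library wrapper. std::optional (C++17) stands in for mozilla::Maybe, and the class name LazyDouble is hypothetical; both are assumptions made so the sketch compiles on its own.

```cpp
#include <cassert>
#include <optional>

// Sketch only: std::optional stands in for mozilla::Maybe, and LazyDouble
// is a hypothetical name. No raw storage, no casts, no placement-new.
class LazyDouble {
  std::optional<double> value_;

 public:
  void init(double d) { value_.emplace(d); }

  bool hasDouble() const { return value_.has_value(); }

  double d() const {
    assert(value_.has_value());
    return *value_;
  }
};
```

The implicitly-defined copy constructor and assignment operator are correct here, because the wrapper's own copy operations construct a real double in the destination rather than copying bytes around.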
But if you must hand-roll a solution, construct an object of the actual type into your raw storage. It isn’t enough to copy the bytes of an object of the actual type into raw storage, then treat the storage as that actual type. For example, in ContainsLazyDouble, a correct copy constructor that respects C++ strict aliasing rules would be:
    #include <string.h> // for memcpy

    // Add this to ContainsLazyDouble:
    ContainsLazyDouble(const ContainsLazyDouble& other)
      : hasDouble_(other.hasDouble_)
    {
        if (hasDouble_) {
            // The only way to allocate a free-floating T is to
            // placement-new it, usually also invoking T's copy
            // constructor.
            new (data()) double(other.d());

            // This would also be valid, if almost pointlessly bizarre, but only
            // because double is trivially copyable. (It wouldn't be safe
            // to do this with a type with a user-defined copy constructor, or
            // virtual functions, or that had to do anything at all to initialize
            // the new object.)
            new (data()) double; // creates an uninitialized double
            memcpy(lazyData, other.lazyData, sizeof(lazyData)); // sets to other.d()
        }
    }

    // ...and this to the using code:
    ContainsLazyDouble c2 = c; // invokes the now-safe copy constructor
The implicitly-generated copy assignment operator will usually require similar changes, or it can be = deleted.
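To show what those similar changes might look like, here is a sketch of a hand-rolled holder for a non-trivial type, with copy construction, copy assignment, and destruction all going through real objects. LazyString and its members are hypothetical names. The sketch assumes C++17, using std::launder and std::destroy_at, neither of which appears in the code above.

```cpp
#include <cassert>
#include <memory>  // for std::destroy_at
#include <new>     // for placement new and std::launder
#include <string>

// Hypothetical example class; a sketch, not a vetted implementation.
class LazyString {
  alignas(std::string) unsigned char storage_[sizeof(std::string)];
  bool hasValue_ = false;

  // Only call these while a std::string actually lives in storage_.
  std::string* ptr() {
    return std::launder(reinterpret_cast<std::string*>(storage_));
  }
  const std::string* ptr() const {
    return std::launder(reinterpret_cast<const std::string*>(storage_));
  }

 public:
  LazyString() = default;

  // Copy by constructing a real std::string, never by copying bytes.
  LazyString(const LazyString& other) {
    if (other.hasValue_) {
      new (storage_) std::string(other.get());
      hasValue_ = true;
    }
  }

  // Destroy any current value, then construct a fresh one from other.
  LazyString& operator=(const LazyString& other) {
    if (this != &other) {
      reset();
      if (other.hasValue_) {
        new (storage_) std::string(other.get());
        hasValue_ = true;
      }
    }
    return *this;
  }

  ~LazyString() { reset(); }

  void init(const std::string& s) {
    assert(!hasValue_);
    new (storage_) std::string(s);
    hasValue_ = true;
  }

  // Runs the destructor of the stored std::string, if there is one.
  void reset() {
    if (hasValue_) {
      std::destroy_at(ptr());
      hasValue_ = false;
    }
  }

  bool hasValue() const { return hasValue_; }

  const std::string& get() const {
    assert(hasValue_);
    return *ptr();
  }
};
```

Every path that puts a value into storage_ goes through placement-new, so the storage only ever holds a std::string that was actually constructed there. The amount of code needed to get even this simple case right is a fair argument for preferring the library wrappers.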
Final considerations
AlignedStorage seems like a good idea. But it’s extremely easy to run afoul of a copy operation that doesn’t preserve the actual object type, by the default copy constructor or assignment operator or by memcpy in entirely-separate code. We’re removing AlignedStorage{,2} so these classes can’t be misused this way. (The former has just been removed from the tree; the latter has many more users and will be harder to kill.) It’s possible to use them correctly, but misuse is too easy to leave these loaded guns in the tree.
If it’s truly necessary to hand-roll a solution, you should hand-roll all of it, all the way down to the buffer of unsigned char with an alignas() attribute. Writing this correctly is expert-level C++. But it was expert-level C++ even with the aligned-storage helper. You should have to know what you’re doing, and you shouldn’t need the std::aligned_storage crutch to do it.
Moreover, I hope that the extra complexity of hand-rolling discourages non-expert reviewers from reviewing such code. I’d feel significantly more confident Mozilla won’t repeat this mistake if I knew that every use of (for example) alignas were reviewed by (say) froydnj or me. Perhaps we can get some Mercurial tooling in place to enforce a review requirement along those lines.
In the meantime, I hope I’ve made the C++ developers who read this at least somewhat aware of this pitfall, and at best competent to avoid it.