27.02.17

A pitfall in C++ low-level object creation and storage, and how to avoid it

While doing a couple recent reviews, I’ve read lots of code trying to solve the same general problem. Some code wants to store an object of some type (more rarely, an object from among several types), but it can’t do so yet. Some series of operations must occur first: a data structure needing to be put into the right state, or a state machine needing to transition just so. Think of this, perhaps, as the problem of having a conditionally initialized object.

Possibly-unsatisfactory approaches

One solution is to new the object at the proper time, storing nullptr prior to that instant. But this imposes the complexity of a heap allocation and handling its potential failure.

Another solution is to use mozilla::Maybe<T>. But Maybe must track whether the T object has been constructed, even if that status is already tracked elsewhere. And Maybe is potentially twice T‘s size.

The power approach

Tthe most common solution is to placement-new the object into raw storage. (This solution is particularly common in code written by especially-knowledgeable developers.) mfbt has historically provided mozilla::AlignedStorage and mozilla::AlignedStorage2 to implement such raw storage. Each class has a Type member typedef implementing suitable aligned, adequately-sized storage. Bog-standard C++11 offers similar functionality in std::aligned_storage.

Unfortunately, this approach is extremely easy to subtly misuse. And misuse occurs regularly in Mozilla code, and it can and has triggered crashes.

A detour into the C++ object model

C++ offers very fine-grained, low-level control over memory. It’s absurdly easy to access memory representations with the right casts.

But just because it’s easy to access memory representations, doesn’t mean it’s easy to do it safely. The C++ object model generally restricts you to accessing memory according to the actual type stored there at that instant. You can cast a true float* to uint32_t*, but C++ says undefined behavior — literally any behavior at all — may occur if you read from the uint32_t*. These so-called strict aliasing violations are pernicious not just because anything could happen, but because often exactly what you wanted to happen, happens. Broken code often still “works” — until it unpredictably misbehaves, in C++’s view with absolute justification.

A dragon lays waste to men attacking it
Here be dragons (CC-BY-SA, by bagogames)

There’s a big exception to the general by-actual-type rule: the memcpy exception. (Technically it’s a handful of rules that, in concert, permit memcpy. And there are other, non-memcpy-related exceptions.) memcpy(char*, const char*, size_t) has always worked to copy C-compatible objects around without regard to types, and C++’s object model permits this by letting you safely interpret the memory of a T as chars or unsigned chars. If T is trivially copyable, then:

T t1; 
char buf[sizeof(T)];
memcpy(buf, &t1, sizeof(T)); // stash bytes away
// ...time elapses during execution...
memcpy(&t1, buf, sizeof(T)); // restore them
// t1 safely has its original value

You can safely copy a T by the aforementioned character types elsewhere, then back into a T, and things will work. And second:

T t1, t2;
memcpy(&t1, &t2, sizeof(T)); // t1 safely has t2's value

You can safely copy a T into another T, and the second T will have the first T‘s value.

A C++-compatible placement-new approach

The placement-new-into-storage approach looks like this. (Real code would almost always use something more interesting than double, but this is the gist of it.)

#include <new> // for placement new

struct ContainsLazyDouble
{
    // Careful: align the storage consistent with the type it'll store.
    alignas(double) char lazyData[sizeof(double)];
    bool hasDouble_;

    // Indirection through these functions, rather than directly casting
    // lazyData to double*, evades a buggy GCC -Wstrict-aliasing warning.
    void* data() { return lazyData; }
    const void* data() const { return lazyData; }

  public:
    ContainsLazyDouble() : hasDouble_(false) {}

    void init(double d) {
      new (data()) double(d);
      hasDouble_ = true;
    }

    bool hasDouble() const { return hasDouble_; }
    double d() const {
      return *reinterpret_cast<const double*>(data());
    }
};

ContainsLazyDouble c;
// c.d(); // BAD, not initialized as double
c.init(3.141592654);
c.d(); // OK

This is safe. c.lazyData was originally char data, but we allocated a new double there, so henceforth that memory contains a double (even though it wasn’t declared as double), not chars (even though it was declared that way). The actual type stored in lazyData at that instant is properly respected.

A C++-incompatible extension of the placement-new approach

It’s safe to copy a T by the aforementioned character types. But it’s only safe to do so if 1) T is trivially copyable, and 2) the copied bytes are interpreted as T only within an actual T object. Not into a location to be (re)interpreted as T, but into a T. It’s unsafe to copy a T into a location that doesn’t at that instant contain a T, then reinterpret it as T.

So what happens if we use ContainsLazyDouble‘s implicitly-defined default copy constructor?

ContainsLazyDouble c2 = c;

This default copy constructor copies ContainsLazyDouble member by member, according to their declared types. So c.lazyData is copied as a char array that contains the object representation of c.d(). c2.lazyData therefore contains the same char array. But it doesn’t contain an actual double. It doesn’t matter that those chars encode a double: according to C++, that location does not contain a double.

Dereferencing reinterpret_cast<const double*>(data()) therefore mis-accesses an array of chars by the wrong type, triggering undefined behavior. c2.d() might seem to work if you’re lucky, but C++ doesn’t say it must work.

This is extraordinarily subtle. SpiderMonkey hackers missed this issue in their code until bug 1269319 was debugged and a (partly invalid, on other grounds) GCC compiler bug was filed. Even (more or less) understanding the spec intricacy, I missed this issue in some of the patchwork purporting to fix that bug. (Bug 1341951 provides an actual fix for one of these remaining issues.) Another SpiderMonkey hacker almost introduced another instance of this bug; fortunately I was reviewing the patch and flagged this issue.

Using AlignedStorage<N>::Type, AlignedStorage2<T>::Type, or std::aligned_storage<N>::type doesn’t avoid this problem. We mitigated the problem by deleting AlignedStorage{,2}::Type‘s copy constructors and assignment operators that would always do actual-type-unaware initialization. (Of course we can’t modify std::aligned_storage.) But only careful scrutiny prevents other code from memcpying those types. And memcpy will copy without respecting the actual type stored there at that instant, too. And in practice, developers do try to use memcpy for this when copy construction and assignment are forbidden, and reviewers can miss it.

What’s the solution to this problem?

As long as memcpy and memmove exist, this very subtle issue can’t be eradicated. There is no silver bullet.

The best solution is don’t hand-roll raw storage. This problem doesn’t exist in Maybe, mozilla::Variant, mozilla::MaybeOneOf, mozilla::Vector, and other utility classes designed to possibly hold a value. (Sometimes because we just fixed them.)

But if you must hand-roll a solution, construct an object of the actual type into your raw storage. It isn’t enough to copy the bytes of an object of the actual type into raw storage, then treat the storage as that actual type. For example, in ContainsLazyDouble, a correct copy constructor that respects C++ strict aliasing rules would be:

#include <string.h> // for memcpy

// Add this to ContainsLazyDouble:
ContainsLazyDouble(const ContainsLazyDouble& other)
  : hasDouble_(other.hasDouble_)
{
  if (hasDouble_)
  {
    // The only way to allocate a free-floating T, is to
    // placement-new it, usually also invoking T's copy
    // constructor.
    new (data()) double(other.d());

    // This would also be valid, if almost pointlessly bizarre — but only
    // because double is trivially copyable.  (It wouldn't be safe
    // to do this with a type with a user-defined copy constructor, or
    // virtual functions, or that had to do anything at all to initialize
    // the new object.)
    new (data()) double; // creates an uninitialized double
    memcpy(lazyData, other.lazyData, sizeof(lazyData)); // sets to other.d()
  }
}

// ...and this to the using code:
ContainsLazyDouble c2 = c; // invokes the now-safe copy constructor

The implicitly-generated copy assignment operator will usually require similar changes — or it can be = deleted.

Final considerations

AlignedStorage seems like a good idea. But it’s extremely easy to run afoul of a copy operation that doesn’t preserve the actual object type, by the default copy constructor or assignment operator or by memcpy in entirely-separate code. We’re removing AlignedStorage{,2} so these classes can’t be misused this way. (The former has just been removed from the tree — the latter has many more users and will be harder to kill.) It’s possible to use them correctly, but misuse is too easy to leave these loaded guns in the tree.

If it’s truly necessary to hand-roll a solution, you should hand-roll all of it, all the way down to the buffer of unsigned char with an alignas() attribute. Writing this correctly this is expert-level C++. But it was expert-level C++ even with the aligned-storage helper. You should have to know what you’re doing — and you shouldn’t need the std::aligned_storage crutch to do it.

Moreover, I hope that the extra complexity of hand-rolling discourages non-expert reviewers from reviewing such code. I’d feel significantly more confident Mozilla won’t repeat this mistake if I knew that every use of (for example) alignas were reviewed by (say) froydnj or me. Perhaps we can get some Mercurial tooling in place to enforce a review requirement along those lines.

In the meantime, I hope I’ve made the C++ developers out there who read this, at least somewhat aware of this pitfall, and at best competent to avoid it.

14 Comments »

  1. Madness! What an insane footgun.

    Comment by Robert O'Callahan — 27.02.17 @ 15:11

  2. It’s madness, of a sort. The real problem is it’s not clear how you could eliminate it.

    The aim of the C++ object model is to enable writes of distinct types to be safely reordered for optimization purposes, and similar such things. Operations on unrelated types don’t affect each other. But if we suppose that this char-write into char actually can affect other types, with no indication to C++ of the existence of such other type, no type is safe from overwriting, in the presence of a char write. When C++ requires a placement-new to notify the compiler, that’s an explicit signal that the compiler can use as input when performing alias analysis, and in knowing what might really be in that memory at any instant. Without that signal, there’s nothing to distinguish any char copy of a series of bytes, from a modification to any type in the program. And then everything must be pessimized, and performance goes in the tank.

    We probably wouldn’t be in such a bad situation if there had been (in C originally, ha, ha, ha) better-typed replacements for memcpy and friends, that informed the compiler what was really being done. And a bunch of helper classes for conditionally-initialized objects of various sorts, that would handle notifying the compiler themselves. Which is a long way of saying the C++ situation is tractable in any particular examined situation, but hopeless when you have to look across the board and at how many places might unwittingly violate the rules.

    Comment by Jeff — 27.02.17 @ 15:41

  3. I thought that char* pointers did in fact implicitly alias anything, but now I see that’s C-specific and C++ is different here. Wow.

    I wonder how many C/C++ developers know that.

    I wonder how anyone can actually be expected to know enough to use C++ safely.

    Comment by Robert O'Callahan — 27.02.17 @ 19:23

  4. Am I correct in understanding that the main problem in your example is that the constructor/destructor are never called? Or am I missing something?

    Also, wth is the purpose of AlignedStorage/aligned_storage in the first place? I have read the doc and it’s not clear to me that it’s useful for anything other than implementing `Vector`.

    Comment by Yoric — 28.02.17 @ 06:56

  5. Is the “memcpy exception” somehow specified in the standard, or is it a de-facto standard only?

    Regarding strict-aliasing, I thought that every C++ shop, including Mozilla, compiled its C++ code with -fno-strict-aliasing? Did that change recently at Mozilla? (I’ve been away for a couple of years).

    Comment by Benoit Jacob — 28.02.17 @ 08:25

  6. Some parts of Mozilla compile with -fno-strict-aliasing, while others compile with strict aliasing enabled. I think it’s module-by-module. JS specifically has strict aliasing on for performance reasons, if memory serves. (Although I think this has varied over time.)

    Comment by Jeff — 28.02.17 @ 09:06

  7. I see. Is the performance gain really worth the footgun? I see the present blog post as evidence that the bar needs to be very high, for it to be worth dealing with strict aliasing!

    Comment by Benoit Jacob — 28.02.17 @ 09:09

  8. Note: I’m very interested in an answer to my “memcpy” question! I’ve heard allusions to such an exception many times, and in those codebases that build with strict aliasing it is often claimed to be the “right” way to do type-punning. Is there some spec language to back that up?

    Comment by Benoit Jacob — 28.02.17 @ 09:10

  9. The main problem in the example is that when C++ implicitly generates the copy constructor and copy assignment operator (and similar for moves, if necessary), it respects the declared types of fields. It does not copy over any user-defined type semantics of what those fields contain, when such contradict the declared type of a field. Explicit definitions of these functions are required to manually transfer user-defined type semantics.

    The purpose of AlignedStorage/aligned_storage is basically just to implement scratchpad space to use to manually store some type of value, without an actual variable of that type being used. (Typically because the type can’t or shouldn’t be default-constructed quite so early.) As long as it’s not embedded in a structure that’s ever copied or moved, it’s largely safe. But once you copy/move it, you have to be extra-careful how you do so. And C++ does you no favors by implicitly generating for you, broken versions of these functions — because it doesn’t and can’t know about your outside-the-type-system repurposing.

    Comment by Jeff — 28.02.17 @ 09:14

  10. Woops, forgot to answer the memcpy bit. The answer is that every bit of the exception is standardized and specified. The name itself, on the other hand, is a pure colloquialism that doesn’t appear in the spec. (On the other hand, the two examples I give where C++ lets you safely memcpy a T with defined semantics come almost directly from the spec, in C++11 [basic.types]p2-3.)

    And yes, memcpy is the spec-correct and proper foundation for implementing type-punning. (Subject, of course, to presumptions about how the compiler’s chosen memory representations of the types.) A number of compilers do, however, say that writing into a union with one type and reading out with another, as a matter of implementation-defined behavior, will implement correct type punning.

    Comment by Jeff — 28.02.17 @ 09:20

  11. I personally tend to the view that strict aliasing is not wholly unreasonable, and that performance wins are worth some extra effort. In a world where C++ is the existing implementation language, rather than raging against the system or deliberately writing code that depends on implementation-specific frobs that don’t necessarily exist in every compiler, I would prefer to sink time and effort into a more stringent system of reviews, with more-knowledgeable experts examining code that does unusual things in the type system. Which is why I suggest in the end enforcing review by a smaller set of people who actually understand what C++ does here, in the final section.

    But no, as far as perf numbers go, I don’t have any at hand.

    Comment by Jeff — 28.02.17 @ 09:26

  12. Thanks for the answer! I need some more hand-holding if you have more time.
    I found the C++11 standard here (is that it?)
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf
    In the basic.types section, 3.9. on page 73, in paragraphs 2-3, it actually does mention memcpy by name, and:

    – in paragraph 2, it specifies that punning to unsigned-char is well-defined.

    – in paragraph 3, it specifies that memcpy’ing between two objects of the same type, is well-defined.

    But where does it specify that it is well-defined to memcpy-pun between two different types, neither of which is unsigned char?

    Or are you saying that unsigned char is all you need, and is all you use in the Mozilla helpers that you mention (Maybe, AlignedStorage, etc) ?

    Comment by Benoit Jacob — 28.02.17 @ 09:37

  13. Technically C++11 the standard isn’t publicly available at all — you have to pay $megabucks to buy it. But spec drafts from just before and after the official C++11 are available, that are as a practical matter entirely usable as if they were the real thing. I personally use the N3337 draft when I refer to C++11, as linked from this helpful Stack Overflow question. (C++14 does supersede it, and perhaps I should be consulting that or C++17 instead. But old habits die hard, and odds are newer versions haven’t messed with the object model in significant ways.)

    I’m not entirely confident of my analysis of the spec, to answer with 100% certainty where memcpy is definitely permitted for punning. I’ve read analyses (…somewhere, sometime) that suggested this to be true, and I think cited enough spec text to be persuasive, but I don’t remember their details. My best sense of the first principles, however, is that C++11 [basic.types]p4 is the answer:

    The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.

    At least as to trivially copyable types, the object representation of a thing is accessible by its unsigned chars. You can always copy those to the side, and they become whatever the type of their destination is. (While retaining the same object/value representation.) The representation is implementation defined. If you copy it into the location of some other trivially copyable different type, that location retains its meaning as that other type, but it now has whatever implementation defined semantics are ascribed to a value of that type, having that value representation.

    Or at least that’s my sense of how type punning by memcpy can be justified using the spec.

    Comment by Jeff — 28.02.17 @ 09:59

  14. unsigned char plus the ability to demand particular alignment (which can be assured even without alignas, if at some cost in memory used) is all you need for conditionally-initialized objects. Then you just have to be careful how you copy those bits around (and that they’re properly destroyed if necessary, &c.). The various Mozilla helpers just get the technique right, and when they implement copyability/assignability/movability, correctly manually create objects of type T where needed, not relying on pure copying around of value representations to instantiate objects of type T in those fresh locations. (Maybe is an especially illustrative case, I think — copy-construction or move-construction of the conditionally-present value into the new location if necessary, and no copying if not, and all that, with careful transfer of values in ways consonant with the C++ object model.)

    Punning the bits of a value of one type into another type requires that p4 bit, I think, and is a bit distinct from this post’s thrust.

    Comment by Jeff — 28.02.17 @ 10:09

RSS feed for comments on this post. TrackBack URI

Leave a comment

HTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>