21.05.18 PSA: stop using `mozilla::PodZero` and `mozilla::PodArrayZero`

Tags: c++, c++11, compilers, list-initialization, memset, mozilla, pod, podzero — Jeff @ 13:08

I’ve blogged about surprising bits of the C++ object model before, and I’m back with more.

Executive summary: Don’t use mozilla::PodZero or mozilla::PodArrayZero. Modern C++ provides better alternatives that don’t presume that writing all zeroes will always correctly initialize the given type. Use constructors, in-class member initializers, and functions like std::fill to zero member fields.

The briefest recap of salient parts of the C++ object model

C++ as a language really wants to know when objects are created so that compilers can know that this memory contains an object of this type. Compilers can then assume that writing an object of one type won’t conflict with reads/writes of incompatible types.

double foo(double* d, int* i, int z)
{
  *d = 3.14;

  // int/double are incompatible, so this write may be
  // assumed not to change the value of *d.
  *i = z;

  // Therefore *d may be assumed to still be 3.14, so this
  // may be compiled as 3.14 * z without rereading *d.
  return *d * z;
}

You can’t use arbitrary memory as your desired type after a cast. An object of that type must have been explicitly created there: e.g. a local variable of that type must be declared there, a field of that type must be defined and the containing object created, the object must be created via new, &c.

Misinterpreting an object using an incompatible type violates the strict aliasing rules in [basic.lval]p11.

`memset`ting an object

memset lets you write characters over memory. C code routinely used this to fill an array or struct with zeroes or null pointers or similar, assuming all-zeroes writes the right value.

C++ code also sometimes uses memset to zero out an object, either after allocating its memory or in the constructor. This doesn’t create a T (you’d need to placement-new), but it often still “works”. But what if T changes to require initialization? Maybe a field in T gains a constructor (T might never be touched!) or a nonzero initializer, making T a non-trivial type. memset could hide that fresh initialization requirement or (depending when the memset happens) overwrite a necessary initialization.
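
To make the hazard concrete, here is a minimal sketch. RetryPolicy and ZeroBytes are hypothetical names invented for illustration, not Mozilla code. The type stays trivially copyable, so the memset itself is well-defined; the trouble is only that it throws away the nonzero default the type now asks for.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical type: once a member gains a nonzero default, all-zero
// bytes no longer describe a correctly initialized object.
struct RetryPolicy
{
    int maxRetries = 3; // in-class initializer added after the fact
    int attempts = 0;
};

// Roughly what a PodZero-style helper does. The void* cast also hides
// the problem from warnings such as gcc's -Wclass-memaccess.
inline void ZeroBytes(RetryPolicy* p)
{
    std::memset(static_cast<void*>(p), 0, sizeof(*p));
}

inline int RetriesAfterZeroing()
{
    RetryPolicy policy; // maxRetries == 3 here
    ZeroBytes(&policy); // silently discards that initialization
    return policy.maxRetries;
}

inline int RetriesAfterValueInit()
{
    RetryPolicy policy{}; // respects the in-class initializers
    return policy.maxRetries;
}

inline void DemoZeroing()
{
    assert(RetriesAfterZeroing() == 0);   // the default was lost
    assert(RetriesAfterValueInit() == 3); // the default survived
}
```

Nothing in the zeroing path fails to compile or crashes, which is exactly why this kind of change tends to go unnoticed.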

Problem intensifies

Unfortunately, Mozilla code has provided and promoted a PodZero function that misuses memset this way. So when I built with gcc 8.0 recently (I usually use a home-built clang), I discovered a torrent of build warnings about memset misuse on non-trivial types. A taste:

In file included from /home/jwalden/moz/after/js/src/jit/BitSet.h:12,
                 from /home/jwalden/moz/after/js/src/jit/Safepoints.h:10,
                 from /home/jwalden/moz/after/js/src/jit/JitFrames.h:13,
                 from /home/jwalden/moz/after/js/src/jit/BaselineFrame.h:10,
                 from /home/jwalden/moz/after/js/src/vm/Stack-inl.h:15,
                 from /home/jwalden/moz/after/js/src/vm/Debugger-inl.h:12,
                 from /home/jwalden/moz/after/js/src/vm/DebuggerMemory.cpp:29,
                 from /home/jwalden/moz/after/js/src/dbg/js/src/Unified_cpp_js_src32.cpp:2:
/home/jwalden/moz/after/js/src/jit/JitAllocPolicy.h: In instantiation of ‘T* js::jit::JitAllocPolicy::maybe_pod_calloc(size_t) [with T = js::detail::HashTableEntry<js::HashMapEntry<JS::Value, unsigned int> >; size_t = long unsigned int]’:
/home/jwalden/moz/after/js/src/dbg/dist/include/js/HashTable.h:1293:63:   required from ‘static js::detail::HashTable<T, HashPolicy, AllocPolicy>::Entry* js::detail::HashTable<T, HashPolicy, AllocPolicy>::createTable(AllocPolicy&, uint32_t, js::detail::HashTable<T, HashPolicy, AllocPolicy>::FailureBehavior) [with T = js::HashMapEntry<JS::Value, unsigned int>; HashPolicy = js::HashMap<JS::Value, unsigned int, js::jit::LIRGraph::ValueHasher, js::jit::JitAllocPolicy>::MapHashPolicy; AllocPolicy = js::jit::JitAllocPolicy; js::detail::HashTable<T, HashPolicy, AllocPolicy>::Entry = js::detail::HashTableEntry<js::HashMapEntry<JS::Value, unsigned int> >; uint32_t = unsigned int]’
/home/jwalden/moz/after/js/src/dbg/dist/include/js/HashTable.h:1361:28:   required from ‘bool js::detail::HashTable<T, HashPolicy, AllocPolicy>::init(uint32_t) [with T = js::HashMapEntry<JS::Value, unsigned int>; HashPolicy = js::HashMap<JS::Value, unsigned int, js::jit::LIRGraph::ValueHasher, js::jit::JitAllocPolicy>::MapHashPolicy; AllocPolicy = js::jit::JitAllocPolicy; uint32_t = unsigned int]’
/home/jwalden/moz/after/js/src/dbg/dist/include/js/HashTable.h:92:69:   required from ‘bool js::HashMap<Key, Value, HashPolicy, AllocPolicy>::init(uint32_t) [with Key = JS::Value; Value = unsigned int; HashPolicy = js::jit::LIRGraph::ValueHasher; AllocPolicy = js::jit::JitAllocPolicy; uint32_t = unsigned int]’
/home/jwalden/moz/after/js/src/jit/LIR.h:1901:38:   required from here
/home/jwalden/moz/after/js/src/jit/JitAllocPolicy.h:101:19: warning: ‘void* memset(void*, int, size_t)’ clearing an object of type ‘class js::detail::HashTableEntry<js::HashMapEntry<JS::Value, unsigned int> >’ with no trivial copy-assignment [-Wclass-memaccess]
             memset(p, 0, numElems * sizeof(T));
             ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /home/jwalden/moz/after/js/src/dbg/dist/include/js/TracingAPI.h:11,
                 from /home/jwalden/moz/after/js/src/dbg/dist/include/js/GCPolicyAPI.h:47,
                 from /home/jwalden/moz/after/js/src/dbg/dist/include/js/RootingAPI.h:22,
                 from /home/jwalden/moz/after/js/src/dbg/dist/include/js/CallArgs.h:73,
                 from /home/jwalden/moz/after/js/src/jsapi.h:29,
                 from /home/jwalden/moz/after/js/src/vm/DebuggerMemory.h:10,
                 from /home/jwalden/moz/after/js/src/vm/DebuggerMemory.cpp:7,
                 from /home/jwalden/moz/after/js/src/dbg/js/src/Unified_cpp_js_src32.cpp:2:
/home/jwalden/moz/after/js/src/dbg/dist/include/js/HashTable.h:794:7: note: ‘class js::detail::HashTableEntry<js::HashMapEntry<JS::Value, unsigned int> >’ declared here
 class HashTableEntry
       ^~~~~~~~~~~~~~
Unified_cpp_js_src36.o
In file included from /home/jwalden/moz/after/js/src/dbg/dist/include/js/HashTable.h:19,
                 from /home/jwalden/moz/after/js/src/dbg/dist/include/js/TracingAPI.h:11,
                 from /home/jwalden/moz/after/js/src/dbg/dist/include/js/GCPolicyAPI.h:47,
                 from /home/jwalden/moz/after/js/src/dbg/dist/include/js/RootingAPI.h:22,
                 from /home/jwalden/moz/after/js/src/dbg/dist/include/js/CallArgs.h:73,
                 from /home/jwalden/moz/after/js/src/dbg/dist/include/js/CallNonGenericMethod.h:12,
                 from /home/jwalden/moz/after/js/src/NamespaceImports.h:15,
                 from /home/jwalden/moz/after/js/src/gc/Barrier.h:10,
                 from /home/jwalden/moz/after/js/src/vm/ArgumentsObject.h:12,
                 from /home/jwalden/moz/after/js/src/vm/GeneratorObject.h:10,
                 from /home/jwalden/moz/after/js/src/vm/GeneratorObject.cpp:7,
                 from /home/jwalden/moz/after/js/src/dbg/js/src/Unified_cpp_js_src33.cpp:2:
/home/jwalden/moz/after/js/src/dbg/dist/include/mozilla/PodOperations.h: In instantiation of ‘void mozilla::PodZero(T*) [with T = js::NativeIterator]’:
/home/jwalden/moz/after/js/src/vm/Iteration.cpp:578:15:   required from here
/home/jwalden/moz/after/js/src/dbg/dist/include/mozilla/PodOperations.h:32:9: warning: ‘void* memset(void*, int, size_t)’ clearing an object of type ‘struct js::NativeIterator’ with no trivial copy-assignment; use assignment or value-initialization instead [-Wclass-memaccess]
   memset(aT, 0, sizeof(T));
   ~~~~~~^~~~~~~~~~~~~~~~~~
In file included from /home/jwalden/moz/after/js/src/vm/JSCompartment-inl.h:14,
                 from /home/jwalden/moz/after/js/src/vm/JSObject-inl.h:32,
                 from /home/jwalden/moz/after/js/src/vm/ArrayObject-inl.h:15,
                 from /home/jwalden/moz/after/js/src/vm/GeneratorObject.cpp:11,
                 from /home/jwalden/moz/after/js/src/dbg/js/src/Unified_cpp_js_src33.cpp:2:
/home/jwalden/moz/after/js/src/vm/Iteration.h:32:8: note: ‘struct js::NativeIterator’ declared here
 struct NativeIterator
        ^~~~~~~~~~~~~~

Fixing the problem by not using `mozilla::PodZero`

Historically you’d have to add every single member-initialization to your constructor, duplicating names and risking missing one, but C++11’s in-class initializers allow an elegant fix:

// Add " = nullptr" to initialize these function pointers.
struct AsmJSCacheOps
{
    OpenAsmJSCacheEntryForReadOp openEntryForRead = nullptr;
    CloseAsmJSCacheEntryForReadOp closeEntryForRead = nullptr;
    OpenAsmJSCacheEntryForWriteOp openEntryForWrite = nullptr;
    CloseAsmJSCacheEntryForWriteOp closeEntryForWrite = nullptr;
};

As long as you invoke a constructor, the members will be initialized. (Constructors can initialize a member to override in-class initializers.)
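
A small sketch of that override rule, using a hypothetical Buffer struct that is not from the Mozilla tree: a constructor's member initializer replaces the in-class initializer for that one member only, and every other member still gets its in-class default.

```cpp
#include <cassert>

// Hypothetical struct for illustration.
struct Buffer
{
    char* data = nullptr;
    unsigned length = 0;
    unsigned capacity = 16;

    Buffer() = default; // uses all three in-class initializers

    // The mem-initializer wins for capacity; data and length still
    // take their in-class defaults.
    explicit Buffer(unsigned cap) : capacity(cap) {}
};

inline void DemoBuffer()
{
    Buffer defaults;
    assert(defaults.data == nullptr && defaults.length == 0);
    assert(defaults.capacity == 16);

    Buffer big(64);
    assert(big.capacity == 64);
    assert(big.data == nullptr && big.length == 0);
}
```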

List-initialization using {} is also frequently helpful: you can use it to zero trailing (or all) members of an array or struct without naming/providing them:

class PreliminaryObjectArray
{
  public:
    static const uint32_t COUNT = 20;

  private:
    // All objects with the type which have been allocated. The pointers in
    // this array are weak.
    JSObject* objects[COUNT] = {}; // zeroes

  public:
    PreliminaryObjectArray() = default;

    // ...
};
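
The "trailing members" case can be sketched in isolation as follows. Header is a hypothetical aggregate made up for this example; the rule it relies on is that elements and members without an explicit initializer in the braces are zeroed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical aggregate for illustration.
struct Header
{
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
};

inline void DemoListInit()
{
    uint32_t counts[4] = { 7 }; // counts[1] through counts[3] are zero
    assert(counts[0] == 7 && counts[1] == 0 && counts[2] == 0 && counts[3] == 0);

    Header h = { 0xC0FFEE };    // version and flags are zero
    assert(h.magic == 0xC0FFEE && h.version == 0 && h.flags == 0);

    Header empty = {};          // every member is zero
    assert(empty.magic == 0 && empty.version == 0 && empty.flags == 0);
}
```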

Finally, C++ offers iterative-mutation functions to fill a container:

#include <algorithm>

// mozilla::Array's default constructor doesn't initialize array
// contents unless the element type is a class with a default
// constructor, and no Array overload exists to zero every
// element.  (You could pass 1024 zeroes, but....)
mozilla::Array<uint32_t, 1024> page; // array contents undefined

std::fill(page.begin(), page.end(), 0); // now contains zeroes
std::fill_n(page.begin(), page.end() - page.begin(), 0); // alternatively

After a long run of fixes to sundry bits of SpiderMonkey code to fix every last one of these issues last week, I’ve returned SpiderMonkey to warning-free with gcc (excluding imported ICU code). The only serious trickiness I ran into was a function of very unusual SpiderMonkey needs that shouldn’t affect code generally.

Fixing these issues is generally very doable. As people update to newer and newer gcc to build, the new -Wclass-memaccess warning that told me about these issues will bug more and more people, and I’m confident all these problems triggered by PodZero can be fixed.

`mozilla::PodZero` and `mozilla::PodArrayZero` are deprecated

PodZero and its array-zeroing variant PodArrayZero are ill-fitted to modern C++ and modern compilers. C++ now offers clean, type-safe ways to initialize memory to zeroes. You should avoid using PodZero and PodArrayZero in new code, replacing them with the initializer syntaxes mentioned above or with standard C++ algorithms to fill in zeroes.

As PodZero is used in a ton of places right now, it’ll likely stick around for some time. But there’s a good chance I’ll rename it to DeprecatedPodZero to highlight its badness and the desire to remove it. You should replace existing uses of it wherever and whenever you can.

27.02.17 A pitfall in C++ low-level object creation and storage, and how to avoid it

Tags: alignment, c++, c++11, compilers, memcpy, mozilla, programming language, std::aligned_storage, undefined behavior — Jeff @ 14:17

While doing a couple recent reviews, I’ve read lots of code trying to solve the same general problem. Some code wants to store an object of some type (more rarely, an object from among several types), but it can’t do so yet. Some series of operations must occur first: a data structure needing to be put into the right state, or a state machine needing to transition just so. Think of this, perhaps, as the problem of having a conditionally initialized object.

Possibly-unsatisfactory approaches

One solution is to new the object at the proper time, storing nullptr prior to that instant. But this imposes the complexity of a heap allocation and handling its potential failure.

Another solution is to use mozilla::Maybe<T>. But Maybe must track whether the T object has been constructed, even if that status is already tracked elsewhere. And Maybe is potentially twice T’s size.

The power approach

The most common solution is to placement-new the object into raw storage. (This solution is particularly common in code written by especially-knowledgeable developers.) mfbt has historically provided mozilla::AlignedStorage and mozilla::AlignedStorage2 to implement such raw storage. Each class has a Type member typedef implementing suitable aligned, adequately-sized storage. Bog-standard C++11 offers similar functionality in std::aligned_storage.

Unfortunately, this approach is extremely easy to subtly misuse. And misuse occurs regularly in Mozilla code, and it can and has triggered crashes.

A detour into the C++ object model

C++ offers very fine-grained, low-level control over memory. It’s absurdly easy to access memory representations with the right casts.

But just because it’s easy to access memory representations, doesn’t mean it’s easy to do it safely. The C++ object model generally restricts you to accessing memory according to the actual type stored there at that instant. You can cast a true float* to uint32_t*, but C++ says undefined behavior — literally any behavior at all — may occur if you read from the uint32_t*. These so-called strict aliasing violations are pernicious not just because anything could happen, but because often exactly what you wanted to happen, happens. Broken code often still “works” — until it unpredictably misbehaves, in C++’s view with absolute justification.

[Image: a dragon lays waste to men attacking it. Caption: Here be dragons (CC-BY-SA, by bagogames)]

There’s a big exception to the general by-actual-type rule: the memcpy exception. (Technically it’s a handful of rules that, in concert, permit memcpy. And there are other, non-memcpy-related exceptions.) memcpy(void*, const void*, size_t) has always worked to copy C-compatible objects around without regard to types, and C++’s object model permits this by letting you safely interpret the memory of a T as chars or unsigned chars. If T is trivially copyable, two things follow. First:

T t1; 
char buf[sizeof(T)];
memcpy(buf, &t1, sizeof(T)); // stash bytes away
// ...time elapses during execution...
memcpy(&t1, buf, sizeof(T)); // restore them
// t1 safely has its original value

You can safely copy a T by the aforementioned character types elsewhere, then back into a T, and things will work. And second:

T t1, t2;
memcpy(&t1, &t2, sizeof(T)); // t1 safely has t2's value

You can safely copy a T into another T, and the second T will have the first T’s value.
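
A closely related use of byte copying, sketched here under stated assumptions, is the safe alternative to the float*-to-uint32_t* cast mentioned earlier: copy the bytes into a real uint32_t object and read that object by its own type. FloatBits is a hypothetical helper name, and the sketch assumes a 32-bit IEEE 754 float, which the static_asserts check.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754 floats");

// Reads the representation of a float without ever dereferencing a
// uint32_t* that points at a float.
inline uint32_t FloatBits(float f)
{
    uint32_t bits;                        // a real uint32_t object
    std::memcpy(&bits, &f, sizeof(bits)); // copy the bytes into it
    return bits;                          // read by its actual type
}

inline void DemoFloatBits()
{
    assert(FloatBits(0.0f) == 0u);
    assert(FloatBits(1.0f) == 0x3F800000u); // sign 0, exponent 127, mantissa 0
}
```

Compilers generally turn a fixed-size memcpy like this into a plain register move, so the safe form should not cost anything over the cast.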

A C++-compatible placement-`new` approach

The placement-new-into-storage approach looks like this. (Real code would almost always use something more interesting than double, but this is the gist of it.)

#include <new> // for placement new

struct ContainsLazyDouble
{
    // Careful: align the storage consistent with the type it'll store.
    alignas(double) char lazyData[sizeof(double)];
    bool hasDouble_;

    // Indirection through these functions, rather than directly casting
    // lazyData to double*, evades a buggy GCC -Wstrict-aliasing warning.
    void* data() { return lazyData; }
    const void* data() const { return lazyData; }

  public:
    ContainsLazyDouble() : hasDouble_(false) {}

    void init(double d) {
      new (data()) double(d);
      hasDouble_ = true;
    }

    bool hasDouble() const { return hasDouble_; }
    double d() const {
      return *reinterpret_cast<const double*>(data());
    }
};

ContainsLazyDouble c;
// c.d(); // BAD, not initialized as double
c.init(3.141592654);
c.d(); // OK

This is safe. c.lazyData was originally char data, but we allocated a new double there, so henceforth that memory contains a double (even though it wasn’t declared as double), not chars (even though it was declared that way). The actual type stored in lazyData at that instant is properly respected.

A C++-incompatible extension of the placement-`new` approach

It’s safe to copy a T by the aforementioned character types. But it’s only safe to do so if 1) T is trivially copyable, and 2) the copied bytes are interpreted as T only within an actual T object. Not into a location to be (re)interpreted as T, but into a T. It’s unsafe to copy a T into a location that doesn’t at that instant contain a T, then reinterpret it as T.

So what happens if we use ContainsLazyDouble’s implicitly-defined default copy constructor?

ContainsLazyDouble c2 = c;

This default copy constructor copies ContainsLazyDouble member by member, according to their declared types. So c.lazyData is copied as a char array that contains the object representation of c.d(). c2.lazyData therefore contains the same char array. But it doesn’t contain an actual double. It doesn’t matter that those chars encode a double: according to C++, that location does not contain a double.

Dereferencing reinterpret_cast<const double*>(data()) therefore mis-accesses an array of chars by the wrong type, triggering undefined behavior. c2.d() might seem to work if you’re lucky, but C++ doesn’t say it must work.

This is extraordinarily subtle. SpiderMonkey hackers missed this issue in their code until bug 1269319 was debugged and a (partly invalid, on other grounds) GCC compiler bug was filed. Even (more or less) understanding the spec intricacy, I missed this issue in some of the patchwork purporting to fix that bug. (Bug 1341951 provides an actual fix for one of these remaining issues.) Another SpiderMonkey hacker almost introduced another instance of this bug; fortunately I was reviewing the patch and flagged this issue.

Using AlignedStorage<N>::Type, AlignedStorage2<T>::Type, or std::aligned_storage<N>::type doesn’t avoid this problem. We mitigated the problem by deleting AlignedStorage{,2}::Type’s copy constructors and assignment operators that would always do actual-type-unaware initialization. (Of course we can’t modify std::aligned_storage.) But only careful scrutiny prevents other code from memcpying those types. And memcpy will copy without respecting the actual type stored there at that instant, too. And in practice, developers do try to use memcpy for this when copy construction and assignment are forbidden, and reviewers can miss it.

What’s the solution to this problem?

As long as memcpy and memmove exist, this very subtle issue can’t be eradicated. There is no silver bullet.

The best solution is don’t hand-roll raw storage. This problem doesn’t exist in Maybe, mozilla::Variant, mozilla::MaybeOneOf, mozilla::Vector, and other utility classes designed to possibly hold a value. (Sometimes because we just fixed them.)

But if you must hand-roll a solution, construct an object of the actual type into your raw storage. It isn’t enough to copy the bytes of an object of the actual type into raw storage, then treat the storage as that actual type. For example, in ContainsLazyDouble, a correct copy constructor that respects C++ strict aliasing rules would be:

#include <string.h> // for memcpy

// Add this to ContainsLazyDouble:
ContainsLazyDouble(const ContainsLazyDouble& other)
  : hasDouble_(other.hasDouble_)
{
  if (hasDouble_)
  {
    // The only way to allocate a free-floating T, is to
    // placement-new it, usually also invoking T's copy
    // constructor.
    new (data()) double(other.d());

    // This would also be valid, if almost pointlessly bizarre — but only
    // because double is trivially copyable.  (It wouldn't be safe
    // to do this with a type with a user-defined copy constructor, or
    // virtual functions, or that had to do anything at all to initialize
    // the new object.)
    new (data()) double; // creates an uninitialized double
    memcpy(lazyData, other.lazyData, sizeof(lazyData)); // sets to other.d()
  }
}

// ...and this to the using code:
ContainsLazyDouble c2 = c; // invokes the now-safe copy constructor

The implicitly-generated copy assignment operator will usually require similar changes — or it can be = deleted.
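
For completeness, here is one way the copy constructor and a matching copy assignment operator might look together. This is a sketch, not the code that landed in SpiderMonkey: LazyDouble is a hypothetical stand-in for ContainsLazyDouble, and it follows the same reinterpret_cast access pattern as the class above (C++17 and later would formally also want std::launder on that cast).

```cpp
#include <cassert>
#include <new> // for placement new

// Hypothetical stand-in for ContainsLazyDouble.
class LazyDouble
{
    alignas(double) unsigned char storage_[sizeof(double)];
    bool hasDouble_ = false;

    void* data() { return storage_; }
    const void* data() const { return storage_; }

  public:
    LazyDouble() = default;

    LazyDouble(const LazyDouble& other) : hasDouble_(other.hasDouble_)
    {
        if (hasDouble_)
            new (data()) double(other.d()); // create a real double here
    }

    LazyDouble& operator=(const LazyDouble& other)
    {
        if (this != &other) {
            // double has a trivial destructor, so an old value can simply
            // be replaced; a non-trivial T would need destroying first.
            if (other.hasDouble_)
                new (data()) double(other.d());
            hasDouble_ = other.hasDouble_;
        }
        return *this;
    }

    void init(double d)
    {
        new (data()) double(d);
        hasDouble_ = true;
    }

    bool hasDouble() const { return hasDouble_; }

    double d() const
    {
        assert(hasDouble_); // reading before init() would be a bug
        return *reinterpret_cast<const double*>(data());
    }
};
```

Both operations create a double in the destination storage rather than copying chars, so the destination really does contain a double by the time d() is called on it.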

Final considerations

AlignedStorage seems like a good idea. But it’s extremely easy to run afoul of a copy operation that doesn’t preserve the actual object type, by the default copy constructor or assignment operator or by memcpy in entirely-separate code. We’re removing AlignedStorage{,2} so these classes can’t be misused this way. (The former has just been removed from the tree — the latter has many more users and will be harder to kill.) It’s possible to use them correctly, but misuse is too easy to leave these loaded guns in the tree.

If it’s truly necessary to hand-roll a solution, you should hand-roll all of it, all the way down to the buffer of unsigned char with an alignas() attribute. Writing this correctly is expert-level C++. But it was expert-level C++ even with the aligned-storage helper. You should have to know what you’re doing, and you shouldn’t need the std::aligned_storage crutch to do it.

Moreover, I hope that the extra complexity of hand-rolling discourages non-expert reviewers from reviewing such code. I’d feel significantly more confident Mozilla won’t repeat this mistake if I knew that every use of (for example) alignas were reviewed by (say) froydnj or me. Perhaps we can get some Mercurial tooling in place to enforce a review requirement along those lines.

In the meantime, I hope I’ve made the C++ developers out there who read this, at least somewhat aware of this pitfall, and at best competent to avoid it.

03.12.14 Working on the JS engine, Episode V

Tags: c++, mozilla, programming, spidermonkey, templates, working on the js engine — Jeff @ 12:40

From a stack trace for a crash:

20:12:01     INFO -   2  libxul.so!bool js::DependentAddPtr<js::HashSet<js::ReadBarriered<js::UnownedBaseShape*>, js::StackBaseShape, js::SystemAllocPolicy> >::add<JS::RootedGeneric<js::StackBaseShape*>, js::UnownedBaseShape*>(js::ExclusiveContext const*, js::HashSet<js::ReadBarriered<js::UnownedBaseShape*>, js::StackBaseShape, js::SystemAllocPolicy>&, JS::RootedGeneric<js::StackBaseShape*> const&, js::UnownedBaseShape* const&) [HashTable.h:3ba384952a02 : 372 + 0x4]

If you can figure out where in that mess the actual method name is without staring at this for at least 15 seconds, I salute you. (Note that when I saw this originally, it wasn’t line-wrapped, making it even less readable.)

I’m not sure how this could be presented better, given the depth and breadth of template use in the class, in the template parameters to that class, in the method, and in the method arguments here.

08.09.14 Quote of the day

Tags: assertion, c++, mozilla, programming, quote of the day — Jeff @ 15:56

Snipped from irrelevant context:

<jorendorff> In this case I see nearby code asserting that IsCompiled() is true, so I think I have it right

Assertions do more than point out mistakes in code. They also document that code’s intended behavior, permitting faster iteration and modification to that code by future users. Assertions are often more valuable as documentation, than they are as a means to detect bugs. (Although not always. *eyes fuzzers beadily*)

So don’t just assert the tricky requirements: assert the more-obvious ones, too. You may save the next person changing the code (and the person reviewing it, who could be you!) a lot of time.
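
As a rough illustration of asserting the "obvious" requirements too, consider the sketch below. Script and CodeLength are hypothetical, and only the IsCompiled() name echoes the quote above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type; not the real SpiderMonkey class.
struct Script
{
    const unsigned char* code = nullptr;
    std::size_t length = 0;

    bool IsCompiled() const { return code != nullptr; }
};

inline std::size_t CodeLength(const Script& script)
{
    // Obvious to the author today, but not to the next reader: callers
    // must compile the script before asking for its code length.
    assert(script.IsCompiled());
    // Also "obvious": in this sketch a compiled script is never empty.
    assert(script.length > 0);
    return script.length;
}
```

Someone modifying a caller of CodeLength can read both requirements off the first two lines instead of reverse-engineering them.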

31.07.14 mfbt now has `UniquePtr` and `MakeUnique` for managing singly-owned resources

Tags: auto_ptr, c++, c++11, deleter, makeunique, make_unique, move, move assignment, move construction, mozilla, nsAutoPtr, nullptr, rvalue references, smart pointer, uniqueptr, unique_ptr — Jeff @ 15:17

Managing dynamic memory allocations in C++

C++ supports dynamic allocation of objects using new. For new objects to not leak, they must be deleted. This is quite difficult to do correctly in complex code. Smart pointers are the canonical solution. Mozilla has historically used nsAutoPtr, and C++98 provided std::auto_ptr, to manage singly-owned new objects. But nsAutoPtr and std::auto_ptr have a bug: they can be “copied.”

The following code allocates an int. When is that int destroyed? Does destroying ptr1 or ptr2 handle the task? What does ptr1 contain after ptr2’s gone out of scope?

#include <memory> // std::auto_ptr: deprecated in C++11, removed in C++17

typedef std::auto_ptr<int> auto_int;
{
  auto_int ptr1(new int(17));
  {
    auto_int ptr2 = ptr1;
    // destroy ptr2
  }
  // destroy ptr1
}

Copying or assigning an auto_ptr implicitly moves the new object, mutating the input. When ptr2 = ptr1 happens, ptr1 is set to nullptr and ptr2 has a pointer to the allocated int. When ptr2 goes out of scope, it destroys the allocated int. ptr1 is nullptr when it goes out of scope, so destroying it does nothing.

Fixing `auto_ptr`

Implicit-move semantics are safe but very unclear. And because these operations mutate their input, they can’t take a const reference. For example, auto_ptr has an auto_ptr::auto_ptr(auto_ptr&) constructor but not an auto_ptr::auto_ptr(const auto_ptr&) copy constructor. This breaks algorithms requiring copyability.

We can solve these problems with a smart pointer that prohibits copying/assignment unless the input is a temporary value. (C++11 calls these rvalue references, but I’ll use “temporary value” for readability.) If the input’s a temporary value, we can move the resource out of it without disrupting anyone else’s view of it: as a temporary it’ll die before anyone could observe it. (The rvalue reference concept is incredibly subtle. Read that article series a dozen times, and maybe you’ll understand half of it. I’ve spent multiple full days digesting it and still won’t claim full understanding.)
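
C++11's std::unique_ptr follows this design, so it can serve to sketch the rule: copying is rejected at compile time, while moving from a temporary, or from a value explicitly marked with std::move, is allowed. MakeAnswer is a hypothetical helper used only for this example.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Copying is rejected at compile time; moving is allowed.
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value, "not copyable");
static_assert(!std::is_copy_assignable<std::unique_ptr<int>>::value, "not copy-assignable");
static_assert(std::is_move_constructible<std::unique_ptr<int>>::value, "movable");

// Hypothetical helper returning a temporary.
inline std::unique_ptr<int> MakeAnswer()
{
    return std::unique_ptr<int>(new int(42));
}

inline void DemoMoveOnly()
{
    std::unique_ptr<int> a = MakeAnswer(); // OK: moved from a temporary
    // std::unique_ptr<int> b = a;         // would not compile: a is not a temporary
    std::unique_ptr<int> c = std::move(a); // OK: explicit request to move
    assert(a == nullptr);
    assert(*c == 42);
}
```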

Presenting `mozilla::UniquePtr`

I’ve implemented mozilla::UniquePtr in #include "mozilla/UniquePtr.h" to fit the bill. It’s based on C++11’s std::unique_ptr (not always available right now). UniquePtr provides auto_ptr’s safety while providing movability but not copyability.

`UniquePtr` template parameters

Using UniquePtr requires the type being owned and what will ultimately be done to generically delete it. The type is the first template argument; the deleter is the (optional) second. The default deleter performs delete for non-array types and delete[] for array types. (This latter improves upon auto_ptr and nsAutoPtr [and the derivative nsAutoArrayPtr], which fail horribly when used with new[].)

UniquePtr<int> i1(new int(8));
UniquePtr<int[]> arr1(new int[17]());

Deleters are callable values that are called whenever a UniquePtr’s object should be destroyed. If a custom deleter is used, it’s a really good idea for it to be empty (per mozilla::IsEmpty<D>) so that UniquePtr<T, D> is as space-efficient as a raw pointer.

struct FreePolicy
{
  void operator()(void* ptr) {
    free(ptr);
  }
};

{
  void* m = malloc(4096);
  UniquePtr<void, FreePolicy> mem(m);
  int* i = static_cast<int*>(malloc(sizeof(int)));
  UniquePtr<int, FreePolicy> integer(i);

  // integer.getDeleter()(i) is called
  // mem.getDeleter()(m) is called
}
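
The space cost of a non-empty deleter can be sketched with std::unique_ptr standing in for UniquePtr. The size relationships below are not guaranteed by the C++ standard; they are what the common standard library implementations do, and whether mozilla::UniquePtr lays out identically is an assumption. EmptyFree, FreeWithFunction and FreeFn are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// An empty deleter: no data members, so nothing extra to store.
struct EmptyFree
{
    void operator()(int* ptr) const { std::free(ptr); }
};

// A function-pointer deleter: the pointer itself must be stored.
inline void FreeWithFunction(int* ptr) { std::free(ptr); }
typedef void (*FreeFn)(int*);

inline void DemoDeleterSize()
{
    std::unique_ptr<int, EmptyFree> a(static_cast<int*>(std::malloc(sizeof(int))));
    std::unique_ptr<int, FreeFn> b(static_cast<int*>(std::malloc(sizeof(int))),
                                   FreeWithFunction);

    assert(sizeof(a) == sizeof(int*)); // same footprint as a raw pointer
    assert(sizeof(b) > sizeof(int*));  // pays for the stored function pointer
}
```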

Basic `UniquePtr` construction and assignment

As you’d expect, no-argument construction initializes to nullptr, a single pointer initializes to that pointer, and a pointer and a deleter initialize embedded pointer and deleter both.

UniquePtr<int> i1;
assert(i1 == nullptr);
UniquePtr<int> i2(new int(8));
assert(i2 != nullptr);
UniquePtr<int, FreePolicy> i3(nullptr, FreePolicy());

Move construction and assignment

All remaining constructors and assignment operators accept only nullptr or compatible, temporary UniquePtr values. These values have well-defined ownership, in marked contrast to raw pointers.

class B
{
    int i;

  public:
    B(int i) : i(i) {}
    virtual ~B() {} // virtual required so delete (B*)(pointer to D) calls ~D()
};

class D : public B
{
  public:
    D(int i) : B(i) {}
};

UniquePtr<B> MakeB(int i)
{
  typedef UniquePtr<B>::DeleterType BDeleter;

  // OK to convert UniquePtr<D, BDeleter> to UniquePtr<B>:
  // Note: For UniquePtr interconversion, both pointer and deleter
  //       types must be compatible!  Thus BDeleter here.
  return UniquePtr<D, BDeleter>(new D(i));
}

UniquePtr<B> b1(MakeB(66)); // OK: temporary value moved into b1

UniquePtr<B> b2(b1); // ERROR: b1 not a temporary, would confuse
                     // single ownership, forbidden

UniquePtr<B> b3;

b3 = b1;  // ERROR: b1 not a temporary, would confuse
          // single ownership, forbidden

b3 = MakeB(76); // OK: return value moved into b3
b3 = nullptr;   // OK: can't confuse ownership of nullptr

What if you really do want to move a resource from one UniquePtr to another? You can explicitly request a move using mozilla::Move() from #include "mozilla/Move.h".

int* i = new int(37);
UniquePtr<int> i1(i);

UniquePtr<int> i2(Move(i1));
assert(i1 == nullptr);
assert(i2.get() == i);

i1 = Move(i2);
assert(i1.get() == i);
assert(i2 == nullptr);

Move transforms the type of its argument into a temporary value type. Move doesn’t have any effects of its own. Rather, it’s the job of users such as UniquePtr to ascribe special semantics to operations accepting temporary values. (If no special semantics are provided, temporary values match only const reference types as in C++98.)
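
The point that Move does nothing by itself can be sketched with the standard equivalents, std::move and std::unique_ptr, standing in for mozilla::Move and UniquePtr.

```cpp
#include <cassert>
#include <memory>
#include <utility>

inline void DemoMoveIsOnlyACast()
{
    std::unique_ptr<int> source(new int(37));

    // Binding the result of std::move to a reference moves nothing:
    // no constructor or assignment operator has run yet.
    std::unique_ptr<int>&& ref = std::move(source);
    assert(source != nullptr);
    assert(&ref == &source); // ref is just another name for source

    // The move constructor is what actually transfers ownership.
    std::unique_ptr<int> dest(std::move(ref));
    assert(source == nullptr);
    assert(*dest == 37);
}
```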

Observing a `UniquePtr`’s value

The dereferencing operators (-> and *) and conversion to bool behave as expected for any smart pointer. The raw pointer value can be accessed using get() if absolutely needed. (This should be uncommon, as the only pointer to the resource should live in the UniquePtr.) UniquePtr may also be compared against nullptr (but not against raw pointers).

int* i = new int(8);
UniquePtr<int> p(i);
if (p)
  *p = 42;
assert(p != nullptr);
assert(p.get() == i);
assert(*p == 42);

Changing a `UniquePtr`’s value

Three mutation methods beyond assignment are available. A UniquePtr may be reset() to a raw pointer or to nullptr. The raw pointer may be extracted, and the UniquePtr cleared, using release(). Finally, UniquePtrs may be swapped.

int* i = new int(42);
int* i2;
UniquePtr<int> i3, i4;
{
  UniquePtr<int> integer(i);
  assert(i == integer.get());

  i2 = integer.release();
  assert(integer == nullptr);

  integer.reset(i2);
  assert(integer.get() == i2);

  integer.reset(new int(93)); // deletes i2

  i3 = Move(integer); // better than release()

  i3.swap(i4);
  Swap(i3, i4); // mozilla::Swap, that is
}

When a UniquePtr loses ownership of its resource, the embedded deleter will dispose of the managed pointer, in accord with the single-ownership concept. release() is the sole exception: it clears the UniquePtr and returns the raw pointer previously in it, without calling the deleter. This is a somewhat dangerous idiom. (Mozilla’s smart pointers typically call this forget(), and WebKit’s WTF calls this leak(). UniquePtr uses release() only for consistency with unique_ptr.) It’s generally much better to make the user take a UniquePtr, then transfer ownership using Move().
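
A minimal sketch of that recommendation, again with std::unique_ptr and std::move standing in for UniquePtr and Move. Registry and Adopt are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical owner type.
class Registry
{
    std::vector<std::unique_ptr<int>> items_;

  public:
    // Taking the smart pointer by value makes the ownership transfer
    // part of the signature; no raw pointer is ever in flight.
    void Adopt(std::unique_ptr<int> item) { items_.push_back(std::move(item)); }

    std::size_t Count() const { return items_.size(); }
    int At(std::size_t i) const { return *items_[i]; }
};

inline void DemoAdopt()
{
    Registry registry;
    std::unique_ptr<int> value(new int(93));

    registry.Adopt(std::move(value)); // instead of passing value.release()
    assert(value == nullptr);         // ownership has visibly moved
    assert(registry.Count() == 1 && registry.At(0) == 93);
}
```

With a raw-pointer parameter and release(), an early return or exception between the release and the callee taking ownership would leak; here there is no such window.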

Array fillips

UniquePtr<T> and UniquePtr<T[]> share the same interface, with a few substantial differences. UniquePtr<T[]> defines an operator[] to permit indexing. As mentioned earlier, UniquePtr<T[]> by default will delete[] its resource, rather than delete it. As a corollary, UniquePtr<T[]> requires an exact type match when constructed or mutated using a pointer. (It’s an error to delete[] an array through a pointer to the wrong array element type, because delete[] has to know the element size to destruct each element. Not accepting other pointer types thus eliminates this class of errors.)

struct B {};
struct D : B {};
UniquePtr<B[]> bs;
// bs.reset(new D[17]()); // ERROR: requires B*, not D*
bs.reset(new B[5]());
bs[1] = B();

And a `mozilla::MakeUnique` helper function

Typing out new T every time a UniquePtr is created or initialized can get old. We’ve added a helper function, MakeUnique<T>, that combines new object (or array) creation with creation of a corresponding UniquePtr. The nice thing about MakeUnique is that it’s in some sense foolproof: if you only create new objects in UniquePtrs, you can’t leak or double-delete unless you leak the UniquePtr’s owner, misuse a get(), or drop the result of release() on the floor. I recommend always using MakeUnique instead of new for single-ownership objects.

struct S { S(int i, double d) {} };

UniquePtr<S> s1 = MakeUnique<S>(17, 42.0);   // new S(17, 42.0)
UniquePtr<int> i1 = MakeUnique<int>(42);     // new int(42)
UniquePtr<int[]> i2 = MakeUnique<int[]>(17); // new int[17]()


// Given familiarity with UniquePtr, these work particularly
// well with C++11 auto: just recognize MakeUnique means new,
// T means single object, and T[] means array.
auto s2 = MakeUnique<S>(17, 42.0); // new S(17, 42.0)
auto i3 = MakeUnique<int>(42);     // new int(42)
auto i4 = MakeUnique<int[]>(17);   // new int[17]()

MakeUnique<T>(...args) computes new T(...args). MakeUnique of an array takes an array length and constructs the correspondingly-sized array.
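As a rough, self-contained check of those two forms, the sketch below assumes C++14's std::make_unique as a stand-in for MakeUnique (so it builds without mfbt); the function MakeUniqueDemo is a hypothetical name, and S mirrors the struct used above with its members stored so they can be read back.

```cpp
#include <cassert>
#include <memory>

struct S {
  int i;
  double d;
  S(int aI, double aD) : i(aI), d(aD) {}
};

// Builds one object of each kind and combines what was stored, so the
// forwarding of constructor arguments and the zeroing of the array can
// both be checked from the result.
int MakeUniqueDemo() {
  auto s = std::make_unique<S>(17, 42.0);  // like new S(17, 42.0)
  auto n = std::make_unique<int>(42);      // like new int(42)
  auto arr = std::make_unique<int[]>(17);  // like new int[17](): all zeroes

  int sum = 0;
  for (int k = 0; k < 17; ++k) {
    sum += arr[k];
  }
  assert(sum == 0);

  return s->i + *n + sum;  // 17 + 42 + 0
}
```

Note that the array form takes only a length: the elements are value-initialized, so there is no way to pass per-element constructor arguments through it.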

In the long run we probably should expect everyone to recognize the MakeUnique idiom so that we can use auto here and cut down on redundant typing. In the short run, feel free to do whichever you prefer.

Update: Beware! Due to compiler limitations affecting gcc versions older than 4.6, passing a literal nullptr as an argument to a MakeUnique call will fail to compile, but only on b2g-ics. Everywhere else will pass. You have been warned. The only alternatives I can think of are to pass static_cast<T*>(nullptr) instead, or to assign nullptr to a local variable and pass that instead. Love that b2g compiler!
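The two workarounds have roughly the following shape. This is only a sketch: it assumes std::make_unique as a stand-in for MakeUnique, and Node and NullArgWorkarounds are hypothetical names. On a current compiler a plain nullptr argument would compile here too, so the code shows the form of the workaround rather than reproducing the old gcc failure.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose constructor takes a pointer argument.
struct Node {
  Node* next;
  explicit Node(Node* aNext) : next(aNext) {}
};

// Returns true if both workarounds produce a Node with a null next pointer.
bool NullArgWorkarounds() {
  // Workaround 1: give the null pointer its final type at the call site.
  auto a = std::make_unique<Node>(static_cast<Node*>(nullptr));

  // Workaround 2: go through a correctly typed local variable.
  Node* none = nullptr;
  auto b = std::make_unique<Node>(none);

  return a->next == nullptr && b->next == nullptr;
}
```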

Conclusion

UniquePtr was a free-time hacking project last Christmas week that I mostly finished but ran out of steam on when work resumed. Only recently have I found time to finish it up and land it, yet we already have a couple hundred uses of it and MakeUnique. Please add more uses, and make our existing new code safer!

A final note: please use UniquePtr instead of mozilla::Scoped. UniquePtr is more standard, better-tested, and better-documented (particularly on the vast expanses of the web, where most unique_ptr documentation also suffices for UniquePtr). Scoped is now deprecated — don’t use it in new code!

Where's Walden?

Mozilla, politics, economics, law, backpacking, cycling, and other random desiderata

21.05.18

PSA: stop using `mozilla::PodZero` and `mozilla::PodArrayZero`

The briefest recap of salient parts of the C++ object model

`memset`ting an object

Problem intensifies

Fixing the problem by not using `mozilla::PodZero`

`mozilla::PodZero` and `mozilla::PodArrayZero` are deprecated

27.02.17

A pitfall in C++ low-level object creation and storage, and how to avoid it

Possibly-unsatisfactory approaches

The power approach

A detour into the C++ object model

A C++-compatible placement-`new` approach

A C++-incompatible extension of the placement-`new` approach

What’s the solution to this problem?

Final considerations

03.12.14

Working on the JS engine, Episode V

08.09.14

Quote of the day

31.07.14

mfbt now has `UniquePtr` and `MakeUnique` for managing singly-owned resources

Managing dynamic memory allocations in C++

Fixing `auto_ptr`

Presenting `mozilla::UniquePtr`

`UniquePtr` template parameters

Basic `UniquePtr` construction and assignment

Move construction and assignment

Observing a `UniquePtr`‘s value

Changing a `UniquePtr`‘s value

Array fillips

And a `mozilla::MakeUnique` helper function

Conclusion
