31.07.14

mfbt now has UniquePtr and MakeUnique for managing singly-owned resources

Managing dynamic memory allocations in C++

C++ supports dynamic allocation of objects using new. For new objects to not leak, they must be deleted. This is quite difficult to do correctly in complex code. Smart pointers are the canonical solution. Mozilla has historically used nsAutoPtr, and C++98 provided std::auto_ptr, to manage singly-owned new objects. But nsAutoPtr and std::auto_ptr have a bug: they can be “copied.”

The following code allocates an int. When is that int destroyed? Does destroying ptr1 or ptr2 handle the task? What does ptr1 contain after ptr2‘s gone out of scope?

typedef auto_ptr<int> auto_int;
{
  auto_int ptr1(new int(17));
  {
    auto_int ptr2 = ptr1;
    // destroy ptr2
  }
  // destroy ptr1
}

Copying or assigning an auto_ptr implicitly moves the new object, mutating the input. When ptr2 = ptr1 happens, ptr1 is set to nullptr and ptr2 has a pointer to the allocated int. When ptr2 goes out of scope, it destroys the allocated int. ptr1 is nullptr when it goes out of scope, so destroying it does nothing.

Fixing auto_ptr

Implicit-move semantics are safe but very unclear. And because these operations mutate their input, they can’t take a const reference. For example, auto_ptr has an auto_ptr::auto_ptr(auto_ptr&) constructor but not an auto_ptr::auto_ptr(const auto_ptr&) copy constructor. This breaks algorithms requiring copyability.

We can solve these problems with a smart pointer that prohibits copying/assignment unless the input is a temporary value. (C++11 calls these rvalue references, but I’ll use “temporary value” for readability.) If the input’s a temporary value, we can move the resource out of it without disrupting anyone else’s view of it: as a temporary it’ll die before anyone could observe it. (The rvalue reference concept is incredibly subtle. Read that article series a dozen times, and maybe you’ll understand half of it. I’ve spent multiple full days digesting it and still won’t claim full understanding.)

Presenting mozilla::UniquePtr

I’ve implemented mozilla::UniquePtr in #include "mozilla/UniquePtr.h" to fit the bill. It’s based on C++11′s std::unique_ptr (not always available right now). UniquePtr provides auto_ptr‘s safety while providing movability but not copyability.

UniquePtr template parameters

Using UniquePtr requires the type being owned and what will ultimately be done to generically delete it. The type is the first template argument; the deleter is the (optional) second. The default deleter performs delete for non-array types and delete[] for array types. (This latter improves upon auto_ptr and nsAutoPtr [and the derivative nsAutoArrayPtr], which fail horribly when used with new[].)

UniquePtr<int> i1(new int(8));
UniquePtr<int[]> arr1(new int[17]());

Deleters are callable values, that are called whenever a UniquePtr‘s object should be destroyed. If a custom deleter is used, it’s a really good idea for it to be empty (per mozilla::IsEmpty<D>) so that UniquePtr<T, D> is as space-efficient as a raw pointer.

struct FreePolicy
{
  void operator()(void* ptr) {
    free(ptr);
  }
};

{
  void* m = malloc(4096);
  UniquePtr<void, FreePolicy> mem(m);
  int* i = static_cast<int*>(malloc(sizeof(int)));
  UniquePtr<int, FreePolicy> integer(i);

  // integer.getDeleter()(i) is called
  // mem.getDeleter()(m) is called
}

Basic UniquePtr construction and assignment

As you’d expect, no-argument construction initializes to nullptr, a single pointer initializes to that pointer, and a pointer and a deleter initialize embedded pointer and deleter both.

UniquePtr<int> i1;
assert(i1 == nullptr);
UniquePtr<int> i2(new int(8));
assert(i2 != nullptr);
UniquePtr<int, FreePolicy> i3(nullptr, FreePolicy());

Move construction and assignment

All remaining constructors and assignment operators accept only nullptr or compatible, temporary UniquePtr values. These values have well-defined ownership, in marked contrast to raw pointers.

class B
{
    int i;

  public:
    B(int i) : i(i) {}
    virtual ~B() {} // virtual required so delete (B*)(pointer to D) calls ~D()
};

class D : public B
{
  public:
    D(int i) : B(i) {}
};

UniquePtr<B> MakeB(int i)
{
  typedef UniquePtr<B>::DeleterType BDeleter;

  // OK to convert UniquePtr<D, BDeleter> to UniquePtr<B>:
  // Note: For UniquePtr interconversion, both pointer and deleter
  //       types must be compatible!  Thus BDeleter here.
  return UniquePtr<D, BDeleter>(new D(i));
}

UniquePtr<B> b1(MakeB(66)); // OK: temporary value moved into b1

UniquePtr<B> b2(b1); // ERROR: b1 not a temporary, would confuse
                     // single ownership, forbidden

UniquePtr<B> b3;

b3 = b1;  // ERROR: b1 not a temporary, would confuse
          // single ownership, forbidden

b3 = MakeB(76); // OK: return value moved into b3
b3 = nullptr;   // OK: can't confuse ownership of nullptr

What if you really do want to move a resource from one UniquePtr to another? You can explicitly request a move using mozilla::Move() from #include "mozilla/Move.h".

int* i = new int(37);
UniquePtr<int> i1(i);

UniquePtr<int> i2(Move(i1));
assert(i1 == nullptr);
assert(i2.get() == i);

i1 = Move(i2);
assert(i1.get() == i);
assert(i2 == nullptr);

Move transforms the type of its argument into a temporary value type. Move doesn’t have any effects of its own. Rather, it’s the job of users such as UniquePtr to ascribe special semantics to operations accepting temporary values. (If no special semantics are provided, temporary values match only const reference types as in C++98.)

Observing a UniquePtr‘s value

The dereferencing operators (-> and *) and conversion to bool behave as expected for any smart pointer. The raw pointer value can be accessed using get() if absolutely needed. (This should be uncommon, as the only pointer to the resource should live in the UniquePtr.) UniquePtr may also be compared against nullptr (but not against raw pointers).

int* i = new int(8);
UniquePtr<int> p(i);
if (p)
  *p = 42;
assert(p != nullptr);
assert(p.get() == i);
assert(*p == 42);

Changing a UniquePtr‘s value

Three mutation methods beyond assignment are available. A UniquePtr may be reset() to a raw pointer or to nullptr. The raw pointer may be extracted, and the UniquePtr cleared, using release(). Finally, UniquePtrs may be swapped.

int* i = new int(42);
int* i2;
UniquePtr<int> i3, i4;
{
  UniquePtr<int> integer(i);
  assert(i == integer.get());

  i2 = integer.release();
  assert(integer == nullptr);

  integer.reset(i2);
  assert(integer.get() == i2);

  integer.reset(new int(93)); // deletes i2

  i3 = Move(integer); // better than release()

  i3.swap(i4);
  Swap(i3, i4); // mozilla::Swap, that is
}

When a UniquePtr loses ownership of its resource, the embedded deleter will dispose of the managed pointer, in accord with the single-ownership concept. release() is the sole exception: it clears the UniquePtr and returns the raw pointer previously in it, without calling the deleter. This is a somewhat dangerous idiom. (Mozilla’s smart pointers typically call this forget(), and WebKit’s WTF calls this leak(). UniquePtr uses release() only for consistency with unique_ptr.) It’s generally much better to make the user take a UniquePtr, then transfer ownership using Move().

Array fillips

UniquePtr<T> and UniquePtr<T[]> share the same interface, with a few substantial differences. UniquePtr<T[]> defines an operator[] to permit indexing. As mentioned earlier, UniquePtr<T[]> by default will delete[] its resource, rather than delete it. As a corollary, UniquePtr<T[]> requires an exact type match when constructed or mutated using a pointer. (It’s an error to delete[] an array through a pointer to the wrong array element type, because delete[] has to know the element size to destruct each element. Not accepting other pointer types thus eliminates this class of errors.)

struct B {};
struct D : B {};
UniquePtr<B[]> bs;
// bs.reset(new D[17]()); // ERROR: requires B*, not D*
bs.reset(new B[5]());
bs[1] = B();

And a mozilla::MakeUnique helper function

Typing out new T every time a UniquePtr is created or initialized can get old. We’ve added a helper function, MakeUnique<T>, that combines new object (or array) creation with creation of a corresponding UniquePtr. The nice thing about MakeUnique is that it’s in some sense foolproof: if you only create new objects in UniquePtrs, you can’t leak or double-delete unless you leak the UniquePtr‘s owner, misuse a get(), or drop the result of release() on the floor. I recommend always using MakeUnique instead of new for single-ownership objects.

struct S { S(int i, double d) {} };

UniquePtr<S> s1 = MakeUnique<S>(17, 42.0);   // new S(17, 42.0)
UniquePtr<int> i1 = MakeUnique<int>(42);     // new int(42)
UniquePtr<int[]> i2 = MakeUnique<int[]>(17); // new int[17]()


// Given familiarity with UniquePtr, these work particularly
// well with C++11 auto: just recognize MakeUnique means new,
// T means single object, and T[] means array.
auto s2 = MakeUnique<S>(17, 42.0); // new S(17, 42.0)
auto i3 = MakeUnique<int>(42);     // new int(42)
auto i4 = MakeUnique<int[]>(17);   // new int[17]()

MakeUnique<T>(...args) computes new T(...args). MakeUnique of an array takes an array length and constructs the correspondingly-sized array.

In the long run we probably should expect everyone to recognize the MakeUnique idiom so that we can use auto here and cut down on redundant typing. In the short run, feel free to do whichever you prefer.

Update: Beware! Due to compiler limitations affecting gcc less than 4.6, passing literal nullptr as an argument to a MakeUnique call will fail to compile only on b2g-ics. Everywhere else will pass. You have been warned. The only alternative I can think of is to pass static_cast<T*>(nullptr) instead, or assign to a local variable and pass that instead. Love that b2g compiler!

Conclusion

UniquePtr was a free-time hacking project last Christmas week, that I mostly finished but ran out of steam on when work resumed. Only recently have I found time to finish it up and land it, yet we already have a couple hundred uses of it and MakeUnique. Please add more uses, and make our existing new code safer!

A final note: please use UniquePtr instead of mozilla::Scoped. UniquePtr is more standard, better-tested, and better-documented (particularly on the vast expanses of the web, where most unique_ptr documentation also suffices for UniquePtr). Scoped is now deprecated — don’t use it in new code!

25.07.14

New mach build feature: build-complete notifications on Linux

Tags: , , , , , — Jeff @ 15:19

Spurred on by gps‘s recent mach blogging (and a blogging dry spell to rectify), I thought it’d be worth noting a new mach feature I landed in mozilla-inbound yesterday: build-complete notifications on Linux.

On OS X, mach build spawns a desktop notification when a build completes. It’s handy when the terminal where the build’s running is out of view — often the case given how long builds take. I learned about this feature when stuck on a loaner Mac for a few months due to laptop issues, and I found the notification quite handy. When I returned to Linux, I wanted the same thing there. evilpie had already filed bug 981146 with a patch using DBus notifications, but he didn’t have time to finish it. So I picked it up and did the last 5% to land it. Woo notifications!

(Minor caveat: you won’t get a notification if your build completes in under five minutes. Five minutes is probably too long; some systems build fast enough that you’d never get a notification. gps thinks this should be shorter and ideally configurable. I’m not aware of an existing bug for this.)

15.04.14

Iterating a number sequence for lulz and jail time

Hello, readers! Today I bring you two posts about law: one Mozilla-related, one not. This is the Mozilla-related post. Mozillians may already know this background, but I’ll review for those who don’t.

The “hack”

In 2010 Goatse Security (don’t look them up) discovered a flaw in AT&T’s website. AT&T’s site detected accesses from iPads, extracted a unique account number sent by the iPad, then replied with a private account email address. Account numbers were guessable, so if someone “spoofed” their UA to look like the iPad browser, they could harvest private email addresses using their guesses.

The lulz

Andrew Auernheimer ("weev") wearing an old-school AT&T baseball cap
Andrew Auernheimer, i.e. weev, CC-BY-SA

The people who figured this out were classic Internet trolls interested (to a degree) in minor mayhem (“lulz”) because they could, and they scraped 114000+ email addresses. Eventually Andrew Auernheimer (known online as “weev”) sent the list to Gawker for an exclusive.

The sky is falling!

AT&T, Apple, the people whose addresses had been scraped, and/or the government panicked and freaked out. The government argued that Auernheimer violated the Computer Fraud and Abuse Act, “exceeding authorized access” by UA-spoofing and loading pages using guessed account numbers.

This is a broad interpretation of “authorized access”. Auernheimer evaded no security measures, only accessed public, non-login-protected pages using common techniques. Anyone who could guess the address could view those pages using common browser addons. People guess at the existence of web addresses all the time. This site’s addresses appear of the form “/year/month/day/post-title/”. The monthly archive links to the side on my site have the form “/year/month/”. It’s a good guess that changing these components does what you expect: no dastardly hacking skills required, just logical guesses and experimentation. And automation’s hardly nefarious.

So what’s Mozilla’s brief with this?

Developers UA-spoof all the time for a variety of innocuous reasons. Newspapers have UA-spoofed during online price discrimination investigations. If UA spoofing is a crime, many people not out for lulz are in trouble, subject to a federal attorney’s whims.

The same is true for constructing addresses by modifying embedded numbers. I’ve provided one example. Jesse once wrote a generic implementation of the technique. Wikipedia uses these tactics internally, for example in the Supreme Court infobox template to linkify docket numbers.

Mozilla thus signed onto an amicus brief in the case. The brief laid out the reasons why the actions the government considered criminal, were “commonplace, legitimate techniques”.

The cool part of the brief

I read the brief last summer through one of Auernheimer’s attorneys at the inestimable Volokh Conspiracy. I’ve been lightly meaning to blog about this discussion of number-changing ever since:

Changing the value of X in the AT&T webpage address is trivial to do. For example, to visit this Court’s homepage, one might type the address “http://www.ca3.uscourts.gov/” into the address bar of the browser window. The browser sends an HTTP request to the Court website, which will respond with this Court’s homepage. Changing the “3” to “4” by typing in the browser window address bar returns the Court of Appeals for the Fourth Circuit’s homepage. Changing the “3” to a “12” returns an error message.

Illustrating the number-guessing technique (and implying its limitations in the “12″ part) via the circuit courts’ own websites? Brilliant.

Back to Auernheimer

The court recently threw out Auernheimer’s conviction. Not on CFAA grounds — on more esoteric matters of filing the case in the wrong court. But the opinion contains dicta implying that breaching a password gate or code-based barrier may be necessary to achieve a conviction. The government could bring the case in the right court, but with the implied warning here, it seems risky.

Sympathy

Auernheimer isn’t necessarily a sympathetic defendant. It’s arguably impolite and discourteous to publicly disclose a site vulnerability without giving the site notice and time to fix the issue. It may be “hard to feel sorry for them being handed federal criminal charges” as Ars Technica suggested.

But that doesn’t mean he committed a crime or shouldn’t be defended for doing things web developers often do. Justice means defending people who have broken no laws, when they are threatened with prosecution. It doesn’t mean failing to defend someone just because you don’t like his (legal) actions. Prosecution here was wrong.

One final note

I heard about the AT&T issue and the brief outside Mozilla. I’m unsure what Mozilla channel I should have followed, to observe or discuss the decision to sign onto this brief. Mozilla was right to sign on here. But our input processes for that decision could be better.

04.09.13

mozilla/IntegerPrintfMacros.h now provides PRId32 and friends macros, for printfing uint32_t and so on

Tags: , , , , , — Jeff @ 09:37

Printing numbers using printf

The printf family of functions take a format string, containing both regular text and special formatting specifiers, and at least as many additional arguments as there are formatting specifiers in the format string. Each formatting specifier is supposed to indicate the type of the corresponding argument. Then, via compiler-specific magic, that argument value is accessed and formatted as directed.

C originally only had char, short, int, and long integer types (in signed and unsigned versions). So the original set of format specifiers only supported interpreting arguments as one of those types.

Fixed-size integers

With the rise of <stdint.h>, it’s common to want to print a uint32_t, or an int64_t, or similar. But if you don’t know what type uint32_t is, how do you know what format specifier to use? C99 defines macros in <inttypes.h> that expand to suitable format specifiers. For example, if uint32_t is actually unsigned long, then the PRIu32 macro might be defined as "lu".

uint32_t u = 3141592654;
printf("u: %" PRIu32 "\n", u);

Unfortunately <inttypes.h> isn’t available everywhere. So for now, we have to reimplement it ourselves. The new mfbt header mfbt/IntegerPrintfMacros.h, available via #include "mozilla/IntegerPrintfMacros.h", provides all the PRI* macros exposed by <inttypes.h>: by delegating to that header when present, and by reimplementing it when not. Go use it. (Note that all Mozilla code has __STDC_LIMIT_MACROS, __STDC_FORMAT_MACROS, and __STDC_CONST_MACROS defined, so you don’t need to do anything special to get the macros — just #include "mozilla/IntegerPrintfMacros.h".)

Limitations

The implementations of <inttypes.h> in all the various standard libraries/compilers we care about don’t always provide definitions of these macros that are free of format string warnings. This is, of course, inconceivable. We can reimplement the header as needed to fix these problems, but it seemed best to avoid that til someone really, really cared.

<inttypes.h> also defines format specifiers for fixed-width integers, for use with the scanf family of functions that read a number from a string. IntegerPrintfMacros.h does not provide these macros. (At least, not everywhere. You are not granted any license to use them if they happen to be incidentally provided.) First, it’s actually impossible to implement the entire interface for the Microsoft C runtime library. (For example: no specifier will write a number into an unsigned char*; this is necessary to implement SCNu8.) Second, sscanf is a dangerous function, because if the number in the input string doesn’t fit in the target location, anything (undefined behavior, that is) can happen.

uint8_t u;
sscanf("256", "%" SCNu8, &u); // I just ate ALL YOUR COOKIES

IntegerPrintfMacros.h does implement imaxabs, imaxdiv, strtoimax, strtoumax, wcstoimax, and wcstoumax. I mention this only for completeness: I doubt any Mozilla code needs these.

25.08.13

37 days and one year later: part 2: routine, and shelter

This is part two of a series of posts discussing various aspects of a bike trip I did across the United States in 2012. Part 1 discussed the start of the trip and choosing a route. This post discusses my daily routine and where I sheltered each night.

The daily grind

After the first-day snafu, the trip went basically as planned.

I started biking each day sometime in the morning (from as early as 04:00 to as late as 11:45). I finished sometime before or within a couple hours of dark (in the range of 17:00 to 22:00, dependent on my destination) after typical distances of 90-130 miles. Knowing I was on a marathon, I deliberately never pushed for any real length of time. When I hit an uphill, I shifted to the lowest gear that felt comfortable and kept pedaling; I never attempted to power up a hill. And in flatlands I traveled at whatever pace was comfortable, not aiming for speed.

Cyclocomputer showing 6:27:22, 97.08mi on my last day, at the Atlantic Ocean
Fairly typical stats from the last day

Around home through Bay Area flatlands I usually push myself and average 17-18mph during riding time, depending where and how far I go. On this trip 14-16mph was more common, and I had days well below that. Somewhat hilariously, when I returned I found myself in worse cycling shape by this metric: I was slower than my previous average for awhile, until I could, er, get back into shape. (I also returned well out of shape for playing ultimate frisbee, as I expected would happen from not running and walking little for over a month. When I first played after returning, I had plenty of endurance. But my muscles quickly made it abundantly clear that if I sprinted or made a break, I would hurt myself.)

Shelter

At night I stayed a variety of places. About half the time I camped in a one-man Eureka Solitare tent. (There’s no better 2.5-pound three-season tent out there for its $90 price. Its only demerits are its fiberglass poles [which long ago I was forced to replace with aluminum poles, that have posed no problems] and, occasionally, its not being freestanding.) I slept in a 45-degree bag (too warm!) and a short-length inflatable sleeping pad. These nights were usually in campgrounds, but I stayed in city parks several times in the middle of the country, when allowed. The rest of the time I stayed in motels of varying quality, from $40 to $100+ for the night, sometimes with a meal, sometimes with a pool, sometimes with nothing.

There were a few nights where I neither camped nor stayed at a motel. A local resident of Ordway, CO graciously shared her home with cyclists, and I ended up staying there a night with a couple other cyclists, some heading west, some heading east. The city of Farmington, MO maintains Al’s Place, a hostel for cyclists on the TransAmerica, and I stayed there a night with another cyclist heading east. I also visited The Place, a hostel in Damascus, VA that I’d stopped at while hiking the A.T. And at the end of the trip, in Yorktown, Grace Episcopal Church provided space for cyclists to stay: much appreciated as a base for me to regroup before heading to an airport to fly home.

One additional hostel that I didn’t visit deserves special note. The TransAmerica Trail was first inaugurated in a 1976 mass cross-country ride. One woman along the way, June Curry, put out a sign informing passing cyclists that they could get water at her house if they wanted. One thing snowballed into another, and eventually, somehow, she found herself opening a hostel as a place for passing cyclists to stay, offering much other hospitality as well. Unfortunately June Curry died just before I started my trip, so I couldn’t meet her. :-( But I’d heard the hostel would still be open and running when I passed through, and even if it weren’t, it’d be worth a visit just to learn about the place. The day I’d hoped to stay, however, was the day after my longest day the entire way — which meant I’d roll in fairly late, certainly after dark. I tried calling ahead, multiple times, to see if it’d be okay showing up later. But I couldn’t get a response, and after a last attempt before the sun went down, I gave up and went with alternative lodging. :-(

If my pace were more leisurely, I might have tried out Warm Showers, a site for on-the-road cyclists looking for a place to stay overnight. But as I mostly didn’t know where I’d be til end of day (I set aggressive goals that I didn’t always reach, or only reached late in the evening — see the June Curry story above), the last-minute scheduling seemed way too much hassle for both me and any person who might be willing to host me for a night. It seemed much better to use campgrounds or motels that expect people to spontaneously show up (and more to the point, are specifically paid market rates for it), than to put people hosting mostly for fun through any hassle.

Next time: mileage, elevation, and route scenery.

Older »