26.04.13

mozilla/PodOperations.h: functions for zeroing, assigning to, copying, and comparing plain old data objects

Jeff @ 13:20

Recently I introduced the new header mozilla/PodOperations.h to mfbt, moving its contents out of SpiderMonkey for general use. This header makes various operations on memory for objects easier and safer.

The problem

Often in C or C++ one might want to set the contents of an object to zero — perhaps to initialize it:

mMetrics = new gfxFont::Metrics;
::memset(mMetrics, 0, sizeof(*mMetrics));

Or perhaps the same might need to be done for a range of objects:

memset(mTreeData.Elements(), 0, mTreeData.Length() * sizeof(mTreeData[0]));

Or perhaps one might want to set the contents of an object to those of another object:

memcpy(&e, buf, sizeof(e));

Or perhaps a range of objects must be copied:

memcpy(to + aOffset, aBuffer, aLength * sizeof(PRUnichar));

Or perhaps a range of objects must be memory-equivalence-compared:

return memcmp(k->chars(), l->chars(), k->length() * sizeof(jschar)) == 0;

What do all these cases have in common? They all require using a sizeof() operation.

The problem

C and C++, as low-level languages very much focused on the actual memory, place great importance on the size of an object. Programmers often think much less about sizes. It’s pretty easy to write code without having to think about memory. But some cases require it, and because it doesn’t happen regularly, it’s easy to make mistakes. Even experienced programmers can screw it up if they don’t think carefully.

This is particularly likely for operations on arrays of objects. If the object’s size isn’t 1, forgetting a sizeof means an array of objects might not be completely cleared, copied, or compared. This has led to Mozilla security bugs in the past. (Although, the best I can find now is bug 688877, which doesn’t use quite the same operations, and can’t be solved with these methods, but which demonstrates the same sort of issue.)
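To make the failure mode concrete, here is a minimal sketch of what forgetting the sizeof does to an array clear. The helper name NonzeroAfterClear and the four-element array are made up for illustration; they don't come from any Mozilla code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fills a 4-element uint32_t array with nonzero values,
// clears it with memset, and reports how many elements are still nonzero.
static std::size_t NonzeroAfterClear(bool forgetSizeof)
{
  std::uint32_t values[4] = { 0xffffffffu, 0xffffffffu, 0xffffffffu, 0xffffffffu };
  const std::size_t count = 4;

  // memset's third argument is a byte count, not an element count.
  // Forgetting the sizeof passes 4 bytes instead of 16.
  std::size_t bytes = forgetSizeof ? count : count * sizeof(values[0]);
  assert(bytes <= sizeof(values));
  std::memset(values, 0, bytes);

  std::size_t nonzero = 0;
  for (std::size_t i = 0; i < count; i++) {
    if (values[i] != 0)
      nonzero++;
  }
  return nonzero;
}
```

With the sizeof forgotten, only the first element is cleared and three stay nonzero. Nothing crashes and nothing warns, which is why this sort of bug survives review.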

The solution

Using the prodigious magic of C++ templates, the new mfbt/PodOperations.h abstracts away the sizeof in all the above examples, implements bounds-checking assertions as appropriate, and is type-safe (doesn’t require implicit casts to void*).

  • Zeroing
    • PodZero(T* t): set the contents of *t to 0
    • PodZero(T* t, size_t count): set the contents of count elements starting at t to 0
    • PodArrayZero(T (&t)[N]): set the contents of the array t (with a compile-time size) to 0
  • Assigning
    • PodAssign(T* dst, const T* src): set the contents of *dst to *src — locations can’t overlap (no self-assignments)
  • Copying
    • PodCopy(T* dst, const T* src, size_t count): copy count elements starting at src to dst — ranges can’t overlap
  • Comparison
    • PodEqual(const T* one, const T* two, size_t len): true or false indicating whether len elements at one are memory-identical to len elements at two
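As a rough idea of how the functions listed above can hide the sizeof, here is a simplified sketch. It is not the actual contents of the header: the namespace name sketch is made up, the real header's optimizations and deleted array overloads are left out, and the overlap checks use std::less_equal as my own choice for comparing unrelated pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

namespace sketch {  // hypothetical namespace, not mozilla::

// Zero one object; the size comes from T, so callers never write sizeof.
template<typename T>
void PodZero(T* t)
{
  std::memset(t, 0, sizeof(T));
}

// Zero count objects starting at t.
template<typename T>
void PodZero(T* t, std::size_t count)
{
  std::memset(t, 0, count * sizeof(T));
}

// Zero a whole array whose length N is known at compile time.
template<typename T, std::size_t N>
void PodArrayZero(T (&t)[N])
{
  std::memset(t, 0, N * sizeof(T));
}

// Copy one object; self-assignment is not allowed.
template<typename T>
void PodAssign(T* dst, const T* src)
{
  assert(dst != src);
  std::memcpy(dst, src, sizeof(T));
}

// Copy count objects; the two ranges must not overlap.
template<typename T>
void PodCopy(T* dst, const T* src, std::size_t count)
{
  std::less_equal<const T*> le;
  assert(le(dst + count, src) || le(src + count, dst));
  std::memcpy(dst, src, count * sizeof(T));
}

// Compare len objects byte for byte.
template<typename T>
bool PodEqual(const T* one, const T* two, std::size_t len)
{
  return std::memcmp(one, two, len * sizeof(T)) == 0;
}

} // namespace sketch
```

Because both pointer parameters are typed on the same T, passing mismatched pointer types fails to compile instead of silently converting to void*.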

Random questions

Why “Pod”?

POD is a C++ term of art, an abbreviation for “plain old data”. A type that’s plain old data is, roughly: a built-in type; a pointer or enum that’s represented like a built-in type; a user-defined class without any virtual methods or inheritance or user-defined constructors or destructors (including in any of its base classes), whose non-static members are themselves plain old data; or an array of a type that’s plain old data. (There are a couple other restrictions that don’t matter here and would take too long to explain anyway.)

One implication of a type being POD is that (systemic interactions aside) you can copy an object of that type using memcpy. The file and method names simply play on that. Arguably it’s not the best, clearest term in the world — especially as these methods aren’t restricted to POD types. (One intended use is for initializing classes that are non-POD, where the initial state is fully-zeroed.) But it roughly gets the job done, no better names quickly spring to mind, and renaming would have been pain without much gain.
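If you want the compiler to tell you whether a type fits the rough definition of plain old data above, the standard type traits get close. This is only a sketch: the struct names and the IsPodLike helper are invented for illustration, and "trivial plus standard-layout" is the C++11 formulation of POD, not something PodOperations.h checks.

```cpp
#include <type_traits>

// Hypothetical example types.
struct PlainPoint { int x; int y; };
struct WithVirtual { virtual ~WithVirtual() {} int x; };
struct WithConstructor { WithConstructor() : x(0) {} int x; };

// Hypothetical helper: true when T is both trivial and standard-layout,
// which is how C++11 defines a POD type.
template<typename T>
constexpr bool IsPodLike()
{
  return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(IsPodLike<int>(), "built-in types are POD");
static_assert(IsPodLike<int*>(), "pointers are POD");
static_assert(IsPodLike<PlainPoint>(), "a simple struct of ints is POD");
static_assert(IsPodLike<PlainPoint[4]>(), "an array of POD is POD");
static_assert(!IsPodLike<WithVirtual>(), "virtual methods rule out POD");
static_assert(!IsPodLike<WithConstructor>(), "a user-defined constructor rules out POD");
```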

What are all these “optimizations” in these methods?

When these operations were added to SpiderMonkey a few years ago, various people (principally Luke, if I remember right) benchmarked these operations when used in various places in SpiderMonkey. It turned out that “trivial” uses of memcmp, &c. wouldn’t always be optimally compiled by the compiler to fast, SIMD-optimizable loops. Thus we introduced special cases. Newer compilers may do better, such that we have less need for the optimizations. But the worst that happens with them is slightly slower code — not correctness bugs. If you have real-world data (inchoate fears don’t count 🙂 ) showing these optimizations aren’t needed now, file a bug and we can adapt them as needed.
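For a feel of what such a special case can look like, here is a hedged sketch of a comparison that uses a plain loop for short ranges and falls back to memcmp for long ones. The function name and the cutoff of 128 are assumptions for illustration; check the header itself for what it really does.

```cpp
#include <cstddef>
#include <cstring>

// Assumed cutoff, for illustration only.
const std::size_t kSmallCount = 128;

// Hypothetical name; compares len elements at one and two.
template<typename T>
bool PodEqualWithFastPath(const T* one, const T* two, std::size_t len)
{
  if (len < kSmallCount) {
    // Short ranges: an element-by-element loop the compiler can inline,
    // avoiding the overhead of a library call.
    const T* end = one + len;
    for (const T* p1 = one, * p2 = two; p1 < end; p1++, p2++) {
      if (*p1 != *p2)
        return false;
    }
    return true;
  }

  // Long ranges: let memcmp do the work.
  return std::memcmp(one, two, len * sizeof(T)) == 0;
}
```

Note the two paths only agree for element types where != means "the bytes differ", such as integer and character types. Floating-point values or structs with padding can compare differently in the loop than under memcmp.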

7 Comments »

  1. Why did you need a separately named function to zero a whole array?

    Comment by Neil Rashbrook — 30.04.13 @ 03:32

  2. This comment from PodOperations.h should be informative:

    /*
     * Arrays implicitly convert to pointers to their first element, which is
     * dangerous when combined with the above PodZero definitions.  Adding an
     * overload for arrays is ambiguous, so we need another identifier.  The
     * ambiguous overload is left to catch mistaken uses of PodZero; if you get a
     * compile error involving PodZero and array types, use PodArrayZero instead.
     */
    template<typename T, size_t N>
    static void PodZero(T (&t)[N]) MOZ_DELETE;
    template<typename T, size_t N>
    static void PodZero(T (&t)[N], size_t nelem) MOZ_DELETE;
    

    I considered adding those details, then figured people would either ask 🙂 or look at the header for an answer. Plus I have a tendency to be too wordy, so erring on the side of saying less seemed a good idea. 😉

    Comment by Jeff — 30.04.13 @ 08:53

  3. A dumb question: is NSPR still the low-level runtime for Firefox code, and if so does Mozilla Framework Based on Templates add to it or replace it?

    Also, your date format is confusing, is today 05.06.13 or 06.05.13? You’re in the USA but seem to be using Euro-style DD.MM.YY. ISO8601 please!

    Comment by skierpage — 05.05.13 @ 23:45

  4. NSPR is an ancient codebase and stable API that’s purely C. This makes it unsafer in many places, because you can’t use C++ features like destructors. Also, its data structures have to be opaque pointers accessed with standalone methods — much less convenient, and harder to use, than C++ classes with methods accessible. Additionally NSPR is treated as effectively a third-party project — nsprpub/ in mozilla-central is an unmodified import of the NSPR codebase. This makes it much harder to modify it with any real speed, which is sometimes necessary for security fixes.

    For that reason, we tend to put new moderately-fundamental functionality in mfbt. The result is cleaner C++ APIs that are much nicer to work with.

    I’m not sure I’d say NSPR is “the” low-level runtime, or that mfbt adds to it or replaces it. They’re sibling things, serving somewhat different use cases. Given NSPR’s API is fugly C, and not nicer C++, it might be nice to do away with it at some point. That point’s a long time out, tho, if it ever happens.

    Dates here are European-style, because I took a German class once and got in the habit of writing my dates that way. 🙂 I’m not sure I care about making them unambiguous enough to change, but it might happen sometime.

    Comment by Jeff — 06.05.13 @ 13:23

  5. Well much to my surprise VC2008 seems to compile my test program correctly:

    #include <string.h>

    template <typename T>
    static inline void ZeroFill(T &t)
    {
      memset(&t, 0, sizeof(T)); // memset takes (pointer, fill value, byte count)
    }
    
    void test()
    {
      char *cp;
      char ca[1];
      ZeroFill(cp); // fills 4/8 bytes
      ZeroFill(ca); // fills 1 byte
    }
    

    No specialisation needed?

    Comment by Neil Rashbrook — 07.05.13 @ 05:03

  6. The point of the specialization in the header was to make obvious whether you were clearing an array, or clearing the first element of it. Having one method that handles both cases is possible. It’s just not as explicit as would be nice, about what behavior it’s doing. (Especially if you take templates and the like into account.)

    Comment by Jeff — 08.05.13 @ 15:47

  7. […] Mozilla code has provided and promoted a PodZero function that misuses memset this way. So when I built with gcc 8.0 recently (I usually […]

    Pingback by Where's Walden? » PSA: stop using mozilla::PodZero and mozilla::PodArrayZero — 21.05.18 @ 13:08
