Recently I introduced the new header mozilla/PodOperations.h to mfbt, moving its contents out of SpiderMonkey for general use. This header makes various operations on the memory of objects easier and safer.
Often in C or C++ one might want to set the contents of an object to zero — perhaps to initialize it:
```cpp
mMetrics = new gfxFont::Metrics;
::memset(mMetrics, 0, sizeof(*mMetrics));
```
Or perhaps the same might need to be done for a range of objects:
```cpp
memset(mTreeData.Elements(), 0, mTreeData.Length() * sizeof(mTreeData));
```
Or perhaps one might want to set the contents of an object to those of another object:
```cpp
memcpy(&e, buf, sizeof(e));
```
Or perhaps a range of objects must be copied:
```cpp
memcpy(to + aOffset, aBuffer, aLength * sizeof(PRUnichar));
```
Or perhaps a range of objects must be memory-equivalence-compared:
```cpp
return memcmp(k->chars(), l->chars(), k->length() * sizeof(jschar)) == 0;
```
What do all these cases have in common? They all require using a sizeof to correctly compute the amount of memory to operate on.
C and C++, as low-level languages very much focused on the actual memory, place great importance on the size of an object. Programmers often think much less about sizes. It’s pretty easy to write code without having to think about memory. But some cases require it, and because it doesn’t happen regularly, it’s easy to make mistakes. Even experienced programmers can screw it up if they don’t think carefully.
This is particularly likely for operations on arrays of objects. If the object’s size isn’t 1, forgetting a sizeof means an array of objects might not be completely cleared, copied, or compared. This has led to Mozilla security bugs in the past. (Although the best I can find now is bug 688877, which doesn’t use quite the same operations, and can’t be solved with these methods, but which demonstrates the same sort of issue.)
Using the prodigious magic of C++ templates, the new mfbt/PodOperations.h abstracts away the sizeof in all the above examples, implements bounds-checking assertions as appropriate, and is type-safe (it doesn’t require implicit casts to void*). It provides these methods:
- PodZero(T* t): set the contents of *t to 0
- PodZero(T* t, size_t count): set the contents of count elements starting at t to 0
- PodArrayZero(T (&t)[N]): set the contents of the array t (with a compile-time size) to 0
- PodAssign(T* dst, const T* src): set the contents of *dst to those of *src — locations can’t overlap (no self-assignments)
- PodCopy(T* dst, const T* src, size_t count): copy count elements starting at src into the range starting at dst — ranges can’t overlap
- PodEqual(const T* one, const T* two, size_t len): determine whether the len elements starting at one are memory-identical to the len elements starting at two
POD is a C++ term of art abbreviating “plain old data”. A type that’s plain old data is, roughly: a built-in type; a pointer or enum that’s represented like a built-in type; a user-defined class without any virtual methods, inheritance, or user-defined constructors or destructors (including in any of its base classes), whose non-static members are themselves plain old data; or an array of a type that’s plain old data. (There are a couple other restrictions that don’t matter here and would take too long to explain anyway.)
One implication of a type being POD is that (systemic interactions aside) you can copy an object of that type using memcpy. The file and method names simply play on that. Arguably it’s not the best, clearest term in the world — especially as these methods aren’t restricted to POD types. (One intended use is for initializing classes that are non-POD, where the initial state is fully-zeroed.) But it roughly gets the job done, no better names quickly spring to mind, and renaming would have been pain without much gain.
What are all these “optimizations” in these methods?
When these operations were added to SpiderMonkey a few years ago, various people (principally Luke, if I remember right) benchmarked these operations when used in various places in SpiderMonkey. It turned out that “trivial” uses of memcmp, &c. wouldn’t always be optimally compiled by the compiler to fast, SIMD-optimizable loops. Thus we introduced special cases. Newer compilers may do better, such that we have less need for the optimizations. But the worst that happens with them is slightly slower code — not correctness bugs. If you have real-world data (inchoate fears don’t count) showing these optimizations aren’t needed now, file a bug and we can adapt them as needed.