04.09.13

mozilla/IntegerPrintfMacros.h now provides PRId32 and friends macros, for printfing uint32_t and so on

Tags: , , , , , — Jeff @ 09:37

Printing numbers using printf

The printf family of functions take a format string, containing both regular text and special formatting specifiers, and at least as many additional arguments as there are formatting specifiers in the format string. Each formatting specifier is supposed to indicate the type of the corresponding argument. Then, via compiler-specific magic, that argument value is accessed and formatted as directed.

C originally only had char, short, int, and long integer types (in signed and unsigned versions). So the original set of format specifiers only supported interpreting arguments as one of those types.

Fixed-size integers

With the rise of <stdint.h>, it’s common to want to print a uint32_t, or an int64_t, or similar. But if you don’t know what type uint32_t is, how do you know what format specifier to use? C99 defines macros in <inttypes.h> that expand to suitable format specifiers. For example, if uint32_t is actually unsigned long, then the PRIu32 macro might be defined as "lu".

uint32_t u = 3141592654;
printf("u: %" PRIu32 "\n", u);

Unfortunately <inttypes.h> isn’t available everywhere. So for now, we have to reimplement it ourselves. The new mfbt header mfbt/IntegerPrintfMacros.h, available via #include "mozilla/IntegerPrintfMacros.h", provides all the PRI* macros exposed by <inttypes.h>: by delegating to that header when present, and by reimplementing it when not. Go use it. (Note that all Mozilla code has __STDC_LIMIT_MACROS, __STDC_FORMAT_MACROS, and __STDC_CONST_MACROS defined, so you don’t need to do anything special to get the macros — just #include "mozilla/IntegerPrintfMacros.h".)

Limitations

The implementations of <inttypes.h> in all the various standard libraries/compilers we care about don’t always provide definitions of these macros that are free of format string warnings. This is, of course, inconceivable. We can reimplement the header as needed to fix these problems, but it seemed best to avoid that til someone really, really cared.

<inttypes.h> also defines format specifiers for fixed-width integers, for use with the scanf family of functions that read a number from a string. IntegerPrintfMacros.h does not provide these macros. (At least, not everywhere. You are not granted any license to use them if they happen to be incidentally provided.) First, it’s actually impossible to implement the entire interface for the Microsoft C runtime library. (For example: no specifier will write a number into an unsigned char*; this is necessary to implement SCNu8.) Second, sscanf is a dangerous function, because if the number in the input string doesn’t fit in the target location, anything (undefined behavior, that is) can happen.

uint8_t u;
sscanf("256", "%" SCNu8, &u); // I just ate ALL YOUR COOKIES

IntegerPrintfMacros.h does implement imaxabs, imaxdiv, strtoimax, strtoumax, wcstoimax, and wcstoumax. I mention this only for completeness: I doubt any Mozilla code needs these.

30.04.13

Introducing mozilla::Abs to mfbt

Tags: , , , , , , , , , — Jeff @ 08:17

Computing absolute values in C/C++

C includes various functions for computing the absolute value of a signed number. C++98 implementations add the C functions to namespace std, and it adds abs() overloads to namespace std so std::abs works on everything. For a long time Mozilla used NS_ABS to compute absolute value, but recently we switched to std::abs. This works on many systems, but it has a few issues.

Issues with std::abs

std::abs is split across two headers

With some compilers, the integral overloads are in <cstdlib> and the floating point overloads are in <cmath>. This led to confusion when std::abs compiled on one type but not on another, in the same file. (Or worse, when it worked with just one #include because of that developer’s compiler.) The solution was to include both headers even if only one was needed. This is pretty obscure.

std::abs(int64_t) doesn’t work everywhere

On many systems <stdint.h> has typedef long long int64_t;. But long long was only added in C99 and C++11, and some compilers don’t have long long std::abs(long long), so int64_t i = 0; std::abs(i); won’t compile. We “solved” this with compiler-specific #ifdefs around custom std::abs specializations in a somewhat-central header. (That’s three headers to include!) C++ says this has undefined behavior, and indeed it’ll break as we update compilers.

std::abs(int32_t(INT32_MIN)) doesn’t work

The integral abs overloads don’t work on the most-negative value of each signed integer type. On twos-complement machines (nearly everything), the absolute value of the smallest integer of a signed type won’t fit in that type. (For example, INT8_MIN is -128, INT8_MAX is +127, and +128 won’t fit in int8_t.) The integral abs functions take and return signed types. If the smallest integer flows through, behavior is undefined: as absolute-value is usually implemented, the value is returned unchanged. This has caused Mozilla bugs.

Mozilla code should use mozilla::Abs, not std::abs

Unfortunately the only solution is to implement our own absolute-value function. mozilla::Abs in "mozilla/MathAlgorithms.h" is overloaded for all signed integral types and the floating point types, and the integral overloads return the unsigned type. Thus you should use mozilla::Abs to compute absolute values. Be careful about signedness: don’t assign directly into a signed type! That loses mozilla::Abs‘s ability to accept all inputs and will cause bugs. Ideally this would be a compiler warning, but we don’t use -Wconversion or Microsoft equivalents and so can’t do better.

26.04.13

mozilla/PodOperations.h: functions for zeroing, assigning to, copying, and comparing plain old data objects

Tags: , , , , , , , — Jeff @ 13:20

Recently I introduced the new header mozilla/PodOperations.h to mfbt, moving its contents out of SpiderMonkey so for general use. This header makes various operations on memory for objects easier and safer.

The problem

Often in C or C++ one might want to set the contents of an object to zero — perhaps to initialize it:

mMetrics = new gfxFont::Metrics;
::memset(mMetrics, 0, sizeof(*mMetrics));

Or perhaps the same might need to be done for a range of objects:

memset(mTreeData.Elements(), 0, mTreeData.Length() * sizeof(mTreeData[0]));

Or perhaps one might want to set the contents of an object to those of another object:

memcpy(&e, buf, sizeof(e));

Or perhaps a range of objects must be copied:

memcpy(to + aOffset, aBuffer, aLength * sizeof(PRUnichar));

Or perhaps a range of objects must be memory-equivalence-compared:

return memcmp(k->chars(), l->chars(), k->length() * sizeof(jschar)) == 0;

What do all these cases have in common? They all require using a sizeof() operation.

The problem

C and C++, as low-level languages very much focused on the actual memory, place great importance in the size of an object. Programmers often think much less about sizes. It’s pretty easy to write code without having to think about memory. But some cases require it, and because it doesn’t happen regularly, it’s easy to make mistakes. Even experienced programmers can screw it up if they don’t think carefully.

This is particularly likely for operations on arrays of objects. If the object’s size isn’t 1, forgetting a sizeof means an array of objects might not be completely cleared, copied, or compared. This has led to Mozilla security bugs in the past. (Although, the best I can find now is bug 688877, which doesn’t use quite the same operations, and can’t be solved with these methods, but which demonstrates the same sort of issue.)

The solution

Using the prodigious magic of C++ templates, the new mfbt/PodOperations.h abstracts away the sizeof in all the above examples, implements bounds-checking assertions as appropriate, and is type-safe (doesn’t require implicit casts to void*).

  • Zeroing
    • PodZero(T* t): set the contents of *t to 0
    • PodZero(T* t, size_t count): set the contents of count elements starting at t to 0
    • PodArrayZero(T (&t)[N]): set the contents of the array t (with a compile-time size) to 0
  • Assigning
    • PodAssign(T* dst, const T* src): set the contents of *dst to *src — locations can’t overlap (no self-assignments)
  • Copying
    • PodCopy(T* dst, const T* src, size_t count): copy count elements starting at src to dst — ranges can’t overlap
  • Comparison
    • PodEqual(const T* one, const T* two, size_t len): true or false indicating whether len elements at one are memory-identical to len elements at two

Random questions

Why “Pod”?

POD is a C++ term of art abbreviation for “plain old data”. A type that’s plain old data is, roughly: a built-in type; a pointer or enum that’s represented like a built-in type; a user-defined class without any virtual methods or inheritance or user-defined constructors or destructors (including in any of its base classes), whose non-static members are themselves plain old data; or an array of a type that’s plain old data. (There are a couple other restrictions that don’t matter here and would take too long to explain anyway.)

One implication of a type being POD is that (systemic interactions aside) you can copy an object of that type using memcpy. The file and method names simply play on that. Arguably it’s not the best, clearest term in the world — especially as these methods aren’t restricted to POD types. (One intended use is for initializing classes that are non-POD, where the initial state is fully-zeroed.) But it roughly gets the job done, no better names quickly spring to mind, and renaming would have been pain without much gain.

What are all these “optimizations” in these methods?

When these operations were added to SpiderMonkey a few years ago, various people (principally Luke, if I remember right) benchmarked these operations when used in various places in SpiderMonkey. It turned out that “trivial” uses of memcmp, &c. wouldn’t always be optimally compiled by the compiler to fast, SIMD-optimizable loops. Thus we introduced special cases. Newer compilers may do better, such that we have less need for the optimizations. But the worst that happens with them is slightly slower code — not correctness bugs. If you have real-world data (inchoate fears don’t count :-) ) showing these optimizations aren’t needed now, file a bug and we can adapt them as needed.

26.12.11

Introducing mozilla/Assertions.h to mfbt

Recently I landed changes to the Mozilla Framework Based on Templates (mfbt) implementing Assertions.h, the new canonical assertions implementation in C/C++ Mozilla code.

Runtime assertions

Using assertions, a developer can efficiently detect when his code goes awry because internal invariants were broken. Mozilla has many assertion facilities. NS_ASSERTION is the oldest, but unfortunately it can be ignored, and therefore historically has been. We later introduced NS_ABORT_IF_FALSE as as an actual assertion that fails hard, and it’s now widely used. But it’s quite unwieldy, and it can’t be used by code that doesn’t want to depend on XPCOM. (Who would?)

mfbt addresses latent concerns with existing runtime assertions by introducing MOZ_ASSERT, MOZ_ASSERT_IF, MOZ_ALWAYS_TRUE, MOZ_ALWAYS_FALSE, and MOZ_NOT_REACHED macros to make performing true assertions dead simple.

MOZ_ASSERT(expr) and MOZ_ASSERT_IF(ifexpr, expr)

MOZ_ASSERT(expr) is straightforward: pass an expression as its sole argument, and in debug builds, if that expression is falsy, the assertion fails and execution halts in a debuggable way.

#include "mozilla/Assertions.h"

void frobnicate(Thing* thing)
{
  MOZ_ASSERT(thing);
  thing->frob();
}

MOZ_ASSERT_IF(ifexpr, expr) addresses the case when you want to assert something, where the check to decide whether to assert isn’t otherwise needed. You’d rather not muddy up your code by adding an #ifdef and if statement around your assertion. (MOZ_ASSERT(!ifexpr || expr) is a workaround, but it’s not very readable.) SpiderMonkey’s experience suggests Mozilla code will get good mileage from MOZ_ASSERT_IF.

#include "mozilla/Assertions.h"

class Error
{
    const char* optionalDescription;

  public:
    /* If specified, desc must not be empty. */
    Error(const char* desc = NULL)
    {
      MOZ_ASSERT_IF(desc != NULL, desc[0] != '\0');
      optionalDescription = desc;
    }
};

MOZ_ALWAYS_TRUE(expr) and MOZ_ALWAYS_FALSE(expr)

Sometimes the expression for an assertion must always be executed for its side effects, and it can’t just be executed in debug builds. MOZ_ALWAYS_TRUE(expr) and MOZ_ALWAYS_FALSE(expr) support this idiom. These macros always evaluate their argument, and in debug builds that argument is asserted truthy or falsy.

#include "mozilla/Assertions.h"

/* JS_ValueToBoolean was fallible but no longer is. */
MOZ_ALWAYS_TRUE(JS_ValueToBoolean(cx, v, &b));

MOZ_NOT_REACHED(reason)

MOZ_NOT_REACHED(reason) indicates that the given point can’t be reached during execution: simply hitting it is a bug. (Think of it as a more-explicit form of asserting false.) It takes as an argument an explanation of why that point shouldn’t have been reachable.

#include "mozilla/Assertions.h"

// ...in a language parser...
void handle(BooleanLiteralNode node)
{
  if (node.isTrue())
    handleTrueLiteral();
  else if (node.isFalse())
    handleFalseLiteral();
  else
    MOZ_NOT_REACHED("boolean literal that's not true or false?");
}

Compile-time assertions

Most assertions must happen at runtime. But some assertions are static, depending only on constants, and could be checked during compilation. A compile time check is better than a runtime check: the developer need not ensure a test exercises that code, because the compiler itself enforces the assertion. Properly crafted static assertions can never be unwittingly broken.

MOZ_STATIC_ASSERT(cond, reason)

MOZ_STATIC_ASSERT(cond, reason) asserts a condition at compile time. In newer compilers, if the assertion fails, the compiler will also include reason in diagnostics.

#include "mozilla/Assertions.h"

struct S { ... };
MOZ_STATIC_ASSERT(sizeof(S) % sizeof(size_t) == 0,
                  "S should be a multiple of word size for efficiency");

MOZ_STATIC_ASSERT is implemented with an impressive pile of hacks which should work perfectly everywhere — except, rarely, gcc 4.2 (the current OS X compiler) when compiling C++ code. The failure mode requires MOZ_STATIC_ASSERT on line N not in an extern "C" code block and a second MOZ_STATIC_ASSERT on the same line N (in a different file) in an extern "C" block. And those two files have to be used when compiling a single file, with the extern "C"‘d assertion second. This seems improbable, so we’ll risk it til we drop gcc 4.2 support.

Possible improvements

Assertions.h is reasonably complete, but I have a few ideas I’ve been considering for improvements. Let me know what you think of them in comments.

Add an optional reason argument to MOZ_ASSERT, and maybe to MOZ_ALWAYS_TRUE and MOZ_ALWAYS_FALSE

MOZ_ASSERT takes only the condition to test as an argument. In contrast NS_ASSERTION and NS_ABORT_IF_FALSE take the condition and an explanatory string. MOZ_ASSERT‘s lack of explanation derives purely from its ancestry in the JS_ASSERT macro: it wasn’t deliberate.

Would it be helpful if MOZ_ASSERT, MOZ_ALWAYS_TRUE, and MOZ_ALWAYS_FALSE optionally took a reason? (Optional because some assertions, e.g. many non-null assertions, are self-explanatory.) We’d have to disable assertions for compilers not implementing variadic macros (I think our supported compilers implement them), or possibly lose the condition expression in the assertion failure message. A reason would make it easier to convert existing NS_ABORT_IF_FALSEs to MOZ_ASSERTs. Should we add an optional second argument to MOZ_ASSERT and the others?

Include __assume(0) or __builtin_unreachable() in MOZ_NOT_REACHED

__builtin_unreachable() and __assume(0) inform the compiler that subsequent code can’t be executed, providing optimization opportunities. It’s unclear how this affects debugging feedback like stack traces. If the optimizations destroy Breakpad-ability, that may be too big a loss. More research is needed here.

Another possibility might be to use __builtin_trap(). This may not communicate an optimization opportunity comparable to that provided by the other two options. (It can’t be equally informative because execution must be able to continue past a trap. Thus the two have different impacts on variable lifetimes. Whether __builtin_trap otherwise communicates “unlikelihood” well enough isn’t clear.) Perhaps __builtin_trap could be used in debug builds, and __builtin_unreachable could be used in optimized builds. Again: more research needed.

Use C11′s _Static_assert in MOZ_STATIC_ASSERT

New editions of C and C++ include built-in static assertion syntax. MOZ_STATIC_ASSERT expands to C++11′s static_assert(2 + 2 == 4, "ya rly") syntax when possible. It could be made to expand to C11′s _Static_assert('A' == 'A', "no wai") syntax in some other cases, but frankly I don’t hack enough C code to care as long as the static assertion actually happens. :-) This is bug 713531 if you’re interested in investigating.

Want more information?

Read Assertions.h. mfbt code has a high standard for code comments in interface descriptions, and for file names (the current Util.h being a notable exception which will be fixed). We want it to be reasonably obvious where to find what you need and how to use it by skimming mfbt/‘s contents and then skimming a few files’ contents. Good comments are key to that. You should find Assertions.h quite readable; please file a bug if you have improvements to suggest.

26.11.11

Introducing MOZ_FINAL: prevent inheriting from a class, or prevent overriding a virtual function

Tags: , , , , , , , — Jeff @ 09:17

The inexorable march of progress continues in the Mozilla Framework Based on Templates with the addition of MOZ_FINAL, through which you can limit various forms of inheritance in C++.

Traditional C++ inheritance mostly can’t be controlled

In C++98 any class can be inherited. (An extremely obscure pattern will prevent this, but it has down sides.) Sometimes this makes sense: it’s natural to subclass an abstract List class as LinkedList, or LinkedList as CircularLinkedList. But sometimes this doesn’t make sense. StringBuilder certainly could inherit from Vector<char>, but doing so might expose many Vector methods that don’t belong in the StringBuilder concept. It would be more sensible for StringBuilder to contain a private Vector<char> which StringBuilder member methods manipulated. Preventing Vector from being used as a base class would be one way (not necessarily the best one) to avoid this conceptual error. But C++98 doesn’t let you easily do that.

Even when inheritance is desired, sometimes you don’t want completely-virtual methods. Sometimes you’d like a class to implement a virtual method (virtual perhaps because its base declared it so) which derived classes can’t override. Perhaps you want to rely on that method being implemented only by your base class, or perhaps you want it “fixed” as an optimization. Again in C++98, you can’t do this: public virtual functions are overridable.

C++11′s contextual final keyword

C++11 introduces a contextual final keyword for these purposes. To prevent a class from being inheritable, add final to its definition just after the class name (the class can’t be unnamed).

struct Base1 final
{
  virtual void foo();
};

// ERROR: can't inherit from B.
struct Derived1 : public Base1 { };

struct Base2 { };

// Derived classes can be final too.
struct Derived2 final : public Base2 { };

Similarly, a virtual member function can be marked as not overridable by placing the contextual final keyword at the end of its declaration, before a terminating semicolon, body, or = 0.

struct Base
{
  virtual void foo() final;
};

struct Derived : public Base
{
  // ERROR: Base::foo was final.
  virtual void foo() { }
};

Introducing MOZ_FINAL

mfbt now includes support for marking classes and virtual member functions as final using the MOZ_FINAL macro in mozilla/Attributes.h. Simply place it in the same position as final would occur in the C++11 syntax:

#include "mozilla/Attributes.h"

class Base
{
  public:
    virtual void foo();
};

class Derived final : public Base
{
  public:
    /*
     * MOZ_FINAL and MOZ_OVERRIDE are composable; as a matter of
     * style, they should appear in the order MOZ_FINAL MOZ_OVERRIDE,
     * not the other way around.
     */
    virtual void foo() MOZ_FINAL MOZ_OVERRIDE { }
    virtual void bar() MOZ_FINAL;
    virtual void baz() MOZ_FINAL = 0;
};

MOZ_FINAL expands to the C++11 syntax or its compiler-specific equivalent whenever possible, turning violations of final semantics into compile-time errors. The same compilers that usefully expand MOZ_OVERRIDE also usefully expand MOZ_FINAL, so misuse will be quickly noted.

One interesting use for MOZ_FINAL is to tell the compiler that one worrisome C++ trick sometimes isn’t. This is the virtual-methods-without-virtual-destructor trick. It’s used when a class must have virtual functions, but code doesn’t want to pay the price of destruction having virtual-call overhead.

class Base
{
  public:
    virtual void method() { }
    ~Base() { }
};

void destroy(Base* b)
{
  delete b; // may cause a warning
}

Some compilers warn when they instantiate a class with virtual methods but without a virtual destructor. Other compilers only emit this warning when a pointer to a class instance is deleted. The reason is that in C++, behavior is undefined if the static type of the instance being deleted isn’t the same as its runtime type and its destructor isn’t virtual. In other words, if ~Base() is non-virtual, destroy(new Base) is perfectly fine, but destroy(new DerivedFromBase) is not. The warning makes sense if destruction might miss a base class — but if the class is marked final, it never will! Clang silences its warning if the class is final, and I hope that MSVC will shortly do the same.

What about NS_FINAL_CLASS?

As with MOZ_OVERRIDE we had a gunky XPCOM static-analysis NS_FINAL_CLASS macro for final classes. (We had no equivalent for final methods.) NS_FINAL_CLASS too was misplaced as far as C++11 syntax was concerned, and it too has been deprecated. Almost all uses of NS_FINAL_CLASS have now been removed (the one remaining use I’ve left for the moment due to an apparent Clang bug I haven’t tracked down yet), and it shouldn’t be used.

(Side note: In replacing NS_FINAL_CLASS with MOZ_FINAL, I discovered that some of the existing annotations have been buggy for months! Clearly no one’s done static analysis builds in awhile. The moral of the story: compiler static analyses that happen for every single build are vastly superior to user static analyses that happen only in special builds.)

Summary

If you don’t want a class to be inheritable, add MOZ_FINAL to its definition after the class name. If you don’t want a virtual member function to be overridden in derived classes, add MOZ_FINAL at the end of its declaration. Some compilers will then enforce your wishes, and you can rely on these requirements rather than hope for the best.

Older »