19.04.12

Introducing mozilla/FloatingPoint.h: methods for floating point types and values

Jeff @ 19:00

The latest addition to the Mozilla Framework Based on Templates (mfbt) is mozilla/FloatingPoint.h. This header implements various floating point functionality.

Functionality overview

mozilla/FloatingPoint.h currently implements the following functionality, all centered around working with double-precision floating point numbers. (There’s no single-precision support only because Mozilla seemingly doesn’t need it. The only code I can find that does this sort of thing for single-precision numbers is nsCoord.h, and that only barely. We can add single-precision equivalents when we need them.)

MOZ_DOUBLE_IS_NaN(double d)
Determines whether a value is NaN (not a number).
MOZ_DOUBLE_IS_INFINITE(double d)
Determines whether a value is positive or negative infinity.
MOZ_DOUBLE_IS_FINITE(double d)
Determines whether a number is finite — that is, not NaN or positive or negative infinity.
MOZ_DOUBLE_IS_NEGATIVE(double d)
Determines whether a number is negative. This is useful because d < 0 does not answer this question! There are two zero values, +0 and -0, and IEEE-754 requires that (-0 < 0) be false. (There are good reasons for this, but this isn’t the place to get into them. If you haven’t read it, read What Every Computer Scientist Should Know About Floating-Point Arithmetic right now. It probably gives the answer, and much much more knowledge as well.) This method properly distinguishes -0 as being negative.
MOZ_DOUBLE_IS_NEGATIVE_ZERO(double d)
Determines whether a number is -0.
MOZ_DOUBLE_EXPONENT(double d)
Returns the exponent portion of the number. Floating point numbers are represented as a sign bit s, a binary fraction b_{0..p}, and an exponent E. The represented number, then, is (-1)^s × (b_{0..p}) × 2^E. This method returns the number E for a floating point number.
MOZ_DOUBLE_POSITIVE_INFINITY()
Returns positive infinity.
MOZ_DOUBLE_NEGATIVE_INFINITY()
Returns negative infinity.
MOZ_DOUBLE_SPECIFIC_NaN(int signbit, uint64_t significand)
Computes a specific NaN value, with a bit pattern specified by provided parameters. The bit layout IEEE-754 specifies for floating point formats interestingly requires that multiple bit patterns be treated as NaN values. This method allows the user to create such custom NaN values if he needs to. (99% of code should never, ever touch this method. Instead, most code should use…)
MOZ_DOUBLE_NaN()
Computes an unspecified NaN value. If you need a NaN value and you don’t know that you need a specific NaN, use this method instead to get one.
MOZ_DOUBLE_MIN_VALUE()
Returns the smallest non-zero positive double value.
MOZ_DOUBLE_IS_INT32(double d, int32_t* i)
Determines if the provided number is a signed 32-bit integer. (-0 doesn’t count as such; +0, the “normal” zero value, does.) If it is, *i will be set to that value when the method returns.
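To make the list above a little more concrete, here is a minimal sketch of how checks like these can be written against the IEEE-754 double bit layout. It is not the code in mozilla/FloatingPoint.h: the sketch namespace and every function and constant name in it are hypothetical, and it assumes double is a 64-bit IEEE-754 binary64 value.

```cpp
#include <cassert>
#include <limits>
#include <stdint.h>
#include <string.h>

// Hypothetical helpers for illustration only; NOT the mfbt implementation.
// Assumes double is IEEE-754 binary64: 1 sign bit, 11 exponent bits,
// 52 significand bits.
namespace sketch {

const uint64_t kSignBit         = 0x8000000000000000ULL;
const uint64_t kExponentBits    = 0x7ff0000000000000ULL;
const uint64_t kSignificandBits = 0x000fffffffffffffULL;
const int kExponentShift = 52;
const int kExponentBias  = 1023;

// Copy the double's bytes into an integer (memcpy avoids aliasing problems).
inline uint64_t bitsOf(double d) {
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  return bits;
}

// NaN: all exponent bits set and a non-zero significand.
inline bool isNaN(double d) {
  uint64_t b = bitsOf(d);
  return (b & kExponentBits) == kExponentBits && (b & kSignificandBits) != 0;
}

// Infinity: all exponent bits set and a zero significand, either sign.
inline bool isInfinite(double d) {
  return (bitsOf(d) & ~kSignBit) == kExponentBits;
}

// Finite: anything whose exponent bits are not all set.
inline bool isFinite(double d) {
  return (bitsOf(d) & kExponentBits) != kExponentBits;
}

// Negative: sign bit set. Unlike (d < 0), this is true for -0.
// (Not meaningful for NaN inputs.)
inline bool isNegative(double d) {
  return (bitsOf(d) & kSignBit) != 0;
}

// -0 is the one value whose only set bit is the sign bit.
inline bool isNegativeZero(double d) {
  return bitsOf(d) == kSignBit;
}

// Unbiased exponent E of a normal number (8.0 is 1.0 x 2^3, so E is 3).
inline int exponent(double d) {
  return int((bitsOf(d) & kExponentBits) >> kExponentShift) - kExponentBias;
}

// True if d is exactly representable as int32_t; -0 is rejected, +0 accepted.
inline bool isInt32(double d, int32_t* out) {
  if (isNegativeZero(d))
    return false;
  // The range check also rejects NaN, since comparisons with NaN are false.
  if (!(d >= -2147483648.0 && d <= 2147483647.0))
    return false;
  int32_t i = static_cast<int32_t>(d);  // in range, so the cast is defined
  if (static_cast<double>(i) != d)
    return false;                       // d had a fractional part
  *out = i;
  return true;
}

} // namespace sketch
```

The point of going through the bits rather than through comparisons is visible in isNegative and isNegativeZero: comparison operators cannot tell +0 from -0, but the sign bit can.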

(There’s one more method in the header, currently, that used to have users. Sometime in the last couple months, however, that method’s users all disappeared, and I didn’t notice it when rebasing. Thus I’ll be removing it shortly, and I haven’t mentioned it here.)

Why add some of these methods? Aren’t isinf, isnan, and so on good enough?

There are standard methods implementing some (but not all) of this functionality. In the best of all possible worlds we could simply use isnan and other such methods directly. In practice we’ve encountered a number of problems.

First, Microsoft’s compilers gratuitously Think Different and don’t expose isnan and friends, so on those platforms you’d have to use _isnan instead. (std::isnan isn’t usable there because some of our code still must work as both C and C++.) Obviously, we don’t want to #ifdef every place we need to use the method.
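For illustration, this is roughly what hiding that #ifdef behind a single function looks like. It is a sketch, not what the header does: the name sketchIsNaN is made up, and it is written as C++ only, so it sidesteps the C-and-C++ constraint mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#if defined(_MSC_VER)
#  include <float.h>  // declares _isnan on Microsoft's toolchain
#endif

// Hypothetical wrapper: one #ifdef here instead of one at every call site.
inline bool sketchIsNaN(double d) {
#if defined(_MSC_VER)
  return _isnan(d) != 0;
#else
  return std::isnan(d);
#endif
}
```

Callers then write sketchIsNaN(d) everywhere and never mention the platform.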

Second, we’ve found various compilers have bugs when using either the standard methods or Microsoft’s bogo-named methods. Most commonly this bustage occurs with PGO builds; interestingly, both MSVC and gcc have problems here, despite their optimizers obviously not sharing any code.

Third, we’ve found even some obvious bitwise algorithms trigger PGO build failures, again on entirely different compilers. (You can’t win.)

Basically, then, we can’t use the standard methods, we can’t use some bitwise methods, and whatever we do we have to be really careful about to make sure we don’t break anything. Hopefully this header will satisfy those requirements.

Where’s this header again?

The header is located at mfbt/FloatingPoint.h in the source tree. However, per standard mfbt practice, you should use #include "mozilla/FloatingPoint.h" to include it. Knock yourself out using it.

13.03.12

Memes

WebKit (well, one twisted soul, I think) has started a meme collection. It’s true that Mozilla has a quotes database (which sometimes we are even gracious enough to share with them [more here]). But we have no meme collection.

Gentlemen, we must not allow a meme gap!
Buck Turgidson does not approve.

Mozilla Internetizens, fix this. Pronto.

UPDATE: jdm follows through with Mozilla Memes. Next step: ADDRESS THE GAP.

UPDATE 2: The gauntlet has been thrown down.

25.01.12

SpiderMonkey no longer supports sharp variables

Jeff @ 10:17

ECMAScript object literals

ECMAScript, the standard underlying the JavaScript language, can represent simple objects and arrays using literals.

var arr = [1, 2, 3];
var obj = { property: "ohai" };

But it can’t represent all objects and arrays.

Cyclic and non-tree objects

ECMAScript literals can’t represent circular objects and arrays which (perhaps at some nesting distance) have properties referring to themselves (in other words, the objects form cyclic graphs). Nor can they (faithfully) represent objects which contain some other object multiple times (in other words, the objects form a directed acyclic graph which is not also a tree).

var arr = [1, "overwritten", 3];
arr[1] = arr; // cyclic
var obj = { property: "ohai", nest: {} };
obj.nest.parent = obj; // cyclic
obj.secondCopy = obj.nest; // non-tree, obj.nest is repeated

Sharp variables

SpiderMonkey historically supported extension syntax to represent such graphs under the name sharp variables. Sharp variables were inspired by Common Lisp syntax, and they enabled naming an object or array literal before it had been fully evaluated (even, to a limited extent, interacting with it). Netscape proposed sharp variables for inclusion in ES3, but the proposal was rejected as being too domain-specific and being arguably ugly. Since then the extension has lingered in SpiderMonkey but has seen very little use.

// Identical semantics using sharp variables
var arr = #1=[1, #1#, 3]; // #n= names an object being created, #n# refers to it
var obj = #1={ property: "ohai", nest: #2={ parent: #1# }, secondCopy: #2# };

No other ECMAScript implementer has since shown interest in implementing sharp variables. And with renewed efforts to evolve ECMAScript syntax, special characters like # are increasingly precious. Thus we’ve decided it’s time to remove sharp variable support from SpiderMonkey.

One benefit to removing sharp variables is that we can remove a good chunk of rarely-used code (and attack surface: sharp variables have been a source of some number of likely vulnerabilities) from SpiderMonkey. A syntax-removal patch added 79 lines and removed 1112 lines, including tests; not including tests, it added 42 lines and removed 677 lines. A subsequent patch to remove generation of sharp variable syntax from object decompilation added 65 lines and removed 128 lines. Removing sharp variables will also permit some simplifications now that evaluating a literal can’t have side effects beyond those in any nested property initializers.

Alternatives to sharp variables

Sharp variables may have been sometimes convenient, but they were mere syntactic sugar. It should be simple to convert any use of sharp variables to an equivalent sequence of property additions. If you were sufficiently aware of sharp variables to use them to represent non-tree objects, I trust I don’t have to explain how to do this.

Somewhat more interesting are the cases where decompiling an object produced sharp variable syntax, as when decompiling a cyclic object during debugging. (It’s worth noting in passing that decompilation will not infinitely recur: instead, it’ll bottom out with an empty object or an omitted property.) Jason Orendorff has written a sharps mini-library implementing decompilation of cyclic and non-tree objects which may be useful for this task.

When to expect this change

The sharp variable documentation on MDN has long noted that sharp variables were deprecated and would likely be removed; a few months ago that warning was upgraded to a firm statement that they would be removed. The sharp variable removal patch that landed yesterday completes the process. The removal will first appear in either today’s or tomorrow’s nightly; in a week’s time it’ll make its way into the aurora branch, then the beta branch, and finally into Firefox 12. Versions of Firefox prior to 12 will not be affected by this removal, including the extended-support release Firefox 10.

26.12.11

Introducing mozilla/Assertions.h to mfbt

Recently I landed changes to the Mozilla Framework Based on Templates (mfbt) implementing Assertions.h, the new canonical assertions implementation in C/C++ Mozilla code.

Runtime assertions

Using assertions, a developer can efficiently detect when his code goes awry because internal invariants were broken. Mozilla has many assertion facilities. NS_ASSERTION is the oldest, but unfortunately it can be ignored, and therefore historically has been. We later introduced NS_ABORT_IF_FALSE as an actual assertion that fails hard, and it’s now widely used. But it’s quite unwieldy, and it can’t be used by code that doesn’t want to depend on XPCOM. (Who would?)

mfbt addresses latent concerns with existing runtime assertions by introducing MOZ_ASSERT, MOZ_ASSERT_IF, MOZ_ALWAYS_TRUE, MOZ_ALWAYS_FALSE, and MOZ_NOT_REACHED macros to make performing true assertions dead simple.

MOZ_ASSERT(expr) and MOZ_ASSERT_IF(ifexpr, expr)

MOZ_ASSERT(expr) is straightforward: pass an expression as its sole argument, and in debug builds, if that expression is falsy, the assertion fails and execution halts in a debuggable way.

#include "mozilla/Assertions.h"

void frobnicate(Thing* thing)
{
  MOZ_ASSERT(thing);
  thing->frob();
}

MOZ_ASSERT_IF(ifexpr, expr) addresses the case when you want to assert something, where the check to decide whether to assert isn’t otherwise needed. You’d rather not muddy up your code by adding an #ifdef and if statement around your assertion. (MOZ_ASSERT(!ifexpr || expr) is a workaround, but it’s not very readable.) SpiderMonkey’s experience suggests Mozilla code will get good mileage from MOZ_ASSERT_IF.

#include "mozilla/Assertions.h"

class Error
{
    const char* optionalDescription;

  public:
    /* If specified, desc must not be empty. */
    Error(const char* desc = NULL)
    {
      MOZ_ASSERT_IF(desc != NULL, desc[0] != '\0');
      optionalDescription = desc;
    }
};
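If it helps to see the shape of such macros, here is a minimal sketch of a debug-only assertion and its conditional variant. These are not the definitions in Assertions.h: SKETCH_ASSERT, SKETCH_ASSERT_IF, SKETCH_DEBUG and the evaluate helper are all hypothetical names, and SKETCH_DEBUG merely stands in for whatever marks a debug build.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Pretend this is a debug build; remove this line to get the release expansion.
#define SKETCH_DEBUG 1

#ifdef SKETCH_DEBUG
   // Debug: test the condition and halt loudly if it does not hold.
#  define SKETCH_ASSERT(expr) \
     do { \
       if (!(expr)) { \
         std::fprintf(stderr, "Assertion failure: %s, at %s:%d\n", \
                      #expr, __FILE__, __LINE__); \
         std::abort(); \
       } \
     } while (0)
   // Debug: only check expr when ifexpr holds (i.e. "ifexpr implies expr").
#  define SKETCH_ASSERT_IF(ifexpr, expr) \
     do { if (ifexpr) SKETCH_ASSERT(expr); } while (0)
#else
   // Release: neither argument is evaluated at all.
#  define SKETCH_ASSERT(expr) do { } while (0)
#  define SKETCH_ASSERT_IF(ifexpr, expr) do { } while (0)
#endif

// Hypothetical helper that counts how often an asserted expression runs.
static int sketchEvaluations = 0;
static bool evaluate(bool value) {
  ++sketchEvaluations;
  return value;
}
```

With this shape, SKETCH_ASSERT_IF(evaluate(false), evaluate(false)) evaluates only its first argument, and in a release build evaluates neither, which is why the guarding check costs nothing when assertions are off.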

MOZ_ALWAYS_TRUE(expr) and MOZ_ALWAYS_FALSE(expr)

Sometimes the expression for an assertion must always be executed for its side effects, and it can’t just be executed in debug builds. MOZ_ALWAYS_TRUE(expr) and MOZ_ALWAYS_FALSE(expr) support this idiom. These macros always evaluate their argument, and in debug builds that argument is asserted truthy or falsy.

#include "mozilla/Assertions.h"

/* JS_ValueToBoolean was fallible but no longer is. */
MOZ_ALWAYS_TRUE(JS_ValueToBoolean(cx, v, &b));
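The difference from a plain assertion is easiest to see in the release expansion. The following sketch uses hypothetical names (SKETCH_ALWAYS_TRUE, SKETCH_ALWAYS_FALSE, SKETCH_DEBUG, toBoolean) and is not the Assertions.h definition; it only shows the idea that the argument is evaluated in every build.

```cpp
#include <cassert>
#include <cstdlib>

#ifdef SKETCH_DEBUG
   // Debug: evaluate the expression and halt if it has the wrong truthiness.
#  define SKETCH_ALWAYS_TRUE(expr)  do { if (!(expr)) std::abort(); } while (0)
#  define SKETCH_ALWAYS_FALSE(expr) do { if (expr) std::abort(); } while (0)
#else
   // Release: still evaluate the expression for its side effects,
   // but discard the result.
#  define SKETCH_ALWAYS_TRUE(expr)  ((void)(expr))
#  define SKETCH_ALWAYS_FALSE(expr) ((void)(expr))
#endif

// Hypothetical conversion that can no longer fail but still returns a status.
inline bool toBoolean(int value, bool* out) {
  *out = (value != 0);
  return true;
}
```

Either way the call to toBoolean happens, so the out-parameter is filled in regardless of build type.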

MOZ_NOT_REACHED(reason)

MOZ_NOT_REACHED(reason) indicates that the given point can’t be reached during execution: simply hitting it is a bug. (Think of it as a more-explicit form of asserting false.) It takes as an argument an explanation of why that point shouldn’t have been reachable.

#include "mozilla/Assertions.h"

// ...in a language parser...
void handle(BooleanLiteralNode node)
{
  if (node.isTrue())
    handleTrueLiteral();
  else if (node.isFalse())
    handleFalseLiteral();
  else
    MOZ_NOT_REACHED("boolean literal that's not true or false?");
}

Compile-time assertions

Most assertions must happen at runtime. But some assertions are static, depending only on constants, and could be checked during compilation. A compile time check is better than a runtime check: the developer need not ensure a test exercises that code, because the compiler itself enforces the assertion. Properly crafted static assertions can never be unwittingly broken.

MOZ_STATIC_ASSERT(cond, reason)

MOZ_STATIC_ASSERT(cond, reason) asserts a condition at compile time. In newer compilers, if the assertion fails, the compiler will also include reason in diagnostics.

#include "mozilla/Assertions.h"

struct S { ... };
MOZ_STATIC_ASSERT(sizeof(S) % sizeof(size_t) == 0,
                  "S should be a multiple of word size for efficiency");

MOZ_STATIC_ASSERT is implemented with an impressive pile of hacks which should work perfectly everywhere, except, rarely, with gcc 4.2 (the current OS X compiler) when compiling C++ code. The failure mode requires MOZ_STATIC_ASSERT on line N not in an extern "C" code block and a second MOZ_STATIC_ASSERT on the same line N (in a different file) in an extern "C" block. And those two files have to be used when compiling a single file, with the extern "C"’d assertion second. This seems improbable, so we’ll risk it until we drop gcc 4.2 support.
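For readers who haven’t seen this kind of hack, here is one classic pre-C++11 way to get a compile-time assertion: declare an array type whose size is negative when the condition is false. This is a sketch under assumptions, not the mfbt implementation; the SKETCH_ names and the Pair struct are made up. A name built from __LINE__, as below, is one plausible reason line numbers would matter in a failure mode like the one described, though this sketch is not claimed to reproduce it.

```cpp
#include <cassert>
#include <stddef.h>

// Two-step token pasting so that __LINE__ is expanded before being glued on.
#define SKETCH_GLUE2(a, b) a##b
#define SKETCH_GLUE(a, b) SKETCH_GLUE2(a, b)

// A negative array size is ill-formed, so a false condition stops compilation.
// The reason string is unused here; it only documents the assertion in source.
#define SKETCH_STATIC_ASSERT(cond, reason) \
  typedef int SKETCH_GLUE(sketch_static_assert_line_, __LINE__)[(cond) ? 1 : -1]

// Hypothetical struct used only to have something to assert about.
struct Pair {
  size_t first;
  size_t second;
};

SKETCH_STATIC_ASSERT(sizeof(Pair) % sizeof(size_t) == 0,
                     "Pair should be a multiple of word size");
```

Changing the condition to something false turns the typedef into an array of size -1, and the compiler rejects the file.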

Possible improvements

Assertions.h is reasonably complete, but I have a few ideas I’ve been considering for improvements. Let me know what you think of them in comments.

Add an optional reason argument to MOZ_ASSERT, and maybe to MOZ_ALWAYS_TRUE and MOZ_ALWAYS_FALSE

MOZ_ASSERT takes only the condition to test as an argument. In contrast NS_ASSERTION and NS_ABORT_IF_FALSE take the condition and an explanatory string. MOZ_ASSERT’s lack of explanation derives purely from its ancestry in the JS_ASSERT macro: it wasn’t deliberate.

Would it be helpful if MOZ_ASSERT, MOZ_ALWAYS_TRUE, and MOZ_ALWAYS_FALSE optionally took a reason? (Optional because some assertions, e.g. many non-null assertions, are self-explanatory.) We’d have to disable assertions for compilers not implementing variadic macros (I think our supported compilers implement them), or possibly lose the condition expression in the assertion failure message. A reason would make it easier to convert existing NS_ABORT_IF_FALSEs to MOZ_ASSERTs. Should we add an optional second argument to MOZ_ASSERT and the others?
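As a rough idea of what the variadic-macro route could look like, here is a sketch. Everything in it is hypothetical (the sketch::check function and the SKETCH_REASONED_ASSERT macro are not part of mfbt), it is always enabled rather than debug-only to keep it short, and it assumes the compiler supports variadic macros.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

namespace sketch {

// Receives the stringified macro arguments, the evaluated condition, and
// (optionally) the reason. The reason is already part of `text`, so it is
// accepted here only to make the one- and two-argument forms both compile.
inline void check(const char* text, const char* file, int line,
                  bool cond, const char* /* reason */ = "") {
  if (cond)
    return;
  std::fprintf(stderr, "Assertion failure: %s, at %s:%d\n", text, file, line);
  std::abort();
}

} // namespace sketch

// #__VA_ARGS__ stringifies the condition and the reason together, so the
// failure message cannot cleanly separate the two.
#define SKETCH_REASONED_ASSERT(...) \
  sketch::check(#__VA_ARGS__, __FILE__, __LINE__, __VA_ARGS__)
```

Both SKETCH_REASONED_ASSERT(ptr != NULL) and SKETCH_REASONED_ASSERT(n > 0, "caller must pass a positive count") compile with this shape. The catch is the one noted above: the stringified text is the whole argument list, and a condition containing a top-level comma would be misparsed.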

Include __assume(0) or __builtin_unreachable() in MOZ_NOT_REACHED

__builtin_unreachable() and __assume(0) inform the compiler that subsequent code can’t be executed, providing optimization opportunities. It’s unclear how this affects debugging feedback like stack traces. If the optimizations destroy Breakpad-ability, that may be too big a loss. More research is needed here.

Another possibility might be to use __builtin_trap(). This may not communicate an optimization opportunity comparable to that provided by the other two options. (It can’t be equally informative because execution must be able to continue past a trap. Thus the two have different impacts on variable lifetimes. Whether __builtin_trap otherwise communicates “unlikelihood” well enough isn’t clear.) Perhaps __builtin_trap could be used in debug builds, and __builtin_unreachable could be used in optimized builds. Again: more research needed.
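A sketch of the debug/optimized split floated above might look like the following. It is an assumption-laden illustration, not a proposal for the actual macro: SKETCH_NOT_REACHED, SKETCH_DEBUG, and the Literal example are hypothetical, and the debug branch uses std::abort() where a GNU-specific build could use __builtin_trap() instead.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

#if defined(SKETCH_DEBUG)
   // Debug: report and halt so the failure is visible in a debugger.
#  define SKETCH_NOT_REACHED(reason) \
     do { \
       std::fprintf(stderr, "Not reached: %s (%s:%d)\n", \
                    reason, __FILE__, __LINE__); \
       std::abort(); \
     } while (0)
#elif defined(__GNUC__)
   // Optimized, gcc/clang: tell the optimizer this point cannot execute.
#  define SKETCH_NOT_REACHED(reason) __builtin_unreachable()
#elif defined(_MSC_VER)
   // Optimized, MSVC: the equivalent hint.
#  define SKETCH_NOT_REACHED(reason) __assume(0)
#else
   // Unknown compiler: fall back to halting.
#  define SKETCH_NOT_REACHED(reason) std::abort()
#endif

// Hypothetical example type and function.
enum Literal { LiteralTrue, LiteralFalse };

inline int toInt(Literal lit) {
  switch (lit) {
    case LiteralTrue:
      return 1;
    case LiteralFalse:
      return 0;
  }
  SKETCH_NOT_REACHED("Literal is neither true nor false");
}
```

In the optimized branches, actually reaching the macro is undefined behavior, which is exactly the trade-off under discussion: better code generation in exchange for no diagnostic at all if the "impossible" happens.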

Use C11’s _Static_assert in MOZ_STATIC_ASSERT

New editions of C and C++ include built-in static assertion syntax. MOZ_STATIC_ASSERT expands to C++11’s static_assert(2 + 2 == 4, "ya rly") syntax when possible. It could be made to expand to C11’s _Static_assert('A' == 'A', "no wai") syntax in some other cases, but frankly I don’t hack enough C code to care as long as the static assertion actually happens. 🙂 This is bug 713531 if you’re interested in investigating.
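For anyone picking up that bug, the dispatch could be sketched along these lines. The macro name SKETCH_LANG_STATIC_ASSERT is hypothetical, the version tests are an assumption about how one might detect language support (some compilers report __cplusplus conservatively), and the fallback is a generic array-size trick rather than whatever MOZ_STATIC_ASSERT really uses.

```cpp
#include <cassert>

#if defined(__cplusplus) && __cplusplus >= 201103L
   // C++11 and later: built-in syntax, reason appears in diagnostics.
#  define SKETCH_LANG_STATIC_ASSERT(cond, reason) static_assert(cond, reason)
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
   // C11 and later: the C spelling of the same feature.
#  define SKETCH_LANG_STATIC_ASSERT(cond, reason) _Static_assert(cond, reason)
#else
   // Older compilers: an extern array declaration whose size is negative
   // when cond is false. The reason is dropped.
#  define SKETCH_LANG_STATIC_ASSERT(cond, reason) \
     extern int sketch_static_assert_fallback[(cond) ? 1 : -1]
#endif

SKETCH_LANG_STATIC_ASSERT(sizeof(char) == 1, "char is one byte by definition");
```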

Want more information?

Read Assertions.h. mfbt code has a high standard for code comments in interface descriptions, and for file names (the current Util.h being a notable exception which will be fixed). We want it to be reasonably obvious where to find what you need and how to use it by skimming mfbt/’s contents and then skimming a few files’ contents. Good comments are key to that. You should find Assertions.h quite readable; please file a bug if you have improvements to suggest.
