30.04.13

Introducing mozilla::Abs to mfbt

Jeff @ 08:17

Computing absolute values in C/C++

C includes various functions for computing the absolute value of a signed number. C++98 places those C functions in namespace std and also adds abs() overloads there, so std::abs can be called on both integers and floating-point numbers. For a long time Mozilla used NS_ABS to compute absolute value, but recently we switched to std::abs. This works on many systems, but it has a few issues.

Issues with std::abs

std::abs is split across two headers

With some compilers, the integral overloads are in <cstdlib> and the floating point overloads are in <cmath>. This led to confusion when std::abs compiled on one type but not on another, in the same file. (Or worse, when it compiled with just one #include on one developer’s compiler but not on others.) The solution was to include both headers even if only one was needed. This is pretty obscure.
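A minimal sketch of that workaround, assuming only standard C++: including both headers makes both overload sets visible, so std::abs resolves for integral and floating-point arguments alike in the same file. The helper names (MagnitudeOfLong, MagnitudeOfDouble, CheckMagnitudes) are hypothetical.

```cpp
#include <cassert>
#include <cmath>    // floating-point std::abs overloads (float, double, long double)
#include <cstdlib>  // integral std::abs overloads (int, long)

// Hypothetical helpers: with both headers included, each call resolves to the
// std::abs overload matching its argument type.
inline long MagnitudeOfLong(long value) { return std::abs(value); }
inline double MagnitudeOfDouble(double value) { return std::abs(value); }

// Exercises both overload sets from the same translation unit.
inline void CheckMagnitudes() {
  assert(MagnitudeOfLong(-42L) == 42L);
  assert(MagnitudeOfDouble(-2.5) == 2.5);
}
```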

std::abs(int64_t) doesn’t work everywhere

On many systems <stdint.h> has typedef long long int64_t;. But long long was only added in C99 and C++11, and some compilers don’t have long long std::abs(long long), so int64_t i = 0; std::abs(i); won’t compile. We “solved” this with compiler-specific #ifdefs around custom std::abs overloads in a somewhat-central header. (That’s three headers to include!) C++ says adding declarations like these to namespace std has undefined behavior, and indeed it’ll break as we update compilers.

std::abs(int32_t(INT32_MIN)) doesn’t work

The integral abs overloads don’t work on the most-negative value of each signed integer type. On two’s-complement machines (nearly everything), the absolute value of the smallest integer of a signed type won’t fit in that type. (For example, INT8_MIN is -128, INT8_MAX is +127, and +128 won’t fit in int8_t.) The integral abs functions take and return signed types. If the smallest integer flows through, behavior is undefined: as absolute value is usually implemented, the value is returned unchanged, still negative. This has caused Mozilla bugs.
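To make the range asymmetry concrete, here is a small compile-time check, assuming a C++11 compiler. Exact-width types such as int8_t are required to use two’s complement, so these assertions hold wherever the types exist. SignedAbsIsSafe is a hypothetical helper, not an mfbt function.

```cpp
#include <cstdint>
#include <limits>

// Two's-complement ranges are asymmetric: one more negative value than positive.
static_assert(std::numeric_limits<int8_t>::min() == -128, "INT8_MIN is -128");
static_assert(std::numeric_limits<int8_t>::max() == 127, "INT8_MAX is +127");

// |INT8_MIN| == 128 doesn't fit in int8_t, but it does fit in uint8_t.
static_assert(128 > std::numeric_limits<int8_t>::max(), "+128 overflows int8_t");
static_assert(128 <= std::numeric_limits<uint8_t>::max(), "+128 fits in uint8_t");

// Hypothetical helper: true when -value is representable in int32_t, i.e. when
// negating value (as a signed abs() does for negative inputs) cannot overflow.
constexpr bool SignedAbsIsSafe(int32_t value) {
  return value != std::numeric_limits<int32_t>::min();
}
```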

Mozilla code should use mozilla::Abs, not std::abs

Unfortunately the only solution is to implement our own absolute-value function. mozilla::Abs in "mozilla/MathAlgorithms.h" is overloaded for all signed integral types and the floating point types, and the integral overloads return the corresponding unsigned type. Thus you should use mozilla::Abs to compute absolute values. Be careful about signedness: don’t assign the result directly into a signed type! That loses mozilla::Abs’s ability to accept all inputs and will cause bugs. Ideally this would be a compiler warning, but we don’t use -Wconversion or Microsoft equivalents and so can’t do better.
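The core idea can be sketched as follows. AbsSketch is hypothetical, not the mfbt source: it only shows why returning an unsigned type lets every input, including the most-negative value, produce a correct result. To stay short it covers just int32_t, int64_t and double.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch in the spirit of mozilla::Abs (NOT the mfbt implementation).
// Integral overloads return the unsigned type, so |INT32_MIN| is representable.
inline uint32_t AbsSketch(int32_t value) {
  // Converting a negative int32_t to uint32_t is well-defined: it yields value + 2^32.
  uint32_t bits = static_cast<uint32_t>(value);
  // Unsigned subtraction wraps modulo 2^32, so 0 - bits == 2^32 - bits == -value.
  return value >= 0 ? bits : 0u - bits;
}

inline uint64_t AbsSketch(int64_t value) {
  uint64_t bits = static_cast<uint64_t>(value);
  return value >= 0 ? bits : uint64_t(0) - bits;
}

// Floating-point overload: fabs handles -0.0 and infinities correctly.
inline double AbsSketch(double value) { return std::fabs(value); }

// Usage: keep the result unsigned.
inline void AbsSketchUsage() {
  uint32_t magnitude = AbsSketch(INT32_MIN);  // 2147483648, representable
  assert(magnitude == 2147483648u);
  // By contrast, `int32_t bad = AbsSketch(INT32_MIN);` converts an out-of-range
  // value to a signed type (implementation-defined before C++20, and on common
  // compilers it yields INT32_MIN again), reintroducing the original bug.
}
```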

26.04.13

mozilla/PodOperations.h: functions for zeroing, assigning to, copying, and comparing plain old data objects

Jeff @ 13:20

Recently I introduced the new header mozilla/PodOperations.h to mfbt, moving its contents out of SpiderMonkey so they’re available for general use. This header makes various memory operations on objects easier and safer.

The problem

Often in C or C++ one might want to set the contents of an object to zero — perhaps to initialize it:

mMetrics = new gfxFont::Metrics;
::memset(mMetrics, 0, sizeof(*mMetrics));

Or perhaps the same might need to be done for a range of objects:

memset(mTreeData.Elements(), 0, mTreeData.Length() * sizeof(mTreeData[0]));

Or perhaps one might want to set the contents of an object to those of another object:

memcpy(&e, buf, sizeof(e));

Or perhaps a range of objects must be copied:

memcpy(to + aOffset, aBuffer, aLength * sizeof(PRUnichar));

Or perhaps a range of objects must be memory-equivalence-compared:

return memcmp(k->chars(), l->chars(), k->length() * sizeof(jschar)) == 0;

What do all these cases have in common? They all require using a sizeof() operation.

The problem

C and C++, as low-level languages very much focused on the actual memory, place great importance on the size of an object. Programmers often think much less about sizes. It’s pretty easy to write code without having to think about memory. But some cases require it, and because it doesn’t happen regularly, it’s easy to make mistakes. Even experienced programmers can screw it up if they don’t think carefully.

This is particularly likely for operations on arrays of objects. If the object’s size isn’t 1, forgetting a sizeof means an array of objects might not be completely cleared, copied, or compared. This has led to Mozilla security bugs in the past. (The best example I can find now is bug 688877, which doesn’t use quite the same operations and can’t be solved with these methods, but which demonstrates the same sort of issue.)
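Here is a minimal illustration of the forgotten-sizeof mistake, using hypothetical helper names. With four-byte elements, passing the element count as the byte count clears only count bytes, a quarter of the array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// BUG: memset takes a byte count, but this passes an element count, so only
// the first `count` bytes are cleared.
inline void ClearBuggy(uint32_t* elements, std::size_t count) {
  std::memset(elements, 0, count);  // should be count * sizeof(*elements)
}

// Correct: scale the element count by the element size.
inline void ClearCorrect(uint32_t* elements, std::size_t count) {
  assert(elements != nullptr || count == 0);
  std::memset(elements, 0, count * sizeof(*elements));
}
```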

The solution

Using the prodigious magic of C++ templates, the new mfbt/PodOperations.h abstracts away the sizeof in all the above examples, implements bounds-checking assertions as appropriate, and is type-safe (doesn’t require implicit casts to void*).

  • Zeroing
    • PodZero(T* t): set the contents of *t to 0
    • PodZero(T* t, size_t count): set the contents of count elements starting at t to 0
    • PodArrayZero(T (&t)[N]): set the contents of the array t (with a compile-time size) to 0
  • Assigning
    • PodAssign(T* dst, const T* src): set the contents of *dst to *src — locations can’t overlap (no self-assignments)
  • Copying
    • PodCopy(T* dst, const T* src, size_t count): copy count elements starting at src to dst — ranges can’t overlap
  • Comparison
    • PodEqual(const T* one, const T* two, size_t len): true or false indicating whether len elements at one are memory-identical to len elements at two
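A rough sketch of how template helpers like these can fold the sizeof into one place and assert their preconditions. These are hypothetical reimplementations in a `sketch` namespace, written from the descriptions above; the real mfbt versions include performance special cases and may differ in other details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch {

// Each helper computes the byte count from T itself, and the typed pointer
// parameters mean callers never pass void* or a raw byte count.
template <typename T>
void PodZero(T* t) { std::memset(t, 0, sizeof(T)); }

template <typename T>
void PodZero(T* t, std::size_t count) { std::memset(t, 0, count * sizeof(T)); }

// N is deduced from the array type, so the length can't be wrong.
template <typename T, std::size_t N>
void PodArrayZero(T (&t)[N]) { std::memset(t, 0, N * sizeof(T)); }

template <typename T>
void PodAssign(T* dst, const T* src) {
  assert(dst != src);  // no self-assignment
  std::memcpy(dst, src, sizeof(T));
}

template <typename T>
void PodCopy(T* dst, const T* src, std::size_t count) {
  // memcpy requires non-overlapping ranges; check that in debug builds.
  assert(dst + count <= src || src + count <= dst);
  std::memcpy(dst, src, count * sizeof(T));
}

template <typename T>
bool PodEqual(const T* one, const T* two, std::size_t len) {
  return std::memcmp(one, two, len * sizeof(T)) == 0;
}

}  // namespace sketch
```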

Random questions

Why “Pod”?

POD is a C++ term of art, an abbreviation for “plain old data”. A type that’s plain old data is, roughly: a built-in type; a pointer or enum that’s represented like a built-in type; a user-defined class without any virtual methods or inheritance or user-defined constructors or destructors (including in any of its base classes), whose non-static members are themselves plain old data; or an array of a type that’s plain old data. (There are a couple of other restrictions that don’t matter here and would take too long to explain anyway.)

One implication of a type being POD is that (systemic interactions aside) you can copy an object of that type using memcpy. The file and method names simply play on that. Arguably it’s not the best, clearest term in the world, especially as these methods aren’t restricted to POD types. (One intended use is for initializing classes that are non-POD, where the initial state is fully-zeroed.) But it roughly gets the job done, no better names quickly spring to mind, and renaming would have been a pain without much gain.
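C++11 formalizes the rough definition above: a POD type is one that is both trivial and standard-layout, while the separate trait std::is_trivially_copyable captures when copying with memcpy is permitted. A sketch with hypothetical example types, assuming a C++11 standard library:

```cpp
#include <type_traits>

struct Point { int x; int y; };                           // POD
struct WithCtor { WithCtor() : v(0) {} int v; };          // user-provided constructor: not trivial
struct WithVirtual { virtual ~WithVirtual() {} int v; };  // virtual member: neither

// C++11's definition of POD: trivial and standard-layout.
template <typename T>
constexpr bool IsPodLike() {
  return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(IsPodLike<Point>(), "a plain struct of ints is POD");
static_assert(!IsPodLike<WithCtor>(), "a user-provided constructor makes it non-POD");
static_assert(std::is_trivially_copyable<WithCtor>::value, "but it can still be memcpy'd");
static_assert(!std::is_trivially_copyable<WithVirtual>::value, "virtual functions rule out memcpy");
```

WithCtor corresponds to the non-POD case mentioned above: it has a user-provided constructor, yet its bytes can still be copied with memcpy.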

What are all these “optimizations” in these methods?

When these operations were added to SpiderMonkey a few years ago, various people (principally Luke, if I remember right) benchmarked these operations when used in various places in SpiderMonkey. It turned out that “trivial” uses of memcmp, &c. wouldn’t always be optimally compiled by the compiler to fast, SIMD-optimizable loops. Thus we introduced special cases. Newer compilers may do better, such that we have less need for the optimizations. But the worst that happens with them is slightly slower code — not correctness bugs. If you have real-world data (inchoate fears don’t count :-) ) showing these optimizations aren’t needed now, file a bug and we can adapt them as needed.

26.07.12

Checking in from Milford, UT

Jeff @ 09:47

I’m in Milford, UT right now after a night in a hotel. It’s been eight days of biking so far at a fairly stiff pace, out of necessity due to the 37-day cap. Nevada (particularly) and Utah are so spread out and empty that most of my days through them have been 100+ miles, including one 135-mile day (and, perhaps surprisingly, an even worse day yesterday at 118 miles, due to the first real headwind of the trip). And don’t forget there’s no water between towns, and often no houses, even. But this is what’s between the west coast and the east coast, so you make it work. What must be done, can be done. (In my case, using two completely-filled 100oz. water bladders, which yesterday was just enough to get me through the longest waterless stretch on the route at 84 miles. Although I was sucking dry for the last dozen miles of it, which happily coincided with a long downhill stretch where I didn’t miss the water too much.)

I said I’d try to occasionally post here, but I’m finding that for this trip Twitter’s a much better medium both in terms of ease of use and suitability for pictures plus a few words, so look there for future updates. They’re pretty intermittent, tho — cell coverage out here is very sparse, and by all accounts T-Mobile is the worst provider in the world to have out here (rookie error on my part), so I tend to save up drafts and post them all at once when I do find coverage.

Anyway, back to riding now. Onward through Utah to the Continental Divide, and to the Great Plains beyond!

17.07.12

37 awesome days

I tend to take very long vacations. Coding gives me the flexibility to work from anywhere, so when I travel, I keep working by default and take days off when something special arises. Thus I usually take vacation in very short increments, but very occasionally I’ll be gone awhile. And when I’m gone awhile, I’m gone: no hacking, no work, just focused on the instant.

My last serious-length vacation was August-September last year. And since then, I’ve taken only a day and a half of vacation (although I’ve shifted a few more days or fractions thereof to evenings or weekends). It’s time for a truly long vacation.

Screenshot of a browser showing Mozilla's PTO app, indicating 224 hours of PTO starting July 18
Yeah, I’m pretty much using it all up.

For several years I’ve had a list of long trips I’ve decided I will take: the Appalachian Trail, the John Muir Trail, the Coast to Coast Walk in England, and the Pacific Crest Trail. I’ve done the first two in 2008 and 2010 and the third last year. The fourth requires more than just a vacation, so I haven’t gotten to it yet. This leaves one last big trip: biking across the United States.

Tomorrow I take a much-needed break to recharge and recuperate (in a manner of speaking) by biking from the Pacific to the Atlantic. (Ironically, the first leg out of San Francisco is a ferry to Vallejo.) I have a commitment at the back end, on August 25 in San Francisco, and a less-critical one (more biking, believe it or not!) on August 26. The 24th must be a day to fly back, so I have 37 days to bike the ~3784 miles of the Western Express Route (San Francisco, CA to Pueblo, CO) and part of the TransAmerica Trail (Pueblo to Yorktown, VA). That’s roughly 102 miles a day, an aggressive pace, to put it mildly; but I’ve biked enough hundred-mile days before, singly and seriatim, that I believe it’s doable with effort and focus.

Unlike in past trips, I won’t be incommunicado this time. I’ll pass through towns regularly, so I’ll have consistent ability to access the Internet. And I died a little, but I bought two months of cell/data service to cover the trip. So it goes. I won’t be regularly checking email (or bugmail, or doing reviews). But I’ll try to make a quick post from time to time with a picture and a few words.

I could say a little about gear — my twenty-five pound carrying capacity in panniers on a seatpost-mounted rack, the Kindle I purchased for reading end-of-day (which I’ve enjoyed considerably for the last week…as has my credit card), the 25-ounce sleeping bag I’ll carry, the tent I’ll use. I could also say a little about the hazards — the western isolation (you Europeans have no idea what that means), the western desert (one Utah day will be 50 miles without water, then 74 miles without water), the high summer climate, the other traffic, and simple exhaustion. But none of that’s important compared to the fact that 1) this is finally happening, and 2) it starts tomorrow.

“And now I think I am quite ready to go on another journey.” Let’s do this.

19.04.12

Introducing mozilla/FloatingPoint.h: methods for floating point types and values

Jeff @ 19:00

The latest addition to the Mozilla Framework Based on Templates (mfbt) is mozilla/FloatingPoint.h. This header implements various floating point functionality.

Functionality overview

mozilla/FloatingPoint.h currently implements the following functionality, all centered around working with double-precision floating point numbers. (There’s no single-precision support only because Mozilla seemingly doesn’t need it. The only code I can find that does this sort of thing for single-precision numbers is nsCoord.h, and that only barely. We can add single-precision equivalents when we need them.)

MOZ_DOUBLE_IS_NaN(double d)
Determines whether a value is NaN (not a number).
MOZ_DOUBLE_IS_INFINITE(double d)
Determines whether a value is positive or negative infinity.
MOZ_DOUBLE_IS_FINITE(double d)
Determines whether a number is finite — that is, not NaN or positive or negative infinity.
MOZ_DOUBLE_IS_NEGATIVE(double d)
Determines whether a number is negative. This is useful because d < 0 does not answer this question! There are two zero values, +0 and -0, and IEEE-754 requires that (-0 < 0) be false. (There are good reasons for this, but this isn’t the place to get into them. If you haven’t read it, read What Every Computer Scientist Should Know About Floating-Point Arithmetic right now. It probably gives the answer, and much much more knowledge as well.) This method properly distinguishes -0 as being negative.
MOZ_DOUBLE_IS_NEGATIVE_ZERO(double d)
Determines whether a number is -0.
MOZ_DOUBLE_EXPONENT(double d)
Returns the exponent portion of the number. Floating point numbers are represented as a sign bit s, a p-bit binary significand b_0.b_1…b_{p-1}, and an exponent E. The represented number, then, is $(-1)^s \times (b_0.b_1 \ldots b_{p-1})_2 \times 2^E$. This method returns the number E for a floating point number.
MOZ_DOUBLE_POSITIVE_INFINITY()
Returns positive infinity.
MOZ_DOUBLE_NEGATIVE_INFINITY()
Returns negative infinity.
MOZ_DOUBLE_SPECIFIC_NaN(int signbit, uint64_t significand)
Computes a specific NaN value, with a bit pattern specified by provided parameters. The bit layout IEEE-754 specifies for floating point formats interestingly requires that multiple bit patterns be treated as NaN values. This method allows the user to create such custom NaN values if he needs to. (99% of code should never, ever touch this method. Instead, most code should use…)
MOZ_DOUBLE_NaN()
Computes an unspecified NaN value. If you need a NaN value and you don’t know that you need a specific NaN, use this method instead to get one.
MOZ_DOUBLE_MIN_VALUE()
Returns the smallest non-zero positive double value.
MOZ_DOUBLE_IS_INT32(double d, int32_t* i)
Determines if the provided number is a signed 32-bit integer. (-0 doesn’t count as such; +0, the “normal” zero value, does.) If it is, *i will be set to that value when the method returns.
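To show what several of these functions compute, here is a bit-level sketch. It assumes the IEEE-754 binary64 layout (1 sign bit, 11 exponent bits biased by 1023, 52 stored significand bits) and uses memcpy to read a double’s bits. The fpsketch names are hypothetical; they are not the MOZ_DOUBLE_* implementations, which had to work around specific compiler bugs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

namespace fpsketch {

const uint64_t kSignBit = uint64_t(1) << 63;
const uint64_t kExponentBits = uint64_t(0x7FF) << 52;
const uint64_t kSignificandBits = (uint64_t(1) << 52) - 1;

inline uint64_t ToBits(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));  // well-defined way to read the bits
  return bits;
}

// NaN: all exponent bits set and a non-zero significand.
inline bool IsNaN(double d) {
  uint64_t bits = ToBits(d);
  return (bits & kExponentBits) == kExponentBits && (bits & kSignificandBits) != 0;
}

// Infinity: all exponent bits set, zero significand, either sign.
inline bool IsInfinite(double d) { return (ToBits(d) & ~kSignBit) == kExponentBits; }

// Finite: anything whose exponent field isn't all ones.
inline bool IsFinite(double d) { return (ToBits(d) & kExponentBits) != kExponentBits; }

// Tests the sign bit, so -0 counts as negative even though (-0.0 < 0) is false.
inline bool IsNegative(double d) { return (ToBits(d) & kSignBit) != 0; }

inline bool IsNegativeZero(double d) { return ToBits(d) == kSignBit; }

// Unbiased exponent E for normal numbers: the stored field minus the bias.
inline int Exponent(double d) {
  return int((ToBits(d) & kExponentBits) >> 52) - 1023;
}

// True if d is an int32_t value other than -0; stores that value in *i.
inline bool IsInt32(double d, int32_t* i) {
  assert(i != nullptr);
  const double lo = double(std::numeric_limits<int32_t>::min());
  const double hi = double(std::numeric_limits<int32_t>::max());
  // The negated range test also rejects NaN, whose comparisons are all false.
  if (IsNegativeZero(d) || !(d >= lo && d <= hi)) return false;
  int32_t truncated = int32_t(d);  // safe: d is within range
  if (double(truncated) != d) return false;  // had a fractional part
  *i = truncated;
  return true;
}

}  // namespace fpsketch
```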

(There’s one more method in the header, currently, that used to have users. Sometime in the last couple months, however, that method’s users all disappeared, and I didn’t notice it when rebasing. Thus I’ll be removing it shortly, and I haven’t mentioned it here.)

Why add some of these methods? Aren’t isinf, isnan, and so on good enough?

There are standard methods implementing some (but not all) of this functionality. In the best of all possible worlds we could simply use isnan and other such methods directly. In practice we’ve encountered a number of problems.

First, Microsoft’s compilers gratuitously Think Different and don’t expose isnan and friends, so on those platforms you’d have to use _isnan instead. (std::isnan isn’t usable there because some of our code still must work as both C and C++.) Obviously, we don’t want to #ifdef every place we need to use the method.

Second, we’ve found various compilers have bugs when using either the standard methods or Microsoft’s bogo-named methods. Most commonly this bustage occurs with PGO builds; interestingly, both MSVC and gcc have problems here, despite their optimizers obviously not sharing any code.

Third, we’ve found even some obvious bitwise algorithms trigger PGO build failures, again on entirely different compilers. (You can’t win.)

Basically, then, we can’t use the standard methods, we can’t use some bitwise methods, and whatever we do, we have to be really careful to make sure we don’t break anything. Hopefully this header satisfies those requirements.

Where’s this header again?

The header is located at mfbt/FloatingPoint.h in the source tree. However, per standard mfbt practice, you should use #include "mozilla/FloatingPoint.h" to include it. Knock yourself out using it.
