24.08.13

37 days and one year later: part 1: the start and choosing a route

One year ago, after 37 days and roughly 3875 miles of biking that started in San Francisco, I reached Yorktown, VA, completing a ride across the country. An exact day-to-day accounting would likely bog down in uninteresting logistics (particularly given the way I traveled — other approaches would likely yield more interesting day-by-day commentary). Instead, I’m going to cover a variety of topics of interest from the trip, in somewhat random order, as a series of posts. If you want a very cursory, sometimes out-of-order account of the trip, reading roughly July 18 through August 25 of my Twitter stream covers it.

Me in the traditional arms-upraised pose, next to my bike and (appropriately) the Victory Monument at Yorktown, with the Chesapeake Bay (and the Atlantic Ocean) in the background
The secret to my speed: obviously the cycling jersey

An inauspicious start

The trip got off to a bumpy start the Tuesday night before I planned to leave. I planned to ride my spiffy, super-light carbon-fiber racing bike. I use it for regular transport, so I waited until the last minute for a final tune-up, picking the bike up the evening before I departed. Then I began loading it with panniers and gear. Racing bikes don’t have mounts for carrying gear, so I’d use a seatpost rack (with a correspondingly light ~16-pound load). When I began attaching the rack, I noticed the clamp was sized for a much smaller-diameter seatpost. Looking at how the clamp would make contact with the seatpost, it suddenly occurred to me that clamping a seatpost rack onto a carbon-fiber seatpost might not be a good idea. Carbon fiber is strong along its length but not laterally: the clamp could easily crush the seatpost.

A red carbon-fiber racing bike
Shiny! But really not the thing to use for touring

Wednesday morning, I asked the bike shop whether they had an aluminum seatpost of the right size. They wouldn’t have one until Friday. Other local shops didn’t have any either. Replacing the seatpost was out.

Seeing no other options…I went to the first bike shop, bought a non-carbon road bike that fit me, walked home with it, transferred gear and pedals to it, and biked to Caltrain to head to San Francisco to start the trip.

Thus I crossed the country on a bike I bought the day I left.

Me standing underneath a "Welcome to Illinois" sign, with my bike leaning against the sign just next to me; a sign with directions to a mental health center is just visible
Too bad that mental health center wasn’t closer to the start of the trip; there might have been hope for me then

This is crazy. But not quite as crazy as it sounds. I’d purchased a 2012 Scattante R-570; I’d previously owned the 2010 version, so I knew I’d be comfortable. And months before, I’d considered getting a touring-oriented bike for extra carrying capacity. But I’ve never spent money very easily. I had the money, but I didn’t want to spend it if I didn’t have to.

Now I was in a “have to” situation. Riding a totally untested bike would rightly scare most people to death. Most people would probably cancel the trip or substantially change plans. But my philosophy is that what must be done, can be done. So I did it.

Other than lost biking time (day 1 covered 23.76 route miles rather than the ~100mi I’d intended — no small loss, but not huge, either), all I lost was the ability to buy the bike on sale for ~$160 less. It could have been worse.

Choosing a route

I traveled pretty much entirely with the aid of the Adventure Cycling Association’s route maps. I considered finding my own route, but I discarded the idea for lack of time, and because I wasn’t sure I’d enjoy route-planning. In hindsight this was clearly the right choice. Unless you enjoy route-planning for its own sake, buy existing cycling maps. You’ll get better routes, and more cycling-useful information, than you can create on the fly. (Plus GPS units cost hundreds of dollars and must be charged every night.)

Route profile for the section of road from Grover to Lake Powell in Utah
An elevation profile from an ACA map, the sort of information that’s likely harder to find outside of prepared maps

The 4200-mile TransAmerica Trail goes from Oregon into Montana, southeast to Pueblo in Colorado, then east to Virginia and the coast. It’s the best-known and most commonly used cross-country route. The 1580-mile Western Express goes from San Francisco to Pueblo. Most people ride the TransAmerica because it avoids much of the Western Express’s waterless desert and elevation change. For me, convenience and available time made the Western Express plus the eastern TransAmerica a no-brainer.

A definite perk to using an existing route is that the roads will be good for cycling. Often I was on relatively empty back roads, or on state roads with light traffic. The worst roads were in the Rockies in Colorado, likely because of the terrain. The worst regularly bad road was between Cimarron and Sapinero along US-50: a narrow, winding stretch with little shoulder and a bunch of RV traffic, where I should have occasionally taken the entire lane rather than let anyone pass me unsafely. Colorado also had the worst irregularly bad stretches, along CO-145 due to road construction. There were two two-mile stretches of gravel where roads were being re-oiled; I rode through them (what choice did I have?) on 700×23 tires (less than an inch wide), past “Motorcycles use extreme caution” signs. Good times. And the stretch from Telluride to Placerville had so much construction dust that I sometimes couldn’t see ten feet; I had to stop and turn on head and tail lights to be visible. But generally, ignoring these rare exceptions, the roads were great.

Next time, the daily grind and shelter.

12.08.13

Micro-feature from ES6, now in Firefox Aurora and Nightly: binary and octal numbers

A couple years ago when SpiderMonkey’s implementation of strict mode was completed, I observed that strict mode forbids octal number syntax. There was some evidence that novice programmers used leading zeroes as alignment devices, leading to unexpected results:

var sum = 015 + // === 13, not 15!
          197;
// sum === 210, not 212

But some users (Mozilla extensions and server-side node.js packages in particular) still want octal syntax, usually for file permissions. ES6 thus adds new octal syntax that won’t trip up novices. Hexadecimal numbers are formed with the prefix 0x or 0X followed by hexadecimal digits. Octal numbers are similarly formed using 0o or 0O followed by octal digits:

var DEFAULT_PERMS = 0o644; // kosher anywhere, including strict mode code

(Yes, it was intentional to allow the 0O prefix [zero followed by a capital O] despite its total unreadability. Consistency trumped readability in TC39, as I learned when questioning the wisdom of 0O as prefix. I think that decision is debatable, and the alternative is certainly not “nanny language design”. But I don’t much care as long as I never see it. 🙂 I recommend never using the capital version and applying a cluestick to anyone who does.)

Some developers also need binary syntax, which ECMAScript has never provided. ES6 thus adds analogous binary syntax using the letter b (lowercase or uppercase):

var FLT_SIGNBIT  = 0b10000000000000000000000000000000;
var FLT_EXPONENT = 0b01111111100000000000000000000000;
var FLT_MANTISSA = 0b00000000011111111111111111111111;
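
As a quick usage sketch of my own (not from the original post), these masks can be combined with typed-array views to pick apart a 32-bit float’s bit pattern:

var f32 = new Float32Array(1);
var u32 = new Uint32Array(f32.buffer); // same bytes, reinterpreted as a 32-bit integer
f32[0] = -0.15625;                     // -1.01 (binary) × 2^-3
var bits = u32[0];

var negative = (bits & FLT_SIGNBIT) !== 0;   // true
var exponent = (bits & FLT_EXPONENT) >>> 23; // 124, the biased form of -3 (127 - 3)
var mantissa = bits & FLT_MANTISSA;          // 2097152, the .01 fraction bits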

Try out both new syntaxes in Firefox Aurora or, if you’re feeling adventurous, in a Firefox nightly. Use the profile manager if you don’t want your regular Firefox browsing history touched.

If you’ve ever needed octal or binary numbers, hopefully these additions will brighten your day a little. 🙂

05.08.13

New in Firefox 23: the length property of an array can be made non-writable (but you shouldn’t do it)

Properties and their attributes

Properties of JavaScript objects include attributes for enumerability (whether the property shows up in a for-in loop on the object) and configurability (whether the property can be deleted or changed in certain ways). Getter/setter properties also include get and set attributes storing those functions, and value properties include attributes for writability and value.
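
For example (an illustration of mine, not from the original post; print is the JS shell’s print function, as in the snippets below):

var obj = { answer: 42, get question() { return "6 * 7"; } };

var d1 = Object.getOwnPropertyDescriptor(obj, "answer");
print(d1.value);        // 42
print(d1.writable);     // true
print(d1.enumerable);   // true
print(d1.configurable); // true

var d2 = Object.getOwnPropertyDescriptor(obj, "question");
print(typeof d2.get);   // "function"
print(d2.set);          // undefined
print("value" in d2);   // false: accessor properties have no value attribute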

Array properties’ attributes

Arrays are objects; array properties are structurally identical to properties of all other objects. But arrays have long-standing, unusual behavior concerning their elements and their lengths. These oddities cause array properties to look like other properties but behave quite differently.

The length property

The length property of an array looks like a data property but when set acts like an accessor property.

var arr = [0, 1, 2, 3];
var desc = Object.getOwnPropertyDescriptor(arr, "length");
print(desc.value); // 4
print(desc.writable); // true
print("get" in desc); // false
print("set" in desc); // false

print("0" in arr); // true
arr.length = 0;
print("0" in arr); // false (!)

In ES5 terms, the length property is a data property. But arrays have a special [[DefineOwnProperty]] hook, invoked whenever a property is added, set, or modified, that imposes special behavior on array length changes.

The element properties of arrays

Arrays’ [[DefineOwnProperty]] also imposes special behavior on array elements. Array elements also look like data properties, but if you add an element beyond the length, it’s as if a setter were called — the length grows to accommodate the element.

var arr = [0, 1, 2, 3];
var desc = Object.getOwnPropertyDescriptor(arr, "0");
print(desc.value); // 0
print(desc.writable); // true
print("get" in desc); // false
print("set" in desc); // false

print(arr.length); // 4
arr[7] = 0;
print(arr.length); // 8 (!)

Arrays are unlike any other objects, and so JS array implementations are highly customized. These customizations allow the length and elements to act as specified when modified. They also make array element accesses about as fast as in languages like C++.

Object.defineProperty implementations and arrays

Customized array representations complicate Object.defineProperty. Defining array elements isn’t a problem, as increasing the length for added elements is long-standing behavior. But defining array length is problematic: if the length can be made non-writable, every place that modifies array elements must respect that.

Most engines’ initial Object.defineProperty implementations didn’t correctly support redefining array lengths. Providing a usable implementation for non-array objects was top priority; array support was secondary. SpiderMonkey’s initial Object.defineProperty implementation threw a TypeError when redefining length, stating this was “not currently supported”. Fully-correct behavior required changes to our object representation.

Earlier this year, Brian Hackett’s work in bug 827490 changed our object representation enough to implement length redefinition. I fixed bug 858381 in April to make Object.defineProperty work for array lengths. Those changes will be in Firefox 23 tomorrow.

Should you make array lengths non-writable?

You can change an array’s length without redefining it, so the only new capability is making an array length non-writable. Compatibility aside, should you make array lengths non-writable? I don’t think so.

Non-writable array length forbids certain operations:

  • You can’t change the length.
  • You can’t add an element past that length.
  • You can’t call methods that increase (e.g. Array.prototype.push) or decrease (e.g. Array.prototype.pop) the length. (These methods do sometimes modify the array, in well-specified ways that don’t change the length, before throwing.)
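
Here’s a minimal sketch (mine, not from the original post) of making a length non-writable and then running into these restrictions:

var arr = [0, 1, 2, 3];
Object.defineProperty(arr, "length", { writable: false });

arr.length = 2;    // ignored (throws a TypeError in strict mode code)
print(arr.length); // still 4

try {
  arr.push(4);     // can't grow the array past the fixed length
} catch (e) {
  print(e instanceof TypeError); // true
}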

But these are purely restrictions. Any operation that succeeds on an array with non-writable length succeeds on the same array with writable length. And you wouldn’t do any of these things anyway to an array whose length you’re treating as fixed. So why mark it non-writable at all? There’s no functionality-based reason for good code to have non-writable array lengths.

Fixed-length arrays’ only value is in maybe permitting optimizations dependent on immutable length. Making length immutable permits minimizing array elements’ memory use. (Arrays usually over-allocate memory to avoid O(n²) behavior when repeatedly extending the array.) But if it saves memory (this is highly allocator-sensitive), it won’t save much. Fixed-length arrays may permit bounds-check elimination in very circumscribed situations. But these are micro-optimizations you’d be hard-pressed to notice in practice.

In conclusion: I don’t think you should use non-writable array lengths. They’re required by ES5, so we’ll support them. But there’s no good reason to use them.

30.04.13

Introducing mozilla::Abs to mfbt


Computing absolute values in C/C++

C includes various functions for computing the absolute value of a signed number. C++98 implementations put those C functions in namespace std and add abs() overloads there, so std::abs works on integers and floating point numbers alike. For a long time Mozilla used NS_ABS to compute absolute value, but recently we switched to std::abs. This works on many systems, but it has a few issues.

Issues with std::abs

std::abs is split across two headers

With some compilers, the integral overloads are in <cstdlib> and the floating point overloads are in <cmath>. This led to confusion when std::abs compiled on one type but not on another, in the same file. (Or worse, when it worked with just one #include because of that developer’s compiler.) The solution was to include both headers even if only one was needed. This is pretty obscure.

std::abs(int64_t) doesn’t work everywhere

On many systems <stdint.h> has typedef long long int64_t;. But long long was only added in C99 and C++11, and some compilers don’t have long long std::abs(long long), so int64_t i = 0; std::abs(i); won’t compile. We “solved” this with compiler-specific #ifdefs around custom std::abs specializations in a somewhat-central header. (That’s three headers to include!) C++ says this has undefined behavior, and indeed it’ll break as we update compilers.

std::abs(int32_t(INT32_MIN)) doesn’t work

The integral abs overloads don’t work on the most-negative value of each signed integer type. On twos-complement machines (nearly everything), the absolute value of the smallest integer of a signed type won’t fit in that type. (For example, INT8_MIN is -128, INT8_MAX is +127, and +128 won’t fit in int8_t.) The integral abs functions take and return signed types. If the smallest integer flows through, behavior is undefined: as absolute-value is usually implemented, the value is returned unchanged. This has caused Mozilla bugs.

Mozilla code should use mozilla::Abs, not std::abs

Unfortunately the only solution is to implement our own absolute-value function. mozilla::Abs in "mozilla/MathAlgorithms.h" is overloaded for all signed integral types and the floating point types, and the integral overloads return the corresponding unsigned type. Thus you should use mozilla::Abs to compute absolute values. Be careful about signedness: don’t assign directly into a signed type! That loses mozilla::Abs’s ability to accept all inputs and will cause bugs. Ideally this would be a compiler warning, but we don’t use -Wconversion or the Microsoft equivalents, so we can’t do better.
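
A sketch of the intended use, based on the description above (the function and parameter names here are hypothetical, not part of the header):

#include <stdint.h>

#include "mozilla/MathAlgorithms.h"

void
Example(int32_t aDelta)
{
  // The integral overloads return the corresponding unsigned type, so even
  // INT32_MIN has a representable absolute value (2147483648).
  uint32_t magnitude = mozilla::Abs(aDelta);

  // Don't do this: assigning into a signed type reintroduces exactly the
  // INT32_MIN overflow that the unsigned return type avoids.
  // int32_t bad = mozilla::Abs(aDelta);
}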

26.04.13

mozilla/PodOperations.h: functions for zeroing, assigning to, copying, and comparing plain old data objects


Recently I introduced the new header mozilla/PodOperations.h to mfbt, moving its contents out of SpiderMonkey for general use. This header makes various operations on the memory of objects easier and safer.

The problem

Often in C or C++ one might want to set the contents of an object to zero — perhaps to initialize it:

mMetrics = new gfxFont::Metrics;
::memset(mMetrics, 0, sizeof(*mMetrics));

Or perhaps the same might need to be done for a range of objects:

memset(mTreeData.Elements(), 0, mTreeData.Length() * sizeof(mTreeData[0]));

Or perhaps one might want to set the contents of an object to those of another object:

memcpy(&e, buf, sizeof(e));

Or perhaps a range of objects must be copied:

memcpy(to + aOffset, aBuffer, aLength * sizeof(PRUnichar));

Or perhaps a range of objects must be memory-equivalence-compared:

return memcmp(k->chars(), l->chars(), k->length() * sizeof(jschar)) == 0;

What do all these cases have in common? They all require using a sizeof() operation.

The problem

C and C++, as low-level languages very much focused on the actual memory, place great importance on the size of an object. Programmers often think much less about sizes. It’s pretty easy to write code without having to think about memory. But some cases require it, and because it doesn’t happen regularly, it’s easy to make mistakes. Even experienced programmers can screw it up if they don’t think carefully.

This is particularly likely for operations on arrays of objects. If the object’s size isn’t 1, forgetting a sizeof means an array of objects might not be completely cleared, copied, or compared. This has led to Mozilla security bugs in the past. (Although, the best I can find now is bug 688877, which doesn’t use quite the same operations, and can’t be solved with these methods, but which demonstrates the same sort of issue.)

The solution

Using the prodigious magic of C++ templates, the new mfbt/PodOperations.h abstracts away the sizeof in all the above examples, implements bounds-checking assertions as appropriate, and is type-safe (doesn’t require implicit casts to void*).

  • Zeroing
    • PodZero(T* t): set the contents of *t to 0
    • PodZero(T* t, size_t count): set the contents of count elements starting at t to 0
    • PodArrayZero(T (&t)[N]): set the contents of the array t (with a compile-time size) to 0
  • Assigning
    • PodAssign(T* dst, const T* src): set the contents of *dst to *src — locations can’t overlap (no self-assignments)
  • Copying
    • PodCopy(T* dst, const T* src, size_t count): copy count elements starting at src to dst — ranges can’t overlap
  • Comparison
    • PodEqual(const T* one, const T* two, size_t len): true or false indicating whether len elements at one are memory-identical to len elements at two
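
As a sketch of how the earlier snippets might look with these helpers (the Metrics type and the function here are hypothetical; the Pod* signatures are the ones listed above):

#include <stddef.h>

#include "mozilla/PodOperations.h"

struct Metrics { double ascent, descent, maxAdvance; };

void
Example(Metrics* aDst, const Metrics* aSrc, size_t aCount)
{
  Metrics m;
  mozilla::PodZero(&m);                  // zero one object, no sizeof needed
  mozilla::PodZero(aDst, aCount);        // zero aCount objects starting at aDst

  Metrics arr[4];
  mozilla::PodArrayZero(arr);            // zero a compile-time-sized array

  mozilla::PodAssign(&m, aSrc);          // copy *aSrc into m (no overlap allowed)
  mozilla::PodCopy(aDst, aSrc, aCount);  // copy aCount objects (no overlap allowed)

  if (mozilla::PodEqual(aDst, aSrc, aCount)) {
    // the two ranges are memory-identical
  }
}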

Random questions

Why “Pod”?

POD is a C++ term of art, an abbreviation for “plain old data”. A type that’s plain old data is, roughly: a built-in type; a pointer or enum that’s represented like a built-in type; a user-defined class without any virtual methods or inheritance or user-defined constructors or destructors (including in any of its base classes), whose non-static members are themselves plain old data; or an array of a type that’s plain old data. (There are a couple other restrictions that don’t matter here and would take too long to explain anyway.)
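
Roughly, and as an illustration of mine rather than a complete definition:

struct Point          // POD: built-in members, no user-defined constructor or
{                     // destructor, no virtual methods, no inheritance
  int x, y;
};

struct Widget         // not POD: a user-defined constructor and a virtual method
{
  Widget() : mId(0) {}
  virtual void Draw() {}
  int mId;
};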

One implication of a type being POD is that (systemic interactions aside) you can copy an object of that type using memcpy. The file and method names simply play on that. Arguably it’s not the best, clearest term in the world — especially as these methods aren’t restricted to POD types. (One intended use is for initializing classes that are non-POD, where the initial state is fully-zeroed.) But it roughly gets the job done, no better names quickly spring to mind, and renaming would have been pain without much gain.

What are all these “optimizations” in these methods?

When these operations were added to SpiderMonkey a few years ago, various people (principally Luke, if I remember right) benchmarked them in various places in SpiderMonkey. It turned out that “trivial” uses of memcmp, &c. weren’t always compiled into fast, SIMD-optimizable loops. Thus we introduced special cases. Newer compilers may do better, such that we have less need for the optimizations. But the worst that happens with them is slightly slower code — not correctness bugs. If you have real-world data (inchoate fears don’t count 🙂 ) showing these optimizations aren’t needed now, file a bug and we can adapt them as needed.
