20.10.11

Implementing mozilla::ArrayLength and mozilla::ArrayEnd, and some followup work

Tags: , , , , , , — Jeff @ 16:03

In my last post I announced the addition of mozilla::ArrayLength and mozilla::ArrayEnd to the Mozilla Framework Based on Templates, and I noted I was leaving a description of how these methods were implemented to a followup post. This is that post.

The C++ template trick used to implement mozilla::ArrayLength

The implementations of these methods are surprisingly simple:

template<typename T, size_t N>
size_t
ArrayLength(T (&arr)[N])
{
  return N;
}

template<typename T, size_t N>
T*
ArrayEnd(T (&arr)[N])
{
  return arr + ArrayLength(arr);
}

The trick is this: you can templatize an array based on its compile-time length. Here we templatize both methods on: the type of the elements of the array, so that each is polymorphic; and the number of elements in the array. Then inside the method we can refer to that length, a constant known at compile time, and simply return it to implement the desired semantics.

Templatizing on the length of an array may not seem too unusual. The part that may be a little unfamiliar is how the array is described as a parameter of the template method: T (&arr)[N]. This declares the argument to be a reference to an array of N elements of type T. Its being a reference is important: we don’t actually care about the array contents at all, and we don’t want to copy them to call the method. All we care about is its type, which we can capture without further cost using a reference.

This technique is uncommon, but it’s not new to Mozilla. Perhaps you’ve wondered at some point why Mozilla’s string classes have both EqualsLiteral and EqualsASCII: the former for use only when comparing to “an actual literal string”, the latter for use when comparing to any const char*. You can probably guess why this interface occurs now: EqualsLiteral is a method templatized on the length of the actual literal string passed to it. It can be more efficient than EqualsASCII because it knows the length of the compared string.

Using this trick in other code

I did most of the work to convert NS_ARRAY_LENGTH to ArrayLength with a script. But I still had to look over the results of the script to make sure things were sane before proceeding with it. In doing so, I noticed a decent number of places where an array was created, then it and its length were being passed as arguments to another method. For example:

void nsHtml5Atoms::AddRefAtoms()
{
  NS_RegisterStaticAtoms(Html5Atoms_info, ArrayLength(Html5Atoms_info));
}

Using ArrayLength here is safer than hard-coding a length. But safer still would be to not require callers to pass an array and a length separately — rather to pass them together. We can do this by pushing the template trick down another level, into NS_RegisterStaticAtoms (or at least into an internal method used everywhere, if the external API must be preserved for some reason):

static nsresult
RegisterStaticAtoms(const nsStaticAtom* aAtoms, PRUint32 aAtomCount)
{
  // ...old implementation...
}

template<size_t N>
nsresult
NS_RegisterStaticAtoms(const nsStaticAtom (&aAtoms)[N])
{
  return RegisterStaticAtoms(aAtoms, N);
}

The pointer-and-length method generally still needs to stick around somewhere, and it does in this rewrite here. It’s just that it wouldn’t be the interface user code would see, or it wouldn’t be the primary interface (for example, it might be protected or similar).

NS_RegisterStaticAtoms was just one such method which could use improvement. In a quick skim I also see:

…as potential spots that could be improved — at least for some callers — with some additional templatization on array length.

I didn’t look super-closely at these, so I might have missed some. Or I might have been over-generous in what could be rewritten to templatize on length, seeing only the one or two places that pass fixed lengths and missing the majority of cases that don’t. But there’s definitely a lot of cleaning that could be done here.

A call for help

Passing arrays and lengths separately is dangerous if you don’t know what you’re doing. The trick used here eliminates it, in certain cases. The more we can use this pattern, the more we can fundamentally reduce the danger of separating data from its length.

I don’t have time to do the above work myself. (I barely had time to do the ArrayLength work, really. I only did it because I’d sort of started the ball rolling in a separate bug, so I felt some obligation to make sure it got done.) And it’s not particularly hard work, requiring especial knowledge of the relevant code. It’s a better use of my time for me to work on JavaScript engine code, or on other code I know particularly well, than to do this work. But for someone interested in getting started working on Gecko C++ code, it would be a good first project. I’ve filed bug 696242 for this task; if anyone’s interested in a good first bug for getting into Mozilla C++ coding, feel free to start with that one. If you have questions about what to do, in any way, feel free to ask them there.

On the other hand, if you have any questions about the technique in question, or the way it’s used in Mozilla, feel free to ask them here. But if you want to contribute to fixing the issues I’ve noted, let’s keep them to the bug, if possible.

Computing the length or end of a compile-time constant-length array: {JS,NS}_ARRAY_{END,LENGTH} are out, mozilla::Array{Length,End} are in

Tags: , , , , , , , — Jeff @ 16:03

Determining the length of a fixed-length array

Suppose in C++ you want to perform variations of some simple task several times. One way to do this is to loop over the variations in an array to perform each task:

/* Defines the necessary envvars to 1. */
int setVariables()
{
  static const char* names[] = { "FOO", "BAR", "BAZ", "QUUX", "EIT", "GOATS" };
  for (int i = 0; i < 6; i++)
    if (0 > setenv(names[i], "1")) return -1;
  return 0;
}

Manually looping by index is prone to error. One particular issue with loop-by-index is that you must correctly compute the extent of iteration. Hard-coding a constant works, but what if the array changes? The constant must also change, which isn’t obvious to someone not looking carefully at all uses of the array.

The traditional way to get an array’s length is with a macro using sizeof:

#define NS_ARRAY_LENGTH(arr)  (sizeof(arr) / sizeof((arr)[0]))

This works but has problems. First, it’s a macro, which means it has the usual macro issues. For example, macros are untyped, so you can pass in “wrong” arguments to them and may not get type errors. This is the second problem: NS_ARRAY_LENGTH cheerfully accepts non-array pointers and returns a completely bogus length.

  const char* str = "long enough string";
  char* copy = (char*) malloc(NS_ARRAY_LENGTH(str)); // usually 4 or 8
  strcpy(copy, str); // buffer overflow!

Introducing mozilla::ArrayLength and mozilla::ArrayEnd

Seeing an opportunity to both kill a moderately obscure macro and to further improve the capabilities of the Mozilla Framework Based on Templates, I took it. Now, rather than use these macros, you can #include "mozilla/Util.h" and use mozilla::ArrayLength to compute the length of a compile-time array. You can also use mozilla::ArrayEnd to compute arr + ArrayLength(arr). Both these methods (not macros!) use C++ template magic to only accept arrays with compile-time-fixed length, failing to compile if something else is provided.

Limitations

Unfortunately, ISO C++ limitations make it impossible to write a method completely replacing the macro. So the macros still exist, and in rare cases they remain the correct answer.

The array can’t depend on an unnamed type (class, struct, union) or a local type

According to C++ §14.3.1 paragraph 2, “A local type [or an] unnamed type…shall not be used as a template-argument for a template type-parameter.” C++ makes this a compile error:

size_t numsLength()
{
  // unnamed struct, also locally defined
  static const struct { int i; } nums[] = { { 1 }, { 2 }, { 3 } };

  return mozilla::ArrayLength(nums);
}

It’s easy to avoid both limitations: move local types to global code, and name them.

// now defined globally, and with a name
struct Number { int i; };
size_t numsLength()
{
  static const Number nums[] = { 1, 2, 3 };
  return mozilla::ArrayLength(nums);
}

mozilla::ArrayLength(arr) isn’t a constant, NS_ARRAY_LENGTH(arr) is

Some contexts in C++ require a compile-time constant expression: template parameters, (in C++ but not C99) for local array lengths, for array lengths in typedefs, for the value of enum initializers, for static/compile-time assertions (which are usually bootstrapped off these other locations), and perhaps others. A function call, even one evaluating to a compile-time constant, is not a constant expression.

One other context doesn’t require a constant but strongly wants one: the values of static variables, inside classes and methods and out. If the value is a function call, even if it computes a constant, the compiler might make it a static initialization, delaying startup.

The long and short of it is that everything in the code below is a bad idea:

int arr[] = { 1, 2, 3, 5 };
static size_t len = ArrayLength(arr); // not an error, but don't do it
void t(JSContext* cx)
{
  js::Vector<int, ArrayLength(arr)> v(cx); // non-constant template parameter
  int local[ArrayLength(arr)]; // variadic arrays not okay in C++
  typedef int Mirror[ArrayLength(arr)]; // non-constant array length
  enum { L = ArrayLength(arr); }; // non-constant initializer
  PR_STATIC_ASSERT(4 == ArrayLength(arr)); // special case of one of the others
}

In these situations you should continue to use NS_ARRAY_LENGTH (or in SpiderMonkey, JS_ARRAY_LENGTH).

mozilla/Util.h is fragile with respect to IPDL headers, for include order

mozilla/Util.h includes mozilla/Types.h, which includes jstypes.h, which includes jsotypes.h, which defines certain fixed-width integer types: int8, int16, uint8, uint16, and so on. It happens that ipc/chromium/src/base/basictypes.h also defines these integer types — but incompatibly on Windows only. This header is, alas, included through every IPDL-generated header. In order to safely include any mfbt header in a file which also includes an IPDL-generated header, you must include the IPDL-generated header first. So when landing patches using mozilla/Util.h, watch out for Windows-specific bustage.

Removing the limitations

The limitations on the type of elements in arrays passed to ArrayLength are unavoidable limitations of C++. But C++11 removes these limitations, and compilers will likely implement support fairly quickly. When that happens we’ll be able to stop caring about the local-or-unnamed problem, not even needing to work around it.

The compile-time-constant limitation is likewise a limitation of C++. It too will go away in C++11 with the constexpr keyword. This modifier specifies that a function provided constant arguments computes a constant. The compiler must allow calls to the function that have constant arguments to be used as compile-time constants. Thus when compilers support constexpr, we can add it to the declaration of ArrayLength and begin using ArrayLength in compile-time-constant contexts. This is more low-hanging C++11 fruit that compilers will pick up soon. (Indeed, GCC 4.6 already implements it.)

Last, we have the Windows-specific #include ordering requirement. We have some ideas for getting around this problem, and we hope to have a solution soon.

A gotcha

Both these methods have a small gotcha: their behavior may not be intuitive when applied to C strings. What does sizeof("foo") evaluate to? If you think of "foo" as a string, you might say 3. But in reality "foo" is much better thought of as an array — and strings are '\0'-terminated. So actually, sizeof("foo") == 4. This was the case with NS_ARRAY_LENGTH, too, so it’s not new behavior. But if you use these methods without considering this, you might end up misusing them.

Conclusion

Avoid NS_ARRAY_LENGTH when possible, and use mozilla::ArrayLength or mozilla::ArrayEnd instead. And watch out when using them on strings, because their behavior might not be what you wanted.

(Curious how these methods are defined, and what C++ magic is used? See my next post.)