08.12.09

An exercise in XPCOM programming, redux

The Exercise

Last time our hero was engaged in solving this posed streams problem:

Suppose you wish to complete one conceptually simple task in stream programming: copying a stream, i.e. reading all data from one stream and writing it all into another, where both streams are nonblocking. (Such a copier might buffer data read before it can be immediately written; assume this is a requirement for the purposes of this exercise.) Suppose for the moment that there is no readily available implementation of the nsIAsyncStreamCopier interface, so you have to roll your own stream copier. In what situation is it necessary to asynchronously wait with flags = WAIT_CLOSURE_ONLY to efficiently implement stream copying?

(Refer to the original post for full background if you’re not familiar with streams.)

The Answer

Copying from one nonblocking stream (the source) to another (the sink) involves waiting for the source to be able to provide data, reading in that data, waiting for the sink to be ready to accept data, and writing out buffered data. In the simple case there’s always data available to read and always space to write it. Let’s break down the cases where these aren’t the case:

There’s no data available to read from the source
Just wait for data to be available (assuming the sink hasn’t hit an error, if it has the copy’s done)
There’s no space in the sink to write data
There’s data available to write to the sink
Wait for that amount of data, or a fraction of it, to be writable
There’s no data available to write to the sink
???

What should happen in the final alternation? Suppose you waited for some amount of data to be writable, we’ll say 1 byte. What happens when the sink becomes that far unblocked? You’d have to be notified, and if there’s still no data to write you’re back where you started. Maybe you can bump up the amount you wait for, but how far should you bump? Increase arithmetically? Double? Any amount you bump to might result in more notifications when there’s nothing you can do.

There’s a further problem with waiting for some amount of data, one you’d only know if you were familiar with the async copying interfaces: the amount you specify when calling asyncWait to request notification when the stream’s unblocked again is only a hint. That is, the implementation is free to notify whenever it wants, so long as it’s not notifying when the state of the stream is unchanged from being previously blocked and unclosed. A stream might notify whenever any data is available, even if you bump up the amount. Therefore, if you just wait for an amount of data, and wait again if you have no data to write, you’ll be notified almost immediately (after any pending tasks in the thread event loop). Repeat this a few times and suddenly you’re spinning doing nothing, which is clearly inefficient. The main async stream classes in the tree ignore the requested count precisely as described here, so this isn’t simply an academic problem that we could ignore.

Here’s where you need WAIT_CLOSURE_ONLY. Until you have data to write, you don’t care about how much can be written to the sink. What you really care about is knowing if the sink closes (or gets in an error state), so you can stop copying immediately when that happens, rather than wait (perhaps indefinitely) until you have data to write and only determine when writing it that the sink’s closed (or in error). Using WAIT_CLOSURE_ONLY whenever you haven’t hit errors but don’t have any data to write neatly solves the problem of efficiently learning if the sink dies.

01.12.09

An exercise in XPCOM stream programming

If you’ve done any programming with XPCOM, at some time you’ve probably had to work with streams. A little background in case you haven’t, then a small thought exercise:

Streams

A stream is an object from which you read data or to which you write data. In XPCOM an input stream stream is a stream from which you read data; an output stream is a stream to which you write data. In an ideal world a stream is either open (indicating data may be read or written to it) or closed (indicating that the stream is no longer readable, or that no more data can be written to it), and that’s all there is to it. File objects in Python function very much like ideal streams.

In the real world, truly useful streams have further limitations (or characteristics). How much data can be read from an input stream right now? Can a given amount of data be written to an output stream right now? Should reading or writing proceed until completion when right now isn’t possible but sometime later might be, or should it halt immediately with an error indicating that reading or writing would block program execution? One might ignore these concerns in simplistic scenarios such as those which short Python scripts might be used to address. In complex applications, particularly those which must remain responsive to user input, these concerns may be quite important. You can’t display a useful progress bar if the stream you’re reading from represents the download of a 3GB file over a slow network and reading from the stream blocks program execution.

Streams which immediately halt with an error when reading or writing would block execution are nonblocking streams. Efficient use of such streams requires a way to wait until the desired amount of data can be written to or read from a stream. XPCOM efficiently supports nonblocking streams through an asyncWait method which will notify at some later time when the desired amount of data can be written to or read from the stream, without blocking. At the moment there are two flavors of asynchronous waiting: waiting until the desired amount can be read or written, and waiting until the given stream has been closed. At the interface level, the former is indicated by flags = 0, while the latter is indicated by flags = WAIT_CLOSURE_ONLY.

The Exercise

Suppose you wish to complete one conceptually simple task in stream programming: copying a stream, i.e. reading all data from one stream and writing it all into another, where both streams are nonblocking. (Such a copier might buffer data read before it can be immediately written; assume this is a requirement for the purposes of this exercise.) Suppose for the moment that there is no readily available implementation of the nsIAsyncStreamCopier interface, so you have to roll your own stream copier. In what situation is it necessary to asynchronously wait with flags = WAIT_CLOSURE_ONLY to efficiently implement stream copying?

Hints

If you want a hint (arguably the answer, if you can interpret the code), take a look at the uses of WAIT_CLOSURE_ONLY in xpcom/io/nsStreamUtils.cpp. You may perhaps find further hints in bug 513854, the bug which brought this somewhat quirky need for flags = WAIT_CLOSURE_ONLY to my attention.

Questions?

I come to this problem with more experience and familiarity with streams than most people will have. If anything in the above description is unclear, ask questions in the comment section — I did the best I could to make the problem and its background understandable, but I may easily have done so less well than intended.