08.12.09

An exercise in XPCOM programming, redux

The Exercise

Last time our hero was engaged in solving this streams problem:

Suppose you wish to complete one conceptually simple task in stream programming: copying a stream, i.e. reading all data from one stream and writing it all into another, where both streams are nonblocking. (Such a copier might buffer data read before it can be immediately written; assume this is a requirement for the purposes of this exercise.) Suppose for the moment that there is no readily available implementation of the nsIAsyncStreamCopier interface, so you have to roll your own stream copier. In what situation is it necessary to asynchronously wait with flags = WAIT_CLOSURE_ONLY to efficiently implement stream copying?

(Refer to the original post for full background if you’re not familiar with streams.)

The Answer

Copying from one nonblocking stream (the source) to another (the sink) involves waiting for the source to be able to provide data, reading in that data, waiting for the sink to be ready to accept data, and writing out buffered data. In the simple case there’s always data available to read and always space to write it. Let’s break down the cases where that doesn’t hold:

There’s no data available to read from the source
    Just wait for data to be available (assuming the sink hasn’t hit an error; if it has, the copy’s done)
There’s no space in the sink to write data
    There’s data available to write to the sink
        Wait for that amount of data, or a fraction of it, to be writable
    There’s no data available to write to the sink
        ???
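
The case breakdown above can be written down as a small decision sketch. This is a toy model in Python, not XPCOM code: the Step enum and both function names are hypothetical, and the last case is deliberately left unresolved, just as in the list.

```python
from enum import Enum


class Step(Enum):
    """What the copier does next (hypothetical names, for illustration only)."""
    WAIT_FOR_SOURCE = "wait for the source to have data"
    WAIT_FOR_SINK_SPACE = "wait for the sink to accept buffered data"
    STOP = "the copy is done"
    UNRESOLVED = "???"


def when_source_is_empty(sink_failed: bool) -> Step:
    """Precondition: the source has no data available to read right now."""
    # If the sink has already hit an error there is nothing left to copy to.
    return Step.STOP if sink_failed else Step.WAIT_FOR_SOURCE


def when_sink_is_full(buffered: int) -> Step:
    """Precondition: the sink has no space right now.

    `buffered` is the number of bytes read from the source but not yet written.
    """
    if buffered > 0:
        return Step.WAIT_FOR_SINK_SPACE
    # Nothing to write and nowhere to write it: the open question.
    return Step.UNRESOLVED
```

The two functions are separate because a copier watches both ends at once; the unresolved branch is the one that matters for the rest of the discussion.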

What should happen in that final case? Suppose you waited for some amount of data to be writable, say 1 byte. What happens when the sink becomes that far unblocked? You’d have to be notified, and if there’s still no data to write you’re back where you started. Maybe you can bump up the amount you wait for, but how far should you bump? Increase arithmetically? Double? Any amount you bump to might result in more notifications when there’s nothing you can do.

There’s a further problem with waiting for some amount of data, one you’d only know if you were familiar with the async copying interfaces: the amount you specify when calling asyncWait to request notification when the stream’s unblocked again is only a hint. That is, the implementation is free to notify whenever it wants, so long as it doesn’t notify while the stream is still blocked and unclosed, exactly as it was before. A stream might notify whenever any data is available, even if you bump up the amount. Therefore, if you just wait for an amount of data, and wait again if you have no data to write, you’ll be notified almost immediately (after any pending tasks in the thread event loop). Repeat this a few times and suddenly you’re spinning doing nothing, which is clearly inefficient. The main async stream classes in the tree ignore the requested count precisely as described here, so this isn’t simply an academic problem that we could ignore.
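
To make the spinning concrete, here is a rough simulation. It is a sketch under loud assumptions: HintIgnoringSink, async_wait and count_idle_wakeups are invented names, the event loop is just a deque of callables, and the sink is assumed to always have one free byte. None of this is the real stream API; it only mimics a sink that treats the requested count as a hint and ignores it.

```python
from collections import deque


class HintIgnoringSink:
    """Toy nonblocking sink that notifies whenever it has any free space."""

    def __init__(self, loop: deque, free_space: int) -> None:
        self.loop = loop
        self.free_space = free_space

    def async_wait(self, callback, requested_count: int) -> None:
        # requested_count is only a hint and is ignored here: any free
        # space at all schedules a notification on the next loop turn.
        if self.free_space > 0:
            self.loop.append(lambda: callback(self))


def count_idle_wakeups(turns: int) -> int:
    """Run `turns` event-loop turns for a copier that has nothing to write."""
    loop = deque()
    sink = HintIgnoringSink(loop, free_space=1)
    wakeups = 0
    requested = 1

    def on_sink_ready(s: HintIgnoringSink) -> None:
        nonlocal wakeups, requested
        wakeups += 1
        requested *= 2  # bump the hint; the sink ignores it anyway
        s.async_wait(on_sink_ready, requested)  # still no data, so wait again

    sink.async_wait(on_sink_ready, requested)
    for _ in range(turns):
        if not loop:
            break
        loop.popleft()()
    return wakeups
```

In this model every turn of the loop produces one wake-up with nothing to do, however large the requested count grows.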

Here’s where you need WAIT_CLOSURE_ONLY. Until you have data to write, you don’t care about how much can be written to the sink. What you really care about is knowing if the sink closes (or gets in an error state), so you can stop copying immediately when that happens, rather than wait (perhaps indefinitely) until you have data to write and only determine when writing it that the sink’s closed (or in error). Using WAIT_CLOSURE_ONLY whenever you haven’t hit errors but don’t have any data to write neatly solves the problem of efficiently learning if the sink dies.
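
A minimal sketch of the closure-only wait, again as a toy Python model rather than real XPCOM code. Only the flag name WAIT_CLOSURE_ONLY comes from the discussion above; ToySink, async_wait, close, drain and demo are hypothetical, the flag's numeric value is chosen arbitrarily for the model, and the sink is assumed to always have free space.

```python
from collections import deque

WAIT_CLOSURE_ONLY = 1  # value is arbitrary in this toy model


class ToySink:
    """Toy nonblocking sink that always has free space until it is closed."""

    def __init__(self, loop: deque) -> None:
        self.loop = loop
        self.closed = False
        self._closure_waiters = []

    def async_wait(self, callback, flags: int = 0) -> None:
        if self.closed or not (flags & WAIT_CLOSURE_ONLY):
            # Already closed, or an ordinary wait on a sink with free
            # space: notify on the next turn of the event loop.
            self.loop.append(lambda: callback(self))
        else:
            # Closure-only wait: stay silent until close() is called.
            self._closure_waiters.append(callback)

    def close(self) -> None:
        self.closed = True
        waiters, self._closure_waiters = self._closure_waiters, []
        for cb in waiters:
            self.loop.append(lambda cb=cb: cb(self))


def drain(loop: deque) -> int:
    """Run queued callbacks until the loop is empty; return how many ran."""
    ran = 0
    while loop:
        loop.popleft()()
        ran += 1
    return ran


def demo():
    """Wait for closure with nothing to write, then close the sink."""
    loop = deque()
    sink = ToySink(loop)
    events = []
    sink.async_wait(lambda s: events.append("closed" if s.closed else "writable"),
                    WAIT_CLOSURE_ONLY)
    before = drain(loop)  # callbacks run while the sink is merely writable
    sink.close()
    after = drain(loop)   # callbacks run once the sink has closed
    return before, after, events
```

In this model the copier is never woken while the sink is merely writable, and it hears about the closure on the first loop turn after it happens.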

2 Comments »

  1. Sorry, but I can’t see how you even get into your final state. If you have no data, you wait on the source, not the sink. Once you have data, you wait on the sink, not the source. Eventually you have no data again, and the cycle repeats.

    Comment by Neil — 09.12.09 @ 07:19

  2. The problem with waiting on only one of the source or the sink is that if the other closes while your back’s turned, you don’t know about it. Since you don’t know about it, this failed copy is going to remain in progress potentially for much longer than necessary. In some cases in the server, where I needed to think about these things, the delay could be arbitrarily long as the request handler might have chosen to send data at some extremely slow rate. (The server needs response timeouts, to be sure, but that’s a separate concern.) Once you know nothing more remains to be done, the most efficient thing to do is to stop doing useless work as soon as you can — and for that, you need to wait until the source closes rather than just ignoring it entirely for some period of time.

    Comment by Jeff Walden — 09.12.09 @ 10:47
