08.12.09

An exercise in XPCOM programming, redux

The Exercise

Last time our hero was engaged in solving this posed streams problem:

Suppose you wish to complete one conceptually simple task in stream programming: copying a stream, i.e. reading all data from one stream and writing it all into another, where both streams are nonblocking. (Such a copier might buffer data read before it can be immediately written; assume this is a requirement for the purposes of this exercise.) Suppose for the moment that there is no readily available implementation of the nsIAsyncStreamCopier interface, so you have to roll your own stream copier. In what situation is it necessary to asynchronously wait with flags = WAIT_CLOSURE_ONLY to efficiently implement stream copying?

(Refer to the original post for full background if you’re not familiar with streams.)

The Answer

Copying from one nonblocking stream (the source) to another (the sink) involves waiting for the source to be able to provide data, reading in that data, waiting for the sink to be ready to accept data, and writing out buffered data. In the simple case there’s always data available to read and always space to write it. Let’s break down the situations where those assumptions don’t hold:

  • There’s no data available to read from the source
    • Just wait for data to be available (assuming the sink hasn’t hit an error; if it has, the copy’s done)
  • There’s no space in the sink to write data
    • There’s data available to write to the sink
      • Wait for that amount of data, or a fraction of it, to be writable
    • There’s no data available to write to the sink
      • ???

What should happen in the final case? Suppose you waited for some amount of data to be writable, say 1 byte. What happens when the sink becomes that far unblocked? You’d have to be notified, and if there’s still no data to write you’re back where you started. Maybe you can bump up the amount you wait for, but how far should you bump? Increase arithmetically? Double? Any amount you bump to might result in more notifications when there’s nothing you can do.

There’s a further problem with waiting for some amount of data, one you’d only know if you were familiar with the async copying interfaces: the amount you specify when calling asyncWait to request notification when the stream’s unblocked again is only a hint. That is, the implementation is free to notify whenever it wants, so long as it’s not notifying when the state of the stream is unchanged from being previously blocked and unclosed. A stream might notify whenever any data is available, even if you bump up the amount. Therefore, if you just wait for an amount of data, and wait again if you have no data to write, you’ll be notified almost immediately (after any pending tasks in the thread event loop). Repeat this a few times and suddenly you’re spinning doing nothing, which is clearly inefficient. The main async stream classes in the tree ignore the requested count precisely as described here, so this isn’t simply an academic problem that we could ignore.
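To make the spinning concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions: the event loop is a plain queue, the sink is always writable and ignores the requested count, and the names count_idle_wakeups, async_wait and on_sink_writable are invented for this illustration rather than taken from XPCOM.

```python
from collections import deque


def count_idle_wakeups(turns: int) -> int:
    """Count notifications a copier with an empty buffer gets in `turns` loop turns."""
    events = deque()   # stands in for the thread's event loop
    wakeups = 0
    requested = 1      # bytes the copier claims to be waiting for

    def async_wait(callback, requested_count):
        # Toy sink (assumption): it is writable and treats requested_count as a
        # hint it ignores, so the notification is queued for the next loop turn.
        events.append(callback)

    def on_sink_writable():
        nonlocal wakeups, requested
        wakeups += 1
        requested *= 2   # "bump" the amount; this sink does not care
        async_wait(on_sink_writable, requested)   # still nothing to write

    async_wait(on_sink_writable, requested)
    for _ in range(turns):
        events.popleft()()   # run one pending task
    return wakeups
```

In this model every turn of the loop produces one useless notification, however far the requested amount is bumped: count_idle_wakeups(100) returns 100.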

Here’s where you need WAIT_CLOSURE_ONLY. Until you have data to write, you don’t care about how much can be written to the sink. What you really care about is knowing if the sink closes (or gets in an error state), so you can stop copying immediately when that happens, rather than wait (perhaps indefinitely) until you have data to write and only determine when writing it that the sink’s closed (or in error). Using WAIT_CLOSURE_ONLY whenever you haven’t hit errors but don’t have any data to write neatly solves the problem of efficiently learning if the sink dies.
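The choice of what to wait for on the sink can be sketched as a small decision function. This is an illustration in Python, not the logic in nsStreamUtils.cpp; SinkWait and choose_sink_wait are hypothetical names, and the function only models the moment when the copier cannot write right now and must decide which notification to request.

```python
from enum import Enum


class SinkWait(Enum):
    """What the copier asks the sink to notify it about (hypothetical names)."""
    NONE = "none"                  # the copy is over; nothing left to wait for
    WRITABLE = "writable"          # analogous to asyncWait with flags = 0
    CLOSURE_ONLY = "closure_only"  # analogous to flags = WAIT_CLOSURE_ONLY


def choose_sink_wait(buffered_bytes: int, sink_failed: bool) -> SinkWait:
    """Pick the kind of wait to request on the sink."""
    if sink_failed:
        # The sink closed or hit an error: stop copying.
        return SinkWait.NONE
    if buffered_bytes > 0:
        # There is data to write, so writability matters.
        return SinkWait.WRITABLE
    # Nothing to write yet: only the sink dying is interesting.
    return SinkWait.CLOSURE_ONLY
```

With an empty buffer and a healthy sink, choose_sink_wait(0, False) picks the closure-only wait, so the copier is not woken just because the sink has room.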

01.12.09

An exercise in XPCOM stream programming

If you’ve done any programming with XPCOM, at some time you’ve probably had to work with streams. A little background in case you haven’t, then a small thought exercise:

Streams

A stream is an object from which you read data or to which you write data. In XPCOM an input stream is a stream from which you read data; an output stream is a stream to which you write data. In an ideal world a stream is either open (indicating data may be read from or written to it) or closed (indicating that the stream is no longer readable, or that no more data can be written to it), and that’s all there is to it. File objects in Python function very much like ideal streams.
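As a small illustration of the "ideal stream" model, here is how Python's in-memory file objects behave. This sketch uses io.BytesIO from the standard library as a stand-in for a file; the variable names are arbitrary.

```python
import io

# An open input stream and an open output stream.
source = io.BytesIO(b"hello, streams")
sink = io.BytesIO()

# While both are open, reading and writing simply succeed.
sink.write(source.read())
copied = sink.getvalue()

# Once closed, the stream is no longer readable.
source.close()
try:
    source.read()
    read_after_close_failed = False
except ValueError:
    # BytesIO raises ValueError for I/O on a closed object.
    read_after_close_failed = True
```

There is no notion here of "not right now": a read either returns data or the stream is closed.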

In the real world, truly useful streams have further limitations (or characteristics). How much data can be read from an input stream right now? Can a given amount of data be written to an output stream right now? Should reading or writing proceed until completion when right now isn’t possible but sometime later might be, or should it halt immediately with an error indicating that reading or writing would block program execution? One might ignore these concerns in simplistic scenarios such as those which short Python scripts might be used to address. In complex applications, particularly those which must remain responsive to user input, these concerns may be quite important. You can’t display a useful progress bar if the stream you’re reading from represents the download of a 3GB file over a slow network and reading from the stream blocks program execution.

Streams which immediately halt with an error when reading or writing would block execution are nonblocking streams. Efficient use of such streams requires a way to wait until the desired amount of data can be written to or read from a stream. XPCOM efficiently supports nonblocking streams through an asyncWait method which will notify at some later time when the desired amount of data can be written to or read from the stream, without blocking. At the moment there are two flavors of asynchronous waiting: waiting until the desired amount can be read or written, and waiting until the given stream has been closed. At the interface level, the former is indicated by flags = 0, while the latter is indicated by flags = WAIT_CLOSURE_ONLY.
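The two flavors of waiting can be modeled with a toy nonblocking input stream in Python. This is a sketch, not the XPCOM interface: ToyNonblockingInput and its methods are invented for illustration, it raises BlockingIOError where a real stream would return a would-block error, and it calls back synchronously where a real implementation would notify later from the event loop.

```python
class ToyNonblockingInput:
    """Hypothetical in-memory nonblocking stream (not an XPCOM class)."""

    def __init__(self):
        self._data = bytearray()
        self._closed = False
        self._waiters = []   # (callback, closure_only) pairs

    def read(self, count):
        if not self._data:
            if self._closed:
                return b""   # end of stream
            raise BlockingIOError("reading would block")
        chunk = bytes(self._data[:count])
        del self._data[:count]
        return chunk

    def async_wait(self, callback, closure_only=False):
        # closure_only=False models flags = 0;
        # closure_only=True models flags = WAIT_CLOSURE_ONLY.
        self._waiters.append((callback, closure_only))

    def feed(self, data):
        self._data += data
        self._notify(closed=False)

    def close(self):
        self._closed = True
        self._notify(closed=True)

    def _notify(self, closed):
        waiters, self._waiters = self._waiters, []
        for callback, closure_only in waiters:
            if closure_only and not closed:
                self._waiters.append((callback, closure_only))  # keep waiting
            else:
                callback(self)


log = []
stream = ToyNonblockingInput()
try:
    stream.read(4)
except BlockingIOError:
    log.append("would block")
stream.async_wait(lambda s: log.append(("data", s.read(4))))
stream.async_wait(lambda s: log.append("closed"), closure_only=True)
stream.feed(b"abcd")   # wakes only the ordinary waiter
stream.close()         # wakes the closure-only waiter
```

After this runs, log holds the would-block marker, then the data notification, then the closure notification, in that order.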

The Exercise

Suppose you wish to complete one conceptually simple task in stream programming: copying a stream, i.e. reading all data from one stream and writing it all into another, where both streams are nonblocking. (Such a copier might buffer data read before it can be immediately written; assume this is a requirement for the purposes of this exercise.) Suppose for the moment that there is no readily available implementation of the nsIAsyncStreamCopier interface, so you have to roll your own stream copier. In what situation is it necessary to asynchronously wait with flags = WAIT_CLOSURE_ONLY to efficiently implement stream copying?

Hints

If you want a hint (arguably the answer, if you can interpret the code), take a look at the uses of WAIT_CLOSURE_ONLY in xpcom/io/nsStreamUtils.cpp. You may perhaps find further hints in bug 513854, the bug which brought this somewhat quirky need for flags = WAIT_CLOSURE_ONLY to my attention.

Questions?

I come to this problem with more experience and familiarity with streams than most people will have. If anything in the above description is unclear, ask questions in the comment section — I did the best I could to make the problem and its background understandable, but I may easily have done so less well than intended.

07.11.09

I know what you Googled this summer, last summer, and the summer before (but not much before then)

Jeff @ 17:09

Google collects a lot of information about its users. Or, more accurately, users give an awful lot of information to Google. (If you hadn’t guessed, I have little sympathy for people who complain about Google invading their privacy: if you don’t like the ways Google can use the information you give it, don’t use Google.) It’s therefore not surprising Google comes in for a good share of complaints about its “invasions of privacy” or some similar alarmism. Recently I stumbled across mention of one service Google now provides to give users insight into what information Google tracks about them: Google Dashboard, a one-stop shop directing you to modifiable views of much of the information Google has recorded about your interactions with it. It currently covers these Google services:

  • General account details (password, email address, &c.)
  • Alerts
  • Calendar
  • Contacts
  • Docs (& Spreadsheets)
  • Gmail
  • iGoogle
  • Orkut
  • Product Search
  • Profile (the link you see at the top of results if you search for a person who’s created one and made it publicly available)
  • Reader
  • Talk
  • Web History
  • YouTube

These services “are not yet available in this dashboard”:

  • Google App Engine
  • Google Groups
  • Google Book Search
  • Google Subscribed Links
  • Google WiFi

Skimming through the data yields this information about me, at a general level:

  • Searching (since May 12, 2007):
    • Total searches: 16026 (speculation on where that puts me overall by searches/day? I’m guessing top 5%, probably an even smaller percentage)
    • Total sponsored results viewed: 23
    • Total sponsored results viewed from searches with no intention of buying anything (i.e. I searched to learn information not meant for my potential use in making a purchase): 17
    • Total sponsored results which resulted in purchases: definitely 1, maybe 2 depending how broadly you define “purchase”, possibly 3 if you count one as minimally contributing to an eventual purchase that was ultimately made based on recommendations from friends
    • Total sponsored results clicked resulting in purchases not previously planned: 0
    • I’d always thought advertising basically doesn’t work on me; this seems like solid numerical evidence of that
  • I basically haven’t touched my calendar in over two years (not surprising, I’ve never had success keeping and regularly using a calendar)
  • I’ve created two docs/spreadsheets (one to track acid3 progress, one to track shared apartment/utility/etc. expenses with Jesse)
  • I have 12450 conversations in Gmail (most of it just archival storage of my college dorm’s mailing list, some other mail I’ve mostly ignored)
  • I have a tab and a theme in iGoogle, which I basically never use (prefer Ctrl+K in Firefox, or the non-customized home page)
  • I have one album in Orkut with nothing in it (probably auto-generated in the days when I was thinking of investigating Orkut’s JavaScript sandboxing implementation like I did Facebook’s; the account’s otherwise dormant)
  • I have four items in a Google shopping list, all dating back almost five years, all of which I still don’t have (“need” is far too strong a word for any of them)
  • I have 61 Reader subscriptions
  • No contacts in Talk, not even sure I’ve used it since it first came out
  • My YouTube account information until just now claimed I still live in Cambridge, MA

Of course, the search part is the most interesting bit, but there’s still a little gravy for me in the data on the other services. Does Dashboard reveal anything interesting to you about your interactions with Google?

Edit: Something else worth noting, after further exploration: their current UI for examining manual route changes in maps is clearly more prototyped than polished. It appears that every route change shows up as its own “search” in the map history UI, which results in dozens of “searches” showing up for viewing a single set of directions and modifying them to reflect some other choice of roads. (Except when I merely want to place a location on a map, I change the automatically-determined route nearly every time because I can’t bike on freeways like US-101, and nearly every generated route traveling up or down the peninsula uses it.)

23.10.09

pbcopy and pbpaste for Linux

Jeff @ 14:43

Mac OS X has the useful commands pbcopy and pbpaste. pbcopy reads the contents of standard input into the clipboard; pbpaste writes the contents of the clipboard to standard output. These commands aren’t part of the standard set of commands on Linux, but they’re easily added. Simply install the XSel program via a package management system or directly from source, then add these lines to ~/.bashrc. Voilà! Easy commandline access to the clipboard.

alias pbcopy='xsel --clipboard --input'
alias pbpaste='xsel --clipboard --output'

16.10.09

Working on the JS engine, redux

In a continuation of a topic started by mrbkap, I present you this gem of a gdb command I needed to use today:

cond 8 \
  (*$9 && \
   ((*$9)->id&7) == 4 && \
   (*(jschar**)((uintptr_t) (*$9)->id + 4))[0] == 'o' && \
   (*(jschar**)((uintptr_t) (*$9)->id + 4))[1] == 'f' && \
   (*(jschar**)((uintptr_t) (*$9)->id + 4))[2] == 'f')

For minimal background, breakpoint 8 was the result of watch *$9, and of course $9 = (JSScopeProperty **) 0x7ffff2f08038.

By the way, did you know the gdb command line supports line continuations? I didn’t, before I had to think about how the above command would display without any. :-) This is the above command as I originally wrote it:

cond 8 (*$9 && ((*$9)->id&7) == 4 && (*(jschar**)((uintptr_t) (*$9)->id + 4))[0] == 'o' && (*(jschar**)((uintptr_t) (*$9)->id + 4))[1] == 'f' && (*(jschar**)((uintptr_t) (*$9)->id + 4))[2] == 'f')
