Is it slow only for the first press in a sequence, or for the later ones too?

Is it still slow if you feed one press, wait for the drives to spin down, then feed another?

Perhaps it is paging in libc (the C library) from disk, since you're using the high-level stream I/O utilities (fopen(), fprintf(), and friends) rather than calling open(), write(), and close() directly. Those three are thin wrappers around kernel syscalls, so they should touch far less libc code that might have been paged out.

???
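
One way to test that guess is to swap the stream calls for the raw ones and see whether the delay goes away. Here is a minimal sketch, not your actual code: the function name log_press_raw, the file path, and the message text are all made up for illustration, and I'm assuming the program just appends a short line per press.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append msg to the file at path using only the
 * open()/write()/close() wrappers, with no stdio buffering involved.
 * Returns 0 on success, -1 on failure with errno set. */
int log_press_raw(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    size_t left = strlen(msg);
    while (left > 0) {
        ssize_t n = write(fd, msg, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            int saved = errno;      /* keep the write() error across close() */
            close(fd);
            errno = saved;
            return -1;
        }
        msg += n;                   /* write() may be partial: advance and loop */
        left -= (size_t)n;
    }
    return close(fd);
}
```

Call it as log_press_raw("/tmp/press.log", "press\n") in place of the fopen()/fprintf()/fclose() sequence. If the first press after a spindown is still slow with this version, libc paging is probably not the culprit and the delay is more likely the drive spinning back up for the write itself.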