The kernel normally buffers pretty much all file I/O in the page cache. If you open a file with O_DIRECT, the page cache is bypassed and I/O happens directly to and from the userspace buffers. This is great for bulk copies and the like. But the kernel still batches sequential sectors together into larger requests, as otherwise I/O throughput would be terrible (think 1 MB/sec rather than 80 MB/sec from a modern drive).
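For anyone who has not used O_DIRECT before, here is a minimal sketch of a direct read. The function name read_direct and the 4096-byte ALIGNMENT are my own assumptions for illustration, not anything the kernel mandates. The real alignment requirement depends on the device and filesystem, and 4096 is just a conservative guess. Some filesystems reject O_DIRECT outright, in which case open() fails.

```c
#define _GNU_SOURCE             /* O_DIRECT is a Linux extension in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGNMENT 4096          /* assumed; real requirement is device-dependent */

/*
 * Hypothetical helper: read `len` bytes at offset `off` from `path`,
 * bypassing the page cache. On success returns the byte count and stores
 * an aligned buffer in *out (caller frees it). On failure returns -1
 * with errno set and *out == NULL.
 */
ssize_t read_direct(const char *path, off_t off, size_t len, void **out)
{
    void *buf = NULL;
    ssize_t n;
    int fd, saved;

    assert(out != NULL);
    *out = NULL;

    /* O_DIRECT generally wants offset, length and buffer all aligned. */
    if (len == 0 || len % ALIGNMENT != 0 || off % ALIGNMENT != 0) {
        errno = EINVAL;
        return -1;
    }

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    if (posix_memalign(&buf, ALIGNMENT, len) != 0) {
        close(fd);
        errno = ENOMEM;
        return -1;
    }

    n = pread(fd, buf, len, off);
    saved = errno;
    close(fd);
    if (n < 0) {
        free(buf);
        errno = saved;
        return -1;
    }

    *out = buf;
    return n;
}
```

Something like read_direct("/dev/sdX", 0, 1 << 20, &buf) would then pull the first megabyte of a device without going through the page cache (the device name is a placeholder). Note that a single large request like this is still handed to the drive in big multi-sector pieces, which is the batching described above.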

For sector-by-sector copy/recovery of a partially bad disk, I prefer to use a very low-level driver API to force sector-at-a-time transfers whenever an error is reported. I once wrote a throwaway tool for this, but have since lost it. It used the IDE_TASKFILE interface (ioctls).
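The lost tool is not reproduced here, and the following does not use the taskfile ioctls at all. It is only a rough sketch of the same "big reads normally, one sector at a time after an error" idea using plain pread()/pwrite(). The name rescue_copy, the 512-byte SECTOR and the 64 KiB CHUNK are all assumptions of mine. On a normally opened descriptor the kernel may still read ahead around a bad sector, so in practice the source would probably want to be opened with O_DIRECT and an aligned buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR 512              /* assumed logical sector size */
#define CHUNK  (128 * SECTOR)   /* 64 KiB per normal read */

/*
 * Hypothetical helper: copy `total` bytes (a multiple of SECTOR) from
 * src to dst. Chunks that fail or come back short are retried one sector
 * at a time; sectors that still cannot be read are zero-filled.
 * Returns the number of zero-filled sectors, or -1 on a write error.
 */
long rescue_copy(int src, int dst, off_t total)
{
    static unsigned char buf[CHUNK];
    long bad = 0;
    off_t off = 0;

    assert(total % SECTOR == 0);

    while (off < total) {
        size_t want = (total - off < CHUNK) ? (size_t)(total - off) : CHUNK;
        ssize_t got = pread(src, buf, want, off);

        if (got != (ssize_t)want) {
            /* Error or short read: retry this chunk sector by sector. */
            for (size_t pos = 0; pos < want; pos += SECTOR) {
                ssize_t n = pread(src, buf + pos, SECTOR, off + (off_t)pos);
                if (n != SECTOR) {
                    memset(buf + pos, 0, SECTOR);   /* unreadable: zero-fill */
                    bad++;
                }
            }
        }

        if (pwrite(dst, buf, want, off) != (ssize_t)want)
            return -1;
        off += (off_t)want;
    }
    return bad;
}
```

The fast path keeps the large reads that give decent throughput, and only the chunk that reported an error pays the sector-at-a-time cost. A real recovery tool would also want retries, a log of bad sectors, and a way to resume, none of which is shown here.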

Cheers