(and thus the root of the problem: player s/w trying to maintain a duplicate of the kernel's own cache, thus taking near 2X the RAM in doing so)
I'm not sure what you mean by this. There certainly isn't enough RAM in the player for the kernel's block cache to grow to anything like the size of the player's chunk cache. The player uses mlock, so it'd be more accurate to say that the player "successfully overrides" the kernel's cache, rather than "tries to maintain a duplicate". The extra memory used (as compared to, say, using O_DIRECT; mmap was buggy in ARM kernels that old) is the size of a single chunk: 64K. The kernel then evicts that chunk from (what it finds to be) its very meagre block cache when the next 64K is requested.
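To make the scheme concrete, here's a rough sketch of an mlock'd chunk cache filled by ordinary buffered reads, 64K at a time. This is not the player's actual code: the names chunk_cache, chunk_cache_init, chunk_cache_fill and chunk_cache_free are invented for illustration, and the sketch assumes a POSIX system. mlock can also fail under a low RLIMIT_MEMLOCK, which the sketch tolerates by carrying on unpinned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define CHUNK_SIZE (64 * 1024)  /* the 64K chunk size mentioned above */

/* Hypothetical chunk cache: one contiguous buffer of whole chunks. */
struct chunk_cache {
    unsigned char *data;
    size_t nchunks;  /* capacity, in chunks */
    size_t used;     /* chunks filled so far */
    int locked;      /* nonzero if mlock succeeded */
};

/* Allocate page-aligned space for nchunks chunks and try to pin it in RAM. */
int chunk_cache_init(struct chunk_cache *c, size_t nchunks)
{
    size_t bytes = nchunks * CHUNK_SIZE;
    void *p = NULL;

    if (posix_memalign(&p, (size_t)sysconf(_SC_PAGESIZE), bytes) != 0)
        return -1;
    c->data = p;
    c->nchunks = nchunks;
    c->used = 0;
    /* Pinned pages cannot be paged out; failure here is not fatal. */
    c->locked = (mlock(c->data, bytes) == 0);
    return 0;
}

/* Read from fd one chunk at a time until the cache is full or EOF.
 * Each read() passes through the kernel's cache, so at most about one
 * chunk is held twice at any moment. Returns bytes read, or -1 on error. */
ssize_t chunk_cache_fill(struct chunk_cache *c, int fd)
{
    size_t total = 0;

    assert(c != NULL && c->data != NULL);
    while (c->used < c->nchunks) {
        unsigned char *dst = c->data + c->used * CHUNK_SIZE;
        size_t got = 0;

        while (got < CHUNK_SIZE) {
            ssize_t n = read(fd, dst + got, CHUNK_SIZE - got);
            if (n < 0)
                return -1;
            if (n == 0)
                break;  /* EOF */
            got += (size_t)n;
        }
        if (got == 0)
            break;      /* nothing left to read */
        total += got;
        c->used++;
        if (got < CHUNK_SIZE)
            break;      /* short final chunk */
    }
    return (ssize_t)total;
}

/* Unpin (if pinned) and release the cache. */
void chunk_cache_free(struct chunk_cache *c)
{
    if (c->locked)
        munlock(c->data, c->nchunks * CHUNK_SIZE);
    free(c->data);
    c->data = NULL;
    c->nchunks = c->used = 0;
    c->locked = 0;
}
```

A caller would do chunk_cache_init(&c, n), then chunk_cache_fill(&c, fd) on an open file descriptor, and chunk_cache_free(&c) when done. The pinned buffer is the only long-lived copy of the data; the kernel's copy of each chunk is ordinary cache and can be dropped as soon as memory gets tight.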
Peter