Quote:

for (j = 0 ; j < numFrames ; j++)
tempVal[j] += buffer[j];


Unroll the loop (some compilers already do this partially for you), and reverse the loop index so it counts down to zero (this usually saves 1-2 instructions per iteration).

So, something like:

Code:

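// Assumes numFrames is a multiple of 4 and j is a signed int
// (the loop must also process j == 0, so the test is j >= 0, not j != 0)
for (j = numFrames - 4; j >= 0; j -= 4) {
    tempVal[j]   += buffer[j];
    tempVal[j+1] += buffer[j+1];
    tempVal[j+2] += buffer[j+2];
    tempVal[j+3] += buffer[j+3];
}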


Switching to pointers rather than table indexes might save another couple of instructions per iteration.
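
A rough sketch of the pointer version, in case it helps. The function name mix_add_ptr and the float sample type are my assumptions (the original snippet doesn't say what type tempVal and buffer hold), and numFrames is still assumed to be a multiple of 4:

```c
#include <stddef.h>

/* Hypothetical helper: adds buffer into tempVal, four samples per iteration.
   Assumes float samples and numFrames being a multiple of 4. */
static void mix_add_ptr(float *tempVal, const float *buffer, size_t numFrames)
{
    const float *end = buffer + numFrames;  /* one past the last input sample */

    while (buffer != end) {
        tempVal[0] += buffer[0];
        tempVal[1] += buffer[1];
        tempVal[2] += buffer[2];
        tempVal[3] += buffer[3];
        tempVal += 4;   /* advance both pointers by one unrolled block */
        buffer  += 4;
    }
}
```

Whether this actually beats the indexed loop depends on the compiler; a decent optimizer will often generate the same code for both, so it's worth checking the assembly output or timing it.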

But for *real* speed, one could use SIMD instructions (MMX, SSE, SSE2, etc.). Any experts here?
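
I'm no expert, but as a starting point an SSE version using compiler intrinsics might look roughly like the sketch below. It assumes an x86 target, float samples, and numFrames being a multiple of 4; the function name mix_add_sse is made up for illustration. It uses unaligned loads and stores because nothing in the original says the buffers are 16-byte aligned:

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics (x86 only) */

/* Hypothetical helper: adds buffer into tempVal, four floats per SSE add.
   Assumes float samples and numFrames being a multiple of 4. */
static void mix_add_sse(float *tempVal, const float *buffer, size_t numFrames)
{
    size_t j;

    for (j = 0; j < numFrames; j += 4) {
        __m128 acc = _mm_loadu_ps(tempVal + j);  /* load 4 accumulated samples */
        __m128 in  = _mm_loadu_ps(buffer + j);   /* load 4 input samples */
        _mm_storeu_ps(tempVal + j, _mm_add_ps(acc, in));  /* add and store back */
    }
}
```

If both buffers are known to be 16-byte aligned, _mm_load_ps and _mm_store_ps can replace the unaligned variants.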

Cheers