I was under the assumption that every thread would use its own stack, which defaults to 8k, the same as for normal processes. Comparing the sizes, however, I see a major difference:
Output from top:
  PID USER     PRI  NI SIZE  RSS SHARE STAT LIB %CPU %MEM   TIME COMMAND
   26 root       0   0   96   96    64 S      0  0.0  0.8   0:00 ds2
  321 root       0   0  312  312   236 S      0  0.0  2.8   0:00 displayserve
Non-shared memory (SIZE - SHARE):
threaded = 312 - 236 = 76 k
mprocess = 96 - 64 = 32 k
Running the free command before and after starting the threaded ds shows a memory increase of 92 k. That is 16 k more than the 76 k above, which would indicate that two stacks (8k + 8k) are in use, as there are two threads running after startup.
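
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The numbers are copied from the top and free output above; the helper name private_kb and the 8 kB per-thread stack are only assumptions for illustration, not something the system reported.

```python
# Figures copied from the top output above, all in kB.
def private_kb(size_kb, share_kb):
    """Memory not shared with other processes: SIZE minus SHARE."""
    return size_kb - share_kb

threaded = private_kb(312, 236)   # displayserve (threaded build)
mprocess = private_kb(96, 64)     # ds2 (multi-process build)

free_delta_kb = 92                # increase reported by free
assumed_stack_kb = 8              # assumption: 8 kB per thread stack
threads = 2                       # threads running after startup

# Whatever free reports beyond the non-shared figure from top.
unexplained_kb = free_delta_kb - threaded

print("threaded non-shared:", threaded, "kB")
print("mprocess non-shared:", mprocess, "kB")
print("beyond top's figure:", unexplained_kb, "kB")
print("matches two assumed stacks:",
      unexplained_kb == threads * assumed_stack_kb)
```

With these inputs the extra 16 kB is consistent with two 8 kB stacks, but that only shows the numbers line up, not that stacks are the actual cause.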
Context switches are probably a lot more efficient with threads, but that hardly matters when another real-time app is running, since efficient context switching is then hard to guarantee anyway.
I'll try to experiment some more; I ran these tests with a very old version of displayserver and I will try to put some load on the threads/processes.
I was able to open 10+ applets concurrently with the player app in mprocess mode, while in threaded mode the same attempt shut down the complete app. This is probably also caused by the real-time nature of the player app. It probably happens in the mprocess version as well, but goes unnoticed because only the last process is terminated...