Sounds like what you're building today is pretty much what the original JVM was intended to be: a lightweight way of shipping code to run on set-top boxes and other devices with very limited resources.

I suspect that you'll eventually go down the same route as everybody else: moving from interpreting bytecode directly to some sort of internal representation that an optimizer can work on (e.g., to hoist bounds checks out of a loop). Given your memory constraints, I can't just recommend that you adopt Dalvik or an equivalent wholesale, since those runtimes are going to want to run a full-blown optimizing compiler on the device. However, if you could run the optimizer elsewhere, ahead of time, and ship only the compiled output to your embedded device, then you might be in good shape.
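
To make the bounds-check example concrete, here's a rough sketch written as Java source rather than as an IR transformation. The class and method names are made up for illustration, and the explicit checks stand in for the ones your interpreter would do on every array access. It is not how any particular VM implements this, just the before/after shape of the optimization.

```java
// Hypothetical illustration; names are not from any real VM.
final class BoundsCheckHoisting {
    // Before: the index is validated on every iteration, which is what
    // a straightforward bytecode interpreter effectively does.
    static int sumChecked(int[] data, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            if (i < 0 || i >= data.length) {
                throw new ArrayIndexOutOfBoundsException(i);
            }
            sum += data[i];
        }
        return sum;
    }

    // After: i only ranges over [0, n), so a single check that
    // n <= data.length before the loop covers every access inside it.
    static int sumHoisted(int[] data, int n) {
        if (n > data.length) {
            throw new ArrayIndexOutOfBoundsException(data.length);
        }
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += data[i]; // no per-iteration check needed in the optimized form
        }
        return sum;
    }
}
```

One caveat: the hoisted version throws before doing any work, while the checked version throws partway through. That's harmless here because the loop only touches a local variable, but a real optimizer has to prove that no side effects become visible early before it can move the check.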

FYI, there's a thing called J2ME, now known as Java ME (https://en.wikipedia.org/wiki/Java_Platform,_Micro_Edition), meant explicitly for your sort of constrained environment, and it appears to still be under active development. I have no idea what the licensing terms are, so caveat emptor.

And, lastly, I'm no lawyer, but given how Oracle is suing Google over Android's use of Java, make sure your own lawyers are cool with what you're doing.