"Raw bitmap" means sending the little header that the VFD requires, followed by all vfd_width by vfd_height bits of data, so the entire screen is rewritten every frame. The source of the data could be hijack's PNG (which would mean decoding PNG, and be a waste of time), the raw version, or a read of /dev/display. However you get it, you still need to process the data: it's 2-bit (or 4, really) and needs to be converted to 1-bit. It's also row-wise, and most of the VFDs expect pixels column-wise, just to be annoying, so you have to write it out 'sideways'.
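To make the 'sideways' step concrete, here is a rough Python sketch of the conversion. It is only an illustration, not what any real driver does: the function name to_column_bitmap is made up, the input is assumed to be a flat row-wise list with one grey value per pixel (already unpacked from whatever /dev/display gives you), any value at or above the threshold is treated as "on", and each output byte is assumed to hold 8 vertical pixels with the top pixel in bit 0. Real VFDs differ on bit order, so check yours. The header is left out because it is different for every VFD.

```python
def to_column_bitmap(pixels, width, height, threshold=1):
    """Convert row-wise grey pixels to 1-bit column-wise bytes.

    pixels    -- flat sequence, row-wise, one value per pixel (assumed 0..3)
    threshold -- values >= threshold become "on" (an assumption)
    Returns bytes: for each column, height // 8 bytes, top pixel in bit 0.
    """
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    if height % 8 != 0:
        raise ValueError("this sketch needs a height that is a multiple of 8")

    out = bytearray()
    for x in range(width):                 # walk the columns left to right
        for band in range(height // 8):    # 8-pixel-tall strips, top to bottom
            byte = 0
            for bit in range(8):
                y = band * 8 + bit
                if pixels[y * width + x] >= threshold:
                    byte |= 1 << bit       # top pixel lands in bit 0
            out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    # 2 pixels wide, 8 tall: light a few pixels and look at the result.
    demo = [0] * 16
    demo[0 * 2 + 0] = 3   # column 0, row 0
    demo[7 * 2 + 0] = 1   # column 0, row 7
    demo[1 * 2 + 1] = 2   # column 1, row 1
    print(to_column_bitmap(demo, 2, 8).hex())
```

If your VFD wants the top pixel in the high bit instead, change `1 << bit` to `0x80 >> bit`.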

But the amount of data that can be sent down the serial link is limited, and much of the screen stays the same most of the time (except for some visual modes), so drawing the entire frame every frame is a big waste of bandwidth when you could just say "change this pixel to off, and this one to on, draw a line here, draw a bitmap there" etc. The clever bit is trying to do it quickly (which usually, but not always, means with the fewest bytes sent to the VFD), but you film it, and then play it back.. backwards.
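As a rough idea of what "only send what changed" could look like, here is a Python sketch that compares the previous and current frames byte by byte and picks whichever update is cheaper. Everything specific in it is an assumption for illustration: the names changed_runs and plan_update are made up, the frames are assumed to be already packed in whatever byte order the VFD wants, and run_overhead and frame_overhead stand in for the bytes a real VFD would need for a "move cursor and write" command and for the full-frame header. It also only counts bytes, which as noted is usually but not always the same thing as being quick.

```python
def changed_runs(prev, curr, merge_gap=0):
    """Return a list of (offset, data) runs where curr differs from prev.

    Unchanged gaps of up to merge_gap bytes between two changes are
    swallowed into one run, since resending a few unchanged bytes can be
    cheaper than starting a new run.
    """
    if len(prev) != len(curr):
        raise ValueError("frames must be the same size")
    runs = []
    i, n = 0, len(curr)
    while i < n:
        if prev[i] == curr[i]:
            i += 1
            continue
        start, end = i, i + 1          # end is one past the last changed byte
        j = end
        while j < n:
            if prev[j] != curr[j]:
                end = j + 1            # extend the run to this change
            elif j - end >= merge_gap:
                break                  # gap too long, close the run
            j += 1
        runs.append((start, bytes(curr[start:end])))
        i = end
    return runs


def plan_update(prev, curr, run_overhead=3, frame_overhead=2):
    """Pick delta runs or a full redraw, whichever sends fewer bytes.

    run_overhead and frame_overhead are hypothetical command sizes.
    """
    runs = changed_runs(prev, curr, merge_gap=run_overhead)
    delta_cost = sum(run_overhead + len(data) for _, data in runs)
    full_cost = frame_overhead + len(curr)
    if delta_cost < full_cost:
        return ("delta", runs)
    return ("full", [(0, bytes(curr))])


if __name__ == "__main__":
    old = bytes(16)
    new = bytearray(16)
    new[2], new[4], new[12] = 0xFF, 0x01, 0x80
    print(plan_update(old, new))
```

In a visual mode where nearly every byte changes, the delta cost ends up above the full-frame cost and the sketch falls back to redrawing everything, which matches the exception mentioned above.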