The problem with both methods, though it arises for slightly different reasons, is rotation between the images. In the second case, the aircraft has to fly a dead-straight line relative to the ground, or the images will line up along a curve (if it's turning) or a slanted line (if there's a crosswind). Of course, both can happen at once.
In the first case, if the aircraft is yawing about its center axis, the two images will be rotated relative to each other.
Output from a small rate gyro could be used to correct for both of these phenomena. If you can get the rotational position at the time the frame is captured (you need to integrate, because the gyro provides rates [dr/dt]), then you can rotate the images by that amount before stitching them together. A camera mounted on a yaw-axis gyro-stabilized gimbal could also correct for small instabilities, but it would probably affect the flight dynamics.
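As a rough sketch of that idea (assuming a Python pipeline with NumPy and OpenCV, and a hypothetical log of timestamped yaw-rate samples from the gyro), the correction might look something like this:

```python
import numpy as np
import cv2

def heading_at(t_capture, times, yaw_rates, heading0=0.0):
    """Integrate yaw-rate samples (deg/s) up to the frame capture time
    to get the heading (deg) at that instant.  Simple trapezoidal
    integration; assumes 'times' is sorted and covers t_capture."""
    mask = times <= t_capture
    t = times[mask]
    r = yaw_rates[mask]
    return heading0 + np.sum(0.5 * (r[1:] + r[:-1]) * np.diff(t))

def derotate(image, heading_deg, reference_heading_deg=0.0):
    """Rotate the frame back to a common reference heading before
    stitching.  The sign convention depends on how your gyro axis
    and image axes line up -- check it against real data."""
    h, w = image.shape[:2]
    angle = heading_deg - reference_heading_deg
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, M, (w, h))

# Hypothetical usage:
# times, yaw_rates = load_gyro_log()      # timestamps (s), rates (deg/s)
# psi = heading_at(frame_time, times, yaw_rates)
# frame_aligned = derotate(frame, psi)
```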
There is a technology called a MEMS (microelectromechanical systems) gyro that is basically a gyro on a chip. They are fairly inexpensive. A friend of mine is using them for navigation-system designs where we used to use ring-laser gyros (many years ago, when I worked on those systems as an engineer). They're much easier to work with and less expensive -- probably within reach of a crazy rocket-scientist hobbyist...
Using MEMS parts, it's possible to build a full-blown IMU (inertial measurement unit) fairly inexpensively. The IMU has 3 gyros and 3 accelerometers on orthogonal axes. By integrating the outputs, you can track the position and orientation of the device over time. For short flights, using cheap components, you could know the position and orientation of the vehicle much, much more precisely than GPS, and in all six degrees of freedom.
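A very simplified sketch of that dead-reckoning integration (assuming yaw-only attitude, a flat Earth, and body-frame accelerations with gravity and bias already removed -- a real strapdown navigator also handles roll/pitch, gravity, and sensor error models) could look like:

```python
import numpy as np

def dead_reckon(dt, yaw_rates, accels_body, yaw0=0.0):
    """Crude 2-D strapdown integration: integrate yaw rate to heading,
    rotate body-frame accelerations into the ground frame, then
    integrate twice to get velocity and position.

    dt          -- sample period (s)
    yaw_rates   -- array of yaw rates (rad/s)
    accels_body -- Nx2 body-frame accelerations (m/s^2),
                   gravity and bias already removed (an assumption)
    """
    yaw = yaw0
    vel = np.zeros(2)
    pos = np.zeros(2)
    track = np.zeros((len(yaw_rates), 3))   # x, y, heading at each step
    for i, rate in enumerate(yaw_rates):
        yaw += rate * dt                      # attitude update
        c, s = np.cos(yaw), np.sin(yaw)
        a_nav = np.array([c * accels_body[i, 0] - s * accels_body[i, 1],
                          s * accels_body[i, 0] + c * accels_body[i, 1]])
        vel += a_nav * dt                     # velocity update
        pos += vel * dt                       # position update
        track[i] = (pos[0], pos[1], yaw)
    return track
```

Because the errors are integrated along with the signal, drift grows quickly with cheap sensors, which is why this kind of dead reckoning only stays accurate over the short flights mentioned above.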