ELF binary comparison

Posted by: tonyc

ELF binary comparison - 07/01/2002 10:49

By day I work a lot with C/C++ under Solaris 8. The binaries we generate are ELF format. I would like to provide my configuration management folks with a reliable way to run diffs after a build. Not diffs of the source files, that's easy... But they want to be sure that the *binary* files that are generated are functionally equivalent between builds.

For instance, let's say we have build 2.0 of our software and they need to make a patch. They run a build, but if they use "diff" or "sum" on the resulting shared objects, every single shared object will show up as a difference, even though only one of them might have actually had any code changed. This is because there is some dynamic data which the linker seems to put into each binary, so even if you build from the same source code twice, you get different binaries. This is, of course, a feature rather than a bug.

The obvious answer is to know which object files are linked into which shared objects, but I'd like to give them an "idiot proof" way to show which shared objects have had *code* changed. I tried experimenting with "elfdump" but had no luck finding options which could be used to determine if the binaries contain the same code.
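One way to get closer to "did the *code* change?" is to hash only selected ELF sections instead of the whole file. Below is a minimal Python 3 sketch that reads the section header table directly. It handles 32- and 64-bit files in either byte order, so it should cope with SPARC Solaris objects. The section list (`.text`, `.rodata`, `.data`) is an assumption: it deliberately skips `.comment`, symbol tables and debug info, and should be adjusted to whatever sections your objects actually contain. The names `read_sections`, `code_digest` and `same_code` are made up for illustration, and files that use ELF extended section numbering are not handled.

```python
import hashlib
import struct

# Assumed "functional" sections; .comment, symbol tables and debug info
# are deliberately left out. Adjust to what your objects actually contain.
CODE_SECTIONS = (".text", ".rodata", ".data")
SHT_NOBITS = 8  # section has no bytes in the file (e.g. .bss)


def read_sections(data):
    """Map section name -> section bytes for an ELF image held in `data`."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big endian
    if is64:
        ehdr = struct.unpack_from(end + "HHIQQQIHHHHHH", data, 16)
        sh_fmt = end + "IIQQQQIIQQ"
    else:
        ehdr = struct.unpack_from(end + "HHIIIIIHHHHHH", data, 16)
        sh_fmt = end + "IIIIIIIIII"
    shoff, shentsize, shnum, shstrndx = ehdr[5], ehdr[10], ehdr[11], ehdr[12]
    if shnum == 0:  # no section table (or extended numbering, not handled)
        return {}
    # Section header fields used: [0] sh_name, [1] sh_type, [4] sh_offset, [5] sh_size
    shdrs = [struct.unpack_from(sh_fmt, data, shoff + i * shentsize)
             for i in range(shnum)]
    strtab = shdrs[shstrndx]
    names = data[strtab[4]:strtab[4] + strtab[5]]
    sections = {}
    for sh in shdrs:
        name = names[sh[0]:names.index(b"\0", sh[0])].decode("ascii", "replace")
        sections[name] = b"" if sh[1] == SHT_NOBITS else data[sh[4]:sh[4] + sh[5]]
    return sections


def code_digest(path):
    """SHA-1 over the CODE_SECTIONS of the ELF file at `path`."""
    with open(path, "rb") as f:
        sections = read_sections(f.read())
    h = hashlib.sha1()
    for name in CODE_SECTIONS:
        body = sections.get(name, b"")
        # Hash name and length too, so bytes moving between sections show up.
        h.update(name.encode() + struct.pack(">Q", len(body)) + body)
    return h.hexdigest()


def same_code(path_a, path_b):
    return code_digest(path_a) == code_digest(path_b)
```

Running `code_digest` over every `.so` from two builds and listing the ones whose digests differ would give the CM folks a per-library answer. A change in `.text` or `.rodata` is a strong hint that the code really changed. Keep in mind that embedded `__DATE__`/`__TIME__` strings would still land in `.rodata`.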

Any help is appreciated.
Posted by: peter

Re: ELF binary comparison - 07/01/2002 11:43

I don't know how, but it must be possible, as this is how three-stage bootstraps of GCC are done. Your native compiler compiles a temporary GCC (xgcc), then xgcc compiles a proper GCC, then that GCC compiles a third GCC. The second and third GCCs are then binary diffed to determine whether some compiler along the way was faulty.

Good luck in finding, in the labyrinthine GCC build system, exactly how this is done.

Peter
Posted by: Captain_Chaos

Re: ELF binary comparison - 07/01/2002 15:02

Perhaps you could strip the dynamic information from the executable before comparing it?

/Pepijn
Posted by: tonyc

Re: ELF binary comparison - 07/01/2002 15:20

I thought about that, but how?
Posted by: Captain_Chaos

Re: ELF binary comparison - 07/01/2002 15:28

Using the strip command. I'm not sure exactly which parts it strips (or whether it has options to strip different parts), but try it out, perhaps it'll work.
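To try this without touching the build output, here is a minimal sketch that strips *copies* of the two files and compares the results byte for byte. It only assumes a `strip` on the PATH, run with no options. Whether that removes the parts that actually differ between builds is exactly the open question, so a mismatch means "look closer", not "the code changed". The function name `same_after_strip` is hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def same_after_strip(path_a, path_b, strip_cmd=("strip",)):
    """Strip temporary copies of two binaries and compare them byte for byte.

    The original files are never modified. `strip_cmd` is the command run on
    each copy (default: plain `strip` from PATH, no options).
    """
    with tempfile.TemporaryDirectory() as tmp:
        copies = []
        for i, src in enumerate((path_a, path_b)):
            # Prefix with an index so build-a/libfoo.so and build-b/libfoo.so
            # don't overwrite each other in the temp directory.
            dst = os.path.join(tmp, "%d_%s" % (i, os.path.basename(src)))
            shutil.copyfile(src, dst)
            subprocess.run(list(strip_cmd) + [dst], check=True)
            copies.append(dst)
        with open(copies[0], "rb") as fa, open(copies[1], "rb") as fb:
            return fa.read() == fb.read()
```

Usage would be something like `same_after_strip("build-2.0/libfoo.so", "patch/libfoo.so")` (hypothetical paths). Passing `strip_cmd` makes it easy to try a different tool or option set once you know which one removes the build-specific data.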

/Pepijn
Posted by: wfaulk

Re: ELF binary comparison - 07/01/2002 16:57

So you're saying that if you take the same source file and pass it through the same compiler two different times, you get two different outputs? What kind of non-deterministic bullshit is that? Seriously, I admin Solaris professionally (at least when I have a job), and I've never seen this behavior. What compiler are you using? Sun's Forte/Workshop or gcc (or something else)? If gcc, are you using the GNU ld or the ld that comes with Solaris?
Posted by: tonyc

Re: ELF binary comparison - 07/01/2002 17:42

I'm talking about the shared object (.so file) not the .o objects. Obviously the compiler is deterministic and should produce the same object file output. But I guess when the linker bundles everything together in a .so, it adds some kind of header or timestamp or *something* that makes it different.

FWIW we use WorkShop.
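To pin down what that *something* is, one could list the byte ranges where two builds of the same `.so` differ, then look those offsets up in the section header table that elfdump prints. I believe `elfdump -c` is the option for that, but check the man page. A minimal Python sketch, with hypothetical function names:

```python
def diff_ranges(a, b):
    """Return half-open (start, end) byte ranges where `a` and `b` differ.

    If one input is longer, its extra tail is reported as (part of) the
    final range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    tail_end = max(len(a), len(b))
    if start is not None:
        ranges.append((start, tail_end))   # run reaches the end of the shorter input
    elif tail_end > common:
        ranges.append((common, tail_end))  # only the longer input's tail differs
    return ranges


def print_diff_ranges(path_a, path_b):
    """Print every differing byte range between two files, in hex."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        for start, end in diff_ranges(fa.read(), fb.read()):
            print("0x%08x-0x%08x (%d bytes)" % (start, end, end - start))
```

If every range falls inside a section that carries no code, that section is a good candidate to exclude from the comparison.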
Posted by: wfaulk

Re: ELF binary comparison - 07/01/2002 17:47

Let me know exactly which flags you're handing to ld to generate your shared objects and let me play around with it for a while. What version of WorkShop are you using? Does it supply its own ld or is it using /usr/ccs/bin/ld? (It's been a while since I've played with WorkShop, but it'll come back to me.)
Posted by: Roger

Re: ELF binary comparison - 07/01/2002 20:25

It's actually quite likely to generate different output -- consider what happens if I use __DATE__ and __TIME__ in my C++ source code.

It's also quite likely that the compiler and linker are including some other kind of timestamp information in the binary.
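If `__DATE__`/`__TIME__` turn out to be the culprit, one heuristic is to blank out anything shaped like their expansions before comparing. The C standard fixes both formats: `__DATE__` is "Mmm dd yyyy" with a space-padded day, and `__TIME__` is "hh:mm:ss". The sketch below masks every byte run matching those shapes, keeping lengths so file offsets don't move. It is only a heuristic. An unrelated string of the same shape gets masked too, and timestamps in other formats (e.g. anything the linker writes) aren't caught. Function names are hypothetical.

```python
import re

# C standard layouts: __DATE__ is "Mmm dd yyyy" (day padded with a space,
# e.g. "Jul  1 2002") and __TIME__ is "hh:mm:ss".
DATE_RE = re.compile(
    rb"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ 0-9][0-9] [0-9]{4}")
TIME_RE = re.compile(rb"[0-9]{2}:[0-9]{2}:[0-9]{2}")


def mask_timestamps(data):
    """Replace every __DATE__/__TIME__-shaped run in `data` with b'?' bytes.

    Each match is replaced by the same number of bytes, so offsets of
    everything else in the file stay where they were.
    """
    for rx in (DATE_RE, TIME_RE):
        data = rx.sub(lambda m: b"?" * len(m.group()), data)
    return data


def same_ignoring_timestamps(path_a, path_b):
    """Compare two files after masking timestamp-shaped strings in both."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return mask_timestamps(fa.read()) == mask_timestamps(fb.read())
```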