This one is driving me insane.

My main PC consists of a Gigabyte GA-Z97X Gaming 5 motherboard (socket 1150) with a Core i5-4770K processor, 32GB of Kingston HyperX DDR3 RAM, an AMD 7870 graphics card, and a Samsung 850 EVO SSD. There are various random USB things like mice, Wacom tablets, serial devices, etc., but those are probably secondary to the current issue. The machine also has an 8-port serial card and a PCIe USB3 card. It's all powered by a 750W PSU, and a good one.

About 18 months ago, I had to replace the motherboard with an identical one after a total failure. It died shortly after I got the two 40-inch 4K monitors, and in fact because of them, or rather because of the DP cable problem I mentioned here at the time. I transferred the SSD to the new board, plugged all the cards in, and after a certain amount of faffing around it all started working fine. So far, so good.

Six or seven weeks ago, this newer machine suddenly decided it wouldn't boot reliably. It's normally hibernated and only reboots from cold about once a month, but it started giving STOP 0x7B (INACCESSIBLE_BOOT_DEVICE) errors, saying it couldn't find the boot device.

I pointed out to it, quite reasonably, that the boot device was exactly where it had always been and perhaps it could look more carefully. It failed to heed my advice, so I had to go in and do it myself. After a while I found it would boot about one time in six, more or less at random. I did all the normal diagnostics: replacing or removing as much hardware as I could, including the PSU and RAM, running RAM tests, drive tests, booting under Linux (which works fine), the whole thing, and concluded that everything seemed to work properly. I also unplugged all the USB devices, took out the cards, and stripped it right back to the motherboard and the onboard graphics with a keyboard, mouse, and HDMI monitor.

Nothing really helped; it was still iffy as hell, even after uninstalling everything I'd installed in the previous month. It wouldn't even boot into safe mode. It insisted on running System Repair every couple of boot failures, which also failed without telling me much more than 'it didn't work'. Finally I backed it up, reformatted, and restored a backup from a week earlier.

That did the same thing.

So did one from a month before, and six months before THAT.

All of them, restored onto a different drive, booted perfectly happily on an older machine with a Core 2 Quad processor in it. But that same drive failed in exactly the same way in this machine.

OK, I thought, the motherboard has an issue with life, so let's swap everything over to the identical, brand-new spare I bought at the same time as part of my cunning plan, which has been sitting in the cupboard ever since. Which I did.

Same problem.

Over the next week I restored that sodding thing more times than I care to remember, tweaked boot records, faffed around with hand-building partitions, the lot. All I managed to do was waste time and make it even less reliable, so in the end I gave up.

I wiped the lot, reinstalled Windows, all the drivers, all the applications, and all the data, re-entered all the security codes and keys for about sixty programs, and wasted another week building a whole new system. I'd resisted doing this because it's so time-consuming and irritating. It worked fine, and I thought that was the end of the problem.

Until three days ago, when it did exactly the same damn thing, except this time it always gives a STOP 0xF4 (CRITICAL_OBJECT_TERMINATION) error on boot. It does reliably boot into safe mode, though, so yay for that.

Again, it will occasionally boot properly, and once it has, it works perfectly. After playing around with it, the only two things I found that produced repeatable behaviour were changing something significant in the BIOS and rebooting, after which it would normally boot properly exactly once, or removing the AMD graphics drivers, which made it boot reliably but at low resolution.

Aha, I thought, it must be the graphics card with some odd, subtle fault. It had once or twice shown some slight graphics corruption, so that seemed plausible; perhaps it was finally feeling the effects of the bad cable. I got hold of another card of the same type, a Sapphire rather than an MSI, which turned up today. Quick swap of cards and... exactly the same thing.

Great.

Much testing later, and I have got no idea what the hell is going on.

So far I have:

Swapped out the motherboard with the spare one.
Taken all the ram out and replaced it with a single 4GB stick that's known good from another machine.
Removed all the cards and gone back to the built in graphics.
Unplugged all the USB devices and reverted to a PS/2 mouse and keyboard.
Swapped out the PSU (again).
Tried a different monitor.
Unplugged all the drives other than the boot one.
Disconnected any unused cables and replaced the ones in use with new ones.
Sworn a lot.
Booted in safe mode, set the thing to selective startup, and then turned off EVERYTHING. It now boots with less running than safe mode has, which makes it boot really fast, true, but you can't do anything with it.
Uninstalled all the drivers for all the unused devices and a lot of the used ones.

None of it has made any difference at all.

Boot logging shows nothing useful; when it goes bang it doesn't leave a boot log or a memory dump. It seems to crash immediately after loading the last system driver. If I turn on OS boot information, it gets as far as showing the machine type and OS version, then falls over.

I'm stumped. I have no idea what's wrong with it, how to find out, or how to fix it. The only thing this particular iteration of the machine has in common with the one from four days ago is the processor, box, and OS. I'm 90%+ sure there's no hardware issue, either with this hardware or all the other parts lying around the room. The original graphics card MIGHT have a fault, but if so it's not the one that's doing this.

If it's a driver fault, which is plausible, how do I find it? With all the third-party stuff allegedly disabled, it would presumably have to be a Microsoft one, but I have no idea which.

The final, more-than-annoying point is that, just as last time, every now and then it will randomly boot perfectly happily and then sit there running without any issues at all, looking smugly at me. I can hibernate and resume it perfectly well, which is a sort of workaround, right up to the point where I actually need to reboot it. Bearing in mind this is Windows, that comes up sooner or later no matter what.

Starting yet again with a clean install is not something I'm prepared to do, for the fairly obvious reason that until I know what's going on it would most likely do the same thing all over again. The machine was fine up to a month or so ago, and the current installation isn't even the one that was running then. Switching to Linux or macOS isn't an option either; I run a lot of very definitely Windows-only software that won't work on either.

I have to admit I'm on the verge of tossing the entire collection of junk into the rubbish and giving up on computers entirely. Possibly buying a whole new computer would fix it, possibly not, but I can't afford to do that right now regardless. I've wasted weeks on this, quite a lot of money, fallen behind in my work, and raised my blood pressure, all to no effect.

If anyone has any helpful suggestions on where to look next for a solution I'd certainly love to hear them...

pca


Edited by pca (03/12/2016 21:26)
_________________________
Experience is what you get just after it would have helped...