L7HM77@sh.itjust.works 2 points 2 weeks ago (last edited 2 weeks ago)

Rambling Story

Once, I had a cheap and very questionable SATA SSD fail in my system. The symptoms were similar: Windows would hang and crash at random, more frequently over time. While digging through the Windows logs and troubleshooting, I found that the system would crash when trying to access the drive through File Explorer, because the drive would disconnect. The SSD seemed to fail slowly, but I was only using it as a faster workspace and saving everything to an HDD, so I never looked into the possibility of a failing drive until the system wouldn't boot. Removing the drive cured everything. I should probably note that the failed SSD wasn't the boot drive; it was used strictly for data, so the OS drive itself wasn't the one dropping out. I think the drive was shorting out some of the SATA pins and scrambling the whole bus.

Several years later, on the Linux side of things, I found out that fstab can prevent booting if a storage device is missing. On a fresh install, fstab had been auto-configured with an external drive enclosure as a critical component. I'm not sure what the error messages would look like if an internal data drive mounted as critical disconnected on a running system, but I would assume Linux would halt even if no processes were running from the drive.
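For anyone who wants to check their own fstab for this, here's a rough Python sketch that flags entries likely to block boot when the device is missing. It assumes the usual behaviour where a non-root mount without the `nofail` (or `noauto`) option is treated as required. The function name, the sample UUIDs, and the mount points are all made up for illustration, so treat it as a starting point rather than a definitive check.

```python
# Hypothetical helper: list fstab mount points that are treated as required at boot.
# Assumption: an entry without "nofail" or "noauto" can stall boot if its device is absent.

# Made-up sample in the standard six-column fstab layout.
SAMPLE_FSTAB = """\
# <device>      <mount point>  <type>  <options>        <dump>  <pass>
UUID=aaaa-1111  /              ext4    defaults         0       1
UUID=bbbb-2222  /mnt/backup    ext4    defaults         0       2
UUID=cccc-3333  /mnt/scratch   ext4    defaults,nofail  0       2
"""


def boot_blocking_entries(fstab_text):
    """Return mount points of non-root entries lacking nofail/noauto."""
    risky = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line, ignore it
        _device, mount_point, fs_type, options = fields[:4]
        # The root filesystem is needed regardless; swap is left out of this check.
        if mount_point == "/" or fs_type == "swap":
            continue
        opts = set(options.split(","))
        if "nofail" in opts or "noauto" in opts:
            continue
        risky.append(mount_point)
    return risky


if __name__ == "__main__":
    print(boot_blocking_entries(SAMPLE_FSTAB))
```

To run it against a real system, pass in the contents of `/etc/fstab` instead of the sample text. Adding `nofail` to the options column of a data or external drive is the usual way to let the system boot without it.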

I'm not sure what the symptoms would have been if my SSD had failed while running Linux. My gut says it would look similar to your Linux dmesg output, with the boot drive's I/O disconnecting or becoming inaccessible.

I've also had a system with an AMD processor fail to boot, but that one wouldn't even POST. I finally fixed it by reseating the CPU. Turns out that's a common issue with some AMD CPUs using the AM4 socket; I found a lot of complaints online about it after the fact.

Since your system runs fine from a live USB, and you've already replaced the M.2 drive, I would try running the system without any SATA drives installed, and try to force a crash until you feel confident the issue is gone.

If the problems still persist, then I would look at getting a cheap fresh HDD and a new SATA cable, installing a temporary OS, and trying the test again.

If it STILL crashes, I would look at removing all unnecessary hardware from the motherboard and slowly testing each stage as you rebuild.