SATA Corruption Between Disks
I’ve been having all sorts of hardware troubles for the past several weeks. They always seem to come in batches. First, a bad RAM stick caused massive corruption of my files just as I installed two new SATA disks in my fileserver. The problem surfaced when I suddenly noticed audio glitches in my MP3 files that weren’t there before.
I thought the problem was fixed when I identified the RAM as the culprit, and it largely was. Yesterday, however, I decided to move a bunch of files from one of the new SATA disks to the other to see if the problem was really gone. If there were still issues with the RAM, the files would likely be corrupted after the copy.
And they were corrupted. I wtf’d, as I had tested this RAM thoroughly when I removed the bad stick. Retesting showed no errors, and replacing it with a pair of extremely high quality RAM modules also produced no errors in memtest86.
Then I suspected the disks were faulty, but no errors were showing up in the logs. With further testing I narrowed the problem down: there would only be corruption when copying directly between the SATA disks, not when copying to an IDE disk first and then from that to the SATA disk. I.e.:
SATA disk (1|2) <-> SATA disk (1|2) - corruption
SATA disk (1|2) <-> IDE disk <-> SATA disk (1|2) - all good
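For anyone wanting to repeat this kind of test without listening for glitches in MP3s, here is a minimal sketch of a copy-and-compare check in Python. It is not the procedure I used at the time; the function names and the mount points in the comments are made up for illustration. It copies every file from one directory to another and compares SHA-256 hashes of source and copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_dir, dst_dir):
    """Copy every file under src_dir to dst_dir.

    Returns the relative paths of files whose copy hashes differently
    from the source (an empty list means no corruption was detected).
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    mismatches = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(src.relative_to(src_dir))
    return mismatches


# Hypothetical usage; /mnt/sata1 and /mnt/sata2 are assumed mount points:
#   bad = copy_and_verify("/mnt/sata1/music", "/mnt/sata2/music-test")
#   print("corrupted files:", bad)
```

One caveat: right after a copy, Linux may serve the new file from the page cache instead of reading it back from the disk, so unmounting and remounting the target (or rebooting) before hashing the copies gives a more trustworthy result.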
The system this was happening on runs Debian Etch with kernel 2.6.18. I tried the latest stable kernel, 2.6.22, as well. No difference.
I think I’ve finally found the solution after hours of searching and planning a massive hardware upgrade (new board, CPU, etc.). Setting EXT-P2P’s Discard Time to 1 ms instead of 30 us has so far made the problem go away. God knows what that setting does.
The board in question is an Abit AN7 with an onboard SiI 3112 chip.