The story so far

When the epidemic of dying GPUs started, the general consensus was that uncapped frame rates were to blame. See, most games these days limit the frame rate in menus, so when you load into a game it doesn’t suddenly jump to 500FPS and then dip back down to 100 once you’re actually playing. New World’s beta didn’t have that limit. Sudden jumps in frame rate stressed the card to the point where the cooling couldn’t keep up, or so it was assumed. Once EVGA acknowledged the issue, it reportedly accepted all RMAs and sent affected customers a replacement card without even waiting for the broken card to arrive first. EVGA would not disclose the total number of 3090s sold or the number affected, but it did confirm that “less than 1%” of the total were bricked. It’s also noteworthy that, at the height of this situation, EVGA charged extra, compared to the MSRP, to accelerate RMA requests.

It’s not the fan controller

Many thought that the fans’ micro-controller was giving out and could not keep up with the number of frames being generated. Supposedly, the fan controller was malfunctioning, so when temperatures on the card suddenly jumped due to high frame rates in menus, the fans would spin at insanely high RPMs to cool down the GPU. This was thought to be what killed the cards. EVGA, however, dismissed this claim entirely and gave another explanation for why various monitoring tools showed the fan controller as broken. What actually happened was that the I2C bus on the PCB was producing noise that hardware monitoring tools such as HWInfo could not handle properly. Instead, they reported the noise as fans spinning at unbelievably high RPMs, which looked like a fan controller failure. EVGA’s own Precision X1 software, however, reported the fan speeds correctly. Since the issue was highlighted, EVGA has released a micro-controller update that coordinates better with third-party software and shows the correct fan RPMs. If you were one of the handful affected, EVGA recommends downloading Precision X1. Once installed, it will let you update the micro-controller easily. Also make sure the monitoring software you’re using is up to date.
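To make the misreading concrete, here is a minimal Python sketch of how a monitoring tool could guard against bus noise showing up as absurd fan speeds. It is purely illustrative: the function name, the 6000 RPM ceiling, and the sample values are assumptions made for this example, and it does not describe how HWInfo, Precision X1, or EVGA's firmware actually work.

```python
# Hypothetical plausibility ceiling; not taken from any real card or tool.
MAX_PLAUSIBLE_RPM = 6000


def filter_fan_readings(readings, max_rpm=MAX_PLAUSIBLE_RPM):
    """Replace implausible RPM samples with the last plausible one.

    Samples outside 0..max_rpm are treated as bus noise. If no plausible
    sample has been seen yet, None is recorded instead.
    """
    cleaned = []
    last_good = None
    for rpm in readings:
        if 0 <= rpm <= max_rpm:
            last_good = rpm
        cleaned.append(last_good)
    return cleaned


if __name__ == "__main__":
    # Made-up samples: two noisy values mixed into otherwise normal readings.
    samples = [1500, 1520, 65535, 1510, 48000, 1530]
    print(filter_fan_readings(samples))
```

The point is only that a reading far outside what a fan can physically do is more likely a communication glitch than a real speed, which is consistent with EVGA's explanation of the bogus RPM figures.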

So, what the heck really killed all those 3090s?

An internal investigation at EVGA revealed that cards produced the year prior, in 2020, had “poor workmanship” on the PCB around the MOSFET circuits. EVGA came to this conclusion by putting the two dozen faulty cards it received under an X-ray machine and analyzing them thoroughly. An EVGA spokesman also stated that the fan controller was not at fault, as discussed above.

Quality control concerns

Now that it’s clear that poor PCB design was the reason for all of this horror, it raises the question: why didn’t EVGA fix this sooner? EVGA cared enough to change the PCB design starting in 2021, which means they were aware of the loose tolerances around the electrical circuits. But instead of revising the cards already out there, they chose to stay quiet. It was only when tragedy struck that EVGA decided to RMA the cards and compensate the affected users. Put together, this paints a clear picture: EVGA knew but did not care. They sold the units knowing that they had design flaws. Even if we assume for a moment that EVGA was unaware, that reflects poorly on their quality control. An issue that existed last year went undetected for almost a year. All those cards passed QC tests and were shipped to customers with a major problem, only for it to be rectified months later.
