The core of it boils down to this: when emulating older machines, the console's processor speaks language A, and our computers all speak language B.
The emulator has to translate back and forth between A<->B at least as fast as the original processor would've just spoken A.
Translating A<->B is a way tougher task than just reciting A, so you need a tremendously better CPU than what the console had in order to emulate it.
It's kinda like this: dropping a rock into a pile of sand is easy. Accurately simulating that rock dropping into the pile of sand, in real time, is really challenging.
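
To make the A<->B translation a bit more concrete, here's a rough sketch of the simplest kind of emulator, an interpreter loop. Everything in it is made up for illustration: the three opcodes (LOAD, ADD, HALT), the single 8-bit register, and the `run` function don't correspond to any real console. The point is just that for every one instruction of "language A", the host has to fetch it, figure out what it is, branch, and mimic the original hardware's quirks (here, 8-bit wraparound).

```python
# Toy "language A": a hypothetical 3-instruction machine, not a real console CPU.
LOAD, ADD, HALT = 0, 1, 2


def run(program):
    """Interpret `program`, a flat list of opcodes and operands.

    Returns (final accumulator value, number of guest instructions executed).
    """
    acc = 0        # the emulated CPU's only register (assumed 8-bit)
    pc = 0         # program counter into the emulated program
    executed = 0   # how many guest instructions we have stepped through

    while True:
        opcode = program[pc]   # fetch
        executed += 1

        # Decode and execute: the original hardware did all of this in
        # dedicated circuitry, while the host pays for it in several
        # instructions of its own.
        if opcode == LOAD:
            acc = program[pc + 1] & 0xFF
            pc += 2
        elif opcode == ADD:
            acc = (acc + program[pc + 1]) & 0xFF   # mimic 8-bit wraparound
            pc += 2
        elif opcode == HALT:
            return acc, executed
        else:
            raise ValueError(f"unknown opcode {opcode}")


if __name__ == "__main__":
    print(run([LOAD, 250, ADD, 10, HALT]))
```

Real emulators also have to keep timing, graphics, and sound hardware in sync on top of this, which is where a lot of the extra cost comes from.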