In my experience, writing straightforward C code will give you basically the same performance as well-written asm would. I'm not saying modern compilers are perfect, but they generally do the right thing (or it's very easy to coax them into doing the right thing). They usually vectorize properly, they know how to avoid obvious pipeline stalls, etc. These days, order-of-magnitude performance improvements generally don't come from hand-written assembly but from using better algorithms and data structures.
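To make the vectorization point concrete, here's a minimal sketch of the kind of plain loop I mean. The `saxpy` function is just an illustration I made up, not code from any particular project. The `restrict` qualifiers tell the compiler the two arrays don't overlap, which is often all the coaxing it needs; GCC and Clang will typically turn this into SIMD code at `-O2` or `-O3`, depending on compiler version and target flags.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
 * `restrict` promises that x and y never alias, which removes the
 * main obstacle to auto-vectorizing this loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

If you want to check what your compiler actually did, look at the generated assembly or ask for a vectorization report (e.g. `-fopt-info-vec` on GCC or `-Rpass=loop-vectorize` on Clang).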
Also, you could look at what games of that era actually looked like. Something like Quake 2 already pushed contemporary hardware to its limits, and I wouldn't call it a poorly optimized game.
I was imagining something that looks and plays almost exactly like Witcher 3, except at a lower resolution, which doesn't seem possible to me. I'd say if you simplify the graphics (replace dynamic lighting with pre-baked lightmaps, do Gouraud shading instead of PBR, remove most post-processing effects, etc.) and make all models look like Half-Life assets, you could definitely render something at an acceptable frame rate. However, asset streaming would still be a huge issue for a large and detailed open world like Witcher 3, considering the memory limits and slow disk reads.
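For a sense of why Gouraud shading is so much cheaper than PBR, here's a rough sketch under some simplifying assumptions: one directional light, diffuse (Lambert) term only, unit-length vectors. The names `Vec3`, `vertex_intensity` and `gouraud_pixel` are made up for illustration. The lighting math runs once per vertex, and each pixel only pays for a weighted blend of three numbers.

```c
typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Per-vertex step: Lambert diffuse term max(N . L, 0).
 * Assumes both vectors are already normalized. */
float vertex_intensity(Vec3 normal, Vec3 to_light)
{
    float d = dot3(normal, to_light);
    return d < 0.0f ? 0.0f : d;
}

/* Per-pixel step: blend the three vertex intensities using the
 * pixel's barycentric weights (assumed to sum to 1). */
float gouraud_pixel(float i0, float i1, float i2,
                    float w0, float w1, float w2)
{
    return w0 * i0 + w1 * i1 + w2 * i2;
}
```

A PBR pipeline would instead evaluate a full BRDF per pixel, usually with several texture lookups, which is where most of the cost difference comes from.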