The bottleneck behind low NVMe performance is the number of PCI Express lanes provided by the processor and the chipset.
On real workstations, each processor must provide 40 or 44 PCI Express lanes.
Consumer processors provide only 24 lanes, so a bottleneck is certain!
For this reason, real professional workstations are configured with two Xeon E5-2600 processors on socket LGA 2011-v3 with the C612 chipset. The second Xeon processor is there to add more PCI Express lanes.
Entry-level/low-cost workstations are built with a single Extreme-series processor such as the i7-6950X on socket LGA 2011-v3 with the X99 chipset, or the new i9 family on socket LGA 2066 with the X299 chipset.
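To make the lane arithmetic concrete, here is a minimal Python sketch. The lane counts per platform come from the figures above; the card list and the link widths (x16 for a GPU, x4 for an NVMe SSD, x8 for a RAID controller) are assumptions for a hypothetical build, not a measurement of any real system.

```python
# Lane counts per platform, as quoted in this post.
CPU_LANES = {
    "consumer CPU": 24,
    "single workstation CPU (40 lanes)": 40,
    "single workstation CPU (44 lanes)": 44,
    "dual Xeon E5-2600 (2 x 40 lanes)": 80,
}

# Hypothetical build. Link widths are assumed typical values:
# GPU x16, NVMe SSD x4, RAID controller x8.
cards = {"GPU 1": 16, "GPU 2": 16, "NVMe SSD": 4, "RAID controller": 8}


def lane_budget(available, cards):
    """Return (lanes needed, lanes left over); a negative leftover means a shortfall."""
    needed = sum(cards.values())
    return needed, available - needed


for name, lanes in CPU_LANES.items():
    needed, spare = lane_budget(lanes, cards)
    status = "OK" if spare >= 0 else f"short by {-spare} lanes"
    print(f"{name}: needs {needed}, has {lanes} -> {status}")
```

With this example card list, the 24-lane and 40-lane platforms come up short, while the 44-lane and dual-processor platforms have enough lanes for every card.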
In the manual of a workstation/server motherboard, you can find which PCI Express slots are connected directly to the CPU and which lanes are connected to the chipset.
Primary expansion cards with very high bandwidth, such as high-end GPUs, NVMe storage, and RAID controllers, must be installed in slots linked to the processor.
Secondary expansion cards with low/medium bandwidth go on the lanes linked to the chipset.
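The placement rule above can be sketched in Python as follows. This is only an illustration: the `Card` type, the `place_cards` helper, and the example cards with their lane widths are all hypothetical, and a real board is limited by how its physical slots are wired, which only the motherboard manual can tell you.

```python
from typing import NamedTuple


class Card(NamedTuple):
    name: str
    lanes: int            # link width the card wants (assumed values below)
    high_bandwidth: bool  # True for GPUs, NVMe storage, RAID controllers


def place_cards(cards, cpu_lanes):
    """Give CPU lanes to high-bandwidth cards first (widest first);
    everything else, and anything that no longer fits, goes to the chipset."""
    placement = {}
    free = cpu_lanes
    for card in sorted(cards, key=lambda c: (not c.high_bandwidth, -c.lanes)):
        if card.high_bandwidth and card.lanes <= free:
            placement[card.name] = "CPU"
            free -= card.lanes
        else:
            placement[card.name] = "chipset"
    return placement, free


# Hypothetical build used only for illustration.
cards = [
    Card("GPU", 16, True),
    Card("RAID controller", 8, True),
    Card("NVMe SSD", 4, True),
    Card("sound card", 1, False),
    Card("USB controller", 1, False),
]

for cpu_lanes in (24, 40):
    placement, free = place_cards(cards, cpu_lanes)
    print(f"{cpu_lanes} CPU lanes -> {placement}, {free} CPU lanes left")
```

In this example, a 24-lane processor runs out of lanes after the GPU and the RAID controller, so the NVMe SSD ends up behind the chipset, which is exactly the bottleneck described at the start. With 40 lanes, all three high-bandwidth cards fit on the processor.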
Another suggestion concerns RAID 0 with SSDs (SATA, NVMe): never use this configuration! With RAID 0, TRIM and garbage collection on the SSDs are disabled by the OS.
With SSDs in RAID 0, maximum performance is limited to 4000 writes! If you want RAID 0, you need SAS SSDs and a data-center-class SAS hardware controller.
The last suggestion is about GPUs. Never use high-end consumer/gaming GPUs in workstations: consumer GPUs are limited to 2 streams. Real workstation GPUs are all professional GPUs, with unlimited streams and sync card support.
In general, configuring workstations and servers isn't as simple as configuring consumer or gaming PCs. Don't try to build one yourself if you don't know computer architecture.