Sometimes features, independently acceptable on their own, can combine to produce peculiar results.
SQL parallelism is simply the query optimizer breaking up tasks to different schedulers. A single query can go parallel in several different parts of the query plan. There is a significant cost associated with separating the threads and then re-assembling them so not all queries will go parallel. To find out if queries are going parallel you can take a look at the plan cache.
Since GHz hasn’t increased in a long time but core count in going through the roof, it makes sense to have a controller thread delegate to the minions. CXPACKET waits will increase when queries have to go parallel. Missing indexes and bad queries can cause queries to go parallel.
The CXPACKET wait is incredibly complex. There are ways to make it go away without really fixing a problem. For example, setting max degree of parallelism (MAXDOP) to 1 will certainly make CXPACKET go away. Increasing the cost threshold for parallelism higher than the cost of your queries will also make CXPACKET go away. The goal isn’t to make CXPACKET go away. The goal is to make queries faster, not to fix the waits.
Recently, in my short career as a SAN admin I have been exposed to automatic storage tiering. With storage tiering we are taking pools of storage, with different performance characteristics, and attempting to spread the workload across the different pools drives. Ideally, the pool’s IOPS capability matches the demand for IOPS. Ideally, data that doesn’t get accessed that often will get put on slower cheaper storage. Ideally this reduces the need to identify archive workloads up front because the back-end storage solves most of that problem. Ideally management will buy enough storage so that everything isn’t running on disk pools with archive characteristics. My point is storage tiering doesn’t always work as well as advertised. Storage tiering is a cost saving maneuver which can cause a lot of inconsistent performance. Inconsistent performance leads to a lot of headaches.
If we combine these two features, some threads on that parallel query could be hitting cheap storage and other minion threads are hitting SSDs. The result is some threads are fast and other threads are slow. This will send CXPACKET waits into orbit. When CXPACKET waits are high, they generally mask any other types of system issues. There are many causes of CXPACKET waits and inconsistent storage performance could be one of them.
Parallelism is a feature to alleviate CPU bottlenecks. With storage tiering, the bottlenecks can quickly shift from storage to CPU and back. So the cost increase of going parallel can sometimes be for nothing if a single thread is waiting on storage.
I apologize if you came here looking for some kind of recommendation. The fact is, storage tiering can be a nightmare. Performance troubleshooting is an ever changing game that I am fighting to stay ahead in.