Link GPU coordinate producer and consumer tasks
The event synchronizer indicating that the coordinates are ready on the GPU
is now passed to the two tasks that consume this input: PME and
X buffer ops. Both enqueue a wait on this event before launching their
kernels, so the kernels do not start executing until the coordinates
are ready.
On separate PME ranks and in tests a single stream is used, so in-order
execution within that stream already guarantees that the coordinates are
ready and no event synchronization is necessary.
With the on-device sync in place, this change also removes the
streamSynchronize call from copyCoordinatesToGpu.
Refs. #2816, #3126.
Change-Id: I3457f01f44ca6d6ad08e0118d8b1def2ab0b381b
14 files changed: