Fixes for Power7 big-endian
Now compiles and passes all tests in both double and single precision
with gcc 4.9.3, 5.4.0 and 6.1.0 for big-endian VSX.
The change for the code in incrStoreU and decrStoreU addresses an
apparent regression in 6.1.0, where the compiler thinks the type
returned by vec_extract is a pointer-to-float, but my attempts a
reduced test case haven't reproduced the issue.
Added some test cases that might hit more endianness cases in future.
We have not been able to test this on little-endian Power8; there is
a risk the gcc-specific permutations could be endian-sensitive. We'll
test this when we have hardware access, or if somebody runs the tests
for us.
Fixes #1997.
Refs #1988.
Change-Id: Iede0eac22504b22973f1a40d2b0180f10a34b7ed