amd64: Implement VFMADD213 for Iop_MAddF32 and Iop_MAddF64
Speed up F32 and F64 FMA on amd64. Add priv/host_amd64_maddf.c
implementing h_amd64_calc_MAddF32_fma4 and h_amd64_calc_MAddF64_fma4
to be used instead of the generic variants h_generic_calc_MAddF32
and h_generic_calc_MAddF64 when host has VEX_HWCAPS_AMD64_FMA4.
Add fma3 and fma4 detection m_machine.c (machine_get_hwcaps).
This patch also fixes the memcheck/tests/vcpu_fnfns and
none/tests/amd64/fma testcases when run on a x86-64-v3 system.
Patch contributed by Grazvydas Ignotas <notasas@gmail.com> and
Bruno Lathuilière <bruno.lathuiliere@edf.fr>
https://bugs.kde.org/show_bug.cgi?id=481127
https://bugs.kde.org/show_bug.cgi?id=463463
https://bugs.kde.org/show_bug.cgi?id=463458