libs/basekit/source/simd_cph/simd_cph.readme

   1 /*****************************************************
   2
   3    Cross-platform SIMD intrinsics header file
   4
   5    VERSION: 2004.10.26 (alpha)
   6
   7    Created by Patrick Roberts
   8
   9    This is an ongoing project.  Please add functions and
  10    typedefs as needed, but try to follow the guideline
  11    below:
  12
  13    The goal of this file is to stay cross-platform.
  14    Only intrinsics or #defines that mimic another system's
  15    SIMD instruction should be included, with the only exception
  16    being instructions that, if nonexistent, are not
  17    needed (see bottom).
  18
  19    Currently, the goal is to base support around 128-bit SIMD.
  20    (Only the Gekko and x86-MMX are 64-bit)
  21
  22    Changelog:
  23
  24    2004.05.09  [Patrick Roberts]
  25         *) Created file with some i386, GCC dialect
  26    2004.10.22  [Patrick Roberts]
  27         *) Created emulated SIMD
  28    2004.10.25  [Patrick Roberts]
  29         *) Created arm-iwmmx GCC dialect
  30         *) Fixed sqrt bug in emu dialect
  31         *) Organized directories
  32         *) Makefile for test app
  33
  34
  35    To Do:
  36
  37    *( Docs
  38    *( Add new intrinsics to test app
  39    *( MinGW x86 dialect (same as GCC on Linux?)
  40    *( Does 3DNOW buy us anything?
  41    *( Intel ICC x86 dialect
  42    *( MSVC .NET x86 dialect
  43    *( Support for ARMv6, VFP and NEON SIMD?  What compilers use these?
  44    *( PowerPC AltiVec/Velocity/VMX components
  45    *( MIPS-MMI / PS2-VU components
  46    *( See if SSE2 buys us anything beyond what the compiler does already
  47    *( Compaq Alpha components
  48
  49 */
  50
  51 /***************************************************
  52
  53   Platform Notes:
  54
  55
  56      General
  57      -------
  58
  59         NOTE: SIMD data must be 16-byte aligned.  Align to 16 bytes when allocating memory.
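
        As a minimal sketch of the alignment note above, assuming a C11
        toolchain: the standard aligned_alloc() function can hand back
        16-byte-aligned storage.  The helper names alloc_vec4f_array and
        is_aligned16 are hypothetical and are not part of this header.

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper: allocate room for `count` 128-bit vectors
// (16 bytes each) on a 16-byte boundary.
// Returns NULL on failure; release the block with free().
float *alloc_vec4f_array(size_t count)
{
    // aligned_alloc requires the size to be a multiple of the
    // alignment; a whole number of 16-byte vectors always is.
    return aligned_alloc(16, count * 16);
}

// Returns 1 if p sits on a 16-byte boundary, 0 otherwise.
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

        Toolchains without C11 support would need a platform-specific
        allocator instead; which one applies depends on the target.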
  60
  61         X86/XSCALE (Intel) vs. PowerPC/MIPS
  62
  63         While the PowerPC and MIPS SIMD instructions take two source vectors
  64         and a separate destination vector, the Intel platforms take only a
  65         source and a destination that doubles as the second source.  Example:
  66
  67            PPC/MIPS can do:
  68
  69               C = A + B
  70
  71            X86 can only do:
  72
  73              A = A + B   (or A+=B)
  74
  75         Code written either way will work on the X86 and still be faster than
  76         387 math, but preserving the source registers adds significant overhead.
  77         (Disassemble the test program for an example.  The print routines preserve
  78         their operands; the 'disassembly test' does not.)  For the fastest code
  79         across systems, write your SIMD math the way the X86 expects, preserving
  80         SIMD variables manually.  GCC for PPC, at least, doesn't seem to have any
  81         trouble when the source and destination memory addresses are the same.
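
        The two styles above can be sketched with GCC's generic vector
        extension (__attribute__((vector_size(16)))).  That extension is an
        assumption made here for portability; it is not one of this header's
        dialects, and the names v4f, add_three_operand and add_in_place are
        hypothetical.

```c
// Four packed floats in one 128-bit vector (GCC/Clang vector extension).
typedef float v4f __attribute__((vector_size(16)));

// PPC/MIPS style: C = A + B.  Both sources stay intact.
v4f add_three_operand(v4f a, v4f b)
{
    v4f c = a + b;
    return c;
}

// X86 style: A += B.  The destination doubles as a source, so the old
// value of A is gone unless the caller copied it first.
void add_in_place(v4f *a, const v4f *b)
{
    *a += *b;
}
```

        Whether the compiler actually emits an extra register copy for the
        first form depends on the target and the optimization level, so
        disassembling both is the only reliable check.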
  82
  83
  84      GCC x86
  85      -------
  86
  87         You must compile with -msse and -mmmx.  I try to avoid MMX, as it is slower
  88         on the P4 than on the P3 and Athlon XP, but SSE doesn't have integer math.
  89
  90         You may want to set -msse2 if you have a P4 CPU (-msse2 is on by default
  91         for x86-64 targets), as GCC can speed up some of the SIMD functions that
  92         have no x86 equivalent by using SSE2 instructions rather than standard
  93         pipeline instructions.
  94
  95
  96      GCC PowerPC
  97      -----------
  98
  99         You must compile with the switch -maltivec
 100
 101
 102      GCC ARM (XScale only)
 103      ---------------------
 104
 105         GCC ARM only seems to support Intel Wireless MMX (XScale), not ARMv6,
 106         NEON, or VFP?  (Are these all the same beast?)
 107
 108         You must compile with +iwmmxt
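
        When the target features named in the sections above are enabled,
        GCC predefines a matching macro (__MMX__, __SSE__, __SSE2__,
        __ALTIVEC__, __IWMMXT__).  The sketch below uses those macros to
        report what a build was compiled for.  The function name
        simd_dialect_name and the returned strings are hypothetical; how
        this header actually selects a dialect is not shown here.

```c
// Hypothetical helper: report which SIMD feature macro GCC predefined
// for this build, falling back to "emulated" when none is set.
const char *simd_dialect_name(void)
{
#if defined(__SSE2__)
    return "x86 SSE2";
#elif defined(__SSE__)
    return "x86 SSE";
#elif defined(__MMX__)
    return "x86 MMX";
#elif defined(__ALTIVEC__)
    return "PowerPC AltiVec";
#elif defined(__IWMMXT__)
    return "ARM iwMMXt";
#else
    return "emulated";
#endif
}
```

        Printing the result from the test app is a quick way to confirm
        that the intended compiler switches reached the build.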
 109
 110
 111 */