Since 1997, many consumer processors have offered ways to speed up operations on multiple integers by performing them in parallel when instructed to do so. This is useful, for example, when adjusting the contrast of an image, which applies the same operation to every pixel.
Some time later, instructions were added to perform such operations on single-precision floating point numbers as well. The generic name for this technique is 'Single Instruction, Multiple Data', SIMD.
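To make the idea concrete, here is a minimal sketch of what 'the same operation on several values at once' can look like from C. It assumes GCC's generic vector extension (the `vector_size` attribute) and 128-bit vectors holding four 32-bit lanes. The type names `v4si` and `v4sf` and the helpers `add_offset` and `scale` are made up for this illustration and are not taken from the accompanying sources.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: 128-bit vectors, i.e. four 32-bit lanes per vector. */
typedef int   v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: add `offset` to n integers, four at a time. */
void add_offset(int *data, size_t n, int offset)
{
    v4si voffset = { offset, offset, offset, offset };
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4si v;
        memcpy(&v, data + i, sizeof v);   /* load without alignment assumptions */
        v += voffset;                     /* one addition covers four lanes */
        memcpy(data + i, &v, sizeof v);   /* store the four results */
    }
    for (; i < n; i++)                    /* scalar loop for the leftovers */
        data[i] += offset;
}

/* Hypothetical helper: multiply n floats by `gain`, four at a time. */
void scale(float *data, size_t n, float gain)
{
    v4sf vgain = { gain, gain, gain, gain };
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4sf v;
        memcpy(&v, data + i, sizeof v);
        v *= vgain;                       /* one multiplication covers four lanes */
        memcpy(data + i, &v, sizeof v);
    }
    for (; i < n; i++)
        data[i] *= gain;
}
```

Calling `add_offset(pixels, count, 10)` brightens a buffer of integer samples, and `scale(samples, count, 2.0f)` doubles a buffer of floats. Whether the compiler actually emits SIMD instructions for the vector operations depends on the target and on the flags passed to gcc; on targets without suitable instructions the same code is still valid and is lowered to ordinary scalar operations.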
Additionally, most CPUs have the ability to 'pair' certain instructions, allowing them to execute several consecutive operations at once. This may appear to be magic, and it would be if this technique did not come with some caveats.
This document sets out how to benefit from these features from C and C++ using the GNU Compiler Collection, gcc. As an aside, some attention will be given to common techniques to speed up any calculation.
Throughout, the emphasis is mostly on Intel-compatible processors (Pentium, Athlon, Opteron), but everything applies to recent PowerPC processors too. Readers with more specific AltiVec knowledge are requested to submit improvements.
Sources can be found at ds9a.nl/gcc-simd.tar.gz.