System: 1 GHz ARM Cortex A9 (Pandaboard)
Compiler: gcc 4.4.5 (native)
Configure settings: --enable-single --enable-neon --enable-armv7a-cycle-counter ARM_CPU_TYPE=cortex-a9
Comments: Scalar performance is 4-7 times better on the Cortex A9 than on the Cortex A8, thanks to a much better VFP unit, while NEON performance is only slightly improved. As a result, the typical speed gain from NEON on the A9 is only 1.2-1.5x, with peak rates around 1.0 gigaflops on a single core. As with the A8, it is faster not to use NEON's fused multiply-add instructions.
Detailed NEON timing data for these cases
Copyright © 2010-11 Vesperix Corporation