System: 1 GHz ARM Cortex-A8 (BeagleBoard-xM)
Compiler: gcc 4.4.5 (native)
Configure settings: --enable-single --enable-neon --enable-perf-events ARM_CPU_TYPE=cortex-a8
Configure settings for slow timer: --enable-single --enable-neon --with-slow-timer ARM_CPU_TYPE=cortex-a8
Configure settings for estimate: --enable-single --enable-neon ARM_CPU_TYPE=cortex-a8
Comments: These comparisons show the differences among three ways of choosing a plan: measuring with a cycle counter (--enable-perf-events), measuring with the standard FFTW 3.2.2 option --with-slow-timer, and using no timer at all (estimating the best plan rather than measuring it). Estimating isn't that bad, especially on simple FFTs where there are only a few different ways to do the computation. Once there are a lot of choices, you'll want to measure with either a cycle counter or a slow timer; the two give almost identical performance. Even with a cycle counter, many complex plans take 10 seconds or more to generate, so don't fool yourself into thinking that you can generate plans on the fly, except for batch-mode processing.
Detailed NEON timing data for --enable-perf-events cases
Detailed NEON timing data for --with-slow-timer cases
Detailed NEON timing data for estimate cases
Copyright © 2010-11 Vesperix Corporation