Version: ATLAS 3.8.4 ARM
System: 1 GHz ARM Cortex-A8 (BeagleBoard-xM)
Compiler: gcc 4.4.5 (the NEON kernel, which uses inline assembly, is automatically compiled with gcc 4.5.1)
Comments: Using NEON, single-precision real and complex matrix multiplies (SGEMM and CGEMM) reach a peak of 1.2-1.3 gigaflops. In contrast, single-precision GEMM on the A8's non-pipelined VFP floating-point unit reaches only slightly over 100 megaflops, and double precision on the VFP peaks at 50-60 megaflops. For LU decomposition with GETRF (the computationally intensive part of solving a general system of linear equations), performance exceeds 500 megaflops by N=400-600, but GETRF has still not reached its peak speed at N=2000. Because GETRF's speed is asymptotically limited by GEMM, we expect it to exceed 1 gigaflop by N=4000.
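The megaflop figures above are rates: a nominal operation count divided by elapsed time. The following is a minimal sketch of that arithmetic, assuming the conventional leading-order counts (2MNK for real GEMM, 8MNK for complex GEMM, and (2/3)N^3 for GETRF on an N x N matrix). The helper names are hypothetical and not part of ATLAS, and the printed times are illustrative estimates derived from the quoted rates, not measurements.

```python
# Hypothetical helpers (not ATLAS APIs) for converting between flop counts,
# elapsed time, and megaflop rates, using leading-order operation counts.


def gemm_flops(m: int, n: int, k: int, complex_: bool = False) -> float:
    """Nominal flops for C = A*B + C, with A m-by-k and B k-by-n."""
    # Real: one multiply and one add per inner-product term -> 2*m*n*k.
    # Complex: a complex multiply is 6 real flops (4 mults, 2 adds) and the
    # complex accumulate adds 2 more -> 8*m*n*k.
    return (8.0 if complex_ else 2.0) * m * n * k


def getrf_flops(n: int) -> float:
    """Leading-order flop count for LU with partial pivoting of an n-by-n matrix."""
    return 2.0 / 3.0 * n ** 3


def mflops(flops: float, seconds: float) -> float:
    """Convert a flop count and wall-clock time in seconds to megaflops."""
    return flops / seconds / 1e6


def seconds_at(flops: float, rate_mflops: float) -> float:
    """Time a computation would take at a sustained rate given in megaflops."""
    return flops / (rate_mflops * 1e6)


if __name__ == "__main__":
    # Illustrative estimates based on the rates quoted above (assumed sustained).
    t_gemm = seconds_at(gemm_flops(1000, 1000, 1000), 1200)
    t_lu_600 = seconds_at(getrf_flops(600), 500)
    t_lu_4000 = seconds_at(getrf_flops(4000), 1000)
    print(f"SGEMM N=1000 at 1200 MFLOPS: {t_gemm:.2f} s")
    print(f"GETRF N=600 at 500 MFLOPS:   {t_lu_600:.3f} s")
    print(f"GETRF N=4000 at 1000 MFLOPS: {t_lu_4000:.1f} s")
```

Since nearly all of GETRF's (2/3)N^3 flops are spent in GEMM-like trailing-matrix updates, its rate approaches the GEMM rate only once N is large enough for the lower-order panel work to become negligible, which is consistent with GETRF still climbing at N=2000.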
Copyright © 2010-11 Vesperix Corporation