The latest version of the mainline ATLAS distribution (ATLAS 3.10) includes support for ARM NEON. It is available from the ATLAS download page.
We will continue to make ATLAS-ARM available here for users too stubborn to change, but we strongly suggest transitioning to the mainline distribution since it will be supported by the original developers. ATLAS-ARM is no longer supported or maintained.
The process for building ATLAS-ARM is the same as ATLAS 3.8.4; if you do not have experience doing this, please take a look at the installation section of the ATLAS manual and run through the build process with the unmodified ATLAS 3.8.4 source first.
For additional configuration options and more detailed information, follow the steps below:
The source is available in tar.gz format or zip format. The release version is unchanged from beta 3.
(Source for beta 1 is still available in tar.gz format or zip format; source for beta 2 is also still available in tar.gz format or zip format.)
Unpack it with tar xvzf atlas-3.8.4-arm.tar.gz or unzip atlas-3.8.4-arm.zip.
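All the remaining steps happen under the source directory, so cd ATLAS-3.8.4-arm. ATLAS wants a separate subdirectory for each build, so create one (the examples here call it test) with mkdir test, cd test, and then configure from there: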
../configure -Si archdef 0
The "-Si archdef 0" tells ATLAS not to look for a default configuration for your system, but to build from scratch. This is necessary because we don't yet have a good way to automatically identify all the configurations we might encounter.
All the other command line arguments supported by ATLAS 3.8.4 work as usual. You'll need these if you want to use a compiler other than gcc, for example. See the installation section of the ATLAS manual for details.
Note that versions of gcc up to 4.7.0 do not optimize ATLAS code very well for specific ARM processors, and in many cases, these "optimizations" reduce speed. You can run make time after make finishes, as described below, to get a quick overview of the speed of what you've just built.
If you're really concerned about maximizing speed, we strongly recommend starting with a normal build, and then comparing the results from an "optimized" build for your processor to see if there's an improvement. You can do this while retaining the results of the normal build by making another directory under ATLAS-3.8.4-arm, let's say test-with-optimizations, and then repeating the configure and build process in that directory using your desired optimization flags. Finer control over which optimizations are applied to which routines can be obtained using more specific versions of the -Fa flag settings, as described in the ATLAS manual.
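As a rough sketch of the side-by-side comparison described above, the shell function below creates a build directory and configures it, optionally with extra compiler flags. The function name configure_atlas_build and the example flag value -mcpu=cortex-a9 are illustrative assumptions, not part of ATLAS-ARM. The -Fa alg form appends flags for all compilers; the ATLAS manual describes the more specific variants.

```shell
# Sketch only: configure one ATLAS-ARM build directory.
# configure_atlas_build is a hypothetical helper name, not an ATLAS command.
#   $1 = source directory (e.g. /home/user/ATLAS-3.8.4-arm)
#   $2 = build subdirectory name (e.g. test-with-optimizations)
#   $3 = optional extra compiler flags, passed through -Fa alg
configure_atlas_build() {
    src_dir=$1
    build_dir=$2
    extra_flags=${3:-}
    mkdir -p "$src_dir/$build_dir" || return 1
    cd "$src_dir/$build_dir" || return 1
    if [ -n "$extra_flags" ]; then
        ../configure -Si archdef 0 -Fa alg "$extra_flags"
    else
        ../configure -Si archdef 0
    fi
}

# Example use (the -mcpu value is an assumption; pick flags for your CPU):
#   configure_atlas_build /home/user/ATLAS-3.8.4-arm test
#   configure_atlas_build /home/user/ATLAS-3.8.4-arm test-with-optimizations '-mcpu=cortex-a9'
```

Once make build has finished in each directory, running make time in both and comparing the BIG_MM lines shows whether the extra flags actually helped.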
If the configure process fails, please take a look at its output. The configure script is designed to check whether the system has everything it needs to build ATLAS, and the error messages can often tell you what part of the system is missing or misconfigured. If you cannot easily resolve the problem using this process, please let us know what happened, including the output of the script.
Now you need to pick a time when you won't be using your system for a day or so, since that's how long a build typically takes. When you're ready, run make build to build the ATLAS library.
If the make process fails, please let us know, and include any messages given by make.
Once the build finishes, run make check. This does some very simple tests, so that you can find any catastrophic failures. You're not done testing yet, though -- make check only does a few basic tests. Before you rely on the results, you'll need to run a much more comprehensive set of tests using the ATLAS test scripts.
You can see a quick estimate of how fast your build is by running make time.
This will give you a summary of the speed of several basic routines. For large problems, it's the speeds on the BIG_MM line that matter. A typical result on a 1 GHz Cortex A9 using gcc 4.6.3 is about 160% of clock for single and complex, and about 70% of clock for double and double complex.
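To put the "% of clock" figures in more familiar units, the sketch below converts them to approximate MFLOPS. It assumes the percentage is the MFLOPS rate divided by the clock rate in MHz; the function name pct_clock_to_mflops is made up for this illustration.

```shell
# Sketch: convert a "% of clock" figure into approximate MFLOPS,
# assuming % of clock = 100 * MFLOPS / (clock rate in MHz).
# pct_clock_to_mflops is a hypothetical helper name.
#   $1 = clock rate in MHz, $2 = percent of clock (integers)
pct_clock_to_mflops() {
    clock_mhz=$1
    pct=$2
    echo $(( clock_mhz * pct / 100 ))
}

pct_clock_to_mflops 1000 160   # single precision, 1 GHz Cortex A9: prints 1600
pct_clock_to_mflops 1000 70    # double precision, 1 GHz Cortex A9: prints 700
```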
The ATLAS test scripts were written by Antoine Petitet, and are often referred to as "Antoine's Tester". A quick outline of how to set the scripts up is given here; more information on this step is available in the ATLAS Developer Guide.
Download the NETLIB BLAS, untar them, and cd BLAS.
Edit make.inc, making sure that the right Fortran compiler is specified, and that the optimization settings are conservative (say, OPTS = -O). Note the name of the BLAS library that will be created (the default on Linux is blas_LINUX.a).
Build the reference BLAS library by running make.
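If you would rather not edit make.inc by hand, the optimization setting can be changed from the shell. This is a minimal sketch that assumes make.inc already contains a line beginning with OPTS; the function name set_conservative_opts is hypothetical, and you should still check the Fortran compiler setting yourself.

```shell
# Sketch: force conservative optimization in the reference BLAS make.inc.
# set_conservative_opts is a hypothetical helper name.
#   $1 = path to make.inc
set_conservative_opts() {
    makeinc=$1
    # Rewrite any existing "OPTS = ..." line; all other lines pass through unchanged.
    sed 's/^OPTS[[:space:]]*=.*/OPTS = -O/' "$makeinc" > "$makeinc.new" &&
    mv "$makeinc.new" "$makeinc"
}

# Example use, from inside the BLAS directory:
#   set_conservative_opts make.inc && grep '^OPTS' make.inc
```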
Download the tester and unpack it. This puts the tester files in a directory called AtlasTest.
Some Linux distributions (e.g. Ubuntu) no longer include a csh-compatible shell, which the tester needs to run. If you don't have a csh-compatible shell, you'll need to consult your distribution's documentation for how to install one (e.g. sudo apt-get install tcsh on Ubuntu).
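A quick way to find out whether a csh-compatible shell is already installed is to look for csh or tcsh on your PATH. The function name have_csh below is just an illustrative choice.

```shell
# Sketch: check whether csh or tcsh is available on the PATH.
# have_csh is a hypothetical helper name.
have_csh() {
    command -v csh >/dev/null 2>&1 || command -v tcsh >/dev/null 2>&1
}

if have_csh; then
    echo "csh-compatible shell found"
else
    echo "no csh-compatible shell found; install one (e.g. tcsh) before running the tester"
fi
```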
Change the BLASlib = line in Make.inc in the ATLAS build directory to point to the reference BLAS library you just built (for example, BLASlib = /home/user/BLAS/blas_LINUX.a). If you're following the examples in Step 2, the build directory is something like /home/user/ATLAS-3.8.4-arm/test.
Like ATLAS, the tester wants a subdirectory for each new run. Go to the directory you unpacked the tester into, mkdir test, cd test, and tell the tester where the ATLAS library you're testing is located by running:
../configure --atldir=[the name of the ATLAS build directory]
If you followed the examples in Step 2, this will be something like ../configure --atldir=/home/user/ATLAS-3.8.4-arm/test.
Run the tester by running make. You will be happiest if you do this at a time when you won't need the system for another day or so. (Unlike the ATLAS build step, where doing other work might affect the timings and produce suboptimal results, multitasking on the system during the tester runs will not damage anything except your patience. It's not something we recommend, however.)
If you do not encounter any errors, the on-screen output of the test script will be a whole lot of build output, followed by the actual test results (for an example, take a look at this file). All of the results are also saved in the res subdirectory, so don't worry if you miss the results as they scroll by.
If your machine has only one core, the tester will report that it cannot find the threaded libraries (libptcblas.a, libptf77blas.a, etc.). This should not be surprising, since they aren't built when ATLAS determines the machine has only one core. If you do actually have a multicore CPU and you see this error, it's a problem in the detection of multiple cores; please let us know.
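To see which case applies to you, you can compare the number of cores the operating system reports with what the build actually produced. The sketch below assumes the libraries were left in the lib subdirectory of the ATLAS build directory; the function name has_threaded_libs is hypothetical.

```shell
# Sketch: report the core count and check for the threaded ATLAS libraries.
# has_threaded_libs is a hypothetical helper name.
#   $1 = ATLAS build directory (libraries assumed to be under $1/lib)
has_threaded_libs() {
    [ -f "$1/lib/libptcblas.a" ] && [ -f "$1/lib/libptf77blas.a" ]
}

# Number of online cores as reported by the OS.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo unknown)
echo "cores reported by the OS: $cores"

# Example use:
#   has_threaded_libs /home/user/ATLAS-3.8.4-arm/test && echo "threaded libraries present"
```

If more than one core is reported but the threaded libraries are missing, that points to the core-detection problem mentioned above.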
The simplest way to check the voluminous output for errors is to run the scope script provided with the tester. Assuming you're still in the directory where you ran make to start the tester, just run ../scope.sh.
If you have no errors, you will see the 37 lines in this file for a multicore machine, or the 22 lines in this file for a single core machine. If you have errors, the output of scope will usually be a lot longer, and will include bits of difficult-to-interpret detail from the test runs that failed. If you see anything other than the 37 lines (or 22 lines on a single core machine) of scope output from a successful run, that's a failure, and you should not use the ATLAS libraries you tested.
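Counting lines by eye is error-prone, so a small check like the following sketch may help. It assumes you have saved the scope output to a file (scope.out is a made-up name), and check_scope_lines is a hypothetical helper. A matching line count is only a first check; the contents should still match the reference output.

```shell
# Sketch: compare the number of lines of saved scope output with the expected count.
# check_scope_lines is a hypothetical helper name.
#   $1 = file holding the saved output of ../scope.sh
#   $2 = expected line count (37 for multicore, 22 for single core)
check_scope_lines() {
    scope_output=$1
    expected=$2
    actual=$(wc -l < "$scope_output" | tr -d ' ')
    if [ "$actual" -eq "$expected" ]; then
        echo "line count OK ($actual)"
    else
        echo "unexpected line count: $actual (expected $expected)"
        return 1
    fi
}

# Example use, from the directory where you ran make to start the tester:
#   ../scope.sh > scope.out
#   check_scope_lines scope.out 37
```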
If any test failures occur, the output stored in the res directory may help you to identify the routines which failed.
Copyright © 2011-4 Vesperix Corporation