FastTransforms
FastTransforms

Build Status Build status codecov

FastTransforms provides computational kernels and driver routines for orthogonal polynomial transforms. The univariate algorithms have a runtime complexity of O(n log n), while the multivariate algorithms are 2-normwise backward stable with a runtime complexity of O(nd+1), where n is the polynomial degree and d is the spatial dimension of the problem.

The transforms are separated into computational kernels that offer SSE, AVX, and AVX-512 vectorization on applicable Intel processors, and driver routines that are easily parallelized by OpenMP.

If you feel you need help getting started, please do not hesitate to e-mail me. If you use this library for your research, please cite the references.

Installation Notes

The library makes use of OpenBLAS, FFTW3, and MPFR, which are easily installed via package managers such as Homebrew, apt-get, or vcpkg. When FastTransforms is compiled with OpenMP, the environment variable that controls multithreading is OMP_NUM_THREADS.

macOS

Sample installation:

brew install libomp fftw mpfr
make CC=clang FT_USE_APPLEBLAS=1

On macOS, the OpenBLAS dependency is optional in light of the vecLib framework. It is also important to have export SDKROOT="/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk" defined in your environment. In case the library is compiled with vecLib, then the environment variable VECLIB_MAXIMUM_THREADS partially controls the multithreading.

Linux

To access functions from the library, you must ensure that you append the current library path to the default. Sample installation:

apt-get install libomp-dev libblas-dev libopenblas-base libfftw3-dev libmpfr-dev
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.
make CC=gcc

See the GitHub build for further details.

Windows

We use GCC 7.2.0 distributed through MinGW-w64 on Visual Studio 2017. Sample installation:

vcpkg install openblas:x64-windows fftw3[core,threads]:x64-windows mpir:x64-windows mpfr:x64-windows
mingw32-make CC=gcc FT_BLAS=openblas FT_FFTW_WITH_COMBINED_THREADS=1

See the AppVeyor build for further details.

Performance Benchmark

The tables below shows the current timings and accuracies for the transforms with pseudorandom coefficients drawn from U(-1,1) and converted back. The timings are reported in seconds and the error is measured in the relative 2- and ∞-norms.

The library is compiled by Apple LLVM version 10.0.1 as above with the compiler optimization flags -O3 -march=native, and executed on an iMac Pro (Early 2018) with a 2.3 GHz Intel Xeon W-2191B processor with 18 threads on 18 logical cores.

Spherical harmonic series to bivariate Fourier series (sph2fourier)

Degree 63 127 255 511 1023 2047 4095 8191
Forward 0.000101 0.000446 0.001952 0.006277 0.026590 0.141962 0.775724 4.870621
Backward 0.000101 0.000448 0.001918 0.006357 0.027123 0.145149 0.815881 5.040577
ϵ2 5.42e-16 7.79e-16 9.23e-16 1.27e-15 1.80e-15 2.52e-15 3.54e-15 4.98e-15
ϵ 1.33e-15 2.55e-15 4.55e-15 5.22e-15 9.33e-15 1.11e-14 1.81e-14 3.80e-14

Proriol series to bivariate Chebyshev series (tri2cheb) with (α, β, γ) = (0, 0, 0)

Degree 63 127 255 511 1023 2047 4095 8191
Forward 0.000063 0.000300 0.001048 0.003886 0.019123 0.113299 0.699465 4.555086
Backward 0.000065 0.000289 0.000987 0.003837 0.019529 0.114718 0.699051 4.697958
ϵ2 2.30e-15 3.94e-15 8.14e-15 1.68e-14 3.55e-14 7.30e-14 1.44e-13 2.97e-13
ϵ 1.09e-14 1.84e-14 5.49e-14 1.49e-13 9.06e-13 1.95e-12 3.73e-12 1.05e-11

The error growth rate is significantly reduced if (α, β, γ) = (-0.5, -0.5, -0.5).

Generalized Zernike series to Chebyshev×Fourier series (disk2cxf) with (α, β) = (0, 0)

Degree 63 127 255 511 1023 2047 4095 8191
Forward 0.000209 0.001077 0.003618 0.013243 0.064321 0.369272 2.182001 16.15872
Backward 0.000208 0.001027 0.003487 0.012903 0.064544 0.372493 2.191747 16.43340
ϵ2 7.60e-16 9.35e-16 1.31e-15 1.84e-15 2.56e-15 3.59e-15 5.05e-15 7.15e-15
ϵ 1.78e-15 2.66e-15 5.33e-15 7.11e-15 1.27e-14 1.79e-14 2.91e-14 1.66e-13

Rotation of a spherical harmonic series (sph_rotation) with ZYZ Euler angles (α, β, γ) = (0.1, 0.2, 0.3)

Degree 63 127 255 511 1023 2047 4095 8191
Time 0.000141 0.000453 0.001598 0.006742 0.031256 0.170632 1.082894 10.35029
ϵ2 7.75e-15 8.29e-15 1.08e-14 1.77e-14 2.93e-14 5.12e-14 1.09e-13 1.87e-13
ϵ 1.47e-14 2.55e-14 8.01e-14 2.59e-13 7.36e-13 2.45e-12 6.74e-12 1.55e-11

References

[1] J. Molina and R. M. Slevinsky. A rapid and well-conditioned algorithm for the Helmholtz–Hodge decomposition of vector fields on the sphere, arXiv:1809.04555, 2018.

[2] S. Olver, R. M. Slevinsky, and A. Townsend. Fast algorithms using orthogonal polynomials, Acta Numerica, 29:573—699, 2020.

[3] R. M. Slevinsky. Fast and backward stable transforms between spherical harmonic expansions and bivariate Fourier series, Appl. Comput. Harmon. Anal., 47:585—606, 2019.

[4] R. M. Slevinsky, Conquering the pre-computation in two-dimensional harmonic polynomial transforms, arXiv:1711.07866, 2017.