FastTransforms
|
FastTransforms provides computational kernels and driver routines for orthogonal polynomial transforms. The univariate algorithms have a runtime complexity of O(n log n), while the multivariate algorithms are 2-normwise backward stable with a runtime complexity of O(nd+1), where n is the polynomial degree and d is the spatial dimension of the problem.
The transforms are separated into computational kernels that offer SSE, AVX, and AVX-512 vectorization on applicable Intel processors, and driver routines that are easily parallelized by OpenMP.
If you feel you need help getting started, please do not hesitate to e-mail me. If you use this library for your research, please cite the references.
The library makes use of OpenBLAS, FFTW3, and MPFR, which are easily installed via package managers such as Homebrew, apt-get, or vcpkg. When FastTransforms
is compiled with OpenMP, the environment variable that controls multithreading is OMP_NUM_THREADS
.
Sample installation:
On macOS, the OpenBLAS dependency is optional in light of the vecLib framework. It is also important to have export SDKROOT="/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk"
defined in your environment. In case the library is compiled with vecLib, then the environment variable VECLIB_MAXIMUM_THREADS
partially controls the multithreading.
To access functions from the library, you must ensure that you append the current library path to the default. Sample installation:
See the GitHub build for further details.
We use GCC 7.2.0 distributed through MinGW-w64 on Visual Studio 2017. Sample installation:
See the AppVeyor build for further details.
The tables below shows the current timings and accuracies for the transforms with pseudorandom coefficients drawn from U(-1,1) and converted back. The timings are reported in seconds and the error is measured in the relative 2- and ∞-norms.
The library is compiled by Apple LLVM version 10.0.1 as above with the compiler optimization flags -O3 -march=native
, and executed on an iMac Pro (Early 2018) with a 2.3 GHz Intel Xeon W-2191B processor with 18 threads on 18 logical cores.
sph2fourier
)Degree | 63 | 127 | 255 | 511 | 1023 | 2047 | 4095 | 8191 |
---|---|---|---|---|---|---|---|---|
Forward | 0.000101 | 0.000446 | 0.001952 | 0.006277 | 0.026590 | 0.141962 | 0.775724 | 4.870621 |
Backward | 0.000101 | 0.000448 | 0.001918 | 0.006357 | 0.027123 | 0.145149 | 0.815881 | 5.040577 |
ϵ2 | 5.42e-16 | 7.79e-16 | 9.23e-16 | 1.27e-15 | 1.80e-15 | 2.52e-15 | 3.54e-15 | 4.98e-15 |
ϵ∞ | 1.33e-15 | 2.55e-15 | 4.55e-15 | 5.22e-15 | 9.33e-15 | 1.11e-14 | 1.81e-14 | 3.80e-14 |
tri2cheb
) with (α, β, γ) = (0, 0, 0)Degree | 63 | 127 | 255 | 511 | 1023 | 2047 | 4095 | 8191 |
---|---|---|---|---|---|---|---|---|
Forward | 0.000063 | 0.000300 | 0.001048 | 0.003886 | 0.019123 | 0.113299 | 0.699465 | 4.555086 |
Backward | 0.000065 | 0.000289 | 0.000987 | 0.003837 | 0.019529 | 0.114718 | 0.699051 | 4.697958 |
ϵ2 | 2.30e-15 | 3.94e-15 | 8.14e-15 | 1.68e-14 | 3.55e-14 | 7.30e-14 | 1.44e-13 | 2.97e-13 |
ϵ∞ | 1.09e-14 | 1.84e-14 | 5.49e-14 | 1.49e-13 | 9.06e-13 | 1.95e-12 | 3.73e-12 | 1.05e-11 |
The error growth rate is significantly reduced if (α, β, γ) = (-0.5, -0.5, -0.5).
disk2cxf
) with (α, β) = (0, 0)Degree | 63 | 127 | 255 | 511 | 1023 | 2047 | 4095 | 8191 |
---|---|---|---|---|---|---|---|---|
Forward | 0.000209 | 0.001077 | 0.003618 | 0.013243 | 0.064321 | 0.369272 | 2.182001 | 16.15872 |
Backward | 0.000208 | 0.001027 | 0.003487 | 0.012903 | 0.064544 | 0.372493 | 2.191747 | 16.43340 |
ϵ2 | 7.60e-16 | 9.35e-16 | 1.31e-15 | 1.84e-15 | 2.56e-15 | 3.59e-15 | 5.05e-15 | 7.15e-15 |
ϵ∞ | 1.78e-15 | 2.66e-15 | 5.33e-15 | 7.11e-15 | 1.27e-14 | 1.79e-14 | 2.91e-14 | 1.66e-13 |
sph_rotation
) with ZYZ Euler angles (α, β, γ) = (0.1, 0.2, 0.3)Degree | 63 | 127 | 255 | 511 | 1023 | 2047 | 4095 | 8191 |
---|---|---|---|---|---|---|---|---|
Time | 0.000141 | 0.000453 | 0.001598 | 0.006742 | 0.031256 | 0.170632 | 1.082894 | 10.35029 |
ϵ2 | 7.75e-15 | 8.29e-15 | 1.08e-14 | 1.77e-14 | 2.93e-14 | 5.12e-14 | 1.09e-13 | 1.87e-13 |
ϵ∞ | 1.47e-14 | 2.55e-14 | 8.01e-14 | 2.59e-13 | 7.36e-13 | 2.45e-12 | 6.74e-12 | 1.55e-11 |
[1] J. Molina and R. M. Slevinsky. A rapid and well-conditioned algorithm for the Helmholtz–Hodge decomposition of vector fields on the sphere, arXiv:1809.04555, 2018.
[2] S. Olver, R. M. Slevinsky, and A. Townsend. Fast algorithms using orthogonal polynomials, Acta Numerica, 29:573—699, 2020.
[3] R. M. Slevinsky. Fast and backward stable transforms between spherical harmonic expansions and bivariate Fourier series, Appl. Comput. Harmon. Anal., 47:585—606, 2019.
[4] R. M. Slevinsky, Conquering the pre-computation in two-dimensional harmonic polynomial transforms, arXiv:1711.07866, 2017.