= fft2d-highpass =

Applies a highpass filter to a BMP image, using a recursive, radix-2 Decimation In Time (DIT) [http://en.wikipedia.org/wiki/Fast_fourier_transform Fast Fourier Transform]. The first argument is the lower cutoff frequency.

Each of the RGB channels is converted to the frequency domain, the lower frequencies are set to zero, then the channels are converted back to the image domain.

{{{
 repa-fft2d-highpass 2 lena.bmp lena-high2.bmp
}}}

== Code ==

The main algorithm is at [http://code.haskell.org/repa/repa-head/repa-algorithms/Data/Array/Repa/Algorithms/FFT.hs FFT.hs]

The wrapper is at [http://code.haskell.org/repa/repa-head/repa-examples/FFT/HighPass/src/Main.hs Main.hs]

== Test Data ==
[http://en.wikipedia.org/wiki/Lenna http://en.wikipedia.org/wiki/Lenna] is a standard test image.

|| lena.bmp || lena-high2.bmp ||
|| [[Image(Examples/Fft2dHighpass:lena-thumb.jpg)]] || [[Image(WikiStart:lena-high2-thumb.jpg)]] || 
|| [http://code.haskell.org/repa/wiki/images/lena.bmp full size] || [http://code.haskell.org/repa/wiki/images/lena-high2.bmp full size] ||

== Runtime ==
Head version. Compiled with GHC 6.13.20100309. 512x512 image.

Running on a Intel i7 iMac. 2.8Ghz, 4 cores x 2 threads/core. 256k L1, 8MB L2, 8GB main memory.

Times stated include IO.

|| Threads || Time(s) || 
|| 1  || 6.93 ||
|| 2 || 4.07 || 
|| 3 || 3.42 || 
|| 4 || 3.13 || 
|| 5 || 3.15 ||
|| 6 || 3.08 ||
|| 7 || 3.08 || 
|| 8 || 3.39 ||

=== Comparisons ===

|| Version || Time(s) || Source ||
|| Using Data.Vector.Unboxed || 2.88 || [http://code.haskell.org/repa/repa-head/repa-examples/FFT/HighPass/legacy/vector/FFT.hs FFT.hs] ||
|| Using Jones's inplace C implementation || 0.24 || [http://code.haskell.org/repa/repa-head/repa-examples/FFT/HighPass/legacy/c/Jones.c Jones.c] ||
|| Using FFTW using Estimate mode || 0.09 || [http://code.haskell.org/repa/repa-head/repa-examples/FFT/HighPass/legacy/c/FFTW.c FFTW.c] ||

The vector version uses the same recursive radix-2 decimation in time (DIT) algorithm as the Repa version, but is not rank generalised. It applies a recursive 1d FFT to each row and then transposes the matrix, twice each. Recursive FFT algorithms tend to be slower than in-place ones because intermediate data is written into new vectors at each recursion. A 512 point FFT is built from two 256 point FFTs, which are built from 4 128 point FFTs and so on. The result of each FFT is a new vector which needs to be allocated, filled, then unboxed again during the next recursion. The base case is a 2 point vector, so there are 256 of them to allocate then unbox at the lowest step.

Jones's version also uses a 1d radix-2 DIT FFT kernel, but it first reorders the values then performs an in-place transform using three nested loops. Using an in-place algorithm gives better locality, and avoids the need to allocate and unbox all the intermediate vectors.

FFTW contains deep magic, and is comparable with vendor optimised versions. "Estimate" mode is not the most magical. The "Measure" mode of FFTW performs runtime profiling to determine the best way to do an FFT on your hardware. However, this process can require several seconds of startup time, which is why I haven't enabled it.