= Repa (REgular PArallel arrays) =

Repa provides high performance, regular, multi-dimensional, shape polymorphic parallel arrays. All numeric data is stored unboxed. Functions written with the Repa combinators are automatically parallel provided you supply +RTS -Nwhatever on the command line when running the program.

Repa means "turnip" in Russian. If you don't like turnips then this library probably isn't for you.

== Download ==

Repa is split up into a few packages to help control dependencies.

|| [http://hackage.haskell.org/package/repa repa] || The base library, defining the array type and combinators. ||
|| [http://hackage.haskell.org/package/repa-io repa-io] || Reading and writing arrays in various formats, including BMP. ||
|| [http://hackage.haskell.org/package/repa-algorithms repa-algorithms] || Some reusable matrix algorithms. ||
|| [http://hackage.haskell.org/package/repa-devil repa-devil] || Bindings to the DevIL image library. ||
|| [http://hackage.haskell.org/package/repa-examples repa-examples] || Example applications. ||

If you want the lot then just install the examples, and this will pull down the rest.

{{{
cabal install repa-examples
}}}

== Data Flow Fusion ==

The following packages implement the fusion method described in [http://www.cse.unsw.edu.au/~benl/papers/flow/flow-Haskell2013.pdf Data Flow Fusion with Series Expressions in Haskell].

|| [http://hackage.haskell.org/package/repa-plugin repa-plugin] || GHC Plugin. ||
|| [http://hackage.haskell.org/package/repa-series repa-series] || User facing API for series expressions. ||
|| [http://hackage.haskell.org/package/ddc-core-flow ddc-core-flow] || Flow Fusion transform implemented on [http://disciple.ouroborus.net/ DDC] Core ||

 * The repa-plugin converts GHC Core to a fragment of [http://disciple.ouroborus.net/ DDC] core named Disciple Core Flow. The plugin performs code transforms on the DDC side, then converts the resulting imperative code back to GHC Core.
 * The user facing API for series expressions is embryonic and does not interface with the rest of Repa. As of 7/2013 we're actively working on this.
 * Example Haskell source code is available [https://github.com/benl23x5/repa/tree/master/repa-plugin/test here].
 * Example Disciple Core Flow code at various stages of the transform is available [https://github.com/DDCSF/ddc/tree/master/test/ddc-main/60-CoreFlow/ here].

== The source repo ==

The source repos are on GitHub at [https://github.com/DDCSF/repa]

== Frequently Asked Questions (FAQ) ==

Q: GHC complains: `ghc.exe: could not execute opt`. What's up with that? [[br]]
A: `opt` is the LLVM optimiser. You need to install [http://llvm.org/ LLVM].

Q: Does Repa's implicit parallelism extend to distributed memory machines? [[br]]
A: No. Repa supports shared memory / multi-core parallelism only. Repa's approach to array fusion could be used in a distributed setting, but managing the extra communication is a significant amount of extra work. We have no current plans to target distributed machines.

Q: How do I map a function across all slices in an array? [[br]]
A: This is not possible in general because we can't guarantee that the worker function will return a slice with the same shape as the original. If the slices only contain a few elements (up to six) then your best bet is to use an array of tuples and use the regular `map` function to apply the worker to all the tuples. This approach is limited to 6-tuples because that's the maximum the `Data.Vector` library currently supports, and `Repa` uses `Data.Vector` to store manifest arrays. See #22 for discussion.

Q: Why is my program so slow? [[br]]
A: Probably because you haven't applied {{{computeP}}}, or haven't added enough INLINE pragmas.
Read [http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Repa_Tutorial#Optimising_Repa_programs the optimisation section] in the tutorial, then [http://stackoverflow.com/questions/6300428/poor-performance-with-transpose-and-cumulative-sum-in-repa/6340867#6340867 this answer] on Stack Overflow. If it's still a problem then continue on...

Q: Why does the runtime vary so much from run to run? [[br]]
A: Maybe because you haven't enabled thread affinity and some threads are being swapped out by the OS. Enable thread affinity with {{{+RTS -qa}}}. Disabling the parallel garbage collector in generation 0 may also help {{{+RTS -qg}}}.

Q: Where do I ask further questions? [[br]]
A: Ask on [http://stackoverflow.com/search?q=repa Stack Overflow], then post in the [https://groups.google.com/forum/#!forum/haskell-repa haskell-repa] Google group if that doesn't help.

== Report a bug ==

To report bugs, request features, or get an account on the trac, please send email to the [https://groups.google.com/forum/#!forum/haskell-repa haskell-repa] Google group.

== Papers and Tutorials ==

[http://www.cse.unsw.edu.au/~benl/papers/guiding/guiding-Haskell2012-sub.pdf Guiding Parallel Array Fusion with Indexed Types]
 * Describes the current implementation in Repa 3.
 * Start with this paper.

[http://www.cse.unsw.edu.au/~benl/papers/stencil/stencil-haskell2011-sub.pdf Efficient Parallel Stencil Convolution in Haskell]
 * Discusses the back-end, how the parallelism works, and how to write fast code.
 * Describes Repa's special support for Stencil convolutions.
 * This paper was based on Repa 2.

[http://www.cse.unsw.edu.au/~benl/papers/repa/repa-icfp2010.pdf Regular Shape Polymorphic Arrays in Haskell]
 * Describes the overall approach to fusion, and how the shape polymorphism works.
 * This paper was based on Repa 1. Some API details are different, but the main points are the same.

[http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Repa_Tutorial Tutorial on usage and optimisation]
 * Contains lots of simple examples to get you started.
 * High level discussion of fusion, optimisation, and how to use the {{{force}}} function.

== Examples ==

Here is the output of some of the examples included in the [http://hackage.haskell.org/package/repa-examples repa-examples] package:

|| fft2d-highpass || Laplace || Crystal ||
|| [[Image(WikiStart:lena-high2-thumb.jpg)]] || [[Image(WikiStart:pls-400x400-out-thumb.jpg)]] || [[Image(WikiStart:crystal-thumb.png)]] ||
|| [wiki:Examples/Fft2dHighpass more info] || [wiki:Examples/Laplace more info] || [https://github.com/benl23x5/gloss/blob/master/gloss-examples/raster/Crystal/Main.hs source] [http://www.youtube.com/watch?v=v_0Yyl19fiI video] ||

== Demo ==

There is also an OSX demo that does edge detection on a video stream:

|| [[Image(WikiStart:beholder-thumb.jpg)]] ||
|| [http://code.ouroborus.net/beholder/beholder-head/ source] [http://www.youtube.com/watch?v=UGN0GxGEDsY video] ||

 * The [http://code.ouroborus.net/beholder/beholder-head/ source] should compile with XCode 3.2.1, GHC 7.0.3 and Repa 2.0.0, but you need to update and run the CONFIGURE.sh script to point it to your GHC install.
 * There are also prebuilt OSX i386 versions for [http://code.ouroborus.net/beholder/distro/beholder-N2.tgz two], [http://code.ouroborus.net/beholder/distro/beholder-N4.tgz four] and [http://code.ouroborus.net/beholder/distro/beholder-N6.tgz six] threads. These just have the corresponding +RTS -N# option baked in; you can set it in the main.m module. Some day I will make it so you can select this from the GUI.
 * You can also run the edge detector over a single uncompressed .bmp file using the repa-canny program from the [http://hackage.haskell.org/package/repa-examples repa-examples] package.
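
As a minimal sketch of the FAQ advice about {{{computeP}}} and INLINE pragmas, the following Haskell program builds a delayed array with {{{R.map}}} and then evaluates it in parallel into an unboxed manifest array. It assumes the Repa 3 API from the `repa` package; the function name `doubleAll` and the sample data are made up for illustration and are not part of the library.

{{{
#!haskell
import Data.Array.Repa as R

-- Hypothetical worker, not part of Repa: doubles every element.
-- R.map only builds a delayed array; computeP evaluates it in parallel
-- and writes the result into an unboxed manifest array.
doubleAll :: Monad m => Array U DIM1 Double -> m (Array U DIM1 Double)
doubleAll xs = computeP (R.map (* 2) xs)
{-# INLINE doubleAll #-}

main :: IO ()
main = do
  -- Sample input: a one-dimensional unboxed array of five elements.
  let xs = fromListUnboxed (Z :. 5) [1, 2, 3, 4, 5]
  ys <- doubleAll xs
  print (toList ys)   -- prints [2.0,4.0,6.0,8.0,10.0]
}}}

One plausible way to try it is to compile with `ghc -O2 -threaded -rtsopts` and run the binary with `+RTS -N4` to use four threads. These flags are an assumed typical setup; the tutorial's optimisation section lists the recommended ones.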