Repa (REgular PArallel arrays)
Repa provides high performance, regular, multi-dimensional, shape polymorphic parallel arrays. All numeric data is stored unboxed. Functions written with the Repa combinators are automatically parallel provided you supply +RTS -Nwhatever on the command line when running the program. Repa means "turnip" in Russian. If you don't like turnips then this library probably isn't for you.
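As a rough illustration of the paragraph above, here is a minimal sketch of a Repa 3 program. The sample data and the build flags in the comments are assumptions chosen for this example, not something the library prescribes.

```haskell
-- Build (assumed flags; adjust to your setup):
--   ghc -O2 -threaded -rtsopts Main.hs
-- Run on four cores:
--   ./Main +RTS -N4
import Data.Array.Repa as R

-- A small 3x3 manifest array of unboxed Doubles (made-up sample data).
sample :: Array U DIM2 Double
sample = fromListUnboxed (Z :. 3 :. 3) [1 .. 9]

main :: IO ()
main = do
  -- R.map only builds a delayed array; computeP evaluates it in parallel
  -- and stores the result as an unboxed manifest array.
  doubled <- computeP (R.map (* 2) sample) :: IO (Array U DIM2 Double)
  -- sumAllP is a parallel reduction over every element.
  total <- sumAllP doubled
  print total
```

Without the +RTS -N option the same program still runs, just on a single core.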
Repa is split up into a few packages to help control dependencies.
| repa | The base library, defining the array type and combinators. |
| repa-io | Reading and writing arrays in various formats, including BMP. |
| repa-algorithms | Some reusable matrix algorithms. |
| repa-devil | Bindings to the DevIL image library. |
If you want the lot then just install the examples, and this will pull down the rest.
cabal install repa-examples
Data Flow Fusion
The following packages implement the fusion method described in the paper "Data Flow Fusion with Series Expressions in Haskell".
| repa-series | User facing API for series expressions. |
| ddc-core-flow | Flow Fusion transform implemented on DDC Core. |
- The repa-plugin converts GHC Core to a fragment of DDC core named Disciple Core Flow. The plugin performs code transforms on the DDC side, then converts the resulting imperative code back to GHC Core.
- The user facing API for series expressions is embryonic and does not yet interface with the rest of Repa. As of July 2013 we're actively working on this.
- Example Haskell source code is available here
- Example Disciple Core Flow code at various stages of the transform is available here
The source repo
The source repos are on GitHub at https://github.com/DDCSF/repa
Frequently Asked Questions (FAQ)
Q: GHC complains "ghc.exe: could not execute opt". What's up with that?
A: opt is the LLVM optimiser. You need to install LLVM.
Q: Does Repa's implicit parallelism extend to distributed memory machines?
A: No. Repa supports shared memory / multi-core parallelism only. Repa's approach to array fusion could be used in a distributed setting, but managing the extra communication is a significant amount of extra work. We have no current plans to target distributed machines.
Q: How do I map a function across all slices in an array?
A: This is not possible in general because we can't guarantee that the worker function will return a slice with the same shape as the original. If the slices only contain a few elements (up to six) then your best bet is to use an array of tuples and use the regular map function to apply the worker to all the tuples. This approach is limited to 6-tuples because that's the maximum the Data.Vector library currently supports, and Repa uses Data.Vector to store manifest arrays. See #22 for discussion.
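The tuple approach described in this answer could look roughly like the sketch below. The worker scaleRow and the sample data are hypothetical, invented here for illustration; each "slice" is a 3-tuple stored in a one-dimensional unboxed array.

```haskell
import Data.Array.Repa as R

-- Hypothetical worker: rescale a 3-element slice so its parts sum to 1.
scaleRow :: (Double, Double, Double) -> (Double, Double, Double)
scaleRow (x, y, z) = (x / s, y / s, z / s)
  where
    s = x + y + z

-- Two slices, each held as one tuple element (made-up sample data).
rows :: Array U DIM1 (Double, Double, Double)
rows = fromListUnboxed (Z :. 2) [(1, 1, 2), (2, 3, 5)]

main :: IO ()
main = do
  -- The regular map applies the worker to every tuple; computeP then
  -- evaluates the delayed result in parallel.
  out <- computeP (R.map scaleRow rows)
           :: IO (Array U DIM1 (Double, Double, Double))
  print (toList out)
```

Because the worker always returns a tuple of the same size, the shape problem mentioned above does not arise.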
Q: Why is my program so slow?
A: Probably because you haven't applied computeP, or haven't added enough INLINE pragmas. Read the optimisation section in the tutorial, then this answer on Stack Overflow. If it's still a problem then continue on...
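A minimal sketch of the two points in this answer, assuming a hypothetical worker named brighten and made-up pixel values: the worker returns a delayed array and carries an INLINE pragma so GHC can fuse it into the caller, and computeP is applied once at the end to produce a manifest array.

```haskell
import Data.Array.Repa as R

-- Hypothetical worker: double each value and clamp it to 1.
-- It returns a delayed (D) array, so no intermediate array is allocated.
brighten :: Array U DIM2 Double -> Array D DIM2 Double
brighten = R.map (\x -> min 1 (x * 2))
{-# INLINE brighten #-}

-- Made-up 2x2 sample image.
img :: Array U DIM2 Double
img = fromListUnboxed (Z :. 2 :. 2) [0.1, 0.25, 0.5, 1.0]

main :: IO ()
main = do
  -- computeP forces the delayed array into an unboxed manifest array,
  -- evaluating the fused loop in parallel.
  out <- computeP (brighten img) :: IO (Array U DIM2 Double)
  print (toList out)
```

Leaving out the computeP call means every later consumer of the array recomputes the elements, which is a common cause of unexpectedly slow programs.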
Q: Why does the runtime vary so much from run to run?
A: Maybe because you haven't enabled thread affinity and some threads are being swapped out by the OS. Enable thread affinity with +RTS -qa. Disabling the parallel garbage collector may also help: +RTS -qg turns it off entirely, while +RTS -qg1 turns it off for generation 0 only.
Report a bug
To report bugs, request features, or get an account on the trac, please send email to the haskell-repa Google group.
Papers and Tutorials
- Describes the current implementation in Repa 3.
- Start with this paper.
- Discusses the back-end, how the parallelism works, and how to write fast code.
- Describes Repa's special support for Stencil convolutions.
- This paper was based on Repa 2.
- Describes the overall approach to fusion, and how the shape polymorphism works.
- This paper was based on Repa 1. Some API details are different, but the main points are the same.
- Contains lots of simple examples to get you started.
- High level discussion of fusion, optimisation, and how to use the force function.
Here is the output of some of the examples included in the repa-examples package:
There is also an OSX demo that does edge detection on a video stream:
- The source should compile with XCode 3.2.1, GHC 7.0.3 and Repa 2.0.0, but you need to update and run the CONFIGURE.sh script to point it to your GHC install.
- There are also prebuilt OSX i386 versions for two, four and six threads. These just have the corresponding +RTS -N# option baked in; you can set it in the main.m module. Some day I will make it so you can select this from the GUI.
- You can also run the edge detector over a single uncompressed .bmp file using the repa-canny program from the repa-examples package.