Repa (REgular PArallel arrays)
Repa provides high performance, regular, multi-dimensional, shape polymorphic parallel arrays. All numeric data is stored unboxed. Functions written with the Repa combinators are automatically parallel provided you supply +RTS -N<n> (where <n> is the number of threads to use) on the command line when running the program. Repa means "turnip" in Russian. If you don't like turnips then this library probably isn't for you.
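As a rough sketch of what "automatically parallel" looks like in practice, the module below builds a delayed array, maps over it, and reduces it with a parallel combinator. It assumes the Repa 3 API (`fromFunction`, `map` and `sumAllP` from `Data.Array.Repa`); the function name `sumOfSquares` is made up for illustration. It also assumes the program is compiled with GHC's `-threaded` and `-rtsopts` flags so that `+RTS -N<n>` is accepted at run time.

```haskell
import qualified Data.Array.Repa as R
import Data.Array.Repa (Array, D, DIM1, Z(..), (:.)(..))

-- | Sum of the squares of 0 .. n-1 (hypothetical helper, for illustration).
-- 'fromFunction' and 'R.map' only describe the array; the work happens
-- inside 'sumAllP', which can use several threads when the program is
-- started with +RTS -N<n>.
sumOfSquares :: Monad m => Int -> m Double
sumOfSquares n = R.sumAllP (R.map (\x -> x * x) xs)
  where
    -- Delayed one-dimensional array holding 0, 1, .., n-1 as Doubles.
    xs :: Array D DIM1 Double
    xs = R.fromFunction (Z :. n) (\(Z :. i) -> fromIntegral i)
```

For example, `sumOfSquares 4` run in `IO` adds 0 + 1 + 4 + 9. The same code runs sequentially if the thread count is left at one, so nothing in the source has to change to try different values of `-N`.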
Download
Repa is split up into a few packages to help control dependencies.
| Package | Description |
|---|---|
| repa | The base library, defining the array type and combinators. |
| repa-io | Reading and writing arrays in various formats, including BMP. |
| repa-algorithms | Some reusable matrix algorithms. |
| repa-devil | Bindings to the DevIL image library. |
| repa-examples | Example applications. |
If you want the lot then just install the examples, and this will pull down the rest.
cabal install repa-examples
From the source repo
The darcs source repos are at http://code.ouroborus.net/repa/
Frequently Asked Questions (FAQ)
Q: GHC complains "ghc.exe: could not execute opt". What's up with that?
A: opt is the LLVM optimiser. You need to install LLVM.
Q: Does Repa's implicit parallelism extend to distributed memory machines?
A: No. Repa supports shared memory / multi-core parallelism only. Repa's approach to array fusion could be used in a distributed setting, but managing the extra communication is a significant amount of extra work. We have no current plans to target distributed machines.
Q: How do I map a function across all slices in an array?
A: This is not possible in general because we can't guarantee that the worker function will return a slice with the same shape as the original. If the slices only contain a few elements (up to six) then your best bet is to use an array of tuples and use the regular map function to apply the worker to all the tuples. This approach is limited to 6-tuples because that's the maximum the Data.Vector library currently supports, and Repa uses Data.Vector to store manifest arrays. See #22 for discussion.
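The tuple approach might look something like the sketch below, which assumes the Repa 3 API and a two-dimensional input whose rows have exactly three columns. The names `Row`, `toRows`, `normaliseRow` and `normaliseRows` are invented for this example, and the worker (normalising a row so it sums to 1) is just a stand-in for whatever per-slice function you have.

```haskell
import qualified Data.Array.Repa as R
import Data.Array.Repa (Array, D, U, DIM1, DIM2, Z(..), (:.)(..))

-- | A three-element slice packed into a tuple (hypothetical alias).
type Row = (Double, Double, Double)

-- | View each row of an n x 3 array as one tuple.
-- Assumes the inner dimension has exactly three elements.
toRows :: Array U DIM2 Double -> Array D DIM1 Row
toRows arr = R.fromFunction (Z :. n) row
  where
    (Z :. n :. _) = R.extent arr
    row (Z :. i) =
      ( arr R.! (Z :. i :. 0)
      , arr R.! (Z :. i :. 1)
      , arr R.! (Z :. i :. 2) )
{-# INLINE toRows #-}

-- | Example worker: scale a slice so that its elements sum to 1.
normaliseRow :: Row -> Row
normaliseRow (a, b, c) = (a / s, b / s, c / s)
  where s = a + b + c
{-# INLINE normaliseRow #-}

-- | Apply the worker to every slice with the regular map, then
-- evaluate the result in parallel into an unboxed array of tuples.
normaliseRows :: Monad m => Array U DIM2 Double -> m (Array U DIM1 Row)
normaliseRows arr = R.computeP (R.map normaliseRow (toRows arr))
```

Because the worker takes a tuple and returns a tuple, every "slice" in the result has the same shape by construction, which is what sidesteps the problem described in the answer above.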
Q: Why is my program so slow?
A: Probably because you haven't applied computeP, or haven't added enough INLINE pragmas. Read the optimisation section in the tutorial, then this answer on Stack Overflow. If it's still a problem then continue on...
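To make that advice a little more concrete, here is a minimal sketch, assuming the Repa 3 API, of helpers that return delayed arrays and carry INLINE pragmas, with a single computeP at the end of the pipeline. The names `scale`, `offset` and `scaleThenOffset` are hypothetical, and how much the pragmas help in a real program depends on the GHC version and optimisation flags.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import qualified Data.Array.Repa as R
import Data.Array.Repa (Array, D, U, DIM1, Source)

-- | Multiply every element by k. Returns a delayed array, so no work
-- is done here. INLINE lets GHC see through the call when fusing.
scale :: Source r Double => Double -> Array r DIM1 Double -> Array D DIM1 Double
scale k arr = R.map (* k) arr
{-# INLINE scale #-}

-- | Add c to every element. Also delayed, also INLINE.
offset :: Source r Double => Double -> Array r DIM1 Double -> Array D DIM1 Double
offset c arr = R.map (+ c) arr
{-# INLINE offset #-}

-- | Both steps are combined into one delayed array, and computeP
-- evaluates it once, in parallel, into a manifest unboxed array.
scaleThenOffset :: Monad m => Array U DIM1 Double -> m (Array U DIM1 Double)
scaleThenOffset arr = R.computeP (offset 1 (scale 2 arr))
```

The idea is to keep intermediate arrays delayed and only call computeP where a manifest result is actually needed, for instance before an array is indexed many times.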
Q: Why does the runtime vary so much from run to run?
A: Maybe because you haven't enabled thread affinity and some threads are being swapped out by the OS. Enable thread affinity with +RTS -qa. Disabling the parallel garbage collector may also help: +RTS -qg turns it off entirely, and +RTS -qg1 turns it off for generation 0 only.
Q: Where do I ask further questions?
A: Ask on Stack Overflow, then send a link to repa@ouroborus.net if that doesn't help.
Report a bug
To report bugs, request features, or get an account on the trac, please send email to repa@ouroborus.net
Papers and Tutorials
Guiding Parallel Array Fusion with Indexed Types
- Describes the current implementation in Repa 3.
- Start with this paper.
Efficient Parallel Stencil Convolution in Haskell
- Discusses the back-end, how the parallelism works, and how to write fast code.
- Describes Repa's special support for Stencil convolutions.
- This paper was based on Repa 2.
Regular Shape Polymorphic Arrays in Haskell
- Describes the overall approach to fusion, and how the shape polymorphism works.
- This paper was based on Repa 1. Some API details are different, but the main points are the same.
Tutorial on usage and optimisation
- Contains lots of simple examples to get you started.
- High level discussion of fusion, optimisation, and how to use the force function.
Examples
Here is the output of some of the examples included in the repa-examples package:
fft2d-highpass, Laplace, Crystal
Demo
There is also an OSX demo that does edge detection on a video stream:
- The source should compile with XCode 3.2.1, GHC 7.0.3 and Repa 2.0.0, but you need to update and run the CONFIGURE.sh script to point it to your GHC install.
- There are also prebuilt OSX i386 versions for two, four, and six threads. These just have the corresponding +RTS -N# option baked in; you can set it in the main.m module. Some day I will make it so you can select this from the GUI.
- You can also run the edge detector over a single uncompressed .bmp file using the repa-canny program from the repa-examples package.