|Version 39 (modified by benl, 5 years ago) (diff)|
Repa (REgular PArallel arrays)
Repa provides high performance, regular, multi-dimensional, shape polymorphic parallel arrays. All numeric data is stored unboxed. Functions written with the Repa combinators run in parallel automatically, provided you link the program with -threaded and supply +RTS -N<threads> (for example +RTS -N4) on the command line when running it. Repa means "turnip" in Russian. If you don't like turnips then this library probably isn't for you.
Repa is split up into a few packages to help control dependencies.
|repa||The base library, defining the array type and combinators.|
|repa-bytestring||Conversions to and from ByteString.|
|repa-io||Reading and writing arrays in various formats, including BMP.|
|repa-algorithms||Some reusable matrix algorithms.|
|repa-devil||Bindings to the DevIL image library.|
If you want the lot then just install the examples, and this will pull down the rest.
cabal install repa-examples
From the source repo
The darcs source repos are at http://code.ouroborus.net/repa/
Frequently Asked Questions (FAQ)
Q: GHC complains "ghc.exe: could not execute opt". What's up with that?
A: opt is the LLVM optimiser. You need to install LLVM.
Q: Does Repa's implicit parallelism extend to distributed memory machines?
A: No. Repa supports shared memory / multi-core parallelism only. Repa's approach to array fusion could be used in a distributed setting, but managing the extra communication is a significant amount of extra work. We have no current plans to target distributed machines.
Q: How do I map a function across all slices in an array?
A: This is not possible in general because we can't guarantee that the worker function will return a slice with the same shape as the original. If the slices only contain a few elements (up to six) then your best bet is to use an array of tuples and use the regular map function to apply the worker to all the tuples. This approach is limited to 6-tuples because that's the maximum the Data.Vector library currently supports, and Repa uses Data.Vector to store manifest arrays. See #22 for discussion.
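The tuple workaround can be sketched roughly as follows. This is a minimal illustration only, and it assumes the Repa 3 API (fromListUnboxed, R.map, computeUnboxedP, toList); older releases spell some of these differently. The worker normalise and the Triple alias are hypothetical names made up for the example.

```haskell
import Data.Array.Repa as R

-- Each 3-element "slice" is stored as one unboxed tuple element.
type Triple = (Double, Double, Double)

-- Hypothetical worker: rescale one slice so its components sum to 1.
-- It returns a tuple of the same size, so the result shape is known.
normalise :: Triple -> Triple
normalise (a, b, c) = (a / s, b / s, c / s)
  where
    s = a + b + c

main :: IO ()
main = do
  -- A one-dimensional array holding two slices.
  let slices = fromListUnboxed (Z :. 2) [(1, 1, 2), (2, 3, 5)]
                 :: Array U DIM1 Triple
  -- R.map builds a delayed array; computeUnboxedP evaluates it in parallel.
  result <- computeUnboxedP (R.map normalise slices)
  print (toList result)
```

Because every slice is a single array element, the ordinary map applies the worker to whole slices without needing any slice-aware combinator.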
Q: Why is my program so slow?
A: Probably because you haven't forced your source arrays before traversing them, or haven't added enough INLINE pragmas. In particular, you may need to force image arrays read with the BMP routines before traversing them. Read the optimisation section in the tutorial, then this answer on Stack Overflow. If it's still a problem, see the next question.
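As a rough sketch of that advice, the program below forces an intermediate array before traversing it again and marks the worker INLINE. It assumes the Repa 3 API, where computeUnboxedP plays the role that the text calls forcing; the worker scale and the input data are hypothetical.

```haskell
import Data.Array.Repa as R

-- Hypothetical worker. The INLINE pragma lets GHC fuse it into the
-- array loop instead of calling it once per element.
scale :: Double -> Double
scale x = 2 * x + 1
{-# INLINE scale #-}

main :: IO ()
main = do
  let src = fromListUnboxed (Z :. 4) [1, 2, 3, 4] :: Array U DIM1 Double

  -- R.map only describes the computation (a delayed array).
  let delayed = R.map scale src

  -- Force it to a manifest unboxed array before traversing it again,
  -- so later passes read stored values rather than recomputing them.
  forced <- computeUnboxedP delayed

  -- A second traversal over the forced array.
  total <- sumAllP (R.map (* 2) forced)
  print total
```

To see any parallel speedup, a program like this would typically be built with ghc -O2 -threaded -rtsopts and run with +RTS -N4 or similar.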
Q: Where do I ask further questions?
A: Ask on Stack Overflow, then send a link to firstname.lastname@example.org if that doesn't help.
Report a bug
To report bugs, request features, or get an account on the trac, please send email to email@example.com
Papers and Tutorials
- Describes the overall approach to fusion, and how the shape polymorphism works.
- Since this paper was published, the internals have changed slightly, but the overall structure is the same.
- Describes the current array representation.
- Discusses the back-end, how the parallelism works, and how to write fast code.
- Describes Repa's special support for Stencil convolutions.
- Contains lots of simple examples to get you started.
- High level discussion of fusion, optimisation, and how to use the force function.
Here is the output of some of the examples included in the repa-examples package:
[Images: fft2d-highpass, Laplace, Crystal. Links: more info, more info, source, video (12MB)]
There is also an OSX demo that does edge detection on a video stream:
video: low(10MB) high(40MB)
- The source should compile with XCode 3.2.1, GHC 7.0.3 and Repa 2.0.0, but you need to update and run the CONFIGURE.sh script to point it to your GHC install.
- There are also prebuilt OSX i386 versions for two, four, and six threads. These just have the corresponding +RTS -N# option baked in; you can set it in the main.m module. Some day I will make it so you can select this from the GUI.
- You can also run the edge detector over a single uncompressed .bmp file using the repa-canny program from the repa-examples package.