Distributed Pattern Recognition in RapidMiner
RapidMiner already provides easy-to-use interfaces for developing and evaluating pattern recognition and machine learning applications. However, it has only limited support for parallelization, and it lacks functionality for spreading long-running computations over multiple machines. One solution is distributed computing with paradigms such as MapReduce. In this paper, we present a system called DisPaRe, which integrates distributed computing frameworks into RapidMiner, with a special focus on MapReduce as a programming model. The frameworks GridGain and Oracle Coherence are reviewed and evaluated with respect to their suitability for integration into RapidMiner. The system provides effective means for using these frameworks transparently, enabling RapidMiner processes to parallelize their computations within a distributed environment.