Expansion of dataset sizes and the increasing complexity of processing algorithms have led to consideration of parallel and distributed implementations. The rationale for distributing the computational load may be to thin-provision computational resources, to accelerate the data processing rate, or to make efficient reuse of already available but otherwise idle computational resources. Whatever the rationale, an efficient solution of this type brings with it questions of data distribution, job partitioning, reliability, and robustness.


This paper addresses the first two of these questions in the context of a local cluster-computing environment. Using the CHRT depth estimator, it considers active and passive data distribution and their effect on data throughput, focusing mainly on the compromises required to keep inter-node communication requirements minimal. As a metric, the paper considers the overall computation time for a given dataset (i.e., the time lag that a user would experience). It shows that although significant speedups can be achieved by relatively simple modifications to the algorithm, there are limits to the parallelism that can be exploited efficiently, and that a balance between inter-node parallelism (i.e., multiple nodes running in parallel) and intra-node parallelism (i.e., multiple threads within one node) is required for the most efficient utilization of available resources.


},
keywords = {Bathymetric Estimation, CHRT, Data Scheduling, Distributed Processing, Parallel Processing, Spatially-aware Data Distribution},
author = {Brian R Calder}
}