Parallel Algorithms for Regular Architectures by Russ Miller