Customizable Memory Schemes for Data Parallel Accelerators by