Improving memory access performance for irregular algorithms in heterogeneous CPU/FPGA systems