Parallel processing - OpenCL inconsistent results from kernel


When I run this, I get the wrong result in 'output', even though I am only copying the values of the 'cum' array into 'output'.

But if I read back the 'cum' array filled earlier in the code, it holds the correct values. So the results are computed correctly, but I am unable to reuse them later in the kernel.

The device has 8 cores and no shared memory.

Any help, comments or suggestions are appreciated.

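```c
kernel void histogram(global unsigned int *input,
                      global unsigned int *output,
                      global unsigned int *frequency,
                      global unsigned int *cum,
                      unsigned int n)
{
    int pid = get_global_id(0);

    // cumulative sum
    // With 8 work-items (pid 0..7), this writes only the odd indices
    // (1, 3, ..., 15) of each 16-element row of 'cum'; the even indices
    // are never written by this kernel.
    for (int i = 0; i < 16; i++)
    {
        cum[(i*16)+(2*pid)+1] = frequency[(i*16)+(2*pid)] + frequency[(i*16)+(2*pid)+1];
    }

    // barrier() only synchronizes work-items within the same work-group;
    // it does not order global-memory writes made by other work-groups.
    barrier(CLK_GLOBAL_MEM_FENCE);

    // Copies all 256 entries of 'cum', including the even indices
    // that the loop above never wrote.
    for (int i = 0; i < 32; i++)
    {
        output[(i*8)+pid] = cum[(i*8)+pid];
    }

    barrier(CLK_GLOBAL_MEM_FENCE);
}
```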

Make sure you understand parallel prefix sums. In particular, I don't see the down-sweep step, for either the total sum or its parts, in your kernel:

http://http.developer.nvidia.com/gpugems3/gpugems3_ch39.html

Besides the device memory read/write issue in your OpenCL kernel, I'd look in TI's KeyStone II SDK, which you're using, to see if it has scan (parallel prefix sum) implementations or built-in functions.