OpenCL atomic operation doesn't work when the total number of work-items is large
I've been working with OpenCL lately. I created a kernel that takes one global variable shared by all work-items in the kernel. The kernel couldn't be simpler: each work-item increments the value of result, which is the global variable. The code is shown below.
__kernel void accumulate(__global int* result)
{
    *result = 0;
    atomic_add(result, 1);
}
Everything works fine when the total number of work-items is small. On my Mac Pro Retina, the result is correct when the number of work-items is around 400.
However, when I increase the global size, for example to 10000, I don't get 10000 when I read back the number stored in result. The value is around 900, which suggests that more than one work-item might be accessing the global variable at the same time.
What is a possible solution to this type of problem? Thanks for any help!
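*result = 0;
looks like the problem. For small global sizes, every work-item runs the reset and then atomically increments, leaving you with the correct count. However, when the global size becomes larger than the number of work-items that can run at the same time (which means they run in batches), the subsequent batches reset result to 0. That is why you're not getting the full count.
Solution: initialize the buffer from the host side instead, and you should be good. Alternatively, to do the initialization on the device, you can initialize only when the global ID is 0, then do a barrier, then do the atomic increment. Note that barrier only synchronizes work-items within a single work-group, so this alternative is only reliable when the whole launch fits in one work-group. For larger global sizes, host-side initialization is the safer choice.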
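To make the batching explanation concrete, here is a minimal sketch in plain C (not OpenCL) that models it. The function run_model, its parameters, and the batch size of 900 are all hypothetical and chosen only for illustration. Real devices schedule work-items in more complicated ways, so treat this as a simplified model rather than a description of what the hardware actually does.

```c
#include <stdatomic.h>

/* Hypothetical model, not OpenCL: work-items run in batches of `batch`.
   Inside a batch, every item performs its first statement before any item
   performs its second, which roughly mimics lockstep execution.
   reset_in_kernel = 1 models the original kernel (reset, then increment).
   reset_in_kernel = 0 models a kernel that only increments, with the
   counter zeroed once "on the host" before the launch. */
static int run_model(int total, int batch, int reset_in_kernel)
{
    atomic_int result;
    atomic_init(&result, 0);                /* host-side initialization */

    for (int start = 0; start < total; start += batch) {
        int end = (start + batch < total) ? start + batch : total;

        if (reset_in_kernel) {
            for (int i = start; i < end; ++i)
                atomic_store(&result, 0);   /* models `*result = 0;` */
        }
        for (int i = start; i < end; ++i)
            atomic_fetch_add(&result, 1);   /* models atomic_add(result, 1) */
    }
    return atomic_load(&result);
}
```

Under this model, run_model(400, 900, 1) returns 400 because everything fits in one batch, while run_model(10000, 900, 1) returns only 100, the size of the last batch, because each later batch wipes out the earlier counts. With the reset moved out of the kernel, run_model(10000, 900, 0) returns the full 10000.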