OpenCL atomic operation doesn't work when the total number of work-items is large


I've been working with OpenCL lately. I created a kernel that takes one global variable shared by all work-items in the kernel. The kernel couldn't be simpler: each work-item increments the value of result, which is the global variable. The code is shown below.

    __kernel void accumulate(__global int* result) {
        *result = 0;
        atomic_add(result, 1);
    }

Everything goes fine when the total number of work-items is small. On my Mac Pro Retina, the result is correct when the number of work-items is around 400.

However, when I increase the global size to, say, 10000, I don't get 10000 when I read back the number stored in result. The value is around 900, which suggests that more than one work-item might be accessing the global at the same time.

I wonder what a possible solution to this type of problem would be. Any help is appreciated!

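The *result = 0; line looks like the problem. For small global sizes, all work-items run the reset before any of them reaches the increment, so every work-item then atomically increments and leaves the correct count. However, when the global size becomes larger than the number of work-items that can run at the same time (which means they run in batches), the subsequent batches reset result to 0. That is why you're not getting the full count.

Solution: initialize the buffer from the host side instead, and you should be good. Alternatively, to do the initialization on the device, you can initialize only when global_id == 0, then do a barrier, then the atomic increment. Note that barrier only synchronizes the work-items within a single work-group, so the device-side variant is only reliable when the whole launch fits in one work-group. Host-side initialization is the robust fix.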
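The batch effect described above can be illustrated with a small host-only model in plain C. This is a deliberately simplified sketch, not OpenCL code: the function names `simulate_kernel_reset` and `simulate_host_init` and the `batch` parameter are hypothetical, and the model assumes that every work-item in a batch runs the reset before any of them runs the increment. Real hardware schedules work-items less neatly, so the numbers will not match a real device exactly.

```c
#include <stddef.h>

/* Hypothetical model of the kernel that contains `*result = 0;`.
 * Assumption: work-items execute in batches of `batch` items, and within a
 * batch all resets happen before all increments. */
int simulate_kernel_reset(size_t total, size_t batch)
{
    int result = -1;  /* stands in for an uninitialized buffer */
    if (batch == 0) {
        return result;  /* nothing can run; avoid an endless loop */
    }
    for (size_t start = 0; start < total; start += batch) {
        size_t n = (total - start < batch) ? (total - start) : batch;
        result = 0;        /* every item in this batch resets the counter */
        result += (int)n;  /* then each of the n items adds 1 atomically */
    }
    return result;  /* only the last batch's increments survive */
}

/* Hypothetical model of the fix: the host zeroes the buffer once before the
 * launch and the kernel only performs the atomic increment. */
int simulate_host_init(size_t total, size_t batch)
{
    int result = 0;  /* initialized once, on the host side */
    if (batch == 0) {
        return result;
    }
    for (size_t start = 0; start < total; start += batch) {
        size_t n = (total - start < batch) ? (total - start) : batch;
        result += (int)n;  /* no reset, so every increment is kept */
    }
    return result;
}
```

Under this model, a launch that fits in one batch (for example 400 work-items with a batch of 900) keeps the full count, while a launch of 10000 work-items keeps only the increments of the final batch. With host-side initialization the count equals the global size regardless of the batch size.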

