OpenCL - reusing global memory


This seemingly basic problem is one I haven't been able to get right despite a fair amount of trial and error. I have a kernel that makes use of two global read/write buffers and one local buffer. It takes input from the first buffer, does a pseudo-sort on it using the second buffer as interim storage, and then copies the result back to the first buffer in order. The (stripped-down) code follows:

    struct packet_pointer {
        int packetindex;
        int currentcell;
    };

    #define rpc_div_bucket 100

    __kernel void pseudosort(__global struct packet_pointer * in,
                             __global struct packet_pointer * out,
                             __local struct packet_pointer * aux)
    {
        int i = get_local_id(0);              // index within the workgroup
        int wg = get_local_size(0);           // workgroup size
        int gid = get_global_id(0);           // index across all workgroups
        int offset = get_group_id(0) * wg;    // first global index of this workgroup

        // stage this workgroup's slice of the input in local memory
        aux[i] = in[i + offset];
        barrier(CLK_LOCAL_MEM_FENCE);

        //-----
        //irrelevant code block here
        //-----

        // scatter the result into the interim buffer
        out[(gid % 1024) * rpc_div_bucket + (gid / 1024)] = aux[i];
    }

Retrieving the contents of the "out" buffer in the parent C program happens without issue. However, when I add the following lines to the kernel:

    barrier(CLK_GLOBAL_MEM_FENCE);
    in[gid] = out[gid];

and attempt to read the "in" buffer, it turns up garbage values on the first execution, but has the expected data if the .exe is run a second time without modification. I have a clFinish(commands) call between the kernel call and the buffer read, so the kernel should be running to completion before the read is attempted. Is there something obvious I'm missing here? I appreciate any help in advance - I will post the solution if I happen upon it before then.

CLK_GLOBAL_MEM_FENCE only syncs within a workgroup. There is no way to place a barrier that syncs across workgroups (i.e. a barrier only syncs across threads that have an identical group_id).

You have a race condition there. For example, when global_id is 1, the write goes to out[100]. That particular thread then reads out[1] and writes it to in[1]. But out[1] is written by global_id 1024, which is in a different workgroup. So you read garbage, because the first workgroup is going to finish before out[1] is ever written.

