OpenCL - reusing global memory
This is a seemingly basic problem that I haven't been able to get right despite a fair amount of trial and error. I have a kernel that makes use of 2 global r/w buffers and 1 local buffer. It takes its input from the first buffer, does a pseudo-sort on it using the second buffer as interim storage, and copies the result back to the first buffer in order. The (stripped) code follows:
struct packet_pointer {
    int packetindex;
    int currentcell;
};

#define rpc_div_bucket 100

__kernel void pseudosort(__global struct packet_pointer *in,
                         __global struct packet_pointer *out,
                         __local struct packet_pointer *aux)
{
    int i = get_local_id(0);
    int wg = get_local_size(0);
    int gid = get_global_id(0);
    int offset = get_group_id(0) * wg;

    // Stage this work-group's slice of the input in local memory.
    aux[i] = in[i + offset];
    barrier(CLK_LOCAL_MEM_FENCE);

    //-----
    // irrelevant code block here
    //-----

    // Scatter to the interim buffer; the target index usually belongs to another work-group's range.
    out[(gid % 1024) * rpc_div_bucket + (gid / 1024)] = aux[i];
}
Retrieving the contents of the "out" buffer in the parent C program happens without issue. However, when I add the following lines to the end of the kernel:
barrier(CLK_GLOBAL_MEM_FENCE);
in[gid] = out[gid];
and attempt to read the "in" buffer, it turns up garbage values on the first execution, but has the expected data if the .exe is run a second time without modification. I have a clFinish(commands) call between the kernel call and the buffer read, so the kernel should be running to completion before the read is attempted. Is there something obvious I'm missing here? I appreciate any help in advance, and I'll post the solution if I happen upon it before then.
barrier(CLK_GLOBAL_MEM_FENCE) only syncs within a workgroup. There is no way to place a barrier that syncs across workgroups (i.e. it only syncs across threads that have an identical group_id).
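Since a barrier cannot span workgroups, one common workaround is to use the kernel boundary as the device-wide sync point. The following is a minimal sketch under that assumption, not something taken from the original post: the extra barrier and copy are removed from pseudosort, and the copy is moved into a second kernel. The kernel name copy_back is hypothetical.

```c
struct packet_pointer {
    int packetindex;
    int currentcell;
};

// Hypothetical second kernel. It is meant to be enqueued after pseudosort,
// so by the time it runs every element of out has already been written.
__kernel void copy_back(__global struct packet_pointer *in,
                        __global const struct packet_pointer *out)
{
    int gid = get_global_id(0);
    in[gid] = out[gid];
}
```

On the host, pseudosort and then copy_back would each be enqueued with clEnqueueNDRangeKernel on the same command queue. With a default (in-order) queue the second kernel does not start until the first has completed, which gives the cross-workgroup ordering the barrier could not.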
You have a race condition there. For example, when global_id is 1, the write goes to out[100]. That same thread then reads out[1] and writes it to in[1]. But out[1] is written by global_id 1024, which is in a different workgroup. So you read garbage, because the first workgroup is going to finish before out[1] is ever written.