c++ - Generating Random Numbers with CUDA via rejection method. Performance problems
I'm running a Monte Carlo particle simulation written in CUDA. Basically, in each step I calculate the velocity of each particle and update its position. The velocity is directly proportional to the path length. For a given material, the path length has a certain distribution, and I know the probability density function of this path length. I try to sample random numbers according to this function via the rejection method. I would describe my CUDA knowledge as limited. As I understood it, it is preferable to create large chunks of random numbers at once instead of multiple small chunks. However, for the rejection method I generate only two random numbers, check a condition, and repeat the procedure in case of failure. Therefore I generate my random numbers on the kernel.
Using the profiler / nvvp I noticed that 50% of my time is spent in the rejection method.
Here is my question: are there ways to optimize rejection methods?
I appreciate every answer.
Code
Here is the rejection method.
    __global__ void rejectsamplepathlength(float* P, curandState* globalstate, int numparticles,
                                           float sigma, int timestep, curandState state)
    {
        int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i < numparticles) {
            bool success = false;
            float p;            // accepted sample
            float rho1, rho2;   // uniform random numbers
            float a, b;         // sampling interval [a, b]
            a = 0.0f;
            b = 10.0f;
            // Seeds the by-value copy `state`; the draws below read globalstate[i].
            curand_init(i, 0, 0, &state);
            while (!success) {
                rho1 = curand_uniform(&globalstate[i]);
                rho2 = curand_uniform(&globalstate[i]);
                if (rho2 < pathlength(a, b, rho1, sigma)) {
                    p = a + rho1 * (b - a);
                    success = true;
                }
            }
            P[i] = fabsf(p);    // output array, named P here to avoid clashing with the local p
        }
    }
The pathlength function in the if statement computes a value y = f(x) on the kernel. I'm pretty sure that curand_init is problematic in terms of time, but without this statement, would each kernel generate the same numbers?
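One common way to address the curand_init concern is to seed the generator states once in a separate setup kernel and then only load, advance, and store them in the sampling kernel. The following is a minimal sketch under assumptions, not the poster's actual code: the names setupStates, samplePathLength and runSteps, the seed 1234, and the body of pathlength (an exponential shape bounded by 1) are all hypothetical stand-ins.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical stand-in for the poster's pathlength(): exp(-sigma * x) on
// [a, b], which never exceeds 1 for sigma >= 0 and x >= 0.
__device__ float pathlength(float a, float b, float rho1, float sigma) {
    float x = a + rho1 * (b - a);
    return expf(-sigma * x);
}

// Launched once, before the time loop: one generator state per particle.
// Same seed, different sequence number per thread.
__global__ void setupStates(curandState* states, int n, unsigned long long seed) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) curand_init(seed, i, 0, &states[i]);
}

// Launched every time step: no curand_init in here.
__global__ void samplePathLength(float* P, curandState* states, int n, float sigma) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i >= n) return;
    curandState local = states[i];      // work on a thread-local copy
    const float a = 0.0f, b = 10.0f;
    float rho1, rho2;
    do {                                // rejection loop
        rho1 = curand_uniform(&local);
        rho2 = curand_uniform(&local);
    } while (rho2 >= pathlength(a, b, rho1, sigma));
    P[i] = a + rho1 * (b - a);
    states[i] = local;                  // store so the next step continues the sequence
}

// Host helper: seed once, run `steps` sampling launches, return the last samples.
std::vector<float> runSteps(int n, int steps, float sigma) {
    const int threads = 256, blocks = (n + threads - 1) / threads;
    float* dP = nullptr;
    curandState* dStates = nullptr;
    cudaMalloc(&dP, n * sizeof(float));
    cudaMalloc(&dStates, n * sizeof(curandState));
    setupStates<<<blocks, threads>>>(dStates, n, 1234ULL);
    for (int s = 0; s < steps; ++s)
        samplePathLength<<<blocks, threads>>>(dP, dStates, n, sigma);
    std::vector<float> h(n);
    cudaMemcpy(h.data(), dP, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dP);
    cudaFree(dStates);
    return h;
}
```

Because each state is saved back after use, successive launches continue each thread's sequence instead of repeating it, so the numbers differ between steps without re-seeding. The rejection loop itself still diverges within a warp (threads that accept early wait for the slowest one), so this sketch only removes the initialization cost.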
Maybe I could create a pool of randomly generated uniform variables in a previous kernel, then pick uniforms from this pool, cycling over the pool. It should be large enough to avoid an infinite loop.
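The pool idea could look roughly like the sketch below, which fills the pool with the cuRAND host API (curandGenerateUniform) rather than a custom kernel. It is an illustration under assumptions, not a tested optimization: the names sampleFromPool and runPooled, the per-thread budget `tries`, the -1 sentinel, and the pathlength body are hypothetical. Instead of cycling over a shared pool, each thread gets its own fixed slice of candidate pairs, so no number is reused and the loop is bounded.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <curand.h>

// Hypothetical stand-in for the poster's pathlength(), bounded by 1.
__device__ float pathlength(float a, float b, float rho1, float sigma) {
    float x = a + rho1 * (b - a);
    return expf(-sigma * x);
}

// Each thread tests up to `tries` pre-generated pairs and writes -1 if none is accepted.
__global__ void sampleFromPool(float* P, const float* pool, int n, int tries, float sigma) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i >= n) return;
    const float a = 0.0f, b = 10.0f;
    float result = -1.0f;               // sentinel: budget exhausted
    for (int k = 0; k < tries; ++k) {
        // Strided layout: neighbouring threads read neighbouring addresses.
        size_t base = (size_t)(2 * k) * n + i;
        float rho1 = pool[base];
        float rho2 = pool[base + n];
        if (rho2 < pathlength(a, b, rho1, sigma)) {
            result = a + rho1 * (b - a);
            break;
        }
    }
    P[i] = result;
}

// Host helper: fill the pool in one call, then sample from it.
std::vector<float> runPooled(int n, int tries, float sigma) {
    const int threads = 256, blocks = (n + threads - 1) / threads;
    const size_t poolSize = 2 * (size_t)tries * n;
    float* dP = nullptr;
    float* dPool = nullptr;
    cudaMalloc(&dP, n * sizeof(float));
    cudaMalloc(&dPool, poolSize * sizeof(float));

    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);
    curandGenerateUniform(gen, dPool, poolSize);   // uniforms in (0, 1]

    sampleFromPool<<<blocks, threads>>>(dP, dPool, n, tries, sigma);
    std::vector<float> h(n);
    cudaMemcpy(h.data(), dP, n * sizeof(float), cudaMemcpyDeviceToHost);

    curandDestroyGenerator(gen);
    cudaFree(dP);
    cudaFree(dPool);
    return h;
}
```

Particles that come back as -1 would need a second pass with fresh numbers. Whether this beats in-kernel generation depends on the acceptance rate, since a low rate means a large pool and a lot of memory traffic; it would have to be profiled. The program needs to be linked against cuRAND (for example with -lcurand).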