Running A/B tests on revenue in Python
I'm trying to run an A/B test, comparing revenue among variants on websites.
Our standard approach (using t-tests) didn't seem like it would work, because revenue can't be modelled binomially. However, I read about bootstrapping and came up with the following code:
import numpy as np
import scipy.stats as stats
import random

def resampler(original_array, number_of_samples):
    # Draw `number_of_samples` bootstrap sums from the revenue vector
    sample_array = np.zeros(number_of_samples)
    choice = random.choice
    for i in range(number_of_samples):
        sample_array[i] = sum([choice(original_array) for _ in range(len(original_array))])
    # Test the sums for normality; if the p-value exceeds 0.001,
    # recurse with twice as many samples (note that the recursive
    # result is assigned to y, but the original sample_array is
    # returned regardless)
    y = stats.normaltest(sample_array)
    if y[1] > 0.001:
        print(y)
        new_y = resampler(original_array, number_of_samples * 2)
        y = new_y
    return sample_array
Basically, I randomly sample from the 'revenue vector' (a sparsely populated vector, with 0 for non-converting visitors) and sum the resulting vectors until I've got a normal distribution.
I can perform this for both test groups, at which point I've got two normally distributed quantities for t-testing, using scipy.stats.ttest_ind. With this, I was able to get results that looked somewhat reasonable.
However, I wondered what the effect of running this procedure on the cookie split would be (we expected each group to see 50% of the cookies). Here, I saw something unexpected. Given the following code:
x = [272898, 389076, 61091, 65251, 10060, 1468815, 216014, 25863, 42421, 476379, 73761]
y = [274253, 387941, 61333, 65020, 10056, 1466908, 214679, 25682, 42873, 474692, 73837]
print(stats.ttest_ind(x, y))
I get the output:

(0.0021911476165975929, 0.99827342714956546)
Not at all significant (I think I'm interpreting that correctly?).
However, when I run this code:
t_value_array = []
p_value_array = []
for i in range(1000, 100000, 5000):
    one_array = resampler(x, i)
    two_array = resampler(y, i)
    t_value, p_value = stats.ttest_ind(one_array, two_array)
    t_value_array.append(t_value)
    p_value_array.append(p_value)
print(np.mean(t_value_array))
print(np.mean(p_value_array))
I get:

0.642213492773
0.490587258892
I'm not sure how to interpret these numbers. As far as I'm aware, I've repeatedly generated normal distributions from the actual cookie splits (each number in the array represents a different site). In each of these cases, I've used the t-test on the two distributions and gotten a t-statistic and a p-value.
Is this a legitimate thing to do? I ran these tests multiple times because I was seeing a lot of variation in the p-value and t-statistic when not doing this.
Am I missing an obvious way to run this kind of test?
Cheers,
Matt
P.S. The data we have:

website 1 : test group 1 : unique cookies : revenue
website 1 : test group 2 : unique cookies : revenue
website 2 : test group 1 : unique cookies : revenue
website 2 : test group 2 : unique cookies : revenue

etc.
What we'd like:

"Test group X is beating test group Y with Z% certainty"

(with a null hypothesis of test group 1 = test group 2)
Bonus:

The same as above, on a per-site as well as an overall basis.
Firstly, using a t-test to test binomial response variables isn't correct. You need to use a logistic regression model.
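As a minimal sketch of the kind of model I mean (using statsmodels, which is my assumption here; any logistic regression implementation would do, and the converted/method arrays below are made up):

import numpy as np
import statsmodels.api as sm

# Hypothetical data: 1/0 conversion outcome per visitor, and an
# indicator for which method (variant) each visitor saw
converted = np.array([1, 0, 1, 1, 0, 0, 1, 0])
method = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = method a, 0 = method b

X = sm.add_constant(method)           # intercept plus method indicator
model = sm.Logit(converted, X).fit()
print(model.summary())                # the coefficient on the method column
                                      # tests whether conversion differs

The p-value on the method coefficient then plays the role your t-test p-value was playing, but for a binary response.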
On to your question. It's hard to read your code and understand what you think you're testing---what's your H_0 (null hypothesis)? If I'm being honest (and I hope you don't take offense), it looks pretty confused.
I'm going to have to guess what your data are like---you have a bunch of samples like this:
website   method   revenue
-------   ------   -------
w1        a        12
w2        b        0
w3        a        6
w4        b        0
Etc., etc. Is that correct? Do you have repeated measures (i.e. do you have a revenue measurement for each website under each method, or did you randomly assign websites to methods)? I'm guessing that you're passing your procedure the array of revenues for one of the methods at a time, and that these pair up across methods in some way?
I can imagine testing various hypotheses with this data. For example: is method a more likely to generate non-zero revenue than method b (use logistic regression, with a binary response)? Of the cases where a method generates any revenue at all, does method a generate more than method b (a t-test on the non-zero revenues)? Does method a generate more revenue than method b across all instances (probably a sign test, due to problems with the assumption of normality when you include the zeros; see the sketch below)? I assume this troubling assumption is why you run your procedure of repeatedly subsampling until the data look normal, but you can't do that and still have a meaningful test: just because some subset of your data is normally distributed doesn't mean you can look at only that part of it! In fact, I wouldn't be surprised to see that it effectively excludes either all of the zero entries or all of the non-zero entries.
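Here's a sketch of that last suggestion (the sign test), assuming paired per-site revenues and SciPy 1.7+ for stats.binomtest; the revenue arrays are made up:

import numpy as np
from scipy import stats

# Hypothetical paired per-site revenues for the two methods
revenue_a = np.array([12, 0, 6, 3, 0, 9])
revenue_b = np.array([0, 4, 5, 3, 1, 2])

diffs = revenue_a - revenue_b
n_wins = int(np.sum(diffs > 0))     # sites where method a beats method b
n_untied = int(np.sum(diffs != 0))  # a sign test drops the ties
result = stats.binomtest(n_wins, n_untied, p=0.5)
print(result.pvalue)                # small p-value => the methods differ

The sign test only asks which method wins at each site, so it doesn't care how skewed the revenue amounts are or how many zeros they contain.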
If you can elaborate on what your actual data are like, and on which questions you want to answer, I'm happy to make more specific suggestions.