statistics - Running A/B tests on revenue in Python


I'm trying to run an A/B test comparing revenue amongst variants on websites.

Our standard approach (using t-tests) didn't seem to work because revenue can't be modelled binomially. However, I read about bootstrapping and came up with the following code:

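    import random

    import numpy as np
    import scipy.stats as stats


    def resampler(original_array, number_of_samples):
        # Each entry is the sum of one resample (with replacement)
        # of the original revenue vector.
        sample_array = np.zeros(number_of_samples)
        choice = random.choice
        for i in range(number_of_samples):
            sample_array[i] = sum([choice(original_array) for _ in range(len(original_array))])

        # normaltest returns (statistic, p-value); y[1] is the p-value.
        y = stats.normaltest(sample_array)
        if y[1] > 0.001:
            print(y)
            # Note: the recursive result is stored in y, but sample_array
            # from this call is what gets returned.
            new_y = resampler(original_array, number_of_samples * 2)
            y = new_y
        return sample_array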

Basically, randomly sample from your 'revenue vector' (a sparsely populated vector, with 0 for non-converting visitors) and sum the resulting vectors until you've got a normal distribution.

I can perform this for both test groups, at which point I've got two normally distributed quantities for t-testing. Using scipy.stats.ttest_ind I was able to get results that looked someway reasonable.

However, I wondered what the effect of running this procedure on the cookie split would be (we expected each group to see 50% of the cookies). Here, I saw something unexpected. Given the following code:

    x = [272898, 389076, 61091, 65251, 10060, 1468815, 216014, 25863, 42421, 476379, 73761]
    y = [274253, 387941, 61333, 65020, 10056, 1466908, 214679, 25682, 42873, 474692, 73837]
    print(stats.ttest_ind(x, y))

I get the output: (0.0021911476165975929, 0.99827342714956546)

Not at all significant (I think I'm interpreting that correctly?)

However, when I run this code:

    # Collect the t-statistic and p-value from each run.
    t_value_array = []
    p_value_array = []
    for i in range(1000, 100000, 5000):
        one_array = resampler(x, i)
        two_array = resampler(y, i)
        t_value, p_value = stats.ttest_ind(one_array, two_array)
        t_value_array.append(t_value)
        p_value_array.append(p_value)

    print(np.mean(t_value_array))
    print(np.mean(p_value_array))

I get:

    0.642213492773
    0.490587258892

I'm not sure how to interpret these numbers. As far as I'm aware, I've repeatedly generated normal distributions from the actual cookie splits (each number in the array represents a different site). In each of these cases, I've used a t-test on the two distributions and gotten a t-statistic and a p-value.

Is this a legitimate thing to do? I ran these tests multiple times because I was seeing variation in the p-value and t-statistic when not doing this.

Am I missing an obvious way to run this kind of test?

Cheers,

Matt

P.S.

The data we have:

    Website 1 : test group 1 : unique cookies : revenue
    Website 1 : test group 2 : unique cookies : revenue
    Website 2 : test group 1 : unique cookies : revenue
    Website 2 : test group 2 : unique cookies : revenue
    etc.

What we'd like:

Test group X is beating test group Y with Z% certainty

(null hypothesis of test group 1 = test group 2)

Bonus:

The same as above, on a per-site as well as an overall basis.

Firstly, using a t-test to test binomial response variables isn't correct. You need to use a logistic regression model.
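As a rough sketch of what that can look like for a converted / not-converted outcome: with a single two-level predictor (group A versus group B), the fitted logistic regression slope equals the log odds ratio of the 2x2 table, and its Wald standard error has a closed form, so no model-fitting library is needed. The counts below are invented for illustration, and the function name `log_odds_ratio_test` is hypothetical.

```python
import math

from scipy import stats


def log_odds_ratio_test(conv_a, n_a, conv_b, n_b):
    """Wald test for the logistic regression slope with one binary predictor.

    conv_*: number of converting visitors; n_*: total visitors per group.
    Returns (log_odds_ratio, z_statistic, two_sided_p_value).
    """
    # Cells of the 2x2 table: converted / not converted, per group.
    a, b = conv_a, n_a - conv_a
    c, d = conv_b, n_b - conv_b
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell must be positive for the Wald approximation")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio.
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    z = log_or / se
    p = 2.0 * stats.norm.sf(abs(z))
    return log_or, z, p


# Made-up counts, for illustration only.
log_or, z, p = log_odds_ratio_test(conv_a=120, n_a=5000, conv_b=95, n_b=5000)
print(log_or, z, p)
```

A positive log odds ratio means group A converted more often than group B in the sample; the p-value tests the null hypothesis that the two conversion rates are equal. With more predictors (for example, website as a covariate) a full logistic regression fit would be needed instead.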

On to your question. It's hard to read your code and understand what you think you're testing. What's your H_0 (null hypothesis)? If I'm being honest (and I hope you don't take offense), it looks pretty confused.

I'm going to have to guess what your data looks like. You have a bunch of samples like this:

    website   method     revenue
    -------   ------     -------
    w1        a          12
    w2        b          0
    w3        a          6
    w4        b          0

Etc., etc. Is that correct? Do you have repeated measures (i.e. do you have a revenue measurement for each website for each method, or did you randomly assign websites to methods)? I'm guessing that you're passing your method an array of revenues for each one of the methods in turn, but do they pair up across methods in any way?

I can imagine testing various hypotheses with this data. For example, is method A more likely to generate non-zero revenue than method B (use logistic regression, since the response is binary)? Of the cases where a method generates revenue at all, does method A generate more than method B (t-test on the non-zero revenues)? Does method A generate more revenue than method B across all instances (probably a sign test, due to problems with the assumption of normality when you include the zeros)?

I assume this troubling assumption is why you run your procedure of repeatedly subsampling until your data looks normal, but you can't do this and get a test that is meaningful: just because some subset of your data is normally distributed doesn't mean you can look at only that part of it! In fact, I wouldn't be surprised to see that it excludes either all of the zero entries or all of the non-zero entries.
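The second and third hypotheses can be sketched with NumPy and SciPy roughly as follows. The revenue arrays are invented, the helper name `sign_test` is hypothetical, and the sign test assumes the observations are paired (for example, the same website measured under both methods), which the question does not confirm. Using Welch's variant of the t-test (`equal_var=False`) is also an assumption, not something stated above.

```python
import numpy as np
from scipy import stats


def sign_test(a, b):
    """Two-sided sign test for paired samples a and b; ties are dropped."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n_pos = int(np.sum(diff > 0))
    n_neg = int(np.sum(diff < 0))
    n = n_pos + n_neg
    if n == 0:
        return n_pos, n_neg, 1.0
    # Under H_0 the number of positive signs is Binomial(n, 0.5).
    p = min(1.0, 2.0 * stats.binom.cdf(min(n_pos, n_neg), n, 0.5))
    return n_pos, n_neg, float(p)


# Made-up paired revenues (one entry per website), for illustration only.
revenue_a = np.array([12.0, 0.0, 6.0, 0.0, 30.0, 0.0, 8.0, 15.0])
revenue_b = np.array([0.0, 0.0, 4.0, 0.0, 22.0, 5.0, 0.0, 9.0])

# Hypothesis 2: among the non-zero revenues only, compare the means.
nonzero_a = revenue_a[revenue_a > 0]
nonzero_b = revenue_b[revenue_b > 0]
t_stat, t_p = stats.ttest_ind(nonzero_a, nonzero_b, equal_var=False)

# Hypothesis 3: across all instances, zeros included, compare by sign.
n_pos, n_neg, sign_p = sign_test(revenue_a, revenue_b)

print(t_stat, t_p)
print(n_pos, n_neg, sign_p)
```

The sign test only uses the direction of each paired difference, so it makes no normality assumption, at the cost of ignoring how large the differences are.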

If you elaborate on what your actual data looks like, and what questions you want to answer, I'm happy to make more specific suggestions.

