Independent-Samples T-Tests

 

Date: 2017

Research Project on Design, Running, and Analyzing Experiments

Instructors: Jacob O. Wobbrock, Scott Klemmer

TYPE: Design, Running, and Analyzing Experiments / R and RStudio project for the Interaction Design Certificate, UCSD/UW

 

 

This study describes a website A/B test in which visitors' time-on-site was measured and compared across two website variations.

## Scenario:   Comparing time-on-site in an A/B test
## Statistics: Descriptive statistics, independent-samples t-test
## Independent-samples t-test

 

# read in a data file with time-on-site from an A/B test - the file timeonsite.csv records each visitor's time-on-site for one of two website variations (A or B).

CODE:
timeonsite = read.csv("timeonsite.csv")
View(timeonsite)
timeonsite$Subject = factor(timeonsite$Subject) # convert to nominal factor
summary(timeonsite)


CONSOLE:

> summary(timeonsite)
    Subject    Site         Time
 1      :  1   A:300   Min.   :230.0
 2      :  1   B:300   1st Qu.:297.8
 3      :  1           Median :348.5
 4      :  1           Mean   :353.9
 5      :  1           3rd Qu.:414.2
 6      :  1           Max.   :480.0
 (Other):594

EXPLANATION: 600 subjects took part in this website A/B test; 300 subjects were exposed to each variation of the website (A and B).


# descriptive statistics by Site

CODE:
library(plyr)
ddply(timeonsite, ~ Site, function(data) summary(data$Time))
ddply(timeonsite, ~ Site, summarise, Time.mean=mean(Time), Time.sd=sd(Time))


CONSOLE:

> ddply(timeonsite, ~ Site, function(data) summary(data$Time))
  Site Min. 1st Qu. Median     Mean 3rd Qu. Max.
1    A  241     300  359.0 360.2933  421.25  480
2    B  230     295  341.5 347.5600  403.25  470
> ddply(timeonsite, ~ Site, summarise, Time.mean=mean(Time), Time.sd=sd(Time))
  Site Time.mean  Time.sd
1    A  360.2933 70.31815
2    B  347.5600 65.87822
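
The same per-site mean and standard deviation can also be computed without plyr. The following is a minimal sketch in base R, not part of the original script; it uses a small made-up data frame (the hypothetical `toy`, standing in for timeonsite.csv) so that it runs on its own.

```r
# Hypothetical toy data standing in for timeonsite.csv (values are made up)
toy <- data.frame(
  Site = factor(c("A", "A", "A", "B", "B", "B")),
  Time = c(300, 350, 400, 280, 330, 380)
)

# Mean and standard deviation of Time within each level of Site
time_mean <- tapply(toy$Time, toy$Site, mean)
time_sd   <- tapply(toy$Time, toy$Site, sd)

# Combine into one table with one row per site
by_site <- data.frame(
  Site      = names(time_mean),
  Time.mean = as.vector(time_mean),
  Time.sd   = as.vector(time_sd)
)
print(by_site)
```

With the real data, replacing `toy` with `timeonsite` should give the same Time.mean and Time.sd columns as the ddply() call.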


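# graph histograms and a boxplot

CODE:
hist(timeonsite[timeonsite$Site == "A",]$Time) # distribution of Time for site A
hist(timeonsite[timeonsite$Site == "B",]$Time) # distribution of Time for site B
plot(Time ~ Site, data=timeonsite) # Site is a factor, so this draws a boxplot per site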



# independent-samples t-test

CODE:
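# var.equal=TRUE requests the classic pooled-variance (Student's) t-test; R's default (var.equal=FALSE) is Welch's t-test
t.test(Time ~ Site, data=timeonsite, var.equal=TRUE)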


CONSOLE:

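	Two Sample t-test

data:  Time by Site
t = 2.2889, df = 598, p-value = 0.02243
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  1.807652 23.659015
sample estimates:
mean in group A mean in group B
       360.2933        347.5600

EXPLANATION: Visitors to site A stayed longer on average than visitors to site B (360.29 vs. 347.56, a difference of about 12.73 in the units recorded in timeonsite.csv). The difference is statistically significant at the .05 level, t(598) = 2.29, p = .022, and the 95% confidence interval for the difference (1.81 to 23.66) excludes zero.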
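
To see where the reported numbers come from, the pooled-variance t statistic can be recomputed by hand from the per-site summary statistics. This is a sketch, not part of the original script: the means and standard deviations are copied from the ddply() output, the group sizes from summary(), and the variable names are illustrative. Because the inputs are rounded, the results should match the t.test() output only to roughly the printed precision.

```r
# Summary statistics copied from the console output for each site
n_a <- 300; mean_a <- 360.2933; sd_a <- 70.31815
n_b <- 300; mean_b <- 347.5600; sd_b <- 65.87822

# Degrees of freedom for the pooled (equal-variance) test
deg_free <- n_a + n_b - 2

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / deg_free

# Standard error of the difference in means
se_diff <- sqrt(pooled_var * (1 / n_a + 1 / n_b))

# t statistic, two-sided p-value, and 95% confidence interval
mean_diff <- mean_a - mean_b
t_stat    <- mean_diff / se_diff
p_value   <- 2 * pt(-abs(t_stat), deg_free)
ci        <- mean_diff + c(-1, 1) * qt(0.975, deg_free) * se_diff

print(round(c(t = t_stat, df = deg_free, p = p_value), 4))
print(round(ci, 3))
```

The degrees of freedom (598) are simply the 600 subjects minus one estimated mean per site.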