r - dplyr - Aggregation incorrect?
I have some issues with dplyr: the group_by function is not working as expected. Using summarise, I expect to get the mean of var1 for each unique combination of id and year that I entered in the group_by statement.
This code should create a df of id-year observations, and I want to aggregate the mean of var1 for each combination of id and year. However, it is not working as expected: the output ignores id and aggregates on year.
df <- data.frame(
  id   = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5),
  year = c(2013, 2013, 2012, 2013, 2013, 2013, 2012, 2012, 2013, 2013),
  var1 = rnorm(10)
)
dplyr code:
dfagg <- df %.%
  group_by(id, year) %.%
  select(id, year, var1) %.%
  summarise(var1 = mean(var1))
Result:
> dfagg
Source: local data frame [8 x 2]
Groups: year

  year        var1
1 2013  0.22924025
2 2012 -0.93073687
3 2013 -0.82351583
4 2012  0.05656113
5 2013 -0.21622021
6 2012  1.91158209
7 2013 -2.67003628
8 2013 -0.72662276
Any idea what is going on?
To make sure no other package was interfering with the dplyr functions, I also tried the version below, with the same result.
dfagg <- df %.%
  dplyr::group_by(id, year) %.%
  dplyr::select(id, year, var1) %.%
  dplyr::summarise(var1 = mean(var1))
I don't think you need the select() line. Using just group_by() and summarise() did the trick for me.
library(dplyr)

df <- data.frame(
  id   = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5),
  year = c(2013, 2013, 2012, 2013, 2013, 2013, 2012, 2012, 2013, 2013),
  var1 = rnorm(10)
)

df %>%
  group_by(id, year) %>%
  summarise(mean_var1 = mean(var1)) -> dfagg
Result:
     id  year   mean_var1
  (dbl) (dbl)       (dbl)
1     1  2013 -1.20744511
2     2  2012 -0.59159641
3     2  2013 -0.03660552
4     3  2012 -0.38853566
5     3  2013 -1.76459495
6     4  2012 -0.66926387
7     4  2013  0.70451751
8     5  2013 -0.82762769
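If you want to convince yourself that the grouped means are right, one option is to cross-check them against base R. The following is only a sketch under a few assumptions that are not in the original post: it assumes dplyr 1.0.0 or later (for the .groups argument of summarise()), it fixes a seed with set.seed(42) so the run is repeatable, and base_agg is just an illustrative name.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, not in the original post
df <- data.frame(
  id   = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5),
  year = c(2013, 2013, 2012, 2013, 2013, 2013, 2012, 2012, 2013, 2013),
  var1 = rnorm(10)
)

# Grouped mean; .groups = "drop" (dplyr >= 1.0.0) returns an ungrouped result
dfagg <- df %>%
  group_by(id, year) %>%
  summarise(mean_var1 = mean(var1), .groups = "drop")

# The same aggregation in base R, used only as a cross-check
base_agg <- aggregate(var1 ~ id + year, data = df, FUN = mean)
base_agg <- base_agg[order(base_agg$id, base_agg$year), ]

# One row per distinct (id, year) pair, and both methods give the same means
stopifnot(nrow(dfagg) == nrow(unique(df[c("id", "year")])))
stopifnot(isTRUE(all.equal(dfagg$mean_var1, base_agg$var1)))
```

If both stopifnot() calls pass silently, the result keeps the id column and has one row for each id-year combination, which is what the question was after.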