r - Add a column for counting unique tuples in the data frame



Suppose I have the following data frame:

```r
userid <- c(1, 1, 3, 5, 3, 5)
a      <- c(2, 3, 2, 1, 2, 1)
b      <- c(2, 3, 1, 0, 1, 0)
df     <- data.frame(userid, a, b)
df
#   userid a b
# 1      1 2 2
# 2      1 3 3
# 3      3 2 1
# 4      5 1 0
# 5      3 2 1
# 6      5 1 0
```

I would like to create a data frame with the same columns plus a final column that counts how many times each unique tuple / combination of the other columns occurs. The output should be the following:

```
userid a b count
     1 2 2     1
     1 3 3     1
     3 2 1     2
     5 1 0     2
```

The meaning is that the tuple / combination (1, 2, 2) occurs once, so it has count=1, while the tuple (3, 2, 1) occurs twice, so it has count=2. I would prefer not to use external packages.
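To make the meaning of the count column concrete, here is a minimal base R sketch (no packages), assuming the example `df` above. `ave()` gives every row the size of the group its tuple belongs to, and `unique()` then keeps one row per tuple in order of first appearance. The names `counts` and `res` are illustrative only.

```r
# Rebuild the example data frame from the question
userid <- c(1, 1, 3, 5, 3, 5)
a      <- c(2, 3, 2, 1, 2, 1)
b      <- c(2, 3, 1, 0, 1, 0)
df     <- data.frame(userid, a, b)

# ave() splits its first argument by the interaction of the grouping
# columns (here: every column of df) and replaces each element with FUN
# applied to its group, so each row receives the size of its group.
counts <- ave(rep(1L, nrow(df)), df, FUN = length)

# Attach the per-row counts, then keep one row per distinct tuple;
# unique() keeps first occurrences, so groups stay in their original order.
res <- unique(cbind(df, count = counts))
res
```

Unlike `aggregate()`, this keeps the tuples in the order in which they first occur in `df` rather than sorting them.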

1) aggregate

```r
# add a column of 1s, then count the rows for each combination of all other columns
ag <- aggregate(count ~ ., cbind(count = 1, df), length)
ag[do.call("order", ag), ]  # sort rows
```

giving:

```
  userid a b count
3      1 2 2     1
4      1 3 3     1
2      3 2 1     2
1      5 1 0     2
```

The last line of code sorts the rows and can be omitted if the order of the rows is unimportant.
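The sorting line works because `do.call("order", ag)` passes each column of `ag` to `order()` as a separate argument. The sketch below, assuming the same `df` and `aggregate()` call, checks that it matches the explicit call; `o1`, `o2` and `sorted` are illustrative names. One caveat worth hedging: a column whose name exactly matches an argument of `order()` (for example `method`) would be taken as that argument instead of a sort key.

```r
userid <- c(1, 1, 3, 5, 3, 5)
a      <- c(2, 3, 2, 1, 2, 1)
b      <- c(2, 3, 1, 0, 1, 0)
df     <- data.frame(userid, a, b)
ag <- aggregate(count ~ ., cbind(count = 1, df), length)

# do.call() expands the data frame into order(userid = ..., a = ..., b = ..., count = ...),
# i.e. sort by userid, break ties by a, then b, then count
o1 <- do.call("order", ag)
o2 <- order(ag$userid, ag$a, ag$b, ag$count)
print(identical(o1, o2))

sorted <- ag[o1, ]
sorted
```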

The remaining solutions use the indicated packages:

2) sqldf

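```r
library(sqldf)
# comma-separated list of all column names: "userid, a, b"
names <- toString(names(df))
# fn$ (from gsubfn, which sqldf loads) substitutes $names into the SQL string
fn$sqldf("select *, count(*) count from df group by $names order by $names")
```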

giving:

```
  userid a b count
1      1 2 2     1
2      1 3 3     1
3      3 2 1     2
4      5 1 0     2
```

The `order by` clause can be omitted if the order is unimportant.
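Because `fn$` interpolates `$names` before the query reaches SQLite, it can help to see the exact SQL that runs. This base R sketch, assuming the three-column `df` from the question, builds the same string with `sprintf()` so it can be inspected without loading sqldf; `cols` and `query` are illustrative names.

```r
userid <- c(1, 1, 3, 5, 3, 5)
a      <- c(2, 3, 2, 1, 2, 1)
b      <- c(2, 3, 1, 0, 1, 0)
df     <- data.frame(userid, a, b)

# same comma-separated column list the answer stores in `names`
cols <- toString(names(df))

# the query text after substituting the column list into both clauses
query <- sprintf("select *, count(*) count from df group by %s order by %s",
                 cols, cols)
cat(query, "\n")
```

Grouping by every column means each distinct tuple forms one group, and `count(*)` counts its rows.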

3) dplyr

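```r
library(dplyr)
# group by every column, then count the rows in each group.
# regroup(as.list(names(df))) from older dplyr has been removed;
# group_by(across(everything())) is the current way to group by all columns.
# (The output below was printed by an older dplyr release, so the header may differ.)
df %>% group_by(across(everything())) %>% summarise(count = n())
```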

giving:

```
Source: local data frame [4 x 4]
Groups: userid, a

  userid a b count
1      1 2 2     1
2      1 3 3     1
3      3 2 1     2
4      5 1 0     2
```

4) data.table

```r
library(data.table)
# .N is the number of rows in each group defined by all columns
data.table(df)[, list(count = .N), by = names(df)]
```

giving:

```
   userid a b count
1:      1 2 2     1
2:      1 3 3     1
3:      3 2 1     2
4:      5 1 0     2
```

Update: added additional solutions and made small improvements.
