r - Add a column for counting unique tuples in the data frame
This question already has an answer here:
- how frequencies add variable in array? (3 answers)
Suppose I have the following data frame:

    userid <- c(1, 1, 3, 5, 3, 5)
    a <- c(2, 3, 2, 1, 2, 1)
    b <- c(2, 3, 1, 0, 1, 0)
    df <- data.frame(userid, a, b)
    df
    #   userid a b
    # 1      1 2 2
    # 2      1 3 3
    # 3      3 2 1
    # 4      5 1 0
    # 5      3 2 1
    # 6      5 1 0

I would like to create a data frame with the same columns plus an added final column that counts how often each unique tuple / combination of the other columns occurs. The output should be the following:

    userid a b count
         1 2 2     1
         1 3 3     1
         3 2 1     2
         5 1 0     2

The meaning is that the tuple / combination (1, 2, 2) occurs once, so it has count=1, while the tuple (3, 2, 1) occurs twice, so it has count=2. I would prefer not to use external packages.
1) aggregate
    ag <- aggregate(count ~ ., cbind(count = 1, df), length)
    ag[do.call("order", ag), ] # sort rows
giving:
      userid a b count
    3      1 2 2     1
    4      1 3 3     1
    2      3 2 1     2
    1      5 1 0     2

The last line of code sorts the rows and can be omitted if the order of the rows is unimportant.

The remaining solutions use the indicated packages:
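The aggregate call above packs several steps into one line. As a rough sketch, it can be unpacked as follows. The intermediate name `tmp` is introduced here purely for illustration, and the example data is rebuilt so the snippet runs on its own:

```r
# Rebuild the example data from the question.
df <- data.frame(userid = c(1, 1, 3, 5, 3, 5),
                 a      = c(2, 3, 2, 1, 2, 1),
                 b      = c(2, 3, 1, 0, 1, 0))

# Step 1: add a helper column named count, filled with ones.
# 'tmp' is a hypothetical name used only in this sketch.
tmp <- cbind(count = 1, df)

# Step 2: the formula count ~ . groups by every other column
# (userid, a, b); length() counts the rows in each group.
ag <- aggregate(count ~ ., data = tmp, FUN = length)

# Step 3: order() applied to all columns sorts by userid, then a, then b.
ag <- ag[do.call("order", ag), ]
print(ag)
```

The value stored in the helper column does not matter, since length() only counts how many rows land in each group.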
2) sqldf
    library(sqldf)
    Names <- toString(names(df)) # "userid, a, b"
    fn$sqldf("select *, count(*) count from df group by $Names order by $Names")
giving:
      userid a b count
    1      1 2 2     1
    2      1 3 3     1
    3      3 2 1     2
    4      5 1 0     2

The order by clause can be omitted if the order is unimportant.
3) dplyr
    library(dplyr)
    # Note: regroup() belongs to early dplyr releases and is not available in current ones.
    df %>% regroup(as.list(names(df))) %>% summarise(count = n())
giving:
    Source: local data frame [4 x 4]
    Groups: userid, a

      userid a b count
    1      1 2 2     1
    2      1 3 3     1
    3      3 2 1     2
    4      5 1 0     2
4) data.table
    library(data.table)
    data.table(df)[, list(count = .N), by = names(df)]
giving:
       userid a b count
    1:      1 2 2     1
    2:      1 3 3     1
    3:      3 2 1     2
    4:      5 1 0     2
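Since the question prefers to avoid external packages, here is one more base R sketch for comparison. It is not part of the original answer. It uses ave() to attach the group size to every original row and then drops duplicate rows; the result name `res` is made up for this example:

```r
# Rebuild the example data from the question.
df <- data.frame(userid = c(1, 1, 3, 5, 3, 5),
                 a      = c(2, 3, 2, 1, 2, 1),
                 b      = c(2, 3, 1, 0, 1, 0))

# ave() returns one value per original row: here, the number of rows
# that share the same (userid, a, b) combination.
df$count <- ave(seq_len(nrow(df)), df$userid, df$a, df$b, FUN = length)

# Dropping duplicate rows leaves one row per distinct tuple.
# 'res' is a hypothetical name used only in this sketch.
res <- unique(df)
print(res)
```

Unlike aggregate, this keeps the rows in order of first appearance, so no separate sorting step is involved.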
Update: added additional solutions and made small improvements.