I have a data frame structured like this:

    a <- c(1,1,1,2,2,2,3,3,3,3,4,4)
    b <- c(1,2,3,1,2,3,1,2,3,4,1,2)
    c <- c(NA, NA, 2, NA, 1, 1, NA, NA, 1, 1, NA, NA)
    df <- data.frame(a,b,c)

where a and b uniquely identify an observation. I want to create a new variable, d, which indicates whether each observation's value of b is present at least once in c, grouped by a, such that d would be:
    [1] 0 1 0 1 0 0 1 0 0 0 0 0

I can write a for loop which will do the trick:

    attach(df)
    for (i in unique(a)) {
      for (j in b[a == i]) {
        df$d[a == i & b == j] <- ifelse(j %in% c[a == i], 1, 0)
      }
    }

But surely in R there must be a cleaner/faster way of achieving the same result?
Using data.table:

    library(data.table)
    setDT(df) # convert df to a data.table without copying
    # +() is code golf for as.integer
    df[ , d := +(b %in% c), by = a]
    #     a b  c d
    #  1: 1 1 NA 0
    #  2: 1 2 NA 1
    #  3: 1 3  2 0
    #  4: 2 1 NA 1
    #  5: 2 2  1 0
    #  6: 2 3  1 0
    #  7: 3 1 NA 1
    #  8: 3 2 NA 0
    #  9: 3 3  1 0
    # 10: 3 4  1 0
    # 11: 4 1 NA 0
    # 12: 4 2 NA 0

Adding the dplyr version for those of that persuasion. Credit is due to @akrun.
    library(dplyr)
    df %>%
      group_by(a) %>%
      mutate(d = +(b %in% c))

And for posterity, a base R version (via @thelatemail below):
    # sort by group first so the by() results line up with the rows
    df <- df[order(df$a, df$b), ]
    df$d <- unlist(by(df, df$a, FUN = function(x) (x$b %in% x$c) + 0L))
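
For anyone who wants to check the result without installing a package, here is a minimal base R sketch that rebuilds the example data from the question and computes d with split() and unsplit(). This is my own variation rather than one of the versions credited above, and the intermediate name flags is purely illustrative.

```r
# Same example data as in the question
a <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4)
b <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 1, 2)
c <- c(NA, NA, 2, NA, 1, 1, NA, NA, 1, 1, NA, NA)
df <- data.frame(a, b, c)

# Compute the 0/1 flag separately within each group of a
# (flags is a hypothetical helper name, not from the original answer)
flags <- lapply(split(df, df$a), function(g) as.integer(g$b %in% g$c))

# unsplit() puts the per-group results back in the original row order
df$d <- unsplit(flags, df$a)

df$d
# [1] 0 1 0 1 0 0 1 0 0 0 0 0
```

Since unsplit() uses the grouping vector to restore row positions, this variant should not need the data frame to be sorted by a first. One thing to keep in mind with any of these versions: %in% treats NA as matching NA, so if b could itself contain NA, such rows would be flagged 1 whenever c has an NA in the same group.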