R: By group, test whether each value of one variable exists in another variable


I have a data frame structured like this:

    a <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4)
    b <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 1, 2)
    c <- c(NA, NA, 2, NA, 1, 1, NA, NA, 1, 1, NA, NA)
    df <- data.frame(a, b, c)

where `a` and `b` together uniquely identify an observation. I want to create a new variable, `d`, that indicates whether each observation's value of `b` is present at least once in `c` within its group of `a`. So `d` should be:

    [1] 0 1 0 1 0 0 1 0 0 0 0 0
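To see where that vector comes from, here is a minimal sketch for the first group (`a == 1`) on its own; the names `b1`, `c1` and `in_c` are only illustrative. It also shows that `%in%` never matches a non-missing value against `NA`, so the missing entries in `c` simply produce `FALSE`.

```r
# Values of b and c for the first group only (a == 1)
b1 <- c(1, 2, 3)
c1 <- c(NA, NA, 2)

# For each value of b, is it present anywhere in c within the same group?
in_c <- b1 %in% c1
print(in_c)               # [1] FALSE  TRUE FALSE

# NA in c never matches a non-missing b, so it cannot create false positives
print(1 %in% c(NA, NA))   # [1] FALSE

# Coerce to 0/1 to get the first three entries of the desired d
print(as.integer(in_c))   # [1] 0 1 0
```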

I can write a loop to do the trick:

    df$d <- 0  # initialise d so the indexed assignments below have a target
    for (i in unique(df$a)) {
      for (j in df$b[df$a == i]) {
        df$d[df$a == i & df$b == j] <- ifelse(j %in% df$c[df$a == i], 1, 0)
      }
    }

But surely in R there must be a cleaner/faster way of achieving the same result?

Using data.table:

    library(data.table)
    setDT(df)  # convert df to a data.table without copying
    # +() is code golf for as.integer()
    df[ , d := +(b %in% c), by = a]
    df
    #     a b  c d
    #  1: 1 1 NA 0
    #  2: 1 2 NA 1
    #  3: 1 3  2 0
    #  4: 2 1 NA 1
    #  5: 2 2  1 0
    #  6: 2 3  1 0
    #  7: 3 1 NA 1
    #  8: 3 2 NA 0
    #  9: 3 3  1 0
    # 10: 3 4  1 0
    # 11: 4 1 NA 0
    # 12: 4 2 NA 0
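The `+()` trick works because R's unary plus coerces a logical vector to integer. A quick base R check (the variable names are only illustrative) that it gives the same result as `as.integer()`, and as adding `0L`:

```r
# Unary plus turns a logical vector into an integer vector of 0/1
x <- c(TRUE, FALSE, TRUE)
golfed   <- +x
explicit <- as.integer(x)

print(identical(golfed, explicit))  # [1] TRUE
print(class(golfed))                # [1] "integer"

# Adding 0L performs the same logical-to-integer coercion
print(identical(x + 0L, explicit))  # [1] TRUE
```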

Adding a dplyr version for those of that persuasion. Credit is due to @akrun.

    library(dplyr)
    df %>%
      group_by(a) %>%
      mutate(d = +(b %in% c))
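If you want grouped-`mutate()` behaviour without any packages, base R's `ave()` is a close analogue: it applies a function within each group and writes the results back in the original row order, so no sorting is needed. This is a minimal sketch that rebuilds the question's data so it runs on its own; passing row indices through `ave()` is just one way to let the function see both `b` and `c` for a group.

```r
# Rebuild the example data from the question
a <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4)
b <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 1, 2)
c <- c(NA, NA, 2, NA, 1, 1, NA, NA, 1, 1, NA, NA)
df <- data.frame(a, b, c)

# ave() splits its first argument by df$a, applies FUN to each piece and
# reassembles the results in the original row order. The first argument is
# the row indices, so FUN can look up both b and c for the rows of a group.
df$d <- ave(seq_len(nrow(df)), df$a,
            FUN = function(i) as.integer(df$b[i] %in% df$c[i]))

print(df$d)
# [1] 0 1 0 1 0 0 1 0 0 0 0 0
```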

And for posterity, a base R version (via @thelatemail below):

    df <- df[order(df$a, df$b), ]  # sort by a so by()'s group-ordered results line up with the rows
    df$d <- unlist(by(df, df$a, FUN = function(x) (x$b %in% x$c) + 0L))
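A fully vectorised base R alternative (a sketch of a different technique, not taken from the answers above) avoids grouping altogether: build a text key from each (`a`, `b`) pair and check it against the set of (`a`, `c`) pairs. It assumes the values contain no spaces when converted to text (true for plain numbers), so that `paste()` keys cannot collide, and it drops rows where `c` is `NA` before building the lookup set.

```r
# Rebuild the example data from the question
a <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4)
b <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 1, 2)
c <- c(NA, NA, 2, NA, 1, 1, NA, NA, 1, 1, NA, NA)
df <- data.frame(a, b, c)

# One key per observation, e.g. "1 2" for a = 1, b = 2
key_b <- paste(df$a, df$b)
# Lookup set of "group value-of-c" keys, ignoring missing c
key_c <- paste(df$a, df$c)[!is.na(df$c)]

# d is 1 when the (a, b) pair appears among the (a, c) pairs; row order is kept
df$d <- as.integer(key_b %in% key_c)
print(df$d)
# [1] 0 1 0 1 0 0 1 0 0 0 0 0
```

Because the comparison is a single `%in%` over the whole column, no sorting or per-group function calls are involved.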
