In R, I want to do the following:
I have a list, gene.list, of 5 data frames. Each data frame looks like this:
    col1
    name1
    name2
    name3
    ...
First, I want to extract the overlap of these 5 data frames. The result should be a new data frame called output.
I also have a list called coverage.list, containing 11 data frames. Each data frame looks like this:
    col1     col2  col3
    name1-a  1     2
    name2-c  3     4
    name3-d  5     6
    name4-e  7     8
Now, for each data frame in coverage.list, I want to extract the lines where the value in col1 starts with a value that is present in the new output data frame created in the previous step. The result should be a new list called coverage.new.list.
For the first step, extracting the overlap of the 5 data frames, I am trying to use
    Reduce(intersect, coverage.list)
but I get the message 'data frame with 0 columns and 0 rows'. When I use the venn function on the list, I get the correct overlap counts.
Could you point me to the correct solution?
I think you are looking for
    library(dplyr)
    library(tidyr)

    # Inner join on the gene.list tables: inner join gene.list[[1]] with
    # gene.list[[2]], then inner join the result with gene.list[[3]], and
    # likewise inner join with gene.list[[4]] and gene.list[[5]].
    output <- inner_join(gene.list[[1]], gene.list[[2]]) %>%
      inner_join(gene.list[[3]]) %>%
      inner_join(gene.list[[4]]) %>%
      inner_join(gene.list[[5]])

    # For each coverage table, split col1 at "-", keep the rows whose first
    # part appears in output$col1, then restore the original col1 values.
    coverage.list.new <- lapply(coverage.list, function(x) {
      x %>%
        mutate(backup = col1) %>%
        separate(col1, c("col1", "col1_2"), sep = "-") %>%
        filter(col1 %in% output$col1) %>%
        mutate(col1 = backup) %>%
        select(-c(backup, col1_2))
    })
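For the first step, a base R route may also work: calling Reduce(intersect, ...) on the col1 vectors rather than on the data frames themselves. The following is only a minimal sketch, assuming every data frame has a single column named col1; demo.genes, common and demo.output are hypothetical names used for illustration.

```r
# Hypothetical stand-in for gene.list: three one-column data frames.
demo.genes <- list(
  data.frame(col1 = c("name1", "name2", "name3"), stringsAsFactors = FALSE),
  data.frame(col1 = c("name1", "name3", "name4"), stringsAsFactors = FALSE),
  data.frame(col1 = c("name1", "name3", "name4"), stringsAsFactors = FALSE)
)

# Pull col1 out of each data frame as a character vector,
# then intersect the vectors pairwise from left to right.
common <- Reduce(intersect, lapply(demo.genes, function(df) as.character(df$col1)))

# Wrap the shared names back into a data frame with the same column name.
demo.output <- data.frame(col1 = common, stringsAsFactors = FALSE)
```

Intersecting plain vectors compares the gene names element by element, which is presumably what the venn counts reflect as well.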
Update
    coverage.list.new <- lapply(coverage.list, function(x) {
      x %>%
        mutate(backup = col1, col1 = sub("-", "@", col1)) %>%
        separate(col1, c("col1", "col1_2"), sep = "@") %>%
        filter(col1 %in% output$col1) %>%
        mutate(col1 = backup) %>%
        select(-c(backup, col1_2))
    })
    # col1 = sub("-", "@", col1) in mutate substitutes the first "-" with "@"
    # in order to split col1 at "@". If you already have "@" in col1 to begin
    # with, choose a symbol that does not exist in col1 and replace "@" with it.
    # In the code above, "@" is the chosen symbol.
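If choosing a placeholder symbol is inconvenient, the part of col1 before the first "-" can also be extracted directly with a regular expression. This is a hedged base R sketch, not part of the approach above; demo.cov, demo.keep, prefix and demo.filtered are hypothetical names, and it assumes the gene name is everything before the first "-".

```r
# Hypothetical coverage table; one value contains more than one "-".
demo.cov <- data.frame(
  col1 = c("name1-a", "name2-c", "name3-d-x", "name4-e"),
  col2 = c(1, 3, 5, 7),
  col3 = c(2, 4, 6, 8),
  stringsAsFactors = FALSE
)

# Hypothetical overlap from the first step.
demo.keep <- c("name1", "name3")

# Drop everything from the first "-" to the end of the string,
# leaving only the leading gene name.
prefix <- sub("-.*$", "", demo.cov$col1)

# Keep the rows whose leading gene name is in the overlap;
# col1 itself is left untouched, so no backup column is needed.
demo.filtered <- demo.cov[prefix %in% demo.keep, , drop = FALSE]
```

To apply the same idea to every table, the last two statements could be wrapped in a function and passed to lapply over coverage.list.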
Sample data
    gene.list <- list(
      data.frame(col1 = c("name1", "name2", "name3")),
      data.frame(col1 = c("name1", "name3", "name4")),
      data.frame(col1 = c("name1", "name3", "name4")),
      data.frame(col1 = c("name1", "name3", "name4")),
      data.frame(col1 = c("name1", "name3", "name4"))
    )

    coverage.list <- list(
      data.frame(col1 = c("name1-a", "name2-c", "name3-d", "name4-e"),
                 col2 = c(1, 3, 5, 7),
                 col3 = c(2, 4, 6, 8)),
      data.frame(col1 = c("name3-a", "name4-c", "name3-d", "name4-e"),
                 col2 = c(1, 3, 5, 7),
                 col3 = c(2, 4, 6, 8))
    )