R - Split a data frame and sort the split values based on a particular column


I have the following data frame:

tdf <- structure(list(
  go = c("cytokine-cytokine receptor interaction",
         "cytokine-cytokine receptor interaction|endocytosis",
         "i-kappab kinase/nf-kappab signaling",
         "nf-kappa b signaling pathway",
         "nf-kappab import nucleus",
         "t cell chemotaxis"),
  poscount = c(17, 18, 4, 5, 1, 2),
  shortgo = structure(c(7L, 7L, 18L, 18L, 18L, 21L),
    .Label = c("tnf", "adaptive", "alpha", "apop", "beta", "chemokine",
               "cytokine", "death", "defense", "gamma", "immune response",
               "infla", "interleukin-1 ", "interleukin-10 ", "interleukin-12 ",
               "interleukin-18 ", "interleukin-6 ", "kappa", "migration",
               "stress", "taxis", "wound"),
    class = "factor")),
  .Names = c("go", "poscount", "shortgo"),
  class = "data.frame", row.names = c(NA, 6L))

which looks like this:

> tdf
                                                  go poscount  shortgo
1             cytokine-cytokine receptor interaction       17 cytokine
2 cytokine-cytokine receptor interaction|endocytosis       18 cytokine
3                i-kappab kinase/nf-kappab signaling        4    kappa
4                       nf-kappa b signaling pathway        5    kappa
5                           nf-kappab import nucleus        1    kappa
6                                  t cell chemotaxis        2    taxis

What I want is to split the data frame according to shortgo, and then sort the go members of each group by poscount (largest first), yielding (handcrafted):

$cytokine
[1] cytokine-cytokine receptor interaction|endocytosis
[2] cytokine-cytokine receptor interaction

$kappa
[1] nf-kappa b signaling pathway
[2] i-kappab kinase/nf-kappab signaling
[3] nf-kappab import nucleus

$taxis
[1] t cell chemotaxis

I'm stuck with this:

> split(tdf$go,tdf$shortgo)
Error in split.default(tdf$go, tdf$hsortgo) :
  group length is 0 but data length > 0

How can I go about it?
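A note on the error itself: the message refers to tdf$hsortgo rather than tdf$shortgo, which suggests that the call actually run had a transposed column name. `$` on a data frame returns NULL for a column that does not exist, so split() is left with no groups. A minimal sketch reproducing this, using a small stand-in data frame (df and its columns are made up for illustration, not taken from the question):

```r
# Small stand-in data frame; all names here are illustrative only
df <- data.frame(value = c("a", "b", "c"),
                 group = c("x", "x", "y"),
                 stringsAsFactors = FALSE)

# A misspelled column name silently yields NULL
is.null(df$gruop)

# Splitting by NULL fails; capture the error message instead of stopping
msg <- tryCatch(split(df$value, df$gruop),
                error = function(e) conditionMessage(e))
msg

# With the correct column name the split works
res <- split(df$value, df$group)
res
```

So the first thing to check when this error appears is the spelling of the grouping column.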

You can order the data frame first, before splitting:

library(dplyr)
# sort rows by poscount, largest first
tdf <- tdf %>% group_by(shortgo) %>% arrange(desc(poscount))

Then split:

ldf <- split(tdf$go, tdf$shortgo, drop=TRUE)

which gives the desired (ordered) output:

> ldf
$cytokine
[1] "cytokine-cytokine receptor interaction|endocytosis"
[2] "cytokine-cytokine receptor interaction"

$kappa
[1] "nf-kappa b signaling pathway"
[2] "i-kappab kinase/nf-kappab signaling"
[3] "nf-kappab import nucleus"

$taxis
[1] "t cell chemotaxis"
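The drop argument matters here because shortgo is a factor with 22 levels, only three of which occur in the data; without it, split() returns an empty element for every unused level. A small sketch of the difference, using a made-up vector x and factor f (both illustrative, not from the question):

```r
# Illustrative data: factor f has four levels, but only "k" and "t" occur
x <- c("a", "b", "c")
f <- factor(c("k", "k", "t"), levels = c("c", "k", "t", "w"))

with_unused    <- split(x, f)               # keeps empty groups
without_unused <- split(x, f, drop = TRUE)  # drops unused levels

length(with_unused)      # one element per level, used or not
names(without_unused)    # only the levels that actually occur
```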

If you want to split the data frame into a list of data frames, you can use:

ldf <- split(tdf, tdf$shortgo, drop=TRUE)
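Each element of such a list is a data frame holding the rows of one group, and it can be accessed by the group name. A sketch in base R with a small stand-in data frame (df, item, count and group are illustrative names, not from the question), sorting first so that every piece comes out ordered:

```r
# Stand-in data frame; "w" is an unused factor level
df <- data.frame(item  = c("a", "b", "c", "d"),
                 count = c(1, 3, 2, 5),
                 group = factor(c("x", "x", "y", "y"),
                                levels = c("w", "x", "y")),
                 stringsAsFactors = FALSE)

# Sort all rows by count, largest first, then split into data frames
sorted <- df[order(-df$count), ]
pieces <- split(sorted, sorted$group, drop = TRUE)

names(pieces)     # groups that occur in the data
pieces$x$item     # items of group "x", ordered by count
```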

A solution in base R (provided by @Henrik in the comments):

split(tdf$go[order(tdf$shortgo, -tdf$poscount)], tdf$shortgo, drop=TRUE)
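One caveat with this one-liner: it reorders the values but leaves the grouping vector in its original order, so it relies on tdf already being sorted by shortgo, as it is in the question. A sketch of a variant that reorders both, shown on a deliberately unsorted stand-in data frame (df and its columns are made up for illustration):

```r
# Stand-in data frame whose rows are NOT sorted by group
df <- data.frame(item  = c("a", "b", "c", "d"),
                 count = c(1, 5, 3, 2),
                 group = factor(c("y", "x", "y", "x")),
                 stringsAsFactors = FALSE)

# Row order: by group, then by count (largest first)
o <- order(df$group, -df$count)

# Apply the same ordering to the values and to the grouping vector
res <- split(df$item[o], df$group[o], drop = TRUE)

# Reordering only the values mixes items across groups on unsorted input
naive <- split(df$item[o], df$group, drop = TRUE)

res
```

Indexing both vectors with the same o keeps each value paired with its own group, whatever the initial row order.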