    car    100 200 300
    group1  34  35  34
    group1  57  67  34
    group1  68  76   6
    group2  45  23  23
I have problems detecting outliers in my data frame. For each group, I want to detect whether a complete vector (one row) is an outlier relative to the other vectors of the corresponding group (rows one to three). Further, I want to detect whether there is an outlier within one specific row. The problem is that with the solution I found, I have to repeat the whole code for every single row and check the table for "TRUE". Is any automation possible? E.g. creating a matrix of outputs, so that I only have to check sum(matrix == TRUE).

The code:
    library(outliers)

    x <- as.numeric(data_without[1, 1:400])

    grubbs.flag <- function(x) {
      outliers <- NULL
      test <- x
      grubbs.result <- grubbs.test(test)
      pv <- grubbs.result$p.value
      # Keep removing the most extreme value while the test is significant
      while (pv < 0.05) {
        # The third word of the alternative string is the suspected value
        outliers <- c(outliers, as.numeric(strsplit(grubbs.result$alternative, " ")[[1]][3]))
        test <- x[!x %in% outliers]
        grubbs.result <- grubbs.test(test)
        pv <- grubbs.result$p.value
      }
      return(data.frame(x = x, outlier = (x %in% outliers)))
    }

    grubbs.flag(x)

           x outlier
    1 0.1157   FALSE
    2 0.1152   FALSE
    3 0.1163   FALSE
    4 0.1165   FALSE
I've read the documentation, and the default option checks whether there is a single outlier in the given data. I therefore consider that it suffices to run the test once per group.
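For intuition about what a single-outlier check measures, here is a minimal base-R sketch of the Grubbs statistic: the distance of the most extreme value from the mean, divided by the sample standard deviation. The helper name `grubbs_stat` is hypothetical and not part of the outliers package; it only computes the statistic, not a p-value.

```r
# Hand-rolled Grubbs statistic for one suspected outlier (base R only).
# grubbs_stat is a hypothetical helper, not a function from the outliers package.
grubbs_stat <- function(x) {
  x <- x[!is.na(x)]
  dev <- abs(x - mean(x))   # distance of each value from the mean
  i <- which.max(dev)       # position of the most extreme value
  list(G = dev[i] / sd(x), suspect = x[i])
}

res <- grubbs_stat(c(45, 23, 23))
res$G        # about 1.1547, the largest G possible for n = 3
res$suspect  # 45
```

With only three values, G is capped at (n - 1) / sqrt(n), so results on groups this small should be read with care.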
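First the data is split by group, and then the test is run on each group in turn. The p-value and the description are returned at the end, so you can see which value is the outlier, if there is one. It is easy to identify the outlier, since it will be either the maximum or the minimum value.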
    library(outliers)

    df <- t(data.frame(car = c(100, 200, 300),
                       g1 = c(34, 35, 34),
                       g1 = c(57, 67, 34),
                       g1 = c(68, 76, 6),
                       g2 = c(45, 23, 23)))
    row.names(df) <- c("car", "group1", "group1", "group1", "group2")

    # Split the rows by group name
    lst <- lapply(1:length(unique(row.names(df))), function(x) {
      df[row.names(df) == unique(row.names(df))[x], ]
    })
    lst

    [[1]]
    [1] 100 200 300

    [[2]]
           [,1] [,2] [,3]
    group1   34   35   34
    group1   57   67   34
    group1   68   76    6

    [[3]]
    [1] 45 23 23

    # Run one Grubbs test per group; return the p-value and the description
    lapply(lst, function(x) {
      tst <- grubbs.test(x)
      c(tst$p.value, tst$alternative)
    })

    [[1]]
    [1] "0.5"                             "highest value 300 is an outlier"

    [[2]]
    [1] "0.244875529263511"               "lowest value 6 is an outlier"

    [[3]]
    [1] "0"                               "highest value 45 is an outlier"
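To get the matrix of flags asked for in the question, one possible sketch is to wrap a single Grubbs test in a function and `apply()` it over the rows. The helper `flag_one`, the small matrix `m`, and the 0.05 cutoff (taken from the question's loop) are assumptions for illustration. Three values per row are far too few for a meaningful test; the example only shows the mechanics.

```r
library(outliers)

# Hypothetical helper: flag at most one value per vector with a single Grubbs test.
flag_one <- function(x, alpha = 0.05) {
  res <- grubbs.test(x)
  flagged <- rep(FALSE, length(x))
  if (res$p.value < alpha) {
    # The description starts with "lowest" or "highest", so the suspect
    # is the minimum or the maximum (first occurrence if there are ties).
    idx <- if (grepl("^lowest", res$alternative)) which.min(x) else which.max(x)
    flagged[idx] <- TRUE
  }
  flagged
}

# Illustrative rows taken from the example data
m <- rbind(c(34, 35, 34), c(57, 67, 34), c(45, 23, 23))

# apply() returns one column per row of m, so transpose to match m's layout
flags <- t(apply(m, 1, flag_one))
flags
sum(flags)      # total number of flagged values
rowSums(flags)  # flagged values per row
```

Because `flags` has the same shape as the input, `which(flags, arr.ind = TRUE)` gives the row and column of every flagged value without inspecting each row by hand.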