I am attempting to parse the XML output of NIH's PubMed system. I have generated URLs to parse, but the xmlParse() function appears to be adding " , " to the text when the URLs contain operators.
For example:
    library("XML")
    url <- 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=smith+m[author]+and+science[journal]'
    di <- xmlParse(url)
    dl <- xmlToList(di)

This results in NULL for the IdList (where the results should be):

    > dl[["IdList"]]
    NULL
Checking the QueryTranslation reveals the problem (see the "and"):

    > dl[["QueryTranslation"]]
    [1] "smith+m[author] , +and+science[journal]"
Any idea what's going on here? It happens for every search field or type of query construct that has an operator such as "and" or "or".
For reference, a query without an operator parses cleanly and finds 20 papers:
    > url <- 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=smith+bm[author]'
    > di <- xmlParse(url)
    > dl <- xmlToList(di)
    > length(dl[["IdList"]])
    [1] 20
Assuming you want to do this from scratch instead of using the package mentioned above:
Use httr.
First retrieve the payload with httr, which doesn't mess with the URL, and then parse the returned text:
    library("XML")
    library("httr")
    url <- 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=smith+m[author]+and+science[journal]'
    res <- GET(url)                                  # httr::GET, not base::get
    di <- xmlParse(content(res, "text"), asText = TRUE)  # parse the downloaded text, not the URL
    dl <- xmlToList(di)
    unname(unlist(dl[["IdList"]]))
     [1] "25745065" "25430773" "25395526" "25104368" "24458648" "24264993" "24052300" "23869013"
     [9] "23363771" "22936773" "22116878" "21940895" "21330515" "21097923" "20966241" "20150469"
    [17] "19407144" "19150811" "19119232" "19119226"
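A variant of the same idea, in case hand-writing the `+` separators gets fiddly: let httr build and percent-encode the query string from a plain search term. This is a sketch, assuming E-utilities accepts the `%20`-encoded term that httr's `modify_url()` produces; the helper names `esearch_url` and `esearch_ids` are hypothetical, not part of httr or XML.

```r
library("XML")
library("httr")

# Hypothetical helper: build the esearch URL, letting httr percent-encode
# spaces and brackets in the search term instead of writing "+" by hand.
esearch_url <- function(term) {
  modify_url("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
             query = list(db = "pubmed", term = term))
}

# Hypothetical helper: fetch with httr, parse the returned text with XML,
# and return the PubMed IDs from <IdList> as a character vector.
esearch_ids <- function(term) {
  res <- GET(esearch_url(term))
  stop_for_status(res)  # fail loudly on HTTP errors
  doc <- xmlParse(content(res, "text", encoding = "UTF-8"), asText = TRUE)
  unname(unlist(xmlToList(doc)[["IdList"]]))
}

# Usage (needs network access):
# ids <- esearch_ids("smith m[author] AND science[journal]")
```

The design choice is to keep the term human-readable and push all URL encoding into one place, so operators like AND/OR never have to be spliced into the URL by hand.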