xml - R: xmlParse incorrectly "adding" extra " AND " to URL link, parsing fails


I am attempting to parse XML output from NIH's PubMed system. I have generated the URLs to parse, but the xmlParse() function appears to be adding " AND " text to URLs that contain operators.

For example:

    url <- 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=smith+m[author]+AND+science[journal]'
    di <- xmlParse(url)
    dl <- xmlToList(di)

This results in a NULL IdList (where the results should be):

    > dl[["IdList"]]
    NULL

Checking QueryTranslation reveals the problem (note the extra " AND "):

    > dl[["QueryTranslation"]]
    [1] "smith+m[author] AND +AND+science[journal]"

Any idea what's going on there? It occurs with every search field or type of query construct that has an operator such as "AND" or "OR".

For reference, a clean parse finds 20 papers:

    > url <- 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=smith+bm[author]'
    > di <- xmlParse(url)
    > dl <- xmlToList(di)
    > length(dl[["IdList"]])
    [1] 20

Assuming you want to do this from scratch instead of using the package mentioned above:

Use httr first to retrieve the payload, since it doesn't mess with the URL:

    library("XML")
    library("httr")

    url <- 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=smith+m[author]+AND+science[journal]'

    # Fetch the response body with httr, then parse the text rather than the URL
    res <- GET(url)
    di <- xmlParse(content(res, "text"))
    dl <- xmlToList(di)

    unname(unlist(dl[["IdList"]]))
    ##  [1] "25745065" "25430773" "25395526" "25104368" "24458648" "24264993" "24052300" "23869013"
    ##  [9] "23363771" "22936773" "22116878" "21940895" "21330515" "21097923" "20966241" "20150469"
    ## [17] "19407144" "19150811" "19119232" "19119226"
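If the query strings are generated programmatically, it may also help to build the URL from a plain-text term and percent-encode it explicitly, so that spaces, brackets, and operators are unambiguous. The following is only a sketch using base R: `build_esearch_url` is a hypothetical helper name, not part of the XML or httr packages, and whether the encoded form changes how xmlParse() treats the URL has not been verified here.

```r
# Hypothetical helper (not from the original post): build an ESearch URL
# from a plain-text query, percent-encoding the search term.
build_esearch_url <- function(term, db = "pubmed") {
  base <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  # reserved = TRUE also encodes characters such as " ", "[" and "]"
  encoded_term <- utils::URLencode(term, reserved = TRUE)
  paste0(base, "?db=", db, "&term=", encoded_term)
}

url <- build_esearch_url("smith m[author] AND science[journal]")
print(url)
```

The resulting `url` could then be passed to GET() and parsed from the response text in the same way as above.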