I want to split these addresses into their respective categories (street number, street name, city, state, zip) and check whether they are the same. Can I get a basic idea of how to carry this out in R?
    company address
    1. a 1 ne 1 street miami,fl 33132
    2. b 1 1st street miami,fl 33132
    3. c 1 ne 1st st miami,fl 33132
    4. d 1 1st street miami,fl 33134
    5. e 100 biscayne blvd. miami,fl 33132
    6. f 100 biscayne blvd miami ,fl 33132
    7. g 100 biscayne boulevard suite 604 miami,fl 33132
    8. h 100 biscayne blvd. suite 604 miami,fl 33132
    9. i 100 n. biscayne blvd. miami,fl 33132
Try read.pattern in the gsubfn package. If the lines are in a file, replace text = lines with a character string giving the file name. This can be fragile, and you may need to adjust the regular expression once you have more data to try it out with.
    # one record per line; the first line is a header
    lines <- "company address
    1. a 1 ne 1 street miami,fl 33132
    2. b 1 1st street miami,fl 33132
    3. c 1 ne 1st st miami,fl 33132
    4. d 1 1st street miami,fl 33134
    5. e 100 biscayne blvd. miami,fl 33132
    6. f 100 biscayne blvd miami ,fl 33132
    7. g 100 biscayne boulevard suite 604 miami,fl 33132
    8. h 100 biscayne blvd. suite 604 miami,fl 33132
    9. i 100 n. biscayne blvd. miami,fl 33132"

    library(gsubfn)

    # each parenthesized group in the pattern becomes one column;
    # skip = 1 drops the header line
    df <- read.pattern(text = lines,
                       pattern = "\\S+ \\S+ *(\\d+) (.*) (\\S+) ?,(\\S+) (\\d+)$",
                       skip = 1, as.is = TRUE,
                       col.names = c("no", "street", "city", "state", "zip"))
giving:
    > df
       no                       street  city state   zip
    1   1                  ne 1 street miami    fl 33132
    2   1                   1st street miami    fl 33132
    3   1                    ne 1st st miami    fl 33132
    4   1                   1st street miami    fl 33134
    5 100               biscayne blvd. miami    fl 33132
    6 100                biscayne blvd miami    fl 33132
    7 100 biscayne boulevard suite 604 miami    fl 33132
    8 100     biscayne blvd. suite 604 miami    fl 33132
    9 100            n. biscayne blvd. miami    fl 33132
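The question also asks how to check whether two addresses are the same. One possible way, offered only as a rough sketch: normalize the street text and then compare the combined fields. The helper normalize_street, the tiny abbreviation list (blvd, st) and the sample data frame parsed are assumptions made for illustration; real data would need a much longer list and probably special handling for suite numbers and directions such as "n." or "ne".

```r
# Illustrative rows in the same shape as the parsed output above.
parsed <- data.frame(
  no     = c(100, 100, 100),
  street = c("biscayne blvd.", "biscayne blvd", "biscayne boulevard suite 604"),
  city   = "miami", state = "fl", zip = c(33132, 33132, 33132),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case, drop periods, expand two abbreviations,
# and collapse repeated whitespace.
normalize_street <- function(x) {
  x <- tolower(x)
  x <- gsub(".", "", x, fixed = TRUE)
  x <- gsub("\\bblvd\\b", "boulevard", x, perl = TRUE)
  x <- gsub("\\bst\\b", "street", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))
}

# Build one comparison key per row; duplicated() flags rows whose key
# already appeared in an earlier row.
key <- paste(parsed$no, normalize_street(parsed$street),
             parsed$city, parsed$state, parsed$zip)
parsed$same_as_earlier <- duplicated(key)
parsed
```

Here the second row is flagged as a repeat of the first, while the row with the suite number is treated as a different address. Whether that is the right call depends on what "same" should mean for your data.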
Here is the regular expression, without the R string escaping:

    \S+ \S+ *(\d+) (.*) (\S+) ?,(\S+) (\d+)$
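To see which part of a line each capture group picks up, the same pattern can be tried in base R with regexec and regmatches. This is only a sketch on two of the sample lines, not a replacement for read.pattern; the names addr and parts are illustrative.

```r
# Two sample lines, including the one with a space before the comma.
addr <- c("1. a 1 ne 1 street miami,fl 33132",
          "6. f 100 biscayne blvd miami ,fl 33132")

# Same pattern as above: two leading tokens (row label and company) are
# skipped, then number, street, city, state and zip are captured.
pattern <- "\\S+ \\S+ *(\\d+) (.*) (\\S+) ?,(\\S+) (\\d+)$"

# For each line, regmatches() returns the full match followed by the groups.
m <- regmatches(addr, regexec(pattern, addr, perl = TRUE))

# Drop the full match and bind the five groups into a data frame.
# Note: a line that does not match yields character(0) and would need
# separate handling before this step.
parts <- as.data.frame(do.call(rbind, lapply(m, `[`, -1)),
                       stringsAsFactors = FALSE)
names(parts) <- c("no", "street", "city", "state", "zip")
parts
```

Unlike read.pattern with as.is = TRUE, every column here stays character, so no and zip would need as.numeric if numbers are wanted.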