How to split addresses (with uneven format) into various fields in R


I want to split these addresses into their respective categories (street number, street name, city, state, zip) and check whether they are the same. Can I get a basic idea of how to carry this out in R?

    company                          address
    1. a                    1 ne 1 street miami,fl 33132
    2. b                     1 1st street miami,fl 33132
    3. c                      1 ne 1st st miami,fl 33132
    4. d                     1 1st street miami,fl 33134
    5. e               100 biscayne blvd. miami,fl 33132
    6. f               100 biscayne blvd miami ,fl 33132
    7. g 100 biscayne boulevard suite 604 miami,fl 33132
    8. h     100 biscayne blvd. suite 604 miami,fl 33132
    9. i            100 n. biscayne blvd. miami,fl 33132

Try read.pattern in the gsubfn package. If the lines are in a file, replace text = lines with a character string giving the file name. This can be fragile, and you may need to adjust the regular expression once you have more data to try it out with.

    lines <- "company                          address
    1. a                    1 ne 1 street miami,fl 33132
    2. b                     1 1st street miami,fl 33132
    3. c                      1 ne 1st st miami,fl 33132
    4. d                     1 1st street miami,fl 33134
    5. e               100 biscayne blvd. miami,fl 33132
    6. f               100 biscayne blvd miami ,fl 33132
    7. g 100 biscayne boulevard suite 604 miami,fl 33132
    8. h     100 biscayne blvd. suite 604 miami,fl 33132
    9. i            100 n. biscayne blvd. miami,fl 33132"

    library(gsubfn)

    # \\S+ \\S+ * skips the row label and company; the five groups capture
    # street number, street name, city, state and zip
    df <- read.pattern(text = lines,
      pattern = "\\S+ \\S+ *(\\d+) (.*) (\\S+) ?,(\\S+) (\\d+)$",
      skip = 1,      # skip the header line
      as.is = TRUE,  # keep strings as character rather than factor
      col.names = c("no", "street", "city", "state", "zip"))

giving:

    > df
       no                       street  city state   zip
    1   1                  ne 1 street miami    fl 33132
    2   1                   1st street miami    fl 33132
    3   1                    ne 1st st miami    fl 33132
    4   1                   1st street miami    fl 33134
    5 100               biscayne blvd. miami    fl 33132
    6 100                biscayne blvd miami    fl 33132
    7 100 biscayne boulevard suite 604 miami    fl 33132
    8 100     biscayne blvd. suite 604 miami    fl 33132
    9 100            n. biscayne blvd. miami    fl 33132
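For comparison, much the same split can probably be done in base R. The following is only a sketch, assuming R 3.4.0 or later (for `strcapture` in utils) and the pattern from the answer, where `\\S` means a non-whitespace character. The names `addr`, `proto`, `parsed` and `unmatched`, and the last made-up address line, are illustrative and not part of the original answer.

```r
# Pattern from the answer: skip the row label and company, then capture
# street number, street name, city, state and zip
pattern <- "\\S+ \\S+ *(\\d+) (.*) (\\S+) ?,(\\S+) (\\d+)$"

# Three of the sample lines plus one made-up line that lacks the comma
addr <- c("2. b                     1 1st street miami,fl 33132",
          "6. f               100 biscayne blvd miami ,fl 33132",
          "7. g 100 biscayne boulevard suite 604 miami,fl 33132",
          "x. z 5 main st miami fl 33132")

# proto supplies column names and types; zip is kept as character here
proto <- data.frame(no = integer(), street = character(), city = character(),
                    state = character(), zip = character(),
                    stringsAsFactors = FALSE)

parsed <- strcapture(pattern, addr, proto, perl = TRUE)

# Lines the pattern could not match come back as a row of NA
unmatched <- addr[is.na(parsed$no)]
print(parsed)
print(unmatched)
```

Listing the unmatched lines this way is one quick check for where the regular expression needs adjusting as more data comes in.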

Here is the regular expression on its own:

    \S+ \S+ *(\d+) (.*) (\S+) ?,(\S+) (\d+)$
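To see what each piece of the expression picks up, it may help to run it against a single line. This is a minimal sketch in base R, written with uppercase `\\S` (non-whitespace) and using `regexec` with `perl = TRUE`; the variable names `x`, `m` and `fields` are illustrative only.

```r
pattern <- "\\S+ \\S+ *(\\d+) (.*) (\\S+) ?,(\\S+) (\\d+)$"
#  \\S+ \\S+ *   row label and company, then padding spaces (not captured)
#  (\\d+)        street number
#  (.*)          street name, taking as much of the line as it can
#  (\\S+) ?,     city, an optional space, then the comma
#  (\\S+)        state
#  (\\d+)$       zip at the end of the line

x <- "7. g 100 biscayne boulevard suite 604 miami,fl 33132"

# regmatches returns the full match first, then one element per group
m <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
fields <- setNames(m[-1], c("no", "street", "city", "state", "zip"))
print(fields)
```

Because the city group is a single `\S+`, it only covers one-word cities; with a two-word city the first word would end up in the street field, which is one of the places the expression may need adjusting.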
