java - What is the correct regular expression for finding specific pattern in these lines?


So I have one large file that contains a bunch of weather data. I have to allocate each line from the large file into its corresponding state file. There will be a total of 50 new state files, each with its own data.

The large file contains ~1 million lines of records like this:

COOP:166657,'NEW IBERIA AIRPORT ACADIANA REGIONAL LA US',200001,177,553

The name of the station can vary, though, and can have a different number of words.

This is the regular expression I am currently using:

Pattern p = Pattern.compile(".* ([A-Z][A-Z]) US.*");
Matcher m = p.matcher(line);

When I run my program, there are still instances of lines in which the pattern is not found.

This is the program:

package climate;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Program to read in a large file containing many stations and states,
 * and output in order the stations to their corresponding state file.
 *
 * Note: This may take a long time depending on your processor. It also appends
 * data to the files, so you must remove the state files in the current
 * directory before running, for accuracy.
 *
 * @author marcus
 *
 */
public class climatecleanstates {

    public static void main(String[] args) throws IOException {

        Scanner in = new Scanner(System.in);
        System.out
                .println("Note: This program can take a long time depending on processor.");
        System.out
                .println("It is not necessary to run this as the state files are in the directory.");
        System.out
                .println("But if you would like to see how it works, you may continue.");
        System.out.println("Please remove the state files before running.");
        System.out.println("\nIs the states directory empty?");
        String answer = in.nextLine();

        if (answer.equals("n")) {
            // close the scanner before exiting; code after System.exit never runs
            in.close();
            System.exit(0);
        }
        System.out.println("Would you like to run the program?");
        String answer2 = in.nextLine();
        if (answer2.equals("n")) {
            in.close();
            System.exit(0);
        }

        String[] statesspaced = new String[51];

        File statefile, dir, infile;

        // create files for each of the states
        dir = new File("states");
        dir.mkdir();

        infile = new File("climatedata.csv");
        FileReader fr = new FileReader(infile);
        BufferedReader br = new BufferedReader(fr);

        String line;
        System.out.println();

        // read in climatedata.csv
        // need to implement a climaterecord class
        final long start = System.currentTimeMillis();
        while ((line = br.readLine()) != null) {
            // remove instances of -9999
            if (!line.contains("-9999")) {

                Pattern p = Pattern.compile("^.* ([A-Z][A-Z]) US.*$");
                Matcher m = p.matcher(line);
                String statefilename = null;

                if (m.find()) {
                    //System.out.println(m.group(1));
                    statefilename = m.group(1);
                } else {
                    System.out.println("Could not find abbreviation");
                }

                /*
                statefilename = "states/" + statefilename + ".csv";
                statefile = new File(statefilename);

                FileWriter statewriter = new FileWriter(statefile, true);
                statewriter.write(line + "\n");
                // progress reporting
                System.out.printf("Writing [%s] to file [%s]\n", line,
                        statefile);
                statewriter.flush();
                statewriter.close();
                */

            }
        }
        System.out.println("Elapsed " + (System.currentTimeMillis() - start) + " ms");
        br.close();
        fr.close();
        in.close();

    }

}

I think you need lookaround functions, which assert what should precede or follow the expression you're matching but are not included in the result.

(?<= )[A-Z][A-Z](?= US)

(?<= ) there must be a space before

[A-Z][A-Z] two capital letters

(?= US) there must be a space and these letters after
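As a rough illustration of the breakdown above, here is a minimal sketch of using the lookaround pattern from Java. The class name `LookaroundDemo` and the helper `extractState` are made up for this example, and it assumes the records are upper case, as the "two capital letters" note implies. Because the lookarounds are not part of the match, there is no capturing group, so `m.group()` is used rather than `m.group(1)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original program.
class LookaroundDemo {
    // Compiled once and reused. The lookbehind and lookahead only check
    // the surrounding text; they are not included in the match itself.
    private static final Pattern STATE =
            Pattern.compile("(?<= )[A-Z][A-Z](?= US)");

    /** Returns the first two-letter code followed by " US", or null if none is found. */
    static String extractState(String line) {
        Matcher m = STATE.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "COOP:166657,'NEW IBERIA AIRPORT ACADIANA REGIONAL LA US',200001,177,553";
        System.out.println(extractState(line)); // prints "LA"
    }
}
```

Compiling the pattern once in a static field, instead of inside the read loop, also avoids recompiling it for every one of the ~1 million lines.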

It might pay to be more robust with the lookaround: (?= US) could become (?= US',) for instance.
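To see why the stricter lookahead can matter, here is a small sketch comparing the two patterns. The second record is invented for illustration (a station name that happens to contain another two-letter word followed by " US"); it is not taken from the real data file, and the class and method names are likewise hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the "tricky" record below is made-up sample data.
class RobustLookaheadDemo {
    // Loose version: any two capitals followed by " US".
    static final Pattern LOOSE = Pattern.compile("(?<= )[A-Z][A-Z](?= US)");
    // Stricter version: " US" must also be followed by the closing quote and comma.
    static final Pattern STRICT = Pattern.compile("(?<= )[A-Z][A-Z](?= US',)");

    /** Returns the first match of the given pattern in the line, or null if there is none. */
    static String firstMatch(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String tricky = "COOP:000000,'BATON ROUGE NW US 190 LA US',200001,177,553";
        System.out.println(firstMatch(LOOSE, tricky));  // prints "NW"
        System.out.println(firstMatch(STRICT, tricky)); // prints "LA"
    }
}
```

Anchoring on the closing `',` ties the match to the end of the quoted station name, so an earlier " XX US" inside the name is skipped.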