I have a class Main:

    import java.io.IOException;
    import java.util.List;

    public class Main {
        // args[0] - path to the file with the first and last words
        // args[1] - path to the dictionary file
        public static void main(String[] args) {
            try {
                List<String> firstLastWords = FileParser.getWords(args[0]);
                System.out.println(firstLastWords);
                System.out.println(firstLastWords.get(0).length());
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }

and a class FileParser:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    public class FileParser {
        public FileParser() { }

        final static Charset ENCODING = StandardCharsets.UTF_8;

        public static List<String> getWords(String filePath) throws IOException {
            List<String> list = new ArrayList<String>();
            Path path = Paths.get(filePath);
            // try-with-resources closes the reader automatically
            try (BufferedReader reader = Files.newBufferedReader(path, ENCODING)) {
                String line = null;
                while ((line = reader.readLine()) != null) {
                    // remove all whitespace, then skip lines that end up empty
                    String line1 = line.replaceAll("\\s+", "");
                    if (!line1.equals("") && !line1.equals(" ")) {
                        list.add(line1);
                    }
                }
            }
            return list;
        }
    }
args[0] is the path to a text file containing two words. If the file contains:
тор кит
the program prints:
[тор, кит]
4
If the file contains:
т тор кит
the program prints:
[т, тор, кит]
2
If the file contains (the first line is empty):

тор
кит
the program prints:
[, тор, кит]
1
where the number is the length of the first string in the list.
So the question is: why does it count one extra character?
Thanks, all.
As @bill said, this character is the BOM (http://en.wikipedia.org/wiki/byte_order_mark), which sits at the beginning of the text file. I found the character with this line:

    System.out.println(((int) firstLastWords.get(0).charAt(0)));

It gave me 65279, which is U+FEFF.
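To illustrate why the length is off by one, here is a minimal sketch that mimics a file saved as UTF-8 with a BOM. The class and method names (`BomLengthDemo`, `decodeWithBom`) are made up for this example; it relies on the fact that Java's UTF-8 decoder keeps the BOM as the character U+FEFF, and that `\s` does not match it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original program.
class BomLengthDemo {
    // Prepends the UTF-8 BOM bytes (EF BB BF) to the text and decodes the result,
    // the way a reader would see the first line of a file saved "with BOM".
    static String decodeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] bytes = new byte[body.length + 3];
        bytes[0] = (byte) 0xEF;
        bytes[1] = (byte) 0xBB;
        bytes[2] = (byte) 0xBF;
        System.arraycopy(body, 0, bytes, 3, body.length);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u0442\u043E\u0440" is the word "тор", written with escapes
        // so the example does not depend on the source file encoding.
        String first = decodeWithBom("\u0442\u043E\u0440");
        System.out.println(first.length());                        // 4, not 3
        System.out.println((int) first.charAt(0));                 // 65279
        System.out.println(first.replaceAll("\\s+", "").length()); // still 4
    }
}
```

The last line shows that the whitespace regex leaves the BOM in place, which is why an "empty" first line still ends up in the list with length 1.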
Then I changed this line:

    String line1 = line.replaceAll("\\s+", "");

to this:

    String line1 = line.replaceAll("\uFEFF", "");
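Note that this replacement removes the BOM but no longer strips whitespace, which the earlier line did. If both are wanted, one possible approach is sketched below; `LineCleaner` and `clean` are hypothetical names, and the sketch assumes the BOM only needs to be removed at the start of a line.

```java
// Hypothetical helper, not part of the original program.
class LineCleaner {
    // Drops a leading BOM (U+FEFF) if present, then removes all whitespace.
    static String clean(String line) {
        if (line.startsWith("\uFEFF")) {
            line = line.substring(1);
        }
        return line.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        // "\u0442\u043E\u0440" is the word "тор" written with escapes.
        String cleaned = clean("\uFEFF \u0442\u043E\u0440 ");
        System.out.println(cleaned.length()); // 3
    }
}
```

In `getWords`, the call `clean(line)` could stand in for the `replaceAll` call, so that a first line containing only the BOM becomes empty and is skipped by the existing check.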