Java code reads UTF-8 text incorrectly
I'm having a problem reading UTF-8 characters in my code (running in Eclipse).
I have a file, text, that has a few lines in it, for example:
אך 1234
Note: there is a \t before the word, and the word should appear on the left with the number on the right. I don't know how to reverse them here, sorry.
That is, each line is a Hebrew word and then a number.
I need to separate the word from the number somehow. I tried this:
    BufferedReader br = new BufferedReader(new FileReader(text));
    String content;
    while ((content = br.readLine()) != null) {
        String delims = "[ ]+";
        String[] tokens = content.split(delims);
    }
The problem is that, for some reason, the code reads content (the first line in the file) as follows:
אך\t1234
...meaning the space isn't in the correct place.
I suppose I could tokenize the text using \t, but I'm not sure I should, since the file isn't being read correctly...
Does anyone have an idea why this happens?
Thanks :-)
I think you are matching a space when there is a tab there?
Can you try this:
    BufferedReader br = new BufferedReader(new FileReader(text));
    String content;
    while ((content = br.readLine()) != null) {
        // \s matches any single whitespace character, including a tab
        String delims = "\\s";
        String[] tokens = content.split(delims);
    }
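If the characters themselves also come out wrong, the charset may be worth checking too. On Java versions before 18, FileReader decodes with the platform default charset, which is not necessarily UTF-8. Here is a rough sketch that reads the file explicitly as UTF-8 and splits on any run of whitespace. The class name Utf8TabSplit and the helpers splitLine and readTokens are made up for illustration, and it assumes each line is a word, some whitespace, then a number:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not from the original post.
final class Utf8TabSplit {

    // Splits one line on runs of whitespace (spaces, tabs, or a mix).
    static String[] splitLine(String line) {
        return line.trim().split("\\s+");
    }

    // Reads the file as UTF-8 regardless of the platform default charset
    // and returns the tokens of every non-blank line.
    static List<String[]> readTokens(Path file) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String content;
            while ((content = br.readLine()) != null) {
                if (content.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                rows.add(splitLine(content));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path to the text file.
        for (String[] tokens : readTokens(Paths.get(args[0]))) {
            System.out.println(String.join(" | ", tokens));
        }
    }
}
```

With this, tokens[0] should hold the Hebrew word and tokens[1] the number for a line like the one in the question. Note that the order in which a console or debugger draws mixed Hebrew and digits can differ from the order of characters in the String, so it may be safer to check tokens by index than by how the line looks on screen.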