Java code reads UTF-8 text incorrectly

January 15, 2012

I'm having a problem reading UTF-8 characters in my code (running in Eclipse).

I have a text file that has a few lines in it, for example:

אך  1234

Note: there is a \t before the word, and the word should appear on the left, with the number on the right... I don't know how to reverse them here, sorry.

That is, a Hebrew word, then a number.

I need to separate the word from the number somehow. I tried this:

        BufferedReader br = new BufferedReader(new FileReader(text));
        String content;

        while ((content = br.readLine()) != null)
        {
            String delims = "[ ]+";
            String[] tokens = content.split(delims);
        }

The problem is that, for some reason, the code reads content (the first line in the file) as follows:

אך\t1234

...meaning the space isn't in the correct place.

I suppose I could tokenize the text using \t, but I'm not sure I should, since the file isn't being read correctly...

Does anyone have an idea why this happens?

Thanks :-)

I think you are matching on a space when there is actually a tab there?

You can try this:

        BufferedReader br = new BufferedReader(new FileReader(text));
        String content;

        while ((content = br.readLine()) != null)
        {
            String delims = "\\s";
            String[] tokens = content.split(delims);
        }
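For completeness, here is a rough sketch that puts the two points together. FileReader decodes with the platform default charset (on Java versions before 18), which may not be UTF-8, so the sketch opens the file with an explicit charset instead. It also splits on "\\s+" so that a tab or a run of spaces counts as one separator. It assumes Java 7 or later; the class name Utf8TabSplit, the helper names, and the temporary sample file are made up for illustration and are not from the original question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class Utf8TabSplit {

    // Splits one line on any run of whitespace (spaces, tabs, ...).
    static String[] tokenize(String line) {
        return line.trim().split("\\s+");
    }

    // Reads the file as UTF-8, regardless of the platform default charset,
    // and tokenizes every non-blank line.
    static List<String[]> readTokens(Path file) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String content;
            while ((content = br.readLine()) != null) {
                if (content.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                rows.add(tokenize(content));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: the Hebrew word, a tab, then the number.
        // Unicode escapes keep the source independent of the editor's encoding.
        String sample = "\u05D0\u05DA\t1234";

        Path file = Files.createTempFile("sample", ".txt");
        Files.write(file, (sample + "\n").getBytes(StandardCharsets.UTF_8));

        for (String[] tokens : readTokens(file)) {
            System.out.println(tokens.length + " tokens, number = " + tokens[1]);
        }

        // The original pattern only matches spaces, so a tab-separated
        // line comes back as a single token.
        System.out.println("[ ]+ gives " + sample.split("[ ]+").length + " token(s)");

        Files.delete(file);
    }
}
```

Note that the word and the number may still look swapped when printed in a console or debugger, because Hebrew is displayed right to left; the order of the tokens in the array is what matters.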
