Character encoding - HTML files with no http-equiv meta tag, where the charset may be other than UTF-8


We're using jsoup - excellent, thanks.

We may have HTML files with no http-equiv meta tag, and the charset may be other than UTF-8. How is this best handled, please? We could keep a list of encodings and try each one, but I'm not sure how to tell programmatically when one is wrong. Does jsoup throw an IOException?

jsoup tries to determine the encoding from the Content-Type header or the http-equiv meta tag. If it finds neither, it uses UTF-8. I'm not sure jsoup can do more here.
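On the IOException question: as far as I know, jsoup decodes leniently, so a wrong charset gives you garbled text or U+FFFD replacement characters rather than an exception (an IOException from `Jsoup.parse(File, String)` would point to an I/O problem such as a missing file). The sketch below uses only JDK classes to show the difference between lenient and strict decoding. Strict decoding is what lets you notice programmatically that a charset is wrong. The class name `DecodingModes` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodingModes {
    // Lenient: bytes that are not valid UTF-8 become U+FFFD, no exception.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: the same bytes raise a CharacterCodingException instead.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, saved as ISO-8859-1: the 0xE9 byte is not valid UTF-8 here.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeLenient(latin1).endsWith("\uFFFD")); // true
        try {
            decodeStrict(latin1);
        } catch (CharacterCodingException e) {
            System.out.println("strict decoding rejected the input: " + e);
        }
    }
}
```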

But you can try this approach:

Implement a class that reads the files for you. That class is where you deal with the encoding issues. It should return either a properly decoded string or at least the encoding the input uses.

(html input) --> [encoding class] --normalized encoding--> [jsoup] --> (whatever)    

jsoup can then parse the input with the known encoding.
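The encoding class described above could look roughly like the following sketch. It assumes the whole file fits in memory, checks for a byte order mark first, then tries a hypothetical ordered candidate list with strict decoding and returns the first charset that decodes without errors. `EncodingGuesser`, `CANDIDATES` and the choice of charsets are illustrative assumptions, not part of jsoup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical "encoding class": guesses the charset of raw HTML bytes
// so the text can be handed to jsoup already decoded.
final class EncodingGuesser {
    // Candidates in priority order (an assumption; adjust to your files).
    // ISO-8859-1 goes last because every byte sequence is valid in it.
    static final List<Charset> CANDIDATES = List.of(
            StandardCharsets.UTF_8,
            Charset.forName("windows-1252"),
            StandardCharsets.ISO_8859_1);

    static Charset guess(byte[] bytes) {
        // 1. A byte order mark is an explicit signal, so trust it first.
        if (startsWith(bytes, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(bytes, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(bytes, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        // 2. Otherwise take the first candidate that decodes without errors.
        for (Charset cs : CANDIDATES) {
            if (decodesCleanly(bytes, cs)) return cs;
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Decodes with the guessed charset and drops a leading BOM character.
    static String decode(byte[] bytes) {
        String text = new String(bytes, guess(bytes));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    // Strict decoding: any malformed or unmappable byte makes this return false.
    private static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Compares the first bytes of data against the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

The decoded string can then go straight to `Jsoup.parse(html)`, or the guessed charset name can be passed as the second argument of `Jsoup.parse(file, charsetName)`. Keep in mind that a clean strict decode only shows the bytes are valid in that charset, not that it is the intended one. A UTF-8 success is a fairly strong signal, but windows-1252, ISO-8859-1 and similar single-byte encodings cannot be told apart this way.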

I guess changing the side that creates the HTML isn't possible, is it?

Some further reading:

