How to auto-detect the encoding of an SRT subtitle file


I have a product that is weaker than a competitor's at auto-detecting the encoding of SRT subtitle files. It can auto-detect the encoding of SMI files, since they carry language info in the header, but it cannot do that for SRT. How can I add auto-detection for SRT files? Any references or an example algorithm I could learn from as a first step would be appreciated. FYI, the product should support Western European, Central European, Cyrillic, Greek, Turkish, Hebrew, Arabic, Baltic, Korean, Simplified Chinese, Traditional Chinese, Vietnamese, and Thai.

There are plenty of tools that detect the charset of a text file (SRT files included). For example, on the command line of a Linux machine you can use chardet:

chardet subtitle_file_name.srt

This utility can be installed with pip (the Python package installer). On Ubuntu:

sudo apt-get install python-pip
pip install chardet

If you need to integrate the detector into your application, there are open-source libraries that do the job. For example, the tool DualSub, which is implemented in Java, uses juniversalchardet.
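As a possible first step toward doing this inside an application, here is a minimal Python sketch. It is not how chardet or juniversalchardet work internally. It checks for a byte-order mark, then tries a strict UTF-8 decode, and only then asks chardet for a statistical guess if the package happens to be installed. The function names `detect_srt_encoding` and `read_srt` and the `cp1252` fallback are assumptions made for illustration.

```python
import codecs

# Byte-order marks, UTF-32 first so it is not mistaken for UTF-16
# (the UTF-32 LE mark starts with the UTF-16 LE mark).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_srt_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return a best-guess Python codec name for raw subtitle bytes."""
    # Step 1: a byte-order mark, when present, settles the question.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # Step 2: text in a legacy code page rarely forms valid UTF-8,
    # so a clean strict decode is strong evidence for UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Step 3: statistical guess for legacy code pages, only if the
    # optional chardet package is available.
    try:
        import chardet
    except ImportError:
        return fallback  # assumed default; choose one that fits your users
    guess = chardet.detect(data)["encoding"]
    return guess or fallback


def read_srt(path: str) -> str:
    """Read a subtitle file and decode it with the detected encoding."""
    with open(path, "rb") as f:
        raw = f.read()
    # errors="replace" keeps a wrong guess from raising mid-file.
    return raw.decode(detect_srt_encoding(raw), errors="replace")
```

Calling `read_srt("subtitle_file_name.srt")` would return the decoded text. The statistical step is only a guess, and single-byte code pages such as the Western European, Central European, and Baltic ones are easy to confuse, so letting the user override the result is probably worthwhile.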

