How to auto-detect the encoding of an SRT subtitle file
I have a product that is weaker than a competitor's at auto-detecting the encoding of SRT subtitle files. It can auto-detect the encoding of SMI files, since they carry language info in the header. SRT files have no such header, so that approach does not work. How can I apply auto-detection to SRT files? Any references or an example algorithm I can learn from as a first step would be appreciated. FYI, the product should support Western European, Central European, Cyrillic, Greek, Turkish, Hebrew, Arabic, Baltic, Korean, Simplified Chinese, Traditional Chinese, Vietnamese, and Thai.
There are plenty of tools that detect the charset of a text file (e.g. SRT files). For example, on the command line of a Linux machine you can use chardet:
chardet subtitle_file_name.srt
This utility should be installed with pip (the Python package installer). On Ubuntu:
sudo apt-get install python-pip
pip install chardet
If you need to integrate a detector into your application, there are open-source libraries for the job. For example, the tool dualsub, implemented in Java, uses juniversalchardet.
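As a rough first step toward an algorithm, a detector can be sketched in Python around the same chardet package. This is only a minimal sketch, not how dualsub or juniversalchardet work: the function name guess_srt_encoding and the cp1252 fallback are assumptions made for illustration. It checks for a byte order mark, then tries a strict UTF-8 decode, and only then asks chardet (if it is installed) for a statistical guess.

```python
import codecs

# Byte order marks, with the 4-byte UTF-32 marks listed before the
# 2-byte UTF-16 marks so that UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_srt_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return a codec name for raw subtitle bytes (hypothetical helper)."""
    # 1. A byte order mark identifies the encoding unambiguously.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Text in a legacy 8-bit or double-byte encoding rarely happens
    #    to be valid UTF-8, so a strict decode is a strong signal.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 3. Otherwise use chardet's statistical guess when it is available.
    try:
        import chardet
    except ImportError:
        return fallback  # assumption: cp1252 as a last resort
    result = chardet.detect(data)
    return result["encoding"] or fallback


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(guess_srt_encoding(f.read()))
```

Steps 1 and 2 are cheap and reliable. The statistical guess in step 3 is the uncertain part, especially for short files or closely related single-byte code pages, so letting the user override the result is a sensible safeguard.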