java - Downloading online text with different encodings


I am downloading online text that can be uploaded by users, so the texts can be in UTF-8, ISO-8859-1, etc.

The problem is that I don't know which encoding the users are using. If a user has uploaded UTF-8 text it works perfectly, but if a user has uploaded ISO-8859-1 text with accents (á, é, etc.), these characters are not shown correctly.

I tried forcing the text encoding to UTF-8 with buffer.toString("UTF-8"), but that does not work in all cases.
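A minimal sketch of why forcing a single encoding fails, using only the standard String and Charset APIs. The class name MojibakeDemo and the method recode are hypothetical, chosen for illustration; the idea is simply to write text with one charset and read it back with another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what happens when bytes written in one
// charset are decoded with a different one.
final class MojibakeDemo {

    // Encodes text with the charset it was "written" in, then decodes the
    // resulting bytes with the charset the reader assumes.
    static String recode(final String text, final Charset written, final Charset readAs) {
        final byte[] bytes = text.getBytes(written);
        return new String(bytes, readAs);
    }

    // ISO-8859-1 bytes read as UTF-8: the accented byte is not valid UTF-8,
    // so the String constructor substitutes U+FFFD for it.
    static String latin1ReadAsUtf8(final String text) {
        return recode(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
    }

    // UTF-8 bytes read as ISO-8859-1: every byte maps to some character,
    // so nothing fails, but each accent turns into two wrong characters.
    static String utf8ReadAsLatin1(final String text) {
        return recode(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
    }
}
```

Plain ASCII text survives both directions unchanged, which is why the problem only shows up on accented characters.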

This is my code:

    javaUrl = new URL(urlParser.parse(textResource.getUrlStr()));
    connection = javaUrl.openConnection();
    connection.setConnectTimeout(2000);
    connection.setReadTimeout(2000);
    InputStream input = new BufferedInputStream(connection.getInputStream());
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    int nRead;
    try {
        // Copy the response body into the buffer, 1 KiB at a time.
        byte[] data = new byte[1024];
        while ((nRead = input.read(data, 0, data.length)) != -1) {
            buffer.write(data, 0, nRead);
        }
        buffer.flush();
        // Decodes with the platform default charset.
        total = buffer.toString();
    } finally {
        input.close();
        buffer.close();
    }

Since you have multiple possible encodings and don't know which one is correct, you have little choice but to use a CharsetDecoder here.

The plan:

  • open an InputStream to the connection;
  • read the contents into a byte[] array;
  • try different encodings until you find a suitable one.
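The first two steps of the plan could look roughly like the sketch below. The class name StreamBytes and the method readAll are hypothetical; the point is to keep the raw bytes and postpone any decoding until a charset has been chosen.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: drains a stream into a byte[] without decoding it.
final class StreamBytes {

    static byte[] readAll(final InputStream in) throws IOException {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        final byte[] chunk = new byte[1024];
        int n;
        // read() returns -1 once the end of the stream is reached.
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        // Return the raw bytes; no charset is involved at this point.
        return out.toByteArray();
    }
}
```

With the connection from the question, this would be called as StreamBytes.readAll(connection.getInputStream()), with the stream closed by the caller afterwards.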

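Here is one possible method to find the correct encoding:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;

    public boolean isCharset(final Charset charset, final byte[] contents)
        throws IOException
    {
        // REPORT makes the decoder throw on malformed input
        // instead of silently replacing it.
        final CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT);
        final ByteBuffer buf = ByteBuffer.wrap(contents);

        try {
            decoder.decode(buf);
            return true;
        } catch (CharacterCodingException ignored) {
            // The bytes are not valid in this charset.
            return false;
        }
    }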

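Try this with a set of different encodings (preferably starting with UTF-8). ISO-8859-1 should come last, since any byte sequence decodes without error in it and the check cannot fail for it.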
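Trying a set of encodings in order might be sketched as follows. The class name CharsetGuesser, the method decode, and the candidate list are assumptions made for illustration; the candidates should be adapted to the encodings you actually expect.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: decodes bytes with the first candidate charset
// that accepts them without error.
final class CharsetGuesser {

    // Assumed candidate order: strict encodings first. ISO-8859-1 goes last
    // because it accepts every byte sequence.
    static final List<Charset> CANDIDATES =
        Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

    static String decode(final byte[] contents) {
        for (final Charset charset : CANDIDATES) {
            try {
                return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(contents))
                    .toString();
            } catch (CharacterCodingException ignored) {
                // Not valid in this charset; fall through to the next one.
            }
        }
        throw new IllegalArgumentException("no candidate charset could decode the input");
    }
}
```

This is a heuristic, not a guarantee: ISO-8859-1 text whose bytes happen to form valid UTF-8 would be decoded as UTF-8.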

