I am downloading text online that can be uploaded by users, so the texts can be in UTF-8, ISO-8859-1, etc.
The problem is that I don't know which encoding the users are using. If a user has uploaded UTF-8 text, it works perfectly, but if a user has uploaded ISO-8859-1 text, accented characters (á, é, etc.) are not shown correctly.
I tried forcing the text encoding to UTF-8 (buffer.toString("UTF-8")), but that does not work in all cases.
This is my code:
    URL javaUrl = new URL(UrlParser.parse(textResource.getUrlStr()));
    URLConnection connection = javaUrl.openConnection();
    connection.setConnectTimeout(2000);
    connection.setReadTimeout(2000);
    InputStream input = new BufferedInputStream(connection.getInputStream());
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    int nRead;
    String total;
    try {
        byte[] data = new byte[1024];
        while ((nRead = input.read(data, 0, data.length)) != -1) {
            buffer.write(data, 0, nRead);
        }
        buffer.flush();
        // toString() with no argument decodes using the platform default charset
        total = buffer.toString();
    } finally {
        input.close();
        buffer.close();
    }
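To illustrate why only the ISO-8859-1 uploads break, here is a minimal sketch (the class and method names are made up for illustration) that encodes "á é" as ISO-8859-1 and then decodes the same bytes once as UTF-8 and once as ISO-8859-1. In ISO-8859-1, "á" is the single byte 0xE1, which is not a complete UTF-8 sequence, so the UTF-8 decode substitutes the replacement character U+FFFD. Forcing UTF-8 therefore cannot repair those bytes; you have to decode them with the encoding they were written in.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes decoded with two different charsets.
class EncodingMismatchDemo {
    // "á" becomes 0xE1 and "é" becomes 0xE9 in ISO-8859-1.
    static byte[] latin1Bytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // new String(..., UTF_8) replaces malformed input with U+FFFD instead of failing.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file's encoding.
        byte[] bytes = latin1Bytes("\u00e1 \u00e9");
        System.out.println(decodeAsUtf8(bytes));   // accents replaced by U+FFFD
        System.out.println(decodeAsLatin1(bytes)); // original text restored
    }
}
```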
Since you have multiple possible encodings and don't know which one is correct, you have little choice but to use a CharsetDecoder here.
The plan:
- open an InputStream from the connection;
- read the contents into a byte[] array;
- try different encodings until you find a suitable one.
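The first two steps might look roughly like the sketch below. The class RawDownload and its methods are hypothetical names; the timeouts are copied from the question. The key point is that the bytes are collected without decoding them, so no charset is chosen yet.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper: fetch the raw bytes and leave decoding for later.
class RawDownload {
    // Copy every byte from the stream into an array, without interpreting it.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Open the connection and read its body; try-with-resources closes the stream.
    static byte[] download(String url) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        connection.setConnectTimeout(2000);
        connection.setReadTimeout(2000);
        try (InputStream in = connection.getInputStream()) {
            return readAllBytes(in);
        }
    }
}
```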
Here is one possible method to find the correct encoding:
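    public boolean isCharset(final Charset charset, final byte[] contents) throws IOException {
        // REPORT makes the decoder throw instead of silently replacing bad bytes
        final CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        final ByteBuffer buf = ByteBuffer.wrap(contents);
        try {
            decoder.decode(buf);
            return true;
        } catch (CharacterCodingException ignored) {
            return false;
        }
    }

Try it with different sets of encodings (preferably starting with UTF-8). Note that ISO-8859-1 maps every byte value to a character, so it never fails this check; put it last, as a fallback.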
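One way to apply the check is to loop over an ordered list of candidates and decode with the first one that accepts the bytes. This is a heuristic sketch, assuming the only expected encodings are UTF-8 and ISO-8859-1 (the class name and candidate list are illustrative). UTF-8 is strict, so ISO-8859-1 text containing accents usually fails it; a Latin-1 byte sequence that happens to also be valid UTF-8 would still be misdetected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: try strict encodings first, ISO-8859-1 last.
class CharsetGuesser {
    static final List<Charset> CANDIDATES =
            Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

    // Same idea as isCharset above: a decode error means "not this charset".
    static boolean isCharset(Charset charset, byte[] contents) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(contents));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decode with the first candidate that accepts every byte.
    static String decode(byte[] contents) {
        for (Charset charset : CANDIDATES) {
            if (isCharset(charset, contents)) {
                return new String(contents, charset);
            }
        }
        // Not reached while ISO-8859-1 is a candidate; kept as a safe default.
        return new String(contents, StandardCharsets.ISO_8859_1);
    }
}
```

With this in place, the byte[] read from the connection can be passed to decode instead of calling buffer.toString().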