
I'm trying to fix a Java app that fetches an HTML page from a URL and reads the bytes of its content.

// this is a simplified part of the code

private static final Pattern PAT_CHARSET = Pattern.compile("charset=([^; ]+)$");

HttpURLConnection conn = (HttpURLConnection) url.openConnection();
String ct = conn.getContentType();
Charset cs = Charset.forName("utf-8");
String encoding;
if (ct != null) {
  Matcher in = PAT_CHARSET.matcher(ct);
  if (in.find()) {
    encoding = in.group(1);
    cs = Charset.forName(encoding);
  }
}

Object in1 = conn.getInputStream();
encoding = conn.getContentEncoding();
if (encoding != null) {
  if ("gzip".equalsIgnoreCase(encoding)) {
    in1 = new GZIPInputStream((InputStream) in1);
  }
}

...

But for some URLs I get this error:

unsupported Content-Encoding: br
Vito Lipari
  • Your question is lacking detail and might get closed for that. For example, which URLs fail and which do not? What is the complete error message, including the full stack trace? What's the pattern you're matching against? Please create a [mcve]. – Zabuzard Jun 05 '19 at 08:40
  • Quite obviously your pattern's first group, after matching, has the content `br`, which is not a valid charset. It's impossible to say why it's `br`, though, without seeing the content of `ct` and the pattern `PAT_CHARSET`. – Zabuzard Jun 05 '19 at 08:43
  • Why do you declare `in1` as `Object` just to cast it later, instead of declaring it directly as an `InputStream` variable? Looks like a bad idea. – Zabuzard Jun 05 '19 at 08:44
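
For reference, `br` is the Content-Encoding token for Brotli compression, and the JDK has no built-in Brotli decoder, so the message most likely comes from the part of the code that inspects `conn.getContentEncoding()` rather than from the charset pattern. Below is a minimal sketch of that branch, not the asker's actual code: the class `ContentDecoding` and the helper `decode` are hypothetical names, only `gzip` and uncompressed bodies are handled, and `readAllBytes()` assumes Java 9 or later. It also declares the stream as `InputStream` from the start, as suggested in the comments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ContentDecoding {

    // Hypothetical helper: wraps the raw response body according to the
    // Content-Encoding header. Only codings the JDK can decode are accepted.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null || "identity".equalsIgnoreCase(contentEncoding)) {
            return raw; // body is not compressed
        }
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(raw);
        }
        // Anything else (for example "br") needs a third-party decoder.
        throw new IOException("unsupported Content-Encoding: " + contentEncoding);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a gzip-compressed response body in memory.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write("<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = decode(new ByteArrayInputStream(buf.toByteArray()), "gzip")) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

If the elided request code advertises `br` in its `Accept-Encoding` header, restricting it with `conn.setRequestProperty("Accept-Encoding", "gzip")` before reading the response should keep a well-behaved server from choosing Brotli. Servers that ignore the header would still need a Brotli decoder from an external library.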
