I'm using Apache Calcite to parse and validate arbitrary SQL. It works for most cases, but I hit some bumps when the SQL contains Unicode characters.

For example:

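import org.apache.calcite.avatica.util.Casing;
import org.apache.calcite.avatica.util.Quoting;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.MysqlSqlDialect;
import org.apache.calcite.sql.parser.SqlParser;

String sql = "SELECT '®'";
// Start from the default parser config, keep identifier case, quote identifiers with back-ticks
SqlParser.Config config = SqlParser.configBuilder().setConfig(SqlParser.Config.DEFAULT)
        .setUnquotedCasing(Casing.UNCHANGED)
        .setQuoting(Quoting.BACK_TICK)
        .build();
SqlParser parser = SqlParser.create(sql, config);
SqlNode parsed;
try {
    parsed = parser.parseQuery();
    // Unparse the tree back to SQL text using the MySQL dialect
    String result = parsed.toSqlString(MysqlSqlDialect.DEFAULT).getSql();
} catch (Exception e) {
    // wheels fall off and catch fire
}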

This gives me SELECT u&'\00ae', which my DB doesn't want to handle. Is there some way I can configure this to return SELECT '®' instead? I've had a look in the SqlDialect classes and I think the issue occurs here:

  public void quoteStringLiteral(StringBuilder buf, @Nullable String charsetName,
      String val) {
    if (containsNonAscii(val) && charsetName == null) {
      quoteStringLiteralUnicode(buf, val);
    } else {
      if (charsetName != null) {
        buf.append("_");
        buf.append(charsetName);
      }
      buf.append(literalQuoteString);
      buf.append(val.replace(literalEndQuoteString, literalEscapedQuote));
      buf.append(literalEndQuoteString);
    }
  }

As far as I can tell, that leaves no way to avoid this behaviour through configuration.
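One direction that might be worth trying: the method quoted above is public and not marked final in that snippet, so a subclass could in principle override it and skip the non-ASCII check. The sketch below is not Calcite code. `BaseDialectSketch`, `PlainLiteralDialectSketch` and `LiteralQuotingDemo` are hypothetical stand-in classes that only mimic the shape of the quoted method, and the `u&'\hhhh'` escape format is assumed from the output shown in the question. Whether the real dialect class can be subclassed the same way is an assumption that would need checking against the Calcite version in use.

```java
// Hypothetical stand-in for the dialect; NOT a Calcite class.
class BaseDialectSketch {
    protected final String quote = "'";
    protected final String escapedQuote = "''";

    // Mirrors the branch in the quoted method: non-ASCII input with no
    // charset name goes down the Unicode-escape path.
    public void quoteStringLiteral(StringBuilder buf, String charsetName, String val) {
        if (containsNonAscii(val) && charsetName == null) {
            quoteStringLiteralUnicode(buf, val);
        } else {
            if (charsetName != null) {
                buf.append("_").append(charsetName);
            }
            appendPlain(buf, val);
        }
    }

    // Plain quoted literal with embedded quotes doubled.
    protected void appendPlain(StringBuilder buf, String val) {
        buf.append(quote).append(val.replace(quote, escapedQuote)).append(quote);
    }

    // Assumed escape format: u&'...' with each non-ASCII char written as \hhhh.
    protected void quoteStringLiteralUnicode(StringBuilder buf, String val) {
        buf.append("u&'");
        for (int i = 0; i < val.length(); i++) {
            char c = val.charAt(i);
            if (c >= 128) {
                buf.append(String.format("\\%04x", (int) c));
            } else if (c == '\'' || c == '\\') {
                buf.append(c).append(c);
            } else {
                buf.append(c);
            }
        }
        buf.append("'");
    }

    protected static boolean containsNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) >= 128) {
                return true;
            }
        }
        return false;
    }
}

// Hypothetical subclass: overrides the method so the non-ASCII check never runs.
class PlainLiteralDialectSketch extends BaseDialectSketch {
    @Override
    public void quoteStringLiteral(StringBuilder buf, String charsetName, String val) {
        if (charsetName != null) {
            buf.append("_").append(charsetName);
        }
        appendPlain(buf, val);
    }
}

class LiteralQuotingDemo {
    // Helper that renders one literal with the given stand-in dialect.
    static String render(BaseDialectSketch dialect, String val) {
        StringBuilder buf = new StringBuilder();
        dialect.quoteStringLiteral(buf, null, val);
        return buf.toString();
    }

    public static void main(String[] args) {
        String registered = "\u00ae"; // the '®' character from the question
        System.out.println(render(new BaseDialectSketch(), registered));         // escaped form
        System.out.println(render(new PlainLiteralDialectSketch(), registered)); // plain form
    }
}
```

With the stand-in base class the literal comes out in the escaped `u&'\00ae'` form, while the overriding subclass emits the character as-is inside ordinary quotes. The override only doubles single quotes; it does not deal with backslashes or with whether the target database connection is actually using a Unicode-capable charset, both of which would need separate attention.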

Olaf Kock
jwfischer