I am using Riak 1.3.1 and am trying to create a binary index named Indexed field name @$@$@ @$ not url safe#@£!. I am able to successfully save a key with this index.

When I query for this key directly using the key value, I get this as a result:

{"indexedFieldValue":"index @$ value","keyValue":"093741d5-940a-49a6-b742-be22c1773e87","indexes":{"Indexed field name @$@$@ @$ not url safe#@£!":"index @$ value"}}

Now, when I try to query using this index by using the URL /index/Indexed+field+name+%40%24%40%24%40+%40%24+not+url+safe%23%40£%21_bin/value, I receive absolutely nothing in response: {"keys":[]}

  • Am I doing something wrong, or
  • Does Riak not support index names that require URL encoding?

Note: I was using the Riak Java client to write the data (and get the same result when querying by secondary index using the Java client), but I do not see how the Java client should have anything to do with this.

Thanks!!

Siddhu

1 Answer

I looked into this this morning and yes, there are issues with how the HTTP API handles URL-encoded index names and index values.

The problem is this: header names and values are not URL-decoded and are stored as-is when you POST, but when you issue a GET request, the hierarchical (path) part of the URL, which contains the index name and value, is decoded. Furthermore, using + instead of %20 for a space in either one is a problem.

If you use %20 in the header and URL-escape the already-encoded name and value again for the GET, it actually works (note that I replaced £ with %C2%A3, its UTF-8 encoding):

curl -X POST -H 'x-riak-index-Indexed%20field%20name%20%40%24%40%24%40%20%40%24%20not%20url%20safe%23%40%C2%A3%21_bin: index%20%40%24%20value' -d 'Some Value' http://localhost:8098/buckets/test_bucket/keys/my_key

then

curl localhost:8098/buckets/test_bucket/index/Indexed%2520field%2520name%2520%2540%2524%2540%2524%2540%2520%2540%2524%2520not%2520url%2520safe%2523%2540%25C2%25A3%2521_bin/index%2520%2540%2524%2520value

results in:

{"keys":["my_key"]}
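The relationship between the two curl commands above can be sketched in Java: the query path is the stored (already percent-encoded) index name and value, encoded one more time so that every % becomes %25. This is only an illustration; the class and method names are hypothetical, and it assumes the bucket name is already URL-safe.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Riak or the Riak Java client.
class DoubleEncodedIndexPath {

    // Encodes an already percent-encoded string a second time ('%' becomes "%25").
    static String encodeAgain(String alreadyEncoded) {
        try {
            return URLEncoder.encode(alreadyEncoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Builds the 2i query path from the index name and value exactly as they
    // were sent in the x-riak-index-* header. Assumes the bucket is URL-safe.
    static String queryPath(String bucket, String storedIndexName, String storedValue) {
        return "/buckets/" + bucket + "/index/"
                + encodeAgain(storedIndexName) + "/" + encodeAgain(storedValue);
    }

    public static void main(String[] args) {
        // Name and value as stored by the POST above (header text is kept as-is).
        String storedName = "Indexed%20field%20name%20%40%24%40%24%40%20%40%24"
                + "%20not%20url%20safe%23%40%C2%A3%21_bin";
        String storedValue = "index%20%40%24%20value";
        System.out.println(queryPath("test_bucket", storedName, storedValue));
    }
}
```

When the path is decoded once on the GET, it yields the single-encoded strings again, which is what was stored from the header.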

Protocol buffers, on the other hand, do not run into these issues. If you were to use the Java client with protocol buffers, the following works fine (note that the client does not automatically URL-encode index names and values the way it does keys and buckets):

IRiakClient client = RiakFactory.pbcClient();
Bucket b = client.fetchBucket("test_bucket").execute();

String s = "Indexed field name @$@$@ @$ not url safe#@£!";
String s2 = URLEncoder.encode(s, "UTF-8");
System.out.println(s2);
String v = "index @$ value";
String v2 = URLEncoder.encode(v, "UTF-8");
System.out.println(v2);

IRiakObject ro = RiakObjectBuilder.newBuilder("test_bucket", "key")
                  .addIndex(s2, v2)
                  .withValue("Some value")
                  .build();

b.store(ro).execute();

List<String> index = b.fetchIndex(BinIndex.named(s2))
                      .withValue(v2)
                      .execute();

System.out.println(index);

client.shutdown();

Output:

Indexed+field+name+%40%24%40%24%40+%40%24+not+url+safe%23%40%C2%A3%21
index+%40%24+value
[key]

Protocol buffers, in fact, do not require you to URL-encode at all ... the client sends the UTF-8 bytes over and Riak happily uses them for the index name and value. You can remove the URL encoding from the example above and see that it works.

Unfortunately, this is bound to be problematic if you also try to use HTTP somewhere else. The URLEncoder class included with Java encodes a space as +, so using a different URL encoder that emits %20 (or doing a String.replaceAll()) would help, but you still have to deal with the extra URL-escaping in HTTP queries.
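As a rough sketch of the replacement approach, the helper below runs java.net.URLEncoder and then swaps each + for %20. The class and method names are hypothetical. The swap is safe because URLEncoder emits a literal + in the raw input as %2B, so any + left in its output can only stand for a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the Riak Java client.
class PercentEncoder {

    // Form-encodes the input as UTF-8, then rewrites '+' (space) as "%20".
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // \u00A3 is the pound sign, written as an escape to avoid source-encoding issues.
        String name = "Indexed field name @$@$@ @$ not url safe#@\u00A3!";
        System.out.println(encode(name));
        System.out.println(encode("index @$ value")); // prints index%20%40%24%20value
    }
}
```

The output of encode() matches the form used in the x-riak-index header of the curl example; an HTTP index query would still need those strings escaped a second time.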

What it comes down to is that index names and values containing non-US-ASCII or special characters are problematic with 2i if you use the HTTP API.

UPDATE: I decided this would be easy to fix and have opened a PR: https://github.com/basho/riak_kv/pull/543 - one remaining issue is discussed at the end of it, so it may need additional work. We are currently heading into code freeze for Riak 1.4, so this may not be available until the next release.

Brian Roach