I have a database (couchDB) with about 90k documents in it. The documents are very simple like this:
{
"_id": "1894496e-1c9e-4b40-9ba6-65ffeaca2ccf",
"_rev": "1-2d978d19-3651-4af9-a8d5-b70759655e6a",
"productName": "Cola"
}
now I want one day to sync this database with a mobile device. Obviously 90k docs shouldn't go to the phone all at once. This is why I wrote filter functions. These are supposed to filter by "productName". At first in Javascript later in Erlang to gain performance. These Filter functions look like this in JavaScript:
{
"_id": "_design/local_filters",
"_rev": "11-57abe842a82c9835d63597be2b05117d",
"filters": {
"by_fanta": "function(doc, req){ if(doc.productName == 'Fanta'){ return doc;}}",
"by_wasser": "function(doc, req){if(doc.productName == 'Wasser'){ return doc;}}",
"by_sprite": "function(doc, req){if(doc.productName == 'Sprite'){ return doc;}}"
}
}
and like this in Erlang:
{
"_id": "_design/erlang_filter",
"_rev": "74-f537ec4b6508cee1995baacfddffa6d4",
"language": "erlang",
"filters": {
"by_fanta": "fun({Doc}, {Req}) -> case proplists:get_value(<<\"productName\">>, Doc) of <<\"Fanta\">> -> true; _ -> false end end.",
"by_wasser": "fun({Doc}, {Req}) -> case proplists:get_value(<<\"productName\">>, Doc) of <<\"Wasser\">> -> true; _ -> false end end.",
"by_sprite": "fun({Doc}, {Req}) -> case proplists:get_value(<<\"productName\">>, Doc) of <<\"Sprite\">> -> true; _ -> false end end."
}
}
To keep it simple there is no query yet but a "hardcoded" string. The filter all work. The problem is they are way to slow. I wrote a testprogram first in Java later in Perl to test the time it takes to filter the documents. Here one of my Perl scripts:
$dt = DBIx::Class::TimeStamp->get_timestamp();
$content = get("http://127.0.0.1:5984/mobile_product_test/_changes?filter=local_filters/by_sprite");
$dy = DBIx::Class::TimeStamp->get_timestamp() - $dt;
$dm = $dy->minutes();
$dz = $dy->seconds();
@contArr = split("\n", $content);
$arraysz = @contArr;
$arraysz = $arraysz - 3;
$\="\n";
print($dm.':'.$dz.' with '.$arraysz.' Elements (JavaScript)');
And now the sad part. These are the times I get:
2:35 with 2 Elements (Erlang)
2:40 with 10000 Elements (Erlang)
2:38 with 30000 Elements (Erlang)
2:31 with 2 Elements (JavaScript)
2:40 with 10000 Elements (JavaScript)
2:51 with 30000 Elements (JavaScript)
btw these are Minutes:Seconds. The number is the number of elements returned by the filter and the database had 90k Elements in it. The big surprise was that the Erlang filter was not faster at all.
To request all elements only takes 9 seconds. And creating views about 15. But it is not possible for my use on a phone to transfer all documents (volume and security reasons).
Is there a way to filter on a view to get a performance increase? Or is something wrong with my erlang filter functions (I'm not surprised by the times for the JavaScript filters).
EDIT: As pointed out by pgras the reason why this is slow is posted in the answer to this Question. To have the erlang filters run faster I need to go a "layer" below and program the erlang directly into the database and not as a _design document. But I dont'r really know where to start and how to do this. Any tips would be helpful.