1

I'm using ruby 1.9.3 and ruby-datamapper with postgresql. I have got a table full of entries, each of them has three properties: id ,text and created_at. The last one is a Date object.

In my application I SELECT some entries like so: collection = Entry.all(:text => /SomeRegexp/). I now want to know all different values of created_at that occur in the returned DataMapper::Collection . I thought of the following two ways:

  1. Iterate through the DataMapper::Collection and collect all dates.

    dates = Array.new
    Entry.all(:text => /SomeRegexp/).each { |entry| dates.include?(entry.date) ? next : dates = dates << entry.date }
    

    Pros:

    • It should work

    Cons:

    • It's slow for massive amounts of data
  2. Ask the database for entries of each day and collect dates where collection != nil

    dates = Array.new
    for date in DatabaseStartDate..Date.today
      Entry.all(:created_at => date, :text => /SomeRegexp/).empty? ? next : dates = dates << date)
    end
    

    Pros:

    • Should be faster than the example above, especially with big amounts of data

    Cons:

    • Many Database queries

NOTICE: The code above is untested pseudo code, it should just give readers a rough idea of what the real code should do, so it's not likely to work. If you need a more detailed example feel free to comment, I will try to explain it better then!

NEXT NOTICE: If you find errors in the pseudo code please tell me, so I can get started faster! ;)

So finally my question is: Which solution should I prefer or is there a better one that didn't come into my mind?

le_me
  • 3,089
  • 3
  • 26
  • 28

2 Answers2

1

Third option: Raw SQL.

repository(:default).adapter.select('SELECT created_at FROM entries WHERE text SIMILAR TO ' + pattern)

where pattern will be an expression like those in http://www.postgresql.org/docs/8.3/static/functions-matching.html

AlexQueue
  • 6,353
  • 5
  • 35
  • 44
0

OK, I made some tests, here are my results:

Solution #1 is a bit faster for small amounts of data, but for >10_000 records the second solution is clearly faster.

It's possible to improve performance for both solutions with selecting only the elements one needs eg. Entry.all(:text => /SomeRegexp/, :fields => [:id, :date])

Both code snippets worked for me.

If you know a better solution (maybe something built into datamapper or even postgresql) please reanswer the question! ;)

le_me
  • 3,089
  • 3
  • 26
  • 28