0

This one is related to my previous question on the performance of Arrays and Hashes in Ruby.

Prerequisites

I know that using Hashes to store lots of Objects leads to significant performance increase because of the O(1) lookup.

Now let's assume I had two classes, namely A and B and they can be related to each other, but only if a third class C exists (it's sort of a relationship class). To give a practical example, let's say I have Document, Query and the relationship class Judgement (this is from information retrieval, so basically the judgement tells you whether the document is relevant for a query or not).

enter image description here

(I hope I got this right)

The Problem

In most cases you'd like to find out how many Judgements there are for a combination of Document and Query or if there are any.

In order to find out the latter, I'll iterate over each Jugdement ...

@judgements.each { |j| return true if j.document == document and j.query == query }

Now, this brings us back to a linear search again, which isn't that helpful.

How to solve it?

I was thinking about a way to have double Hashes - if there's such a thing - so that I could just look up Judgements using the Document and Query I already have.

Or is there any other way of quickly finding out whether a Judgement exists for a given pair of Document and Query?

Community
  • 1
  • 1
slhck
  • 36,575
  • 28
  • 148
  • 201

2 Answers2

3

Well, if you need performance, you can always create another data structure to facilitate indexing - in your case you can write a hash where keys would be [document, query] pairs, and values arrays of judgments. Depending on the architecture of your app, you can either update this index upon every change in your objects, or build indexes from scratch whenever you need to perform a batch of lookups.

Or, perhaps, you should leave it to the database to do your lookups instead, of course, if you have a database at all.

Mladen Jablanović
  • 43,461
  • 10
  • 90
  • 113
  • I don't have a database at all. Didn't know I could just use an Array as a Hash key, that simplifies everything. Thanks! – slhck Apr 06 '11 at 13:23
1

This

@judgements.each { |j| return true if j.document == document and j.query == query }

could be written as

@judgements.any? { |j| j.document == document and j.query == query }

I agree with Mladen Jablanović in that that's a good chance that you should let your database handle this. In MongoDB it would be something like this

db = Mongo::Connection.new.db("mydb")
judgements = db.collection("judgements")
judgement = {:judgement_no=> "2011:73", :document => 4711, :query => 42}
judgements.add(judgement)
judgements.create_index([['document', Mongo::ASCENDING], ['query', Mongo::ASCENDING]])

judgements.find({:document => 4711, :query => 42}).each { |jm| puts jm }
Jonas Elfström
  • 30,834
  • 6
  • 70
  • 106