1

I have a Rails app with a couple resources that I need to run queries on that involve bitwise operations. Right now, I'm using PostgreSQL, and I created a migration for my 'user' model that uses the postgres-specific 'BIT VARYING' datatype, since this is what was recommended for the bitwise '&' operation on the postgres website:

add_column :users, :timeslots, :'BIT VARYING'

In one of my queries, I'm using '&' this way:

self.where("available_lbs > 0 AND status = 0 AND ? & timeslot > 0::bit AND available_end >= ?", user.timeslots, Time.now)

This seems to work on my machine, but there are two issues:

  1. Both the datatype and the query are database-specific, so if I migrate to another database, I'd probably have to change things
  2. This migration didn't seem to update the schema.rb file properly. It's still using the Rails 'string' datatype when describing how to create the table:

create_table "users", :force => true do |t|

t.string "timeslots", :limit => nil

end

So when I setup the app on a new machine, it uses the wrong datatype and things break. Any ideas on a good solution for this? (I tried using the 'binary' datatype, but that didn't seem to work well with postgres)

Andrew
  • 2,425
  • 2
  • 24
  • 35
  • this article may help you http://vitobotta.com/applying-bitwise-operations-to-rspec-testing/ – Antarr Byrd Jun 14 '13 at 19:45
  • This article has some good info about the math, but it doesn't say anything about dealing with the database aspect. – Andrew Jun 14 '13 at 20:02
  • Why are you using bit operators inside a relational database? That's generally a nasty thing to do and your database will probably hate you for it. – mu is too short Jun 14 '13 at 20:13
  • I'm trying to find all user records for which the & of the timeslots field with another timeslots field that I pass in is > 0. I figured it was faster to do this in a query than to pull them all into Ruby and do it there. – Andrew Jun 14 '13 at 20:16
  • 1
    But why is `timeslot` a bitmap at all? Why not use a representation that is more natural for a relational database? Bitmaps make sense in C but they're almost always premature optimization everywhere else. – mu is too short Jun 14 '13 at 20:23

1 Answers1

1

I suggest normalizing the data model: user --- one-to-many ---> timeslots.

This is portable, and you could use all higher level SQL functions like window aggregates etc.

Further your records are "human readable", for example in ad-hoc queries, as they do not need bit-wise operations.

If you opt for bit fields, you normally optimize for size/memory usage, at the cost of performance. I have never seen comparisions for this regarding PostgreSQL, but this is the case at assembly level for microprocessors/microcontrollers: Even in memory restricted microcontrollers, you normally choose bool (which needs one byte, and in case of alignment possibly four bytes), because access is faster, and it needs fewer instructions. And it's easier to code and to debug as well.

Beryllium
  • 12,808
  • 10
  • 56
  • 86
  • All good points. I was using this approach before, but I have 168 different timeslots (one for each hour of the week), each of which directly maps to a cell on a calendar in my UI. If I use a one-to-many relationship, I need to seed the database with 168 different timeslots, and then maintain a giant join table to keep track of which users have which timeslots. It was fairly nasty in my opinion, so flipping bits seemed much cleaner. What I think I'm going to do is break the bitfield up into 7 integers, each for a different day of the week. – Andrew Jun 14 '13 at 21:52
  • Could you add your data model to your question? You have users with a many-to-many relation to timeslots, possibly bound to a certain period of time (= differs for each month/year/week etc)? – Beryllium Jun 14 '13 at 22:24
  • Yes, it is a many-to-many relationship, but the month/year/week is not recorded in any way and is irrelevant. It's simply a record of which 1-hour timeslots are filled for any given week for each user. – Andrew Jun 14 '13 at 22:44
  • So with 10000 users, 10 *assigned* timeslots per user/day you would have 700000 rows. That shouldn't be a problem. You could replace the "timeslots in hours" with time ranges, but that's not really necessary. Whether these assignments are stored in bit fields or rows, the logical data volume is the same. Unless there is a performance problem, I would not denormalize it into 7 integers. – Beryllium Jun 14 '13 at 23:05