I have some code written in Ruby 1.9.2 (patch level 136), and I'm having an issue: when I perform a find by _id with the raw Ruby mongo driver, I get nil when the value comes from a CSV file. Here's the code:

require 'mongo'
require 'csv'
require 'bson'

# Games database
gamedb = Mongo::Connection.new("localhost", 27017).db("gamedb")
@games = gamedb.collection("games")

# Loop over CSV data.
CSV.foreach("/tmp/somedata.csv") do |row|

  puts row[0] # Puts the ObjectId

  @game = @games.find( { "_id" => row[0] } ).first  
  puts @game.inspect

end

The CSV file looks like this:

_id,game_title,platform,upc_db_match,upc
4ecdacc339c7d7a2a6000002,TMNT,PSP,TMNT,085391157663
4ecdacc339c7d7a2a6000004,Super Mario Galaxy,Wii,Super Mario Galaxy,045496900434
4ecdacc339c7d7a2a6000005,Beowulf,PSP,Beowulf,097363473046

The first column is the ObjectId in Mongo that I already have. If I run a find from the mongo command line using the values in the first column, I get the data I want. However, the code above prints nil on the @game.inspect call.

I've tried the following variations, which all produce nil:

@game = @games.find( { "_id" => row[0].to_s } ).first
@game = @games.find( { "_id" => row[0].to_s.strip } ).first

I've even tried building the ObjectId with the BSON classes as such:

@game = @games.find( { "_id" => BSON::ObjectId(row[0]) } ).first

or

@game = @games.find( { "_id" => BSON::ObjectId("#{row[0]}") } ).first

Both of which output the following error:

/Users/donnfelker/.rvm/gems/ruby-1.9.2-p136@upc-etl/gems/bson-1.4.0/lib/bson/types/object_id.rb:126:in `from_string': illegal ObjectId format: _id (BSON::InvalidObjectId)
    from /Users/donnfelker/.rvm/gems/ruby-1.9.2-p136@upc-etl/gems/bson-1.4.0/lib/bson/types/object_id.rb:26:in `ObjectId'
    from migrate_upc_from_csv.rb:14:in `block in <main>'
    from /Users/donnfelker/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1768:in `each'
    from /Users/donnfelker/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1202:in `block in foreach'
    from /Users/donnfelker/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1340:in `open'
    from /Users/donnfelker/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1201:in `foreach'
    from migrate_upc_from_csv.rb:10:in `<main>'

The crazy thing is, if I create the BSON ObjectId by hand, it works (as shown below):

@game = @games.find( { "_id" => BSON::ObjectId("4ecdacc339c7d7a2a6000004") } ).first

When I run @game.inspect I get my data back, as I would expect. However, if I change this to use row[0], I get nil.

Why? What am I doing wrong?

System Details

$ gem list

*** LOCAL GEMS ***

bson (1.4.0)
bson_ext (1.4.0)
mongo (1.4.0)

RVM Version: rvm 1.6.9

Ruby Version: ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-darwin10.6.0]

Mongo Version:

[initandlisten] db version v1.8.2, pdfile version 4.5
[initandlisten] git version: 433bbaa14aaba6860da15bd4de8edf600f56501b

Again, why? What am I doing wrong here? Thanks!

Donn Felker

2 Answers

Are you sure your CSV parsing code isn't treating the header line as the first row of data and actually trying to call BSON::ObjectId("_id")? The error message looks like it. Try FasterCSV.foreach('/tmp/somedata.csv', :headers => true) and use row['_id'] (IIRC you'll still have to wrap it in BSON::ObjectId).
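
To make the difference concrete, here is a minimal sketch using only Ruby's standard CSV library (which is what FasterCSV became in 1.9). The inline SAMPLE_CSV string is a stand-in for /tmp/somedata.csv, and the variable names are made up for illustration; no MongoDB connection is involved.

```ruby
require 'csv'

# Inline stand-in for /tmp/somedata.csv (same columns as in the question).
SAMPLE_CSV = <<CSV_DATA
_id,game_title,platform,upc_db_match,upc
4ecdacc339c7d7a2a6000002,TMNT,PSP,TMNT,085391157663
4ecdacc339c7d7a2a6000004,Super Mario Galaxy,Wii,Super Mario Galaxy,045496900434
CSV_DATA

# Without :headers, the header line is yielded like any other row,
# so the first "id" seen is the literal string "_id".
raw_first_cells = CSV.parse(SAMPLE_CSV).map { |row| row[0] }

# With :headers => true, the first line names the columns and is not
# yielded as data; cells can then be read by column name.
ids_by_name = CSV.parse(SAMPLE_CSV, :headers => true).map { |row| row['_id'] }

puts raw_first_cells.inspect
puts ids_by_name.inspect
```

Each value in ids_by_name is still a plain String, so it would presumably need to go through BSON::ObjectId(...) before being used in the find, as the question's hand-built example does.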

Michael Kohl
  • Thanks for the comment. I'm not using FasterCSV; however, your answer did help me get to the correct end result with the assistance of the correct syntax below. Thanks Michael! – Donn Felker Nov 29 '11 at 02:11

The first row is not being read as a header. To make that happen, pass in :headers => true like this:

require 'csv'

# Loop over CSV data.
CSV.foreach("/tmp/somedata.csv", :headers => true) do |row|

  puts row[0] # Puts the ObjectId

end

If you do not pass the :headers parameter, you can see that the first row[0] value is the string "_id":

_id
4ecdacc339c7d7a2a6000002
4ecdacc339c7d7a2a6000004
4ecdacc339c7d7a2a6000005

When you include it, you are golden:

4ecdacc339c7d7a2a6000002
4ecdacc339c7d7a2a6000004
4ecdacc339c7d7a2a6000005
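
If the :headers option ever gets dropped again, a cheap guard can surface the stray value instead of letting it raise from inside the loop. This is only a sketch: object_id_like? and partition_ids are hypothetical helpers, not part of the mongo or bson gems, and the check relies on an ObjectId's string form being 24 hexadecimal characters.

```ruby
require 'csv'

# Hypothetical helper (not from the bson gem): true when the value
# looks like an ObjectId string, i.e. exactly 24 hex characters.
def object_id_like?(value)
  !!(value.to_s =~ /\A[0-9a-fA-F]{24}\z/)
end

# Hypothetical helper: splits the first column of the CSV text into
# usable ids and everything else (such as a header cell).
def partition_ids(csv_text)
  first_cells = CSV.parse(csv_text).map { |row| row[0] }
  first_cells.partition { |cell| object_id_like?(cell) }
end

valid, rejected = partition_ids("_id,game_title\n4ecdacc339c7d7a2a6000002,TMNT\n")
puts "usable ids: #{valid.inspect}"
puts "skipped:    #{rejected.inspect}"
```

With a long file, printing the skipped values at the end also avoids having to spot a single "_id" line as the output scrolls past.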
Tyler Brock
  • Isn't that essentially what I said? – Michael Kohl Nov 28 '11 at 23:56
  • Yeah sorry about that! I hadn't refreshed the page before I started writing. At least Donn now also has a definitive answer which might be a little more clear. I tried the above on my computer and can confirm our solution fixes the problem he is having and that he can accept the answer. Apologies if I stepped on any toes here. – Tyler Brock Nov 29 '11 at 00:04
  • Accepting this as the answer as the code sample shows what I need. I'm not using FasterCSV (which I know has been replaced in 1.9.2 etc etc etc). I still upvoted the original though too. Thanks! – Donn Felker Nov 29 '11 at 02:10
  • Also, it's good to note that the reason I never saw the _id header as the first puts is because my CSV file is > 1000 lines long. The screen output scrolled before I could notice the _id. – Donn Felker Nov 29 '11 at 02:18