
When fetching images from a MongoDB GridFS collection using the 1.3.4 PHP driver and the Lithium PHP framework, I'm getting broken images. I'm having trouble pinpointing when this started: the site is not live yet, I haven't been smoke testing the entire site after every change, and none of my test cases fail. (Someone who is better at writing PHP test cases than I am could probably tell me how to create one for this after reading on.)

The images were rendering correctly not too long ago. I'm using MongoHQ, a cloud-based database, running MongoDB version 2.4.1.

When I fetch the image through the web app with wget and take a hex dump, I get the following:

0000000 0a ff d8 ff e1 0f fe 45 78 69 66 00 00 4d 4d 00
0000010 2a 00 00 00 08 00 0a 01 0f 00 02 00 00 00 06 00
0000020 00 00 86 01 10 00 02 00 00 00 0a 00 00 00 8c 01
0000030 12 00 03 00 00 00 01 00 08 00 00 01 1a 00 05 00
0000040 00 00 01 00 00 00 96 01 1b 00 05 00 00 00 01 00
0000050 00 00 9e 01 28 00 03 00 00 00 01 00 02 00 00 01
0000060 31 00 02 00 00 00 06 00 00 00 a6 01 32 00 02 00
0000070 00 00 14 00 00 00 ac 02 13 00 03 00 00 00 01 00
0000080 01 00 00 87 69 00 04 00 00 00 01 00 00 00 c0 00
0000090 00 00 00 41 70 70 6c 65 00 69 50 68 6f 6e 65 20

But when I do a wget on the static file (not fetched from the database), I get this:

0000000 ff d8 ff e1 0f fe 45 78 69 66 00 00 4d 4d 00 2a
0000010 00 00 00 08 00 0a 01 0f 00 02 00 00 00 06 00 00
0000020 00 86 01 10 00 02 00 00 00 0a 00 00 00 8c 01 12
0000030 00 03 00 00 00 01 00 08 00 00 01 1a 00 05 00 00
0000040 00 01 00 00 00 96 01 1b 00 05 00 00 00 01 00 00
0000050 00 9e 01 28 00 03 00 00 00 01 00 02 00 00 01 31
0000060 00 02 00 00 00 06 00 00 00 a6 01 32 00 02 00 00
0000070 00 14 00 00 00 ac 02 13 00 03 00 00 00 01 00 01
0000080 00 00 87 69 00 04 00 00 00 01 00 00 00 c0 00 00
0000090 00 00 41 70 70 6c 65 00 69 50 68 6f 6e 65 20 34

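The difference is a leading "0a" at the very beginning of the first dump. That extra byte shifts everything after it by one position, which is why the second dump shows a "34" at the very end that the first does not.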
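As a rough way to check this programmatically, the sketch below looks for the JPEG start-of-image marker (ff d8) and reports whatever precedes it. The function name junk_before_jpeg is made up for illustration, and the byte strings are just the first rows of the two dumps above. Note that 0x0a is the ASCII line feed character.

```python
JPEG_SOI = b"\xff\xd8"  # a JPEG stream starts with these two bytes


def junk_before_jpeg(data: bytes) -> bytes:
    """Return any bytes that appear before the JPEG start-of-image marker."""
    offset = data.find(JPEG_SOI)
    if offset < 0:
        raise ValueError("no JPEG start-of-image marker found")
    return data[:offset]


# First 16 bytes of each dump from the question.
served = bytes.fromhex("0a ff d8 ff e1 0f fe 45 78 69 66 00 00 4d 4d 00")
static = bytes.fromhex("ff d8 ff e1 0f fe 45 78 69 66 00 00 4d 4d 00 2a")

print(junk_before_jpeg(served))  # b'\n'
print(junk_before_jpeg(static))  # b''

# Once the extra byte is dropped, the two dumps line up.
print(served[1:] == static[: len(served) - 1])  # True
```

Running the same check against the full downloaded files would show whether the single leading byte is the only difference.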

I am getting the relevant fs.files data. For example:

{
  _id: ObjectId("519e31d39bdd497903000007"),
  tags: [
    "mancave"
  ],
  location: [],
  title: "Test Live Site",
  description: "This is for testing the live site",
  credit: "Test",
  user_name: "chuckwh",
  filename: "dog.jpg",
  uploadDate: ISODate("2013-05-23T15:12:19.000Z"),
  length: 86486,
  chunkSize: 262144,
  md5: "88d87a79a98106502777d06a4c7db329"
}

And, as the hex dump indicates, I'm basically getting the image too; it just looks like I'm getting a corrupted version of it. Nothing in my stack trace indicates a problem.

I've noticed the Lithium folks added a patch to their MongoDB code to better handle prefixes, but I'm not doing anything tricky there anyway: just retrieving fs.files, with no potential namespacing issues.

Since it worked before, I suspect a PHP driver issue, but I don't see any known issues being discussed via Google or here. Is anyone aware of any PHP driver issues involving MongoDB version 2.4.1, GridFS, and the 1.3.4 driver?

As far as code goes, I'm pretty much doing it the way it's done here:

https://github.com/nateabele/photoblog/blob/master/controllers/PhotosController.php

The controller in that link references a model that includes a Lithium library or two that don't come with the core Lithium package, like behaviors, but again I emphasize that this was working until recently. This part of the site is pretty well siloed from the rest of the site. The only routing change I made was to add some pagination, but I commented that out and am still getting the error. I hope this question is "well-formed". Rather than a specific solution, I'm looking for some pointers on where I should be looking, since so far I'm obviously not looking in the right places. Thanks

  • What about the code to read the image back? Did you take note of https://github.com/nateabele/photoblog/blob/master/config/bootstrap/media.php#L59? – Nate Abele May 24 '13 at 15:43
  • Thanks, Nate - yes, I used that exact route - as I mentioned, it was working perfectly up until recently. My mongo provider agrees with me that it's probably a driver issue. I haven't had a chance to run a mongo shell on the IP for that but will update this post as soon as I do. Thanks again. – Chuck White May 24 '13 at 23:27

1 Answer


There have not been any significant changes to the GridFS component since 1.3.4 (see: changelog). One thing to investigate would be reading the data from fs.chunks directly.

Since the entire image for the fs.files document you shared above is only 86486 bytes and the chunk size is 262144, we can expect that it'd be contained entirely in the first chunk. Searching the fs.chunks collection where the files_id field is ObjectId("519e31d39bdd497903000007") should turn up the single document. Additional details about the schema of that collection can be found here.
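The single-chunk expectation is just a ceiling division of the file length by the chunk size. A minimal sketch, where expected_chunks is a hypothetical helper name and the numbers come from the fs.files document in the question:

```python
def expected_chunks(length: int, chunk_size: int) -> int:
    """Number of fs.chunks documents needed to hold `length` bytes."""
    # Ceiling division using integers only.
    return (length + chunk_size - 1) // chunk_size


# length and chunkSize from the fs.files document above.
print(expected_chunks(86486, 262144))  # 1
```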

One thing to note is that the MD5 hash is calculated by the server when the file is initially stored. Therefore, the hash in your document above should match that of the chunk's data field. If these values differ, it's possible that the collection was modified after the fact. Memory or network corruption could also be at play. The logic to read files in the PHP driver simply concatenates fields from a query on fs.chunks, so it's unlikely there's an outstanding bug there that wouldn't have already been caught by the test suite.
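To make that check concrete, here is a rough sketch of the comparison. It assumes the chunk documents have already been fetched from fs.chunks into plain dicts with their n and data fields, with data available as raw bytes. The helper names reassemble and verify are hypothetical, and the sample payload is invented for the demo rather than taken from the real file.

```python
import hashlib


def reassemble(chunks):
    """Concatenate chunk payloads in order of their sequence number `n`."""
    ordered = sorted(chunks, key=lambda c: c["n"])
    return b"".join(c["data"] for c in ordered)


def verify(file_doc, chunks):
    """Compare reassembled data against the length and md5 in fs.files."""
    data = reassemble(chunks)
    return {
        "length_ok": len(data) == file_doc["length"],
        "md5_ok": hashlib.md5(data).hexdigest() == file_doc["md5"],
    }


# Invented stand-in data, not the real image.
payload = b"\xff\xd8" + b"x" * 10
file_doc = {"length": len(payload), "md5": hashlib.md5(payload).hexdigest()}

good = [{"n": 1, "data": payload[8:]}, {"n": 0, "data": payload[:8]}]
bad = [{"n": 0, "data": b"\n" + payload}]  # one stray leading byte

print(verify(file_doc, good))
print(verify(file_doc, bad))
```

If the stored chunks pass this check but the bytes served by the web app do not, the data in the database is intact and the extra byte is being introduced somewhere after the read.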

jmikola