I don't know whether you have already solved this problem, but here is what I found out recently.
I am new to Caffe and find its overall architecture hard to understand. I just wanted a quick way to extract features from Caffe's CNN and manipulate them later. Moreover, I work on OSX and hadn't installed Caffe from source. I installed it through 'port', and the installation seems incomplete. So I ran Caffe's 'feature_extractor' on another machine where Caffe is properly installed, then copied the output to my machine for further processing.
To process that output, you'll have to install LMDB and Google's Protobuf on your machine and link your C/C++ program with liblmdb and libprotobuf.
I followed Caffe's tutorial to save the output of AlexNet's 'fc7' layer to a database in LMDB format. Then I wrote a simple C/C++ program to read it. This can be done using the following code:
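#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <lmdb.h>
using namespace std;

// Print the LMDB error message and exit if a call failed.
static void check( int rc, const char *what )
{
    if( rc != 0 )
    {
        cerr << what << ": " << mdb_strerror(rc) << endl;
        exit(1);
    }
}

int main( int argc, char *argv[] )
{
    if( argc != 2 )
    {
        cerr << "Error" << endl
             << "Usage : " << argv[0] << " mdb_dirname" << endl;
        return 1;
    }
    char *mdb_dirname = argv[1];
    int rc;
    MDB_env *env;
    MDB_dbi dbi;
    MDB_val key, data;
    MDB_txn *txn;
    MDB_cursor *cursor;

    // Open the environment and the transaction read-only: we only dump the contents.
    check( mdb_env_create(&env), "mdb_env_create" );
    check( mdb_env_open(env, mdb_dirname, MDB_RDONLY, 0664), "mdb_env_open" );
    check( mdb_txn_begin(env, NULL, MDB_RDONLY, &txn), "mdb_txn_begin" );
    check( mdb_dbi_open(txn, NULL, 0, &dbi), "mdb_dbi_open" );
    check( mdb_cursor_open(txn, dbi, &cursor), "mdb_cursor_open" );

    // With MDB_NEXT, key and data are outputs: LMDB points them at its own
    // memory, so they need no buffer of their own.
    while ((rc = mdb_cursor_get(cursor, &key, &data, MDB_NEXT)) == 0)
    {
        printf("key: %p %d %.*s, data: %p %d %.*s\n",
               key.mv_data, (int) key.mv_size, (int) key.mv_size, (char *) key.mv_data,
               data.mv_data, (int) data.mv_size, (int) data.mv_size, (char *) data.mv_data);
    }
    // The loop normally ends with MDB_NOTFOUND (no more entries).
    if( rc != MDB_NOTFOUND )
        cerr << "mdb_cursor_get: " << mdb_strerror(rc) << endl;

    mdb_cursor_close(cursor);
    mdb_txn_abort(txn);
    mdb_dbi_close(env, dbi);
    mdb_env_close(env);
    return 0;
}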
'mdb_dirname' is the directory created by 'feature_extractor'. It contains 'data.mdb' and 'lock.mdb'.
Note that I am also new to LMDB, and I don't really understand every line of the code above. However, it does work :)
If you process your LMDB file, you may observe that 'key.mv_data' is the index of your example, so 'data.mv_data' should contain the feature vector for that example. I looked into the source code of Caffe's feature_extractor and found that the bytes in 'data.mv_data' are the serialization of a 'Datum' object. Datum is defined with Google's Protocol Buffers (Protobuf). You can find 'caffe.proto' somewhere in the Caffe directory. This .proto file is processed by the compiler 'protoc', which produces 'caffe.pb.h' and 'caffe.pb.cc'; both must be included in your project. If you cannot find it, this is what the relevant part looks like:
syntax = "proto2";
package caffe;
message Datum {
optional int32 channels = 1;
optional int32 height = 2;
optional int32 width = 3;
// the actual image data, in bytes
optional bytes data = 4;
optional int32 label = 5;
// Optionally, the datum could also hold float data.
repeated float float_data = 6;
// If true data contains an encoded image that need to be decoded
optional bool encoded = 7 [default = false];
}
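Before moving on to the data, here is a minimal sketch of turning the key into an integer index. It assumes the key is an ASCII decimal string, possibly zero-padded; I have not verified that this holds for every Caffe version. 'key_to_index' is a hypothetical helper name, not part of LMDB or Caffe.

```cpp
#include <cstddef>

// Convert an LMDB key (raw bytes, not NUL-terminated) into an example index.
// Assumption: the key is an ASCII decimal number, possibly zero-padded.
// Returns -1 if the key is empty or contains a non-digit character.
long key_to_index( const void *key_data, std::size_t key_size )
{
    const char *p = static_cast<const char *>( key_data );
    if( key_size == 0 ) return -1;
    long index = 0;
    for( std::size_t i = 0; i < key_size; i++ )
    {
        if( p[i] < '0' || p[i] > '9' ) return -1;
        index = index * 10 + ( p[i] - '0' );
    }
    return index;
}
```

Inside the cursor loop it would be called as key_to_index( key.mv_data, key.mv_size ).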
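Then you can convert 'data.mv_data' into the feature vectors as follows (add #include "caffe.pb.h" and #include <string> at the top of the program):
// Datum lives in the 'caffe' namespace because of 'package caffe;' in the .proto file
caffe::Datum datum;
while ((rc = mdb_cursor_get(cursor, &key, &data, MDB_NEXT)) == 0)
{
    // mv_data is raw bytes, so pass the size explicitly
    string str( (char*)data.mv_data, data.mv_size );
    if( !datum.ParseFromString( str ) )
    {
        cerr << "Cannot parse Datum" << endl;
        break;
    }
    if( datum.float_data_size()>0 )
    {
        // datum.float_data_size() is the dimension of the feature vectors
        for( int i = 0; i < datum.float_data_size(); i++ )
        {
            float f = datum.float_data(i);
            // do something
        }
    }
}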
When I built the code above, I got lots of linking errors, such as undefined reference … and other things related to Protobuf. If you encounter the same problem, a solution that I've found is to link the program against libprotobuf.a instead of simply passing -lprotobuf (static linkage instead of dynamic linkage).
Another small problem is that the label I assigned to each example in the file processed by 'feature_extractor' is lost. I don't know why. So I just put these labels into a separate file and process it along with the LMDB file. For example, if you want to output a LIBSVM file:
// label[] holds the labels read from the separate file,
// in the same order as the entries in the LMDB file
caffe::Datum datum;
int c = 0;
while ((rc = mdb_cursor_get(cursor, &key, &data, MDB_NEXT)) == 0)
{
    string str( (char*)data.mv_data, data.mv_size );
    if( !datum.ParseFromString( str ) )
        break;
    if( datum.float_data_size()>0 )
    {
        cout<< label[c]<< " ";
        for( int i = 0; i < datum.float_data_size(); i++ )
        {
            float f = datum.float_data(i);
            // LIBSVM feature indices start at 1
            cout<< (i+1)<< ":"<< f<< " ";
        }
        cout<< endl;
        c++;
    }
}
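The 'label' array used above has to be filled from the separate file first. Here is a rough sketch of one way to do that. It assumes a text file with one example per line whose last whitespace-separated field is an integer label (so both a bare "3" and "some/image.jpg 3" work). The file layout and the name 'read_labels' are my assumptions; adapt them to however you stored your labels.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one integer label per line; the label is the last field on the line.
// Blank lines are skipped. std::stoi throws if the last field is not a number.
std::vector<int> read_labels( std::istream &in )
{
    std::vector<int> labels;
    std::string line;
    while( std::getline( in, line ) )
    {
        std::istringstream fields( line );
        std::string token, last;
        while( fields >> token ) last = token;   // keep only the last field
        if( last.empty() ) continue;             // blank line
        labels.push_back( std::stoi( last ) );
    }
    return labels;
}
```

With a std::ifstream opened on your label file, vector<int> label = read_labels( file ); gives the array used in the loop. It is worth checking that c stays below label.size(), since the labels only line up if both files list the examples in the same order.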
Good luck.