
Following the steps in http://caffe.berkeleyvision.org/gathered/examples/feature_extraction.html to extract features from new images after training,

you end up getting a data.mdb file.

I would much prefer to write features to a txt file so I can easily manipulate it.

I did some googling and found some code, but it did not work. Furthermore, the generated data.mdb files, when opened with MDB opener apps on Mac, do not show any tables at all.

Is there an easy way to write the extracted features to a text file, or alternatively, an easy way to inspect the .mdb file so we can check the actual values per image?

ytrewq

2 Answers


The C++ interface for Caffe provides some LMDB functionality to read data in and write it back out.

You could take a look at how the LMDBDataLayer performs its reads, and use that as a reference for writing a program that manipulates the data in an LMDB database or dumps it to text.

Aidan Gomez

I don't know if you have already figured out how to solve this problem, but here is what I've recently found out.

I am new to Caffe and I find it hard to understand Caffe's whole architecture. I just wanted a quick way to extract the features from Caffe's CNN and manipulate them later. Moreover, I work on OSX and hadn't installed Caffe from source; I installed it through 'port', and the installation seems incomplete. So I ran Caffe's 'feature_extractor' on another machine where Caffe is properly installed, and copied the output file to my machine for further processing.

To this end, you'll have to install LMDB and Google's Protobuf on your machine, and link the C/C++ program against liblmdb and libprotobuf.
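As a rough sketch, the build invocation on a typical installation might look like the following (the file names are assumptions; substitute the names of your own source file and the generated Protobuf sources):

```shell
# Hypothetical build line: compile the reader program and link it
# dynamically against liblmdb and libprotobuf, assuming both were
# installed to standard locations (e.g. via `make install`).
g++ -std=c++11 read_lmdb.cpp caffe.pb.cc -o read_lmdb -llmdb -lprotobuf
```

If the libraries live in a non-standard prefix, you may also need `-I` and `-L` flags pointing at their include and lib directories.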

I followed Caffe's tutorial to save the output of AlexNet's layer 'fc7' into a file in LMDB format. Then I wrote a simple C/C++ program to read it. This can be done using the following code:

#include <cstdio>
#include <iostream>
#include <lmdb.h>

using namespace std;

int main( int argc, char *argv[] )
{
    if( argc != 2 )
    {
        cerr << "Usage: " << argv[0] << " mdb_dirname" << endl;
        return 1;
    }
    char *mdb_dirname = argv[1];

    int rc;
    MDB_env *env;
    MDB_dbi dbi;
    MDB_val key, data;
    MDB_txn *txn;
    MDB_cursor *cursor;

    // Open the environment read-only and start a read-only transaction.
    rc = mdb_env_create(&env);
    rc = mdb_env_open(env, mdb_dirname, MDB_RDONLY, 0664);
    rc = mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    rc = mdb_open(txn, NULL, 0, &dbi);
    rc = mdb_cursor_open(txn, dbi, &cursor);

    // Walk every key/value pair in the database.
    while ((rc = mdb_cursor_get(cursor, &key, &data, MDB_NEXT)) == 0)
    {
        printf("key: %p %d %.*s, data: %p %d  %.*s\n",
               key.mv_data,  (int) key.mv_size,  (int) key.mv_size,  (char *) key.mv_data,
               data.mv_data, (int) data.mv_size, (int) data.mv_size, (char *) data.mv_data);
    }
    mdb_cursor_close(cursor);
    mdb_txn_abort(txn);

    mdb_close(env, dbi);
    mdb_env_close(env);

    return 0;
}

'mdb_dirname' is the directory created by 'feature_extractor'; it contains 'data.mdb' and 'lock.mdb'.

Note that I am also new to LMDB, and I don't really understand every line of the code above. However, it does work :)

If you inspect your LMDB file, you may observe that 'key.mv_data' is indeed the index of your example, so 'data.mv_data' should contain the feature vector for that example. I looked into the source code of Caffe's feature_extractor and found that the string in 'data.mv_data' is the serialization of a 'Datum' object. This Datum is constructed using Google's Protocol Buffers (Protobuf). You should find 'caffe.proto' somewhere in the Caffe directory; this .proto file is processed by the compiler 'protoc' to produce 'caffe.pb.h' and 'caffe.pb.cc', which must be included in your project. If you cannot find it, this is what it looks like:

syntax = "proto2";

package caffe;

message Datum {
  optional int32 channels = 1;
  optional int32 height = 2;
  optional int32 width = 3;
  // the actual image data, in bytes
  optional bytes data = 4;
  optional int32 label = 5;
  // Optionally, the datum could also hold float data.
  repeated float float_data = 6;
  // If true data contains an encoded image that need to be decoded
  optional bool encoded = 7 [default = false];
}
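Assuming 'protoc' is on your PATH and 'caffe.proto' is in the current directory, generating the C++ sources from this definition is a single command:

```shell
# Produces caffe.pb.h and caffe.pb.cc in the current directory,
# which you then compile into your project.
protoc --cpp_out=. caffe.proto
```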

Then you may convert 'data.mv_data' into the feature vectors. You'll need to include 'caffe.pb.h' and declare a 'caffe::Datum datum;' before the loop:

    caffe::Datum datum;
    while ((rc = mdb_cursor_get(cursor, &key, &data, MDB_NEXT)) == 0)
    {
        // Rebuild the serialized string and parse it back into a Datum.
        string str( (char*)data.mv_data, (int)data.mv_size );

        datum.ParseFromString( str );

        if( datum.float_data_size() > 0 )
        {
            // datum.float_data_size() is the dimension of the feature vector
            for( int i = 0; i < datum.float_data_size(); i++ )
            {
                float f = datum.float_data(i);
                // do something with f
            }
        }
    }

When I built the code above, there were lots of linking errors: undefined references and other things related to Protobuf. If you encounter the same problem, a solution I've found is to link the program against libprotobuf.a instead of simply -lprotobuf (static linkage instead of dynamic linkage).
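For reference, the static-linkage workaround described above might look like this (the path to libprotobuf.a is an assumption; locate yours with `find /usr -name libprotobuf.a` or similar):

```shell
# Name the static archive directly instead of using -lprotobuf;
# LMDB can stay dynamically linked.
g++ -std=c++11 read_lmdb.cpp caffe.pb.cc -o read_lmdb -llmdb /usr/local/lib/libprotobuf.a
```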

Another small problem is that the labels I assigned to each example in the file processed by 'feature_extractor' were lost; I don't know why. So I just put the labels into a separate file and processed it alongside the LMDB file. For example, if you want to output a LIBSVM-format file:

    // 'label' is assumed to hold the labels read from the separate file
    int c = 0;
    while ((rc = mdb_cursor_get(cursor, &key, &data, MDB_NEXT)) == 0)
    {
        string str( (char*)data.mv_data, (int)data.mv_size );
        datum.ParseFromString( str );
        if( datum.float_data_size() > 0 )
        {
            // LIBSVM format: "<label> 1:<f1> 2:<f2> ..."
            cout << label[c] << " ";
            for( int i = 0; i < datum.float_data_size(); i++ )
            {
                float f = datum.float_data(i);
                cout << (i+1) << ":" << f << " ";
            }
            cout << endl;
            c++;
        }
    }

Good luck.