The encoding of a binary file generally depends on factors other than the programming language, such as the type of file and the operating system. For example, on POSIX, a text file contains characters organized into zero or more lines, and it is represented the same way in both C++ and Python. Both languages just need to use the proper encoding for the given format. In this case, no special process is needed to convert a Python binary stream to a C++ binary stream, as it is a raw byte stream in both languages.
There are two issues with the approach in the original code:
- strlen() should only be used to determine the length of a null-terminated character string. If the binary file contains bytes with a value of \0, then strlen() will not return the entire size of the data.
- The size of the data is lost because char* is used instead of std::string. Consider using std::string, as it provides the size via size() and permits null characters within the string itself. An alternative solution is to explicitly provide the size of the data alongside the data.
Here is a complete Boost.Python example:
#include <boost/python.hpp>
#include <fstream>
#include <iostream>
#include <string>

// Write the bytes in data to the file called name.
void write_file(const std::string& name, const std::string& data)
{
    // size() counts every byte, including embedded '\0' bytes.
    std::cout << "This length is: " << data.size() << std::endl;
    // Open in binary mode so the bytes are written untranslated.
    std::ofstream fout(name.c_str(), std::ios::binary);
    fout.write(data.data(), data.size());
    fout.close();
}

BOOST_PYTHON_MODULE(example)
{
    using namespace boost::python;
    def("write", &write_file);
}
And the example Python code (test.py):
import example

with open("t1.txt", "rb") as f:
    example.write("t2.txt", f.read())

with open("p1.png", "rb") as f:
    example.write("p2.png", f.read())
And the usage, where I download this image, create a simple text file, and then copy both with the above code:
[twsansbury]$ wget http://www.boost.org/style-v2/css_0/get-boost.png -O p1.png >> /dev/null 2>&1
[twsansbury]$ echo "this is a test" > t1.txt
[twsansbury]$ ./test.py
This length is: 15
This length is: 27054
[twsansbury]$ md5sum t1.txt t2.txt
e19c1283c925b3206685ff522acfe3e6 t1.txt
e19c1283c925b3206685ff522acfe3e6 t2.txt
[twsansbury]$ md5sum p1.png p2.png
fe6240bff9b29a90308ef0e2ba959173 p1.png
fe6240bff9b29a90308ef0e2ba959173 p2.png
The md5 checksums match, indicating the file contents are the same.
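If md5sum is not available, the same check can be done by comparing the files byte by byte. The following is only a sketch of that idea; files_identical is a hypothetical helper, not part of the example module above.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Returns true when both files can be opened and hold identical bytes.
bool files_identical(const std::string& a, const std::string& b)
{
    // Binary mode, so no bytes are translated while reading.
    std::ifstream fa(a.c_str(), std::ios::binary);
    std::ifstream fb(b.c_str(), std::ios::binary);
    if (!fa || !fb)
        return false;

    std::istreambuf_iterator<char> a_it(fa), b_it(fb), end;
    // Walk both streams in lockstep and stop at the first difference.
    while (a_it != end && b_it != end) {
        if (*a_it != *b_it)
            return false;
        ++a_it;
        ++b_it;
    }
    // Identical only if both streams ended at the same time.
    return a_it == end && b_it == end;
}
```

For the files above, files_identical("p1.png", "p2.png") would be expected to return true if the copy preserved every byte, including any \0 bytes.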