I have a complex file-reading problem. I need to read a DOCX file that contains an embedded compound file system, pull a ZIP file out of it, and walk the ZIP's internal directory to extract the files I actually need. I have already written this successfully in Java, so I know it can be done, but I want to do it in Rust.
Currently, I can read the DOCX file and iterate through the OLE10 objects to locate the one I need. The OLE10 stream (which actually contains the ZIP) begins with an odd 256-byte header holding an extraction command, which I seek past. If I read the rest of the stream and write it to the filesystem, it comes out as a ZIP file, and 7-Zip opens it and shows all the contents.
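A fixed 256-byte skip assumes the Ole10Native header always has the same length. In the layout commonly documented for these streams (for example by the oletools project), the header holds NUL-terminated label and path strings, so its length can vary, and the payload is preceded by its own 4-byte size. The sketch below is a hypothetical helper, `ole10native_payload`, that assumes that layout and returns exactly the declared payload slice, so nothing before or after the ZIP ends up in the buffer. The layout is an assumption to verify against a hex dump of the actual stream.

```rust
/// Reads a little-endian u32 at `*pos` and advances `*pos` past it.
fn read_u32(buf: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let b = buf.get(*pos..end)?;
    *pos = end;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Advances `*pos` past a NUL-terminated string (including the NUL).
fn skip_cstr(buf: &[u8], pos: &mut usize) -> Option<()> {
    let len = buf.get(*pos..)?.iter().position(|&c| c == 0)?;
    *pos += len + 1;
    Some(())
}

/// Hypothetical helper: returns the embedded file's bytes from an Ole10Native
/// stream, ASSUMING this commonly documented layout:
///   u32 total size | u16 type | label\0 | source path\0 |
///   4 bytes of flags | u32 temp-path length | temp path |
///   u32 payload size | payload | (anything after is ignored)
fn ole10native_payload(stream: &[u8]) -> Option<&[u8]> {
    let mut pos = 0usize;
    read_u32(stream, &mut pos)?; // total size of the rest of the stream (not used)
    pos += 2; // type word, commonly 2
    skip_cstr(stream, &mut pos)?; // label (display name)
    skip_cstr(stream, &mut pos)?; // original source path
    pos += 4; // two 16-bit flag words (meaning not relied on here)
    let temp_len = read_u32(stream, &mut pos)? as usize;
    pos = pos.checked_add(temp_len)?; // temp path; the length includes its NUL
    let size = read_u32(stream, &mut pos)? as usize; // declared payload size
    stream.get(pos..pos.checked_add(size)?)
}
```

With the code in the question, this would replace the `seek`: read the whole stream with `read_to_end`, call `ole10native_payload(&ole_buffer)`, and wrap the returned slice in a `Cursor` (which implements `Read + Seek`) before handing it to the ZIP crate.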
The problem is that no matter which Rust ZIP crate I use (zip, zip_extract, zip_extensions, rc-zip), I cannot extract the ZIP contents. I keep running into the error "cannot find end of central directory". I have scanned through the data, and the EOCD signature 50 4B 05 06 is actually there. If I end the stream at the EOCD, I get an "early end of file exit" error instead. The file is over 9 MB, and I wonder whether its size is the issue.
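Since the EOCD signature is present, one way to narrow this down is to check whether the bytes around it are consistent. Per the ZIP format, the EOCD record is 22 bytes plus an optional comment of up to 65,535 bytes, so it must sit near the end of the data, and its central-directory size and offset are 32-bit fields, so 9 MB is well within the classic (non-ZIP64) format. The hypothetical `find_eocd` below locates the last signature and reports how many bytes appear to precede the archive (assuming the central directory ends exactly where the EOCD begins) and how many follow the EOCD and its comment. A positive `leading_bytes` would suggest part of the header was not skipped, a negative one that too much was skipped, and a non-zero `trailing_bytes` that data past the end of the ZIP was read in.

```rust
/// Diagnostic summary of the End of Central Directory (EOCD) record.
#[derive(Debug, PartialEq)]
struct EocdReport {
    eocd_pos: usize,     // offset of the 50 4B 05 06 signature in the buffer
    cd_size: u32,        // central directory size (EOCD offset 12)
    cd_offset: u32,      // central directory offset as recorded (EOCD offset 16)
    comment_len: u16,    // ZIP comment length (EOCD offset 20)
    leading_bytes: i64,  // bytes in front of the archive start (0 for a clean ZIP)
    trailing_bytes: i64, // bytes after EOCD + comment (0 for a clean ZIP)
}

fn u16_at(b: &[u8], i: usize) -> u16 {
    u16::from_le_bytes([b[i], b[i + 1]])
}

fn u32_at(b: &[u8], i: usize) -> u32 {
    u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]])
}

/// Hypothetical helper: finds the last EOCD signature within the window the
/// ZIP format allows (last 22 + 65,535 bytes) and reports its fields.
fn find_eocd(buf: &[u8]) -> Option<EocdReport> {
    const SIG: [u8; 4] = [0x50, 0x4B, 0x05, 0x06];
    const EOCD_LEN: usize = 22; // fixed part of the EOCD record
    let last_start = buf.len().checked_sub(EOCD_LEN)?;
    let lowest = last_start.saturating_sub(u16::MAX as usize);
    // Scan backwards so the signature closest to the end wins.
    let eocd_pos = (lowest..=last_start).rev().find(|&i| buf[i..i + 4] == SIG)?;
    let cd_size = u32_at(buf, eocd_pos + 12);
    let cd_offset = u32_at(buf, eocd_pos + 16);
    let comment_len = u16_at(buf, eocd_pos + 20);
    // Where the archive "really" starts if the central directory ends at the EOCD.
    let leading_bytes = eocd_pos as i64 - (cd_offset as i64 + cd_size as i64);
    let trailing_bytes = buf.len() as i64 - (eocd_pos + EOCD_LEN + comment_len as usize) as i64;
    Some(EocdReport { eocd_pos, cd_size, cd_offset, comment_len, leading_bytes, trailing_bytes })
}
```

Calling `find_eocd(&ole_buffer)` right after `read_to_end` and printing the result with `{:?}` would show whether the buffer handed to the ZIP crate has extra bytes at either end.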
Does anyone have ideas on how to use Rust to extract the ZIP's contents, either into a buffer or onto the filesystem?
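To peruse the ZIP's internal directory without any crate, the central directory can be walked directly once the EOCD is found. The hypothetical `list_zip_entries` below returns each entry's name, compressed size, and local-header offset (shifted by any bytes found in front of the archive). It assumes a single-disk, non-ZIP64 archive and does not decompress anything, so actual extraction would still go through a crate such as `zip`; it is only a way to see whether the buffer's directory is readable and what it lists.

```rust
/// Hypothetical helper: lists (name, compressed size, local-header offset)
/// for every central-directory entry of an in-memory, non-ZIP64 archive.
fn list_zip_entries(buf: &[u8]) -> Option<Vec<(String, u32, u64)>> {
    // Little-endian readers that return None instead of panicking on short input.
    let rd16 = |i: usize| buf.get(i..i + 2).map(|b| u16::from_le_bytes([b[0], b[1]]));
    let rd32 = |i: usize| buf.get(i..i + 4).map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]));
    // Last EOCD signature (50 4B 05 06) that leaves room for the 22-byte record.
    let eocd = (0..=buf.len().checked_sub(22)?)
        .rev()
        .find(|&i| buf[i..i + 4] == [0x50u8, 0x4B, 0x05, 0x06])?;
    let count = rd16(eocd + 10)? as usize; // total number of entries
    let cd_size = rd32(eocd + 12)? as usize;
    let cd_offset = rd32(eocd + 16)? as usize;
    // Bytes in front of the archive, assuming the central directory ends where the EOCD starts.
    let shift = eocd.checked_sub(cd_offset.checked_add(cd_size)?)?;
    let mut p = cd_offset + shift;
    let mut entries = Vec::with_capacity(count);
    for _ in 0..count {
        if rd32(p)? != 0x0201_4B50 {
            return None; // not a central file header (50 4B 01 02)
        }
        let comp_size = rd32(p + 20)?;
        let name_len = rd16(p + 28)? as usize;
        let extra_len = rd16(p + 30)? as usize;
        let comment_len = rd16(p + 32)? as usize;
        let local_offset = u64::from(rd32(p + 42)?) + shift as u64;
        let name = String::from_utf8_lossy(buf.get(p + 46..p + 46 + name_len)?).into_owned();
        entries.push((name, comp_size, local_offset));
        // Fixed 46-byte header followed by the variable-length fields.
        p += 46 + name_len + extra_len + comment_len;
    }
    Some(entries)
}
```

If this returns `None` on the extracted buffer while 7-Zip lists the file, the mismatch points at the bytes being fed in rather than at the size of the archive.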
Here's the code that just won't extract:
```rust
use std::io::{Cursor, Read, Seek, SeekFrom};
use std::path::{Path, PathBuf};

let docx_path = Path::new(docx_filename);
// Capture the files from the embedded CFB filesystem
let mut comp_file = cfb::open(docx_path).unwrap();
// For each storage under /ObjectPool, keep its first Ole10Native stream (if any)
let objpool_entries_vec: Vec<_> = comp_file
    .read_storage(Path::new("/ObjectPool"))
    .unwrap()
    .filter_map(|subdir| {
        comp_file
            .read_storage(subdir.path())
            .unwrap()
            .find(|entry| entry.name().contains("Ole10Native"))
    })
    .collect();
// Create a stream of the OLE10 object that holds the ZIP
let mut ole10_stream = comp_file
    .open_stream(objpool_entries_vec[5].path())
    .unwrap();
ole10_stream.seek(SeekFrom::Start(256)).unwrap(); // skip the 256-byte header
let mut ole_buffer = Vec::new();
ole10_stream.read_to_end(&mut ole_buffer).unwrap();
let zip_cursor = Cursor::new(ole_buffer);
zip_extract::extract(
    zip_cursor,
    &PathBuf::from("C:\\Users\\ra069466\\Documents\\Software_Projects\\Rust_projects\\ha420_maint_app\\test_files\\"),
    false,
)
.unwrap();
```
When I run the following, it writes the ZIP to the directory and I can extract it with 7-Zip, but it still panics when trying to extract to the filesystem.
```rust
use std::fs::{File, OpenOptions};
use std::io::{Cursor, Read, Seek, SeekFrom, Write};
use std::path::{Path, PathBuf};

let docx_path = Path::new(docx_filename);
// Capture the files from the embedded CFB filesystem
let mut comp_file = cfb::open(docx_path).unwrap();
// For each storage under /ObjectPool, keep its first Ole10Native stream (if any)
let objpool_entries_vec: Vec<_> = comp_file
    .read_storage(Path::new("/ObjectPool"))
    .unwrap()
    .filter_map(|subdir| {
        comp_file
            .read_storage(subdir.path())
            .unwrap()
            .find(|entry| entry.name().contains("Ole10Native"))
    })
    .collect();
// Create a stream of the OLE10 object that holds the ZIP
let mut ole10_stream = comp_file
    .open_stream(objpool_entries_vec[5].path())
    .unwrap();
ole10_stream.seek(SeekFrom::Start(256)).unwrap(); // skip the 256-byte header
let mut ole_buffer = Vec::new();
ole10_stream.read_to_end(&mut ole_buffer).unwrap();
let zip_cursor = Cursor::new(ole_buffer);
// truncate(true) so bytes from an earlier, longer test.zip are not left at the end
let mut zip_file = OpenOptions::new()
    .write(true)
    .create(true)
    .truncate(true)
    .open("C:\\Users\\ra069466\\Documents\\Software_Projects\\Rust_projects\\ha420_maint_app\\test_files\\test.zip")?;
zip_file.write_all(zip_cursor.get_ref())?;
zip_file.flush()?;
let zip_file = File::open("C:\\Users\\ra069466\\Documents\\Software_Projects\\Rust_projects\\ha420_maint_app\\test_files\\test.zip")?;
// Parse the central directory once as a check before extracting
let _zip_archive = zip::ZipArchive::new(&zip_file)?;
zip_extract::extract(
    zip_file,
    &PathBuf::from("C:\\Users\\ra069466\\Documents\\Software_Projects\\Rust_projects\\ha420_maint_app\\test_files\\"),
    false,
)
.unwrap();
```
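When a file is opened with `write(true)` and `create(true)` but without `truncate(true)`, writing a shorter buffer over an existing file leaves the old bytes after the new data. For a ZIP, that means stale data following the new EOCD, which a reader may reject. The two hypothetical helpers below contrast the two ways of opening the file; `File::create` truncates, as does adding `.truncate(true)` to the `OpenOptions` chain. This only matters when `test.zip` already exists from an earlier run, so it would not by itself explain a failure on the in-memory `Cursor`.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Opens with write + create but NO truncate: if the file already exists and
/// is longer than `bytes`, its old tail is left in place after the new data.
fn write_without_truncate(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut f = OpenOptions::new().write(true).create(true).open(path)?;
    f.write_all(bytes)?;
    f.flush()
}

/// `File::create` truncates an existing file, so only `bytes` remain.
fn write_with_truncate(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(bytes)?;
    f.flush()
}
```

In a quick check, writing 100 bytes and then 10 bytes with `write_without_truncate` leaves a 100-byte file, while `write_with_truncate` leaves 10 bytes.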