I have a 1.5 GB text file encoded as UTF-16LE. I want to read it and test each line of the file against a regex. Right now I use the following two crates:
encoding_rs = "0.8.31"
encoding_rs_io = "0.1.7"
The code to read the file looks like this:
use std::fs::File;
use std::io::Read;

// Decode a whole UTF-16LE byte buffer into a String.
fn decode_utf16le(buf: Vec<u8>) -> String {
    let enc = encoding_rs::Encoding::for_label("utf-16le".as_bytes());
    let mut dec = encoding_rs_io::DecodeReaderBytesBuilder::new()
        .encoding(enc)
        .build(&buf[..]);
    let mut res = String::new();
    dec.read_to_string(&mut res).unwrap();
    res
}

fn main() {
    // Read the entire file into memory, then decode it in one go.
    let mut file = File::open("huge.text.file.txt").unwrap();
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer).unwrap();
    let mut contents = decode_utf16le(buffer);
}
However, the call decode_utf16le(buffer) is very slow: it takes almost 20 seconds. Is it possible to read the file directly and match its lines against a regex?
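
To make "read the file directly" concrete, here is a rough sketch of the kind of streaming approach I have in mind. It uses only the standard library and is an assumption on my part, not something I have measured. The names for_each_utf16le_line and emit_line are hypothetical, the 64 KiB chunk size is arbitrary, and the on_line closure is a placeholder for where the regex test (for example regex::Regex::is_match) would go. It splits on the 0x000A code unit, which is safe in UTF-16 because surrogates never take that value, and it decodes one line at a time instead of the whole file.

```rust
use std::io::{self, Read};

/// Hypothetical helper: decodes the buffered UTF-16 code units of one line,
/// drops a trailing '\r', and hands the text to the callback.
fn emit_line(units: &mut Vec<u16>, on_line: &mut impl FnMut(&str)) {
    if units.last() == Some(&0x000D) {
        units.pop();
    }
    // Lossy decoding: unpaired surrogates become U+FFFD instead of an error.
    let line = String::from_utf16_lossy(units);
    on_line(&line);
    units.clear();
}

/// Hypothetical helper: streams UTF-16LE bytes from `reader` and calls
/// `on_line` once per line, without the line terminator.
fn for_each_utf16le_line<R: Read>(
    mut reader: R,
    mut on_line: impl FnMut(&str),
) -> io::Result<()> {
    let mut chunk = vec![0u8; 64 * 1024]; // arbitrary chunk size
    let mut pending: Option<u8> = None; // low byte waiting for its high byte
    let mut units: Vec<u16> = Vec::new(); // code units of the current line
    let mut first_unit = true;

    loop {
        let n = match reader.read(&mut chunk) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        for &b in &chunk[..n] {
            // Pair bytes into little-endian code units, across chunk boundaries.
            let lo = match pending.take() {
                Some(lo) => lo,
                None => {
                    pending = Some(b);
                    continue;
                }
            };
            let unit = u16::from_le_bytes([lo, b]);
            if first_unit {
                first_unit = false;
                if unit == 0xFEFF {
                    continue; // skip a leading byte order mark
                }
            }
            if unit == 0x000A {
                emit_line(&mut units, &mut on_line);
            } else {
                units.push(unit);
            }
        }
    }
    // Last line without a trailing newline; a stray odd byte is ignored.
    if !units.is_empty() {
        emit_line(&mut units, &mut on_line);
    }
    Ok(())
}
```

I would call this with File::open("huge.text.file.txt") as the reader and a closure that runs the regex on each line. Whether this is actually faster than decoding everything up front is exactly what I am unsure about.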