0

Update Went for an alternate method of reading in See below

Original post

I'm still new to C++, I've been coding in Unreal for a bit, but I'm writing a File Parser and am doing it as a standalone solution. I want to read an unknown length of bytes from an unknown start point in the unique_ptr array. Also would be appreciative of comments on the cleanliness or efficiency of the code, want to go forward with good habits.

I've been able to read in the files using basic_ifstream as so

auto ReadFileData(std::string path) {
    auto fileSize = std::filesystem::file_size(path);
    auto buf = std::make_unique<std::byte[]>(fileSize);
    std::basic_ifstream<std::byte> ifs(path, std::ios::binary);
    ifs.read(buf.get(), fileSize);
    return buf;
}

That returns the unique_ptr byte array to fileData. I've written a method to read in a certain number of bytes to convert to a get values from the binary, but it takes them one byte at a time and I feel this is going to be slow.

uint32_t ParseUint32(auto& fileData, uint32_t dataLocation) {
    return uint32_t((unsigned char)(fileData[dataLocation+3]) << 24 |
        (unsigned char)(fileData[dataLocation + 2]) << 16 |
        (unsigned char)(fileData[dataLocation + 1]) << 8 |
        (unsigned char)(fileData[dataLocation]));
}

But I've got to a point that I need to read in a compressed range of bytes to decompress the data within. I'm not seeming to get anywhere with memcpy, and have read that unique_ptr arrays can't be copied? The last line is where I'm stuck.

for(uint16_t i = 0; i<tileOffsetCount; i++)
{
    tileOffsets[i] = ParseUint32(fileData,tileOffset + (i*4));
    uint32_t tileByteSize = ParseUint32(fileData,tileByteCountsOffset +(i*4)) ;
    char* tileData = new char[tileByteSize];
    std::memcpy(tileData, fileData+tileOffsets[i],tileByteSize); //This line is where I'm stuck
}
r3dstar
  • 1
  • 1
  • Why not use a `std::vector`? – Eljay Mar 12 '23 at 11:28
  • Is the update meant as an answer to the original question? If so, maybe better to revert it from the question and add it as an answer. Or is there still a question there? It's not clear to me – mous Mar 12 '23 at 11:28
  • @Eljay I also wondered a bit about it, but I figured that sometimes maybe one wants to make it more difficult to accidentally make copies, which is maybe a legitimate reason? – mous Mar 12 '23 at 11:30
  • @mous I'll edit the update to an answer. Sorry I've often read Stack Overflow but never asked questions as I've generally found good answers. Also to the second point. I have no real idea what I'm doing and probably found an overly complex solution to something simple. Using seek and read was way, way faster as well. – r3dstar Mar 12 '23 at 11:45

2 Answers2

0

It's true that unique_ptrs can't be copied (the copy ctor, and copy assignment operator of unique_ptrs are deleted), but it should still be possible to copy parts of the data that a unique_ptr is pointing to.

I suspect the problem stems from this addition fileData+tileOffsets[i], where fileData is a unique_ptr and tileOffsets[i] is an uint32_t, but I don't think such an addition operator exists. You can try to modify the expression to fileData.get()+tileOffsets[i]. Adding the get() here is not a bad thing, as we only use the pointer we get to copy some of the data, without storing that pointer (it's storing pointers for later that most often leads to trouble).

I don't think the following part is common or maybe even possible to do more efficiently

uint32_t ParseUint32(auto& fileData, uint32_t dataLocation) {
    return uint32_t((unsigned char)(fileData[dataLocation+3]) << 24 |
        (unsigned char)(fileData[dataLocation + 2]) << 16 |
        (unsigned char)(fileData[dataLocation + 1]) << 8 |
        (unsigned char)(fileData[dataLocation]));
}

The only thing I know that may be worth to additionally mind about here is if you want the program to be truly cross platform, then it's a best practice to mind about byte endianness when storing and loading multibyte data to a file system, or sending data over network.

mous
  • 149
  • 1
  • 11
  • Thanks @mous The last line of code I wrote was definately off, however as I suspected, as soon as I posted I found a solution which was to change how I read the file in ```std::ifstream ReadFileData(std::string path) { std::ifstream dataFile(path,std::ios::binary); return dataFile; }``` – r3dstar Mar 12 '23 at 11:07
  • Then use seekg ```uint32_t ParseUint32(std::ifstream& fileData, uint32_t dataLocation) { char* memblock = new char[4]; fileData.seekg(dataLocation); fileData.read(memblock,4); return uint32_t(((unsigned char)memblock[3]) << 24 | ((unsigned char)memblock[2]) << 16 | ((unsigned char)memblock[1]) << 8 | ((unsigned char)memblock[0])); }``` – r3dstar Mar 12 '23 at 11:09
  • You should've pointed out the possible memory leak in OP's last snippet too. Why not to use `unique_ptr` again?! – Red.Wave Mar 12 '23 at 11:10
  • 1
    @r3dstar don't use `new` if the required memory size is statically known, specially if it is as low as 4 bytes. Just `char memblock[4]{};`. – Red.Wave Mar 12 '23 at 11:26
  • @Red.Wave Thanks for the tip. Applied! – r3dstar Mar 12 '23 at 11:40
0

The update

I over complicated the initial file reading, which made it more difficult. Using seek and read was something I've done in C# and should have realised here. Also, it is much faster...

std::ifstream ReadFileData(std::string path) {
    std::ifstream dataFile(path,std::ios::binary);
    return dataFile;
}

uint32_t ParseUint32(std::ifstream& fileData, uint32_t dataLocation) {
    char memblock[4]{};  
    fileData.seekg(dataLocation);
    fileData.read(memblock,4);
    return uint32_t(((unsigned char)memblock[3]) << 24 |
        ((unsigned char)memblock[2]) << 16 |
        ((unsigned char)memblock[1]) << 8 |
        ((unsigned char)memblock[0]));
}

char* ParseDataBlock(std::ifstream& fileData, uint32_t dataLocation, uint32_t dataSize) {
    char* memblock = new char[dataSize]; 
    fileData.seekg(dataLocation);
    fileData.read(memblock,dataSize);
    return memblock;
}


int main() {
    std::string filePath = "my/filepath/";
    for (const auto& entry : std::filesystem::directory_iterator(filePath)) {
        std::string filePath = { entry.path().string()};
        if (filePath.substr(filePath.size() - 3) == "tif") {
            std::ifstream fileData = ReadFileData(filePath);
            uint32_t* tileOffsets = new uint32_t[tileOffsetCount];
            for(uint16_t i = 0; i<tileOffsetCount; i++)
            {
                tileOffsets[i] = ParseUint32(fileData,tileOffset + (i*4));
                uint32_t tileByteSize = ParseUint32(fileData,tileByteCountsOffset +(i*4)) ;
                char* tileData = ParseDataBlock(fileData,tileOffsets[i],tileByteSize);
            }/....
r3dstar
  • 1
  • 1
  • You were on the right track using `unique_ptr`. Now you are risking memory leaks, using raw pointers. – Red.Wave Mar 12 '23 at 17:00
  • dont use answers for question updates. You can edit the question. Though please take care to not invalidate already posted answers. If this is supposed to be answer you should explain more how this answers the question – 463035818_is_not_an_ai Mar 16 '23 at 14:36
  • Hi @463035818_is_not_a_number, sorry new at using Stack Overflow and it was suggested to me to answer my own question. – r3dstar Mar 17 '23 at 15:41
  • no worries, its completely fine to answer your own question. THough this reads as if it is supposed to be an update to the quesiton rather than an answer – 463035818_is_not_an_ai Mar 17 '23 at 15:46