
Consider the following code, which computes the size of a file in three different ways.

#include <iostream>
#include <cstdio>
#include <cstdlib>   // std::atoi
#include <string>
#include <fstream>

inline long long int filesize1(const std::string& filename)
{
    std::ifstream filestream(filename.c_str(), std::ios::binary);
    std::streampos first = 0;
    std::streampos last = 0;
    long long int size = -1;
    if (filestream.is_open()) {
        filestream.seekg(0, std::ios::beg);
        first = filestream.tellg();
        filestream.seekg(0, std::ios::end);
        last = filestream.tellg();
        if ((first != -1) && (last != -1) && (last-first >= 0)) {
            size = last-first;
        }
        filestream.close();
    }
    return size;
}

inline long long int filesize2(const std::string& filename)
{
    std::ifstream filestream(filename.c_str(), std::ios::binary);
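    if (!filestream) return -1LL;
    // The evaluation order of the operands of '-' is unspecified, so the two
    // seek/tell chains are written as separate statements.
    const std::streampos last = filestream.seekg(0, std::ios::end).tellg();
    const std::streampos first = filestream.seekg(0, std::ios::beg).tellg();
    return static_cast<long long int>(last - first);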
}

inline long long int filesize3(const std::string& filename)
{
    std::FILE* file = std::fopen(filename.c_str(), "rb");
    long long int size = -1;
    if (file) {
        // Only call ftell if the seek succeeded; ftell itself returns -1L on failure.
        if (std::fseek(file, 0, SEEK_END) == 0) {
            size = std::ftell(file);
        }
        std::fclose(file);
    }
    return size;
}

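int main(int argc, char* argv[])
{
    // Expected arguments: <version: 1|2|3> <filename> <iterations>
    if (argc < 4) {
        std::cerr<<"usage: <program> <version> <filename> <iterations>"<<std::endl;
        return 1;
    }
    const int iterations = std::atoi(argv[3]);
    unsigned int n = 0; // running sum of the sizes; wraps around on overflow
    switch (std::atoi(argv[1])) {
        case 1: for (int i = 0; i < iterations; ++i) n += filesize1(argv[2]); break;
        case 2: for (int i = 0; i < iterations; ++i) n += filesize2(argv[2]); break;
        case 3: for (int i = 0; i < iterations; ++i) n += filesize3(argv[2]); break;
    }
    std::cout<<n<<std::endl;
    return 0;
}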

First question: are the three versions guaranteed to return the same result in all cases?

Second question: why is version 3, which uses the C API, approximately twice as slow as the first two versions?

Vincent
  • Why not use stat() to get st_size instead? – brian beuning May 13 '13 at 01:05
  • @brianbeuning: because I work on heterogeneous systems and I can only use standard C/C++. – Vincent May 13 '13 at 01:07
  • While stat is a good way to do it, the question is valid on its own as just trying to understand what's going on. However, without seeing the timing code, it's difficult to tell what's going on, since they should be approximately the same speed. Also, 3x worse could mean .000001 vs .000003 seconds, in which case the timings are probably invalid. – xaxxon May 13 '13 at 01:07
  • @MitchWheat The timing code is only the Unix time command. For 10 million function calls, the first two versions take approximately 40 seconds, and the third one approximately 80 seconds. – Vincent May 13 '13 at 01:10
  • "*do I have the guarantee in all cases to have the same result for the 3 different versions ?*" How can we possibly answer that? We don't know what your needs are. If you need these functions to have the same result, then you need to do that. If you don't, then you don't. Unless you're writing this code for someone else, it's your choice. And if you are writing it for someone else, you could just ask them. – Nicol Bolas May 13 '13 at 02:26
  • For the speed issue couldn't you profile it and see what is taking so much time in the third one? You could get a percentage for each version and compare. – ChiefTwoPencils May 13 '13 at 03:36
  • 'Tis indeed a shame `stat()` and `fstat()` are not within the purview of your allowable API, especially since they have been defined since IEEE Std 1003.1-1988 (POSIX.1). Regarding your requested guarantee for your first question, you're only "guaranteed" what the standard says and your implementation's declared conformance therein. Nothing more. Regarding question #2, that is a somewhat unilateral claim when your current test-bed is your only basis for making it. Without peeking into your RT and the system calls therein, you'll be hard-pressed to find a definitive answer. – WhozCraig May 13 '13 at 04:30
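
Several comments above ask for the timing code and for a way to check whether the versions agree. The following is a minimal sketch of one way to do both in-process with `std::chrono` (C++11), rather than timing the whole program from the shell. The names `stream_size`, `stdio_size`, `time_calls` and `write_test_file` are hypothetical helpers introduced here for illustration, not part of the original code, and agreement on one test file shows nothing about what the standard guarantees in general.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Stream-based size, in the spirit of filesize1/filesize2 from the question.
inline long long stream_size(const std::string& filename)
{
    std::ifstream in(filename.c_str(), std::ios::binary);
    if (!in) return -1;
    const std::streampos last = in.seekg(0, std::ios::end).tellg();
    if (last == std::streampos(-1)) return -1;
    return static_cast<long long>(std::streamoff(last));
}

// stdio-based size, in the spirit of filesize3 from the question.
inline long long stdio_size(const std::string& filename)
{
    std::FILE* file = std::fopen(filename.c_str(), "rb");
    if (!file) return -1;
    long long size = -1;
    if (std::fseek(file, 0, SEEK_END) == 0) size = std::ftell(file);
    std::fclose(file);
    return size;
}

// Hypothetical helper: calls f(filename) `iterations` times, adds each result
// to `checksum` so the calls have an observable effect, and returns the
// elapsed wall-clock time in seconds.
template <class F>
double time_calls(F f, const std::string& filename, int iterations, long long& checksum)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) checksum += f(filename);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: writes `bytes` bytes to `filename` so that the
// expected size is known in advance. Returns false if writing failed.
inline bool write_test_file(const std::string& filename, std::size_t bytes)
{
    std::ofstream out(filename.c_str(), std::ios::binary | std::ios::trunc);
    out << std::string(bytes, 'x');
    out.close();
    return !out.fail();
}
```

A caller could create a file with `write_test_file`, compare `stream_size` and `stdio_size` on it, and then pass each function to `time_calls` with the same iteration count. Timing inside the process excludes program start-up, and repeating the measurement in both orders helps rule out file-cache effects.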

0 Answers