6

My program is supposed to create two files at user-specified paths. I need to know whether the paths lead to the same location, so that I can exit with an error before I start changing the filesystem.

Because the paths come from the user, they are expected to be non-canonical and weird. For example, they could be ./dir1/subdir/file and dir2/subdir/../subdir/file, where dir2 is a symlink to dir1 and subdir doesn't exist yet. The expected result is still true: the paths are equivalent.

std::filesystem::equivalent works only on files that already exist. Is there a similar function without this limitation?
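To illustrate the limitation, here is a minimal sketch using the non-throwing overload of std::filesystem::equivalent. The helper name equivalentIfExisting is made up for this example. When neither path exists, the call reports an error instead of answering the question, even if the two paths are textually identical.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns true only if both paths exist and name the
// same file. If fs::equivalent cannot answer (for example because neither
// path exists yet), ec is set and the function returns false.
bool equivalentIfExisting(const fs::path& a, const fs::path& b, std::error_code& ec)
{
    const bool same = fs::equivalent(a, b, ec);
    return !ec && same;
}
```

A false result with ec set means "could not tell", not "different", which is why this function alone is not enough for paths that are yet to be created.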

Jarod42
Piotr Siupa
  • You could create the 1st file, then check if the 2nd file is the same, then delete the file if so – Hong Ooi Jun 19 '22 at 16:29
  • @HongOoi That could work. It would be pretty slow, though, especially with a larger number of files. Also, I would need to remember to remove the created directories too. Doable but unwieldy. – Piotr Siupa Jun 19 '22 at 16:33
  • You probably want to compare the so-called canonical paths. – Ulrich Eckhardt Jun 19 '22 at 16:38
  • It is impossible to tell unless you know how the *specific* filesystems you are going to create your files and directories in work. Is `foo` equivalent to `FOO`, `Foo` or `fOo`? We don't know. Is `averylongmeaninglessname.exe` equivalent to `AVERYL~1.EXE` or `AVERYL~2.EXE` or `AVERYL~3.EXE`? We don't know. Is `Caf\xc3\xa9` (rendered "Café") equivalent to `Cafe\xcc\x81` (rendered "Café")? We don't know! – n. m. could be an AI Jun 19 '22 at 16:40
  • Add `./` to the beginning of non-absolute paths and then call `weakly_canonical` repeatedly until the path stops changing? – Miles Budnek Jun 19 '22 at 16:48
  • @MilesBudnek Huh? Once would not be enough? – Piotr Siupa Jun 19 '22 at 16:50
  • @NO_NAME The first `weakly_canonical` won't resolve symlinks in the part of the path beyond the existing part. So, assuming `/foo` exists but `/foo/bar` doesn't, `/foo/bar/../baz` would return `/foo/baz` even if `/foo/baz` is a symlink to `/qux`. You need a second invocation of `weakly_canonical` to resolve that. – Miles Budnek Jun 19 '22 at 16:54
  • @n.1.8e9-where's-my-sharem.: *"It is impossible to tell unless you know how the specific filesystems you are going to create your files and directories in work."*. I don't get your point; it seems equivalent to saying that it is impossible to know the size of `int` or the signedness of `char`. Yes, it might depend on some factors, but that seems manageable too. – Jarod42 Jun 19 '22 at 17:36
  • @Jarod42 It is not reasonable to expect an std::filesystem implementation to duplicate the logic of every filesystem in the world in order to be able to solve this very local problem. – n. m. could be an AI Jun 19 '22 at 17:39
  • Equivalence of paths seems not a local problem IMO. I might underestimate the complexity of the expected function though. – Jarod42 Jun 19 '22 at 17:51
  • Furthermore, whether two paths are equivalent may depend on what else is present on the filesystem, as well as on other unknown and potentially expensive to check things. – n. m. could be an AI Jun 19 '22 at 17:51

3 Answers

3

I would use std::filesystem::absolute and then std::filesystem::weakly_canonical on the result.

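#include <filesystem>

namespace fs = std::filesystem;

// Make both paths absolute, then resolve symlinks in the part that exists
// and normalize "." and ".." in the rest.
auto fullpath1 = fs::weakly_canonical(fs::absolute(path1));
auto fullpath2 = fs::weakly_canonical(fs::absolute(path2));

if(fullpath1 == fullpath2) {
    // both paths lead to the same location
}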

Note: For std::filesystem::absolute, implementations are encouraged not to treat a non-existing path as an error, but they are still allowed to. It works in current releases of g++, clang++ and MSVC, though.
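Since absolute may report an error on some implementations, one option is to use the std::error_code overloads and decide explicitly what a failure means. The following is only a sketch of that idea. The names tryFullPath and sameFullPath are made up for illustration, and treating an unresolvable path as "not the same" is an assumption that may not suit every program.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: absolute() followed by weakly_canonical(), returning
// std::nullopt instead of throwing if either step reports an error.
std::optional<fs::path> tryFullPath(const fs::path& p)
{
    std::error_code ec;
    const fs::path abs = fs::absolute(p, ec);
    if (ec) return std::nullopt;
    fs::path full = fs::weakly_canonical(abs, ec);
    if (ec) return std::nullopt;
    return full;
}

// Hypothetical helper: true if both paths resolve to the same full path.
// Assumption: a path that cannot be resolved never compares equal.
bool sameFullPath(const fs::path& a, const fs::path& b)
{
    const auto fa = tryFullPath(a);
    const auto fb = tryFullPath(b);
    return fa && fb && *fa == *fb;
}
```

A caller that must not risk overwriting anything may prefer to treat std::nullopt as a hard error rather than as "different".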

Predelnik
Ted Lyngmo
  • Why is `std::filesystem::absolute` needed? According to https://en.cppreference.com/w/cpp/filesystem/canonical, it returns "An absolute path that resolves to the same file as std::filesystem::absolute(p)." – Piotr Siupa Jun 19 '22 at 16:54
  • @NO_NAME The `absolute` is needed to handle a case where the user enters a bare path (no leading `./`) where the first part doesn't exist, i.e. if `./foo` does not exist and the user enters just `foo`. `weakly_canonical` only converts the existing part to an absolute path, and here there is no existing part. – Miles Budnek Jun 19 '22 at 16:59
  • FYI: `absolute` is allowed to consider a non-existing path to be an error. You cannot trust this to work. – Nicol Bolas Jun 19 '22 at 18:59
  • @NicolBolas That's a shame. I updated the answer slightly to make that clear. – Ted Lyngmo Jun 19 '22 at 20:50
1

This is a surprisingly difficult problem to solve, and no single standard library function will do it.

There are several cases that you need to worry about:

  • Relative paths with an initial ./
  • Bare relative paths without an initial ./
  • Symlinks in the "non-existing" part of a path
  • Case-sensitivity of different filesystems
  • Almost certainly more that I didn't think of

std::filesystem::weakly_canonical will get you part of the way there, but it won't quite get there by itself. For instance, it doesn't address cases when a bare relative path doesn't exist (i.e. foo won't canonicalize to the same thing as ./foo) and it doesn't even try to address case-sensitivity.

Here's a canonicalize function that tries to take all of that into account. It still has some shortcomings, mainly around non-ASCII characters (e.g. the case-normalization doesn't work for 'É'), but it should work in most cases:

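#include <algorithm>
#include <filesystem>
#include <iterator>
#include <locale>
#include <utility>

namespace fs = std::filesystem;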

std::pair<fs::path, fs::path> splitExistingNonExistingParts(const fs::path& path)
{
    fs::path existingPart = path;
    while (!existingPart.empty() && !fs::exists(existingPart)) {
        existingPart = existingPart.parent_path();
    }
    return {existingPart, fs::relative(path, existingPart)};
}

fs::path toUpper(const fs::path& path)
{
    const fs::path::string_type& native = path.native();
    fs::path::string_type upper;
    upper.reserve(native.length());
    std::transform(
        native.begin(),
        native.end(),
        std::back_inserter(upper),
        [](auto c) { return std::toupper(c, std::locale()); }
    );
    return upper;
}

fs::path toLower(const fs::path& path)
{
    const fs::path::string_type& native = path.native();
    fs::path::string_type lower;
    lower.reserve(native.length());
    std::transform(
        native.begin(),
        native.end(),
        std::back_inserter(lower),
        [](auto c) { return std::tolower(c, std::locale()); }
    );
    return lower;
}

bool isCaseSensitive(const fs::path& path)
{
    // NOTE: This function assumes the path exists.
    //       fs::equivalent will throw if that isn't the case

    fs::path upper = path.parent_path() / toUpper(*(--path.end()));
    fs::path lower = path.parent_path() / toLower(*(--path.end()));

    bool exists = fs::exists(upper);
    if (exists != fs::exists(lower)) {
        // If one exists and the other doesn't, then they
        // must reference different files and therefore be
        // case-sensitive
        return true;
    }

    // If the two paths don't reference the same file, then
    // the filesystem must be case-sensitive
    return !fs::equivalent(upper, lower);
}

fs::path normalizeCase(const fs::path& path)
{
    // Normalize the case of a path to lower-case if it is on a
    // non-case-sensitive filesystem

    fs::path ret;
    for (const fs::path& component : path) {
        if (!isCaseSensitive(ret / component)) {
            ret /= toLower(component);
        } else {
            ret /= component;
        }
    }
    return ret;
}

fs::path canonicalize(fs::path path)
{
    if (path.empty()) {
        return path;
    }

    // Initial pass to deal with .., ., and symlinks in the existing part
    path = fs::weakly_canonical(path);

    // Figure out if this is absolute or relative by assuming that there
    // is a base path component that will always exist (i.e. / on POSIX or
    // the drive letter on Windows)
    auto [existing, nonExisting] = splitExistingNonExistingParts(path);
    if (!existing.empty()) {
        existing = fs::canonical(fs::absolute(existing));
    } else {
        existing = fs::current_path();
    }

    // Normalize the case of the existing part of the path
    existing = normalizeCase(existing);

    // Need to deal with case-sensitivity of the part of the path
    // that doesn't exist.  Assume that part will have the same
    // case-sensitivity as the last component of the existing path
    if (!isCaseSensitive(existing)) {
        path = existing / toLower(nonExisting);
    } else {
        path = existing / nonExisting;
    }

    // Call weakly_canonical again to deal with any existing symlinks that were
    // hidden by .. components after non-existing path components
    fs::path temp;
    while ((temp = fs::weakly_canonical(path)) != path) {
        path = temp;
    }
    return path;
}

Miles Budnek
  • It might be safer to call `is_absolute()` instead of assuming it won't have an empty existing part. – Piotr Siupa Jun 19 '22 at 21:19
  • `isCaseSensitive` doesn't seem to work: https://godbolt.org/z/z1nPKv13a It also assumes that the last part of existing path has letters in it. – Piotr Siupa Jun 19 '22 at 21:35
  • @NO_NAME Fixed the bug that was assuming a non-empty existing part and changed `isCaseSensitive` to work when a component ends with a non-cased character. It's still not totally right. A non-existing path as the first component on a case-insensitive filesystem will still be guessed as case-sensitive (i.e. on WSL, `/mnt/c/doesNotExist` would be guessed to be case-sensitive even though it isn't), but that just goes to demonstrate that this is a very non-trivial problem to solve. – Miles Budnek Jun 20 '22 at 04:50
  • "changed isCaseSensitive to work when a component ends with a non-cased character" - no, I don't think you have. It still returns `false` for the path `./123` in a case sensitive file system. I think there should be a condition `lower == upper` which either assumes it is case sensitive or checks the parent directory. – Piotr Siupa Jun 20 '22 at 11:36
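The `lower == upper` condition suggested in the last comment could look roughly like the sketch below. It is not the answer's code: the names mapCase and isCaseSensitiveGuarded are made up for illustration, and falling back to "case-sensitive" when there is no parent left to probe is an assumption. A component without cased characters (such as `123`) gives no information, so the check moves to the parent directory instead.

```cpp
#include <filesystem>
#include <locale>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: upper- or lower-case a single path component.
fs::path mapCase(const fs::path& component, bool upper)
{
    const std::locale loc;
    fs::path::string_type out;
    out.reserve(component.native().size());
    for (auto c : component.native())
        out.push_back(upper ? std::toupper(c, loc) : std::tolower(c, loc));
    return fs::path(out);
}

// Sketch: guess whether the existing entry `path` is looked up
// case-sensitively. Like the original, it assumes `path` exists.
bool isCaseSensitiveGuarded(const fs::path& path)
{
    const fs::path parent = path.parent_path();
    const fs::path upper = parent / mapCase(path.filename(), true);
    const fs::path lower = parent / mapCase(path.filename(), false);

    if (upper == lower) {
        // No cased characters in this component, so probing it proves nothing.
        // Assumption: with no parent left to ask, treat it as case-sensitive.
        if (parent.empty() || parent == path) return true;
        return isCaseSensitiveGuarded(parent);
    }

    std::error_code ec;
    const bool upperExists = fs::exists(upper, ec);
    const bool lowerExists = fs::exists(lower, ec);
    // If either spelling is missing, the two spellings cannot be aliases.
    if (!upperExists || !lowerExists) return true;

    // Both spellings exist: insensitive only if they are the same file.
    const bool same = fs::equivalent(upper, lower, ec);
    return ec || !same;
}
```

Returning early when either spelling is missing also avoids calling `fs::equivalent` on paths that do not exist. The guess can still be wrong, for example when both spellings exist as hard links to one file on a case-sensitive filesystem.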
0

I compiled this answer from Ted Lyngmo's answer and Miles Budnek's comments.

What you need to do is normalize your paths to remove all ., .., symlinks and similar things that get in the way.

std::filesystem::weakly_canonical can do most of that, although you may need to call it multiple times in case it tripped on a non-existent directory that obscured an existing one. (In your example, dir2/subdir/../../dir2 would do it.) You call the function until the result stops changing.

Before canonicalizing the path, you also need to make sure that it is absolute. std::filesystem::weakly_canonical normally converts a path to an absolute path, but only if the first part of the original path exists. Otherwise the result may stay relative and the comparison may not work correctly.

std::filesystem::path normalizePath(const std::filesystem::path &originalPath)
{
    using namespace std::filesystem;
    path currentPath;
    if (originalPath.is_absolute())
        currentPath = originalPath;
    else
        currentPath = std::filesystem::current_path() / originalPath;
    while(true)
    {
        path newPath = weakly_canonical(currentPath);
        if (newPath != currentPath)
            currentPath = newPath;
        else
            break;
    }
    return currentPath;
}

When this is done, you can just compare the paths using operator==.
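Tying this back to the question, the comparison can run before anything is created. The sketch below is self-contained, so it repeats the normalization loop in compact form. The names normalizeUntilStable, requireDistinctTargets and kMaxPasses are made up for illustration, and the pass limit is only a defensive assumption, not something the approach is known to require.

```cpp
#include <filesystem>
#include <stdexcept>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical safety limit on the number of weakly_canonical passes.
constexpr int kMaxPasses = 64;

// Anchor relative paths at the current directory, then repeat
// weakly_canonical until the result stops changing.
fs::path normalizeUntilStable(const fs::path& original)
{
    fs::path current =
        original.is_absolute() ? original : fs::current_path() / original;
    for (int pass = 0; pass < kMaxPasses; ++pass) {
        fs::path next = fs::weakly_canonical(current);
        if (next == current) return current;
        current = std::move(next);
    }
    throw std::runtime_error("path did not stabilize");
}

// Throws before anything is written if both paths name the same location.
void requireDistinctTargets(const fs::path& a, const fs::path& b)
{
    if (normalizeUntilStable(a) == normalizeUntilStable(b))
        throw std::invalid_argument("both paths name the same location");
}
```

Calling requireDistinctTargets(path1, path2) first keeps the filesystem untouched in the error case. Case-insensitive filesystems are not handled here.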

Piotr Siupa
  • FYI: `absolute` is allowed to consider a non-existing path to be an error (the standard *suggests* that it not do this, but the implementation is under no obligation to comply). You cannot trust this to work. – Nicol Bolas Jun 19 '22 at 19:00
  • @NicolBolas Good point. Updated the answer. – Piotr Siupa Jun 19 '22 at 19:34