I want to extract a maximum of N + 1 strings from a std::stringstream.

Currently, I have the following code (that needs to be fixed):

#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>
#include <iterator>
#include <ranges>
#include <algorithm>


int main( )
{
    const std::string_view sv { "   @a hgs  -- " };
    const size_t expectedTokenCount { 4 };

    std::stringstream ss;
    ss << sv;

    std::vector< std::string > foundTokens;
    foundTokens.reserve( expectedTokenCount + 1 );

    std::ranges::for_each( std::ranges::take_view { ss, expectedTokenCount + 1 }, [ &foundTokens ]( const std::string& token )
                                                                                  {
                                                                                    std::back_inserter( foundTokens );
                                                                                  } );

    if ( foundTokens.size( ) == expectedTokenCount )
    {
        // do something
    }

    for ( const auto& elem : foundTokens )
    {
        std::cout << std::quoted( elem ) << '\n';
    }
}

How should I fix it? Also, how should I use back_inserter to push_back the extracted strings into foundTokens?

digito_evo

1 Answer

Note that the following aliases are in effect:

namespace views = std::views;
namespace rng = std::ranges;

There are a few issues and oddities here. First of all:

std::ranges::take_view { ss, expectedTokenCount + 1 }

It's conventional to use the std::views API:

ss | views::take(expectedTokenCount + 1)

The more glaring issue here is that ss is not a view or range. You need to create a proper view of it:

auto tokens = views::istream<std::string>(ss) | views::take(expectedTokenCount + 1);

Now for the other issue:

std::back_inserter( foundTokens );

This is a no-op. It creates a back-inserter for the container, which is an output iterator that calls push_back each time a value is assigned through it, but the iterator is then discarded without ever being used.
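To make the difference concrete, here is a minimal sketch of what actually triggers push_back. The names backInserterDemo and DemoResult are hypothetical, introduced only for illustration; they are not from the question.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical result type, used only to report what happened in the demo.
struct DemoResult
{
    std::size_t sizeAfterCreation;      // container size right after making the iterator
    std::vector<std::string> tokens;    // container contents after assigning through it
};

// Hypothetical demo function, not part of the original post.
DemoResult backInserterDemo()
{
    std::vector<std::string> foundTokens;

    // Creating the iterator does not touch the container.
    auto out = std::back_inserter(foundTokens);
    const std::size_t sizeAfterCreation = foundTokens.size();

    // Assigning through the iterator is what calls push_back.
    *out = std::string { "first" };
    ++out;  // has no effect for back_insert_iterator; kept for the usual output-iterator idiom
    *out = std::string { "second" };

    return { sizeAfterCreation, foundTokens };
}
```

Algorithms such as std::copy or std::ranges::copy perform this assign-and-increment loop for you, which is why a back-inserter is normally passed to an algorithm rather than created on its own line.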

While the situation is poor in C++20 for creating a vector from a range or view, here's one way to do it:

rng::copy(tokens, std::back_inserter(foundTokens));

Putting this all together, you can see a live example, but note that it might not be 100% correct: it currently compiles with GCC, but not with Clang.

As noted below, you can also make use of views::split to split the source string directly if there's a consistent delimiter between the tokens:

std::string_view delim = " ";
auto tokens = views::split(sv, delim);

However, you might run into trouble if your standard library hasn't implemented this defect report.

chris
  • "*While the situation is poor in C++20 for...*", so is it a bad practice? I mean what other container should I use to store the strings? I do not care about the tokenization method as long as it is fast. What can be a better alternative to `stringstream`? – digito_evo Mar 07 '22 at 05:45
  • In C++23 you can simply do `auto foundTokens = views::istream<std::string>(ss) | views::take(expectedTokenCount + 1) | ranges::to<std::vector>();` – 康桓瑋 Mar 07 '22 at 05:55
  • @digito_evo, No no, there's been work done to add something more suitable for constructing a container (see the other comment), but the details weren't ready in time for C++20. If you're after speed for the tokenization itself, you might want to look into a tokenizer library of some kind or perhaps build one for your use case. The standard streams notoriously involve virtual calls on every operation that can be difficult to optimize away. Just make sure it's something worth optimizing before putting in the effort. Special mention to `strtok`, but it has its own issues and I don't know its QOI. – chris Mar 07 '22 at 05:56
  • If your `sv` only contains whitespace then `views::split` may be a [more appropriate and efficient way](https://godbolt.org/z/sEPxrbsKb). – 康桓瑋 Mar 07 '22 at 06:00
  • Hah, I was just about to say, another special mention to [`views::split`](https://en.cppreference.com/w/cpp/ranges/split_view) now that it exists, but you'll have to measure to see if the speed works. – chris Mar 07 '22 at 06:01
  • @康桓瑋 Thanks. That sounds like a better option since it frees me from working with `std::stringstream`. However, how can I ensure that it covers all the whitespace characters? I checked the example on [cppreference](https://en.cppreference.com/w/cpp/ranges/split_view) and there it uses a delimiter. What should the delimiter for whitespace look like? I would appreciate it if you would write an answer. – digito_evo Mar 07 '22 at 06:18
  • @chris Oh nice. You have mentioned the `split`. – digito_evo Mar 07 '22 at 06:21