
I need to tokenize a text (with ' ', '\n' and '\t' as delimiters) with something like:

std::string text = "foo   bar";
boost::iterator_range<std::string::iterator> r = some_func_i_dont_know(text);

Later I want to get output with:

for (auto i: result)
    std::cout << "distance: " << std::distance(text.begin(), i.begin())
        << "\nvalue: " << i << '\n';

With the example above, this should produce:

distance: 0
value: foo
distance: 6
value: bar

Thanks for any help.

user1587451
  • Use a [`std::istringstream`](http://en.cppreference.com/w/cpp/io/basic_istringstream) to extract the tokens using the `std::istream& operator>>(std::istream&, std::string&)` operator. – πάντα ῥεῖ Oct 15 '14 at 15:05
  • I don't understand you well, do you have an example? I need the iterator ranges as well. – user1587451 Oct 15 '14 at 15:06
  • Extract the tokens to e.g. a `std::vector` first, take the `iterator_range` from that one then. – πάντα ῥεῖ Oct 15 '14 at 15:08
  • An iterator range is formed from two iterators. To get two iterators, you'll need a range. You can't just pop an `iterator_range` from thin air. So, your first question should be "how do I tokenize a string"? Then, "where do I store results?". And you'll probably figure it out by then... – jrok Oct 15 '14 at 15:10
  • If I tokenize a string into a std::vector, the distance is lost. I need the distance from the beginning (0) to every token; for instance, bar is 6 chars away from 0 in the example above. – user1587451 Oct 15 '14 at 15:14
  • Something like: std::string s = "foo bar"; boost::iterator_range r = boost::algorithm::?????(s, " \t\n"); – user1587451 Oct 15 '14 at 15:17
  • Seeing your other question, I think I smell an [XY Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) here. If you're trying to parse according to a grammar, have you considered using [tag:boost-spirit] to make your solution more high-level? – sehe Oct 16 '14 at 16:57
  • Yes, but it's a bit too complex for spirit as I want to check the levenshtein distance to a given string/char*. I want to search a txt file and output the best 10 matches with col/row information. – user1587451 Oct 16 '14 at 17:00
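
For readers following the comments above: the offsets are only lost if the tokens are copied without recording where they started. A minimal sketch using only the standard library, assuming it is acceptable to store each token together with its start offset (the function name `tokenize_with_offsets` is made up for illustration and does not come from the thread):

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the thread): returns (offset, token) pairs,
// where offset is the token's distance from the beginning of `text`.
std::vector<std::pair<std::size_t, std::string>>
tokenize_with_offsets(const std::string& text, const std::string& delims = " \n\t")
{
    std::vector<std::pair<std::size_t, std::string>> tokens;

    // Position of the first character that is not a delimiter.
    std::size_t pos = text.find_first_not_of(delims);
    while (pos != std::string::npos) {
        // One past the last character of the current token.
        std::size_t end = text.find_first_of(delims, pos);
        if (end == std::string::npos)
            end = text.size();

        tokens.emplace_back(pos, text.substr(pos, end - pos));

        // Skip the run of delimiters that follows the token.
        pos = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Called on "foo   bar", this should yield the pairs (0, "foo") and (6, "bar"). Unlike an iterator range, each token is a copy, so this trades some memory for not depending on Boost.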

1 Answer


I would not use the ancient Tokenizer here. Just use String Algorithm's split offering:


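#include <boost/algorithm/string.hpp>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

using namespace boost;

int main()
{
    std::string text = "foo   bar";
    boost::iterator_range<std::string::iterator> r(text.begin(), text.end());

    // Each token is a pair of iterators into `text`, so no characters are copied
    // and the offset of every token can still be computed afterwards.
    std::vector<iterator_range<std::string::const_iterator> > result;

    // token_compress_on treats a run of adjacent delimiters as a single separator.
    algorithm::split(result, r, is_any_of(" \n\t"), algorithm::token_compress_on);

    for (auto i : result)
        std::cout << "distance: " << std::distance(text.cbegin(), i.begin()) << ", "
                  << "length: " << i.size() << ", "
                  << "value: '" << i << "'\n";
}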

Prints

distance: 0, length: 3, value: 'foo'
distance: 6, length: 3, value: 'bar'
sehe
  • @user1587451 gah. I had just answered that other question when it was deleted ;( Anyhoops: http://coliru.stacked-crooked.com/a/eb1a6352ddaca567 (I can't fit the explanation here) – sehe Oct 16 '14 at 16:44
  • Sehe, you are too fast :) I got it running a minute after posting the second question, so I deleted it. One question about your second code: why is the last output "distance: 614, length: 0, value: ''"? Is there some escape sequence missing in is_any_of? – user1587451 Oct 16 '14 at 16:56
  • That's the way `split` works, apparently. When compressing tokens, it still yields the trailing "empty" token (probably when the input had a trailing delimiter, in this case a newline character)? – sehe Oct 16 '14 at 16:59