1

I'm really bad at regular expressions, so please help me.

I need to find every substring of the form #text in a string.

text must not contain any whitespace characters (\\s). Its length must be at least 2 characters ({2,}), and it must contain at least 1 letter (QChar::isLetter()).

Examples:

  • #c, #1, #123456, #123 456, #123_456 are incorrect
  • #cc, #text, #text123, #123text are correct

I use QRegExp.

Ivan Akulov
  • What about `##text`? The second `#` is not a space character, so it passes the test. Should the regexp match the substring starting at the first or second `#`, or perhaps both? – MSalters Jun 21 '12 at 09:00

4 Answers

2
QRegExp rx("#(\\S+[A-Za-z]\\S*|\\S*[A-Za-z]\\S+)$");
bool result = (rx.indexIn(str) == 0);

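rx matches either one or more non-whitespace characters followed by a letter and then any number of non-whitespace characters, or any number of non-whitespace characters followed by a letter and then at least one more non-whitespace character. Either way the text after # has at least two characters, none of them whitespace, and at least one of them a letter.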
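To try this alternation outside Qt, here is a minimal sketch that uses std::regex instead of QRegExp. That swap is an assumption made only so the snippet compiles without Qt; the helper name isWholeTag is hypothetical, and std::regex_match (which requires the whole string to match) stands in for the `$` anchor plus the `indexIn(str) == 0` check.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of the answer above: returns true when the
// whole string is '#' followed by at least two non-whitespace characters,
// at least one of which is an ASCII letter.
// Assumption: std::regex (ECMAScript grammar) is used in place of QRegExp.
bool isWholeTag(const std::string& s)
{
    // Same alternation as in the answer; regex_match anchors both ends.
    static const std::regex rx(R"(#(\S+[A-Za-z]\S*|\S*[A-Za-z]\S+))");
    return std::regex_match(s, rx);
}
```

With the examples from the question, isWholeTag returns true for "#cc", "#text", "#text123" and "#123text", and false for "#c", "#1", "#123456", "#123 456" and "#123_456".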

jogojapan
KCiebiera
1

The shortest I could come up with (which should work, but I haven't tested extensively) is:

QRegExp("^#(?=[0-9]*[A-Za-z])[A-Za-z0-9]{2,}$");

Which matches:

  • ^ the start of the string
  • # a literal hash character
  • (?= then look ahead (without consuming anything)
    • [0-9]* zero or more Latin digits
    • [A-Za-z] a single upper- or lower-case Latin letter
  • )
  • [A-Za-z0-9]{2,} then match at least two characters which may be upper- or lower-case Latin letters or Latin digits
  • $ then assert the end of the string (an anchor, so nothing is consumed)
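The breakdown above can be checked with a small sketch. It uses std::regex rather than QRegExp, which is an assumption made so the snippet builds without Qt (both accept this lookahead syntax); the function name matchesStrictTag is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: applies the lookahead pattern from the answer.
// Assumption: std::regex (ECMAScript grammar) stands in for QRegExp.
bool matchesStrictTag(const std::string& s)
{
    // ^#            start of input, then a literal '#'
    // (?=[0-9]*[A-Za-z])  lookahead: a letter must follow any leading digits
    // [A-Za-z0-9]{2,}$    at least two letters or digits up to the end
    static const std::regex rx("^#(?=[0-9]*[A-Za-z])[A-Za-z0-9]{2,}$");
    return std::regex_search(s, rx);
}
```

The lookahead is what rejects all-digit input such as "#123456": the digits are skipped, but no letter follows them. Single-character input such as "#c" passes the lookahead and is then rejected by {2,}.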

Technically speaking, though, this is still wrong: it only matches Latin letters and digits. Replacing a few bits gives you:

QRegExp("^#(?=\\d*[^\\d\\s])\\w{2,}$");

This should work for non-Latin letters and digits, but it is totally untested. Have a quick read of the QRegExp class reference for an explanation of each escaped group.

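And then to match within larger strings of text (again, untested). The backslash in \b has to be doubled, because a bare \b in a C++ string literal is a backspace character. There is no \b before the #, because # is not a word character, so there is no word boundary between a space (or the start of the string) and #:

QRegExp("#(?=\\d*[^\\d\\s])\\w{2,}\\b");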

A useful tool is the Regular Expressions Example which comes with the SDK.

Samuel Harmer
1

Styne666 gave the right regex.

Here is a little Perl script that tries to match its first argument against this regex:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    my $arg = shift;
    if ($arg =~ m/(#(?=\d*[a-zA-Z])[a-zA-Z\d]{2,})/) {
        print "$1 MATCHES THE PATTERN!\n";
    } else {
        print "NO MATCH\n";
    }

Perl is great for quickly testing regular expressions.

Now, your question is a bit different. You want to find all the matching substrings in your text, and you want to do it in C++/Qt. Here is what I could come up with in a couple of minutes:

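    #include <QRegExp>
    #include <QString>
    #include <iostream>

    using namespace std;

    int main(int argc, char *argv[])
    {
        // The text to search is expected as the single command-line argument.
        if (argc < 2)
        {
            cerr << "usage: " << argv[0] << " <text>" << endl;
            return 1;
        }

        QString str = argv[1];
        // Capture group 1: '#', a lookahead requiring a letter after any
        // leading digits, then at least two letters or digits, ending at
        // a word boundary.
        QRegExp rx("[\\s]?(\\#(?=\\d*[a-zA-Z])[a-zA-Z\\d]{2,})\\b");

        int pos = 0;
        // indexIn() returns -1 once there are no more matches.
        while ((pos = rx.indexIn(str, pos)) != -1)
        {
            QString token = rx.cap(1);
            cout << token.toStdString() << endl;
            // Continue searching just past the current match.
            pos += rx.matchedLength();
        }

        return 0;
    }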

To test it, I feed it input like this (passing the long string as a single command-line argument):

    peter@ubuntu01$ qt-regexp "#hjhj  4324   fdsafdsa  #33e #22"

And it matches only two words: #hjhj and #33e.
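For readers without Qt at hand, the same search loop can be sketched with the standard library. This is an assumption-laden stand-in, not the code above: std::regex and std::sregex_iterator replace QRegExp::indexIn, the optional leading whitespace is dropped because the iterator returns the whole match rather than a capture group, and the function name findTags is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every "#tag" found anywhere in text.
// Assumption: std::regex (ECMAScript grammar) stands in for QRegExp.
std::vector<std::string> findTags(const std::string& text)
{
    // '#', a lookahead requiring a letter after any leading digits,
    // at least two ASCII letters or digits, then a word boundary.
    static const std::regex rx(R"(#(?=\d*[a-zA-Z])[a-zA-Z\d]{2,}\b)");

    std::vector<std::string> tags;
    // sregex_iterator visits each non-overlapping match from left to right.
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it)
        tags.push_back(it->str());
    return tags;
}
```

Called on the test input above, findTags returns the two tokens #hjhj and #33e; #22 is skipped because the lookahead finds no letter.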

Hope it helps.

Peter Al
-1

Use this regular expression. Hopefully it will solve your problem.

^([#(a-zA-Z)]+[(a-zA-Z0-9)]+)*(#[0-9]+[(a-zA-Z)]+[(a-zA-Z0-9)]*)*$
Umair Noor