1

I would like to grab the first 4 characters of each of two words using RegEx. I have some RegEx experience; however, a search did not yield any results.

So if I have `Awesome Sauce`, I would like the end result to be `AwesSauc`.

Wiktor Stribiżew
JA1
  • Do you need to use Regex? This could be done in one line of LINQ: `String.Join( " ", input.Split( (char[])null, StringSplitOptions.RemoveEmptyEntries ).Select( w => w.Substring( 0, Math.Min( 4, w.Length ) ) ) )`. – Dai Apr 11 '17 at 05:51
  • @Dai Just because OP has *some RegEx experience* – Nikhil Vartak Apr 11 '17 at 06:00
  • I'm using Nintex to complete the operation so I'm bound by RegEx – JA1 Apr 11 '17 at 06:02
  • Can there also be more or fewer than 2 words? If more, should the RegEx take the first 4 characters of every word? – fubo Apr 11 '17 at 06:06

5 Answers

2

Use the Replace Text action with the following parameters:

Pattern: \W*\b(\p{L}{1,4})\w*\W*
Replacement text: $1

See the regex demo.

Pattern details:

  • \W* - 0+ non-word chars (trim from the left)
  • \b - a leading word boundary
  • (\p{L}{1,4}) - Group 1 (later referred to via $1 backreference) matching any 1 to 4 letters (incl. Unicode ones)
  • \w* - any 0+ word chars (to match the rest of the word)
  • \W* - 0+ non-word chars (trim from the right)
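
Outside Nintex, the same pattern and replacement can be tried in plain C#. This is only a minimal sketch: the class and method names (`FirstFourDemo`, `Abbreviate`) are made up for illustration, and it assumes Nintex applies the pattern the same way .NET's `Regex.Replace` does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstFourDemo
{
    // Hypothetical helper: keeps up to the first four letters of each word
    // and drops the rest of the word plus any surrounding non-word characters.
    public static string Abbreviate(string input)
    {
        return Regex.Replace(input, @"\W*\b(\p{L}{1,4})\w*\W*", "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Abbreviate("Awesome Sauce")); // AwesSauc
    }
}
```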
Wiktor Stribiżew
1

I think this RegEx should do the job:

        // Requires: using System; using System.Text.RegularExpressions;
        string pattern = @"\b\w{4}";
        var text = "The quick brown fox jumps over the lazy dog";

        // Each match is the first four word characters of a word
        // that has at least four characters.
        foreach (Match match in Regex.Matches(text, pattern))
        {
            Console.WriteLine(match.Value);
        }

        // outputs:
        // quic
        // brow
        // jump
        // over
        // lazy

Alternatively you could use patterns like:

        \b\w{1,4} => The, quic, brow, fox, jump, over, the, lazy, dog
        \b[\w\d]{1,4} => same matches as above, since \w already includes digits (a | inside [...] is a literal pipe, not alternation)

Update: added a full example for C# and modified the pattern slightly. Also added some alternative patterns.
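
To get from the individual matches to the single string asked for in the question, the match values can be concatenated. A rough sketch, assuming the `\b\w{1,4}` pattern from above; the names `MatchConcatDemo` and `Abbreviate` are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchConcatDemo
{
    // Hypothetical helper: joins the first (up to) four word characters
    // of every word, with no separator between them.
    public static string Abbreviate(string input)
    {
        var parts = Regex.Matches(input, @"\b\w{1,4}")
                         .Cast<Match>()
                         .Select(m => m.Value);
        return string.Concat(parts);
    }

    public static void Main()
    {
        Console.WriteLine(Abbreviate("Awesome Sauce")); // AwesSauc
    }
}
```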

MiGro
0

One approach with LINQ:

var res = new string(input.Split().SelectMany((x => x.Where((y, i) => i < 4))).ToArray());
Toshi
0

Using regex would in fact be more complex and is unnecessary for this case. Either of the approaches below will do.

var sentence = "Awesome        Sau";

// With LINQ
var linqWay = string.Join("", sentence.Split(" ".ToCharArray(), options:StringSplitOptions.RemoveEmptyEntries).Select(x => x.Substring(0, Math.Min(4,x.Length))).ToArray());

// Without LINQ
var oldWay = new StringBuilder();
string[] words = sentence.Split(" ".ToCharArray(), options:StringSplitOptions.RemoveEmptyEntries);
foreach(var word in words) {
    oldWay.Append(word.Substring(0, Math.Min(4, word.Length)));
}
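
A variant of the code above that splits on any whitespace (tabs, newlines) rather than only spaces, as a rough sketch. Passing a null separator array makes `Split` fall back to white-space characters, and the `(char[])` cast is there because a bare `null` is ambiguous between the `char[]` and `string[]` overloads. The names `SplitDemo` and `Abbreviate` are made up for illustration.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: first (up to) four characters of each
    // whitespace-separated word, concatenated without a separator.
    public static string Abbreviate(string input)
    {
        // The cast selects the char[] overload; a null separator array
        // means "split on white-space characters".
        var words = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Concat(words.Select(w => w.Substring(0, Math.Min(4, w.Length))));
    }

    public static void Main()
    {
        Console.WriteLine(Abbreviate("Awesome\t  Sau")); // AwesSau
    }
}
```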

Edit:

Updated code based on @Dai's comment. Math.Min check borrowed as is from his suggestion.

Nikhil Vartak
  • This code will crash if any word in the input string is shorter than 4 characters. It also doesn't handle multiple contiguous whitespace characters - or handle non-space whitespace. – Dai Apr 11 '17 at 06:03
  • @Dai - Great catch. Thanks a lot. See edit. Seems OP is restricted to Regex. – Nikhil Vartak Apr 11 '17 at 06:18
  • I suggest using `.Split( (char[])null, StringSplitOptions.RemoveEmptyEntries )` as that splits on all whitespace characters, not just spaces. – Dai Apr 11 '17 at 06:44
0

Try this expression:

\b[a-zA-Z0-9]{1,4}
Ravinder Gujiri