grep string with alphanumeric and special character with a condition of 10 or more chars

Question

We are trying to scan a list of files for passwords. Per our requirement, a password should contain alphanumeric characters together with at least one special character.

Please help me understand why this regex is not working: ((\w*)([$%*@#]+)(\w+)){10,}

Note: I will be using this regex in a Linux environment.

Conditions to match:

1) Minimum 10 characters
2) Should contain at least 1 special character
3) Should contain at least 1 numeric character
4) Should contain at least 1 alphabetic character

It is not a valid POSIX regex. Show how you are going to use it please. — Wiktor Stribiżew, Oct 28 '21 at 11:52
Example file: below is the actual file content ```Hi this is my password Password@1234``` and I have to extract Password@1234 from this file using a command like this: ```grep -E "((\w*)([$%*@#]+)(\w+)){10,}"``` — Jenifer P, Oct 28 '21 at 12:29
It looks like it [does not match in any environment](https://regex101.com/r/xBCSBM/1) — Wiktor Stribiżew, Oct 28 '21 at 12:35
Please provide enough code so others can better understand or reproduce the problem. — Community, Oct 28 '21 at 15:18

score 0 · Answer 1 · answered Oct 28 '21 at 12:31

Your regex first matches zero or more word characters (\w*), then one or more special characters ([$%*@#]+), then one or more word characters (\w+). The outer quantifier (...){10,} then requires that whole group to match at least 10 times in a row. It repeats the pattern rather than the literal text, so every repetition needs its own special character. So, for example, abc$%def satisfies the group once, but to match the full expression you would need at least 10 such units back to back, like this: abc$%defabc$%defabc$%defabc$%defabc$%defabc$%defabc$%defabc$%defabc$%defabc$%def
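
The repetition behaviour described above can be checked directly. This is a minimal sketch, assuming GNU grep (where \w is available under -E); count_matches is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: prints how many input lines match the question's regex.
# Assumes GNU grep, which supports \w in extended (-E) mode.
count_matches() {
  printf '%s\n' "$1" | grep -E -c '((\w*)([$%*@#]+)(\w+)){10,}' || true
}

# Build the answer's example unit, abc$%def, repeated ten times.
repeated=''
for i in 1 2 3 4 5 6 7 8 9 10; do
  repeated="${repeated}"'abc$%def'
done

count_matches 'Password@1234'   # a single unit, so the {10,} group cannot be satisfied
count_matches "$repeated"       # ten units back to back, so the line matches
```

The first call prints 0 and the second prints 1, which is why the sample password from the question is never found.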

I doubt this is what you're after :)

It's quite hard to understand exactly what the requirement is, but it looks like there are a few possibilities:

Match a string of 10 characters which are a mixture of alphanumeric and certain special characters. This is quite a simple one, and a regex to achieve this might be as follows:

[\w$&*@#]{10}

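The problem with the above is that it does not require a special character to be present. Also note that \w inside square brackets is PCRE-style syntax: grep -E treats a backslash inside a bracket expression literally, so with grep you would write [[:alnum:]_$&*@#]{10} instead (or use grep -P where it is available).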

The key part might be that there must be at least one special character within a password of exactly 10 characters. To achieve this, you could do something like this:

\w{0,9}[$&*@#][\w$&*@#]+

This works as follows - we know that there must be at least one special character, and we know that the password is 10 characters long. Therefore, there can be between 0 and 9 consecutive \w characters initially. After that, there MUST be a special character. Then, after that special character, there can be either \w characters OR special characters. The above regex does not enforce the exact length of 10 characters however.
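
A quick way to see the missing length check is to test a few strings against the pattern. The sketch below is an illustration only: it spells \w as [[:alnum:]_] because grep -E does not interpret \w inside a bracket expression, it uses -x to require a whole-line match, and check_loose is a hypothetical helper name.

```shell
# grep -E translation of \w{0,9}[$&*@#][\w$&*@#]+ (\w spelled out as [[:alnum:]_]).
loose='[[:alnum:]_]{0,9}[$&*@#][[:alnum:]_$&*@#]+'

# Hypothetical helper: reports whether the whole string matches the pattern.
check_loose() {
  if printf '%s\n' "$1" | grep -E -x -q "$loose"; then
    echo "match"
  else
    echo "no match"
  fi
}

check_loose 'abcde$1234'   # 10 characters with a special character
check_loose 'ab$c'         # only 4 characters, still accepted
check_loose 'abcdefghij'   # 10 characters but no special character
```

The second call reports a match even though the string has only four characters, which is the gap the next pattern tries to close.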

To achieve the exact length, you might have to be explicit about the lengths, which may start getting messy. For example:

(\w{9}[$&*@#]|\w{8}[$&*@#][\w$&*@#]{1}|\w{7}[$&*@#][\w$&*@#]{2}|\w{6}[$&*@#][\w$&*@#]{3}|\w{5}[$&*@#][\w$&*@#]{4}|\w{4}[$&*@#][\w$&*@#]{5}|\w{3}[$&*@#][\w$&*@#]{6}|\w{2}[$&*@#][\w$&*@#]{7}|\w{1}[$&*@#][\w$&*@#]{8}|[$&*@#][\w$&*@#]{9})

In essence here we are using many regular expressions for each of the combinations of lengths of the particular parts of the expression - e.g., \w{4}[$&*@#][\w$&*@#]{5} would be the case of matching exactly four \w characters, then a special one, then five word or special characters.
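
Rather than typing the ten alternatives by hand, they could be generated with a small loop. This is a sketch under stated assumptions: build_exact10 is a hypothetical helper, \w is again written as [[:alnum:]_] so the result works with grep -E, and the first and last branches use {0} where the hand-written version simply omits the empty part.

```shell
# Hypothetical helper: prints a grep -E alternation for "exactly 10 characters,
# at least one special character", one branch per position of the first special.
build_exact10() {
  w='[[:alnum:]_]'            # grep -E spelling of \w
  s='[$&*@#]'                 # the answer's special characters
  ws='[[:alnum:]_$&*@#]'      # word or special character
  alt=''
  n=0
  while [ "$n" -le 9 ]; do
    # n word characters, one special, then 9-n word-or-special characters
    alt="${alt:+$alt|}${w}{$n}${s}${ws}{$((9 - n))}"
    n=$((n + 1))
  done
  printf '(%s)\n' "$alt"
}

exact10=$(build_exact10)

# -x requires the whole line to match, so only the 10-character line survives.
printf '%s\n' 'abcde$1234' 'abcde$12345' 'abcdefghij' | grep -E -x "$exact10"
```

Only abcde$1234 is printed: the second line is 11 characters long and the third has no special character.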

You may also want to consider whether a two-stage process would work better in this instance. You could go with a simple imperfect example which includes results without special characters (my first example), and then query the resulting set to filter only the passwords which do indeed contain at least one of the special characters.
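
The two-stage idea might look roughly like this. It is a sketch that assumes one candidate per line on standard input and uses the grep -E spelling of the character class; two_stage is a hypothetical helper name.

```shell
# Hypothetical helper: reads candidates on stdin, one per line.
two_stage() {
  # Stage 1: keep lines of exactly 10 word-or-special characters.
  # Stage 2: keep only those that contain at least one special character.
  grep -E -x '[[:alnum:]_$&*@#]{10}' | grep -E '[$&*@#]' || true
}

printf '%s\n' 'abcde$1234' 'abcdefghij' 'short$1' | two_stage
```

Here only abcde$1234 is printed: short$1 fails the length check in stage 1 and abcdefghij is dropped by stage 2.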

A bit more detail around the exact rules would certainly be helpful.

Thanks William. Actually, my pattern should match this condition ```1) Minimum 10 character, 2) Should contain 1 special character, 3) Should contain 1 Numerical character, 4) should contain 1 Alphabetic character ``` — Jenifer P, Oct 28 '21 at 13:06

score 0 · Answer 2 · edited Oct 28 '21 at 16:43

Given the clarification around the rules, and given that the environment is Linux and we're using grep, this helps a lot to provide a better answer! :)

The way I would now approach this problem is not with a single regex. The rules are too complicated for this to be elegantly solved with a single simple regex. However, a good starting point is this (assuming the source file is pass.txt):

grep -E -o '[a-zA-Z0-9_$%*@#]{10,}' ./pass.txt

-E for the uninitiated means Extended regex, which essentially means that more regex features such as {} no longer require escaping, so that it's easier to read without all the extra slashes.

-o returns Only the matching part of the file, rather than returning the whole line.

Note the use of single quotes, which is helpful because the $ character would be interpreted within double quotes as the start of a variable name. Single quotes mean it is treated as a literal.
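
As a quick check, the same command can be run against the example line from the question's comments. This sketch feeds the text on standard input instead of reading pass.txt; candidates is a hypothetical wrapper name.

```shell
# Hypothetical helper wrapping the first-stage grep; reads text on stdin.
candidates() {
  grep -E -o '[a-zA-Z0-9_$%*@#]{10,}' || true
}

printf '%s\n' 'Hi this is my password Password@1234' | candidates
```

This prints Password@1234 only, because every other word on the line is shorter than 10 characters.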

The flaw with the above regex is that you will match many false positives, such as the last three of the following examples:

Password@1234
sffa##1233P
Moose**F00!d
Dollar$$01234
Dollar$$NoNum
NothingSpecial123
123#@#@123456

Where Dollar$$NoNum has no numbers, NothingSpecial123 has no special characters, and 123#@#@123456 has no alphabetic characters.

However, we can filter out these false positives by using the pipe (|) character to chain many grep commands together, and filter out the items that don't have required properties.

For example, to filter out items which do not contain alphabetic characters, we can use the following:

grep -E -o '[a-zA-Z0-9_$%*@#]{10,}' ./pass.txt | grep -E -v '^[^a-zA-Z]+$'

Noticing that we used -o in the first grep, we can now be explicit about matching the start and end of the password by starting with ^ and ending with $. The match itself is an inverted character class ([^.....]), which will match any text that is NOT specified in the square brackets. For example, [^a] will match any character which is NOT a, so would match b for instance. In our example here, we are matching anything which is NOT an alphabetic character. Because we are also matching the start and end of the password, if we hit a match, then we know we have a password here which is comprised entirely of text which is NOT alphabetic, which violates rule #4 in that it should contain an alphabetic character.

This however does the opposite of what we want - this will FIND the matches which DO NOT have the alphabetic character. Grep rather helpfully allows us to invert the output with -v though, which is exactly what we want. Consequently the output of the above will filter out matches which do not contain an alphabetic character.
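
The effect of the inverted filter can be seen in isolation on two of the sample strings. This is a small sketch reading from standard input; drop_no_alpha is a hypothetical helper name.

```shell
# Hypothetical helper: removes lines that contain no alphabetic character.
drop_no_alpha() {
  grep -E -v '^[^a-zA-Z]+$' || true
}

printf '%s\n' 'Password@1234' '123#@#@123456' | drop_no_alpha
```

Only Password@1234 is printed, since 123#@#@123456 consists entirely of non-alphabetic characters and is therefore removed by -v.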

Applying the same principle to the other rules, we get the following final grep command:

grep -E -o '[a-zA-Z0-9_$%*@#]{10,}' ./pass.txt | grep -E -v '^[^a-zA-Z]+$' | grep -E -v '^[^0-9]+$' | grep -E -v '^[^$%*@#]+$'

The filtered output of each grep command feeds into the next filter, and by the end of them all we've removed all the false positives.
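
To try the whole chain on the sample strings listed earlier, something like the following could be used. It is a sketch only: find_passwords is a hypothetical wrapper, and a temporary file created with mktemp stands in for pass.txt.

```shell
# Hypothetical wrapper around the final command; $1 is the file to scan.
find_passwords() {
  grep -E -o '[a-zA-Z0-9_$%*@#]{10,}' "$1" \
    | grep -E -v '^[^a-zA-Z]+$' \
    | grep -E -v '^[^0-9]+$' \
    | grep -E -v '^[^$%*@#]+$' || true
}

# Temporary stand-in for pass.txt holding the sample strings.
sample=$(mktemp)
printf '%s\n' 'Password@1234' 'sffa##1233P' 'Moose**F00!d' 'Dollar$$01234' \
  'Dollar$$NoNum' 'NothingSpecial123' '123#@#@123456' > "$sample"

find_passwords "$sample"
rm -f "$sample"
```

Four lines remain: Password@1234, sffa##1233P, Moose**F00 and Dollar$$01234. Note that Moose**F00!d is reported as Moose**F00 because ! is not in the allowed character set, so the first grep stops the match there.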
