
I'm using this command:

 grep -Ff list.txt C:/data/*.txt > found.txt

but it keeps producing invalid output: the lines it returns don't contain the emails from my list.

list.txt contains -

email@email.com
customer@email.com
imadmin@gmail.com
newcustomer@email.com
helloworld@yes.com

and so on, with one email to match on each line.

The files being searched contain -

user1:phonenumber1:email@email.com:last-active:recent
user2:phonennumber2:customer@email.com:last-active:inactive
user3:phonenumber3:blablarandom@bla.com:last-active:never

then another may contain -

blublublu         email@email.com         phonenumber         subscribed
nanananana        customer@email.com      phonenumber         unsubscribed
useruser          noemailinput@noemail.com       phonenumber      pending

What I'm trying to do is give grep a list of emails (list.txt, one string per line), have it search the files in the given directory for each string, and output every whole line that contains a match.

The expected output in this case would be -

user1:phonenumber1:email@email.com:last-active:recent
user2:phonennumber2:customer@email.com:last-active:inactive
blublublu         email@email.com         phonenumber         subscribed
nanananana        customer@email.com      phonenumber         unsubscribed

It should not output the other two lines -

 user3:phonenumber3:blablarandom@bla.com:last-active:never
 useruser          noemailinput@noemail.com       phonenumber      pending

because neither line contains a string from the list.

jake reading

2 Answers

0

I think your list.txt may contain blank lines. With -F, an empty pattern matches every line, so grep would return every line of the files matched by C:/data/*.txt. To fix this, either delete the empty lines by hand or run sed -i '/^$/d' list.txt, where the -i flag edits the file in place.
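
The effect of a blank line can be reproduced with a small sketch. The scratch directory, file names, and sample rows below are made up for illustration; only the grep options come from the question.

```shell
# Sketch: an empty line in a -F pattern file matches every input line.
tmp=$(mktemp -d)                                   # scratch directory for made-up data
printf 'email@email.com\n\n' > "$tmp/list.txt"     # second line is empty
printf 'user1:p1:email@email.com:x\nuser3:p3:other@bla.com:y\n' > "$tmp/data.txt"

# Count matching lines with the blank line still in the list.
with_blank=$(grep -cFf "$tmp/list.txt" "$tmp/data.txt")

# Drop empty lines into a separate file (the original list is untouched) and count again.
grep -v '^$' "$tmp/list.txt" > "$tmp/clean.txt"
without_blank=$(grep -cFf "$tmp/clean.txt" "$tmp/data.txt")

echo "with blank line: $with_blank, without: $without_blank"
rm -rf "$tmp"
```

With this sample data the first count covers both rows and the second covers only the row that really contains the email.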

The issue may also be related to DOS carriage returns. Try running cat -v list.txt and check whether the lines end in ^M:

email@email.com^M
customer@email.com^M

If so, strip them with either dos2unix or tr -d '\r' < list.txt > output.txt.
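
Carriage returns and blank lines can also combine: in a CRLF file, a "blank" line is really a lone carriage return, which '/^$/d' does not remove and which matches every line of a CRLF data file. The sketch below assumes both the list and the data use CRLF endings; the file names and rows are invented for the demonstration.

```shell
# Sketch: a "blank" CRLF line becomes a lone-CR pattern that matches every CRLF line.
tmp=$(mktemp -d)                                        # scratch directory for made-up data
printf 'email@email.com\r\n\r\n' > "$tmp/list.txt"      # CRLF list with one "blank" line
printf 'user1:p1:email@email.com:recent\r\nuser3:p3:other@bla.com:never\r\n' > "$tmp/data.txt"

# Count matching lines before any cleanup.
before=$(grep -cFf "$tmp/list.txt" "$tmp/data.txt")

# Strip carriage returns first, then drop the lines that are now truly empty.
tr -d '\r' < "$tmp/list.txt" | grep -v '^$' > "$tmp/clean.txt"
after=$(grep -cFf "$tmp/clean.txt" "$tmp/data.txt")

echo "before cleanup: $before, after cleanup: $after"
rm -rf "$tmp"
```

The order matters here: removing empty lines before stripping the carriage returns would leave the lone-CR line in place.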

Thomas Smyth - Treliant
  • It's still producing unmatched lines. Is it possible that the files being searched are causing the issue, or potentially special characters such as ".", "-", "_" etc.? – jake reading Jan 06 '18 at 23:26
  • I don't think grep has an issue with them and I can only seem to recreate your issue if I have empty lines. Maybe you could try using `grep -wFf`? – Thomas Smyth - Treliant Jan 06 '18 at 23:29
  • I just tried with my examples above and it works fine, but with the actual content I want to match it produces false lines. Is it possibly caused by the content I'm searching? – jake reading Jan 06 '18 at 23:35
  • Added an additional comment about special characters, hope it fixes your issue. – Thomas Smyth - Treliant Jan 07 '18 at 19:50
  • @jakereading how are we supposed to help you debug a problem that doesn't exist with the sample input you provided? Wouldn't it make sense when posting sample input/output to provide data that **does** reproduce your problem rather than data that **doesn't** reproduce it? "My car won't start, please take a look at this bicycle and tell me what's wrong". – Ed Morton Jan 08 '18 at 12:53
0

The file list.txt probably contains empty lines or lines consisting only of a separator character. When I added a line containing only : to list.txt, every line from the first sample started to match. Similarly, adding a line containing a single space made every line from the second sample match. A lone @ causes the same symptoms.

Try running grep -oFf ... (if your grep supports -o) to see the exact matching parts. If there are empty lines in list.txt, the number of matches will be fewer than the number of matches without -o. Look through the -o output for extremely short matches to spot suspicious strings. You can also examine the shortest lines in list.txt:

while IFS= read -r line ; do printf '%s %s\n' "${#line}" "$line" ; done < list.txt | sort -nk1,1
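
The -o check described above can be sketched as follows. The sample files are invented, and piping through sort and uniq -c is my own addition to make the most frequent matched strings stand out; a stray separator usually rises to the top.

```shell
# Sketch: tally what each pattern actually matched, most frequent first.
tmp=$(mktemp -d)                                     # scratch directory for made-up data
printf 'email@email.com\n:\n' > "$tmp/list.txt"      # list with a stray ":" line
printf 'user1:p1:email@email.com:x\nuser3:p3:other@bla.com:y\n' > "$tmp/data.txt"

# -o prints only the matched text, one match per line; -h suppresses file names.
counts=$(grep -ohFf "$tmp/list.txt" "$tmp/data.txt" | sort | uniq -c | sort -rn)

echo "$counts"
rm -rf "$tmp"
```

In this sample the ":" pattern accounts for most of the matches, which points straight at the offending line in list.txt.
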
choroba