Find files containing a given text

Question

In bash, I want to return the file name (and the path to the file) for every file of type .php, .html, or .js that contains the string "document.cookie" or "setcookie", matched case-insensitively.

How would I do that?

Have you considered just using grep? http://www.cyberciti.biz/faq/grep-in-bash/ — Terrance, May 27 '11 at 13:52
This title is fairly misleading. "find-files-containing-a-given-text" — Josh C, Dec 09 '19 at 03:57

score 257 · Accepted Answer · edited Aug 18 '19 at 19:23

egrep -ir --include=*.{php,html,js} "(document.cookie|setcookie)" .

The r flag means to search recursively (search subdirectories). The i flag means case insensitive.

If you just want file names add the l (lowercase L) flag:

egrep -lir --include=*.{php,html,js} "(document.cookie|setcookie)" .
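Several comments below report that the unquoted --include=*.{php,html,js} form behaves differently depending on the shell. Here is a minimal sketch of a variant that sidesteps this, assuming GNU grep (or another grep that supports --include). Each pattern is quoted and passed separately, and the dot is escaped so it matches only a literal dot. The demo directory and file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical demo tree; the file names are invented for this example.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf '<?php SetCookie("a", "b"); ?>\n' > "$demo/login.php"
printf 'var c = document.cookie;\n'      > "$demo/sub/app.js"
printf '<p>no match here</p>\n'          > "$demo/index.html"
printf 'setcookie in a text file\n'      > "$demo/notes.txt"

# One quoted --include per extension, so the shell passes each pattern
# to grep untouched (no brace expansion, no unquoted globs).
# -l: file names only, -i: ignore case, -r: recurse, -E: extended regex.
matches=$(cd "$demo" && grep -lirE \
    --include='*.php' --include='*.html' --include='*.js' \
    'document\.cookie|setcookie' . | LC_ALL=C sort)

printf '%s\n' "$matches"
rm -rf "$demo"
```

Under these assumptions the script lists ./login.php and ./sub/app.js; notes.txt contains a match but is filtered out by its extension.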

edited Aug 18 '19 at 19:23

Andrew Schwartz

answered May 27 '11 at 14:03

bear24rw

That didn't seem to work for me (at least not on Mac); it just hangs: `egrep -lir --include=* "repo"` prints `egrep: warning: recursive search of stdin` – Dean Hiller Apr 02 '14 at 14:18
You forgot to add the path to search. The path is '.' in the above example. In your case, the script is waiting for the input to search on stdin. Try: egrep -lir --include=* "repo" / (or any other path) – LodeRunner May 06 '14 at 16:47
`grep -E ... ` > `egrep ...` – Aman Jun 27 '14 at 18:51
I got error `grep: (error|fail): No such file or directory` on Ubuntu Desktop 16; any hints? – Nam G VU Jul 24 '17 at 10:32
To make this work, I had to escape the * with \, so I have `--include=\*.{php,html,js}` – Mehrad Mahmoudian Apr 17 '18 at 12:19
The mask above didn't work for me: grep -lir --include=*.{php} "mongo" . My solution: grep -lir --include="*.php" "mongo" . – mishaikon Jun 26 '19 at 10:00
On Mac: egrep -ir --include=\\*.{php,html,js} "(document.cookie|setcookie)" . – Aidar Gatin Jun 23 '20 at 16:37
Several of the previous commenters have actually pointed to the same issue with grep vs egrep. As @Aman has mentioned above, egrep is the same as 'grep -E'. With grep, '*' does not need to be escaped but the '|' does, and with egrep, it's the opposite. – NurShomik Jan 27 '21 at 05:24

score 74 · Answer 2 · edited Nov 06 '18 at 17:35

Try something like grep -r -n -i --include="*.html" --include="*.php" --include="*.js" searchstringhere .

The -i makes it case insensitive.

The . at the end means you want to start from your current directory; it can be replaced with any directory.

The -r means do this recursively, right down the directory tree.

The -n prints the line number for each match.

The --include option limits the search to files whose names match a glob pattern. Each --include takes one pattern, so repeat it for each extension.

For more info see: http://www.gnu.org/software/grep/
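The command above searches for a single string. One possible way to cover both strings from the question without any regex escaping is -F (fixed strings) with one -e per string. This is only a sketch, assuming a grep that supports --include; the demo files are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical demo files, invented for this example.
demo=$(mktemp -d)
printf 'first line\nvar c = document.cookie;\n' > "$demo/app.js"
printf '<?php SetCookie("a", "b"); ?>\n'        > "$demo/login.php"
printf 'documentXcookie is not a match\n'       > "$demo/other.js"

# -F treats each -e argument as a literal string, so the dot in
# document.cookie matches only a real dot.
# -n adds the line number: output is path:line:text.
hits=$(cd "$demo" && grep -rniF \
    --include='*.html' --include='*.php' --include='*.js' \
    -e 'document.cookie' -e 'setcookie' . | LC_ALL=C sort)

printf '%s\n' "$hits"
rm -rf "$demo"
```

With these files, the output has one line for app.js (line 2) and one for login.php (line 1); other.js is skipped because -F does not let the dot stand for "X".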

edited Nov 06 '18 at 17:35

kkonrad

answered May 27 '11 at 13:57

Raoul

Or perhaps use the `-l` option (just print filenames that match) instead of `-n` – glenn jackman May 27 '11 at 14:03

score 18 · Answer 3 · edited May 27 '11 at 14:05

find them and grep for the string:

This will find all files of your 3 types in /starting/path and grep for the regular expression '(document\.cookie|setcookie)'. Split over 2 lines with the backslash just for readability...

find /starting/path -type f \( -name "*.php" -o -name "*.html" -o -name "*.js" \) | \
 xargs egrep -i '(document\.cookie|setcookie)'
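A plain find | xargs pipeline splits file names on whitespace. Here is a hedged sketch of the same idea with NUL-separated names, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do). The \( ... \) grouping makes -type f apply to all three -name tests. The file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical demo files; one name deliberately contains a space.
demo=$(mktemp -d)
printf 'document.cookie = "x=1";\n' > "$demo/my script.js"
printf 'setcookie("x", "1");\n'     > "$demo/plain.php"
printf 'setcookie in a log\n'       > "$demo/trace.log"

# -print0 and -0 pass NUL-terminated names, so spaces survive the pipe.
# grep -l prints only the names of files that contain a match.
found=$(find "$demo" -type f \
    \( -name '*.php' -o -name '*.html' -o -name '*.js' \) -print0 |
    xargs -0 grep -liE 'document\.cookie|setcookie' | LC_ALL=C sort)

printf '%s\n' "$found"
rm -rf "$demo"
```

Both "my script.js" and plain.php are reported with their full paths; trace.log is never handed to grep.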

edited May 27 '11 at 14:05

answered May 27 '11 at 13:58

Michael Berkowski

I like the universal usage of find, but to my mind it is better to use `-exec grep -l 'sth' {} \;` – NGix Nov 26 '12 at 18:28
Thanks @Michael Berkowski. This way is 5 to 8 times faster than `# egrep -ir --include=file.foo "(foo|bar)" /dir` on a ~500 GB directory. – Qh0stM4N Jan 24 '18 at 13:55

score 14 · Answer 4 · edited May 27 '11 at 14:03

Sounds like a perfect job for grep or perhaps ack

Or this wonderful construction:

find . -type f \( -name '*.php' -o -name '*.html' -o -name '*.js' \) -exec grep -i "document\.cookie\|setcookie" /dev/null {} \;
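Ending -exec with \; starts one grep process per file. As a sketch of an alternative, the + terminator lets find pass many files to each grep invocation. The /dev/null argument is kept so that grep still prints the file name even when only one real file is passed. The demo files are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical demo files, invented for this example.
demo=$(mktemp -d)
printf 'var c = Document.Cookie;\n' > "$demo/app.js"
printf 'setcookie in a log\n'       > "$demo/trace.log"

# {} + batches the found files into as few grep runs as possible.
# /dev/null is an extra (empty) file argument: with two or more file
# arguments grep prefixes each match with its file name.
out=$(find "$demo" -type f \
    \( -name '*.php' -o -name '*.html' -o -name '*.js' \) \
    -exec grep -iE 'document\.cookie|setcookie' /dev/null {} +)

printf '%s\n' "$out"
rm -rf "$demo"
```

Only app.js is searched here, and its matching line is printed with the path in front of it.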

edited May 27 '11 at 14:03

answered May 27 '11 at 13:54

Fredrik Pihl

+1 Using `-exec grep...` is better than my `xargs` method because it won't choke on spaces in filenames. – Michael Berkowski May 27 '11 at 14:09
@MichaelBerkowski : You can use it like this to deal with whitespace in filenames: `find . -type f -print0 | xargs -0 -I {} grep "search_string" {}`. Of course, the other options can be added as well. – Pascal Jul 01 '19 at 11:00

score 5 · Answer 5 · edited May 27 '11 at 14:05

find . -type f \( -name '*.php' -o -name '*.js' -o -name '*.html' \) |\
xargs grep -liE 'document\.cookie|setcookie'

edited May 27 '11 at 14:05

answered May 27 '11 at 14:03

nos

score 4 · Answer 6 · answered Oct 31 '15 at 20:09

Just to include one more alternative, you could also use this:

find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \;

Where:

-regextype posix-extended tells find what kind of regex to expect
-regex "^.*\.(php|html|js)$" gives find the regex that the whole file path must match
-exec grep -EH '(document\.cookie|setcookie)' {} \; tells find to run the command (with its options and arguments) specified between the -exec option and the \; for each file it finds, where {} represents where the file path goes in this command.

while
- the -E option tells grep to use extended regex (to support the parentheses and the | alternation) and...
- the -H option tells grep to print the file path before each match.

And, given this, if you only want file paths, you may use:

find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \; | sed -r 's/(^.*):.*$/\1/' | sort -u

Where

| [pipe] sends the output of find to the next command after it (which is sed, then sort).
The -r option tells sed to use extended regex.
s/HI/BYE/ tells sed to replace the first occurrence (per line) of "HI" with "BYE" and...
s/(^.*):.*$/\1/ tells it to replace the regex (^.*):.*$ (meaning a group [stuff enclosed by ()] including everything [.* = zero or more of any character] from the beginning of the line [^] up to the last ':' [because .* is greedy], followed by anything up to the end of the line [$]) with the first group [\1] of the matched regex.
The -u option tells sort to remove duplicate entries (take sort -u as optional).
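Because .* in sed is greedy, the group in s/(^.*):.*$/\1/ runs to the last colon on the line, which matters when the matched text itself contains a colon. Here is a small sketch of the difference, using a made-up grep -H output line and sed -E (the portable spelling of -r on GNU and BSD sed). Using grep -l instead of -H would avoid the post-processing altogether.

```shell
#!/usr/bin/env bash
# A made-up grep -H output line whose matched text contains a colon.
line='./src/app.js:var c = document.cookie; // note: check this'

# Greedy (^.*) extends to the LAST colon, so part of the matched
# text leaks into the result.
greedy=$(printf '%s\n' "$line" | sed -E 's/(^.*):.*$/\1/')

# [^:]* cannot cross a colon, so it stops at the FIRST one
# (this assumes the path itself contains no colon).
first=$(printf '%s\n' "$line" | sed -E 's/^([^:]*):.*$/\1/')

printf '%s\n%s\n' "$greedy" "$first"
```

The first printed line still carries most of the matched text, while the second is just ./src/app.js.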

...this is FAR from being the most elegant way. As I said, my intention is to widen the range of possibilities (and also to give more complete explanations of some tools you could use).
