98

I want to grep for files containing all of the words Dansk, Svenska and Norsk, each possibly on a different line, with a usable return code (I really only need to know that the strings are contained, because my one-liner goes a little further than this).

I have many files with lines in them like this:

Disc Title: unknown
Title: 01, Length: 01:33:37.000 Chapters: 33, Cells: 31, Audio streams: 04, Subpictures: 20
        Subtitle: 01, Language: ar - Arabic, Content: Undefined, Stream id: 0x20, 
        Subtitle: 02, Language: bg - Bulgarian, Content: Undefined, Stream id: 0x21, 
        Subtitle: 03, Language: cs - Czech, Content: Undefined, Stream id: 0x22, 
        Subtitle: 04, Language: da - Dansk, Content: Undefined, Stream id: 0x23, 
        Subtitle: 05, Language: de - Deutsch, Content: Undefined, Stream id: 0x24, 
(...)

Here is the pseudocode of what I want:

for all files in directory;
 if file contains "Dansk" AND "Norsk" AND "Svenska" then
 echo the filename
end

What is the best way to do this? Can it be done on one line?

Eric Leschinski
Christian

17 Answers

98

You can use:

grep -l Dansk * | xargs grep -l Norsk | xargs grep -l Svenska

If you also want to search hidden files:

grep -l Dansk * .* | xargs grep -l Norsk | xargs grep -l Svenska
fedorqui
vmpstr
  • Clever solution; one thing to note (generally speaking; not relevant to what the OP was asking for) is that the overall _exit code_ will be _0_ even in case of (conceptual) failure. Thus, if you were interested in determining failure vs. success, you'd either have to examine whether stdout output is empty or not, or employ @EddSteel's approach instead. – mklement0 Sep 20 '12 at 22:19
  • @mklement: In Bash, the `PIPESTATUS` array contains the exit values of the members of a pipeline. – Dennis Williamson Oct 03 '12 at 16:50
  • @DennisWilliamson That's good to know, thank you. Another option is to turn the `pipefail` shell option on (temporarily): `shopt -so pipefail` – mklement0 Oct 03 '12 at 20:19
  • You might want to use `grep -Z` and `xargs -0` if your filenames can contain spaces. – Ben Challenor Jun 25 '13 at 15:14
  • This can cause "Argument list too long" errors if you have many files. – AnnanFay May 12 '15 at 16:48
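The comments above suggest NUL-separated filenames (Ben Challenor) and `pipefail` (mklement0). A minimal sketch combining both, assuming GNU grep and GNU xargs: `--null` is the long form of grep's `-Z`, `-d skip` makes grep ignore directories, and `xargs -r` (a GNU extension) skips running grep when the list is empty. The function name `files_with_all_three` is hypothetical:

```shell
# files_with_all_three FILE...  (hypothetical helper name)
# Prints the names of the given files that contain Dansk, Norsk and Svenska.
# Assumes GNU grep (--null, -d skip) and GNU xargs (-0, -r).
# The body runs in a subshell so that pipefail does not leak into the caller;
# with pipefail, a stage that finds nothing makes the whole pipeline fail.
files_with_all_three() (
  set -o pipefail
  grep -l --null -d skip -- Dansk "$@" |
    xargs -0 -r grep -l --null -- Norsk |
    xargs -0 -r grep -l -- Svenska
)
```

With this, `files_with_all_three * && echo found` only prints `found` when at least one file contains all three words.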
24
grep -irl word1 * | grep -il word2 `cat -` | grep -il word3 `cat -`
  • -i makes the search case-insensitive
  • -r makes the file search recursive through folders
  • -l prints only the names of files in which the word is found
  • `cat -` reads the list of filenames from standard input, so the next grep searches only those files
fedorqui
Gerry
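Gerry's recursive, case-insensitive pipeline can be generalised to any number of words. A sketch assuming bash plus GNU grep (`--null`) and GNU xargs (`-0`): it keeps the candidate files in an array, narrows the array once per word, and feeds the names through `xargs -0` so they are never expanded into one huge command line (which also sidesteps the "Argument list too long" problem mentioned above). `grep_all_words` is a hypothetical name:

```shell
# grep_all_words WORD...  (hypothetical helper name)
# Recursively lists files under the current directory that contain every WORD,
# ignoring case. Assumes bash plus GNU grep (--null) and GNU xargs (-0).
grep_all_words() {
  local -a files=() next=()
  local f word
  # Start with every file that contains the first word.
  while IFS= read -r -d '' f; do
    files+=("$f")
  done < <(grep -ril --null -- "$1" .)
  shift
  # Narrow the list once per remaining word.
  for word in "$@"; do
    (( ${#files[@]} )) || return 1
    next=()
    while IFS= read -r -d '' f; do
      next+=("$f")
    done < <(printf '%s\0' "${files[@]}" | xargs -0 grep -il --null -- "$word")
    files=("${next[@]}")
  done
  (( ${#files[@]} )) || return 1
  printf '%s\n' "${files[@]}"
}
```

Usage: `grep_all_words Dansk Norsk Svenska` prints one path per line and returns 1 if no file qualifies.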
24

Yet another way using just bash and grep:

For a single file 'test.txt':

  grep -q Dansk test.txt && grep -q Norsk test.txt && grep -l Svenska test.txt

Will print test.txt iff the file contains all three (in any combination). The first two greps don't print anything (-q) and the last only prints the file if the other two have passed.

If you want to do it for every file in the directory:

   for f in *; do grep -q Dansk "$f" && grep -q Norsk "$f" && grep -l Svenska "$f"; done
Edd Steel
  • but then there's no need to execute grep 3 times. – kurumi Jan 25 '11 at 23:47
  • I know you can combine patterns with -e, but I couldn't see a way of making a conjunction in grep alone. – Edd Steel Jan 26 '11 at 09:13
  • Great; re `for f ...`: use `"$f"` (double-quoting) rather than just `$f` to ensure that filenames with embedded spaces, etc. are handled correctly. – mklement0 Sep 20 '12 at 22:15
  • The advantage of this approach over @vmpstr's is that the exit code correctly reflects whether all search terms were found or not. – mklement0 Sep 20 '12 at 22:27
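Following mklement0's comments, here is a sketch of Edd Steel's loop as a function that quotes `"$f"`, skips anything that is not a regular file, and returns 0 only if at least one file contained all three words. `files_with_all` is a hypothetical name:

```shell
# files_with_all FILE...  (hypothetical helper name)
# Prints each regular FILE that contains Dansk, Norsk and Svenska.
# Returns 0 if at least one file qualified, 1 otherwise.
files_with_all() {
  local f status=1
  for f in "$@"; do
    [ -f "$f" ] || continue   # skip directories and other non-regular files
    if grep -q Dansk "$f" && grep -q Norsk "$f" && grep -q Svenska "$f"; then
      printf '%s\n' "$f"
      status=0
    fi
  done
  return "$status"
}
```

Usage: `files_with_all * || echo "no file contains all three"`.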
10

You can do this really easily with ack:

ack -l 'cats' | ack -xl 'dogs'
  • -l: return a list of files
  • -x: take the files from STDIN (the previous search) and only search those files

And you can just keep piping until you get just the files you want.

fedorqui
Ben Johnson
  • When I try this, it says `Unknown option: x`. Is there a certain version of ack which supports this x flag? –  Apr 10 '16 at 04:49
8

How to grep for multiple strings in file on different lines (Use the pipe symbol):

for file in *;do 
   test $(grep -E 'Dansk|Norsk|Svenska' "$file" | wc -l) -ge 3 && echo "$file"
done

Notes:

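  1. With -E, | means alternation whether the pattern is in single or double quotes. Without -E (basic regular expressions), GNU grep needs the alternation written as \| to search for Dansk, Norsk and Svenska.

  2. Assumes that each language appears on only one line and that each line has only one language, because the loop counts matching lines rather than distinct words.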

Walkthrough: http://www.cyberciti.biz/faq/howto-use-grep-command-in-linux-unix/

Eric Leschinski
Damodharan R
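Counting matching lines, as the loop above does, accepts a file that mentions Dansk on three different lines and rejects a file with two of the languages on one line. A sketch that counts distinct words instead, using `grep -o` (supported by GNU and BSD grep) to print each match on its own line; `files_with_three_langs` is a hypothetical name:

```shell
# files_with_three_langs FILE...  (hypothetical helper name)
# Prints each FILE in which all three of Dansk, Norsk and Svenska occur.
files_with_three_langs() {
  local file n
  for file in "$@"; do
    [ -f "$file" ] || continue
    # -o prints every match on its own line; sort -u keeps each word once.
    n=$(grep -oE 'Dansk|Norsk|Svenska' "$file" | sort -u | wc -l)
    if (( n == 3 )); then
      printf '%s\n' "$file"
    fi
  done
}
```

The arithmetic test `(( n == 3 ))` also tolerates the leading spaces some `wc` implementations print.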
5
awk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ exit !(a && b && c) }' file

You can then check the exit status in the shell (0 if all three words occur in the file, 1 otherwise).

If you have Ruby (1.9+):

ruby -0777 -ne 'print if /Dansk/ and /Norsk/ and /Svenska/' file
kurumi
  • in your awk END clause, you probably want: `if (a && b && c) {exit 0} else {exit 1}`, or more tersely `exit !(a && b && c)` – glenn jackman Jan 25 '11 at 16:24
  • your ruby solution doesn't look right. that will only print paragraphs that contain all the search words. the question is: does the file (as a whole) contain all the words, even if they don't all appear in the same paragraph. – glenn jackman Jan 25 '11 at 16:29
  • thanks, changed. If the whole file is needed, then you have to use -0777 – kurumi Jan 25 '11 at 23:46
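Using the `exit !(a && b && c)` form from glenn jackman's comment, the awk program's exit status can drive a loop over files. A sketch that should work with any POSIX awk; `all_three` and `list_all_three` are hypothetical names:

```shell
# all_three FILE  (hypothetical helper name)
# Exit status 0 if FILE contains Dansk, Norsk and Svenska, on any lines.
all_three() {
  awk '/Dansk/ {a=1} /Norsk/ {b=1} /Svenska/ {c=1}
       END { exit !(a && b && c) }' "$1"
}

# list_all_three FILE...  (hypothetical helper name)
# Prints the regular files among the arguments for which all_three succeeds.
list_all_three() {
  local f
  for f in "$@"; do
    if [ -f "$f" ] && all_three "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

Usage: `list_all_three *` in the directory holding the files.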
4

This searches multiple files for lines containing any of several words:

egrep 'abc|xyz' file1 file2 ... fileN
mech
  • In addition to finding files that have both strings, this will also find files that have either 'abc' OR 'xyz' alone. I think OP was asking for files that contain 'abc' AND 'xyz'. – Chris Warth Sep 05 '18 at 22:24
2

Simply:

grep 'word1\|word2\|word3' *

see this post for more info

fedorqui
moshe beeri
  • I would add the `-l` flag, but other than that, this answer seems the most straightforward to me, unless I'm missing something. – xdhmoore Jul 20 '17 at 18:51
  • Yep, It is also more efficient since you do not process all the data within multiple pipe and filters – moshe beeri Sep 25 '17 at 22:37
  • The question asks about an expression that returns files containing all three terms; this returns lines (instead of filenames) containing any of the three (instead of all three). – Benjamin W. Nov 29 '17 at 23:09
2

This is a blending of glenn jackman's and kurumi's answers which allows an arbitrary number of regexes instead of an arbitrary number of fixed words or a fixed set of regexes.

#!/usr/bin/awk -f
# by Dennis Williamson - 2011-01-25

BEGIN {
    for (i=ARGC-2; i>=1; i--) {
        patterns[ARGV[i]] = 0;
        delete ARGV[i];
    }
}

{
    for (p in patterns)
        if ($0 ~ p)
            matches[p] = 1
            # print    # the matching line could be printed
}

END {
    for (p in patterns) {
        if (matches[p] != 1)
            exit 1
    }
}

Run it like this:

./multigrep.awk Dansk Norsk Svenska 'Language: .. - A.*c' dvdfile.dat
Dennis Williamson
2

If you have git installed

git grep -l --all-match --no-index -e Dansk -e Norsk -e Svenska

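--no-index makes git grep search the files in the current directory even if they are not managed by Git, so this command works in any directory, whether or not it is a Git repository. --all-match limits the output to files that match every -e pattern, possibly on different lines.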

Kamaraju Kusumanchi
2

Here's what worked well for me:

find . -path '*/.svn' -prune -o -type f -exec gawk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ if (a && b && c) print FILENAME }' {} \;
./path/to/file1.sh
./another/path/to/file2.txt
./blah/foo.php

If I just wanted to find .sh files with these three, then I could have used:

find . -path '*/.svn' -prune -o -type f -name "*.sh" -exec gawk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ if (a && b && c) print FILENAME }' {} \;
./path/to/file1.sh
Nick Henry
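Nick Henry's `-exec ... {} \;` starts one gawk per file. A sketch that hands many files to each awk run with `{} +` instead; since the flags would otherwise carry over from one file to the next, it reports and resets them whenever `FNR == 1` marks the start of a new file. It uses only POSIX awk features, the `.svn` pruning is left out for brevity, and `find_all_three` is a hypothetical name:

```shell
# find_all_three DIR  (hypothetical helper name)
# Prints regular files under DIR that contain Dansk, Norsk and Svenska.
find_all_three() {
  find "$1" -type f -exec awk '
    FNR == 1 {
      # a new file starts: report the previous one, then reset the flags
      # (empty files produce no records, so they never reach this block)
      if (NR > 1 && a && b && c) print prev
      a = b = c = 0
      prev = FILENAME
    }
    /Dansk/   { a = 1 }
    /Norsk/   { b = 1 }
    /Svenska/ { c = 1 }
    END { if (NR > 0 && a && b && c) print prev }
  ' {} +
}
```

If find splits a long file list across several awk runs, each run checks its own files independently, so the result is the same.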
1

With the help of the comments on this page, I did it in two steps without writing a script: first make a list of the CSV files in one file, then grep the listed files. Just type into the terminal:

$ find /csv/file/dir -name '*.csv' > csv_list.txt
$ grep -q Svenska `cat csv_list.txt` && grep -q Norsk `cat csv_list.txt` && grep -l Dansk `cat csv_list.txt`

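It did exactly what I needed: print file names containing all three words. Note, though, that each grep -q receives every listed file and succeeds if any one of them matches, so with several files this prints every file containing Dansk as long as Svenska and Norsk occur somewhere in the list. The result is only exact when the list holds a single file.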

Also mind the quoting characters: the backticks (`) around cat csv_list.txt are not single quotes (').

avasal
Simas
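A sketch that keeps Simas's two-step idea but checks each listed file separately instead of handing the whole list to every grep. It assumes one filename per line in the list file (names containing newlines would break it); `files_from_list` is a hypothetical name:

```shell
# files_from_list LISTFILE  (hypothetical helper name)
# Reads one filename per line from LISTFILE and prints those containing
# Svenska, Norsk and Dansk.
files_from_list() {
  local f
  while IFS= read -r f; do
    if grep -q Svenska "$f" && grep -q Norsk "$f" && grep -q Dansk "$f"; then
      printf '%s\n' "$f"
    fi
  done < "$1"
}
```

Usage, after the same `find /csv/file/dir -name '*.csv' > csv_list.txt` step: `files_from_list csv_list.txt`.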
1

If you only need two search terms, arguably the most readable approach is to run each search and intersect the results:

 comm -12 <(grep -rl word1 . | sort) <(grep -rl word2 . | sort)
Ankur Dave
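Ankur Dave's intersection extends to three terms by feeding one `comm -12` result into another. A sketch assuming bash (for `<(...)`); exporting `LC_ALL=C` inside the subshell body makes `sort` and `comm` agree on ordering, and filenames containing newlines are not handled. `intersect_three` is a hypothetical name:

```shell
# intersect_three WORD1 WORD2 WORD3  (hypothetical helper name)
# Lists files under the current directory that contain all three words.
intersect_three() (
  export LC_ALL=C   # same collation for sort and comm
  comm -12 \
    <(comm -12 <(grep -rl -- "$1" . | sort) <(grep -rl -- "$2" . | sort)) \
    <(grep -rl -- "$3" . | sort)
)
```

The inner `comm` output is already sorted, so it can be compared directly with the third list.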
1

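Expanding on @kurumi's awk answer, here's a bash function. Note that it compares whole whitespace-separated fields, so punctuation counts: in the sample lines from the question the field is `Dansk,` (with a trailing comma), which does not match the search term `Dansk`.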

all_word_search() {
    gawk '
        BEGIN {
            for (i=ARGC-2; i>=1; i--) {
                search_terms[ARGV[i]] = 0;
                ARGV[i] = ARGV[i+1];
                delete ARGV[i+1];
            }
        }
        {
            for (i=1;i<=NF; i++) 
                if ($i in search_terms) 
                    search_terms[$i] = 1
        }
        END {
            for (word in search_terms) 
                if (search_terms[word] == 0) 
                    exit 1
        }
    ' "$@"
    return $?
}

Usage:

if all_word_search Dansk Norsk Svenska filename; then
    echo "all words found"
else
    echo "not all words found"
fi
glenn jackman
0

I had this problem today, and all the one-liners here failed for me because the file names contained spaces.

This is what I came up with that worked:

grep -ril <WORD1> | sed 's/.*/"&"/' | xargs grep -il <WORD2>
giusti
0

A simple one-liner in bash that checks the file my_file.txt for every word in an arbitrary list LIST:

LIST="Dansk Norsk Svenska"
EVAL=$(echo "$LIST" | sed 's/[^ ]* */grep -q & my_file.txt \&\& /g'); eval "$EVAL echo yes || echo no"

Replacing eval with echo reveals that the following command is evaluated:

grep -q Dansk  my_file.txt && grep -q Norsk  my_file.txt && grep -q Svenska my_file.txt &&  echo yes || echo no
Tik0
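The same check can be written without building a command string for `eval`, which also avoids surprises if a word contains shell metacharacters. A sketch with a hypothetical helper `contains_all FILE WORD...` that loops over the words and stops at the first one that is missing; it uses `grep -F`, so each word is treated as a fixed string rather than a regex:

```shell
# contains_all FILE WORD...  (hypothetical helper name)
# Exit status 0 if FILE contains every WORD (as a fixed string), 1 otherwise.
contains_all() {
  local file=$1 word
  shift
  for word in "$@"; do
    grep -qF -- "$word" "$file" || return 1
  done
}
```

Usage with the same LIST: `contains_all my_file.txt $LIST && echo yes || echo no` (leave `$LIST` unquoted so it splits into separate words).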
0

To search piped input for multiple strings, where the maximum length of the input can be predicted, grep context is helpful:

content_generator | grep -C 1000 Dansk | grep -C 1000 Norsk | grep Svenska
Roger Dueck