I've looked all over for this but I can only find 'Compare two text files for differences', or 'compare two files and create another with the missing words'. I have a list of words & short phrases and I need to check inside a text file for occurrences of these words/phrases. Each phrase in the list file is on a separate line. I just need to print/log the phrases that appear in the text file.

I know I should already have something worked out but I'm not sure where to start. Can I get a hint even?

boogiewonder
  • surely you just do a `grep` on the file: http://stackoverflow.com/questions/633396/whats-the-best-way-to-search-for-a-string-in-a-file – jonk Dec 07 '15 at 17:00
  • Your question is a little vague. For example, if the list of search terms includes `foo`, should a file with the word `food` in it match? If the search terms include the phrase `hello world` should a file match if it has `hello` at the end of one line and `world` at the beginning of the next line, i.e. `"hello\nworld"`? What about even more whitespace, e.g. `"hello \n\tworld"`? Does case matter? How big is the text file (would reading the whole thing into memory be a problem)? Finally, is there a reason you can't use `grep` as @jonk suggests? – Jordan Running Dec 07 '15 at 17:19
  • Where is your minimal input example? Where is the minimal example of your code showing your attempt at solving the problem? Please read "[ask]" and provide the information needed. As is, we have no idea of your expertise, what you tried, what sort of text you're matching, the sort of words/short phrases, or the output, leading to an extremely broad and poorly defined question. Please read http://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users to get an idea of what's expected. – the Tin Man Dec 07 '15 at 17:56
  • This is a duplicate of "[What's the best way to search for a string in a file?](http://stackoverflow.com/questions/633396/whats-the-best-way-to-search-for-a-string-in-a-file?lq=1)". – the Tin Man Dec 07 '15 at 18:30
  • I tried to manipulate everything on that page @theTinMan but couldn't get it to work. To explain the problem better: One file (listfile) has a list of names in rows, eg. `Bob, Brian Smith, Ted, Jane Doe` . Another file (textfile) has random text. I need to check each word in the listfile like Bob, or "Brian Smith" against the random text in the textfile. Then the words that match are returned/printed/stored. The text file was huge but I parsed it down to about 200 lines so nothing major. The list is 50. If I could account for spaces in between words that would be a huge bonus. – boogiewonder Dec 07 '15 at 19:09
  • What does "couldn't get it to work" mean? Please don't describe the problem in a comment. Edit your question, and add the information there as if it was there originally, where it's expected to be. A description does little good; Code is worth 1,000 words and it lets us provide a fix that applies directly to your work. How big is huge? I work with files that are 2-3GB. – the Tin Man Dec 07 '15 at 19:16
  • I answered here and it's disappeared! Is someone deleting my comments? I'm only trying to explain myself. – boogiewonder Dec 07 '15 at 19:47

1 Answer

This assumes you want to print every occurrence of each phrase. I don't know your format, but assuming the phrases are in some iterable object, the following will work:

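text = 'hello man sup and hello'

phrases = ['hello man', 'sup']

# Print every occurrence of each phrase found in the text.
def print_string_occurrences(text, phrases)
  phrases.each do |phrase|
    # Regexp.escape keeps characters such as "." or "(" in a phrase literal.
    text.scan(/#{Regexp.escape(phrase)}/).each { |match| puts match }
  end
end

print_string_occurrences(text, phrases)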

output:

hello man
sup
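
The asker's later comment describes names such as `Bob` or `Brian Smith` that may be separated by extra whitespace in the text. A minimal sketch for that case follows. It lists each matching phrase once rather than printing every occurrence. The method name `matching_phrases` is hypothetical, and whole-word, case-insensitive matching is an assumption, since the question does not specify either.

```ruby
# Hypothetical helper: returns the phrases that occur somewhere in the text.
# Assumptions (not stated in the question): matching is case-insensitive and
# whole-word, so "foo" does not match "food".
def matching_phrases(text, phrases)
  phrases.map(&:strip).reject(&:empty?).select do |phrase|
    # Escape each word, then allow any run of whitespace (spaces, tabs,
    # newlines) between the words of a multi-word phrase.
    pattern = phrase.split.map { |word| Regexp.escape(word) }.join('\s+')
    text.match?(/\b#{pattern}\b/i)
  end
end

text = "Bob met Brian\n\tSmith for lunch, and the food was good."
phrases = ["Bob", "Brian Smith", "Ted", "Jane Doe", "foo"]

puts matching_phrases(text, phrases)
# prints:
# Bob
# Brian Smith
```

To run this against files, the text could be loaded with `File.read` and the list with `File.readlines(path, chomp: true)`, one phrase per line. The `\b` anchors only behave as expected when a phrase begins and ends with a word character, so phrases with leading or trailing punctuation would need a different boundary check.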
Pippo
  • Rather than try to answer a poorly defined question, it's better to ask questions to get a good idea what the real problem is, then write a detailed answer. – the Tin Man Dec 07 '15 at 17:58
  • It's a problem we see on SO very often, caused by the points system. People treat it like they're winning a video game, rather than providing answers to great questions. SO's goal is to be a cookbook of programming questions and answers, but if the questions are not good then the answers end up being broad swipes at answering the question. Answer when the question makes sense, otherwise ask for clarification. Of course others will try to answer without knowing, but it's the spot-on answers that help and gather points over time. – the Tin Man Dec 07 '15 at 18:43