0

I'm trying to read a file (d:\mywork\list.txt) line by line and search if that string occurs in any of the files (one by one) in a particular directory (d:\new_work).

If present in any of the files (may be one or more) I want to delete the string (car\yrui3,) from the respective files and save the respective file.

list.txt:

car\yrui3,
dom\09iuo,
id\byt65_d,
rfc\some_one,
desk\aa_tyt_99,
.........
.........

Directory having multiple files: d:\new_work:

Rollcar-access.txt
Mycar-access.txt
Newcar-access.txt
.......
......

My code:

value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
    line.chomp!
    print "For the string: #{line}"
    Dir.glob("D:/new_work/*-access.txt") do |fn|
      print "checking files:#{fn}\n"
      text = File.read(fn)
      replace = text.gsub(line.strip, "")
      File.open(fn, "w") { |file| file.puts replace }
    end
 end

The issue is, values are not getting deleted as expected. Also, text is empty when I tried to print the value.

the Tin Man
  • 158,662
  • 42
  • 215
  • 303
voltas
  • 553
  • 7
  • 24
  • 1
    We're not interested in whether you're new or experienced, we want well researched, and well-asked, concise, questions. I'd recommend reading "[How To Ask Questions The Smart Way](http://catb.org/esr/faqs/smart-questions.html)" as it explains working with communities such as SO. – the Tin Man May 11 '17 at 17:54
  • Your task is somewhat of an "[XY Problem](https://meta.stackexchange.com/q/66377/153968)". You're asking about your implementation, when you probably should have asked about solving the task first. I'd recommend rethinking the problem. What do the files represent? Why is the data scattered among multiple files? Consider using a database to store the content of the files, where you can rapidly search and delete. Even SQLite could make quick work of this, and do it very easily using an ORM like Sequel, Datamapper or Active Record. – the Tin Man May 11 '17 at 18:01

2 Answers2

2

There are a number of things wrong with your code, and you're not safely handling your file changes.

Meditate on this untested code:

ACCESS_FILES = Dir.glob("D:/new_work/*-access.txt")

File.foreach('D:/mywork/list.txt') do |target|
  target = target.strip.sub(/,$/, '')

  ACCESS_FILES.each do |filename|
    new_filename = "#{filename}.new"
    old_filename = "#{filename}.old"

    File.open(new_filename, 'w') do |fileout|
      File.foreach(filename) do |line_in|
        fileout.puts line_in unless line_in[target]
      end
    end

    File.rename(filename, old_filename)
    File.rename(new_filename, filename)
    File.delete(old_filename)
  end
end
  • In your code you use:

    File.open('D:\\mywork\\list.txt').read
    

    instead, a shorter, and more concise and clear way would be to use:

    File.read('D:/mywork/list.txt')
    

    Ruby will automatically adjust the pathname separators based on the OS so always use forward slashes for readability. From the IO documentation:

Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb".

The problem using read is it isn't scalable. Imagine if you were doing this in a long term production system and your input file had grown into the TB range. You'd halt the processing on your system until the file could be read. Don't do that.

Instead use foreach to read line-by-line. See "Why is "slurping" a file not a good practice?". That'll remove the need for

    value.gsub!(/\r\n?/, "\n")
    value.each_line do |line|
      line.chomp!
  • While

    Dir.glob("D:/new_work/*-access.txt") do |fn|
    

    is fine, its placement isn't. You're doing it for every line processed in your file being read, wasting CPU. Read it first and store the value, then iterate over that value repeatedly.

  • Again,

    text = File.read(fn)
    

    has scalability issues. Using foreach is a better solution. Again.

  • Replacing the text using gsub is fast, but it doesn't outweigh the potential problems of scalability when line-by-line IO is just as fast and sidesteps the issue completely:

    replace = text.gsub(line.strip, "")
    
  • Opening and writing to the same file as you were reading is an accident waiting to happen in a production environment:

    File.open(fn, "w") { |file| file.puts replace }
    

    A better practice is to write to a separate, new, file, rename the old file to something safe, then rename the new file to the old file's name. This preserves the old file in case the code or machine crashes mid-save. Then, when that's finished it's safe to remove the old file. See "How to search file text for a pattern and replace it with a given value" for more information.

A final recommendation is to strip all the trailing commas from your input file. They're not accomplishing anything and are only making you do extra work to process the file.

Community
  • 1
  • 1
the Tin Man
  • 158,662
  • 42
  • 215
  • 303
1

I just ran your code and it works as expected on my machine. My best guess is that you're not taking the commas at the end of each line in list.txt into account. Try removing them with an extra chomp!:

value=File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
    line.chomp!
    line.chomp!(",")
    print "For the string: #{line}"
    Dir.glob("D:/new_work/*-access.txt") do |fn|
      print "checking files:#{fn}\n"
      text = File.read(fn)
      replace = text.gsub(line.strip, "")
      File.open(fn, "w") { |file| file.puts replace }
    end
 end

By the way, you shouldn't need this line: value.gsub!(/\r\n?/, "\n") since you're chomping all the newlines away anyway, and chomp can recognize \r\n by default.

eiko
  • 5,110
  • 6
  • 17
  • 35