
I have a text file which has 5 columns and more than 2 million lines as follows,

134   1   2   6    3.45388e-06
135   1   2   7    3.04626e-06
136   1   2   8    4.69364e-06
137   1   2   9    4.21627e-06
138   1   2   10   2.38822e-06
139   1   2   11   1.91216e-06
...
140   1   3   2    5.23876e-06
141   1   3   3    2.83415e-06
142   1   3   7    2.32097e-06
143   1   3   9    6.26283e-06
144   1   3   16   4.22556e-06
...  
145   2   1   2   3.67182e-06
146   2   1   4   4.61481e-06
147   2   1   6   1.1703e-06
...
148   2   2   7   4.61242e-06
149   2   2   21   1.84259e-06
150   2   2   22   4.31435e-06
...
151   2   3   23   4.34518e-06
152   2   3   24   3.76576e-06
153   2   3   25   2.61061e-06
...
154   3   1   2   4.07107e-06
155   3   1   7   4.83971e-06
156   3   1   8   3.43919e-06
...
157   3   2   29   6.27991e-06
158   3   2   30   7.44213e-06
159   3   2   31   9.56985e-06
...
160   3   3   32   1.38377e-05
161   3   3   33   1.62724e-05
162   3   3   34   9.85653e-06
...

The second column is the layer number; within each layer I have a matrix, where columns 3 and 4 are the row and column indices in that layer and the last column is my data. First, I want a condition that if the second column equals a given number (for example 3), all lines meeting this condition are written to another file. I am doing that with this code:

    import numpy as np
    from itertools import islice

    with open(r'C:\Python\projects\A.txt') as f, open(r'D:\Python\New_projects\NEW.txt', 'w') as f2:
        # read the data from layer 120 and write these lines to NEW.txt
        for line in islice(f, 393084, 409467):
            f2.write(line)
    a = np.loadtxt(r'D:\Python\New_projects\NEW.txt')

But the problem is that for each layer I have to go into the file, find the first and last line numbers of that layer, and put them into `islice`, which takes very long because the file is so big. What can I do so that I can just say "for column[2] == 4, save the lines to NEW.txt"?
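One way to avoid hunting for line ranges is to test the layer column on every line while streaming through the file. A minimal sketch (the function name and file paths are placeholders; I assume the columns are whitespace-separated with the layer number in the second field):

```python
def filter_layer(in_path, out_path, layer):
    """Copy to out_path every line whose second column equals `layer`."""
    with open(in_path) as f, open(out_path, 'w') as f2:
        for line in f:
            parts = line.split()
            # parts[1] is the second column (the layer number)
            if len(parts) == 5 and parts[1] == str(layer):
                f2.write(line)
```

This makes a single pass over the file and never loads it into memory, so the 2-million-line size is not a problem; call it as `filter_layer('A.txt', 'NEW.txt', 3)` and then `np.loadtxt('NEW.txt')` as before.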

***** And after that I need another condition: for column[2] == 4, if 20 <= column[3] <= 50 and 80 <= column[4] <= 120, save these lines in another file.
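Both conditions can be checked in the same streaming pass. A sketch under stated assumptions (the function name and paths are made up, and I read column[2]/column[3]/column[4] as the 2nd/3rd/4th whitespace-separated fields):

```python
def filter_layer_and_ranges(in_path, out_path):
    """Keep lines with layer == 4, 20 <= row <= 50 and 80 <= col <= 120."""
    with open(in_path) as f, open(out_path, 'w') as f2:
        for line in f:
            c = line.split()
            if (len(c) == 5 and int(c[1]) == 4
                    and 20 <= int(c[2]) <= 50
                    and 80 <= int(c[3]) <= 120):
                f2.write(line)
```

Converting the fields with `int()` matters here: a string comparison like `'100' < '20'` would give the wrong ordering.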

poya_89

2 Answers


Use `grep` and create another file with the filtered data; `grep` is fast enough to work on large files.

Sagan Pariyar

As mentioned, `grep` could be a fast non-Python solution, using a regex, for example:

grep -E '^[0-9]+\s+[0-9]+\s+2\s' testgrep.txt > output.txt

which saves to the file output.txt all the lines with a 2 in the third column. See https://regex101.com/r/j5EfEE/1 for the details about the pattern.

xdze2
  • Thanks, it sounds good and I think I can use that, but I also need to do it in Python, because this is part of my code and I need to use this file afterwards in Python. And I don't even know whether it is possible to express a condition like (for 20 <= column[3] <= 50, print to output.txt) using grep or not? – poya_89 Aug 15 '18 at 11:12
  • I edited the question to make it more to the point... One way to do this in Python is to use the `csv` module, for example in a similar way to the answer [here](https://stackoverflow.com/a/51755998/8069403), i.e. reading the file line by line and writing the filtered lines to another file. I am not sure it would be efficient for a large file, but the best is to make it work first with a relatively small file... Could you add to your question an attempt using `csv`? – xdze2 Aug 16 '18 at 20:27
  • Thanks, but I solved my problem using pandas; it was very simple and you can filter the data in each column... – poya_89 Aug 21 '18 at 17:16
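For reference, the pandas route mentioned in the last comment could look roughly like this minimal sketch (the column names, function name, and thresholds mirror the question but are otherwise assumptions, not the asker's actual code):

```python
import pandas as pd

def load_and_filter(path):
    # The file has no header; the names here are made up for illustration.
    df = pd.read_csv(path, sep=r'\s+', header=None,
                     names=['idx', 'layer', 'row', 'col', 'value'])
    # First condition: every line of a given layer (here, layer 3).
    layer3 = df[df['layer'] == 3]
    # Second condition: layer 4 restricted to row/column ranges
    # (pandas `between` is inclusive on both ends by default).
    subset = df[(df['layer'] == 4)
                & df['row'].between(20, 50)
                & df['col'].between(80, 120)]
    return layer3, subset
```

Each filtered frame can then be written out with `to_csv(..., sep=' ', header=False, index=False)` or used directly in place of the `np.loadtxt` result.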