I am writing code that reads a large text file line by line and finds the line that starts with UNIQUE-ID (there are many of them in the file) and comes right before a certain line (in this example, the one that starts with 'REACTION-LAYOUT -' and in which the 5th element of the string is OLEANDOMYCIN). The code is the following:

data2 = open('pathways.dat', 'r', errors = 'ignore')

pathways = data2.readlines()

PWY_ID = []
line_cont = []
L_PRMR = [] #Left primary
car = []

#i is the line number (first element of enumerate), 
#while line is the line content (2nd elem of enumerate)

for i,line in enumerate(pathways):
    if 'UNIQUE-ID' in line:
        line_cont = line
        PWY_ID_line = line_cont.rstrip()
        PWY_ID_line = PWY_ID_line.split(' ')
        PWY_ID.append(PWY_ID_line[2])
    elif 'REACTION-LAYOUT -' in line:
        L_PWY = line.rstrip()
        L_PWY = L_PWY.split(' ')
        L_PRMR.append(L_PWY[4])
    elif 'OLEANDOMYCIN' in line:
        car.append(PWY_ID)
print(car)

However, the output is instead every ID collected in PWY_ID (the result of the first if branch), as if the rest of the code were being ignored. Can anybody help?

Edit


Below is a sample of my data (there are roughly 1,000 similar "pages" in my text file):

//
UNIQUE-ID - PWY-741
.
.
.
.
PREDECESSORS - (RXN-663 RXN-662)
REACTION-LAYOUT - (RXN-663 (:LEFT-PRIMARIES CPD-1003) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1004))
REACTION-LAYOUT - (RXN-662 (:LEFT-PRIMARIES CPD-1002) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1003))
REACTION-LAYOUT - (RXN-661 (:LEFT-PRIMARIES CPD-1001) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1002))
REACTION-LIST - RXN-663
REACTION-LIST - RXN-662
REACTION-LIST - RXN-661
SPECIES - TAX-351746
SPECIES - TAX-644631
SPECIES - ORG-6335
SUPER-PATHWAYS - PWY-5266
TAXONOMIC-RANGE - TAX-1224
//
Andy K
StudentOIST
  • Can you post a few lines from your text file? – anupsabraham Oct 20 '17 at 08:24
  • Can you give some example data – Matt Oct 20 '17 at 08:25
  • Not sure if I understand the question right; are you looking for one specific line where all three conditions are true? So a line which has 'UNIQUE-ID', 'REACTION-LAYOUT -', AND 'OLEANDOMYCIN'? – Lukas Ansteeg Oct 20 '17 at 08:25
  • A few problems. You probably don't want lists for `PWY_ID` and `L_PRMR` if you're appending to `car`. You're not checking that REACTION-LAYOUT came immediately after the UNIQUE-ID and then you're not checking for OLEANDOMYCIN in the line that matched REACTION-LAYOUT. – AndrewS Oct 20 '17 at 08:34
  • Sorry, I updated the question with an example from my data – StudentOIST Oct 20 '17 at 12:16
  • @LukasAnsteeg no, I am looking for the line that contains both 'REACTION-LAYOUT' and 'OLEANDOMYCIN', and then I want to find the UNIQUE-ID line that precedes it – StudentOIST Oct 21 '17 at 03:23

2 Answers

1

I think it would have been helpful if you'd posted some examples of data. But an approximation to what you're looking for is:

with open('pathways.dat', 'r', errors='ignore') as infile:
    i = infile.read().find(string_to_search)
    infile.seek(i + number_of_chars_to_read)

I hope this piece of code will help you focus your script on this line.
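As a hedged, runnable reading of the idea above: `seek` only moves the file position and does not return any text, so this sketch slices the string that was read instead. The helper name `find_snippet` and the three-line sample are illustrative assumptions, not part of the original answer.

```python
def find_snippet(text, needle, length):
    """Return `length` characters of `text` starting at the first `needle`, or None."""
    i = text.find(needle)  # str.find returns -1 when the needle is absent
    if i == -1:
        return None
    return text[i:i + length]


# Hypothetical stand-in for the file contents (infile.read() would supply this).
sample = "RIO 789\nCAR 943\nBAG 134\n"
snippet = find_snippet(sample, "CAR", 7)
print(snippet)
```

With a real file, `text` would come from `infile.read()` inside the `with` block; checking for `-1` avoids slicing from the wrong place when the string is missing.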

  • I don't understand this code, could you please explain what it does? – StudentOIST Oct 21 '17 at 03:25
  • The first line opens the file in read mode. The second line reads the WHOLE file and searches it for the string passed as a parameter, so the variable i stores the index of the first character where the string was found. With this index, and knowing the number of characters to read, we recover the text in the third line. For example, suppose that the input file has the following lines: RIO 789 CAR 943 BAG 134 Continue in next comment! – Daniel Garrido Oct 25 '17 at 20:18
  • In the second line, infile.read() reads the file completely, that is, all three lines in this case, and then we search that text for the string we pass in; suppose we look for CAR. This stores in the variable i the index where the first occurrence of CAR begins. If number_of_chars_to_read is 7, for example, then what the third line returns is "CAR 943". I hope this mini example has guided and helped you with your problem. If you need any other clarification, let me know and I'll help you. – Daniel Garrido Oct 25 '17 at 20:18
  • PS: My original answer contained some blank spaces in the code that should not be there. I just edited it. – Daniel Garrido Oct 25 '17 at 20:18
0

print(car) is printing out the list of all the IDs added by PWY_ID.append(PWY_ID_line[2]) in the first if, since you are appending the whole PWY_ID list to car when you do car.append(PWY_ID). So, if you want to print out the list of lines with OLEANDOMYCIN, you might want to just do car.append(line).
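If the goal is the UNIQUE-ID of the record rather than the matching line itself, one possible sketch is to remember the most recent UNIQUE-ID while scanning and record it when a REACTION-LAYOUT line names the compound. The function name `pathways_with_compound`, the second record `PWY-EXAMPLE`, and the assumption that the compound is the 5th whitespace-separated field (the first left primary) are all illustrative, not taken from the question's code.

```python
def pathways_with_compound(lines, compound):
    """Collect the UNIQUE-ID of every record with a REACTION-LAYOUT line naming `compound`."""
    matches = []
    current_id = None  # most recent UNIQUE-ID seen above the current line
    for line in lines:
        fields = line.split()
        if line.startswith('UNIQUE-ID') and len(fields) >= 3:
            current_id = fields[2]
        elif line.startswith('REACTION-LAYOUT -') and len(fields) > 4:
            # fields[4] is the first left primary, e.g. 'CPD-1003)'; drop the bracket
            left_primary = fields[4].rstrip(')')
            if left_primary == compound and current_id not in matches:
                matches.append(current_id)
    return matches


# Shortened sample; the second record is made up for illustration.
sample = """//
UNIQUE-ID - PWY-741
REACTION-LAYOUT - (RXN-663 (:LEFT-PRIMARIES CPD-1003) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1004))
REACTION-LAYOUT - (RXN-662 (:LEFT-PRIMARIES CPD-1002) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-1003))
//
UNIQUE-ID - PWY-EXAMPLE
REACTION-LAYOUT - (RXN-1 (:LEFT-PRIMARIES CPD-9) (:DIRECTION :L2R) (:RIGHT-PRIMARIES CPD-10))
""".splitlines()

print(pathways_with_compound(sample, 'CPD-1002'))
```

For the real file, passing the open file object (or the list from `readlines()`) as `lines` and 'OLEANDOMYCIN' as `compound` should work the same way, as long as the layout lines follow the format shown in the question.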

Matt