0

I am trying to parse the following data file. The file is a snippet of a much larger original file that has the same structure.

0   0.0059354815313768  0.000109666861931809    4.67297178729149e-05    0.000160593629759828
1e-07   0.0059354815313768  0.000109666861931809    4.67297178729149e-05    0.000160593629759828    
1.20226443461741e-07    0.00593548153136993 0.000134002335569027    4.67297178728227e-05    0.000201020108334994    
1.31825673855641e-07    0.00593548153136543 0.000147957965791881    4.67297178727586e-05    0.000224203424726248    
1.44543977074593e-07    0.00593548153135997 0.000163260010030845    4.67297178726794e-05    0.000249623425870511    
1.58489319246111e-07    0.00593548153135335 0.000180038367935316    4.67297178725815e-05    0.000277495902647069
1.58489319fcdsdds-07    0.00593548153135335 0.000180038367935316    4.67297178725815e-05    0.000277495902647069

In the above data file it's a 2*2 matrix, but it can be an n*n matrix. The elements are separated by \t. In the case of a 2*2 matrix, each row has 5 elements (the first is the frequency, and the remaining four form two elements of two values each).

For example:

0   0.0059354815313768  0.000109666861931809    4.67297178729149e-05    0.000160593629759828

0 is frequency. 0.0059354815313768 0.000109666861931809 is element 1 (but they are two different values) and 4.67297178729149e-05 0.000160593629759828 is element 2 (similarly they are also two different values).

The matrices can be given for any number of frequencies. I do not know the frequencies in advance, but I do know the matrix size (i.e. that it's a 2*2 matrix) in advance.

The way I was implementing it was:

  1. Split the items by \t and add them sequentially to a list.
  2. Run an outer loop while there are still elements in the list.
  3. Run an inner loop up to matrix size + 1 (the extra one is the frequency). So in this example, 2*2+1.
  4. The 0th element in the inner loop is the frequency. Append the frequency to a separate list and remove it from the original list.
  5. Build a map (the key is the frequency and the value is the matrix), or a Python object.
  6. Keep removing the processed items from the original list.
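
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the original implementation: the function name `build_frequency_map`, the parameter `values_per_record`, and the shortened sample values are all hypothetical, and it assumes every record is one frequency followed by a fixed number of values that pair up two at a time. It walks the list by index instead of removing items from it.

```python
def build_frequency_map(tokens, values_per_record):
    """Group a flat token list into {frequency: [(v1, v2), ...]}.

    values_per_record is a hypothetical parameter: the number of values
    that follow each frequency (4 in the sample rows of the question).
    """
    record_len = values_per_record + 1  # +1 for the frequency itself
    if len(tokens) % record_len != 0:
        raise ValueError("token count is not a multiple of the record length")
    result = {}
    # Step through the list one record at a time; nothing is removed.
    for start in range(0, len(tokens), record_len):
        record = [float(tok) for tok in tokens[start:start + record_len]]
        freq, values = record[0], record[1:]
        # Pair consecutive values: (values[0], values[1]), (values[2], values[3]), ...
        result[freq] = list(zip(values[0::2], values[1::2]))
    return result


# Made-up, shortened sample in the same tab-separated layout.
sample = "0\t0.5\t0.25\t0.125\t1.0\n1e-07\t0.5\t0.75\t0.125\t2.0\n"
tokens = sample.split()  # split() with no argument splits on tabs and newlines
freq_map = build_frequency_map(tokens, values_per_record=4)
print(list(freq_map))  # [0.0, 1e-07]
print(freq_map[0.0])   # [(0.5, 0.25), (0.125, 1.0)]
```

Note that a dict keeps only one entry per key, so a repeated frequency would overwrite the earlier record; a list of (frequency, values) tuples avoids that if duplicates are possible.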

Below is my code to get the frequency:

if __name__=="__main__":
    with open("temp.txt", "r") as file:
        newline_break = ""
        list_test = []
        for readline in file:
            line_strip = readline.split('\t')
            for ll in line_strip:
                if ll != '' and ll != ' ':
                    list_test.append(ll.strip())
        freq = []
        length = len(list_test)
        while length > 0:
            freq.append(list_test[0])
            for i in range(0, 6, 1):
                #print('poping', i)
                if len(list_test) > 0:
                    list_test.pop()
            print('list 2 size', len(list_test))
            if len(list_test) > 0:
                print('list 2 item', list_test[0])
            length = len(list_test)
        print(len(list_test))
        print('Freq is: ',freq)

The code does remove the item, but it always prints "0".

Freq is:  ['0', '0', '0', '0', '0', '0', '0']
Ghoul Fool
  • There are only 9 items in each row. Removing the first 9 items removes everything. – Barmar Jun 24 '22 at 00:07
  • FYI, a simpler way to remove the first 9 items is `list[0:9] = []` – Barmar Jun 24 '22 at 00:07
  • You're removing the *last* 9 items, not the *first* 9. – Barmar Jun 24 '22 at 00:09
  • BTW, don't use `list` as a variable name. This is a built-in class name. – Barmar Jun 24 '22 at 00:09
  • But I am first appending the 0th item to another list. `freq.append(list[0])`. I get only 1e-07 even if I print freq. – user18994682 Jun 24 '22 at 00:11
  • The code never prints `freq`, I thought you were talking about `print('list 2 item', list[0])` – Barmar Jun 24 '22 at 00:13
  • If you're talking about `freq`, it doesn't matter what you're removing. – Barmar Jun 24 '22 at 00:15
  • Updated the code. – user18994682 Jun 24 '22 at 00:15
  • Oh, now I see what you're doing. First you're flattening the entire file into a single list, then you're trying to process it by every 9 items. Why are you flattening instead of keeping the data organized into rows? – Barmar Jun 24 '22 at 00:17
  • Because the original file has thousands of lines and the data is not properly organized. It's separated more like by tabs. In other words, one line can be split across multiple lines, but I know that one line ends after, say, 9 elements. I can explain further if needed. – user18994682 Jun 24 '22 at 00:20
  • The problem is that, after removing all of the unwanted elements from the "current row", the remaining element that you want is at the **end** of `list_test`, not the beginning. Each time through the loop, `freq.append(list_test[0])` appends the same value: the very first thing that was ever put into `list_test` (because nothing in the code ever displaces it). – Karl Knechtel Jun 24 '22 at 00:30
  • "Because the original file has thousands of lines and the data is not properly organized. It's separated more like by tabs. In other words, one line can be split across multiple lines, but I know that one line ends after, say, 9 elements." This doesn't answer the question that was asked of you, and isn't remotely a reason to try to take the approach that you are taking. It doesn't matter how many lines the file has: you want *only the first* item *from each* line - so, *while you are reading the lines*, just *find out what the first item is*, and *only put that* into `list_test`. – Karl Knechtel Jun 24 '22 at 00:32
  • "It's separated more like by tabs. In other words, one line can be split across multiple lines" This doesn't make sense. If the values for a given "row" are separated by tabs, then they *necessarily are on one line*. When you read "a line" from a file, it reads *as far as it has to* until it finds an *actual newline character*. It *does not matter* if that text would wrap in a text editor. – Karl Knechtel Jun 24 '22 at 00:34
  • I just updated the data file. This data file can have any number of frequency points. In this file `0`, `1e-07`, `1.20226443461741e-07`, `1.31825673855641e-07`, `1.44543977074593e-07` are the frequency points. After the frequency points comes a 10*10 matrix. The size of the matrix can be n. 1 item = 2 values, which is why a 10*10 matrix has 200 values. I am trying to parse this file. I was splitting on '\t' and adding to a list, then running the outer loop until the list is empty, and running the inner loop for n*n+1 elements (+1 is for the frequency). I can explain further if needed. – user18994682 Jun 24 '22 at 00:45
  • Please read [ask] and [mre] and try to create a small example of the data that demonstrates the problem. For example, we don't actually care (yet) that the values are numbers, right? If we just use single-letter examples for the individual values, is it possible to reproduce the problem that way? What if there are fewer items per line (but we still show the extra newlines in the file, and adjust the `9` in the code accordingly)? More to the point, try to explain, step by step, *how the data is structured*, *how it needs to be processed*, and *exactly what the result should be*. – Karl Knechtel Jun 24 '22 at 00:58
  • To be clear: the overall goal is to *extract every nth item* from the file, where "items" are separated by whitespace, treating newlines as unimportant. Correct? – Karl Knechtel Jun 24 '22 at 00:59
  • Does https://stackoverflow.com/questions/1403674 answer your question? – Karl Knechtel Jun 24 '22 at 01:01
  • Correct, but I do want to separate the frequencies from the matrixes. I've updated the question also. – user18994682 Jun 24 '22 at 01:19

2 Answers

3

I think you're making this much more complicated than it needs to be. This works:

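with open('x.txt') as f:  # the file is closed automatically
    for line in f:
        parts = line.split()  # splits on any whitespace, including tabs
        if parts:  # skip blank lines
            print(float(parts[0]))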

Output:

1e-07
1e-08
1e-09
1e-10
1e-11
1e-12
1e-13
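
If some rows may hold a malformed first field (the sample in the question ends with one, `1.58489319fcdsdds-07`), `float()` raises `ValueError` on it. A hedged sketch of one way to cope is below. Collecting the bad rows rather than stopping is an assumption about what is wanted, and the name `read_frequencies` and the shortened demo rows are made up for illustration.

```python
import io


def read_frequencies(lines):
    """Return (frequencies, skipped) from an iterable of text lines."""
    frequencies = []
    skipped = []  # (line number, offending token) for rows that failed to parse
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        try:
            frequencies.append(float(parts[0]))
        except ValueError:
            # Assumption: record the bad row and carry on instead of aborting.
            skipped.append((lineno, parts[0]))
    return frequencies, skipped


# io.StringIO stands in for an open file; the rows are shortened, made-up data.
demo = io.StringIO("0\t0.5\t0.25\n\n1e-07\t0.5\t0.75\n1.58fcdsdds-07\t0.5\t0.75\n")
freqs, bad = read_frequencies(demo)
print(freqs)  # [0.0, 1e-07]
print(bad)    # [(4, '1.58fcdsdds-07')]
```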
Tim Roberts
2

You're making this more complicated than it needs to be. If you want every 9th element of the list, use a slice with a step size of 9.

freq = list_test[0::9]
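
As a small illustration with placeholder values (single letters instead of numbers, and a hypothetical `record_len` of 3 rather than 9), the same slice can be checked by hand. The last part shows why the loop in the question keeps appending the first item: `pop()` with no argument removes from the end of the list.

```python
# Hypothetical flat list: 3 items per record (1 frequency + 2 values).
record_len = 3
list_test = ['f0', 'a', 'b', 'f1', 'c', 'd', 'f2', 'e', 'f']

# Every record_len-th item, starting at index 0, is a frequency.
freq = list_test[0::record_len]
print(freq)  # ['f0', 'f1', 'f2']

# The remaining items of each record, kept grouped per frequency.
values = [list_test[i + 1:i + record_len]
          for i in range(0, len(list_test), record_len)]
print(values)  # [['a', 'b'], ['c', 'd'], ['e', 'f']]

# For comparison: pop() with no argument removes from the END of the list,
# so index 0 is unchanged no matter how many times it is called.
remaining = list(list_test)
remaining.pop()
remaining.pop()
print(remaining[0])  # f0
```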
Barmar