
I'm quite new to Python and mostly used to Java. I'm trying to parse a text file output by Praat; it always has the same format and generally looks like this (the real file has a few more features):

-- Voice report for 53. Sound T1_1001501_vowels --
Date: Tue Aug  7 12:15:41 2018

Time range of SELECTION
    From 0 to 0.696562 seconds (duration: 0.696562 seconds)
Pitch:
   Median pitch: 212.598 Hz
   Mean pitch: 211.571 Hz
   Standard deviation: 23.891 Hz
   Minimum pitch: 171.685 Hz
   Maximum pitch: 265.678 Hz
Pulses:
   Number of pulses: 126
   Number of periods: 113
   Mean period: 4.751119E-3 seconds
   Standard deviation of period: 0.539182E-3 seconds
Voicing:
   Fraction of locally unvoiced frames: 5.970%   (12 / 201)
   Number of voice breaks: 1
   Degree of voice breaks: 2.692%   (0.018751 seconds / 0.696562 seconds)

I would like to output something that looks like this:

0.696562,212.598,211.571,23.891,171.685,265.678,126,113,4.751119E-3,0.539182E-3,5.970,1,2.692

So essentially, from each line I want just the number between the colon and the whitespace that follows it, printed as a single string separated by commas. I know this might be a stupid question, but I just can't figure it out in Python; any help would be much appreciated!
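A minimal sketch of that rule using one regular expression over the whole report, instead of slicing line by line. It assumes every wanted value is a number that directly follows a colon plus at least one space or tab, which holds for the sample above; `SAMPLE_REPORT`, `VALUE_AFTER_COLON` and `report_to_csv_line` are illustrative names, not anything Praat provides.

```python
import re

# The sample report from the question, pasted in so the sketch runs on its own.
SAMPLE_REPORT = """\
-- Voice report for 53. Sound T1_1001501_vowels --
Date: Tue Aug  7 12:15:41 2018

Time range of SELECTION
    From 0 to 0.696562 seconds (duration: 0.696562 seconds)
Pitch:
   Median pitch: 212.598 Hz
   Mean pitch: 211.571 Hz
   Standard deviation: 23.891 Hz
   Minimum pitch: 171.685 Hz
   Maximum pitch: 265.678 Hz
Pulses:
   Number of pulses: 126
   Number of periods: 113
   Mean period: 4.751119E-3 seconds
   Standard deviation of period: 0.539182E-3 seconds
Voicing:
   Fraction of locally unvoiced frames: 5.970%   (12 / 201)
   Number of voice breaks: 1
   Degree of voice breaks: 2.692%   (0.018751 seconds / 0.696562 seconds)
"""

# A colon, at least one space or tab, then a number with an optional sign,
# decimal part and exponent (e.g. 4.751119E-3). A trailing "%" is not part
# of the match, "Date: Tue" fails because a letter follows the space, and
# "12:15:41" is skipped because no whitespace follows those colons.
VALUE_AFTER_COLON = re.compile(r":[ \t]+([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def report_to_csv_line(text):
    """Return the number after each 'label: ' as one comma-separated string."""
    # findall returns only the captured group, i.e. the number text as written
    return ",".join(VALUE_AFTER_COLON.findall(text))


print(report_to_csv_line(SAMPLE_REPORT))
# 0.696562,212.598,211.571,23.891,171.685,265.678,126,113,4.751119E-3,0.539182E-3,5.970,1,2.692
```

Because the values are kept as the original strings, `4.751119E-3` and `5.970` come out exactly as they appear in the report; the numbers in parentheses are ignored since no colon precedes them.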

3 Answers


Okay, here is something simple that you will need to tweak a little to make it work for you.

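import re

# Matches integers, decimals and scientific notation such as 4.751119E-3
# (a plain r'\d+' would split 212.598 into '212' and '598').
number_re = re.compile(r'\d+(?:\.\d+)?(?:[eE][-+]?\d+)?')

numbers_list = []
with open("file.txt", "r") as f:
    for line in f:
        # every number on the line, including the ones in parentheses
        numbers_list.append(number_re.findall(line.strip()))
print(numbers_list)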

where file.txt is your file.

Nash
  • I think the problem with this approach is that some lines contain numbers I don't want, for example the ones in parentheses under Voicing, while I do want to keep the "E" in the scientific notation of some numbers. That's why I've been trying to get specifically the substring between ":" and the whitespace that follows it. – ling-analysis Aug 07 '18 at 19:28
  • You could run another regex before the one that finds the numbers, or use an if statement to check whether ":" is in the line, for example. As I said, this will get you all the numbers; it's then up to you how you want to tweak it. – Nash Aug 07 '18 at 20:39
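A minimal sketch of Nash's suggestion: skip lines without a colon, then match a number only at the start of the text after the first colon. The function name `numbers_after_colons` and the `SAMPLE_LINES` list are hypothetical, chosen for illustration.

```python
import re

# A number right at the start of the text after the colon (leading spaces
# allowed): optional sign, digits, optional decimals, optional exponent.
NUMBER_AT_START = re.compile(r"\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def numbers_after_colons(lines):
    values = []
    for line in lines:
        if ":" not in line:                 # the "if" check: ignore lines without a colon
            continue
        tail = line.partition(":")[2]       # text after the first colon only
        match = NUMBER_AT_START.match(tail) # match() is anchored at the start of tail
        if match:                           # "Pitch:" and "Date: Tue ..." give no match
            values.append(match.group(1))
    return values


SAMPLE_LINES = [
    "   Median pitch: 212.598 Hz",
    "   Fraction of locally unvoiced frames: 5.970%   (12 / 201)",
    "Pitch:",
    "Date: Tue Aug  7 12:15:41 2018",
]
print(numbers_after_colons(SAMPLE_LINES))  # ['212.598', '5.970']
```

With the real report, passing the open file object works too, e.g. `",".join(numbers_after_colons(f))` inside `with open("file.txt") as f:`, because the regex also tolerates the trailing newline on each line.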

Maybe:

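# assumes the report is saved as file.txt
with open("file.txt") as f:
    text = f.read()

for line in text.splitlines():
    line = line.strip()
    head, sepa, tail = line.partition(":")
    if sepa:
        parts = tail.split(maxsplit=1)
        # keep tokens made only of digits, '.', exponent letters, signs and '%'
        if parts and all(ch.isdigit() or ch in ".eE%-+" for ch in parts[0]):
            num = parts[0].replace("%", "")
            try:
                print(float(num))
            except ValueError:
                print("invalid number:", num)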

Out:

0.696562
212.598
211.571
23.891
171.685
265.678
126.0
113.0
0.004751119
0.000539182
5.97
1.0
2.692
kantal
  • @ling-analysis There are two constructs for a Python newcomer to learn here: 1) the generator expression I used inside all(); 2) the try...except clause. (I have changed the bare 'except' to 'except ValueError'.) – kantal Aug 07 '18 at 19:58
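kantal's loop prints floats, so `4.751119E-3` comes out as `0.004751119` and `126` as `126.0`. To get the exact comma-separated line from the question, one option is to keep the same character filter and `try`/`except`, but use `float()` only as a validity check and collect the original strings. `numeric_token` and `report_values` are hypothetical helper names.

```python
def numeric_token(token):
    """Return token without a trailing '%' if it is a valid number, else None."""
    # kantal's character filter: rejects words like "Tue", and also "nan"/"inf",
    # which float() on its own would accept
    if not token or not all(ch.isdigit() or ch in ".eE%-+" for ch in token):
        return None
    candidate = token.rstrip("%")
    try:
        float(candidate)  # validation only; the original text is what we keep
    except ValueError:
        return None
    return candidate


def report_values(text):
    values = []
    for line in text.splitlines():
        head, sepa, tail = line.strip().partition(":")
        if sepa:
            parts = tail.split(maxsplit=1)  # first token after the colon
            if parts:
                value = numeric_token(parts[0])
                if value is not None:
                    values.append(value)
    return values


sample = """\
    From 0 to 0.696562 seconds (duration: 0.696562 seconds)
Pitch:
   Number of pulses: 126
   Mean period: 4.751119E-3 seconds
   Fraction of locally unvoiced frames: 5.970%   (12 / 201)
"""
print(",".join(report_values(sample)))  # 0.696562,126,4.751119E-3,5.970
```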

Thank you for the help, everyone! I actually came up with this solution:

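import csv
import os

input_path = 't2_5.txt'
input_name = os.path.splitext(input_path)[0]  # 't2_5'

def parse(filepath):
    data = []
    with open(filepath, 'r') as file:
        # skip the header: title line, date line and blank line
        for _ in range(3):
            file.readline()
        for line in file:
            line = line.rstrip('\n')
            # value lines are indented; section headers such as "Pitch:" are not
            if line.startswith(' '):
                start = line.find(':') + 2
                end = line.find(' ', start)
                if end == -1:  # the value is the last thing on the line
                    end = len(line)
                if line[end - 1] == '%':
                    end -= 1
                data.append(line[start:end])
    # Python 3: open the CSV in text mode with newline='' ('wb' only works in Python 2)
    with open(input_name + '_output.csv', 'w', newline='') as csvfile:
        csv.writer(csvfile).writerow(data)

parse(input_path)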