This is my file text:

Covid-19 Data
Country / Number of infections / Number of Death
USA  124.356  2.236
Netherlands  10.866  771
Georgia  90  NA
Germany  58.247  455

I wrote some code to calculate the ratio of deaths to infections; however, it does not work, because some of the values can't be converted to floats.

f=open("myfile.txt","w+")

x="USA" + " " + " " + "124.356" + " " + " " + "2.236"
y="Netherlands" + " " + " " + "10.866" + " " + " " + "771"
z="Georgia" + " " + " " + "90" + " " + " " + "NA"
w="Germany" + " " + " " + "58.247" + " " + " " + "455"

f.write("Covid-19 Data" + "\n" + "Country" + " " + "/" + " " + "Number of infections" + " "  + "/" + " " + "Number of Death" + "\n")
f.write(x + "\n")
f.write(y + "\n")
f.write(z + "\n")
f.write(w)

f.close()

with open("myfile.txt", "r") as file:


        try:
            for i in file:
                t = i.split()
                    result=float(t[-1])/float(t[-2])
                    print(results)
        except:
            print("fail")
        file.close()

Does anyone have an idea how to solve this problem?

Rafael
Clem-Clem123
  • Maybe it just chokes on the first line of your file, which has the column name strings? Try leaving that line out of the loop. – Arne Apr 01 '20 at 17:42
  • That is a difficult file format to parse. Sometimes it has all 3 data points, sometimes not. It's separated by spaces, but what about countries with spaces in their names? Maybe the best answer is to get a better dataset, like a CSV! Johns Hopkins is refreshing data on GitHub daily: https://github.com/CSSEGISandData/COVID-19.git. And there are other sources. – tdelaney Apr 01 '20 at 17:45
  • Are you stuck with this file format? I notice in your example that you generate it yourself. It would be much easier if the delimiter wasn't `" "`. – tdelaney Apr 01 '20 at 18:03
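
To illustrate the delimiter suggestion in the comments above, here is a minimal sketch that stores the same rows with Python's csv module instead of space-separated text. It assumes you control how the file is written; the column names and the helper names write_rows and read_rows are made up for this example, and the thousands separators are dropped when the rows are defined.

```python
import csv

# Same rows as in the question; thousands separators dropped, "NA" kept as-is.
rows = [
    ("USA", "124356", "2236"),
    ("Netherlands", "10866", "771"),
    ("Georgia", "90", "NA"),
    ("Germany", "58247", "455"),
]


def write_rows(path, rows):
    """Write a header plus the given rows as comma-separated values."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["country", "infections", "deaths"])
        writer.writerows(rows)


def read_rows(path):
    """Return every data row as a dict keyed by the header names."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

With a real delimiter, a country name that contains spaces stays in one field, and each value can be looked up by column name rather than by position.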

4 Answers

You can do the following:

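with open("myfile.txt", "r") as file:
    for i in file:
        t = i.split()

        try:
            # Last field is deaths, second-to-last is infections
            result = float(t[-1]) / float(t[-2])
            print(result)
        except ValueError:
            # Header lines and "NA" values cannot be converted; skip them
            pass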

You don't know ahead of time whether the values you are about to divide are numeric, so wrapping the operation in a try/except block should solve your problem.

If you want something a bit "cleaner", you can do the following:

def is_float(value):
    # Return True if value can be converted to a float, False otherwise
    try:
        float(value)
    except ValueError:
        return False

    return True

with open("myfile.txt", "r") as file:
    for i in file:
        t = i.split()
        # Only divide when both fields are numeric
        if is_float(t[-1]) and is_float(t[-2]):
            result = float(t[-1]) / float(t[-2])
            print(result)

The idea is the same, however.
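
One caveat with both versions: float("124.356") is read as 124.356, not 124356, so the ratio is off whenever only one of the two numbers contains a thousands separator (as with the Netherlands). A minimal sketch of one way around this, assuming "." is only ever a thousands separator in this file and never a decimal point; parse_count and death_ratio are hypothetical helper names:

```python
def parse_count(text):
    """Return the count as an int, or None if the field is not a number.

    Assumes "." is only ever a thousands separator (as in "124.356").
    """
    digits = text.replace(".", "")
    # isdecimal() is False for "NA", empty strings and header words
    return int(digits) if digits.isdecimal() else None


def death_ratio(line):
    """Return deaths / infections for one data line, or None if unavailable."""
    fields = line.split()
    if len(fields) < 3:
        return None
    infections = parse_count(fields[-2])
    deaths = parse_count(fields[-1])
    # Missing values and zero infections give no meaningful ratio
    if infections is None or deaths is None or infections == 0:
        return None
    return deaths / infections
```

Calling death_ratio on each line of the file and skipping the None results would replace the float conversions above.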

Rafael

I used the same file that you showed in your example. I wrote this snippet; hopefully it helps:

with open("test.txt","r") as reader:
    lines = reader.readlines()

for line in lines[2:]:
    line = line.replace(".","") # Remove points to have the full value
    country, number_infections, number_deaths = line.strip().split()
    try:
        number_infections = float(number_infections)
        number_deaths = float(number_deaths)
    except ValueError:
        print(f"[WARNING] Could not convert Number of Infections {number_infections} or Number of Deaths {number_deaths} to float for Country: {country}\n")
        continue
    ratio = number_deaths/number_infections
    print(f"Country: {country} D/I ratio: {ratio}")

As you can see, I skipped the headers of your file using lines[2:], which means that processing starts at row 3 of your file. I also added try/except logic to skip values that cannot be converted to float. Hope this helps!

Edit: I just noticed that the thousands separator is "." instead of ",", so the periods are removed with line.replace(".","") before the values are converted.

The output of this run is:

Country: USA D/I ratio: 0.017980636237897647
Country: Netherlands D/I ratio: 0.07095527332965212

[WARNING] Could not convert Number of Infections 90.0 or Number of Deaths NA to float for Country: Georgia

Country: Germany D/I ratio: 0.007811561110443456
EnriqueBet

Fixed the following:

  • The first two lines in your text file are headers. These need to be skipped.
  • 'NA' can't be converted to a float.
  • If there is a 0 in your data, your program would crash. Now it won't.

f=open("myfile.txt","w+")

x="USA" + " " + " " + "124.356" + " " + " " + "2.236"
y="Netherlands" + " " + " " + "10.866" + " " + " " + "771"
z="Georgia" + " " + " " + "90" + " " + " " + "NA"
w="Germany" + " " + " " + "58.247" + " " + " " + "455"

f.write("Covid-19 Data" + "\n" + "Country" + " " + "/" + " " + "Number of infections" + " "  + "/" + " " + "Number of Death" + "\n")
f.write(x + "\n")
f.write(y + "\n")
f.write(z + "\n")
f.write(w)

f.close()

with open("myfile.txt", "r") as file:

    # Skip the two header lines
    next(file)
    next(file)

    try:
        for i in file:
            t = i.split()

            # Defaults, so the code keeps working when a value is missing
            x = 0
            y = 0

            # There are some NA's in your file. Strings not representing
            # a number can't be converted to float
            if t[1] != "NA":
                x = float(t[1])
            if t[2] != "NA":
                y = float(t[2])

            # Avoid dividing by zero; a missing value also yields 0
            if x == 0 or y == 0:
                result = 0
            else:
                # Divides t[1] (infections) by t[2] (deaths)
                result = x / y

            print(t[0] + ": " + str(result))

    except (ValueError, IndexError):
        print("fail")

Output:

USA: 55.615384615384606
Netherlands: 0.014093385214007782
Georgia: 0
Germany: 0.12801538461538461
O'Niel
  • I disagree with showing a result for Georgia in the output, as that would mean that all the infected people will die, according to your ratio. Also, I believe that the input given by the user needs to be reformatted, as `124.356` should be equal to `124356` once the period is removed. – EnriqueBet Apr 01 '20 at 19:08
  • @EnriqueBet True. Edited. Better? :) – O'Niel Apr 01 '20 at 21:31
  • Yes, that seems a lot better! :D – EnriqueBet Apr 01 '20 at 21:35

The header line in your file is Covid-19 Data. This is the first line, and when you call t=i.split() on it you get a list t containing ['Covid-19', 'Data'].

You cannot convert these to floats since they contain letters. Instead, you should read the first 2 header lines before the loop and do nothing with them. However, you will then run into issues with Georgia, as "NA" also cannot be converted to a float.

A few other points: it's not good practice to have a catch-all exception. Also, you don't need to close the file explicitly if you open it using a with statement.
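
Put together, these points could look like the following sketch. It assumes the two header lines are always present, that the last two whitespace-separated fields are infections and deaths, and that "." is a thousands separator; the function name ratios is hypothetical.

```python
from itertools import islice


def ratios(path):
    """Yield (country, ratio) pairs; ratio is None when it cannot be computed."""
    # The with statement closes the file, so no explicit close() is needed.
    with open(path) as file:
        # Skip the two header lines instead of trying to parse them.
        for line in islice(file, 2, None):
            fields = line.split()
            if len(fields) < 3:
                continue  # blank or malformed line
            country = " ".join(fields[:-2])
            try:
                # Assumption: "." is a thousands separator, as in "124.356".
                infections = float(fields[-2].replace(".", ""))
                deaths = float(fields[-1].replace(".", ""))
                ratio = deaths / infections
            except (ValueError, ZeroDivisionError):
                # Catch only the errors we expect ("NA", zero infections).
                ratio = None
            yield country, ratio
```

Looping over ratios("myfile.txt") and printing each pair then reports Georgia with None instead of failing on "NA".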

Chris Doyle