Turning .txt file (empty dataframe with all data as column names) into dataframe

Question

I get this output when displaying the contents of a .txt file:

        Empty DataFrame
        Columns: [[{'city': 'Zurich, Switzerland', 'cost': '135.74'}, {'city': 'Basel, 
        Switzerland', 'cost': '135.36'}, {'city': 'Lausanne, Switzerland', 'cost': '131.24'}, 
        {'city': 'Lugano, Switzerland', 'cost': '130.32'}, {'city': 'Geneva, Switzerland', 
        'cost': '130.14'}, {'city': 'Bern, Switzerland', 'cost': '125.86'}, {'city': 'Tromso, 
        Norway', 'cost': '114.81'}, {'city': 'Stavanger, Norway', 'cost': '108.38'} etc.]

Does anyone know how to convert this into a data frame with columns "city" and "cost"? the Pandas.DataFrame() doesn't work, it outputs the same list of dictionaries as the original file.

Please provide a [mre] of your code. – AcK Feb 16 '21 at 16:37 — AcK, Feb 16 '21 at 16:37

Stryder · Answer 1 · 2021-02-16T16:06:37.617

If you already have a list of dicts with equal keys, you should be able to do just this:

pandas.__version__ ->>  '1.1.5'

dctlst = [{"a": 1, "b":1}, {"a":2, "b":2}]
from pandas import DataFrame
df = DataFrame(dctlst)
df
   a  b
0  1  1
1  2  2

Otherwise, You can use json to make a list of dictionaries out of it.

But first you have to clean up the text a bit (after reading it):

with open(r"C:\Users\User\Desktop\toDF.txt", "r") as txt:
        txt = txt.read()
txt = txt.replace("Columns: [", "").replace("etc.", "").replace("\n", "")

If you don't remove the extra opening bracket and the other stuff json won't load it. Also, json wants double quotes so replace single quotes with doubles:

txt = txt.replace("'", '"')
txt
'[{"city": "Zurich, Switzerland", "cost": "135.74"}, {"city": "Basel,         
Switzerland", "cost": "135.36"}, {"city": "Lausanne, Switzerland", "cost": "131.24"},         
{"city": "Lugano, Switzerland", "cost": "130.32"}, {"city": "Geneva, Switzerland",         
"cost": "130.14"}, {"city": "Bern, Switzerland", "cost": "125.86"}, {"city": "Tromso,         
Norway", "cost": "114.81"}, {"city": "Stavanger, Norway", "cost": "108.38"} ]'

Now it looks like a proper list of dictionaries which can be converted by json.loads

from json import loads
from pandas import DataFrame

lst = loads(txt)
df = DataFrame(lst)

df
                         city    cost
0         Zurich, Switzerland  135.74
1  Basel,         Switzerland  135.36
2       Lausanne, Switzerland  131.24
3         Lugano, Switzerland  130.32
4         Geneva, Switzerland  130.14
5           Bern, Switzerland  125.86
6      Tromso,         Norway  114.81
7           Stavanger, Norway  108.38

If you want the rows with cities to look prettier you can look into string operations: pandas string operations

this would work, but obviously depends on what you want:

df["city"] = df["city"].astype("string").str.replace(" ","")

    df
               city    cost
0    Zurich,Switzerland  135.74
1     Basel,Switzerland  135.36
2  Lausanne,Switzerland  131.24
3    Lugano,Switzerland  130.32
4    Geneva,Switzerland  130.14
5      Bern,Switzerland  125.86
6         Tromso,Norway  114.81
7      Stavanger,Norway  108.38

And this would make it even better:

df[["city", "country"]] = df["city"].str.split(",", expand= True)

df
        city    cost      country
0     Zurich  135.74  Switzerland
1      Basel  135.36  Switzerland
2   Lausanne  131.24  Switzerland
3     Lugano  130.32  Switzerland
4     Geneva  130.14  Switzerland
5       Bern  125.86  Switzerland
6     Tromso  114.81       Norway
7  Stavanger  108.38       Norway

Thank you so much for the great response!!! The problem is that I can't use the replace function on lists and if I iterate over it the error says that I can't use it on dictionaries either: "AttributeError: 'list' object has no attribute 'replace' " — Theresa_S, Feb 16 '21 at 15:47
Did you add the "etc." in yourself, to post it as the question? can you update your question to show how you tried to add the data to the dataframe? because if you already have a list of dictionaries with the equal keys than you don't need to do any of the replacing and json stuff.. you can just pass the list of dictionaries as the only argument to DataFrame(list_of_dicts) — Stryder, Feb 16 '21 at 16:02
Yes I added the "etc." myself. I wrote a script to scrape some data from a website and saved the output in a file named "living_costs.txt". If I convert this file to a csv file and put it into a dataframe all the data is located in the column names. using the pd.DataFrame(living_cost) function returns the exact same output as the txt file itself.This is why I tried to use your proposed way of doing it but I can't make the cleaning work because the input data is a list/dict. — Theresa_S, Feb 16 '21 at 16:17
Assuming you used pandas.read_csv("living_costs.csv"), there was something wrong with conversion of txt to csv, which I am not familiar with. IMHO, It also seems a bit too complicated and I highly recommend you learn about json files. You can simply store scraped data in dictionaries, dump them as json, load them back to dicts whenever you want and place them in DFs. Check out "https://www.youtube.com/watch?v=9N6a-VLBa2I&t=684s" for a great tutorial, and I promise you it's 100 times better than converting txt to csv. and if you really need csv, use df.to_csv after making it a dataframe . — Stryder, Feb 16 '21 at 16:30
I will watch the tutorial, thank you so much for your help!!! — Theresa_S, Feb 17 '21 at 10:31

score 0 · Answer 2 · answered Mar 29 '21 at 07:20

url = "https://www.numbeo.com/cost-of-living/region_rankings_current.jsp?region=150"
response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")
table = BeautifulSoup(str(soup.find_all("table", id="t2")[0]), "html.parser")
table_body = BeautifulSoup(str(table.find_all("tbody")[0]), "html.parser")

findings = table_body.find_all('tr')

living_costs= []

for finding in findings:
    city = finding.find("a", class_="discreet_link").string
    cost = finding.find("td", style ="text-align: right").string
    living_costs.append({"city": city, "cost": cost})

for dicti in living_costs:
    for word in dicti:
        word.replace("Columns: [", "").replace("\n", "")

df = pd.DataFrame(living_costs)
print(df)

Turning .txt file (empty dataframe with all data as column names) into dataframe

2 Answers2