I have a dataframe created with the following code:

import pandas as pd

dd = {'name': ["HARDIE'S MOBILE HOME PARK", 'CRESTVIEW RV PARK',
       'HOMESTEAD TRAILER PARK', 'HOUSTON PARK MOBILE HOME PARK',
       'HUDSON MOBILE HOME PARK', 'BEACH DRIVE MOBILE HOME PARK',
       'EVANS TRAILER PARK'],
       'country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
      'coordinates': ['30.44126118, -86.6240656099999',
       '30.7190163500001, -86.5716222299999',
       '30.5115772500001, -86.4628417499999',
       '30.4424195300001, -86.64733076',
       '30.7629176200001, -86.5928893399999', '30.44417349, -86.59951996',
       '30.4427800300001, -86.62941091'],
      'status':['OPEN', 'CLOSED', 'OPEN', 'OPEN', 'OPEN', 'OPEN', 'OPEN']}

df1 = pd.DataFrame(data=dd)

What I want to do is to create a dictionary with the following structure:

{'destination1': 'CRESTVIEW RV PARK; 30.7190163500001, -86.5716222299999',
 'destination2': 'HOMESTEAD TRAILER PARK; 30.5115772500001, -86.4628417499999',
 'destination3': 'HOUSTON PARK MOBILE HOME PARK; 30.4424195300001, -86.64733076',
 'destination4': 'HUDSON MOBILE HOME PARK; 30.7629176200001, -86.5928893399999',
 'destination5': 'BEACH DRIVE MOBILE HOME PARK; 30.44417349, -86.59951996'}

As you can see, each value must have the form name; coordinates, taken from the second row to the last row. I am using the following code to do that:

d1 = {f"destination{k}":v + "; " + i for k in range(1, len(df1)-1) for v,i in zip(df1.name, df1.coordinates)}

However, this is the output I am getting:

{'destination1': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
 'destination2': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
 'destination3': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
 'destination4': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091',
 'destination5': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091'}

It is only reading the last row of the dataframe, so every key has the same value. What I want is for each key to take its value from a different row, from the second row to the last row.

If anyone has any idea of how to do that I would really appreciate your help.

brenda
  • Please add textual representation of the data instead of images. How are people going to reproduce and test? – Epsi95 Feb 22 '21 at 04:42
  • Can you add code that creates your dataframe? people who are keen to help you aren't keen to manually re-type your data. Also, don't post images of a code or data. Post them as text. This makes things more searchable and easy for screen readers to communicate the contents of the question to people who are visually impaired. – Paul H Feb 22 '21 at 04:42
  • FYI: If the loop is `for i in l1: for j in l2` then it goes like `[...for i in l1 for j in l2]` in list comprehension. – Epsi95 Feb 22 '21 at 04:48
  • I have just updated the post with code to create the df. – brenda Feb 22 '21 at 04:50
  • Please provide the expected [MRE - Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). Show where the intermediate results deviate from the ones you expect. We should be able to paste a single block of your code into file, run it, and reproduce your problem. This also lets us test any suggestions in your context. – Prune Feb 22 '21 at 04:57

2 Answers

You can enumerate the zip like this:

dd = {'name': ["HARDIE'S MOBILE HOME PARK", 'CRESTVIEW RV PARK',
       'HOMESTEAD TRAILER PARK', 'HOUSTON PARK MOBILE HOME PARK',
       'HUDSON MOBILE HOME PARK', 'BEACH DRIVE MOBILE HOME PARK',
       'EVANS TRAILER PARK'],
       'country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
      'coordinates': ['30.44126118, -86.6240656099999',
       '30.7190163500001, -86.5716222299999',
       '30.5115772500001, -86.4628417499999',
       '30.4424195300001, -86.64733076',
       '30.7629176200001, -86.5928893399999', '30.44417349, -86.59951996',
       '30.4427800300001, -86.62941091'],
      'status':['OPEN', 'CLOSED', 'OPEN', 'OPEN', 'OPEN', 'OPEN', 'OPEN']}

df1 = pd.DataFrame(data=dd)

d_out = {
    f"destination{idx+1}":'; '.join(v) for idx, v in enumerate(zip(df1.name[1:], df1.coordinates[1:]))
}

d_out

{'destination1': 'CRESTVIEW RV PARK; 30.7190163500001, -86.5716222299999',
 'destination2': 'HOMESTEAD TRAILER PARK; 30.5115772500001, -86.4628417499999',
 'destination3': 'HOUSTON PARK MOBILE HOME PARK; 30.4424195300001, -86.64733076',
 'destination4': 'HUDSON MOBILE HOME PARK; 30.7629176200001, -86.5928893399999',
 'destination5': 'BEACH DRIVE MOBILE HOME PARK; 30.44417349, -86.59951996',
 'destination6': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091'}

You don't have to use a dict comprehension to get this result; you can also get it by adding a couple of columns to the pandas dataframe, like this:

df1['destination'] = [f"destination{k}" for k in range(len(df1))]
df1['value'] = df1['name'] + "; " + df1['coordinates'] 

df1[['destination', 'value']][1:].set_index("destination").to_dict()['value']

{'destination1': 'CRESTVIEW RV PARK; 30.7190163500001, -86.5716222299999',
 'destination2': 'HOMESTEAD TRAILER PARK; 30.5115772500001, -86.4628417499999',
 'destination3': 'HOUSTON PARK MOBILE HOME PARK; 30.4424195300001, -86.64733076',
 'destination4': 'HUDSON MOBILE HOME PARK; 30.7629176200001, -86.5928893399999',
 'destination5': 'BEACH DRIVE MOBILE HOME PARK; 30.44417349, -86.59951996',
 'destination6': 'EVANS TRAILER PARK; 30.4427800300001, -86.62941091'}
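
The expected output in the question stops at destination5, while both results above also include the last row as destination6. A minimal sketch of how slicing controls this, assuming a hypothetical helper named build_destinations (not part of pandas) and short made-up lists in place of the real columns:

```python
def build_destinations(names, coords, drop_first=1, drop_last=0):
    """Map 'destination1', 'destination2', ... to 'name; coordinates' strings.

    Hypothetical helper for illustration. `names` and `coords` may be lists
    or DataFrame columns; both are converted to lists before slicing.
    """
    names, coords = list(names), list(coords)
    stop = len(names) - drop_last
    rows = zip(names[drop_first:stop], coords[drop_first:stop])
    # start=1 makes enumerate count from 1, so no idx + 1 is needed
    return {f"destination{n}": "; ".join(row) for n, row in enumerate(rows, start=1)}


# Made-up stand-in data
names = ["FIRST", "SECOND", "THIRD", "LAST"]
coords = ["0, 0", "1, 1", "2, 2", "3, 3"]

to_last = build_destinations(names, coords)            # second row through last row
inner = build_destinations(names, coords, drop_last=1)  # also drops the last row
```

With the dataframe from the question, the same helper could be called as build_destinations(df1['name'], df1['coordinates'], drop_last=1) if the last row should be left out.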
Sreeram TP

The dict comprehension in your example has two for-loops:

d1 = {
    f"destination{k}":v + "; " + i
    for k in range(1, len(df1)-1)
    for v,i in zip(df1.name, df1.coordinates)
}

In these loops, k is iterated independently of v and i: for every value of k, the inner loop runs over all rows, and each pass assigns to the same key, so only the value from the last row survives. Note that df1.name does return the "name" column here, because a DataFrame (unlike a Series) has no name attribute of its own. Still, df1['name'] is the safer spelling, since attribute access does not work for column names that collide with existing DataFrame attributes or methods.
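
To see the overwriting in action, the two-loop comprehension can be unrolled into ordinary nested loops. This is only a sketch: the short lists names and coords are made-up stand-ins for the two dataframe columns, and the writes counter exists purely for illustration.

```python
# Made-up stand-in data; plain lists play the role of the two columns
names = ["A PARK", "B PARK", "C PARK"]
coords = ["1.0, -1.0", "2.0, -2.0", "3.0, -3.0"]

# The two-loop comprehension behaves like these nested loops
nested = {}
writes = 0
for k in range(1, 3):
    for v, i in zip(names, coords):
        # Same key on every inner pass, so earlier values are overwritten
        nested[f"destination{k}"] = v + "; " + i
        writes += 1

# Zipping the counter together with the rows advances all three in lockstep
paired = {
    f"destination{k}": v + "; " + i
    for k, v, i in zip(range(1, len(names)), names[1:], coords[1:])
}
```

Here the nested version performs six assignments but keeps only two entries, both holding the last row, while the zipped version pairs each key with its own row.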

What you really want is to loop over multiple elements in df1 for each row. To do this, just use the first loop, but access the elements you want from the df when building the values:

d1 = {
    f"destination{k}": (df1.loc[k, 'name'] + "; " + df1.loc[k, 'coordinates'])
    for k in range(1, len(df1)-1)
}

Check out this FullStack Python guide's section on comprehensions for more info.

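Alternatively (and preferably), use pandas!

d1 = pd.Series(
    # .to_numpy() passes plain values; passing the Series itself would make
    # pandas realign it to the new index labels and fill the result with NaN
    (df1['name'] + '; ' + df1['coordinates']).to_numpy(),
    index='destination' + df1.index.astype(str),
).iloc[1:]  # skip the first row, as the question asks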

If at this point you really want a dictionary, you can convert the series to a dictionary with d1 = d1.to_dict()

Michael Delgado