I have the DataFrame given below:

import pandas as pd

df = pd.DataFrame(
    {
        "id": [8233037, 8233313],
        "geometry": [
            "{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], 
[107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}",
            "{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, 
-6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}",
        ],
    }
)
df

I want the results in the original DataFrame, with each coordinate pair split out separately. For example, a pair such as [107.625764, -6.910353] from [107.625764, -6.910353], [107.625871, -6.910358] should be split into 107.625764 and -6.910353. The details of the expected result are in the linked picture (Expected Results).

All I know is that the str.split method can be applied with a specific delimiter, as follows:

df[
    ["coordinate1", "coordinate2", "coordinate3", "coordinate4", "coordinate-n"]
] = df.geometry.str.split(
    " ",
    expand=True,
)

Problem: I don't know which delimiter I should pass to str.split in place of " ".

How can I manipulate the values in this DataFrame column to get my expected table, as in the linked picture (Expected Results)?

Laurent
Anwar San
  • Your `geometry` looks like a JSON string, so first you could use the `json` module to convert it from a string to a normal list/dict. Later you can access the values more simply. – furas Jul 23 '21 at 03:06
  • You show too many values in the images; I don't understand which values you expect. – furas Jul 23 '21 at 03:07

2 Answers

I have a solution in pure Python.
First, the geometry is a JSON-like string, but it is not valid JSON: the keys and strings use single quotes, whereas JSON requires double quotes. So I parse it with yaml instead.
Then I just need to flatten the parsed coordinates into a list of rows.

import pandas as pd
import yaml

df = pd.DataFrame({ 'id':[8233037,8233313],
                    'geometry': ["{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, 
-6.930843], [107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}","{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], 
[107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}"]})

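data = []
for _, row in df.iterrows():
    row_id = row['id']
    # safe_load reads the single-quoted dict as a YAML flow mapping;
    # plain yaml.load() requires an explicit Loader in PyYAML 6+
    geo = yaml.safe_load(row['geometry'])['coordinates']
    # merge all line strings into one list of coordinate pairs
    geos = []
    for g in geo:
        geos += g
    # GeoJSON pairs are ordered [longitude, latitude]
    data += [[row_id, g[0], g[1]] for g in geos]

df_new = pd.DataFrame(data, columns=['id', 'longitude', 'latitude'])
df_new
    id      longitude   latitude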
0   8233037 107.612018  -6.921755
1   8233037 107.611888  -6.923030
2   8233037 107.611715  -6.924730
3   8233037 107.611715  -6.924890
4   8233037 107.611729  -6.925015
... ... ... ...
199 8233313 107.623747  -6.909832
200 8233313 107.625456  -6.910273
201 8233313 107.625764  -6.910353
202 8233313 107.625871  -6.910358
203 8233313 107.626035  -6.910264

204 rows × 3 columns
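
If installing PyYAML is not an option, the same string also happens to be a valid Python dict literal, so the standard library's `ast.literal_eval` is one possible way to parse it. A minimal sketch on a shortened geometry string (the variable names are illustrative, not from the answer):

```python
import ast

# Shortened stand-in for one 'geometry' value from the question.
geometry = "{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303]], [[107.612386, -6.93127]]]}"

# literal_eval only evaluates Python literals (dicts, lists, numbers, strings),
# so it does not execute arbitrary code the way eval() would.
parsed = ast.literal_eval(geometry)

# Flatten the list of lines into (x, y) pairs, as in the loop above.
pairs = [tuple(point) for line in parsed['coordinates'] for point in line]

print(pairs)
```

This only works while the column holds Python-style literals; real JSON values such as `null` or `true` would make `literal_eval` fail.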
nay
  • Thank you, this simple syntax is all I need; both the code and your explanation are easy to understand. I don't know yaml yet, but I think it's really powerful. You gave a simple way, and it works for me. – Anwar San Jul 24 '21 at 10:35

geometry looks like a JSON string, so first I would use the json module to convert it from a string to a normal list/dict. Later you can access the values more simply.

But it is not valid JSON, so I can use the dirtyjson module for this

df['data'] = df['geometry'].apply(lambda row:dirtyjson.loads(row))
print(df['data'])

Or (luckily) I can replace ' with " to get valid JSON

df['data'] = df['geometry'].apply(lambda row:json.loads(row.replace("'", '"')))

Result

0    {'type': 'MultiLineString', 'coordinates': [[[...
1    {'type': 'MultiLineString', 'coordinates': [[[...
Name: data, dtype: object
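
The quote replacement works here only because no key or value in the data contains an apostrophe. A minimal sketch of the failure mode, using a made-up record with a hypothetical `name` field that is not part of the question's data:

```python
import json

# Hypothetical record (not from the question): one value contains an apostrophe.
raw = "{'type': 'LineString', 'name': \"St. John's Road\", 'coordinates': [[107.61, -6.92]]}"

# Blind replacement also rewrites the apostrophe inside the value,
# which leaves a stray double quote and makes the JSON invalid.
try:
    json.loads(raw.replace("'", '"'))
    replaced_ok = True
except json.JSONDecodeError:
    replaced_ok = False

print(replaced_ok)
```

For data like this, a parser that understands single quotes natively (such as the dirtyjson option above) is likely the safer choice.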

Next I keep only the coordinates

df['coordinates'] = df['data'].apply(lambda row:row['coordinates'])
print(df['coordinates'])

Result

0    [[[107.612018, -6.921755], [107.611888, -6.923...
1    [[[107.614077, -6.91033], [107.614837, -6.9100...
Name: coordinates, dtype: object

It is a nested list, so I flatten it

def flatten(row):
    result = []
    for item in row:
        result += item
    return result

df['coordinates'] = df['coordinates'].apply(flatten)
print(df['coordinates'])

Or I can use sum() with [] as the start value

df['coordinates'] = df['coordinates'].apply(lambda row: sum(row, []))

Result

0    [[107.612018, -6.921755], [107.611888, -6.9230...
1    [[107.614077, -6.91033], [107.614837, -6.91005...
Name: coordinates, dtype: object
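
`sum(row, [])` builds a new list for every addition, which can get slow for long geometries. A sketch of the same one-level flattening with `itertools.chain.from_iterable`, shown on a small made-up nested list rather than the real data:

```python
from itertools import chain

# Small stand-in for one parsed 'coordinates' value: a list of lines,
# each line being a list of [x, y] pairs (values are illustrative only).
nested = [
    [[107.61, -6.92], [107.62, -6.93]],
    [[107.63, -6.94]],
]

# Remove exactly one level of nesting: lines -> pairs.
flat = list(chain.from_iterable(nested))

print(flat)
```

On the DataFrame this would presumably be applied as `df['coordinates'].apply(lambda row: list(chain.from_iterable(row)))`.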

Now I can explode it to put every pair in a separate row with its id

df = df.explode('coordinates')
print(df[['id', 'coordinates']])

Result

         id              coordinates
0   8233037  [107.612018, -6.921755]
0   8233037   [107.611888, -6.92303]
0   8233037   [107.611715, -6.92473]
0   8233037   [107.611715, -6.92489]
0   8233037  [107.611729, -6.925015]
..      ...                      ...
1   8233313  [107.623747, -6.909832]
1   8233313  [107.625456, -6.910273]
1   8233313  [107.625764, -6.910353]
1   8233313  [107.625871, -6.910358]
1   8233313  [107.626035, -6.910264]

[204 rows x 2 columns]
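
Note that the exploded frame repeats the original index labels (0, 0, ..., 1, 1, ...). If a fresh index is preferred, `explode` accepts `ignore_index=True` (pandas 1.1 or later). A small sketch on made-up data:

```python
import pandas as pd

# Tiny stand-in frame; ids and coordinates are illustrative only.
small = pd.DataFrame({
    'id': [1, 2],
    'coordinates': [[[107.61, -6.92], [107.62, -6.93]], [[107.63, -6.94]]],
})

# One row per coordinate pair, with the index renumbered from 0.
exploded = small.explode('coordinates', ignore_index=True)

print(exploded)
```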

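And I can use pd.Series to convert the coordinates into two columns. GeoJSON stores each pair as [longitude, latitude], so the first column is the longitude

df[['lon', 'lat']] = df['coordinates'].apply(pd.Series)
print(df[['id', 'lon', 'lat']])

Result:

         id         lon       lat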
0   8233037  107.612018 -6.921755
0   8233037  107.611888 -6.923030
0   8233037  107.611715 -6.924730
0   8233037  107.611715 -6.924890
0   8233037  107.611729 -6.925015
..      ...         ...       ...
1   8233313  107.623747 -6.909832
1   8233313  107.625456 -6.910273
1   8233313  107.625764 -6.910353
1   8233313  107.625871 -6.910358
1   8233313  107.626035 -6.910264

[204 rows x 3 columns]
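
`apply(pd.Series)` creates one Series per row, which tends to be slow on large frames. A sketch of an alternative that builds both columns in one step from the list of pairs; the small frame below is made up for illustration:

```python
import pandas as pd

# Tiny stand-in for the exploded frame; note the repeated index labels.
exploded = pd.DataFrame(
    {
        'id': [1, 1, 2],
        'coordinates': [[107.61, -6.92], [107.62, -6.93], [107.63, -6.94]],
    },
    index=[0, 0, 1],
)

# Build a two-column frame from the list of pairs, reusing the same index
# so the assignment lines up row by row.
# GeoJSON positions are ordered [longitude, latitude].
pairs = pd.DataFrame(
    exploded['coordinates'].tolist(),
    columns=['lon', 'lat'],
    index=exploded.index,
)
exploded[['lon', 'lat']] = pairs

print(exploded[['id', 'lon', 'lat']])
```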

Full working code

import pandas as pd
import json
#import dirtyjson  # third-party; only needed for the commented-out alternative below

df = pd.DataFrame({
    'id': [8233037, 8233313],
    'geometry': ["{'type': 'MultiLineString', 'coordinates': [[[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612018, -6.921755], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]], [[107.612386, -6.93127], [107.612681, -6.930843], 
[107.612796, -6.930606], [107.61285, -6.930428], [107.612932, -6.929788], [107.612968, -6.929409], [107.613061, -6.928426], [107.613064, -6.927535], [107.613086, -6.927077], [107.613084, -6.926826], [107.612923, -6.926555], [107.612596, -6.926166], [107.611872, -6.925277], [107.61177, -6.925134], [107.611729, -6.925015], [107.611715, -6.92489], [107.611715, -6.92473], [107.611888, -6.92303], [107.611715, -6.92473], [107.611715, -6.92489], [107.611729, -6.925015], [107.61177, -6.925134], [107.611872, -6.925277], [107.612596, -6.926166], [107.612923, -6.926555], [107.613084, -6.926826], [107.613086, -6.927077], [107.613064, -6.927535], [107.613061, -6.928426], [107.612968, -6.929409], [107.612932, -6.929788], [107.61285, -6.930428], [107.612796, -6.930606], [107.612681, -6.930843], [107.612386, -6.93127]]]}","{'type': 'MultiLineString', 'coordinates': [[[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, -6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.614077, -6.91033], [107.614837, 
-6.910057], [107.615055, -6.909996], [107.615596, -6.909811], [107.616151, -6.909611], [107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.617315, -6.90917], [107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832]], [[107.618309, -6.908848], [107.618488, -6.908803], [107.618645, -6.908796], [107.61901, -6.908853], [107.620936, -6.909341], [107.621119, -6.909319], [107.621369, -6.909287], [107.623747, -6.909832], [107.625456, -6.910273], [107.625764, -6.910353], [107.625871, -6.910358], [107.626035, -6.910264]]]}"]
    })

#df['data'] = df['geometry'].apply(lambda row:dirtyjson.loads(row))
df['data'] = df['geometry'].apply(lambda row:json.loads(row.replace("'", '"')))
print(df['data'])

df['coordinates'] = df['data'].apply(lambda row:row['coordinates'])
print(df['coordinates'])

def flatten(row):
    result = []
    for item in row:
        result += item
    return result

#df['coordinates'] = df['coordinates'].apply(flatten)
df['coordinates'] = df['coordinates'].apply(lambda row: sum(row, []))
print(df['coordinates'])

df = df.explode('coordinates')
print(df[['id', 'coordinates']])

# GeoJSON pairs are ordered [longitude, latitude]
df[['lon', 'lat']] = df['coordinates'].apply(pd.Series)
print(df[['id', 'lon', 'lat']])
furas
  • Very detailed! The steps you've explained are easy to follow, and I will use them for similar cases. But I've already got a simpler solution, using `yaml` to get around the single-quote problem in the string. I really appreciate the detailed steps you explained. – Anwar San Jul 24 '21 at 10:42