1

I have a data frame in which one column holds list values. I have attached a picture of the data in Excel format, along with how it looks in the data frame.

column
"[
""Hello""
]"
"[
""Hello"", 
 ""Hi""
]"
"[
""Hello"", 
 ""Hi"",
 """"
]"
"[
"""",
""Hello"", 
 ""Hi""
]"
"[
""Hello"",
""""
]"
"[
"""",
""Hello""

]"

The column values look like

column
------
[\n "Hello" \n]
[\n "Hello", \n "Hi"\n]
[\n "Hello", \n "Hi"\n, \n ""\n]
[\n ""\n, \n "Hello", \n "Hi"\n]
[\n "Hello" \n, \n ""\n]
[\n ""\n, \n "Hello" \n]

So, I want to remove the \n characters and the empty "" entries from each list, so that the values become

column
------
["Hello"]
["Hello", "Hi"]
["Hello", "Hi"]
["Hello", "Hi"]
["Hello"]
["Hello"]

How can I obtain this result using pandas and Python?

Bad Coder
  • 177
  • 11

2 Answers

0

I'm not sure how to handle the input data as posted, because it is not correctly formatted Python. However, I think there are a couple of ways to solve the problem.

Input data (as correct Python)

column = [
    ['\n "Hello" \n'],
    ['\n "Hello"', '\n "Hi"\n'],
    ['\n "Hello"', '\n "Hi"\n', '\n ""\n'],
    ['\n ""\n', '\n "Hello"', '\n "Hi"\n'],
    ['\n "Hello" \n', '\n ""\n'],
    ['\n ""\n', '\n "Hello" \n']
]

Code: First map then List Comprehension

The map strips the surrounding whitespace from each entry, including the newline \n characters, and then strips the double quotes. The list comprehension then drops the entries that are left empty (the "" items) from each row.

def stripper(text):
    return text.strip().strip('"')

for row in column:
    output = list(map(stripper, row))
    print([i for i in output if i])

Output

['Hello']
['Hello', 'Hi']
['Hello', 'Hi']
['Hello', 'Hi']
['Hello']
['Hello']

Note that the end result has single quotes rather than double quotes. Let me know if this matters for what you're doing.
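To carry the same idea over to a DataFrame, one option is `Series.apply`, which runs a function once per cell and returns a Series of the same length as the index. Assigning a single cleaned row to the whole column inside a loop would instead raise a length-mismatch error. The sketch below assumes every cell already holds a Python list of strings; the frame `df` and the helper `clean_row` are made up for illustration.

```python
import pandas as pd

def stripper(text):
    # Remove surrounding whitespace (including newlines), then the double quotes.
    return text.strip().strip('"')

def clean_row(row):
    # Clean every entry of one cell's list and drop the entries that end up empty.
    return [item for item in map(stripper, row) if item]

# Hypothetical frame whose cells are Python lists, mirroring the input above.
df = pd.DataFrame({"column": [
    ['\n "Hello" \n'],
    ['\n "Hello"', '\n "Hi"\n', '\n ""\n'],
]})

# apply() calls clean_row once per cell, so the result lines up with the index.
df["column"] = df["column"].apply(clean_row)
print(df["column"].tolist())
```

If the cells turn out to be strings rather than lists, `clean_row` would iterate over single characters, so it is worth checking `type(df["column"].iloc[0])` first.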

For fun

Just for fun, I took your input data absolutely literally, and wrote a set of replacements to result in exactly the output you have in the question.

Input data

column = r"""[\n "Hello" \n]
[\n "Hello", \n "Hi"\n]
[\n "Hello", \n "Hi"\n, \n ""\n]
[\n ""\n, \n "Hello", \n "Hi"\n]
[\n "Hello" \n, \n ""\n]
[\n ""\n, \n "Hello" \n]""".splitlines()

Code

for row in column:
    # Each pattern matches literal backslash-n text; the final replace keeps the closing quote.
    print(row.replace('\\n "', '"').replace('" \\n', '"').replace('""\\n, ', '').replace(', ""\\n', '').replace('"\\n', '"'))

Output

["Hello"]
["Hello", "Hi"]
["Hello", "Hi"]
["Hello", "Hi"]
["Hello"]
["Hello"]
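As an alternative to chained replacements: if the cells hold text with real newline characters, as the spreadsheet view in the question suggests (this is an assumption about the data), then each cell is valid JSON and could be parsed with the standard `json` module. This is only a sketch; the sample `cells` and the helper `parse_cell` are hypothetical, and it would not work on text containing a literal backslash followed by `n`.

```python
import json

# Hypothetical cell text with real newlines, as it might come out of a spreadsheet export.
cells = [
    '[\n"Hello"\n]',
    '[\n"Hello", \n "Hi",\n ""\n]',
    '[\n"",\n"Hello"\n\n]',
]

def parse_cell(text):
    # json.loads ignores whitespace between tokens, so the newlines vanish on parsing.
    values = json.loads(text)
    # Drop the empty strings left over from "" entries.
    return [value for value in values if value]

cleaned = [parse_cell(text) for text in cells]
for row in cleaned:
    # json.dumps writes each list back out with double quotes.
    print(json.dumps(row))
```

With a DataFrame, the same helper could be passed to `df["column"].apply(parse_cell)`, assuming the column holds such strings.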
Utkonos
  • 631
  • 6
  • 21
  • Thank you for the comment. So, when I tried to do ``` for row in df.column: df['column']=row.replace('\n','"') ``` I got an error message `AttributeError: 'list' object has no attribute 'replace'`. I have been getting this error all day. – Bad Coder Nov 03 '22 at 01:10
  • You are applying the `.replace()` to the `list`. You need to apply the `.replace()` to each member of the list. – Utkonos Nov 03 '22 at 01:24
  • what is the correct code to apply to replace each member of the list? I tried that but failed – Bad Coder Nov 03 '22 at 01:47
  • Use the first one at the top: `map`. Write a function that performs the replace, and then map each row to that function. Look at the code that I wrote as the example. – Utkonos Nov 03 '22 at 01:51
  • So, I did `def stripper(text): return text.replace('\n', '"') for row in df.column: output = list(map(stripper, row)) df['column']=[i for i in output if i] ` It returned with error `ValueError: Length of values does not match length of index` – Bad Coder Nov 03 '22 at 01:59
  • You haven't provided any code, so it's not easy to help you. Can you provide the whole code you're using and a sample data input? – Utkonos Nov 03 '22 at 02:27
  • I have edited the question and pasted an image to make the question clearer - @Utkonos – Bad Coder Nov 03 '22 at 05:05
0

Taking the example that you provided, with a dataframe df whose column is named column, we can use the following code snippet:

def remove_empty_line(row):
    # Remove newline characters and surrounding whitespace from each entry of the list.
    updated_list = list()
    for elem in row:
        updated_list.append(elem.replace("\n", "").strip())
    return updated_list

df["column"] = df["column"].apply(remove_empty_line)

Now you can check your df with df.head()

Dhruv Awasthi
  • 149
  • 1
  • 4