How to get list of strings from list-like string that includes nan?

Question

Here is a toy example. I have a string like this:

import numpy as np
z = str([np.nan, "ab", "abc"])

Printed, it looks like "[nan, 'ab', 'abc']", but what I have to process is the string z = str([np.nan, "ab", "abc"]).

From z, I want to get a list of strings excluding nan:

zz = ["ab", "abc"]

To be clear: z is the input (a string that looks like a list) and zz is the desired output (a list).

There is no problem if z doesn't contain nan; in that case ast.literal_eval(z) does the job. With nan, however, I get an error about a malformed node or string.

Note: np.nan doesn't have to be first.
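
The failure described above can be reproduced with a minimal sketch like the one below. It uses float("nan") instead of np.nan only to avoid the numpy import; both print as a bare nan inside the list string.

```python
import ast

clean = "['ab', 'abc']"
with_nan = str([float("nan"), "ab", "abc"])  # "[nan, 'ab', 'abc']"

# Without nan the string is a plain list literal and parses fine.
print(ast.literal_eval(clean))

# With nan, the bare name is not a Python literal, so parsing fails.
try:
    ast.literal_eval(with_nan)
    failed = False
except ValueError as exc:
    failed = True
    print("literal_eval failed:", exc)
```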

The title of your question says "includes nan" but the body of your question says "excluding nan". Is your question how to filter a list removing all NaN values? — Josh Bothun, May 11 '22 at 09:18
Hi, I've mean that input string contains "nan" but output list doesn't contain it — Quant Christo, May 11 '22 at 09:20
Why are you *building* the example string that way? Is the actual string built that way as well? Could you remove nans *before* building the string? — Kelly Bundy, May 11 '22 at 09:22
@KellyBundy this is simplified example, I've obtain such strings when I read csv file from pandas that contained lists of strings in the column — Quant Christo, May 11 '22 at 09:27
@QuantChristo I'm not familiar with pandas, but perhaps you could fix that process? Sounds like the [XY problem](https://xyproblem.info/). — Kelly Bundy, May 11 '22 at 09:29
@KellyBundy I misread that bit, now the question is why do they have a string... — Jacques Gaudin, May 11 '22 at 09:31
do you expect to have substrings like `"abc nan, def"` in your original string? — mozway, May 11 '22 at 09:36
@mozway I don't expect to have nan substrings but I can't exclude such case — Quant Christo, May 11 '22 at 09:42
@KellyBundy Unfortunately I don't have control how those csv files were created but I need to parse them — Quant Christo, May 11 '22 at 09:43

score 2 · Answer 1 · answered May 11 '22 at 09:45

As I understand it, your goal is to parse CSV or similar.

If you want a trade-off solution that should work in most cases, you can use a regex to strip the "nan" entries. It will fail on strings that contain the substring "nan," (nan followed by a comma), but this seems a reasonably unlikely edge case. It is worth exploring with your real data.

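import re
from ast import literal_eval
import numpy as np

z = str([np.nan, "ab", np.nan, "nan,", "abc", "x nan , y", "x nan y"])

# drop every "nan," token (plus surrounding whitespace) before parsing
literal_eval(re.sub(r'\bnan\s*,\s*', '', z))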

output: ['ab', '', 'abc', 'x y', 'x nan y']
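
As the output shows, the pattern also rewrites strings that contain "nan,", and it would not match a nan that is the last element of the list. A slightly stricter variant, offered only as a sketch, matches a nan that sits directly between list punctuation and swaps it for None, which is then filtered out. The names BARE_NAN and parse_list_regex are made up for this illustration, and a string such as 'a,nan,b' would still be altered.

```python
import re
from ast import literal_eval

# Bare nan token: directly after "[" or "," and directly before "," or "]".
BARE_NAN = re.compile(r"(?<=[\[,])\s*nan\s*(?=[,\]])")


def parse_list_regex(text):
    """Swap bare nan tokens for None, parse, then drop the None entries."""
    replaced = BARE_NAN.sub("None", text)
    return [item for item in literal_eval(replaced) if item is not None]


sample = "[nan, 'ab', nan, 'nan,', 'abc', 'x nan , y', nan]"
print(parse_list_regex(sample))
```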

score 1 · Answer 2 · answered May 11 '22 at 09:35

ast.literal_eval is suggested over eval exactly because it allows only a very limited set of expressions. As stated in the docs: "Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None and Ellipsis." np.nan is none of those, so it cannot be evaluated. There are a few ways to handle this:

Remove nan by operating on the string before evaluating it. This might be problematic if you also want to avoid removing nan from inside the actual strings.
NOT ADVISED - SECURITY RISKS - standard eval can handle this if you define a nan variable in the namespace.
And finally, what I think is the best choice but also the hardest to implement: as explained here, you take the source code for ast, subclass it, and reimplement literal_eval so that it knows how to handle the nan string on its own.
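
A lighter approximation of the third option, which does not copy the ast source, is to parse the string yourself, rewrite every bare nan name into None, and hand the modified tree to ast.literal_eval, which accepts an expression node as well as a string. This is a sketch rather than a full reimplementation; _NanToNone and parse_list_with_nan are hypothetical names chosen for the example.

```python
import ast


class _NanToNone(ast.NodeTransformer):
    """Replace every bare name `nan` with the literal None."""

    def visit_Name(self, node):
        if node.id == "nan":
            return ast.copy_location(ast.Constant(value=None), node)
        return node  # any other name is left alone and rejected later


def parse_list_with_nan(text):
    tree = ast.parse(text, mode="eval")
    tree = ast.fix_missing_locations(_NanToNone().visit(tree))
    # literal_eval still refuses calls, attribute access and other names.
    values = ast.literal_eval(tree)
    return [value for value in values if value is not None]


print(parse_list_with_nan("[nan, 'ab', 'nan,', 'x nan , y', nan]"))
```

Because the replacement happens on the syntax tree, text inside string literals is never touched, so 'nan,' and 'x nan , y' come through unchanged.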

score 0 · Accepted Answer · answered May 11 '22 at 09:31

What about:

eval(z, {'nan': 'nan'})  # if you can tolerate then:
[i for i in eval(z, {'nan': 'nan'}) if i != 'nan']

It may have security considerations.
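
If eval is used anyway, a somewhat more defensive sketch is shown below. It binds nan to a real float and filters by type, so a genuine 'nan' string survives. The function name parse_with_eval is made up for this example, and emptying __builtins__ is not a sandbox, so this is only reasonable for input you trust.

```python
def parse_with_eval(text):
    # Empty __builtins__ narrows what the expression can reach, but this is
    # NOT a sandbox; only use it on input you trust.
    namespace = {"__builtins__": {}, "nan": float("nan")}
    items = eval(text, namespace)
    # Real nan values are floats, so the isinstance check drops them while
    # keeping a genuine 'nan' string.
    return [item for item in items if isinstance(item, str)]


z = str([float("nan"), "ab", "nan", "abc"])  # "[nan, 'ab', 'nan', 'abc']"
print(parse_with_eval(z))
```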

Jacek Błocki

This is perfectly fine :) – Quant Christo May 11 '22 at 09:38
No, `eval` is rarely "perfectly fine" – matszwecja May 11 '22 at 09:39

score -2 · Answer 4 · answered May 11 '22 at 09:24

Use the filter() function:

list(filter(lambda f: type(f)==str, z))

pedram

score -3 · Answer 5 · answered May 11 '22 at 09:19

There are many solutions; one of them is:

import numpy as np

z = [np.nan, 'string', 'another_one']
string_list = []

for item in z:
    # keep only the items that are instances of str
    if isinstance(item, str):
        string_list.append(item)

Ayman

This is not a solution to my problem. input `z` is string not list. – Quant Christo May 11 '22 at 09:21
Okay I will Update it – Ayman May 11 '22 at 09:22
