-2

Here is a toy example. I have a string like this:

import numpy as np
z = str([np.nan, "ab", "abc"])

When printed, it looks like "[nan, 'ab', 'abc']", but what I have to process is the string z = str([np.nan, "ab", "abc"]).

From z, I want to get a list of the strings, excluding nan:

zz = ["ab", "abc"]

To be clear: z is the input (a string that looks like a list), and zz is the desired output (a list).

There is no problem if z doesn't contain nan; in that case ast.literal_eval(z) does the job. With nan, however, I get an error about a malformed node or string.

Note: np.nan doesn't have to be first.
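
For reference, the failure described above can be reproduced without NumPy. This is a minimal sketch in which z_ok and z_nan are hypothetical names for the two cases, written out as the strings that str([...]) would produce:

```python
import ast

z_ok = "['ab', 'abc']"        # no nan: parses fine
z_nan = "[nan, 'ab', 'abc']"  # bare name nan: rejected by literal_eval

print(ast.literal_eval(z_ok))

error = None
try:
    ast.literal_eval(z_nan)
except ValueError as exc:
    # literal_eval refuses the bare name `nan` because it is not a literal
    error = exc
    print(f"literal_eval failed: {exc}")
```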

Quant Christo

5 Answers

2

As I understand it, your goal is to parse CSV or something similar.

If you want a trade-off solution that should work in most cases, you can use a regex to get rid of the "nan". It will fail on strings that themselves contain nan followed by a comma, but this seems to be a reasonably unlikely edge case. It is worth exploring with your real data.

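import re
from ast import literal_eval
import numpy as np

z = str([np.nan, "ab", np.nan, "nan,", "abc", "x nan , y", "x nan y"])

# Remove every "nan" that is followed by a comma, then parse what is left
literal_eval(re.sub(r'\bnan\s*,\s*', '', z))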

output: ['ab', '', 'abc', 'x y', 'x nan y']
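
One possible variation, not part of the approach above, is to match nan only when it sits between list delimiters, which also covers a nan in last position. This is a sketch under assumptions: NAN_ELEMENT is a hypothetical name, the input is written out as the string that str([...]) would produce, and a string that itself contains ",nan," would still be altered. Genuine None elements are dropped as well.

```python
import re
from ast import literal_eval

# Hypothetical input, written out as the string that str([...]) would produce
z = "[nan, 'ab', nan, 'nan,', 'abc', 'x nan , y', nan]"

# Match `nan` only as a list element: directly after '[' or ',' and
# directly before ',' or ']' (optional whitespace around it)
NAN_ELEMENT = re.compile(r"(?<=[\[,])\s*nan\s*(?=[,\]])")

# Swap each bare nan for None so the string becomes a valid literal,
# then filter the placeholders out of the parsed list
parsed = literal_eval(NAN_ELEMENT.sub("None", z))
zz = [item for item in parsed if item is not None]
print(zz)
```

With this pattern the quoted strings 'nan,' and 'x nan , y' are left untouched, because the nan inside them is not directly preceded by a bracket or a comma.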

mozway
1

ast.literal_eval is recommended over eval precisely because it accepts only a very limited set of constructs. As stated in the docs: "Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None and Ellipsis." The bare name nan in your string is none of those, so it cannot be evaluated. There are a few ways to handle this.

  • Remove nan by operating on the string before evaluating it. This can be problematic if you also need to avoid removing nan from inside the actual strings.
  • NOT ADVISED - SECURITY RISKS - the standard eval can handle this if you define a nan variable in the namespace.
  • And finally, what I think is the best choice but also the hardest to implement: as explained here, you take the source code for ast, subclass it, and reimplement literal_eval so that it knows how to handle the nan name on its own.
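
A lighter alternative to reimplementing literal_eval is to parse the string with ast.parse and skip the nan nodes before evaluating the rest. This is only a sketch, not the approach linked above: parse_list_skipping_nan is a hypothetical helper, and it assumes the input is a flat list in which nan appears only as a top-level element.

```python
import ast

def parse_list_skipping_nan(text):
    """Parse a list-like string, dropping bare `nan` names (hypothetical helper)."""
    tree = ast.parse(text, mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list literal")
    result = []
    for node in tree.body.elts:
        # A bare `nan` parses as a Name node, while the string 'nan' is a Constant
        if isinstance(node, ast.Name) and node.id == "nan":
            continue
        # literal_eval also accepts an AST node, so every other element
        # is still restricted to plain literals
        result.append(ast.literal_eval(node))
    return result

z = "[nan, 'ab', 'nan,', 'abc', nan]"
print(parse_list_skipping_nan(z))
```

Because the check happens on the parsed tree rather than on the raw text, strings that merely contain "nan" are not affected.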
matszwecja
0

What about:

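eval(z, {'nan': 'nan'})  # binds the bare name nan to the string 'nan'
# If you can tolerate that placeholder, filter it out afterwards:
[i for i in eval(z, {'nan': 'nan'}) if i != 'nan']

Be aware that eval has security implications: it will execute whatever code z contains.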

Jacek Błocki
-2

Use the filter() function (this works once z is an actual list, not the string form from the question):

list(filter(lambda f: isinstance(f, str), z))
pedram
-3

There are many solutions; one of them is:

from numpy import nan

z = [nan, 'string', 'another_one']
string_list = []

for item in z:
    # Keep only the items that are instances of str
    if isinstance(item, str):
        string_list.append(item)
Ayman