
I have a dataset and one of the columns contains integers for some rows and strings for other rows. The column type is object.

For example:

Index     Column of interest
1         21678849
2         37464859
3         barbara
4         28394821
5         francis

I can't force the column to change type using .astype('str'), and I am unable to use .isstring, .isdigit, or isinstance. I've tried looking at solutions for converting object columns to strings, but these don't seem to work.

I've also tried:

[True if x.isin([1,2,3,4,5,6,7,8,9,0]) else False for x in df['column_of_interest']]

But that just gives me: AttributeError: 'str' object has no attribute 'isin'

Does anyone have any other ideas about how I can manage this?

Ideally I would like to create a third column that categorises whether the row is an int or a str.

Quang Hoang
Jameson
  • You can use `pd.to_numeric(df['Column of interest'], errors='coerce')`, which will force your strings into nulls. – Umar.H Oct 08 '20 at 14:56
  • So this could also work. I would then just need to add another step to make the column that identifies whether something is int or Null. – Jameson Oct 08 '20 at 15:48
  • You could chain it into one line: `df['DataType'] = np.where(pd.to_numeric(df['Column of interest'], errors='coerce').isnull(), 'Text', 'Number')`. There are lots of ways to do this; pandas has built-in datatypes you could always leverage too. – Umar.H Oct 08 '20 at 19:36
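A runnable sketch of the one-liner from the comment above. The DataFrame here is a hypothetical stand-in built in code (not the asker's real data), and the labels "Text"/"Number" are the ones the comment uses:

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question: one object column mixing ints and strings.
df = pd.DataFrame({"Column of interest": [21678849, 37464859, "barbara", 28394821, "francis"]})

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
numeric = pd.to_numeric(df["Column of interest"], errors="coerce")

# A NaN after coercion means the original value was text.
df["DataType"] = np.where(numeric.isnull(), "Text", "Number")
print(df)
```

Because this tests "parses as a number" rather than "is an int", strings such as "3.5" or "1e5" would also be labelled Number.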

4 Answers


You can try isinstance:

[isinstance(x, int) for x in df['column_of_interest']]
Quang Hoang
  • Thanks. Unfortunately, all the values come out as False. It seems the dtype is stuck in some limbo. – Jameson Oct 08 '20 at 15:04
  • @Jameson do `df.to_dict()` and paste the output instead of your data. It's hard to tell which is text which is not. – Quang Hoang Oct 08 '20 at 15:07
  • Sorry, the reason I couldn't share the actual df is that there was a lot of personal information in it. – Jameson Oct 08 '20 at 15:45
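One plausible reason every isinstance check returns False (an assumption, since the real data wasn't shared) is that the values were read from a file, so every cell, including the digit-only ones, is stored as a Python str. A quick diagnostic is to map each cell to the name of its type; when a column genuinely holds mixed ints and strings, the same line also produces the "int or str" third column the question asks for. The CSV text below is a hypothetical stand-in:

```python
import pandas as pd
from io import StringIO

# Hypothetical CSV mirroring the question's sample data.
csv_text = """Index;column_of_interest
1;21678849
2;37464859
3;barbara
4;28394821
5;francis"""

df = pd.read_csv(StringIO(csv_text), sep=";")

# Record the Python type actually stored in each cell of the object column.
df["stored_type"] = df["column_of_interest"].map(lambda x: type(x).__name__)
print(df)
```

If stored_type shows str on every row, type checks cannot separate the rows; you need a content check instead, such as trying int() or testing for digits.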

Okay, this works and I tested it:

import pandas as pd

#----------------------------------------
# Prepare the data in df.
#----------------------------------------

from io import StringIO

TESTDATA = StringIO("""Index;column_of_interest
1;21678849
2;37464859
3;barbara
4;28394821
5;francis""")

df = pd.read_csv(TESTDATA, sep=";")

#----------------------------------------
# The actual code to solve the problem.
#----------------------------------------

def is_integer(x):
    try:
        int(x)
        return True
    except ValueError:
        return False

print([is_integer(x) for x in df['column_of_interest']])

The output is:

[True, True, False, True, False]

Of course some of the code doesn't apply to you, but I wanted a full working example which I (and others) could actually test. I assume you can pick out what you need from it.

The code to test for integerness was taken from https://stackoverflow.com/a/1267145/1629102.

And finally, here is code that adds the result as a new column:

import pandas as pd

#----------------------------------------
# Prepare the data in df.
#----------------------------------------

from io import StringIO

TESTDATA = StringIO("""Index;column_of_interest
1;21678849
2;37464859
3;barbara
4;28394821
5;francis""")

df = pd.read_csv(TESTDATA, sep=";")

#----------------------------------------
# The actual code to solve the problem.
#----------------------------------------

def is_integer(x):
    try:
        int(x)
        return True
    except ValueError:
        return False

is_integer_list = [is_integer(x) for x in df['column_of_interest']]

df["Is_integer"] = is_integer_list

print(df)

with this output:

   Index column_of_interest  Is_integer
0      1           21678849        True
1      2           37464859        True
2      3            barbara       False
3      4           28394821        True
4      5            francis       False
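The same check can be attached with Series.apply instead of building an intermediate list. This sketch makes two changes beyond the answer, both assumptions: the except clause also catches TypeError, so a None in the column yields False instead of raising, and the sample data (including the None) is hypothetical:

```python
import pandas as pd

def is_integer(x):
    """Return True if int() accepts x, else False."""
    try:
        int(x)
        return True
    except (ValueError, TypeError):
        # ValueError: non-numeric strings such as "barbara", or a float NaN.
        # TypeError: values int() cannot take at all, such as None.
        return False

# Hypothetical sample: strings as read_csv would produce them, plus a None.
df = pd.DataFrame(
    {"column_of_interest": ["21678849", "37464859", "barbara", "28394821", "francis", None]}
)

# Series.apply calls is_integer on each element and returns a boolean Series.
df["Is_integer"] = df["column_of_interest"].apply(is_integer)
print(df)
```

Note that int() also accepts floats (int(3.7) is 3) and strings with a sign or surrounding whitespace such as " -12", so those count as integers under this check.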
Jesper
  • You're welcome, Jameson! And thanks for prompting me to learn a little bit about pandas! I heard it mentioned many times, but this was my first time actually trying it. – Jesper Oct 08 '20 at 15:46

Try this:

[True if x in [1,2,3,4,5,6,7,8,9,0] else False for x in df['column_of_interest']]
Shradha
  • Ok, so the code runs. Thank you! Unfortunately, the outputs are all False. – Jameson Oct 08 '20 at 14:54
  • That code does not work. It tests whether x is a single-digit number, i.e. between 0 and 9. x.is_integer() likely works better, cf. my answer. – Jesper Oct 08 '20 at 15:00
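As the comment above points out, x in [1, 2, ..., 0] compares the whole value against single digits, and a string never equals an int. A sketch of the intended digit test, applied to the whole column, converts every cell to str first so the .str accessor also works on genuine ints. It assumes .astype(str) runs on your data; in many pandas versions the result still reports dtype object, which may be why the conversion looked like it did nothing. The Series below is hypothetical:

```python
import pandas as pd

# Hypothetical mixed column: real ints and strings in one object column.
s = pd.Series([21678849, 37464859, "barbara", 28394821, "francis"], dtype=object)

# Convert every cell to str so .str methods apply to the ints too,
# then test whether every character is a digit.
is_digits = s.astype(str).str.isdigit()
print(is_digits.tolist())  # [True, True, False, True, False]
```

str.isdigit returns False for "-5" and "3.0", and True for some non-ASCII digit characters such as "²".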

I admit I don't know pandas, but from reading about it I boldly suggest using

x.is_integer()

instead of

x.isin([1,2,3,4,5,6,7,8,9,0])

So the code would be

[x.is_integer() for x in df['column_of_interest']]
Jesper
  • Thanks for the offer, but it doesn't work. The error is "AttributeError: 'str' object has no attribute 'is_integer'". – Jameson Oct 08 '20 at 15:01
  • Ah, x is a string (type str). Then you could check whether it is an integer with `bool(re.match(r"^\d+$", x))` and `import re` earlier in the code. – Jesper Oct 08 '20 at 15:09
  • Well, maybe I should just back out, as I don't know enough about pandas and shouldn't suggest too much that turns out not to work! – Jesper Oct 08 '20 at 15:16
  • Okay, I didn't give up after all. Instead I read further about pandas and came up with an [answer](https://stackoverflow.com/a/64265784/1629102) that I actually tested and actually works. – Jesper Oct 08 '20 at 15:40
  • Hahahah I love the enthusiasm! – Jameson Oct 08 '20 at 15:44
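The regex idea from the comments can be applied to the whole column at once with pandas' str.fullmatch rather than a loop over re.match. This is a sketch with a hypothetical sample; Series.str.fullmatch needs pandas 1.1 or later:

```python
import pandas as pd

# Hypothetical sample matching the question.
df = pd.DataFrame(
    {"column_of_interest": ["21678849", "37464859", "barbara", "28394821", "francis"]}
)

# fullmatch requires the entire string to match: one or more ASCII digits.
# astype(str) guards against any genuine ints in the column.
df["Is_integer"] = df["column_of_interest"].astype(str).str.fullmatch(r"[0-9]+")
print(df)
```

fullmatch closes a small gap in re.match(r"^\d+$", x): there, $ also matches just before a trailing newline, so "123\n" would pass. Using [0-9] instead of \d keeps the test to ASCII digits, since \d matches any Unicode decimal digit in a str pattern.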