1

I have a lot of strings, some of which consist of 1 sentence and some consisting of multiple sentences. My goal is to determine which one-sentence strings end with an exclamation mark '!'.

My code gives a strange result. Instead of putting 1 in the column when a match is found, it shows 1.0. I have tried return int(1), but that does not help. I am fairly new to coding and do not understand why this happens. How can I get 1 as an integer?

'Sentences'                                                                        
0  [This is a string., And a great one!]      
1  [It's a wonderful sentence!]
2  [This is yet another string!]
3  [Strange strings have been written.]                
4  etc. etc.                                  

e = df['Sentences']

def Single(s):
    if len(s) == 1: # Select the items with only one sentence
        count = 0
        for k in s: # loop over every sentence
            if k[-1] == '!': # check if sentence ends with '!'
                count = count + 1
        if count == 1:
            return 1
        else:
            return ''

df['Single'] = e.apply(Single)

This returns the correct result, except that there should be '1' instead of '1.0'.

'Single'                                                                        
0  NaN
1  1.0
2  1.0
3                                  
4  etc. etc.  

Why does this happen?

jpp
twhale
  • Possible duplicate of [Convert floats to ints in Pandas?](https://stackoverflow.com/questions/21291259/convert-floats-to-ints-in-pandas) – mkrieger1 May 08 '18 at 11:37
  • 1
    Your function IS returning `1` - but at no point are you actually looking at the return value; you're only seeing the value as retrieved from your dataframe. I'm not familiar with Pandas (I assume that's where `df` is coming from), but I guess it decided that floats were the most appropriate datatype for representing a mixture of numbers and empty strings. – jasonharper May 08 '18 at 11:38
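The point made in the comment above can be checked directly. The following is a minimal sketch, not the asker's exact code: `single` is a hypothetical simplified variant that returns `np.nan` explicitly for non-matches, and the two-row series is made up for illustration.

```python
import numpy as np
import pandas as pd

def single(s):
    """Hypothetical variant: 1 for a one-sentence list ending in '!', else NaN."""
    if len(s) == 1 and s[0].endswith('!'):
        return 1
    return np.nan

# Calling the function directly gives a plain Python int.
value = single(["It's a wonderful sentence!"])
print(type(value))   # <class 'int'>

# Stored in a Series next to NaN, the same value is held as a float.
sentences = pd.Series([
    ["This is a string.", "And a great one!"],
    ["It's a wonderful sentence!"],
])
result = sentences.apply(single)
print(result.dtype)  # float64
```

So the conversion happens when pandas picks a dtype for the column, not inside the function.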

2 Answers

2

The reason is that np.nan is a float, so a series containing it becomes float. With the default NumPy-backed dtypes you cannot avoid this unless you make your column of type object (i.e. anything). That is inefficient and inadvisable, so I will not show how to do it.
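As a side note not covered in this answer: newer pandas releases (0.24 and later) also provide a nullable integer dtype, `"Int64"` (capital I), which can hold integers alongside missing values. A minimal sketch, assuming such a version is installed:

```python
import numpy as np
import pandas as pd

# np.nan itself is a Python float, which is why it drags the column to float64.
print(type(np.nan))  # <class 'float'>

s = pd.Series([1, np.nan, 2, 3])
print(s.dtype)       # float64

# Assumption: pandas >= 0.24. "Int64" keeps the integers and marks the gap as <NA>.
nullable = s.astype("Int64")
print(nullable.dtype)  # Int64
```

Whether this fits depends on what the missing entries should mean downstream; it keeps them missing rather than replacing them with a number.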

If there is an alternative value you can use instead of np.nan, e.g. 0, then there is a workaround. You can replace NaN values with 0 and then convert to int:

import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 2, 3])

print(s)
# 0    1.0
# 1    NaN
# 2    2.0
# 3    3.0
# dtype: float64

s = s.fillna(0).astype(int)

print(s)
# 0    1
# 1    0
# 2    2
# 3    3
# dtype: int32
jpp
0

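Use astype(int). Note that this raises an error if the column still contains NaN or non-numeric values such as '', so those need to be replaced first.

Example: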

df['Single'] = e.apply(Single).astype(int)
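For completeness, here is a sketch of how this could look on the data from the question. It assumes that 0 is an acceptable value for "no match" (the question used '' and an implicit None instead), and `single` is a hypothetical rewrite of the asker's function, not their original code.

```python
import pandas as pd

df = pd.DataFrame({
    "Sentences": [
        ["This is a string.", "And a great one!"],
        ["It's a wonderful sentence!"],
        ["This is yet another string!"],
        ["Strange strings have been written."],
    ]
})

def single(s):
    # Assumption: return 0 (not '' or None) when the row does not match,
    # so every result is an integer and no NaN ever enters the column.
    return int(len(s) == 1 and s[0].endswith("!"))

df["Single"] = df["Sentences"].apply(single).astype(int)
print(df["Single"].tolist())  # [0, 1, 1, 0]
```

Because the function never returns a missing value, the column stays an integer column and the 1.0 display goes away.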
Rakesh