This is the raw distribution of the variable FREQUENCY (output of value_counts, including NaN):
NaN 22131161
1.0 4182626
7.0 218343
3.0 145863
1 59432
0.0 29906
2.0 28129
4.0 15237
5.0 4553
8.0 3617
3 2754
7 2635
9.0 633
2 584
4 276
0 112
8 51
5 42
6.0 19
A 9
I 7
9 6
Q 3
Y 2
X 2
Z 1
C 1
N 1
G 1
B 1
Name: FREQUENCY, dtype: int64
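
A note on reading the output above: the "dtype: int64" on its last line most likely describes the counts, not the FREQUENCY column itself. Seeing both 1.0 and 1 as separate groups suggests the column has object dtype and holds a mix of floats and strings. The sketch below uses a small made-up Series (the values are hypothetical, not the real data) to show how the two dtypes could be checked separately.

```python
import pandas as pd
import numpy as np

# Hypothetical toy column mixing floats, strings, and NaN (assumed, not the real data)
s = pd.Series([1.0, 1.0, "1", "A", np.nan, 7.0, "7"], name="FREQUENCY")

# Counts per distinct value, keeping NaN as its own group
counts = s.value_counts(dropna=False)

print(s.dtype)       # dtype of the column itself
print(counts.dtype)  # dtype of the counts shown in the value_counts output

# Python type of each stored value, to confirm the float/str mix
print(s.map(type).value_counts())
```

If the column dtype prints as object while the counts dtype prints as int64, the letters and the int64 label are not in conflict.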
- Group 1.0 should be the same as group 1. I wrote df['x'] = df['x'].replace({'1.0': '1'}), but it does not change anything. 9.0 vs. 9 and 3.0 vs. 3 show the same symptom.
- How can FREQUENCY be reported as int64 when letters are present?
- Desired outcome 1: group all letter groups plus NaN into one group, and consolidate the remaining numeric groups (1.0 and 1 both become 1, for example). In SAS, I just run y=1*X and then give a value of 10 to represent the character groups plus NaN. How can I do this in Python, ideally elegantly?
- Desired outcome 2: create a binary variable z that equals 1 if x is NaN and 0 otherwise.
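
One possible way to get both outcomes is sketched below, assuming the column holds a mix of floats, digit strings, letters, and NaN. The toy DataFrame and the column names x, y, and z are illustrative only. Under that assumption, the replace call probably fails because the 1.0 entries are floats rather than the string '1.0', so the dictionary key never matches. pd.to_numeric with errors="coerce" plays roughly the role of y=1*X in SAS: digit strings and floats become numbers, and letters become NaN.

```python
import pandas as pd
import numpy as np

# Hypothetical toy data standing in for the real column (assumed values)
df = pd.DataFrame({"x": [1.0, "1", 3.0, "3", 9.0, "9", "A", "Q", np.nan, 0.0]})

# Outcome 2: flag the original missing values before anything is overwritten
df["z"] = df["x"].isna().astype(int)

# Outcome 1: coerce everything to numbers; letters cannot be parsed and become NaN
num = pd.to_numeric(df["x"], errors="coerce")

# Send letters and NaN to the catch-all code 10, then store as integers
df["y"] = num.fillna(10).astype(int)

print(df["y"].value_counts().sort_index())
print(df["z"].value_counts())
```

Computing z before the coercion matters: after to_numeric, letters and true NaN are indistinguishable, so the flag would otherwise also pick up the letter groups.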