What is the difference between "t = V1+' ~ '+V2" and "t = V1+' ~ '+V2"? I am getting the error "invalid non-printable character U+00A0" with one of them

Question

I am trying to run a statistical test; below is the code that raises the error.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy import stats

# Assumes `data` is a pandas DataFrame and `V1`, `V2` are column names in it.
# Every space in this code must be a plain U+0020 space, not U+00A0.
numeric_types = ('float64', 'int64')

if data[V1].dtypes in numeric_types:
    if data[V2].dtypes in numeric_types:
        # Numeric vs numeric: Pearson correlation
        corre = data[V1].corr(data[V2])
        print('Correlation between', V1, 'and', V2, 'is', round(corre, 2))
    elif data[V2].dtypes == 'object':
        # Numeric V1 vs categorical V2: ANOVA with V1 as the response
        t = V1 + ' ~ ' + V2
        model = ols(t, data=data).fit()
        anovres = sm.stats.anova_lm(model, typ=2)
        print(anovres)
    else:
        print('invalid type')
elif data[V1].dtypes == 'object':
    if data[V2].dtypes in numeric_types:
        # Categorical V1 vs numeric V2: the numeric column must be the response
        t = V2 + ' ~ ' + V1
        model = ols(t, data=data).fit()
        anovres = sm.stats.anova_lm(model, typ=2)
        print(anovres)
    elif data[V2].dtypes == 'object':
        # Categorical vs categorical: chi-square test of independence
        data_table = pd.crosstab(data[V1], data[V2])
        Observed_Values = data_table.values
        Expected_Values = stats.chi2_contingency(data_table)[3]
        no_of_rows, no_of_columns = data_table.shape  # full table, not a fixed slice
        ddof = (no_of_rows - 1) * (no_of_columns - 1)
        alpha = 0.05
        # Sum over every cell so tables with more than two columns are handled too
        chi_square_statistic = (((Observed_Values - Expected_Values) ** 2) / Expected_Values).sum()
        p_value = 1 - stats.chi2.cdf(x=chi_square_statistic, df=ddof)
        print('p-value:', p_value)
        print('significance level:', alpha)
        print('degree of freedom:', ddof)
        if p_value <= alpha:
            print('Reject H0: there is a relationship between', V1, 'and', V2)
        else:
            print('Fail to reject H0: no evidence of a relationship between', V1, 'and', V2)
    else:
        print('invalid type')
else:
    print('invalid type')

With one dataset, the code above raises the error on line 8 (`t = V1+' ~ '+V2`); when I replace it with another dataset, I get the expected output.

You seem to have added a [non-breaking space](https://www.compart.com/en/unicode/U+00A0) character to your code. Remove it. PS: In the future, don't show error messages as screenshots. They're text. Give us the text. — gspr, Dec 05 '22 at 17:20
You don't need all of that code to reproduce the problem. When the error complains about an unprintable Unicode character, you can search for something like "unicode U+00A0" to find out what that character is. It's a non-breaking space, which looks a lot like a regular space but carries some extra display rules. The Python parser doesn't recognize this character as a space, so parsing fails. You've used some sort of text editor that inserted that NBSP, so don't use that tool any more. Or maybe you pasted some HTML. The solution is to delete the NBSP and type a regular space with the space bar. — tdelaney, Dec 05 '22 at 17:25
I think it's hilarious that a non-breaking space does in fact break your code. SCNR. — Robert, Dec 05 '22 at 17:29
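The lookup tdelaney describes can also be done from Python itself, which helps when the offending character is invisible in your editor. This is a minimal sketch, not a standard tool: `find_suspicious_chars` is a hypothetical helper built on the standard-library `unicodedata.name`, and it deliberately flags every non-ASCII character, including legitimate ones inside strings or comments.

```python
import unicodedata


def find_suspicious_chars(source: str):
    """Return (line, column, codepoint, name) for every non-ASCII character.

    Line and column numbers are 1-based, like the ones in Python tracebacks.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # anything outside plain ASCII is worth a look
                hits.append((lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical two-line snippet: the second line has U+00A0 right after 't'
snippet = "t = V1+' ~ '+V2\nt\u00a0= V1+' ~ '+V2"
for hit in find_suspicious_chars(snippet):
    print(hit)  # (2, 2, 'U+00A0', 'NO-BREAK SPACE')
```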

score 0 · Answer 1 · answered Dec 05 '22 at 17:37

The error message "invalid non-printable character U+00A0" says it all. U+00A0 is the Unicode non-breaking space character, which is used to control how spaces are displayed (for example, to prevent a line break between two words). A regular space character is U+0020. Python's parser doesn't recognize U+00A0 as valid whitespace, so parsing breaks.
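To see the distinction concretely, the sketch below feeds small snippets to the built-in `compile()` and reports whether each one parses. The helper name `parses` and the sample strings are illustrative assumptions, and the exact `SyntaxError` wording varies between Python versions. Note that a U+00A0 inside a string literal is just data and parses fine; the error appears only when it sits where the tokenizer expects whitespace.

```python
def parses(source: str) -> bool:
    """Return True if Python's compiler accepts `source`; print the message on SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")  # parse and compile only; nothing is executed
    except SyntaxError as err:
        print("SyntaxError:", err.msg)
        return False
    return True


regular = "t = V1 + ' ~ ' + V2"                   # only U+0020 spaces
with_nbsp = "t\u00a0= V1 + ' ~ ' + V2"            # U+00A0 between 't' and '='
inside_literal = "t = V1 + '\u00a0~\u00a0' + V2"  # U+00A0 only inside the quotes

print(ord(" "), hex(ord(" ")))  # 32 0x20
print(parses(regular))          # True
print(parses(with_nbsp))        # prints the SyntaxError message, then False
print(parses(inside_literal))   # True: inside a string it is just data
```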

The solution is to delete that character and use a regular space instead.
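If the NBSP is hard to spot, or there are several of them, a short script can do the replacement. This is a minimal sketch assuming the code lives in a plain UTF-8 `.py` file (a Jupyter `.ipynb` file is JSON and may store the character as a `\u00a0` escape, which this would miss); `replace_nbsp` and `clean_file` are hypothetical names. It replaces every U+00A0, including any inside string literals or comments, so review the result if you use non-breaking spaces on purpose.

```python
from pathlib import Path

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def replace_nbsp(text: str):
    """Return (cleaned_text, count): every U+00A0 swapped for a plain U+0020 space."""
    return text.replace(NBSP, " "), text.count(NBSP)


def clean_file(path: str) -> int:
    """Rewrite a UTF-8 text file in place without NBSPs; return how many were replaced."""
    p = Path(path)
    cleaned, count = replace_nbsp(p.read_text(encoding="utf-8"))
    if count:  # only touch the file when something actually changed
        p.write_text(cleaned, encoding="utf-8")
    return count
```

For example, `clean_file("analysis.py")` (a hypothetical file name) returns the number of characters it replaced; running it a second time should return 0.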
