
I am new to Julia and I have a Python function that I want to use in Julia. Basically, the function accepts a dataframe (passed as a NumPy ndarray), a filter value and a list of column indices (into the array), and runs a logistic regression using the statsmodels package in Python. So far I have tried this:

using PyCall

py"""
import pandas as pd
import numpy as np
import random
import statsmodels.api as sm
import itertools
def reg_frac(state, ind_vars):
    rows = 2000
    total_rows = rows*13
    data = pd.DataFrame({
        'state': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm']*rows,
        'y_var': [random.uniform(0, 1) for i in range(total_rows)],
        'school': [random.uniform(0, 10) for i in range(total_rows)],
        'church': [random.uniform(11, 20) for i in range(total_rows)]}).to_numpy()
    try:
        X, y = sm.add_constant(np.array(data[(data[:,0] == state)][:,ind_vars], dtype=float)), np.array(data[(data[:,0] == state), 1], dtype=float)
        model = sm.Logit(y, X).fit(cov_type='HC0', disp=False)      
        rmse = np.sqrt(np.square(np.subtract(y, model.predict(X))).mean())
    except:
        rmse = np.nan
    return [state, ind_vars, rmse] 
"""

reg_frac(state, ind_vars) = (py"reg_frac"(state::Char, ind_vars::Array{Any}))

However, when I run this, the result is NaN, which I don't expect. I think it is working, but I am missing something.

reg_frac('b', Any[i for i in 2:3])

  0.000244 seconds (249 allocations: 7.953 KiB)
3-element Array{Any,1}:
    'b'
    [2, 3]
 NaN

Any help is appreciated.

Przemyslaw Szufel
Kay
  • Does the code work in Python (without calling it from Julia)? You've added an `except` clause that sets `rmse` to `np.nan`, so it wouldn't be too surprising if it ended up being NaN. Also, any reason you don't just fit the logit model in Julia? – Nils Gudat Sep 01 '20 at 12:45
  • Yes, the code works in Python. The model is just an example; I have the model in Julia. I just want to be able to import Python functions as part of my Julia journey. – Kay Sep 01 '20 at 12:55
  • @PrzemyslawSzufel It works in Python. I just ran it and it works. Are you sure you ran it correctly? Just run `reg_frac('b',[2,3])`. This was my answer: `['b', [2, 3], 0.28999238875117006]` – Kay Sep 02 '20 at 02:15
  • It does not work. And it can't work because there are several variables undefined in your code such as `rows` or `total_rows`. – Przemyslaw Szufel Sep 02 '20 at 10:08
  • @PrzemyslawSzufel you are right. My bad, I had those variables already loaded. I have updated the post – Kay Sep 02 '20 at 15:18
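As the first comment points out, the bare `except:` turns every failure into NaN, so the NaN by itself says nothing about what went wrong. A minimal stdlib-only sketch of the difference, assuming the failure is an empty row selection; `rmse_silent` and `rmse_reported` are hypothetical names, and `statistics.mean` stands in for the statsmodels fit:

```python
import math
import statistics


def rmse_silent(y, y_hat):
    """Mirrors the question's pattern: any error at all becomes NaN."""
    try:
        return math.sqrt(statistics.mean((a - b) ** 2 for a, b in zip(y, y_hat)))
    except:  # bare on purpose, as in the question
        return math.nan


def rmse_reported(y, y_hat):
    """Same computation, but hands back the exception alongside NaN."""
    try:
        return math.sqrt(statistics.mean((a - b) ** 2 for a, b in zip(y, y_hat))), None
    except Exception as err:  # narrower than a bare except, and keeps the cause
        return math.nan, err


# An empty selection (e.g. a filter that matched no rows) makes mean() raise.
print(rmse_silent([], []))  # nan
value, err = rmse_reported([], [])
print(value, repr(err))
```

Returning (or logging) the exception while debugging shows immediately whether the NaN comes from the data selection or from the model itself.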

1 Answer

In your Python code you compare against `str` values, while in your Julia code you pass a `Char`; they are not the same.

Python:

>>> type('a')
<class 'str'>

Julia:

julia> typeof('a')
Char
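The effect of this mismatch on the row filter can be reproduced in plain NumPy. This sketch assumes, as the diagnosis implies, that the Julia `Char` does not reach Python as a `str`; `NotAStr` is a hypothetical stand-in for whatever object it becomes. The small array mimics the object-dtype array that `DataFrame.to_numpy()` returns for mixed-type columns:

```python
import numpy as np

# Object-dtype array like the one DataFrame.to_numpy() gives for mixed columns:
# state, y_var, school, church
data = np.array([["a", 0.1, 1.0, 12.0],
                 ["b", 0.7, 2.0, 15.0],
                 ["b", 0.4, 3.0, 18.0]], dtype=object)


class NotAStr:
    """Hypothetical stand-in for a value that arrived in Python as something other than str."""


mask_str = data[:, 0] == "b"          # element-wise str comparison
mask_other = data[:, 0] == NotAStr()  # no element equals the foreign object

print(mask_str)    # [False  True  True]
print(mask_other)  # [False False False]

# With an all-False mask the selection is empty, so X and y have zero rows;
# presumably the Logit fit then fails and the bare except returns NaN.
print(data[mask_other][:, [2, 3]].shape)  # (0, 2)
```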

Hence your comparisons do not work. Your function could look like this:

reg_frac(state, ind_vars) = (py"reg_frac"(state::String, ind_vars::Array{Any}))

And now:

julia> reg_frac("b", Any[i for i in 2:3])
3-element Array{Any,1}:
  "b"
  [2, 3]
 0.2853707270515166

However, I recommend using `Vector{Float64}`, which PyCall converts on the fly into a NumPy vector, rather than `Vector{Any}`, so your code could still be improved (depending on what you are actually planning to do).
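If the function is going to be called many times, one improvement that does not depend on the Julia types: as written, every call regenerates the whole 26,000-row table and rescans every row to filter by state. A minimal sketch, assuming the data can be built once up front; `build_state_index` and `select_rows` are hypothetical helpers, and the statsmodels fit from the question is left out:

```python
import numpy as np


def build_state_index(states):
    """Map each state label to the row positions where it occurs (computed once)."""
    positions = {}
    for row, label in enumerate(states):
        positions.setdefault(label, []).append(row)
    return {label: np.asarray(rows) for label, rows in positions.items()}


def select_rows(data, state_index, state, ind_vars):
    """Return (X, y) as float arrays for one state without rescanning the table."""
    rows = state_index.get(state, np.empty(0, dtype=int))
    X = np.asarray(data[np.ix_(rows, ind_vars)], dtype=float)
    y = np.asarray(data[rows, 1], dtype=float)
    return X, y


# Build the table and the index once, outside the loop over combinations.
data = np.array([["a", 0.1, 1.0, 12.0],
                 ["b", 0.7, 2.0, 15.0],
                 ["b", 0.4, 3.0, 18.0]], dtype=object)
state_index = build_state_index(data[:, 0])

X, y = select_rows(data, state_index, "b", [2, 3])
print(X.shape, y.tolist())
```

The per-call work then shrinks to slicing the precomputed rows and fitting the model.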

Przemyslaw Szufel
  • Oh wow, you made my day! I am actually going to be looping through this for millions of combinations of `state` and `ind_vars`, so any performance improvements will be very helpful. I am interested in where to use the `Vector{Float64}`. Again, how best do I restructure my code for performance/speed? – Kay Sep 03 '20 at 16:16
  • Do you mean `reg_frac(state, ind_vars) = (py"reg_frac"(state::String, ind_vars::Vector{Float64}))`? – Kay Sep 03 '20 at 16:35
  • Yes, something like that. You could also write `reg_frac(state::String, ind_vars::Vector{Float64}) = (py"reg_frac"(state, ind_vars))`, which is what is normally done. – Przemyslaw Szufel Sep 03 '20 at 18:20
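One caveat for `ind_vars` specifically: it holds column indices, and NumPy only accepts integer (or boolean) arrays as indices. Since PyCall passes numeric Julia vectors to Python as NumPy arrays (as noted above for `Vector{Float64}`), a `Vector{Float64}` of indices would presumably arrive as a float array; the indexing would then raise `IndexError`, and the bare `except` would once more return NaN. An integer vector (for example `Vector{Int}`) looks like the safer type for the indices, with `Vector{Float64}` kept for numeric data. A small NumPy check, with the arrays built directly in Python as stand-ins for what PyCall would pass:

```python
import numpy as np

data = np.array([["b", 0.7, 2.0, 15.0],
                 ["b", 0.4, 3.0, 18.0]], dtype=object)

int_cols = np.array([2, 3])        # stand-in for an integer index vector
float_cols = np.array([2.0, 3.0])  # stand-in for a Vector{Float64} of indices

# Integer indices select the columns as expected.
X = np.array(data[:, int_cols], dtype=float)

# Float indices are rejected by NumPy.
try:
    data[:, float_cols]
    float_error = None
except IndexError as err:
    float_error = err

print(X)
print(repr(float_error))
```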