I had a set of parameters that I manually (I want it manual) fitted with the pseudo-inverse using PolynomialFeatures:
poly_feat = PolynomialFeatures(degree=Degree_mdl)
Kern_train = poly_feat.fit_transform(X_train)
c_pinv = np.dot(np.linalg.pinv( Kern_train ),Y_train)
then I had some multivariate polynomial that I simplified with the help of sympy by using the poly class and the coeffs()
function. Though, the coeffs function says it returns the non-zero coeffs in lex order. Thus, I just wanted to know how I can get the order of PolynomialFeatures to match the above so that I can compare coefficients term by term if I needed.
Does someone know how to match the order of the two so that such comparison is possible?
I do know what lexicographical order means from looking a mathematica documentation which I think makes intuitive sense to me (basically the lexicographical order is done by the degree of the monomial so xy x^2 and y^2 all have same order and are "larger" than any degree one term like x or y). However, the detail I think it boils down is to find out how sympy vs PolynomialFeatures order things. Sympy says it orders lexicographically but when I inspect my polynomials it does not seem to obey the order I would have expected (while PolyFeatures does obey some order but then breaks dies in some unknown way to me when it has multiple terms of the same order like x^2y, xy^2, y^2). So this is what came out when I inspected sympy:
(Pdb) s_expr
Poly(-4.92243832500572e-13*x1**3 - 3.86418002630562e-13*x1**2*x2 - 284.848327636719*x1**2 - 1.97301728991142e-13*x1*x2**2 - 11.1939144134521*x1*x2 + 66.1333587984857*x1 - 1.35329085177577e-13*x2**3 - 108.171173095703*x2**2 + 28.227414137076*x2 - 11.0110442095318, x1, x2, domain='RR')
(Pdb) s_expr.coeffs()
[-4.92243832500572e-13, -3.86418002630562e-13, -284.848327636719, -1.97301728991142e-13, -11.1939144134521, 66.1333587984857, -1.35329085177577e-13, -108.171173095703, 28.2274141370760, -11.0110442095318]
(Pdb) s_expr.coeffs()[::-1]
[-11.0110442095318, 28.2274141370760, -108.171173095703, -1.35329085177577e-13, 66.1333587984857, -11.1939144134521, -1.97301728991142e-13, -284.848327636719, -3.86418002630562e-13, -4.92243832500572e-13]
and this is what came out when I inspected PolynomailFeatures:
>>> xx
array([[2, 3]])
>>> poly_feat.fit_transform(xx)
array([[ 1., 2., 3., 4., 6., 9., 8., 12., 18., 27.]])
# maps to the following ordering:
## [1,x1,x2,x1^2,x1x2,x2^2,x1^3,x1^2x2,x1x2^2,x2^3]
so right now I am looking and these and wondering how I can make them have the exact same ordering including when monomials have the same order. Any ideas would be really helpful.
I've taken a look at their source code but I have not been able to digest it fully to understand whats going on (specially on the sympy side). Any help is appreciated!
I only worked examples with degree 3 and 2 input dimensions but it would be nice for it to work for arbitrary inputs and degree.
Bounty section: making 3 dimension with degree 3 work (and above I hope)
I've tried to make the coefficients match for for dimension 3 and for degree 3 but they don't match for some reasons. It doesn't seem PolyFeatures is using either of grevlex
,grlex
if anyone has any idea how to make it would I'd love to hear it. I made the coefficients of the polinomial match what the value should be for that monomial if the inputs where [x3,x2,x1] = [5,3,2]
so for example the coefficient for x3**2
has a coefficient of 25
or if x2*x3**2
has a coefficient of 75
. So the output I get is:
x_poly_feat_list = [1, 2, 3, 5, 4, 6, 10, 9, 15, 25, 8, 12, 20, 18, 30, 50, 27, 45, 75, 125]
poly = Poly(125*x3**3 + 75*x3**2*x2 + 50*x3**2*x1 + 25*x3**2 + 45*x3*x2**2 + 30*x3*x2*x1 + 15*x3*x2 + 20*x3*x1**2 + 10*x3*x1 + 5*x3 + 27*x2**3 + 18*x2**2*x1 + 9*x2**2 + 12*x2*x1**2 + 6*x2*x1 + 3*x2 + 8*x1**3 + 4*x1**2 + 2*x1 + 1, x3, x2, x1, domain='ZZ')
c_grevlex = [1, 2, 3, 5, 4, 6, 10, 9, 15, 25, 8, 12, 20, 18, 30, 50, 27, 45, 75, 125]
c_grlex = [1, 2, 3, 5, 4, 6, 9, 10, 15, 25, 8, 12, 18, 27, 20, 30, 45, 50, 75, 125]
len(c_grlex) 20
len(c_grevlex) 20
len(x_poly_feat_list) 20
all_match_grlex = False
all_match_grevlex = False
which means that it doesn't match.
full code:
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
from sympy import *
# nb monomials (n+d,d), d=degree, n=# of inputs
def check(n,d,user_array=None):
if user_array is None:
x = np.arange(2,2+n).reshape(1,n) # e.g. array([[2, 3]])
else:
x = user_array.reshape(1,n)
#x = np.arange(2,2+n).reshape(1,n) # e.g. array([[2, 3]])
print('x = ', x)
##
poly_feat = PolynomialFeatures(d)
x_poly_feat = poly_feat.fit_transform(x)
##
x_poly_feat_list = [ int(i) for i in x_poly_feat[0]]
#print('x_poly_feat = ', x_poly_feat)
#print('x_poly_feat = ', list(x_poly_feat[0]))
print('x_poly_feat_list = ', x_poly_feat_list)
return x_poly_feat_list
def check_sympy_degree():
x3,x2,x1 = symbols('x3 x2 x1')
poly = Poly( 125*x3**3 + 75*x2*x3**2 + 45*x2**2*x3 + 27*x2**3 + 50*x1*x3**2 + 30*x1*x2*x3 + 18*x1*x2**2 + 20*x1**2*x3 + 12*x1**2*x2
+ 8*x1**3 + 25*x3**2 + 15*x2*x3 + 9*x2**2 + 10*x1*x3 + 6*x1*x2 + 4*x1**2 + 2*x1 + 3*x2 + 5*x3 + 1,(x3,x2,x1) )
c_grevlex = poly.coeffs(order='grevlex')
c_grlex = poly.coeffs(order='grlex')
print('poly = ',poly)
print('c_grevlex = ', c_grevlex[::-1])
print('c_grlex = ', c_grlex[::-1])
return c_grlex, c_grevlex
if __name__ == '__main__':
#check(n=2,d=3)
##
x_poly_feat_list = check(n=3,d=3,user_array=np.array([2,3,5]))
##
c_grlex, c_grevlex = check_sympy_degree()
print('len(c_grlex)',len(c_grlex))
print('len(c_grevlex)',len(c_grevlex))
print('len(x_poly_feat_list)',len(x_poly_feat_list))
all_match_grlex = all( c_grlex[i] == x_poly_feat_list for i in range(len(x_poly_feat_list)) )
all_match_grevlex = all( c_grevlex[i] == x_poly_feat_list for i in range(len(x_poly_feat_list)) )
print('all_match_grlex = ',all_match_grlex)
print('all_match_grevlex = ',all_match_grevlex)