
So I was trying out a manual gradient descent on data with large values and got "RuntimeWarning: overflow encountered in scalar power".

I use this dataset from Kaggle to calculate land prices (X = LT, Y = Harga): https://www.kaggle.com/datasets/wisnuanggara/daftar-harga-rumah

The code I used to load the data into NumPy arrays:

import os
import openpyxl
from openpyxl import Workbook
import numpy as np

wb = openpyxl.load_workbook('DATA RUMAH.xlsx')
ws = wb.active

y_train_data = np.array([])
x_train_data = np.array([])

def get_x_train():
    x_train = np.array([])  # Initialize x_train as a local variable
    for x in range(2, 1011):
        data = ws.cell(row=x, column=5).value  # column 5 = LT (land area)
        x_train = np.append(x_train, data)
    return x_train

def get_y_train():
    y_train = np.array([])  # Initialize y_train as a local variable
    for y in range(2, 1011):
        data = ws.cell(row=y, column=3).value  # column 3 = Harga (price)
        y_train = np.append(y_train, data)
    return y_train
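
As an aside, np.append inside a loop copies the whole array on every iteration, and casting the cells to float64 explicitly also rules out the integer-overflow possibility raised in the comments below. A minimal alternative sketch, assuming the same workbook and columns; load_column is just an illustrative helper name:

import numpy as np
import openpyxl

wb = openpyxl.load_workbook('DATA RUMAH.xlsx')
ws = wb.active

def load_column(column, first_row=2, last_row=1010):
    # Read one worksheet column into a float64 NumPy array in a single pass.
    values = [ws.cell(row=r, column=column).value for r in range(first_row, last_row + 1)]
    return np.array(values, dtype=np.float64)

x_train = load_column(5)  # LT (land area)
y_train = load_column(3)  # Harga (price)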

Full Code

import math, copy
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
from lab_utils_uni import plt_house_x, plt_contour_wgrad, plt_divergence, plt_gradients

# Load our data set
x_train = get_x_train()   #features
y_train = get_y_train()  #target value

#Function to calculate the cost
def compute_cost(x, y, w, b):
   
    m = x.shape[0] 
    cost = 0
    
    for i in range(m):
        f_wb = w * x[i] + b
        cost = cost + (f_wb - y[i])**2
    total_cost = 1 / (2 * m) * cost

    return total_cost

def compute_gradient(x, y, w, b): 
    """
    Computes the gradient for linear regression 
    Args:
      x (ndarray (m,)): Data, m examples 
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters  
    Returns
      dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b     
     """
    
    # Number of training examples
    m = x.shape[0]    
    dj_dw = 0
    dj_db = 0
    
    for i in range(m):  
        f_wb = w * x[i] + b 
        dj_dw_i = (f_wb - y[i]) * x[i] 
        dj_db_i = f_wb - y[i] 
        dj_db += dj_db_i
        dj_dw += dj_dw_i 
    dj_dw = dj_dw / m 
    dj_db = dj_db / m 
        
    return dj_dw, dj_db

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    """
    Performs gradient descent to fit w,b. Updates w,b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
        x (ndarray (m,))  : Data, m examples 
        y (ndarray (m,))  : target values
        w_in,b_in (scalar): initial values of model parameters  
        alpha (float):     Learning rate
        num_iters (int):   number of iterations to run gradient descent
        cost_function:     function to call to produce cost
        gradient_function: function to call to produce gradient
      
    Returns:
        w (scalar): Updated value of parameter after running gradient descent
        b (scalar): Updated value of parameter after running gradient descent
        J_history (List): History of cost values
        p_history (list): History of parameters [w,b] 
    """
    
    # Specify data type as np.float64 for w, b
    w = np.float64(w_in)
    b = np.float64(b_in)
    
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    p_history = []
    
    for i in range(num_iters):
        # Calculate the gradient and update the parameters using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w , b)     

        # Update parameters with the gradient descent update rule
        b = b - alpha * dj_db                            
        w = w - alpha * dj_dw                            

        # Save cost J at each iteration
        J_history.append(cost_function(x, y, w, b))
        p_history.append([w, b])

        # Print cost at 10 evenly spaced intervals (or every iteration if num_iters < 10)
        if i % math.ceil(num_iters/10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e}  ",
                  f"w: {w: 0.3e}, b: {b: 0.5e}")
 
    return w, b, J_history, p_history  # Return w and J,w history for graphing

# Initialize parameters with np.float64 data type
w_init = np.float64(0)
b_init = np.float64(0)

# Some gradient descent settings
iterations = 100000
tmp_alpha = np.float64(1.0e-4)

# Run gradient descent
w_final, b_final, J_hist, p_hist = gradient_descent(x_train, y_train, w_init, b_init, tmp_alpha,
                                                    iterations, compute_cost, compute_gradient)

# Print the result
print(f"(w, b) found by gradient descent: ({w_final:8.4f}, {b_final:8.4f})")

Output:

    RuntimeWarning: overflow encountered in scalar add
  cost = np.float64(cost + (f_wb - y[i])**2)
    RuntimeWarning: overflow encountered in scalar power
  cost = np.float64(cost + (f_wb - y[i])**2)
    RuntimeWarning: overflow encountered in scalar add
  dj_dw += np.float64(dj_dw_i)
    RuntimeWarning: invalid value encountered in scalar subtract
  w = np.float64(w - alpha * dj_dw)

I tried to normalize, but I think it skews the data too much. How do I make it so that gradient descent can process these huge values?
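
(For concreteness, the sketch below shows the kind of feature scaling that can be undone after training, so it does not change the final line; the variable names and hyperparameters are illustrative placeholders, not from my actual attempt.)

import numpy as np

# z-score scale the land-area feature only; the price target stays in rupiah,
# so predictions come out in the original units.
x_mean = x_train.mean()
x_std = x_train.std()
x_scaled = (x_train - x_mean) / x_std

# Placeholder learning rate / iteration count for the scaled problem.
w_s, b_s, J_hist, p_hist = gradient_descent(
    x_scaled, y_train, 0.0, 0.0, 1.0e-1, 1000,
    compute_cost, compute_gradient)

# Convert the learned parameters back to the original (unscaled) feature.
w_orig = w_s / x_std
b_orig = b_s - w_s * x_mean / x_std
print(w_orig, b_orig)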

  • [ask] and [mre] – Julien Aug 02 '23 at 00:48
  • 1
    What does "huge digit" mean, exactly? How huge? A float64 can hold up to 10**308. – Tim Roberts Aug 02 '23 at 01:47
  • @TimRoberts it's 10 digits, but I don't know why it still overflows even though I tried using float64 – Fauzan Anggito Wicakson Aug 02 '23 at 02:09
  • Overflow could also happen when attempting to perform invalid operations, such as taking the square root of a negative number. It could also occur if the parameters for your algo (GD) are set "wrongly", so that the numbers are out of proportion. I would suggest you run your algo with the sklearn default settings to see if your algo works, before tweaking it further. – ripalo Aug 02 '23 at 03:06
  • Are you absolutely sure all your data is floats? If some of your inputs are ints, they DO overflow after 10 digits. Remember that numpy ints are not infinite like Python ints. I don't SEE this happening in your code, but there's stuff we aren't seeing. – Tim Roberts Aug 02 '23 at 04:35
  • @TimRoberts I added the other code I used to read the Excel data into a NumPy array; other than that, you're seeing everything I have so far – Fauzan Anggito Wicakson Aug 02 '23 at 10:36
  • I'm concerned that you do not seem to have done even the most basic debugging. You have a `print` statement that prints the cost every 10 iterations. If you change that to `if 1:`, you'll see that your cost increases exponentially, so by iteration 91, the cost is 2.9E+184, and the other values are around 10**100. Multiplying those does exceed the capacity of a float. Something in your algorithm is diverging. That's what you need to chase. – Tim Roberts Aug 02 '23 at 18:13
  • What are you actually trying to do here? If you're just trying to find a "best fit" line for that data, I can do that in two lines of code. I can post that as an answer plus a plot of the result, if that's what you want. – Tim Roberts Aug 02 '23 at 18:39
  • @TimRoberts Yeah, I was trying to write a manual implementation of linear regression with gradient descent, and also to predict a price from a user-supplied land area. The reason I'm not using a library is that it's for my uni paper, so it has to fully show the workings of the algorithm – Fauzan Anggito Wicakson Aug 03 '23 at 17:15
  • Also, the code did work when I normalized the data, but I got an off result – Fauzan Anggito Wicakson Aug 03 '23 at 17:18
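
As Tim Roberts notes above, the cost is growing exponentially, i.e. the run is diverging. That can be caught programmatically instead of by printing every iteration; a minimal sketch that reuses the J_history list gradient_descent already builds (diverging is an illustrative helper, not part of the original code):

import numpy as np

def diverging(J_history, window=5):
    # True if the recorded cost is no longer finite, or has risen for
    # `window` consecutive iterations (a sign that alpha is too large).
    if J_history and not np.isfinite(J_history[-1]):
        return True
    if len(J_history) < window + 1:
        return False
    recent = J_history[-(window + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))

# Usage inside gradient_descent, right after J_history.append(...):
#     if diverging(J_history):
#         print(f"Cost is rising at iteration {i}; alpha is probably too large.")
#         break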

2 Answers


Here is a linear regression solution using least squares. This produces a line that seems to match the data pretty well, as witnessed by the plot at the end.

import os
import openpyxl
import math
import numpy as np
import matplotlib.pyplot as plt

def get_x_train():
    x_train = np.array([]) 
    for x in range(2, 1011):
        data = ws.cell(row=x, column=5).value
        x_train = np.append(x_train, data)
    return x_train

def get_y_train():
    y_train = np.array([])
    for y in range(2, 1011):
        data = ws.cell(row=y, column=3).value
        y_train = np.append(y_train, data)
    return y_train

wb = openpyxl.load_workbook('DATA RUMAH.xlsx')
ws = wb.active

# Load our data set

x_train = get_x_train()   #features
y_train = get_y_train()  #target value

# Compute linear regression.
A = np.vstack([x_train,np.ones(len(x_train))]).T
m,b = np.linalg.lstsq(A, y_train, rcond=None)[0]
print(m,b)

plt.plot(x_train, y_train,'o')
plt.plot(x_train, m*x_train+b, 'r')
plt.show()

(plot: the data points with the fitted least-squares line)
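
As an aside (not part of the answer above), np.polyfit with degree 1 performs the same least-squares line fit in a single call:

import numpy as np

# A degree-1 polynomial fit is an ordinary least-squares line fit;
# polyfit returns the coefficients highest degree first: (slope, intercept).
m, b = np.polyfit(x_train, y_train, 1)
print(m, b)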

Tim Roberts
  • Thank you, I found out the problem was my learning rate, which was too big (I changed it to 1.0e-10). I also ran your code to compare results; I got (w, b) found by gradient descent: (32469417.2368, 79098.4748), your code: 33047038.429474074 -213998046.51161024, which I think is pretty similar, but why is your b negative? – Fauzan Anggito Wicakson Aug 03 '23 at 23:51
  • Also, can I message you to ask how to prove the accuracy of the prediction (coefficients and such) for my paper? Thanks, and sorry for the trouble. – Fauzan Anggito Wicakson Aug 03 '23 at 23:58
  • Given the scale of the numbers, I don't think the delta is very significant. I don't know how you prove the accuracy other than putting in the known Xs and computing the error. There are several error techniques you could use. – Tim Roberts Aug 04 '23 at 04:54
  • I got an output of 9.29e+18 for the mean squared error (cost); is that too big given the scale of the data? – Fauzan Anggito Wicakson Aug 04 '23 at 19:03
  • Since the data is in the 5E10 range, and the sqrt of your error is about 3E9, I think that's acceptable. The dots do tend to spread from the line. – Tim Roberts Aug 04 '23 at 19:22
  • Would it work if I min-max scaled the Y to reduce the cost further and de-scaled the prediction after? – Fauzan Anggito Wicakson Aug 04 '23 at 21:02
  • I got a negative number after de-scaling the prediction, but the model still fits, though – Fauzan Anggito Wicakson Aug 04 '23 at 21:30
  • I posted another question about scaling here: https://stackoverflow.com/questions/76839247/getting-a-minus-prediction-after-min-max-scaling-the-price-in-a-linear-regressio if you would be so kind as to check it out – Fauzan Anggito Wicakson Aug 04 '23 at 22:04
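
Regarding the min-max question in the last comments, a minimal sketch of scaling only the target and then mapping the learned parameters back to prices, assuming the same gradient_descent from the question is reused (the elided training call and the helper name to_price are illustrative). The negative predictions mentioned above are plausibly just the fitted line's negative intercept showing up at small land areas (compare the b of about -2.1e8 from the least-squares fit), rather than an artifact of the de-scaling itself.

import numpy as np

# Min-max scale the target (price) to [0, 1]; the feature is left as-is.
y_min, y_max = y_train.min(), y_train.max()
y_scaled = (y_train - y_min) / (y_max - y_min)

# ... train on (x_train, y_scaled) with the question's gradient_descent,
#     giving parameters w_s, b_s learned against the scaled target ...

def to_price(w_s, b_s, lt):
    # Predict a price in the original units from parameters learned
    # on the min-max scaled target.
    return (w_s * lt + b_s) * (y_max - y_min) + y_min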

I changed my learning rate and number of iterations to

iterations = 1000000
tmp_alpha = np.float64(1.0e-10)

And it worked:

Iteration    0: Cost 5.60e+19  dj_dw: -2.878e+12, dj_db: -7.626e+09   w:  2.878e+02, b:  7.62614e-01
Iteration 100000: Cost 1.72e+19  dj_dw: -1.186e+12, dj_db: -3.097e+09   w:  1.909e+07, b:  5.03158e+04
Iteration 200000: Cost 1.06e+19  dj_dw: -4.890e+11, dj_db: -1.231e+09   w:  2.696e+07, b:  7.05947e+04
Iteration 300000: Cost 9.51e+18  dj_dw: -2.015e+11, dj_db: -4.613e+08   w:  3.020e+07, b:  7.84937e+04
Iteration 400000: Cost 9.32e+18  dj_dw: -8.307e+10, dj_db: -1.442e+08   w:  3.154e+07, b:  8.12901e+04
Iteration 500000: Cost 9.29e+18  dj_dw: -3.424e+10, dj_db: -1.351e+07   w:  3.209e+07, b:  8.19834e+04
Iteration 600000: Cost 9.29e+18  dj_dw: -1.411e+10, dj_db:  4.036e+07   w:  3.231e+07, b:  8.18099e+04
Iteration 700000: Cost 9.29e+18  dj_dw: -5.816e+09, dj_db:  6.256e+07   w:  3.241e+07, b:  8.12791e+04
Iteration 800000: Cost 9.29e+18  dj_dw: -2.397e+09, dj_db:  7.172e+07   w:  3.245e+07, b:  8.06010e+04
Iteration 900000: Cost 9.29e+18  dj_dw: -9.883e+08, dj_db:  7.549e+07   w:  3.246e+07, b:  7.98622e+04
(w, b) found by gradient descent: (32469417.2368, 79098.4748)
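
A side note on why the step size has to be this small: for the quadratic cost J(w, b) = 1/(2m) * sum((w*x + b - y)**2), gradient descent only stays stable when alpha is below roughly 2 divided by the largest curvature, which for a single feature is on the order of mean(x**2). With unscaled land areas that bound is well under the 1.0e-4 that blew up, while 1.0e-10 sits safely below it. A quick hedged check, reusing x_train from the question:

import numpy as np

# Rough stability bound for the learning rate on this cost:
# the largest Hessian eigenvalue is on the order of mean(x**2).
alpha_bound = 2.0 / np.mean(x_train ** 2)
print(f"alpha should stay well below ~{alpha_bound:.1e}")

Scaling the feature first (as in the sketch under the question) brings mean(x**2) down to about 1, which is why a much larger alpha and far fewer iterations then suffice.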
S.B