Q-Learning AI Isn't Recognizing Easy Pattern

Question

I have a Q-learning program that tries to predict my simulated stock market, where the price of the stock goes 1-2-3-1-2-3...

I have been trying to debug this for a few days and just can't find the problem. I even started again completely from scratch, and the problem persists. If you have some spare time, I could really use an extra set of eyes on this.

The getStock() function is what simulates the stock price.
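
Because getStock() just cycles, the move that follows each price is fixed (ignoring the initial 0), so the target the agent has to learn is simply "up after 1, up after 2, down after 3". A small sketch that makes this explicit, using itertools.cycle in place of the global; price_stream and next_moves are hypothetical helpers, not part of the program:

```python
from itertools import cycle, islice


def price_stream():
    """Hypothetical stand-in for getStock(): an endless 1-2-3 price cycle."""
    return cycle([1, 2, 3])


def next_moves(prices):
    """Pair each price with the direction of the move that follows it.

    The 1-2-3 cycle never repeats a price back to back, so "not up" means "down" here.
    """
    return [(cur, "up" if nxt > cur else "down") for cur, nxt in zip(prices, prices[1:])]


prices = list(islice(price_stream(), 7))
for price, move in next_moves(prices)[:3]:
    print(price, move)
# 1 up
# 2 up
# 3 down
```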

The reducePricesToBinary() function turns the stock prices into a string of 0s (down) and 1s (up), and convertBinaryToInputs() makes that string into an input of [whether the stock went up or down last, how many times in a row it went that way, how many times in a row it went the other way before that].
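
For comparison, here is a minimal sketch of the same two steps (prices to an up/down string, then the three inputs) written as pure functions that take arguments and return values instead of mutating globals. to_binary and run_features are hypothetical names. One deliberate difference: when the whole history is a single run, this version reports 0 for the third input, whereas convertBinaryToInputs() keeps indexing backwards and wraps around through negative indices.

```python
from itertools import groupby
from typing import List, Tuple


def to_binary(prices: List[float]) -> str:
    """'1' for each rise, '0' for each fall; unchanged prices add nothing."""
    bits = []
    for prev, cur in zip(prices, prices[1:]):
        if cur > prev:
            bits.append("1")
        elif cur < prev:
            bits.append("0")
    return "".join(bits)


def run_features(bits: str) -> Tuple[str, int, int]:
    """Return (last move, length of the final run, length of the run before it)."""
    if not bits:
        raise ValueError("need at least one move")
    # groupby collapses "11011" into runs: ('1', 2), ('0', 1), ('1', 2)
    runs = [(ch, len(list(group))) for ch, group in groupby(bits)]
    last, last_len = runs[-1]
    prev_len = runs[-2][1] if len(runs) > 1 else 0
    return last, last_len, prev_len


prices = [1, 2, 3, 1, 2, 3, 1]
for end in range(4, len(prices) + 1):
    print(prices[end - 1], run_features(to_binary(prices[:end])))
# 1 ('0', 1, 2)
# 2 ('1', 1, 1)
# 3 ('1', 2, 1)
# 1 ('0', 1, 2)
```

With the 1-2-3 cycle, only these three input combinations recur, and each one is always followed by the same move, so in principle these inputs are enough to learn the pattern.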

The readAI() function reads the stored guess for the current inputs (defaulting to 0.5 for inputs it hasn't seen yet).
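
readAI() uses a bare try/except to insert and return 0.5 for unseen inputs. A hedged sketch of the same lookup using dict.setdefault, which does both in one call and, unlike a bare except, does not silently swallow unrelated errors (read_policy is a hypothetical name):

```python
from typing import Dict


def read_policy(table: Dict[tuple, float], state: tuple, default: float = 0.5) -> float:
    """Return the stored value for `state`, storing `default` the first time it is seen."""
    return table.setdefault(state, default)


table = {}
print(read_policy(table, ("1", 2, 1)))  # 0.5, and the entry is now stored
table[("1", 2, 1)] = -0.25
print(read_policy(table, ("1", 2, 1)))  # -0.25
```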

The checkGuess() function checks the previous guess and updates the policy gradient (the AI dictionary) depending on whether or not it was right.
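
The update in checkGuess() has the shape of a tabular Q-learning update, value ← (1 - learningRate) * value + learningRate * (reward + discountFactor * 1), but with two differences worth noting: the table stores one value per input combination rather than one per (state, action) pair, and the bootstrapped term is the constant discountFactor * 1 rather than the best value of the next state. A sketch comparing the two; the function names are hypothetical, and q_learning_update is the standard textbook rule, not something in the original code.

```python
def checkguess_update(value, reward, learning_rate=0.5, discount_factor=0.5):
    """The arithmetic from checkGuess(): the future term is the constant discount_factor * 1."""
    return (1 - learning_rate) * value + learning_rate * (reward + discount_factor * 1)


def settle(value, reward, steps=60):
    """Apply checkguess_update repeatedly with the same reward."""
    for _ in range(steps):
        value = checkguess_update(value, reward)
    return value


def q_learning_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.5):
    """Textbook tabular Q-learning: bootstrap from the best action available in next_state."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = (1 - alpha) * old + alpha * (reward + gamma * best_next)
    return q[(state, action)]


print(settle(0.5, +1))  # approaches 1.5
print(settle(0.5, -1))  # approaches -0.5
```

With the constant term and both rates at 0.5, the fixed point of v = 0.5v + 0.5(r + 0.5) is v = r + 0.5, so a state that is always rewarded settles near 1.5 and one that is always penalized settles near -0.5; the stored values are not confined to the range -1 to 1.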

Thank you so much!

import requests
import sys
import time

# Constants
learningRate = 0.5
stocksToBuy = 250
discountFactor = 0.5

# Variables declared:

# getStock()
currentStockPrice = 0
pastStockPrice = 0

# reducePricesToBinary()
binaryVersionOfPrices = ""

# Ai()
AI = dict()

# convertBinaryToInputs()
inputsForAI = [0,0,0]

# Ai
guess = 0
oldGuess = 0
reward = 0
pastInputsForAI = ['0',0,0]
firstTurnOver = False

# Buying and Selling stocks
money = 1000000
shares = 0

# saveEveryFifteen()
countToSaveEveryFifteen = 0

# Saving anything to a file.
def save(name, data):
    with open(name, 'w') as f:
        f.write(str(data))

def saveEverything():
    save("AI", AI)
    save("binaryStockPrices", binaryVersionOfPrices)
    save("money", money)
    save("shares", shares)

# Runs after an error.
def onExit():
    saveEverything()
    sys.exit()

# Prints and saves an error log if a function crashes.
def crashProgram(errorMessage):
    print(errorMessage)
    with open("crashLogs", 'w') as f:
        f.write("{}\n\n".format(errorMessage))
    onExit()

# Runs a function with try catches to catch an error.
def doFunction(function):
    try:
        function()
    except Exception as e:
        crashProgram("Fatal error running {}().\n{}".format(function.__name__, e))

# Gets the current stock value.
#def getStock():
#    global currentStockPrice
#    res = requests.get("https://markets.businessinsider.com/stocks/aapl-stock")
#    stockCostString = ""
#    for x in range (9):
#        stockCostString += res.text[res.text.find('"price": "')+10 + x]
#    currentStockPrice = float(stockCostString)
#    print(currentStockPrice)

def getStock():
    global currentStockPrice
    currentStockPrice = 1 if currentStockPrice == 3 else (2 if currentStockPrice == 1 else 3)

# Turns the prices into 0's and 1's.
def reducePricesToBinary():
    global pastStockPrice
    global binaryVersionOfPrices
    binaryString = "1" if currentStockPrice > pastStockPrice else "0" if currentStockPrice < pastStockPrice else ""
    binaryVersionOfPrices += binaryString
    pastStockPrice = currentStockPrice

# Converts the binaryStockPrices to inputs for the AI.
def convertBinaryToInputs():
    global inputsForAI
    inputsForAI[0] = binaryVersionOfPrices[len(binaryVersionOfPrices)-1]
    counterOfFirstNumber = 1
    counterOfSecondNumber = 1
    while(binaryVersionOfPrices[len(binaryVersionOfPrices) - counterOfFirstNumber] == inputsForAI[0]):
        counterOfFirstNumber+=1
    counterOfFirstNumber-=1
    while(binaryVersionOfPrices[len(binaryVersionOfPrices) - counterOfFirstNumber - counterOfSecondNumber]!=inputsForAI[0]):
        counterOfSecondNumber += 1
    counterOfSecondNumber-=1
    inputsForAI[0] = binaryVersionOfPrices[len(binaryVersionOfPrices)-1]
    inputsForAI[1] = counterOfFirstNumber
    inputsForAI[2] = counterOfSecondNumber


# AI functions
def readAI():
    global guess
    try:
        AIGuess = AI[inputsForAI[0], inputsForAI[1], inputsForAI[2]]
    except:
        AI[inputsForAI[0], inputsForAI[1], inputsForAI[2]] = 0.5
        AIGuess = 0.5
    guess = AIGuess
    print("GUESS: {}".format(guess))
    print("INPUTS: {}".format(inputsForAI))
    return guess

def checkGuess():
    global firstTurnOver
    if(firstTurnOver):
        global oldGuess
        global reward
        global pastInputsForAI
        oldGuess = 0 if oldGuess == -1 else 1
        print("Old guess: " + str(oldGuess) + " Input: " + str(int(round(float(inputsForAI[0])))))
        reward = 1 if oldGuess == int(round(float(inputsForAI[0]))) else -1
        AI[pastInputsForAI[0], pastInputsForAI[1], pastInputsForAI[2]] = (1-learningRate) * AI[pastInputsForAI[0], pastInputsForAI[1], pastInputsForAI[2]] + learningRate * (reward + discountFactor * 1)
        oldGuess = int(round(float(guess)))
    pastInputsForAI = inputsForAI
    firstTurnOver = True

def buySellStocks():
    global money
    global shares
    oldStocks = shares
    if(guess > 0):
        while(money > currentStockPrice and (shares - oldStocks) < stocksToBuy * guess):
            money -= currentStockPrice
            shares += 1
    else:
        while(shares > 0 and (oldStocks - shares) > stocksToBuy * guess):
            money += currentStockPrice
            shares -= 1

# Loads the binaryVersionOfPrices from a file.
def loadBinaryPrices():
    global binaryVersionOfPrices
    with open("binaryStockPrices", 'r') as f:
        binaryVersionOfPrices = f.read()

def loadMoney():
    global money
    with open("money", 'r') as f:
        money = int(f.read())

def loadShares():
    global shares
    with open("shares", 'r') as f:
        shares = int(f.read())

# Loads the AI from a file.
def loadAI():
    global AI
    with open("AI", 'r') as f:
        AI = eval(f.read())

# Prints relevant information.
def printStuff():
    print("Stock price: {}\nCurrent balance: {}\nCurrent shares: {}\nTotal value: {}\nGuess: {}\n".format(currentStockPrice, money, shares, money + shares * currentStockPrice, guess))

# Loads all variables from files.
def onProgramStart():
    doFunction(loadAI)
    doFunction(loadBinaryPrices)
    doFunction(loadMoney)
    doFunction(loadShares)

# Saves every 15 checks
def saveEveryFifteen():
    global countToSaveEveryFifteen
    countToSaveEveryFifteen += 1
    if(countToSaveEveryFifteen == 15):
        saveEverything()
        countToSaveEveryFifteen = 0

# Runs all functions.
def doAllFunctions():
    doFunction(reducePricesToBinary)
    doFunction(convertBinaryToInputs)
    doFunction(readAI)
    doFunction(checkGuess)
    doFunction(buySellStocks)
    doFunction(saveEveryFifteen)
    doFunction(printStuff)
    doFunction(getStock)

# Loads variables from files.
onProgramStart()

# Repeats the process.
while(1):
    doAllFunctions()
    time.sleep(0.5)

Why so many global variables, along with functions which take no parameters and return no values? Don’t you think this contributed to making things difficult to debug? — AMC, Dec 20 '19 at 04:21
I don’t know what your background is, but in Python variable and function names should generally follow the `lower_case_with_underscores` style. You don’t need the parentheses in the if statements and while loops (which IDE are you using, that wouldn’t point this out?!). Calling `eval()` on the entire contents of a file is a bad idea. I also think that `doFunction` is poor style/design, I will try to find the particular source I have in mind. Those nested if statements in `reducePricesToBinary()` are terrifying. — AMC, Dec 20 '19 at 04:25
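
On the eval() point: save() writes the AI dictionary with str(), and its keys are tuples of strings and ints with float values, so ast.literal_eval can read the same files back without being able to execute arbitrary code. A minimal sketch, assuming that file format (dump_table, parse_table and load_table are hypothetical names):

```python
import ast


def dump_table(table):
    """Serialize the table the same way save() does (str of a dict)."""
    return str(table)


def parse_table(text):
    """Parse a dict literal without executing code, unlike eval()."""
    table = ast.literal_eval(text)
    if not isinstance(table, dict):
        raise ValueError("expected a dict literal")
    return table


def load_table(path):
    with open(path) as f:
        return parse_table(f.read())
```

Files already written by save() should load unchanged, and anything that is not a plain literal (for example a function call) raises ValueError instead of running.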
Tomorrow I will take an in-depth look at the entire program. Before that, however, I will share a version which tries to preserve as much of the current design as possible while also respecting some design and style conventions! :) Which Python version are you using, by the way? — AMC, Dec 20 '19 at 04:29
@AMC Comment 1 - I had one originally with arguments and returns and that ended up failing so I went the complete opposite route. Comment 2 - Yeah, I kinda hop around a lot of different languages and have kind of settled on C++ standards, but I'll try to fix that for future python projects, thank you! The doFunction was used mostly because I wanted to customize my error messages a bit. Comment 3 - Thank you, I've been working on it for a while after I posted this and I keep running into the same problem. — , Dec 20 '19 at 04:34
I understand wanting to customize the messages, but currently all it does is reduce the amount of information we get. Do I need any data to run the program? — AMC, Dec 20 '19 at 04:55
@AMC I guess it does reduce the information we get; I'll probably take that out next time I get a chance to work on it. And sorry, I forgot to mention that yes, you do need some data to run the program. The first time you run it, it should make four new files in the working directory. In AI, put a pair of curly braces {}. In binaryStockPrices, put a few zeroes and ones, e.g. 1010101. Thank you so much again for your help. I'm off to sleep for tonight but should be back in 8 hours. — , Dec 20 '19 at 05:09
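
On the error-message point: a sketch of a doFunction()-style wrapper that keeps the custom message but also records the full traceback and re-raises, so the caller can still decide to save state and exit. run_step and append_to_file are hypothetical names; the default log file name crashLogs is taken from the question's crashProgram(), which opens it with 'w' and so overwrites earlier logs, whereas this sketch appends.

```python
import traceback


def append_to_file(path):
    """Return a logger function that appends text to `path`."""
    def write(text):
        with open(path, "a") as f:
            f.write(text)
    return write


def run_step(step, log=append_to_file("crashLogs")):
    """Run `step()`; on failure, log a custom message plus the full traceback, then re-raise."""
    try:
        return step()
    except Exception:
        log("Fatal error running {}():\n{}\n".format(step.__name__, traceback.format_exc()))
        raise
```

Because the exception propagates, the main loop could catch it once, call saveEverything(), and exit, without losing the file name, line number and exception type.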

score 1 · Answer 1 · answered Dec 20 '19 at 21:11

As I mentioned in my comment, here is a version of the program after some basic refactoring:

import sys
import time

# constants
learning_rate: float = 0.5
stocks_to_buy: float = 250
discount_factor: float = 0.5

# variables declared:

# get_stock()
current_stock_price: int = 0
past_stock_price: int = 0

# reduce_prices_to_binary()
binary_version_of_prices: str = ''

# ai()
a_i: dict = {}

# convert_binary_to_inputs()
inputs_for_a_i = [0, 0, 0]

# ai
guess = 0
old_guess = 0
reward = 0
past_inputs_for_a_i = ['0', 0, 0]
first_turn_over: bool = False

# buying and selling stocks
money: int = 1000000
shares: int = 0

# save_every_fifteen()
count_to_save_every_fifteen: int = 0


# saving anything to a file.
def save(name, data):
    with open(name, 'w') as f:
        f.write(str(data))


def save_everything():
    save("a_i", a_i)
    save("binary_stock_prices", binary_version_of_prices)
    save("money", money)
    save("shares", shares)


# runs after an error.
def on_exit():
    save_everything()
    sys.exit()


# gets the current stock value.
# def get_stock():
#    global current_stock_price
#    res = requests.get("https://markets.businessinsider.com/stocks/aapl-stock")
#    stock_cost_string = ""
#    for x in range (9):
#        stock_cost_string += res.text[res.text.find('"price": "')+10 + x]
#    current_stock_price = float(stock_cost_string)
#    print(current_stock_price)

def get_stock():
    global current_stock_price
    if current_stock_price == 3:
        current_stock_price = 1
    elif current_stock_price == 1:
        current_stock_price = 2
    else:
        current_stock_price = 3


# turns the prices into 0's and 1's.
def reduce_prices_to_binary():
    global past_stock_price
    global binary_version_of_prices
    if current_stock_price > past_stock_price:
        binary_string = "1"
    elif current_stock_price < past_stock_price:
        binary_string = "0"
    else:
        binary_string = ""
    binary_version_of_prices += binary_string
    past_stock_price = current_stock_price


# converts the binary_stock_prices to inputs for the a_i.
def convert_binary_to_inputs():
    global inputs_for_a_i
    inputs_for_a_i[0] = binary_version_of_prices[len(binary_version_of_prices) - 1]
    counter_of_first_number = 1
    counter_of_second_number = 1
    while binary_version_of_prices[len(binary_version_of_prices) - counter_of_first_number] == inputs_for_a_i[0]:
        counter_of_first_number += 1
    counter_of_first_number -= 1
    while (binary_version_of_prices[
               len(binary_version_of_prices) - counter_of_first_number - counter_of_second_number] !=
           inputs_for_a_i[0]):
        counter_of_second_number += 1
    counter_of_second_number -= 1
    inputs_for_a_i[0] = binary_version_of_prices[len(binary_version_of_prices) - 1]
    inputs_for_a_i[1] = counter_of_first_number
    inputs_for_a_i[2] = counter_of_second_number


# a_i functions
def read_ai():
    global guess
    try:
        a_i_guess = a_i[inputs_for_a_i[0], inputs_for_a_i[1], inputs_for_a_i[2]]
    except KeyError:
        a_i[inputs_for_a_i[0], inputs_for_a_i[1], inputs_for_a_i[2]] = 0.5
        a_i_guess = 0.5
    guess = a_i_guess
    print(f'guess: {guess}')
    print(f'inputs: {inputs_for_a_i}')
    return guess


def check_guess():
    global first_turn_over
    if first_turn_over:
        global old_guess
        global reward
        global past_inputs_for_a_i
        old_guess = 0 if old_guess == -1 else 1
        print(f'old guess: {old_guess}, input: {round(float(inputs_for_a_i[0]))}')
        if old_guess == round(float(inputs_for_a_i[0])):
            reward = 1
        else:
            reward = -1
        past_key = (past_inputs_for_a_i[0], past_inputs_for_a_i[1], past_inputs_for_a_i[2])
        a_i[past_key] = (1 - learning_rate) * a_i[past_key] + learning_rate * (reward + discount_factor * 1)
        old_guess = int(round(float(guess)))
    past_inputs_for_a_i = inputs_for_a_i
    first_turn_over = True


def buy_sell_stocks():
    global money
    global shares
    old_stocks = shares
    if guess > 0:
        while money > current_stock_price and (shares - old_stocks) < stocks_to_buy * guess:
            money -= current_stock_price
            shares += 1
    else:
        while shares > 0 and (old_stocks - shares) > stocks_to_buy * guess:
            money += current_stock_price
            shares -= 1


# loads the binary_version_of_prices from a file.
def load_binary_prices():
    global binary_version_of_prices
    with open("binary_stock_prices", 'r') as f:
        binary_version_of_prices = f.read()


def load_money():
    global money
    with open("money") as f:
        money = int(f.read())


def load_shares():
    global shares
    with open("shares") as f:
        shares = int(f.read())


# loads the a_i from a file.
def load_a_i():
    global a_i
    with open("a_i") as f:
        a_i = eval(f.read())


# prints relevant information
def print_stuff():
    print(f"stock price: {current_stock_price}\n"
          f"current balance: {money}\n"
          f"current shares: {shares}\n"
          f"total value: {money + shares * current_stock_price}\n"
          f"guess: {guess}\n")


# loads all variables from files.
def on_program_start():
    load_a_i()
    load_binary_prices()
    load_money()
    load_shares()


# saves every 15 checks
def save_every_fifteen():
    global count_to_save_every_fifteen
    count_to_save_every_fifteen += 1
    if count_to_save_every_fifteen == 15:
        save_everything()
        count_to_save_every_fifteen = 0


# runs all functions.
def do_all_functions():
    reduce_prices_to_binary()
    convert_binary_to_inputs()
    read_ai()
    check_guess()
    buy_sell_stocks()
    save_every_fifteen()
    print_stuff()
    get_stock()


# loads variables from files.
on_program_start()

# repeats the process.
while True:
    do_all_functions()
    time.sleep(0.5)

Thank you for doing all that :D. The problem remains, though: the AI is consistently guessing wrong. When the stock value is 1, the AI incorrectly guesses that it will decrease; when the stock is 2, it correctly guesses that it will increase; but when the stock is 3, it incorrectly guesses that it will increase. It seems to be one guess behind, but I can't find where the error is. — , Dec 20 '19 at 22:23
@Drew My next step was going to be to rewrite the program, but that's probably going to require some explanation from you as to its functioning. — AMC, Dec 20 '19 at 22:31
I'm probably also going to try to rewrite it, keeping in mind the way you rewrote mine. "require some explanation from you as to its functioning": okay, here it is. get_stock() gets the current price of the stock; in this case, it cycles between 1, 2 and 3. reduce_prices_to_binary() records whether the stock went up or down and appends that to a string holding the history of up/down moves. Part 1 of ? — , Dec 20 '19 at 22:37
convert_binary_to_inputs() creates the inputs for the AI's policy gradient. The inputs are: 1) a string saying whether the stock went up or down last; 2) assuming input 1 was 0, how many times in a row it went down; 3) assuming input 1 was 0, how many times in a row it went up before that. Each combination of these inputs has its own policy. read_ai() reads the policy, defaulting to 0.5 if it doesn't exist yet (a stored one might look like inputs = ['0', 1, 2], read_ai(inputs) = -0.5). buy_sell_stocks() takes this guess and buys stocks if it is positive, or sells if it is negative. 1/? — , Dec 20 '19 at 22:42
After read_ai() takes a guess, the program checks the stock again and repeats everything; then check_guess() takes the last guess and how the stock changed, and checks whether the last guess was right. If it was right, it reinforces the guess by moving that policy in the policy gradient closer to -1 or closer to 1 (my recode will use 0 to 1, where anything below 0.5 sells stocks); if it was wrong, it pushes the policy the other way. After checking the guess, it repeats this process. Thanks again for helping! If you need any clarification because I'm bad at explaining, let me know! — , Dec 20 '19 at 22:47
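
The planned 0-to-1 recode can be sketched as follows, assuming values below 0.5 mean sell and 0.5 or above mean buy. Nudging each value toward the observed outcome (1 for up, 0 for down) reinforces a correct guess and pushes a wrong one the other way, as described above. decide, update_policy and train are hypothetical names, and STATES assumes the three input combinations that the 1-2-3 cycle produces at prices 1, 2 and 3; the key point is that each update goes to the input combination that was current when the guess was made.

```python
def decide(value):
    """0-to-1 convention: below 0.5 sells, 0.5 or above buys."""
    return "buy" if value >= 0.5 else "sell"


def update_policy(value, went_up, learning_rate=0.5):
    """Move the value toward 1 if the stock rose and toward 0 if it fell."""
    target = 1.0 if went_up else 0.0
    return value + learning_rate * (target - value)


# Assumed input combinations seen at prices 1, 2, 3, and the move that follows each.
STATES = [("0", 1, 2), ("1", 1, 1), ("1", 2, 1)]
WENT_UP = [True, True, False]


def train(steps, learning_rate=0.5):
    table = {}
    for step in range(steps):
        i = step % len(STATES)
        old = table.get(STATES[i], 0.5)
        table[STATES[i]] = update_policy(old, WENT_UP[i], learning_rate)
    return table


table = train(30)
print({state: decide(value) for state, value in table.items()})
# {('0', 1, 2): 'buy', ('1', 1, 1): 'buy', ('1', 2, 1): 'sell'}
```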
@Drew I was thinking of starting from scratch, but I think the debugging experience will do me some good. Any idea as to where I should start? — AMC, Dec 21 '19 at 01:34
@Drew Don’t worry about it lol. I’ll take a look at the github — AMC, Dec 21 '19 at 02:46
I think I’ve figured out the bug. When I call check_ai(), the function is supposed to edit the policy of the last inputs based on the result, but for some reason last_inputs is equal to the current inputs. I only have one assignment to last_inputs, and it’s after the check, so I don’t know how that’s happening. — , Dec 21 '19 at 15:45
@Drew Great! I’m going to try writing a refactored version, then. — AMC, Dec 21 '19 at 21:32
Thanks so much for your help :D. I also managed to make the get_inputs function super efficient. I want to keep my project kind of secret, but I can pm you the code if you want to see what I managed to fix. — , Dec 21 '19 at 23:21
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/204660/discussion-between-drew-and-amc). — , Dec 21 '19 at 23:21
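
The symptom in the Dec 21 '19 at 15:45 comment (last_inputs equal to the current inputs even though it is only assigned after the check) is what Python list aliasing produces. In the question's code, pastInputsForAI = inputsForAI (past_inputs_for_a_i = inputs_for_a_i in the refactor) binds a second name to the same list, and convertBinaryToInputs() later overwrites that list's elements in place on the next cycle. A minimal demonstration with a hypothetical helper, offered as one plausible mechanism rather than as the asker's diagnosis (Answer 2 gives that); saving list(inputsForAI) or tuple(inputsForAI) takes an independent snapshot instead.

```python
def past_after_next_cycle(copy_on_save):
    """Save the inputs, then update them in place the way convertBinaryToInputs() does."""
    inputs = ["1", 2, 1]                                   # inputs for the current cycle
    past = list(inputs) if copy_on_save else inputs        # snapshot vs. alias
    inputs[0], inputs[1], inputs[2] = "0", 1, 2            # next cycle overwrites the same list
    return past


print(past_after_next_cycle(copy_on_save=False))  # ['0', 1, 2]  (aliased: "past" changed too)
print(past_after_next_cycle(copy_on_save=True))   # ['1', 2, 1]  (independent copy)
```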

score 0 · Answer 2 · answered Dec 21 '19 at 17:13

When correcting the policies in the policy gradient, I was using inputs from one cycle ago, and I was overcompensating by calling the functions in an order where they already used inputs from one cycle ago, which effectively made the gradient off by two inputs. Since the input was cycling in threes, this looked like an "off-by-one" error when in reality it was off by two, which made it hard to detect.
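
The reason an off-by-two error can pass for an off-by-one error here is modular arithmetic: with a period of 3, shifting by two steps in one direction lands on the same values as shifting by one step in the other direction, since -2 ≡ 1 (mod 3). A small sketch with hypothetical helpers:

```python
PATTERN = (1, 2, 3)


def price_at(t):
    """Price at time step t in the repeating 1-2-3 pattern (t may be negative)."""
    return PATTERN[t % len(PATTERN)]


def shifted(k, n=6):
    """Prices seen at steps 0..n-1 if every reading were shifted by k steps."""
    return [price_at(t + k) for t in range(n)]


print(shifted(-2) == shifted(+1))  # True: two steps late looks like one step early
print(shifted(+2) == shifted(-1))  # True: two steps early looks like one step late
print(shifted(-1) == shifted(+1))  # False
```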
