Get state of TicTacToe board in Q-Learning

Question

I'm just getting into reinforcement learning and Q-learning, and I wanted to try creating a Tic-Tac-Toe AI. To use a Q-table, I need to find the "state" of the board, and I'm having trouble working out how to do this.

For extra clarification, a state is a number that represents the current board, including the value of each of the nine squares.

A board that looks like:

[[0, 0, 0],
 [0, 0, 0],
 [0, 0, 0]]

would be state 0, as it is the first board. Beyond this, I am not sure how to calculate the state of the board based on the array.

[EDIT] I'm coming here because I honestly don't know where to start; I can't find anything on the web, and if you dislike my question you could at least tell me why.

score 1 · Accepted Answer · answered Jun 11 '20 at 14:26


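I think you need something like this: flatten the board and treat its nine cells as the digits of a number in base max_number, so that each distinct board maps to a distinct integer (as long as every cell value is below max_number).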

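import numpy as np

max_number = 10  # base of the encoding; every cell value must be below this
L = [[1, 0, 0],
     [0, 0, 0],
     [0, 5, 0]]

# Flatten the nested list into a single list of nine cells.
L_1d = sum(L, [])
print(L_1d)
# [1, 0, 0, 0, 0, 0, 0, 5, 0]

# Place value of each cell: max_number**0, max_number**1, ..., max_number**8.
degrees = max_number ** np.arange(len(L_1d))
print(degrees)
# [        1        10       100      1000     10000    100000   1000000   10000000 100000000]

# The state is the dot product of the cells with their place values.
state = L_1d @ degrees
print(state)
# 50000001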


V. Ayrat


Thank you, this works perfectly. The only thing is that it seems that it doesn't work with numpy arrays. Do you have a way to solve this? – CircuitSacul Jun 11 '20 at 15:33

`L_1d = L.reshape(-1)` – V. Ayrat Jun 11 '20 at 15:38
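
Following the reshape suggestion in the comment above, the encoding can be wrapped in a helper that accepts either a nested list or a NumPy array. This is only a minimal sketch, not the answerer's code: the name `board_to_state`, the range check, and the default `base=3` (assuming cells hold 0 for empty, 1 for X, 2 for O) are assumptions made for illustration.

```python
import numpy as np


def board_to_state(board, base=3):
    """Encode a board as one integer by reading its cells as base-`base` digits.

    Assumption (not from the question): every cell is an integer in
    range(base), e.g. 0 = empty, 1 = X, 2 = O when base is 3.
    """
    # np.asarray accepts nested lists and arrays alike; reshape(-1) flattens.
    cells = np.asarray(board, dtype=np.int64).reshape(-1)
    if cells.min() < 0 or cells.max() >= base:
        raise ValueError("cell values must lie in range(base)")
    # Place values: base**0, base**1, ..., base**(n - 1).
    weights = base ** np.arange(cells.size, dtype=np.int64)
    return int(cells @ weights)


empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(board_to_state(empty))  # 0

board = np.array([[1, 0, 0], [0, 0, 0], [0, 2, 0]])
print(board_to_state(board))  # 4375, i.e. 1 * 3**0 + 2 * 3**7
```

With `base=3` the largest possible state is 3**9 - 1 = 19682, so a Q-table with 3**9 rows has one row per encoding. Calling the helper with `base=10` on the answer's example board gives the same 50000001.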
