How should I store a sparse decision tree (move list) in a database?

Question

I have been thinking of making an AI for a board game for a long time, and recently I've started to gather resources and algorithms. The game is non-random. Most of the time a player has fewer than 3 moves to choose from, but sometimes there are more than 20. I would like to store critical or ambiguous moves so that the AI learns from its mistakes and does not make the same mistake the next time. Moves that surely win or lose need not be stored. So what I actually have is a sparse decision tree for the beginning of games. How should I store this decision tree in a database? The database does not need to be SQL, and I do not know which database is suitable for this particular problem.

EDIT: Please do not tell me to parse the decision tree into memory; just imagine the game is as complicated as chess.

Why a database? Just load it at run-time from a text-file. You'll only see a benefit from a database with more than several hundred-thousand entries. Keep it simple, don't bother with the database. — beatgammit, Aug 07 '11 at 10:17
Why a database? Because the tree is too big to fit in memory. I know that a solved, always-winning branch of the non-sparse decision tree for that game takes 200 MB, and the game has more than 14 branches of this kind. – TiansHUo, Aug 07 '11 at 14:29
Have you heard of Redis? It's a simple, key-value store with which you can implement your tree logic. Without knowing how your decision tree is set up this could be a suitable solution. — beatgammit, Aug 07 '11 at 21:14
@tjameson Redis is an option, but as you said it is a *simple* key-value store, whereas I would like a database that is better optimized for a game tree. – TiansHUo, Aug 09 '11 at 10:50
@TiansHUo- You can do some basic querying. The one you might be interested in is keys. This can do some very basic key searching, and if you organize your keys such that they follow a tree-like structure (`top:next:next:next:next`), it could be pretty fast. It has other features than a strict key-value store. I'm not submitting this as a solution because I don't know how exactly your game is set up. — beatgammit, Aug 09 '11 at 17:25
I also think that ultimately the solution depends on how you store the decision tree in a form that you can manage. It can be a textual chain of steps like TiansHUo suggested, or for a traditional database it can be as simple as a parent field or as fancy as a modified preorder tree, which allows you to get a whole subtree in one query but at the cost of complexity and slow structure modification. If we knew the game better, we might be able to help you better. – Sheepy, Aug 16 '11 at 02:45
@TiansHUo: are you aware of [minimax algorithm](http://en.wikipedia.org/wiki/Minimax)? are you trying to cache the resulting tree? are you aware of strategy trees? are you planning to cache only the chosen strategy? [will save you a lot of space] — amit, Aug 16 '11 at 08:44
@amit, I'm aware of the minimax algorithm. I am not trying to cache the resulting tree, I am trying to cache steps that are difficult to choose. — TiansHUo, Aug 18 '11 at 02:59
@Sheepy, the game is gomoku/renju. You can look up wikipedia on it. Usually the player has few choices when defending, so those steps don't need to be pre-calculated. It is only at critical steps where a lookup table may be important. — TiansHUo, Aug 18 '11 at 03:05

score 1 · Answer 1 · answered Aug 22 '11 at 10:41

Since you will be traversing the tree, neo4j seems like a good solution to me. SQL is not a good choice because of the many joins your queries would need. As I understand the question, you are asking for a way to store a graph in a database, and neo4j is a database built explicitly for graphs. For the sparseness, you can attach arrays of primitives or Strings to the edges of your graph to encode sequences of moves, using PropertyContainers. (Am I right that by sparseness and skipping of nodes you mean that your tree edges are sequences of moves rather than single moves?)

Michael Kutschke

Not exactly, the same situation can be achieved by many moves. – TiansHUo Aug 23 '11 at 02:24
Okay, in that case it's even easier, because the tree just becomes a "normal" graph. Just annotate the edges of the graph with the moves (and the nodes/states with the score), and make sure your edges are directed. Then you can traverse your graph like you would the corresponding tree. If you know what the sparse tree looks like, you can store it in that fashion. For example, you can then query your DB for a state and see whether you saved moves for it. – Michael Kutschke Aug 23 '11 at 11:53

score 0 · Answer 2 · answered Aug 21 '11 at 22:07

Firstly, what you are trying to do sounds like a case-based reasoning (CBR) problem; see: http://en.wikipedia.org/wiki/Case-based_reasoning#Prominent_CBR_systems . A CBR system has a database of decisions, and your system would in theory pick the best outcome available.

Therefore I would suggest using neo4j, which is a NoSQL graph database: http://neo4j.org/

To represent your game, make each position a node in the graph, and have each node contain the potential moves from that position. You can track scoring metrics that are learnt as games progress, so that the AI becomes better informed.

Steve

Note that in my question I asked for a *sparse* tree, which means that many nodes may be skipped. But `neo4j` uses a full tree, which means every move must be saved. That means the database will be too big. – TiansHUo Aug 22 '11 at 02:41
And under what conditions are you looking to skip steps? In neo4j you just design your underlying data model so that in certain situations it skips straight to the relevant decision process. – Steve Aug 22 '11 at 07:32

score 0 · Answer 3 · edited May 23 '17 at 12:00

I would use a document database (NoSQL) like RavenDB, because you can store any data structure in the database.

Documents aren't flat like in a normal SQL database and that allows you to store hierarchical data like trees directly:

{
   "decision": "Go forward",
   "childs": [
      { "decision": "Go backwards" },
      {
         "decision": "Stay there",
         "childs": [
            { "decision": "Go backwards" }
         ]
      }
   ]
}

Here you can see an example JSON tree which can be stored in RavenDB.

RavenDB also has a built-in feature to query hierarchical data: http://ravendb.net/faq/hierarchies

Please look at the documentation for more information on how RavenDB works.

Resources:

What type of NoSQL database is best suited to store hierarchical data?

score 0 · Answer 4 · answered Aug 22 '11 at 17:24

You can use a memory-mapped file as storage. First, create a "compiler" that parses the text file and converts it into a compact binary representation. The main application then maps this optimized binary file into memory. This solves your problem with the memory size limitation.

vromanov

I would like the file to be incremental, and not pre-calculated. So a database is best. – TiansHUo Aug 23 '11 at 02:25

score 0 · Answer 5 · answered Aug 22 '11 at 17:26

Start with a simple database table design.

Decisions: CurrentState BINARY(57) | NewState BINARY(57) | Score INT

CurrentState and NewState are serialized versions of the game state. Score is a weight given to NewState (positive scores are good moves, negative scores are bad moves); your AI can update these scores as appropriate.

Renju uses a 15x15 board, and each location can be black, white or empty, so you need Ceiling( (2 bits * 15*15) / 8 ) = 57 bytes to serialize the board. In T-SQL that would be a BINARY(57).
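
To make the 57-byte figure concrete, here is a minimal Python sketch of one possible serialization. The 2-bit codes (0 = empty, 1 = black, 2 = white), the cell ordering, and the function names `pack_board` / `unpack_board` are assumptions made for illustration, not something prescribed above.

```python
import math

BOARD_SIZE = 15
EMPTY, BLACK, WHITE = 0, 1, 2  # assumed 2-bit codes; the value 3 is unused


def pack_board(cells):
    """Pack 225 cell codes (0..2) into 57 bytes, four cells per byte."""
    if len(cells) != BOARD_SIZE * BOARD_SIZE:
        raise ValueError("expected 225 cells")
    n_bytes = math.ceil(2 * len(cells) / 8)  # ceil(450 / 8) = 57
    out = bytearray(n_bytes)
    for i, code in enumerate(cells):
        # cell i occupies bits 2*(i % 4) .. 2*(i % 4) + 1 of byte i // 4
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)


def unpack_board(blob):
    """Inverse of pack_board: recover the 225 cell codes."""
    return [(blob[i // 4] >> (2 * (i % 4))) & 0b11
            for i in range(BOARD_SIZE * BOARD_SIZE)]
```

The resulting `bytes` value can be bound directly to a fixed-width binary column as the serialized state.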

Your AI would select the current moves it has stored like...

SELECT NewState FROM Decisions WHERE CurrentState = @SerializedState ORDER BY Score DESC

You'll get a list of all the stored next moves from the current game state, ordered from highest score to lowest.

Your table structure would have a Composite Unique Index (primary key) on (CurrentState, NewState) to facilitate searching and avoid duplicates.
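
As a rough illustration of this table design, the sketch below uses SQLite through Python's built-in `sqlite3` module instead of T-SQL, so `BLOB` stands in for `BINARY(57)`. The helper names `open_book`, `record` and `best_moves` are hypothetical.

```python
import sqlite3


def open_book(path=":memory:"):
    """Open (or create) the decisions table; SQLite is an assumed stand-in."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS Decisions (
               CurrentState BLOB NOT NULL,
               NewState     BLOB NOT NULL,
               Score        INTEGER NOT NULL,
               PRIMARY KEY (CurrentState, NewState)
           )"""
    )
    return conn


def record(conn, current, new, score):
    """Insert a decision, or overwrite the score if the pair already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO Decisions VALUES (?, ?, ?)",
        (current, new, score),
    )


def best_moves(conn, current):
    """Return stored (NewState, Score) rows for a state, best score first."""
    rows = conn.execute(
        "SELECT NewState, Score FROM Decisions "
        "WHERE CurrentState = ? ORDER BY Score DESC",
        (current,),
    )
    return rows.fetchall()
```

Because the composite primary key covers (CurrentState, NewState), the lookup by CurrentState can use the key's index rather than scanning the table.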

This isn't the best or most optimal solution, but given your lack of DB knowledge I believe it would be the easiest to implement and would give you a good start.

score 0 · Answer 6 · answered Aug 23 '11 at 00:55

Compare this with chess engines: they play from memory, apart perhaps from opening libraries. Chess is too complicated to store a deciding decision tree. Chess engines play by assigning heuristic evaluations to potential, transient future positions (not moves). Future positions are found by some kind of limited-depth search. They may be cached in memory for some time, but often they are simply recalculated each turn, because the search space is too big to store in a form that is faster to look up than to recalculate.

score 0 · Answer 7 · answered Aug 23 '11 at 00:58

Do you know Chinook — the AI that solves checkers? It does this by compiling a database of every possible endgame. While this is not exactly what you are doing, you might learn from it.

Don Reba

score 0 · Answer 8 · answered Dec 11 '11 at 18:53

I can't clearly picture either the data structures you handle in your tree or their complexity.

But here are some thoughts which may interest you:

Map your decision tree onto a sparse matrix; a tree is a graph, after all.
Devise a storage/retrieval strategy that takes advantage of sparse matrix properties.

score 0 · Answer 9 · answered Dec 17 '11 at 12:04

I would approach this with the traditional way an opening book is handled in chess engines:

Generate all possible moves
For each move:
1. Make that move
2. Look the resulting position up in your database
3. Undo the move
Make the move that had the highest score in your database
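
The steps above can be sketched in Python as follows. The callbacks `legal_moves` and `apply_move` and the dict-like `book` are hypothetical placeholders for whatever the engine provides. The sketch assumes `apply_move` returns a new state instead of mutating, so no explicit undo step is needed.

```python
def pick_book_move(state, legal_moves, apply_move, book):
    """Return the legal move whose resulting position has the best book score.

    Returns None when none of the resulting positions is stored in the book.
    """
    best_move, best_score = None, None
    for move in legal_moves(state):
        child = apply_move(state, move)  # "make" the move on a copy
        score = book.get(child)          # look up the resulting position
        # nothing to undo: `state` itself was never modified
        if score is not None and (best_score is None or score > best_score):
            best_move, best_score = move, score
    return best_move
```

When the function returns None, the engine would fall back to its normal search.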

Looking up a move

Chess engines usually compute a hash of the current game state via Zobrist hashing, which is a simple way to construct a good hash function for game states.

The big advantage of this approach is that it takes care of transpositions, that is, if the same state can be reached via alternate paths, you don't need to worry about those alternate paths, only about the game states themselves.
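
A minimal sketch of Zobrist hashing for a 15x15 board with two stone colours might look like this. The seed, the table layout and the function names are illustrative assumptions. Side to move is left out of the hash on the assumption that it follows from the number of stones on the board.

```python
import random

BOARD_CELLS = 15 * 15
BLACK, WHITE = 0, 1

# One random 64-bit number per (cell, colour) pair. The fixed seed is an
# assumption so that hashes stay the same between runs of the program.
_rng = random.Random(2011)
ZOBRIST = [[_rng.getrandbits(64) for _ in (BLACK, WHITE)]
           for _ in range(BOARD_CELLS)]


def zobrist_hash(stones):
    """Hash a position given as an iterable of (cell_index, colour) pairs."""
    h = 0
    for cell, colour in stones:
        h ^= ZOBRIST[cell][colour]
    return h


def toggle(h, cell, colour):
    """Incrementally add or remove one stone; XOR is its own inverse."""
    return h ^ ZOBRIST[cell][colour]
```

Since XOR is commutative, the same set of stones yields the same hash regardless of the order in which they were played, which is what makes transpositions collapse onto one key.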

How chess engines do this

Most chess engines use static opening books that are compiled from recorded games and hence use a simple binary file that maps these hashes to a score; e.g.

#include <stdint.h>

struct book_entry {
    uint64_t hash;   /* Zobrist hash of the position */
    uint32_t score;  /* book score for that position */
};

The entries are then sorted by hash, and thanks to operating system caching, a simple binary search through the file will find the needed entries very quickly.
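
For illustration, a sorted book of this kind can be searched from Python roughly as follows. The 12-byte little-endian record layout without padding is an assumption (a C compiler would typically pad the struct above to 16 bytes), and `build_book` / `lookup` are hypothetical names.

```python
import struct

# Assumed on-disk record: uint64 hash + uint32 score, little-endian, unpadded.
ENTRY = struct.Struct("<QI")


def build_book(entries):
    """Serialize (hash, score) pairs into one buffer, sorted by hash."""
    return b"".join(ENTRY.pack(h, s) for h, s in sorted(entries))


def lookup(book, key):
    """Binary search a bytes-like book for `key`; return its score or None."""
    lo, hi = 0, len(book) // ENTRY.size
    while lo < hi:
        mid = (lo + hi) // 2
        h, score = ENTRY.unpack_from(book, mid * ENTRY.size)
        if h == key:
            return score
        if h < key:
            lo = mid + 1
        else:
            hi = mid
    return None
```

`lookup` only needs indexable bytes, so the same function should also work on an `mmap.mmap` object opened over the book file, which leaves caching to the operating system.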

Updating the scores

However, if you want the engine to learn continuously, you will need a more complicated data structure; at this point it is usually not worth building it yourself, and you should use an available library. I would probably use LevelDB, but anything that lets you store key-value pairs is fine (Redis, SQLite, GDBM, etc.).

Learning the scores

How exactly you update the scores depends on your game. In games with a lot of data available, a simple approach works: store the percentage of games won after the move that resulted in the position. If you have less data, you can store the result of a game tree search from the position in question as the score. Machine learning techniques such as Q-learning are also a possibility, although I do not know of a program that actually does this in practice.
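
The win-percentage approach could be sketched like this. The class name `WinRateBook`, the in-memory dict, and the convention that `won` is judged from the learning side's point of view are all assumptions. A persistent key-value store would replace the dict in practice.

```python
from collections import defaultdict


class WinRateBook:
    """Tracks, per position hash, how many recorded games were won."""

    def __init__(self):
        # position hash -> [wins, games]
        self.stats = defaultdict(lambda: [0, 0])

    def record_game(self, position_hashes, won):
        """Update every position reached in one finished game."""
        for h in position_hashes:
            entry = self.stats[h]
            entry[0] += 1 if won else 0
            entry[1] += 1

    def score(self, h):
        """Fraction of games won through this position, or None if unseen."""
        wins, games = self.stats.get(h, (0, 0))
        return wins / games if games else None
```

A position that has never been recorded returns None, which lets the caller tell "unknown" apart from "always lost".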

score -1 · Answer 10 · answered Aug 07 '11 at 13:36

I'm assuming your question is asking about how to convert a decision tree into a serial format that can be written to a location and later used to reconstruct the tree.

Try using a pre-order traversal of the tree, using a toString() function (or its equivalent) to convert the data stored at each node of the decision tree to a textual descriptor. By pre-order traversal, I mean implementing an algorithm that first performs the toString() operation on the node, and writes the output to a database or file, and then recursively performs the same operation on its child nodes, in a specified order. Because you are dealing with a sparse tree, your toString() operation should also include information about the existence or non-existence of subtrees.

Reconstructing the tree is simple - the first stored value is the root node, the second is the root member of the left subtree, and so on. The serial data stored for each node should provide information as to which subtree the next inputted node should belong to.

Downvoted - This is not the answer, I do not want to parse the tree into memory, and I need at least a memory indexed file that can find branches as fast as possible. — TiansHUo, Aug 07 '11 at 14:31
It is also not true, you need pre-order + in-order traversal to be able to reconstruct a binary tree. not to mention the OP's tree is NOT a binary one. — amit, Aug 16 '11 at 08:23
