
My teacher gave me a challenge: beat an army of his soldiers on an 18x24 grid with random obstacles placed on the board. The game is turn-based and I have an army of 50 soldiers, each of which needs to either move or attack on its turn.

My problem is that the only thing I can do is write a soldier class to fight in this environment. Currently I have a method that evaluates the board position by counting how many soldiers are left on each team and computing yourTeam - enemyTeam as the current score, and I have a method that produces the legal moves for a soldier.

I want to know how I would create a reinforcement learning agent in Java with what I have access to. If you know any ways to do this or any resources that may help that would be great. Thank you for the help!

1 Answer


Java is not a good language for the math-heavy computation that RL requires. You could attempt to implement Q-learning, value iteration or policy iteration, but I would avoid doing anything with neural networks or modern deep RL approaches here, as your workload will increase dramatically.
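For reference, the core of tabular Q-learning is small enough to write in plain Java. The following is only a minimal sketch: the class name `QTable`, the use of a `String` as the state key and the integer action indices are all assumptions made for illustration, not part of the assignment's API.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal tabular Q-learning sketch. State keys and action indices are hypothetical. */
public class QTable {
    private final Map<String, double[]> table = new HashMap<>();
    private final int numActions;
    private final double alpha; // learning rate
    private final double gamma; // discount factor

    public QTable(int numActions, double alpha, double gamma) {
        this.numActions = numActions;
        this.alpha = alpha;
        this.gamma = gamma;
    }

    /** Returns the action values for a state, creating a zeroed row on first use. */
    private double[] row(String state) {
        return table.computeIfAbsent(state, s -> new double[numActions]);
    }

    public double get(String state, int action) {
        return row(state)[action];
    }

    /** Greedy action for a state (lowest index wins ties). */
    public int bestAction(String state) {
        double[] values = row(state);
        int best = 0;
        for (int a = 1; a < values.length; a++) {
            if (values[a] > values[best]) {
                best = a;
            }
        }
        return best;
    }

    /** Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
    public void update(String state, int action, double reward,
                       String nextState, boolean terminal) {
        double future = terminal ? 0.0 : row(nextState)[bestAction(nextState)];
        double[] values = row(state);
        values[action] += alpha * (reward + gamma * future - values[action]);
    }
}
```

The update rule is the easy part. The table needs one row per distinct state, which is why the size of the state and action spaces matters so much for this approach.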

With regard to your problem, if you are going to implement one of the old-school algorithms, think about your state and action space. I have serious concerns about the size of your action space: even with a small number of moves for each soldier (say 3: attack, move up, move down), 50 soldiers that each act every turn give a joint action space of 3^50, which is roughly 7 x 10^23. Even this will be very difficult to deal with, and any more actions per soldier (even 4 or 5) will send you deep into some complex topics in RL.

Other problems include defining a good reward signal and efficiently running (potentially millions of) simulated games.

The short answer is that this is not something to be taken lightly. It would be challenging and time-consuming even for someone who has experience in the field, and using Java is a no-no (Python is better). Given you probably don't have long to find a good solution, I would recommend trying a different approach: planning-based maybe, or hard-coding a reasonable strategy.
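To make the size concrete: if each soldier independently picks one of k actions, the number of joint actions per turn is k raised to the number of soldiers, so 3 actions and 50 soldiers give 3^50. A small sketch of the arithmetic (the class and method names are made up for this example):

```java
import java.math.BigInteger;

/** Counts joint actions when every soldier independently picks one action per turn. */
public class ActionSpaceSize {
    public static BigInteger jointActions(int actionsPerSoldier, int soldiers) {
        // One independent choice per soldier, so the counts multiply.
        return BigInteger.valueOf(actionsPerSoldier).pow(soldiers);
    }

    public static void main(String[] args) {
        // 3 actions, 50 soldiers: a 24-digit number, far too many to enumerate.
        System.out.println("3^50 = " + jointActions(3, 50));
        // For comparison, 50^3 is only 125000.
        System.out.println("50^3 = " + BigInteger.valueOf(50).pow(3));
        // Going from 3 to 5 actions per soldier makes it dramatically worse.
        System.out.println("5^50 = " + jointActions(5, 50));
    }
}
```

This count assumes every soldier acts on every turn, as the question describes. Treating each soldier as its own small decision, or grouping soldiers so they share an action, avoids enumerating the joint space.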

If you still want to go ahead and read up on the topic, here are some good resources:

  • Reinforcement Learning: An Introduction (Sutton & Barto) - any edition is fine
  • Selected chapters in Artificial Intelligence: A Modern Approach (Russell & Norvig)

Hope this helps, and sorry if it's not the answer you were hoping for!

BenedictWilkins
  • Thank you for the quick response. I know RL in Java is going to be bad, but his AI is very good and I couldn't think of any way to hard-code an algorithm. Other algorithms such as minimax would be too computationally expensive though, so I'm not sure how to solve it. Any strategy ideas for hard-coding it? – Ben Moskowitz Oct 24 '19 at 20:11
  • "Java is a no-no (Python is better)." I have read in various places that Python is apparently better for ML/RL tasks, but I can't grasp why that would be. I would assume Java runs much faster than Python since Python needs to be interpreted... – PLEXATIC Oct 25 '19 at 08:23
  • @PLEXATIC The reason is that there are many good libraries for Python whose cores are written in C or C++ (NumPy, TensorFlow, PyTorch and more). Why were they built for Python and not Java? 1. It is more difficult to work with the JVM than with Cython. 2. Python is a good prototyping language. 3. Java is statically typed (which provides little advantage for scientific computing). 4. We don't care that Python is slower because the main heavy computations are performed in C. There are probably other reasons; [this](https://www.datacamp.com/community/blog/python-scientific-computing-case) is a good place to start. – BenedictWilkins Oct 25 '19 at 09:36
  • @BenMoskowitz RL is in general extremely computationally expensive, as much as or more so than minimax. Do you know what approach your teacher takes? There are approaches that reduce the complexity of minimax, such as depth limiting. There are also ways to deal with the large action space, e.g. grouping the soldiers in a clever way so that they share the same actions. Perhaps Q-learning with a reduced action space is a way to go. I can't help with hard-coded strategies as I don't know what strategy he is using, sorry! Hope this helps. – BenedictWilkins Oct 25 '19 at 09:45
  • @BenedictWilkinsAI My teacher purposefully didn't tell us the way he implemented it. I'll definitely have a look at depth limiting for minimax, but I think I'm going to avoid Q-learning as I was having trouble finding any good implementations I can use in Java and due to the computational complexity involved. I'm also thinking of other strategies, such as A*. Any thoughts on that? – Ben Moskowitz Oct 25 '19 at 15:38
  • @BenMoskowitz Sounds like a good plan. I have not had experience with A* for this sort of problem, but if you think you can apply it then go for it! Minimax is probably a good direction to go though. If this has answered your question I would be grateful if you could mark it as answered! Good luck! – BenedictWilkins Oct 26 '19 at 21:04
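The depth-limiting idea from the comments can be sketched roughly as follows. This is an illustration under assumptions, not the teacher's method or the assignment's API: the `GameState` interface and the `Node` toy tree are hypothetical, and `evaluate()` stands in for a heuristic such as the yourTeam - enemyTeam score from the question.

```java
import java.util.Arrays;
import java.util.List;

/** Depth-limited minimax sketch over a hypothetical game interface. */
public class DepthLimitedMinimax {

    /** Hypothetical view of the game; the real environment will look different. */
    public interface GameState {
        List<GameState> successors(); // states reachable in one move
        boolean isTerminal();
        int evaluate();               // heuristic score from your team's point of view
    }

    /** Searches at most `depth` plies, then falls back to the heuristic. */
    public static int minimax(GameState state, int depth, boolean maximizing) {
        if (depth == 0 || state.isTerminal()) {
            return state.evaluate();
        }
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (GameState child : state.successors()) {
            int value = minimax(child, depth - 1, !maximizing);
            best = maximizing ? Math.max(best, value) : Math.min(best, value);
        }
        return best;
    }

    /** Toy tree node, used only to exercise the search. */
    public static final class Node implements GameState {
        private final int value;
        private final List<GameState> children;

        public Node(int value, GameState... children) {
            this.value = value;
            this.children = Arrays.asList(children);
        }

        @Override public List<GameState> successors() { return children; }
        @Override public boolean isTerminal() { return children.isEmpty(); }
        @Override public int evaluate() { return value; }
    }
}
```

The depth limit bounds how far the search looks ahead but not how many moves it considers at each step, so with 50 soldiers it would still need a reduced move set, for example searching per soldier or per group of soldiers.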