This article is about two common methods of training artificial neural networks (ANNs) for reinforcement learning (RL): backpropagation and genetic algorithms.
Backpropagation … Based on gradient descent
Here you train a single ANN with randomly initialised parameters. As soon as you’ve got feedback on how well your ANN has performed this time, you modify the parameters from the back to the front. By doing that, your ANN normally gets better and better, which means it moves closer to the maximum or minimum of the unknown function. The problem that can happen is that you get stuck in a local maximum or minimum instead of the global one.
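To make that loop concrete, here is a minimal sketch in plain NumPy: a tiny network with one hidden layer, trained by backpropagation and gradient descent. The toy task (learning XOR), the network size and the learning rate are assumptions made for illustration, not details from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task (an assumption for illustration): learn XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# The single ANN: randomly initialised parameters.
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.5
for step in range(5000):
    # Forward pass: the network's current guess.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Feedback: mean squared error between guess and target.
    loss = np.mean((out - y) ** 2)

    # Backward pass: propagate the error from the back (output layer)
    # to the front (hidden layer) via the chain rule.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    d_W2 = h.T @ d_out
    d_b2 = d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    d_W1 = X.T @ d_h
    d_b1 = d_h.sum(axis=0)

    # Gradient descent: nudge every parameter downhill.
    W1 -= learning_rate * d_W1
    b1 -= learning_rate * d_b1
    W2 -= learning_rate * d_W2
    b2 -= learning_rate * d_b2

print(loss)  # close to 0 once the network has learned XOR
```

Note that the loss only ever follows the local slope, which is exactly why the method can settle into a local optimum as described above.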
Genetic algorithm … Based on natural selection
At the beginning, you create multiple ANNs with randomly initialised parameters, known as the first generation. The aim is to increase the skill at solving the problem from generation to generation while keeping the number of ANNs constant. As soon as you’ve got feedback on how well every ANN has performed in this generation, you select the best ones, recombine them to create the next generation and add some mutations to them.
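Here is a minimal sketch of that generation loop, again in NumPy. The fitness function (distance to a hidden target vector standing in for the unknown optimum), the population size and the mutation scale are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

POP_SIZE = 20    # constant number of ANNs per generation
N_PARAMS = 10    # each ANN is treated as a flat parameter vector
TARGET = rng.normal(size=N_PARAMS)  # stands in for the unknown optimum

def fitness(params):
    # Higher is better: negative distance to the hidden optimum.
    return -np.sum((params - TARGET) ** 2)

# First generation: multiple ANNs with randomly initialised parameters.
population = rng.normal(size=(POP_SIZE, N_PARAMS))

for generation in range(100):
    # Feedback: score every ANN in this generation.
    scores = np.array([fitness(ind) for ind in population])

    # Selection: keep the best half as parents.
    parents = population[np.argsort(scores)[::-1][: POP_SIZE // 2]]

    # Recombination: each child mixes the parameters of two random parents.
    children = []
    for _ in range(POP_SIZE):
        a, b = parents[rng.integers(len(parents), size=2)]
        mask = rng.random(N_PARAMS) < 0.5
        child = np.where(mask, a, b)
        # Mutation: small random perturbations keep the search exploring.
        child = child + rng.normal(scale=0.1, size=N_PARAMS)
        children.append(child)
    population = np.array(children)

print(max(fitness(ind) for ind in population))  # approaches 0
```

Because the mutations perturb parameters at random rather than following a gradient, the population can escape local optima that would trap backpropagation, at the cost of evaluating many ANNs per generation.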