• What are some useful open-source options for reinforcement learning?

• How do major providers handle reinforcement learning?

• How do AI startups handle reinforcement learning?

• Is there anything that reinforcement learning can’t do?


Copyright: venturebeat.com – “What is reinforcement learning? How AI trains itself”


Machine learning (ML) might be considered the core subset of artificial intelligence (AI), and reinforcement learning may be the quintessential subset of ML that people imagine when they think of AI.

Reinforcement learning is the process by which a machine learning algorithm, robot or other agent is trained to respond to complex, real-time, real-world environments so that it reaches a desired target or outcome optimally. Think of the challenge posed by self-driving cars.

The algorithms involved can also “learn” from, or be improved by, this process of taking in and responding to new circumstances.
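This learn-by-responding loop can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the article does not name a specific one). The environment below is entirely hypothetical: a five-cell corridor where the agent starts at cell 0 and earns a reward of +1 only when it reaches the last cell. Nothing tells the agent which way to go; it works that out from the rewards its own actions produce.

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells.
# Reaching the last cell ends the episode with reward +1; every other step gives 0.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def greedy(q, state, rng):
    """Pick the best-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best next value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should choose +1 (move right) in every non-terminal cell. The corridor, the reward of +1 and the hyperparameters `alpha`, `gamma` and `epsilon` are illustrative choices, not values taken from the article.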

Other forms of ML are “trained” on “training data,” sometimes in massive quantities. This training often enables an algorithm to classify or cluster data, or otherwise recognize patterns, based on the relationships and outcomes it was trained on. Such algorithms begin with training data and create models that capture some of the patterns and lessons embedded in that data.

Reinforcement learning is often part of a training process that can continue after deployment, while the model is in use. New data captured from the environment is then used to tweak and adjust the model to current conditions.
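A minimal sketch of that kind of ongoing adjustment, under a simplifying assumption: the deployed model is reduced to a single running estimate of the reward for some action. A constant step size weights recent observations more heavily than old ones, so the estimate follows changes in the world. A plain average of all data gives every observation equal weight, which makes it slow to react. The function name `track` and the data stream are hypothetical.

```python
def track(observations, step_size=0.1, initial=0.0):
    """Recency-weighted estimate: move a fixed fraction toward each new observation."""
    estimate = initial
    for obs in observations:
        estimate += step_size * (obs - estimate)
    return estimate


# Hypothetical stream: rewards were 1.0 at first, then conditions changed and they dropped to 0.0.
stream = [1.0] * 50 + [0.0] * 20

sample_average = sum(stream) / len(stream)   # equal weight on all history
recency_weighted = track(stream)             # emphasizes recent data

print(sample_average, recency_weighted)
```

In this example, the sample average still reports about 0.71, while the recency-weighted estimate has already dropped to about 0.12. That is closer to what the environment is currently returning.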

Thank you for reading this post, don't forget to subscribe to our AI NAVIGATOR!


Reinforcement learning is accomplished with a feedback loop based on “rewards” and “penalties.” The scientist or user defines which outcomes count as successful and which as unsuccessful, and the AI then uses that feedback to adjust the model. It might tweak some of the weights in the model, or even reevaluate some or all of the training data in light of the new reward or penalty.
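To illustrate how rewards and penalties can tweak a model's weights, here is a sketch of the gradient bandit algorithm. This is a simple, well-known method chosen for illustration; the article does not say it is the one used. The two "weights" are preferences for two hypothetical actions. Action 0 always produces a successful outcome (reward +1), and action 1 always produces an unsuccessful one (penalty -1). Each outcome is compared with the running average reward, and the preferences move toward actions that beat that average.

```python
import math
import random


def softmax(prefs):
    """Convert preferences (weights) into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reward_for(action):
    # Hypothetical outcome table: action 0 is rewarded, action 1 is penalized.
    return 1.0 if action == 0 else -1.0


def gradient_bandit(steps=1000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]      # the adjustable weights
    baseline, n = 0.0, 0    # running average of rewards seen so far
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = reward_for(action)
        advantage = reward - baseline
        # Raise the chosen action's weight if it beat the baseline, lower it otherwise;
        # the other weights move in the opposite direction.
        for a in range(len(prefs)):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += alpha * advantage * (indicator - probs[a])
        n += 1
        baseline += (reward - baseline) / n
    return softmax(prefs)


probs = gradient_bandit()
print(probs)
```

After enough feedback, almost all of the probability should sit on the rewarded action. The step size `alpha` and the number of steps are arbitrary illustrative values.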

For instance, an autonomous car may have a predetermined set of straightforward rewards and penalties. The algorithm gets a reward if it arrives on time and doesn’t make any sudden speed changes like panic braking or quick acceleration. If the car hits the curb, gets in a bad traffic jam or brakes unexpectedly, the algorithm is penalized. The model can be retrained with particular attention to the process that led to the bad results. […]
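The rules in this example could be written as a reward function. The sketch below is hypothetical. The `TripOutcome` fields mirror the events named in the paragraph, but the numeric values (+10, -20, -5, -2) are arbitrary assumptions and do not come from any real driving system.

```python
from dataclasses import dataclass


@dataclass
class TripOutcome:
    # Hypothetical summary of one trip; field names are illustrative.
    arrived_on_time: bool
    sudden_speed_changes: int   # panic braking or quick acceleration events
    hit_curb: bool
    bad_traffic_jam: bool
    unexpected_brakes: int


def trip_reward(o: TripOutcome) -> float:
    """Score a trip: reward smooth, on-time arrival; penalize the listed bad events."""
    reward = 0.0
    if o.arrived_on_time and o.sudden_speed_changes == 0:
        reward += 10.0
    if o.hit_curb:
        reward -= 20.0
    if o.bad_traffic_jam:
        reward -= 5.0
    reward -= 2.0 * o.unexpected_brakes
    return reward


smooth = TripOutcome(arrived_on_time=True, sudden_speed_changes=0,
                     hit_curb=False, bad_traffic_jam=False, unexpected_brakes=0)
rough = TripOutcome(arrived_on_time=False, sudden_speed_changes=3,
                    hit_curb=True, bad_traffic_jam=False, unexpected_brakes=2)
print(trip_reward(smooth), trip_reward(rough))  # 10.0 -24.0
```

Choosing these weights is itself a design decision. The relative sizes of the rewards and penalties determine which behavior the learning algorithm will favor.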

Read more: www.venturebeat.com