If you’ve ever taught a dog to sit or shake, you’re familiar with the concept of reinforcement learning. Positive reinforcement is when an animal—or a child, if you’re lucky—learns a desired behavior based on the rewards it receives for the steps it takes to reach the desired outcome.
copyright by www.forbes.com
For example, you give your dog a treat for sitting at the door when he needs to go out for a bathroom break, or you give your child a high five—or in my house, $5—when they do well on their spelling test. The subject—in this case, the dog or child—learns which behavior is good or bad based on the response it receives along the way. The same concept can be applied to artificial intelligence (AI).
Start to learn from mistakes
Historically, one of the flaws of AI has been that machines and computer programs can’t learn from their mistakes. Instead, they rely on a complex set of data that helps them recognize words, objects, and tasks. Rather than learning by trial and error, as humans do, they refer to their internal set of hard-coded “instructions” to determine right and wrong. And while deep learning allows them to be reprogrammed with massive amounts of new data to achieve better outcomes, they can’t improve those outcomes on their own. This process, also called “supervised learning,” requires extensive involvement on the part of the programmer. That’s where reinforcement learning comes in. Recently, tech giants like Google and its parent company, Alphabet, have been working to teach artificial intelligence programs to think for themselves through reinforcement learning. In other words, they’re helping these programs solve perceived problems, “rather than being taught what solutions look like.”
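To make the trial-and-error idea concrete, here is a minimal sketch in Python. It is a toy, not a picture of how Google's systems work: it uses a simple textbook strategy (often called "epsilon-greedy") in which a program tries three possible behaviors, gets a "treat" some of the time, and gradually figures out which behavior pays off best. The function name, the treat probabilities, and the number of attempts are all made up for illustration.

```python
import random

def learn_by_trial_and_error(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy learner: estimates each action's value from rewards alone."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    values = [0.0] * n_actions  # running estimate of each action's average reward
    counts = [0] * n_actions    # how many times each action has been tried

    for _ in range(steps):
        # Occasionally explore a random action; otherwise pick the best guess so far.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # The "environment" hands back a treat (1) or nothing (0).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Nudge the estimate toward what was just observed (incremental average).
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts

# Hypothetical treat probabilities for three behaviors; the learner never sees them.
values, counts = learn_by_trial_and_error([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("Preferred behavior:", best_action)
print("Estimated payoff per behavior:", [round(v, 2) for v in values])
```

Nobody tells the program which behavior is correct. It only sees the rewards, which is the key difference from the supervised approach described above.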
Need to find new ways to learn
Many would agree the technology is still in its infancy, or as one writer put it, its green-and-black-DOS-screen stage. Although it’s been tremendously successful in gaming (including the much-hyped victory of Google DeepMind’s AlphaGo in the game Go), few have been able to find solid commercial uses quite yet, outside of content personalization, ad placement, and other somewhat insignificant victories such as saving power or sorting trash. There are a few ways programmers will be working to develop the technology in the coming years to make it more useful in the commercial world, as well as in our personal lives.