Last month, an article titled “Evolution, rewards, artificial intelligence” was published. It was based on an analysis of “Reward is Enough,” a paper by scientists at the UK-based AI lab DeepMind. As the title suggests, the scientists hypothesize that the right reward is all you need to create the abilities associated with intelligence.
This contrasts with artificial intelligence systems that try to replicate specific functions of natural intelligence, such as classifying images or completing sentences. The scientists even suggest that with a well-defined reward, a complex environment, and the right reinforcement learning algorithm, artificial general intelligence can be achieved.
The article and the paper triggered a heated debate on various social media platforms, with reactions ranging from support for the idea to outright rejection. Both sides made valid claims, but the truth lies somewhere in between.
Natural evolution is evidence that the reward hypothesis is scientifically sound. But implementing it to reach human-level intelligence won’t be easy.
Before discussing evolution, rewards, and artificial intelligence, it is important to understand what artificial intelligence actually is.
In the past few years, artificial intelligence companies have produced many innovations. Scientists and companies have designed and developed all kinds of complex mechanisms and technologies to replicate abilities associated with human beings.
These efforts have resulted in artificial intelligence systems capable of solving specific problems in limited environments. With different companies and scientists offering their own definitions, you may wonder what artificial intelligence actually means.
Broadly, AI is the simulation of human intelligence processes by machines. Specific applications of artificial intelligence include expert systems, natural language processing, speech recognition, and machine vision.
“Reward is Enough” is a recent paper by scientists at DeepMind. It draws inspiration from the evolution of natural intelligence as well as from recent developments in AI.
The authors of “Reward is Enough” suggest that reward maximization and trial-and-error experience are sufficient to develop behavior that exhibits the kinds of abilities associated with intelligence. From this, they conclude that reinforcement learning can lead to the development of artificial general intelligence.
In “Reward is Enough,” the DeepMind scientists present a hypothesis: intelligence and its associated abilities can be understood as subserving the maximization of reward by an agent acting in its environment. Scientific evidence supports this claim.
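A common way to write “maximization of reward” formally is shown below. This uses standard reinforcement learning notation, not a formula quoted from the paper, whose exact formalism may differ: the agent looks for a policy π (a rule for choosing actions) that maximizes the expected sum of the rewards r it receives over time, where a discount factor γ between 0 and 1 makes near-term rewards count more than distant ones.

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r_{t+1} \right], \qquad 0 \le \gamma < 1
```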
Humans and other animals owe their intelligence to a process called natural selection. In simple words, nature favors lifeforms that have a better chance of surviving in their environment. Members of a species that can withstand the challenges posed by the environment and by other lifeforms pass on their genes to the next generation. Those that can’t are eliminated.
Every newborn organism inherits the genes of its parent(s), and mutations introduce small changes to those genes that can have a huge impact. A mutation may have a simple effect, such as a slight change in muscle texture or skin color, or it may become the basis for developing new organs.
If these mutations improve the organism’s chances of survival, they are passed on to future generations, where further mutations might reinforce them. This is what has enabled us to better survive and reproduce. The mechanism of mutation and natural selection has been enough to give rise to the many different lifeforms on Earth.
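The mutation-and-selection loop described above can be sketched as a toy evolutionary algorithm. This is only an illustration under heavy assumptions: a “genome” here is a list of five numbers, and “chance of survival” is replaced by a hypothetical `fitness` function that rewards genes close to an arbitrary target value. Nothing about real biology is this simple.

```python
from __future__ import annotations

import random

GENOME_LENGTH = 5
TARGET = 1.0  # hypothetical "ideal" gene value in this toy environment


def fitness(genome: list[float]) -> float:
    # Toy stand-in for "chance of survival": higher when every gene is
    # closer to TARGET. Real survival has no such simple formula.
    return -sum((gene - TARGET) ** 2 for gene in genome)


def mutate(genome: list[float], rng: random.Random, scale: float = 0.1) -> list[float]:
    # The child inherits the parent's genes, each nudged by a small random mutation.
    return [gene + rng.gauss(0.0, scale) for gene in genome]


def evolve(generations: int = 200, pop_size: int = 50, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    population = [
        [rng.uniform(-1.0, 1.0) for _ in range(GENOME_LENGTH)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: only the fitter half of the population survives.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Reproduction: each survivor passes on a mutated copy of its genes.
        children = [mutate(parent, rng) for parent in survivors]
        population = survivors + children
    return max(population, key=fitness)


best = evolve()
print(round(fitness(best), 3))
```

No gene is ever designed directly: improvements arise only because random changes that happen to raise fitness are kept and built upon. The sketch assumes an even `pop_size`, since survivors plus children must refill the population.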
In her book “Conscience: The Origins of Moral Intuition,” Patricia Churchland explains how natural selection has led to the development of the cortex. The cortex is the part of the brain that gives mammals the ability to learn from their surroundings and to develop social behavior. In humans, the evolution of the cortex has led to complex cognitive faculties.
Therefore, if survival is considered the reward, the main hypothesis is scientifically sound. Implementing it, however, would be very complicated.
In their paper, the scientists claim that the reward hypothesis can be implemented with reinforcement learning algorithms. A reinforcement learning agent begins by taking random actions. Depending on how well those actions align with its goals, the agent receives rewards. Over time, the agent learns sequences of actions that maximize its reward in its environment.
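The trial-and-error loop described above can be sketched with tabular Q-learning, one classic reinforcement learning algorithm (chosen here for illustration; the paper does not prescribe it). The environment is a hypothetical six-cell corridor where the only reward is for reaching the last cell. The agent starts out acting randomly and gradually learns which action leads to reward.

```python
from __future__ import annotations

import random

N_STATES = 6          # hypothetical corridor: states 0..5, reward only at state 5
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(state: int, action: int) -> tuple[int, float, bool]:
    # Toy environment: walls at both ends; reaching the last state ends the episode.
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # estimated value of each action per state
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: act randomly while estimates are tied (at the start)
            # and with probability epsilon afterwards; otherwise pick the best-looking action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Nudge the estimate toward the reward plus the discounted value of what follows.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that “right” is correct; that preference emerges only because sequences of right moves eventually earn reward. Real environments are vastly larger, which is exactly where the practical difficulties discussed next arise.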
The authors further state that, according to their hypothesis, general intelligence can instead be understood and implemented by maximizing a single reward in one complex environment. This is where the hypothesis separates from practice. The environments that scientists have explored with reinforcement learning so far are nowhere near as complex as the real world, and even they have required the computational resources of very wealthy tech companies.
Imagine we had the technology to simulate the world. We could start when the first lifeforms emerged. We would need an exact representation of the state of the Earth at that time, including the initial state of the environment, and we still don’t have a definitive theory of what that was.
An alternative would be to take a shortcut and start from a later event, such as the period when our primate ancestors lived on Earth. This might cut down training time, but it would be a complex state to start from. At that time, there were millions of different lifeforms on Earth, all evolving together, and taking any of them out of the equation would have a huge impact on the simulation. This leaves two key problems: compute power and the initial state.
Many might say that you don’t need an exact simulation of the world, only an approximation of the space in which your reinforcement learning agent will operate. The scientists give the example of a kitchen robot in their paper. However, to function properly, such a robot would need a great deal of understanding and knowledge about kitchenware. Kitchens are made by humans: the shape of drawer handles, doorknobs, and everything else you see in a kitchen has been optimized for human use.
In this scenario, you could create shortcuts, such as avoiding the complexities of walking or of hands with fingers and joints. But then many situations that would be easy for a human to handle would become prohibitive for the robot. Developing notions that are so heavily linked to human knowledge, life, and goals would be very complicated, and such a robot would have a hard time coexisting and cooperating with living beings that have been optimized for survival.
Here, too, you could take shortcuts, for example by creating hierarchical goals and by equipping the robot and its reinforcement learning models with prior knowledge that steers them in the right direction. This would make it easier for the robot to understand and interact with humans and their surroundings.
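One well-known way to inject prior knowledge into a reinforcement learning agent is potential-based reward shaping, a technique not mentioned in the paper and used here only as an illustration. In this minimal sketch, a hypothetical designer-supplied `potential` function encodes the belief that being closer to a goal position is better, and the shaped reward adds the change in potential to the environment’s own reward.

```python
GOAL = 5      # hypothetical goal position on a 1-D corridor
GAMMA = 0.9   # discount factor, as in standard reinforcement learning


def potential(state: int) -> float:
    # Designer-supplied prior knowledge (hypothetical): being nearer the goal is better.
    return -abs(GOAL - state)


def shaped_reward(env_reward: float, state: int, next_state: int) -> float:
    # Potential-based shaping: add GAMMA * phi(s') - phi(s) to the environment's own
    # reward, so progress toward the goal pays off before the goal is reached.
    return env_reward + GAMMA * potential(next_state) - potential(state)


print(shaped_reward(0.0, 0, 1))  # a step toward the goal
print(shaped_reward(0.0, 1, 0))  # a step away from the goal
```

The agent now gets a positive signal for a step toward the goal and a negative one for a step away, even though the environment itself gave zero reward for both. That guidance comes from the designer, not from the reward alone.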
But this would contradict the reward-only approach. In theory, reward is sufficient for any kind of intelligence. In practice, however, there is a trade-off between environment complexity, reward design, and agent design.
The main aim of “Evolution, rewards, artificial intelligence” is to clarify where the line between theory and practice lies. In the future, we might achieve a level of computing power that makes this possible. For now, artificial intelligence needs more research and development before it can reach artificial general intelligence.