Robotics and AI in 2024 and beyond — part 2

Eugenio Culurciello
Feb 24, 2024
Image designed with AI by DALL-E 2

How do we get robots to be useful in our everyday life? The robots we can afford today are severely limited in understanding the space around them and in continually aggregating knowledge and expertise. Robotics to date is often a collage of disparate algorithms and models that cannot grow, learn to generalize, and operate in our environment. What can we do to break this vicious cycle?

One idea may be to start from zero. Can we train a robot from scratch in the same way we train a human baby? What would it take for a baby robot to start learning about the world, interacting with it, and eventually imitating other agents?

This idea is effectively “growing a brain” for robots and connecting it to the robot's body, its capabilities, and its possibilities for interacting with the world. How can this be achieved? One way is to do what we do with our babies: stimulate them to interact with us and the world. At first, a human baby does not even know where to look, or what is what. Yet within a handful of months, she builds up a complex representation of herself, other humans, and the world around her. All of this comes from the training and stimulation that parents, grandparents, and caregivers give, often non-stop, during every one of the baby's waking hours. What does a baby learn?

The “first steps of learning”:

1. What to see, to hear, to feel

2. Who are the other people around me

3. How to interpret other people (agents)

4. What is my own body

5. The environment around me

6. How to imitate

Once a baby learns to imitate others, there is a sudden acceleration in learning. Imitation seems like a trivial task to our trained minds, but it has immense potential. It allows information to be transferred very effectively from one individual to another (the young baby). Learning to imitate is the cornerstone of training because it offers examples of behavior that was previously optimized, sparing the young baby lengthy, costly, and potentially dangerous trial and error.

But how do babies learn to imitate? Imitation is not the simplest task that a baby (or a baby robot) needs to learn, nor is it one of the first. Yet it may be the giant step forward in developmental learning that bootstraps lifelong learning and an intelligent brain.

To imitate another agent, such as a human being, an agent needs to understand what the self is, what others are, what objects are, what agents are, how the world around it works, how the self and other agents interact with the world and its objects, and the dynamics of these interactions. In essence, it needs to learn the foundations of knowledge that enable an acceleration of learning. A baby that has learned to imitate can learn many skills just by observing others. A robot that has learned to imitate could train on a multitude of YouTube videos.

“First steps of learning”: How do we get there?

Child development and early learning sciences offer innumerable insights into how human babies grow to learn the first steps of learning. Discoveries from theories of the mind and brain, neuroscience, and psychology help us immensely in creating and growing a brain for robots. One advantage we have is that we can build one and validate our theories. The disadvantage is that this cannot be done by trial and error.

What follows is a summary of the components and ideas we may start with. These ideas build on decades of scientific discovery in the fields mentioned above.

Brain and body: an entity needs a brain capable of learning a knowledge graph of the environment it is embedded in. That is, the brain must support the kind of learning we want to achieve, e.g. human-level learning. The architecture of this brain is currently unknown, but recent AI foundation models such as large language models (LLMs) offer insights, because they learn a knowledge graph (over words), albeit one that is neither multi-modal nor embodied. A robot body is foundational to learning because experiences are a combination of the body's sensory interactions with the world around us. Eyes, ears, touch: all senses come from the body, and thus all knowledge comes from the body.
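As a rough illustration of the difference between a word-only knowledge graph and a multi-modal, embodied one, here is a minimal Python sketch. It is an assumption made for illustration, not an architecture proposed in this article: the class `EmbodiedKnowledgeGraph` and its methods are hypothetical. Each (subject, relation, object) fact simply records which senses (vision, touch, hearing, or text) supplied evidence for it, so a fact learned only from words can be told apart from one the robot has experienced through its body.

```python
from collections import defaultdict


class EmbodiedKnowledgeGraph:
    """Hypothetical sketch: facts stored as (subject, relation, object) triples,
    each annotated with the senses that provided evidence for it."""

    def __init__(self):
        # (subject, relation, object) -> {modality: number of supporting experiences}
        self.edges = defaultdict(lambda: defaultdict(int))

    def observe(self, subject, relation, obj, modality):
        """Record one experience supporting a fact through a given sense."""
        self.edges[(subject, relation, obj)][modality] += 1

    def modalities(self, subject, relation, obj):
        """Return the set of senses that have supported this fact (empty if unknown)."""
        return set(self.edges.get((subject, relation, obj), {}))

    def is_grounded(self, subject, relation, obj):
        """True if at least one non-linguistic sense supports the fact."""
        return bool(self.modalities(subject, relation, obj) - {"text"})

    def neighbors(self, subject):
        """All (relation, object) pairs known about a subject, sorted."""
        return sorted({(r, o) for (s, r, o) in self.edges if s == subject})


kg = EmbodiedKnowledgeGraph()
kg.observe("cup", "is_on", "table", "vision")
kg.observe("cup", "is_on", "table", "touch")
kg.observe("cup", "makes_sound", "clink", "hearing")
kg.observe("dragon", "breathes", "fire", "text")  # learned from words only

print(kg.neighbors("cup"))                           # [('is_on', 'table'), ('makes_sound', 'clink')]
print(kg.is_grounded("cup", "is_on", "table"))       # True
print(kg.is_grounded("dragon", "breathes", "fire"))  # False
```

Symbolic triples are used here only because they are easy to read; nothing in the article commits the robot brain to a symbolic representation.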

Own goals: an entity needs to set its own goals. Imagine the first days of a baby or a baby robot: it needs to learn where to look and what to pay attention to. How does it decide to look at a caregiver's face or at the motion of another agent, as opposed to leaves moving in the background? Even basic sensory attention needs to be directed by some goal. Learning a model of the world cannot be based on static, pre-defined cost functions or reward functions. The robot ought to learn its own goals and fine-tune its capabilities from the ground up. How do humans learn all this? What are the ingredients? Are babies born with pre-defined goals (look at faces, suppress hunger, avoid pain, etc.)?

Learning algorithms: an essential component of a robot's brain is the way it learns from its experiences. Since we want the entity to be able to generate its own goals, we cannot define a specific objective function to optimize. We cannot rely on supervised learning, because we would need an oracle agent to always provide supervision. We cannot rely on reinforcement learning, because it is based on pre-defined reward functions. We cannot use imitation learning, because it is essentially a form of supervision. The only type of learning possible for a blank-slate robot or a young human baby is self-supervised predictive learning. In the real world there is no ground truth other than the world itself. A prediction forecasts the future, and that future is only a few instants away. When the future is realized, we can compare it with our prediction and estimate the prediction's value and validity.

Supervision and reinforcement come into play as the agent's own goals and techniques, and they too need to be learned.
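The self-supervised predictive learning described above can be illustrated with a toy example. The following Python sketch is an assumption made for illustration, not the author's method: a hypothetical world in which an observed object moves on a circle (`world_step`), and an agent whose forward model is a 2x2 linear map (`predict`) trained with the classic delta rule (least mean squares). At every step the agent forecasts the next observation, waits for the future to arrive, and uses the difference as its only learning signal. No oracle labels and no reward function are involved.

```python
import math


def world_step(x, angle=0.3):
    """Hidden world dynamics (assumed for illustration): an object moving on a circle.
    The agent never sees this function, only its outputs."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


def predict(W, x):
    """The agent's forward model: a 2x2 linear map from the current to the next observation."""
    return [W[0][0] * x[0] + W[0][1] * x[1],
            W[1][0] * x[0] + W[1][1] * x[1]]


def learn_by_prediction(steps=500, lr=0.1):
    W = [[0.0, 0.0], [0.0, 0.0]]  # blank slate: no prior knowledge of the dynamics
    x = [1.0, 0.0]                # first observation
    errors = []
    for _ in range(steps):
        guess = predict(W, x)       # forecast the next instant
        x_next = world_step(x)      # ...then the future arrives
        # The realized future is the only training signal: no labels, no rewards.
        err = [x_next[i] - guess[i] for i in range(2)]
        errors.append(math.hypot(err[0], err[1]))
        # Delta-rule (least-mean-squares) step on the squared prediction error
        for i in range(2):
            for j in range(2):
                W[i][j] += lr * err[i] * x[j]
        x = x_next
    return W, errors


W, errors = learn_by_prediction()
print(f"first error {errors[0]:.3f}, last error {errors[-1]:.2e}")
```

The prediction error starts at 1.0, because the blank-slate model predicts nothing, and shrinks toward zero as `W` approaches the hidden rotation. A linear model keeps the sketch short. A robot brain would need far richer, multi-modal predictors, but the training signal is the same: compare the forecast with what actually happens next.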

Learning environment: an entity needs a stimulating environment populated with trainers, that is, other agents that have previously learned a task and can show examples and provide feedback. A static environment with just objects is not enough. In robotics the cost of tutoring is prohibitive, as it involves humans operating in real time. Automata are not helpful, because they cannot provide fine-grained feedback, and the agent can only learn as much as the automata already know: pre-packaged solutions. Training agents provide feedback through voice and through non-verbal cues such as facial expressions and sounds.

A Proposal

Imagine a robot sitting on a virtual desk inside your screen. A human operator can interact with the robot and with a few objects on the table, talk to it, and move artificial arms in the virtual space.

How can our robot:

- Learn to pay attention to the interactions? Not to look at leaves outside…

- Learn what to see and hear?

… to be continued… in part 3

About the author

I have more than 20 years of experience in neural networks in both hardware and software (a rare combination). About me: Medium, webpage, Scholar, LinkedIn.
