designed by Dall-E-2

Artificial scientists and autonomous laboratories

What is next for ChatGPT?

Eugenio Culurciello
7 min read · Dec 5, 2023


One of the goals of machine learning is to reach the level of artificial intelligence. AI, the word we hear every day, is an empty promise if we are not striving to create artificial brains that can perform at or above the level of our own.

In the last 20 years machine learning, notably in the form of neural networks, has produced an incessant stream of innovations: detecting, localizing, and categorizing the content of images; learning sequences; predictive models; natural-language processing and understanding, to name a few. But looking back, all these innovations were aimed at solving individual problems one by one, each with a tailor-crafted solution. We always started with a dataset of inputs and desired outputs, created neural architectures designed around the data and the task, and developed many disconnected one-hit-wonder models.

But no more. If we want to talk seriously about AI, we will need to set course for a unified vision and a set of unified tools that can implement artificial brains.

Intelligence is about creating knowledge graphs. A Knowledge Graph (KG), seemingly such a dry and ethereal concept, is simply a set of concepts in our brain, ideas that are interlinked by learning in the physical world. Take the words “Egyptian”, “Manx”, and “cat”: they are individual concepts in our brain that are strongly linked, because “Egyptians loved cats” and “there is a type of cat called the Manx cat”. The words are connected points in our knowledge graph.
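To make the idea concrete, here is a minimal sketch of a knowledge graph as an adjacency structure, using the “cat” example above. The class name, relations, and API are illustrative inventions, not a real library.

```python
# A knowledge graph as a set of concepts linked by labeled relations.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # concept -> {neighboring concept: relation label}
        self.edges = defaultdict(dict)

    def link(self, a, b, relation):
        # Concepts are linked symmetrically, like associations in a brain.
        self.edges[a][b] = relation
        self.edges[b][a] = relation

    def neighbors(self, concept):
        return self.edges[concept]

kg = KnowledgeGraph()
kg.link("Egyptian", "cat", "Egyptians loved cats")
kg.link("Manx", "cat", "the Manx is a type of cat")

# "Egyptian" and "Manx" are both strongly linked to "cat".
print(kg.neighbors("cat"))
```

Querying `kg.neighbors("cat")` returns both linked concepts with the relations that connect them, which is the whole content of the graph in this toy example.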

Recent Large Language Models (LLMs) such as Generative Pre-trained Transformers (GPT, with ChatGPT being one of the most famous versions) are machine learning models that can predict the next word in a sequence of text. These models approximate a knowledge graph over words and portions of words, and their ability to produce high-quality text, plans, algorithms, and drafts similar to human output gives us confidence that the path to AI lies along these lines. The capabilities of these models, and the emergent skills learned while training them, are still a topic of research and will be for quite some time. These models are very large and capable and were trained without fixed datasets of input-output pairs, so they are time-consuming to test.
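The next-word objective itself is simple to illustrate. Below, a toy bigram count model stands in for the idea of “predict the next word given the context”; real LLMs, of course, use transformers over subword tokens, and the tiny corpus here is invented.

```python
# Toy next-word prediction: count which word follows which,
# then greedily pick the most frequent continuation.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    # Most frequent continuation: the greedy analogue of an LLM's argmax.
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" ("the cat" occurs twice, "the mat" once)
```

Scaling this idea from bigram counts to billions of learned parameters over long contexts is, in essence, the jump from this sketch to GPT.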

What is the future for ChatGPT? It is learning in the physical world. Our brain also implements a knowledge graph, but one based on multi-modal “concepts”. When we learned the concept “cat”, we did so by connecting visual, auditory, and tactile representations of a “cat”, interlinking the way the cat moves, how it feels to the touch, and the noises it makes. The word “cat” is just a label we attach to the concept so we can communicate it to other people, given that our brains are disconnected (no WiFi, Ethernet, or radio link between them).

Our future AI models are going to be called Large World Models (LWMs), and they will be trained to predict the next representation in a multi-modal input data flow. The complexity of this training modality lies in the fact that the “next representation” of the real world is not as simple as the next word in a text. Words are a knowledge “representation”, but multi-modal data like images and sounds are harder to break into parts. Is the next portion of the image to the right what we need to predict? Or the one below, or above? Is it the next 10 milliseconds of sound, the next note, or the next segment?

Learning in the physical world also requires a body. We cannot learn multi-modal representations without a fixed set of sensors that can record and encode these modalities. This is the path needed to train an artificial robot brain. Learning will be based on physical interaction with the environment and prediction.

Learning streams of “concepts” via predictive learning requires predicting the next concept from a set of previous ones. This is the main learning modality. Learning occurs in a continual fashion, but it is mostly inactive when our prediction matches the next physical state of concepts. In other words, this artificial brain ignores any information it already has when its predictive abilities are correct. When prediction fails, the artificial brain signals a “surprise”, which enables learning: this example we stumbled on needs to be learned!
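This surprise-gated loop can be sketched in a few lines. The predictor below is a trivial running estimate and the threshold value is an assumption chosen for illustration; any learned model could take its place, with the gate logic unchanged.

```python
# "Surprise"-gated predictive learning: update only when the
# prediction error exceeds a threshold; otherwise ignore the input.

SURPRISE_THRESHOLD = 0.5  # assumed value, for illustration only

class PredictiveLearner:
    def __init__(self):
        self.estimate = 0.0  # the brain's current prediction

    def step(self, observation, lr=0.5):
        error = abs(observation - self.estimate)
        if error > SURPRISE_THRESHOLD:
            # Surprise: prediction failed, so this example must be learned.
            self.estimate += lr * (observation - self.estimate)
            return "learned"
        # Prediction matched the next state closely enough: ignore the input.
        return "ignored"

learner = PredictiveLearner()
print(learner.step(2.0))  # large error -> "learned" (estimate becomes 1.0)
print(learner.step(1.2))  # error 0.2, below threshold -> "ignored"
```

Note that most inputs trigger no learning at all once predictions are good, which is exactly the efficiency argument made above.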

Reinforcement learning occurs when we need to optimize for a specific goal. Our actions affect the environment in ways we also need to learn and predict. It is not yet clear how to mix multiple learning modalities that could be pushing internal representations in different directions. Also, learning sequences of concepts seems simple enough, but in the physical world streams of “concepts” can be tricky: they are mostly attuned to changes in the environment and are not confined to a fixed sampling in space or time. It is about noticing changes in the environment, possibly caused by our own actions.
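One concrete instance of such goal-directed learning is tabular Q-learning, sketched below on a two-state chain environment invented purely for illustration. The point is the update rule: the agent learns how its own actions change the environment and which changes lead toward the goal.

```python
# Minimal tabular Q-learning on a 2-state chain: action 1 moves right,
# and reaching (or staying in) state 1 pays reward 1.
import random

ALPHA, GAMMA = 0.5, 0.9
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

def env_step(state, action):
    next_state = min(1, state + action)
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

random.seed(0)
for _ in range(200):
    s = 0
    for _ in range(5):
        a = random.choice((0, 1))        # explore at random
        s2, r = env_step(s, a)
        best_next = max(Q[(s2, 0)], Q[(s2, 1)])
        # Temporal-difference update: learn to predict the effect of actions.
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Moving toward the goal ends up valued higher than standing still.
print(Q[(0, 1)] > Q[(0, 0)])
```

This isolates the single-goal case; the open problem noted above, mixing this objective with predictive learning over the same representations, is not addressed by any such textbook update.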

Mental simulation must also play a fundamental role in reinforcement learning and planning, because it allows an artificial brain to sample possibilities more efficiently. Rather than trying every possible move at random, an effective and efficient agent needs to simulate many moves and only try a very small set.
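The simulate-then-act idea can be sketched as follows. The “world model” here is a hand-written scoring function standing in for a learned LWM, and the goal, states, and candidate actions are all invented for illustration.

```python
# Planning by mental simulation: roll each candidate action forward in an
# internal model, then execute only the most promising one.

def world_model(state, action):
    # Internal simulation of an action's outcome: next state and a score
    # measuring distance to an assumed goal value of 10.
    next_state = state + action
    return next_state, -abs(next_state - 10)

def plan(state, candidates):
    # Simulate many moves mentally, act on a very small set (here: one).
    simulated = [(world_model(state, a)[1], a) for a in candidates]
    return max(simulated)[1]

best = plan(state=4, candidates=[-2, 1, 3, 6, 9])
print(best)  # -> 6, since 4 + 6 lands exactly on the goal
```

All the trial and error happens inside `world_model`; only the winning action would ever touch the real environment, which is the efficiency gain the paragraph describes.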

Artificial Scientists

designed by Dall-E-2

What would it take to create an artificial scientist? This is an algorithm that can read the entire body of knowledge in one or multiple fields and is able to gather data, create hypotheses, run experiments, generate reports and theses.

Some sciences, like mathematics, may be learnable purely from text and symbols. Experiments on LLMs trained to perform mathematical proofs are promising. But for any physical science, the artificial scientist will be based on an LWM, trained on literature and videos. Similar to human reading, which infers words from vision, AI literature reading needs to be vision-based, because human-readable documents are a collection of not just text but also plots, graphs, diagrams, tables, paragraph titles, captions, etc. An artificial scientist needs to be able to read a plot (“what is the value of Y at X=1?”) and also understand diagrams and complex figures (“what is block B connected to?”). Current LLMs can already work with tables, databases, and datasets by writing scripts and code that interface with and run on external files.

The multi-modal LWM needed for an artificial scientist is a vision-based input model that can read pages as sequential blocks of pixels. This allows it to read all components of an article. I believe this training and system will be able to generate sound science plans, just as an LLM is able to plan for many activities. In fact, most scientific literature embeds research plans and scripts for testing hypotheses.

Artificial scientists trained in this fashion will thus be able to generate research plans. A research plan begins with a set of hypotheses and a list of experiments to be performed. They will also have the ability to run external tools, such as simulators, code, and scripts, and to read the output of those tools. They will be able to run tools multiple times and aggregate all outputs into a final report, or a collection of theses summarizing the results of the experiments.
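The plan-execute-aggregate loop just described can be sketched as below. The hypotheses, the experiment names, and the stand-in simulator are all invented for illustration; in a real system the tool call would launch an external simulator, script, or instrument driver.

```python
# An artificial scientist's outer loop: run a tool per experiment,
# collect the outputs, and aggregate them into a report.

def run_simulator(experiment):
    # Stand-in for an external tool; here it just fabricates a verdict.
    return {"experiment": experiment, "supported": "sweep" in experiment}

def research_plan(hypotheses):
    report = []
    for hypothesis, experiment in hypotheses:
        output = run_simulator(experiment)   # run the external tool
        report.append((hypothesis, output["supported"]))
    return report  # aggregated results, one entry per hypothesis

hypotheses = [("H1: X increases Y", "sweep-X"),
              ("H2: Z is inert", "control-Z")]
for hypothesis, supported in research_plan(hypotheses):
    print(hypothesis, "->", "supported" if supported else "rejected")
```

The same loop can be run repeatedly, feeding earlier reports back in as context, which is the multiple-runs-then-aggregate behavior described above.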

Simulations are an ideal companion to autonomous research plans because they allow the artificial scientist to test hypotheses and obtain ground truth. The results can then constitute a new set of training examples for self-improvement, until hypotheses are validated or eliminated for lack of resources (here: time).

Autonomous Laboratories

designed by Dall-E-2

An artificial scientist will be able to interface with and run autonomous laboratories. These are physical machines that can be interlinked to perform experiments at large scale and with autonomy. An artificial scientist is the controller of an autonomous lab, in the same way that a brain is the controller of a body. Video and audio are needed for an artificial scientist operating in the physical world, thus the core artificial brain is a robotic LWM trained on videos of laboratory operations, including visuals of inputs, outputs, and desired outcomes. The artificial-scientist controller will need to be able to read instrumentation files and reports, and to examine samples either by visual or physical inspection, or by using yet another tool or machine.

An autonomous laboratory in its simplest form is a manufacturing robot that assembles an item based on instructions. But adding an artificial scientist as controller allows the laboratory to do more than just follow instructions. It can generate output from an incomplete set of instructions, in the same way that a caption handed to an artist can generate an image. The complexity of an autonomous lab lies not just in its controller, the artificial scientist, but in the instruments and materials needed to operate it, and in the requirements for resetting, disinfecting, cleaning, and restoring each tool to its initial conditions, ready for the next batch of processing.

Autonomous labs need to interact with humans and thus learn human-autonomy teaming techniques. This is not unlike the artificial brain of a robot sharing space with humans, pets, and other creatures.

designed by Dall-E-2

about the author

I have more than 20 years of experience in neural networks in both hardware and software (a rare combination). About me: Medium, webpage, Scholar, LinkedIn.

If you found this article useful, please consider a donation to support more tutorials and blogs. Any contribution can make a difference!