Artificial Intelligence, AI in 2025 and beyond

Eugenio Culurciello
19 min read · Jan 3, 2025


SOME THOUGHTS…

We are reaching the second quarter of the 21st century: it is an exciting milestone. I have spent the last 25 years of my “adult” life mostly learning technologies such as electrical engineering, computer design, AI, machine learning, microchip design, etc.

Gemini-Pro: “a robot learning to see like a baby on a table with toys”

Looking back at the last 25 years, much has changed, and little has changed. In the year 2000 we had cellphones, and we had the Internet, albeit not as fast. We had social networks, but we could not really carry a computer and apps with us yet. AI was non-existent, but there were plenty of computer algorithms, such as internet search. Actually… we did have the Palm Pilot, so we did have a computer in our hands. Also: it did have apps.

Our cars mostly had combustion engines, and we had no robots to help around the house. Our TVs were much bulkier and their screens much smaller. We had video game consoles, and we also had portable video games… which we had even in the previous quarter century!

Our lives did not change much, but they also did. We spend a lot less time with other people now. We work with them online and we play with them online. We used to talk to them on the phone, but now we prefer to text them. We get aggravated much more often by words uttered online, and we are less inclined to meet face-to-face or cry on each other's shoulders.

One thing that changed is how we interact with computers and code today. Over the last 2–3 years we have seen the rise of foundational AI models. These are models that can serve a variety of uses without needing to be explicitly programmed or provided with a large number of examples.

What is next in the field of AI? What is gaining traction and what is just smoke? This is the main content of this article.

AUTOMATIC COMPUTER CODE

When I started working on neural networks, around 2010, we were dreaming of writing the last program we would ever need to write: the neural network that learns it all and can code itself.

Now we are there.

We have foundational models that can write code, and when looped into agents, they can also run the code, gather error messages, and correct their own code. They can even write modules to extend their capabilities, as in a dream scenario from a Matrix-like movie.

Today one can write an entire cell-phone application from scratch without having to write a single line of code, at most needing to manually feed back error messages and ask the agent: “please fix it”. Similarly, one can bring multi-file coding projects to completion, even graphical interfaces, for example by feeding back JavaScript issues from a web demo.

I said we are there, but we are not quite…

Agentic coding models today can write an entire program that is mostly functional. Error messages can be fixed by the agent, but sometimes agents get stuck in a loop because they cannot find the right file or web link. Or they get stuck because they feed back the same information and answers over and over.
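The run, read-the-error, fix cycle described above, including the failure mode where the agent keeps feeding back the same information, can be sketched in a few lines. This is only a minimal sketch, not how any particular coding agent works: `propose_fix` is a hypothetical stand-in for a call to a code-writing model, `toy_fix` exists only to make the example runnable, and the "same error seen twice" test is a deliberately simple heuristic for detecting a loop.

```python
from typing import Callable, Optional, Tuple


def run_snippet(source: str) -> Optional[str]:
    """Execute the source in a fresh namespace; return None on success or the error text."""
    try:
        exec(compile(source, "<agent>", "exec"), {})
        return None
    except Exception as exc:  # syntax errors and runtime errors alike
        return f"{type(exc).__name__}: {exc}"


def repair_loop(
    source: str,
    propose_fix: Callable[[str, str], str],
    max_rounds: int = 5,
) -> Tuple[str, str]:
    """Run the code, feed the error back, and retry; stop when an error repeats."""
    seen_errors = set()
    for _ in range(max_rounds):
        error = run_snippet(source)
        if error is None:
            return source, "ok"
        if error in seen_errors:
            # Same feedback as before: the agent is going in circles.
            return source, "stuck"
        seen_errors.add(error)
        source = propose_fix(source, error)
    return source, "gave_up"


def toy_fix(source: str, error: str) -> str:
    """Hypothetical fixer: a real agent would send source and error to a model."""
    if "rnage" in error:
        return source.replace("rnage", "range")
    return source  # no idea how to fix it, so the same error will come back


if __name__ == "__main__":
    print(repair_loop("total = sum(rnage(4))", toy_fix))
    print(repair_loop("total = 1 / 0", toy_fix))
```

The first call is repaired in one round; the second ends as "stuck" because the fixer returns the same code and therefore the same error. Note that `exec` runs arbitrary code, so a real agent would need to run generated programs in a sandbox.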

The trouble comes when we have some graphical output, as in a web page or a cell-phone application, or when using an external graphical tool. There, current coding agents operate open-loop, meaning that they do not yet have the ability to feed back graphical output information. Given that we already have multi-modal models that can “look” at images and screen capture snippets, I would say we will be able to close the loop soon. Today you can experiment with web-based AI Studio tools to have them code a website based on an image of another website, or of a cell-phone app view.

Outlook: gaining significant traction

GENERATING WRITING

Of course, AI that can write for you has been around for a couple of years, starting with ChatGPT and all its sibling foundational large language models (LLMs). These tools can be prompted to provide well-written text, or to improve existing text snippets or paragraphs. They make us all proficient writers.

If you have tried to generate significant portions of text with LLMs, you know that the best way is to describe what you want generated with the maximum number of words. This sometimes reveals the nature of the beast: it takes significant text input to specify everything we want generated. This is also true when using coding AI models or foundational AI models with graphical outputs.

Most of us may start with a list of bullet-point notes, and LLMs often reply by extending such lists, without necessarily generating entire paragraphs for each item. It will take more effort and prompting to get what you need.

Summarization of text is very effective if the input is just a few pages. When trying to gather information from hundreds of pages, the best approach may be to do it step by step, by breaking the input text down into chunks of a few pages and running the same prompt on each.

Retrieval-augmented generation, or RAG, is a technique to feed large amounts of text to an LLM. Think hundreds or thousands of pages from many documents. RAG has severe limitations because it does not use the full power of the LLM. Foundational models build a knowledge graph that connects all the concepts they were trained on. RAG instead retrieves pieces of text by similarity using smaller and more limited AI models, and then feeds the prompt and the retrieved sections to the LLM.
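To make the retrieval step concrete, here is a minimal sketch of "retrieve by similarity, then build the prompt". It is a toy under stated assumptions: the `embed` function below is just a word count standing in for the small embedding model a real RAG system would use, and the function names and the sample cookbook chunks are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, chunks: List[str], top_k: int = 1) -> str:
    """Only the retrieved chunks, not the whole corpus, reach the LLM."""
    context = "\n".join(retrieve(query, chunks, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"


COOKBOOK = [
    "Eggplant pasta: fry cubed eggplant, add tomato sauce, toss with rigatoni.",
    "Lemon cake: cream butter and sugar, fold in flour and lemon zest.",
    "Chicken soup: simmer chicken with carrots and celery.",
]

if __name__ == "__main__":
    print(build_prompt("recipe for eggplant pasta", COOKBOOK))
```

The sketch also shows where the limitation comes from: the LLM never sees a chunk that the similarity score did not rank highly, however relevant it might be to a question that needs real understanding.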

A better approach is to use the full power of the LLM on the entire body of text we want to use. This is of course costly, as we will need an LLM to really read all the pages and extract the information we want. Say for example we want to search for “the recipe for eggplant pasta” across 100 cookbooks. This is easily done with RAG or a plain old text search. The power of LLMs shows when you need to search for something nuanced that requires understanding of the text. For example: “which recipes need stir-frying an eggplant cut in small cubes or slices?” may need more than just plain text search. You can think of even more difficult examples that will quickly reveal the limitations of RAG, such as “Summarize the take-home points of this meeting transcript”.

An LLM agent may need to parse the entire body of text, and then summarize useful content into a “working memory”. This can then be fed to the LLM to answer a prompt, or to extract information. This is more costly, but it is also similar to the way humans parse large amounts of text or data. Divide and conquer.
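The divide-and-conquer idea can be sketched as follows: split the text into chunks, ask the model to note what is relevant in each, collect the notes into a working memory, and answer from the notes. This is a minimal sketch under assumptions: `llm` is a hypothetical callable that maps a prompt to text, the prompt wording is invented, and `fake_llm` is a trivial keyword matcher used only so the example runs without a real model.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 400) -> List[str]:
    """Break the text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def answer_over_corpus(
    text: str,
    question: str,
    llm: Callable[[str], str],
    max_words: int = 400,
) -> str:
    """Read every chunk, keep relevant notes in a working memory, then answer."""
    working_memory: List[str] = []
    for chunk in split_into_chunks(text, max_words):
        note = llm(f"Extract anything relevant to: {question}\n\n{chunk}")
        if note.strip():
            working_memory.append(note.strip())
    notes = "\n".join(working_memory)
    return llm(f"Using these notes, answer: {question}\n\n{notes}")


def fake_llm(prompt: str) -> str:
    """Toy stand-in for a model: return the first sentence mentioning eggplant."""
    body = prompt.split("\n\n", 1)[1]
    sentences = body.replace("\n", ". ").split(". ")
    hits = [s for s in sentences if "eggplant" in s.lower()]
    return hits[0] if hits else ""


if __name__ == "__main__":
    recipe = "Boil water. Salt it well. Dice the eggplant finely. Serve hot."
    print(answer_over_corpus(recipe, "How is the eggplant cut?", fake_llm, max_words=4))
```

The cost is visible in the structure: one model call per chunk plus one final call, which is why this approach is more expensive than retrieval but lets the full model read every page.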

One area that still needs work is filling a form or document using data from many other documents. This tedious job is an everyday frustration for everyone working a desk job, yet it has not been satisfactorily addressed to date. One reason is that documents are intrinsically graphical, and filling forms requires an understanding of the graphical sections, where to place content, and under which section. This is often obvious to a human, but not as obvious to an LLM. Complications also arise from the need to extract characters and words from the document (OCR) and from the relative position of the form input areas.

Outlook: text-based LLMs are improving and very useful for desk jobs. More features are surely coming.

GENERATING IMAGES AND VIDEOS

Generative AI gets our attention even more when we see the images and videos it can create from just textual prompts. The power of connecting images and captions allowed the creation of masterful neural networks that can transform text into images, images into videos, refine images, augment resolution, add or remove portions of an image or video, and even create them from noise.

Today image generation is very advanced and can produce any kind of realistic or stylized array of pixels. The new tools also allow us to directly change or adapt portions of an image according to our descriptions and commands. This was not the case in the past, but even commercial tools now allow us to mark a region of interest, sometimes automatically, and then change it as desired.

Video, on the other hand, is still in its infancy, producing short segments that are consistent, but rarely more than a handful of seconds long. Videos of a single person talking or in close-up generally fare better, and can even be used as digital customer-care agents.

Even entire video games can be realized in the direct movie format today. These are video game sequences produced entirely with textual inputs. They do rely on the same video-generating techniques but are impressive in the way they emulate the changes as if you were moving inside the 3D world of a modern video game. These tools will surely be utilized to create in-game animations and other video snippets for gaming, and also for video effects and movie production.

Outlook: video-producing foundational AI models still have some way to go before being able to produce substantial parts of our videos and movies.

APPLICATIONS OF LLMs

What are the next possible applications of foundational AI models? If you look online today there seems to be an LLM for everything: law, healthcare, educational tutors, meeting aids and summarizers. If you can turn your data into text, LLMs are going to be the best way to solve your problem, whether it is writing or organizing textual knowledge or generating new text from existing text.

The ease of use of LLMs, and the fact that they need less data and fewer of the complex training techniques of Neural Networks 1.0 (the ones from ~2010 to 2020), significantly lowered the barrier to creating and adapting foundational models for a variety of uses. This also means that a team without AI or neural network expertise can now create entire businesses just using foundational cloud-based LLMs and models. This in a way satisfies the wish of neural network researchers: they wrote the last program you will ever need!

What makes an application of LLMs a viable business opportunity, or just a useful tool? Most applications of LLMs will require large complex models that reside online, therefore requiring hefty monthly usage fees. If the value added to the user is higher than those fees, then there is a clear business model. As an example, an LLM that can help lawyers navigate thousands of pages to prepare for trial is probably going to make money. On the other hand, an application such as “ask me any question about cats/dogs, and how to take care of them” may be served by smaller LLMs but, more importantly, may not command the same monthly or yearly payments from its potential users.

What is exciting for investors is that the field of foundational AI models gave software as a service (SaaS) another large push, potentially gaining much from little investment. But lowering the barrier to entry also means more noise and more competition, with value propositions and clear advantages for users that are difficult to establish. This translates to companies that are not really deep-tech, but rather purely a marketing play with a really good user forum and customer interaction team.

One suggestion to investors and startup founders is to look at niche uses of LLMs, in small, focused pockets of potential advantage where the value is high. Examples could be “AI for mining materials”, “AI for common heart diseases”, “AI for quantum computing” and so forth. Many of these niche applications may require the use of multi-modal models, to differentiate through added functionality that goes beyond parsing just text data.

Another idea to consider is to rely on local, smaller open-source foundational models and LLMs, thus opening the door to a large reduction in the cost of operations. Even better if the software or application can be installed on user hardware (their phones or PCs), thus removing the need for custom cloud setups.

Outlook: text-based applications are already saturated and the ease of use of AI tools and LLMs is lowering the barrier to entry. Today anyone can make AI applications very easily.

MULTI-MODAL MODELS

Multi-modal LLMs are foundational AI models that build a knowledge graph based not only on text and words, but also on concepts available in other forms of media. For example, the idea of a “cat” is not just the word, but also the way it looks, moves, purrs, smells. Similarly, we need AI foundational models to be able to create a knowledge graph over the data we are most familiar with: text, images, videos, plots, diagrams, tables, databases, dataset files, and more.

What will this do? It will give foundational AI models the ability to work and reason in a way closer to the human plethora of senses, and thus a more real understanding of the physical world.

What applications will this enable? The first one is in coding foundational models that can close the loop with graphical outputs. For example, we can use multi-modal AI models to create webpages, cellphone applications, but also complex graphics as in photo editing, 3D models, engineering and architectural designs, mechanical parts, etc.

As an example, today it is still very difficult to design 3D objects given the sophistication and complexity of tools such as Blender, 3D software editors, and CAD tools for engineering, architecture, and mechanical design. Similarly, video editing and photo editing depend on too many menus and highly complex operational pipelines, often offering multiple solutions to achieve the same objective. Multi-modal LLMs will be able to remove or lower the barrier to entry for a wider set of users.

Outlook: multi-modal foundational AI models are going to provide yet another major revolution in automation and concept learning.

ROBOTICS

The field of robotics is still behind. Robots today offer very limited abilities to help us with everyday tasks such as cooking, cleaning the house, driving in all-weather situations, or automating a home. The main reason is the lack of a robotic brain that can learn embodied multi-modal concepts.

Generative AI models, and specifically multi-modal foundational models, are the core of a robotic brain. The missing step for robotics is embodiment. This means correlating the robot's own actions with the multi-modal concepts it can perceive. Datasets that show how a robot should move to fulfill a task are few and cannot be easily transferred to other robots or configurations. We have a myriad of videos of people performing any task, but these do not contain explicit control sequences for arms, legs and actuators.

Humans learn to imitate by watching others perform a task. We have not yet devised an algorithm that can do the same, and with the same learning speed. In other words, the missing piece is an algorithm that can learn by watching videos: one that learns to control its own limbs in the same way as the actors in the video.

Outlook: robotics is still behind the revolution of generative AI models and will be for another few years.

EMBEDDED AI

Embedded AI foundational models are one area that has not progressed as much as its cloud counterpart. This is because of the more limited computing capabilities of embedded devices (their hardware is more similar to a cell phone's than to a laptop's), and also because of the more limited capabilities of smaller AI foundational models.

Embedded AI can have major applications in smart cameras, home appliances, and smart homes, but also in drones, bicycles and scooters, robotics, and vehicles. These devices need a combination of medium-size LLMs and/or high-performing vision systems. These models would allow embedded devices to understand a visual scene at a resolution of at least 1080p, ideally 4K, in real time. They can also offer the ability to run 8B to 30B equivalent LLMs on device, with an output of at least 10 tokens per second.

Embedded processors like the Rockchip RK3588, the NXP i.MX 8, or the Qualcomm Snapdragon X Plus offer powerful standard CPUs coupled with high-performance AI accelerators and embedded GPUs. These processors, coupled with small AI foundational models and properly trained, high-quality models, can offer a solution for the space of embedded applications.

One note to add is that despite all the complex AI capabilities we have today, simple solutions such as turning on a light when a person enters a dark room still require an expert to set up and deploy… Clearly we all, including software and hardware manufacturers, have a lot to do to make the technology work for us, so that simple things such as home automation can finally bring us really smart homes. And this is just an example we can all understand: imagine the same devices working in a smart factory, helping reduce cost, improve safety for workers, and monitor all aspects of production.

Outlook: expect more and more capabilities from small portable embedded devices, getting progressively close to the same cloud capabilities of one year ago, at least from a single user perspective.

AI HARDWARE

AI needs hardware to run. The recent surge of LLMs has put serious demand on the potential of silicon-based microchips. Traditionally, memory microchips and computer processor microchips have been implemented in different silicon manufacturing processes, with different materials and substrates that make them impossible to merge. In addition, the 2D layout of microprocessors constrains how many wires of the same length one can afford to connect neighboring microchips. The combination of these issues has made memory interactions with processors a bottleneck. The 2D nature of microchips also places constraints on the number of processing elements that can be arranged in proximity. So really the limitation of modern microchips is density and 2D confinement. We will see that it boils down to wires per unit area.

What precludes us from making 3D microchips? HEAT: modern microprocessors and memory microchips produce heat during operation at high frequency, and that heat cannot be easily removed from a sandwich of many layers. Typically, heat is removed with fluid cooling or air cooling, the latter being the prevalent mode, given that fluids complicate the design of circuit boards and add a lot of cost and manufacturing complexity. But what about complete fluid immersion? That is one of the best ways to deal with the problem, at least at scale, but it again introduces additional complexity: we will have to fish the computer out of the fluid every time we need to service it.

Another issue in making 3D stacks of microchips is that they need to share wires between them. These wires need to reliably connect each microchip, but today's fabrication processes limit the yield of 3D integration to about 10 layers, after which the rate of failed connections limits the viability and efficiency of 3D stacks.

Even using fluid immersion cooling, the real limitation remains how many wires we can place in a unit area. It is possible to run more stacked microchips at a lower speed if we can share information between them. This means we need many wires to connect the microchips, ideally thousands per square millimeter. Today the limit is about one wire per 25-micron square, which is more than 1,000 wires per square millimeter. The formula to calculate how much data we can share between microchips is approximately:

Data interchange = number of wires * frequency of operation of each wire

Of course, how much data interchange (also called memory bandwidth) is needed depends on the individual application. The issue remains: memory and processor are separate and not often 3D integrated.

Modern AI microchips use stacked memory in the form of HBM (high-bandwidth memory), today 8 layers running ~1000 wires at 8 GHz. Processors can sometimes have another memory stacked on top, SRAM, but this is not yet typical.
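As a quick back-of-the-envelope check, the numbers above can be plugged into the data interchange formula. This is a sketch under two assumptions of mine: that "a wire every 25-micron square" means one wire per 25 µm × 25 µm of area, and that each wire carries one bit per cycle.

```python
def wires_per_mm2(pitch_um: float) -> float:
    """Wires per square millimeter if each wire occupies a pitch_um x pitch_um square."""
    return (1000.0 / pitch_um) ** 2


def data_interchange_bits_per_s(num_wires: int, frequency_hz: float) -> float:
    """Data interchange = number of wires * frequency of operation of each wire.

    Assumes one bit per wire per cycle.
    """
    return num_wires * frequency_hz


# One wire per 25 um x 25 um square: 40 x 40 wires fit in one square millimeter.
density = wires_per_mm2(25.0)  # 1600.0

# The HBM figures from the text: ~1000 wires at 8 GHz.
bandwidth_bits = data_interchange_bits_per_s(1000, 8e9)  # 8e12 bits per second
bandwidth_gbytes = bandwidth_bits / 8 / 1e9  # 1000.0 GB/s, i.e. about 1 TB/s
```

Under these assumptions, one HBM stack moves on the order of 1 TB per second, and the wire density works out to about 1,600 wires per square millimeter.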

Because of these limitations, today's AI microchips can provide only a limited number of tera-operations per second per Watt (performance per Watt of electrical power). LLMs today require a large number of processing elements and large amounts of memory, all connected by very large memory bandwidth. That is because LLMs are often large neural networks using 8, 70, hundreds, or even thousands of GB for weights alone. All those weights need to be available to the AI processor every time we ask it to compute a new token, therefore putting enormous constraints on today's memory microchips.
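The fact that all weights must be read for every new token gives a simple upper bound on generation speed: tokens per second cannot exceed memory bandwidth divided by the size of the weights. The sketch below is only an estimate under simplifying assumptions: a dense model serving one request at a time, every weight read exactly once per token, and compute time and caches ignored. The 1,000 GB/s figure is my assumption, roughly what ~1000 wires at 8 GHz would deliver at one bit per wire per cycle.

```python
def max_tokens_per_second(weights_gb: float, bandwidth_gb_per_s: float) -> float:
    """Upper bound on tokens/s when every weight is read once per generated token."""
    if weights_gb <= 0 or bandwidth_gb_per_s <= 0:
        raise ValueError("weights and bandwidth must be positive")
    return bandwidth_gb_per_s / weights_gb


BANDWIDTH_GB_PER_S = 1000.0  # assumed: about 1 TB/s of memory bandwidth

small = max_tokens_per_second(8, BANDWIDTH_GB_PER_S)   # 125.0 tokens/s at most
large = max_tokens_per_second(70, BANDWIDTH_GB_PER_S)  # about 14 tokens/s at most
```

The bound makes the constraint tangible: with the same memory system, a model with roughly nine times more weights generates roughly nine times fewer tokens per second.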

One would want today to speed up LLMs by 100 if not thousands of times, but the limitations of microchips, their fabrication, arrangement, wires and thermal profiles make it impossible to achieve more than a few-fold gain in efficiency over the next few years. In addition, there is no other technology that can come to the rescue of the need for more efficiency in AI hardware.

Outlook: do not expect your AI hardware to run much faster or more efficiently for the foreseeable future.

AI DATA CENTERS

The demand for AI data centers has surged significantly with the rise of foundational AI models, and it has increased further in the last 2–3 years. Electrical power delivery is now one of the main factors limiting the expansion of data centers, because it takes time to adapt the national power grid to support the large amounts of energy required by AI data centers (AI-DC). 50–100 MW is now the norm, with large data centers requiring 500 MW or more. These amounts of power are not easy to come by, and are also limited by the shortage of power transformers (not to be confused with the neural network transformers that power foundational AI models), some of which require 18 months to be manufactured.

Data centers continue to be a commodity in the foundational AI age. The value of buildings, wires, racks and maintenance is very low. The real value is the AI hardware and computers, specifically AI processors like GPUs. These specialized GPUs for AI are now moving up the chain: today we can buy special clusters and entire racks for data centers.

Who is making money in data centers? The makers of microchips for GPUs, computing, networking, and computer memory are the real winners. Providers of software to manage data centers and their operation, including data and databases, are also in more demand, although many free and open-source solutions exist.

Outlook: AI data centers will continue to expand in the next 10 years, fueled by new multi-modal models and larger foundational models for robotics and real-life applications.

PRIVACY

With foundational AI models primarily being in the cloud, users of the technology have fewer chances to keep their data at home and are being forced to place more and more of it online. This opens the door to sovereign monitoring and surveillance, which is already commonplace in many retrograde countries.

Computers and storage have been moving to the cloud for a while, because it is easier to have someone else assemble and maintain it for you. It does come at a price! Usually the cost is 2–3x more than DIY computers on-site.

One complication with generative AI is the size of the models and the hardware requirements, which can be above and beyond what one can afford on-site. For example, large 70B or 400B models can only be run on 1-GPU and 8-GPU systems, respectively, with large discrete GPUs. Only smaller models, from 7B up to less than 30B, can run on laptop-grade hardware. This clearly makes the move to the cloud inviting, again at a higher price.
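A rough way to see where such hardware requirements come from is to multiply the number of parameters by the bytes used per weight. The sketch below is an estimate, not a sizing tool: the 80 GB of memory per GPU, the 25% headroom for activations and caches, and the choice of 4, 8 or 16 bits per weight are all assumptions of mine, and real deployments vary.

```python
import math


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone: 1e9 parameters at 8 bits each is 1 GB."""
    return params_billion * bits_per_weight / 8


def gpus_needed(
    params_billion: float,
    bits_per_weight: int,
    gpu_memory_gb: float = 80.0,  # assumed size of one large discrete GPU
    headroom: float = 1.25,       # assumed 25% extra for activations and caches
) -> int:
    """Smallest number of GPUs whose combined memory holds the model plus headroom."""
    needed_gb = weight_memory_gb(params_billion, bits_per_weight) * headroom
    return math.ceil(needed_gb / gpu_memory_gb)


laptop_model_gb = weight_memory_gb(7, 4)  # 3.5 GB: plausible on laptop-grade hardware
one_gpu = gpus_needed(70, 4)              # 1 GPU when weights are stored in 4 bits
three_gpus = gpus_needed(70, 16)          # 3 GPUs at full 16-bit precision
multi_gpu = gpus_needed(400, 8)           # 7 GPUs, so it fits an 8-GPU system
```

The same model can land on very different hardware depending on how many bits are used per weight, which is why the choice of precision matters so much when deciding between running locally and moving to the cloud.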

But not everyone is comfortable sending data to a cloud provider. Small and medium enterprises with considerable know-how are not keen on using the cloud. Neither are the many consumers and citizens who consider the privacy of their own data a priority.

One movement that is taking hold is to host all data and some foundational AI models locally on your own hardware. Laptops, desktop PCs and small networked devices for data are being converted into local AI powerhouses, sitting together with the data. These devices can implement local RAG systems, local knowledge and search, and can also run coding AI tools that use the company's own data locally. They can also be used by consumers to batch-label a large collection of videos and photos, and for other personal data needs, such as filling forms and documents based on your own private legal documents.

This is not to say that cloud data and privacy do not go together. Most cloud providers offer industry-leading data and computer security. But there are lingering issues: one is that by concentrating data and computers, cloud providers also become the target of many more cybersecurity infiltration attempts; the second is that computer security is never guaranteed by default, as new exploits are discovered daily and promptly used by malicious agents. That is also not to say that every local home setup is secure. But if provided with the right safeguards and software that enforces good security measures, it can be more secure than a giant well-known cloud provider.

Outlook: the cloud will try to expand, but a small contingent of local and private devices is making its way and will become progressively more important.

TRAINING AI FOUNDATIONAL MODELS

High-quality training material is fundamental to training AI foundational models. The entire first generation of LLMs (2022–2023) was trained on very large collections of text from the Internet, magazines, publications, and all the data that we could easily access. But recently, it became apparent that smaller models trained on high-quality material were as good as larger models trained on much more data. Today models of 400B or even 70B parameters compete with much larger models from 1–2 years ago.

But what is high-quality training material? It is basically textbook-quality information about well-known facts, rather than the confabulations and dialogues of random people talking about facts on the internet. It is like school versus street knowledge. Imagine we take all the training material we use for human students from when they are babies to well past university. Imagine we feed in all the expert textbooks on all the subjects we want our LLM to be familiar with. If we apply a curriculum, step by step, year by year, to all this content, we have very well distilled quality material that is superior to what one can get by absorbing it through the environment over the years. Similarly, using school-grade material to train LLMs makes them learn much faster and be more accurate than training with data from discussion forums. Do not get me wrong, some discussion forums are also full of useful information, but they need to be used only after pre-training with high-quality training material. That is very much equivalent to how humans learn!

So where do we get this high-quality material? K-12 school material is easily available in many free textbooks and courses, including student exercises and their solutions. And then there are the more advanced textbooks, from college to grad school to expert manuals. Publications are not high-quality material because they are not yet validated by practice and wide use, and because scientific papers do not reveal all the disadvantages of techniques; rather, they are biased toward the advantages for the sake of publication. Textbooks instead average our knowledge over many years, and they mostly report only techniques and ideas that are validated. More importantly, textbooks can provide more information on what works and what does not, and under which conditions scientific techniques are valid or not, and thus provide high-quality knowledge.

This topic really makes you think about what knowledge is and how it is acquired. LLMs behave like humans when acquiring knowledge, and research has demonstrated that a curriculum approach, from simple concepts to more complex topics, is the way to go. Using the right information, as opposed to all possible information, also leads to better learning. Humans can learn facts from experience, from word of mouth, gossip, and urban legends, but that knowledge is not always correct or grounded.
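The curriculum idea, school material first and in order of difficulty, with forum-style text only afterwards, can be sketched as a simple ordering of the training set. This is purely illustrative: the `Document` fields, the `grade_level` label, and the `curated` flag are hypothetical, and real training pipelines decide data order and mixing in far more elaborate ways.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Document:
    title: str
    grade_level: int  # hypothetical difficulty label, e.g. school year
    curated: bool     # True for textbook-like material, False for forum-like text


def curriculum_stages(docs: List[Document]) -> List[List[Document]]:
    """Order training material from simple to complex, uncurated text last."""
    curated = sorted((d for d in docs if d.curated), key=lambda d: d.grade_level)
    stages = [list(group) for _, group in groupby(curated, key=lambda d: d.grade_level)]
    leftovers = [d for d in docs if not d.curated]
    if leftovers:
        stages.append(leftovers)  # forum-style text only after the curated stages
    return stages


LIBRARY = [
    Document("Forum thread on fractions", 5, False),
    Document("Calculus textbook", 13, True),
    Document("Arithmetic primer", 1, True),
    Document("Counting workbook", 1, True),
]

if __name__ == "__main__":
    for number, stage in enumerate(curriculum_stages(LIBRARY), start=1):
        print(number, [doc.title for doc in stage])
```

Each stage would then be fed to training in sequence, mirroring the year-by-year progression of a school curriculum.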

Outlook: high-quality training material to train LLMs is the new gold.

FUTURE

What does the future of foundational AI have in reserve for us? Here are some ideas based on the content of this article:

  • Foundational models will get smaller and more powerful
  • High-quality training data is of paramount importance
  • Cloud will continue to grow, but embedded, private, local solutions will grow faster
  • Robotics will advance slowly for a few years. Similar prospects for autonomous vehicles
  • Your data privacy will become more important and force a push away from the cloud
  • AI hardware will promise to get more efficient, but will face physical barriers
  • Build the infrastructure not the apps

We have a part in this future, both to guide applications and / or to build them. Enough reading: time to act!

PS:

this essay was written without generative AI. Yes it shows…

about the author

I have more than 20 years of experience in neural networks in both hardware and software (a rare combination). About me: Medium, webpage, Scholar, LinkedIn.

If you found this article useful, please consider a donation to support more tutorials and blogs. Any contribution can make a difference!

Originally published at https://euge-blog.github.io on January 3, 2025.
