People have become entranced with generative artificial intelligence products. Whether they power a chatbot, an image or video generator, or other software intended to replace or augment human effort, the enthusiastic reception these products receive shows the faith people place in them.
Since the introduction of these systems, however, there have been strong criticisms of the results. A classic example is the so-called hallucination: because these models store and retrieve statistical chains of words rather than verified facts, they can produce utterly wrong answers.
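To see why word-chain statistics alone can produce fluent nonsense, consider a minimal sketch (the toy corpus and all names below are hypothetical, and real systems are vastly more sophisticated): a bigram Markov chain that strings words together purely by which words followed which in its training text, with no check on whether the result is true.

```python
import random

# Hypothetical miniature "training corpus"; real systems learn from vastly more text.
corpus = (
    "the moon orbits the earth . "
    "the earth orbits the sun . "
    "the sun is a star . "
    "the moon is a rocky body ."
).split()

# Bigram table: for each word, every word observed to follow it.
follows = {}
for prev, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(prev, []).append(nxt)

def generate(start="the", max_words=8, seed=None):
    """Chain words by sampling from observed continuations.

    Each step picks a statistically plausible next word; nothing
    checks whether the finished sentence is factually true.
    """
    rng = random.Random(seed)
    words = [start]
    for _ in range(max_words):
        options = follows.get(words[-1])
        if not options:
            break
        nxt = rng.choice(options)
        if nxt == ".":
            break
        words.append(nxt)
    return " ".join(words)

for i in range(5):
    print(generate(seed=i))
# Among the samples you may see chains such as "the moon orbits
# the sun": fluent and locally plausible, but simply false.
```

Real language models condition on far longer contexts and far more data, but the underlying mechanism is likewise statistical continuation rather than fact retrieval, which is why fluent output and truthful output can come apart.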
The companies have tried to improve accuracy by constantly scaling up, adding more data and more computing power, and then shaping the results. "It may be taken for granted that as models become more powerful and better aligned by using these strategies, they also become more reliable from a human perspective, that is, their errors follow a predictable pattern that humans can understand and adjust their queries to," write researchers from the Leverhulme Centre for the Future of Intelligence at the University of Cambridge, UK, and the Valencian Research Institute for Artificial Intelligence (VRAIN) at the Universitat Politècnica de València, Spain.
Their study in the journal Nature finds something different. They examined a series of models and found that, over time, "scaled-up, shaped-up models do not secure areas of low difficulty in which either the model does not err or human supervision can spot the errors." The models, in short, become less reliable as more time and energy go into developing them.
Moreover, no matter how the systems are developed, the models increasingly give answers that seem to make sense but are wrong. It is akin to the adage that doing more of the same thing does not guarantee improvement; scaling up a flawed approach can simply magnify the mistakes it was already making.
The researchers say this pattern, in which expanding on past work only continues past errors, signals a need for a "fundamental shift" in the design and development of AI systems.