The Bitter Problem: Massive Models and Knowledge Discovery
Before I started developing artificial intelligence (AI) and machine learning (ML) applications, I held certain assumptions about the technology. After sitting down, learning, and building those applications in contexts ranging from the Department of Defense to New Testament Greek, I came to view the technology in a different way, one that AI/ML developers commonly and independently reach.
First, I began to see the problem with describing the technology, in its current form, as “intelligence.” ML techniques such as k-means clustering are not intelligent in any way, though they traditionally fall under the AI domain. Other AI technologies that appear intelligent, such as large language models (LLMs), are actually prediction machines. LLMs effectively mimic language and factual information through prediction, but lack preceding thoughts or causal inference. None of this is an original observation; it becomes clear to most developers once they actually begin building these systems from scratch.
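To make the “prediction machine” point concrete, here is a minimal sketch (my own toy example, not anything from a real LLM): a bigram model that predicts the next word purely from counted co-occurrences. Real LLMs do the same job with learned neural weights at vastly greater scale, but the operation is still next-token prediction, not thought.

```python
from collections import Counter, defaultdict

# Toy training corpus (hypothetical, chosen only for illustration).
corpus = "the word was with god and the word was god".split()

# Count which word follows which -- this table IS the entire "model".
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(word):
    """Return the most frequent next word: prediction, not reasoning."""
    return counts[word].most_common(1)[0][0]

print(predict("the"))   # "word" -- it has only ever seen "the word"
print(predict("word"))  # "was"
```

Nothing here “knows” what the sentence means; the model only reproduces statistical structure in its training data, which is the heart of the observation above.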
Another thing I came to realize is that part of the magic of AI is its ability to make associations in high-dimensional space. Humans simply cannot think in tens of thousands, or millions, of dimensions (let alone four!), but computers can. This inhuman ability contributes to the opacity of AI systems and often leads to discoveries not readily apparent or comprehensible to the human mind.
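As a small illustration of what “associations in high-dimensional space” means mechanically, the sketch below compares directions in a 10,000-dimensional space with cosine similarity. The vectors are random stand-ins for learned embeddings (an assumption for illustration; real systems learn these dimensions from data), but the point holds: the machine weighs all 10,000 dimensions at once, something no human can do.

```python
import math
import random

DIM = 10_000          # far beyond anything a human can visualize
random.seed(0)

def embed():
    """Stand-in for a learned embedding: a random high-dimensional vector."""
    return [random.gauss(0, 1) for _ in range(DIM)]

def cosine(a, b):
    """Cosine similarity: the standard measure of association in embedding space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

a, b = embed(), embed()
mix = [0.8 * x + 0.2 * y for x, y in zip(a, b)]  # a vector sharing structure with a

print(cosine(a, b))    # near 0: random directions are almost orthogonal
print(cosine(a, mix))  # much higher: the shared structure is detected instantly
```

In ten thousand dimensions, two random directions are nearly orthogonal, so any similarity the model does find reflects genuine shared structure, which is exactly why these systems can surface associations we would never notice.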
For example, AI identified Baricitinib, a rheumatoid arthritis drug, as a treatment for COVID-19; DeepMind "discovered an unexpected connection between different areas of mathematics by studying the structure of knots"; and, of course, the creators of AlphaFold won a Nobel Prize for predicting “the 3D structure of proteins from their amino acid sequences.”
This brings me to the point of this blog post. Years ago, Rich Sutton opined that “one thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great.” Today, Boris Cherny espouses Sutton’s conclusion, encouraging people to forgo scaffolding and intervention and simply wait for, and bet on, the next model to improve. Sutton and Cherny, in their own ways, came to believe in the power of general purpose methods and models.
Much like those other developers, I came to this conclusion independently, before I had even read Sutton’s essay or listened to Cherny. Once I started ascribing the transformational potential of AI to its ability to make connections in high-dimensional space, I realized that incredibly large models that scale with computation have a much better chance of succeeding at knowledge discovery than specialized models. My basic argument is that as the referential knowledge source and the AI model grow, there is more potential to draw connections that are not readily apparent to the human mind; after all, if these connections were apparent, humans would already have discovered them!
However unlikely, it may be the case that baseball dirt and breast cancer are related, or that some obscure cellular process can be manipulated through some odd function to cure disease. Such seemingly unrelated things can only become correlated inside sufficiently large models. Thus, the overarching priority for the advancement of humanity should be work on these large general purpose models.