The year ahead

Nick Hagar
Jan 21, 2024

Read it first on my Substack

Building on last year’s post, I’d like to once again set some core themes for my interests in the year ahead. 2023’s themes focused on generative AI and new social media platforms, both of which continue to have a moment. I think the coming year will sharpen our reckoning with large language models, especially as they’re more deeply injected into the already destabilized attention economy. But before discussing this year’s themes in detail, let’s reflect on last year’s:

Simulation

This focused on two ideas:

  1. Using sophisticated (and potentially LLM-powered) agent-based models (ABMs) to better understand digital systems
  2. Generating synthetic, realistic datasets to conduct large-scale research without privacy concerns

We didn’t see much of the latter (although there was a scandal in which a startup founder generated millions of fake email addresses to trick JPMorgan into an acquisition). And I think my framing of the former was slightly off the mark. Having ABMs replace observational data as research tools may still be on the horizon — it will take time to develop a consistent, valid framework for mapping agent behavior onto theory. But the broader paradigm of AI as functional agent made big strides last year. Recent advancements like web browsing and multimodality make large language models capable of much richer interactions, and we’re starting to see experiments with those capabilities in software (as search engine, as personal assistant) and in hardware (as operating system, as interactive layer between user and applications).

Generic sequence encoding

This was my attempt to think through potential novel applications for LLM-type model architecture. I wrote:

Encoding is the hidden engine of AI. It’s the bridge that connects the dazzling model architectures we interact with to the real-world data they require. Finding new ways to encode that real-world data will help our models do more — not just generate text, but deploy it in intelligent ways. Anything with structure, sequence, and symbolism is a worthy target.

In one sense, I think this is happening — multimodality suggests a general approach to understanding and responding to any type of incoming information, not just written language. But I still think this is an underexplored area, in that many types of information can be encoded as sequential text at scale. Chess is my go-to example here — transformers can be fine-tuned on Portable Game Notation (PGN) to play quite well. You can imagine similar encoding schemes for datasets like user website activity or stock market movement.
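To make this concrete, here’s a minimal sketch of the idea, using GPT-2 as a stand-in for any causal language model. The PGN strings below are illustrative; the point is that a game’s movetext is already a valid training sequence, so the standard next-token objective becomes next-move prediction with no architectural changes.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical corpus: one game per string, encoded as standard PGN movetext
games = [
    "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7",
    "1. d4 d5 2. c4 e6 3. Nc3 Nf6 4. Bg5 Be7 5. e3 O-O",
]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Standard causal-LM encoding: move sequences become token IDs, and the
# model learns to predict each next token, i.e., each next move fragment
batch = tokenizer(games, return_tensors="pt", padding=True)
labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
loss = model(**batch, labels=labels).loss  # minimize this over a real PGN corpus
```

The same pattern extends to any domain you can serialize: clickstreams as sequences of page identifiers, or price movements as bucketed ticks.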

New attention markets

Last year marked a period of deep instability for attention markets. Twitter fully imploded after I finished writing the last piece, and Substack is now in the middle of its own self-inflicted unraveling. These trends have inspired some movement — Ghost, Buttondown, and Beehiiv are having a moment, Bluesky and Mastodon less so. But instead of another platform or two rising to act as the replacement, it feels like there are simply fewer places online with large, vibrant communities. The shift to decentralized, smaller-scale communication is a massive change, so perhaps we’ll see more established communities and norms as the new landscape has time to develop. But for now, the instability across platforms seems to have created fewer large-scale attention markets on the whole.

For 2024, I’m narrowing down to just a couple of focal points:

Finding a home for generative AI

We’ve now seen large, “foundation”-type models tackle an enormous range of tasks — writing code, taking exams, displacing customer service agents, and so on. The capabilities of these models are only expanding, as multimodality allows them to interact with audio, images, and video.

But now that these models are (to varying extents) capable of a wide range of tasks, attention has turned to figuring out what new classes of software they enable, and how they fit into existing systems.

In one approach, large language models represent such a major transformation that they require an entirely new system. This is the AI assistant paradigm represented by ChatGPT or Bard — an entity that you engage with directly, and that in turn provides information or takes actions on your behalf. The Rabbit R1 takes this same approach, positioning AI as its own interface to other software and the primary layer that you will interact with.

This approach has clear benefits — it harnesses the flexibility of large language models, and having one ultra-powerful AI that can do everything makes for good marketing. But using the largest, most complex models available for everyday tasks also costs an extraordinary amount of resources. Training and deploying the models requires massive compute, and the human feedback, custom prompting, and safety and privacy guardrails that this flexibility demands are realistically achievable only by major institutional players. The rest of us can be downstream beneficiaries, through APIs and chat interfaces, but we can’t build these systems on our own.

Because of these constraints, I think many researchers and practitioners will instead deploy LLMs conservatively, as components of existing systems. This approach works especially well for existing machine learning tasks, where LLMs can act as better-performing, drop-in replacements for existing algorithms. Think of common natural language processing tasks. We’ve had models for workflows like text summarization, classification, and entity extraction for decades, but LLMs often perform these tasks better, especially in cases that require extracting structured data from unstructured documents. LLMs can act as glue in existing systems — between incompatible inputs and outputs, between structured and unstructured data — with flexible processing and modeling built in.
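As a hedged sketch of what that glue can look like in practice (the model name, prompt, and document here are all illustrative), an LLM can sit between an unstructured input and the JSON a downstream system expects:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

document = "Acme Corp. announced on March 3 that CEO Jane Doe will step down."

response = client.chat.completions.create(
    model="gpt-4",  # any capable chat model works here
    messages=[
        {
            "role": "system",
            "content": "Extract entities from the user's document. Reply with "
                       "only a JSON object with keys: organization, person, date.",
        },
        {"role": "user", "content": document},
    ],
)

# In practice you'd validate this parse, since models sometimes add extra prose
entities = json.loads(response.choices[0].message.content)
print(entities)
```

The structured output can then flow into the rest of a pipeline that never needs to know an LLM was involved.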

To me, the second approach is more immediately exciting for the use cases of researchers and practitioners. Plugging LLMs into existing systems leverages their strengths while avoiding waste on tasks that simpler algorithms already handle well. This year, I want to further explore the kinds of scenarios where LLMs are well-suited to act as glue in existing systems, as well as frameworks that make this kind of integration straightforward.

Thrifty language modeling

Related to the above are pressing concerns about LLM efficiency — how can we justify the massive energy costs these models incur?

We’ve just discussed how the full power of a “foundation” model is often not necessary, especially for specialized or well-defined tasks. The flexibility of a model like GPT-4 is important when the task it’s being asked to perform is unknown, but it’s also an inherently inefficient system — you don’t need the parameters that primarily handle writing code or parsing court cases when you’re asking for cooking advice. This is further evidenced by cases where specialized, more efficient models (such as LoRA fine-tunes) outperform general-purpose ones on a subset of tasks.
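LoRA itself illustrates the efficiency argument. Here’s a rough sketch with the peft library, using GPT-2 and illustrative hyperparameters; the point is that only a small, low-rank slice of the network gets trained for the specialized task:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,                        # rank of the adapter matrices
    lora_alpha=16,              # adapter scaling factor
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
)

model = get_peft_model(base, config)
# Only the adapter weights are trainable; the base model stays frozen,
# so specializing for a task costs a small fraction of a full fine-tune
model.print_trainable_parameters()
```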

More broadly, the resource demands of language modeling raise the question of where simpler, smaller models will work just as well (if not better). GPT-4 can classify documents by topic, but maybe a fine-tuned text classifier will get the job done for a fraction of the compute cost. Or maybe a series of specialized models, each built to handle a step in a complex pipeline and then stitched together, can match the multi-step reasoning demonstrated by LLMs. There are likely opportunities to shrink down language model training and inference for many high-impact use cases, opportunities the GPT-4/Gemini hype cycle tends to obscure.
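For the topic-classification case, here’s a sketch of the thrifty alternative. A zero-shot classifier built on a roughly 400M-parameter model stands in for brevity; a task-specific fine-tune would be smaller and cheaper still, and the labels and document are illustrative.

```python
from transformers import pipeline

# ~400M parameters, orders of magnitude smaller than a frontier LLM
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

doc = "The Fed held interest rates steady amid cooling inflation."
result = classifier(doc, candidate_labels=["finance", "sports", "politics"])
print(result["labels"][0])  # the highest-scoring topic
```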

These themes have a clear focus on language modeling, but hopefully in a way that stays clear-eyed about the potential of LLMs. LLMs pose many threats — they run the risk of flooding the internet with low-quality SEO text, auto-generated product listings, and social media bots. But I’m not particularly interested in the writing output of tools like ChatGPT (and I’m especially uninterested in efforts to replace screenwriters/journalists/creatives with LLMs). I do see places where the flexibility and natural language capabilities of these model architectures can have a real impact on machine learning and the study of digital systems, and I’d like to explore that further.
