You’ve probably seen some of these headlines:
"An A.I.-Generated Picture Won an Art Prize. Artists Aren’t Happy." - New York Times
“Artificial Intelligence Bot Wrote Scientific Paper in 2 Hours” - Insider
“The Google engineer who thinks the company’s AI has come to life” - Washington Post
These are click-baity headlines with plenty of exaggeration. But they’re also representative of the extraordinary capabilities natural language processing (NLP) has unleashed.
In this post, I’m going to give a brief overview of why I’m so excited about NLP and the possibilities large language models present.
What is NLP?
Natural language processing is a subfield of artificial intelligence focusing on granting computers the ability to understand human language (both speech and text).
It’s a research area that has exploded following the advent of the transformer, arguably one of the most transformative technologies created within the last decade.
While conventional thinking was that AI would begin by automating more menial tasks, with NLP we’re also seeing machines take on creative work.
Natural language processing spans everything from helping Alexa understand your question to improving computer vision tasks such as optical character recognition (OCR). However, the specific area I’m really excited by is generative large language models (LLMs).
Large language models are neural networks trained on hundreds of gigabytes of text, and they’re changing the game. Starting with writing.
My PC is a poet: Generative text
If you’ve read anything previously about natural language processing, you’ve probably heard about GPT-3. This language model is the successor to GPT-2 (and more than 100X bigger), and its name stands for generative pre-trained transformer.
At the highest level of abstraction, GPT is essentially autocomplete. But because the model has 175 billion parameters and was trained on a dataset covering much of the text on the internet (all of Wikipedia is only 0.6% of that data), it's able to give really, really great responses to prompts. With fine-tuning and additional parameter optimization, these responses can become borderline spectacular.
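To make the "autocomplete" idea concrete, here is a toy sketch in Python. It is not how GPT-3 works internally: it swaps the transformer for a simple word-pair counter, and the function names and the tiny corpus are made up for illustration. The core loop is the same idea, though: look at the text so far, pick a likely next word, repeat.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def autocomplete(model, prompt, max_new_words=5):
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        candidates = model.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical miniature "training set"; real models see hundreds of gigabytes.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = train_bigram_model(corpus)
print(autocomplete(model, "the cat", max_new_words=3))  # prints: the cat sat on the
```

A model this small only ever looks at the previous word, which is why its output loops quickly. Large language models condition on thousands of tokens of context, and that is where the "understanding context" part comes from.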
Ok, so it’s “autocomplete on crack”. Why should I care?
Turns out, that’s extremely useful when said model is capable of understanding context. People have used GPT-3 and models like it to create tools that summarize detailed articles, generate marketing/legal copy, answer your educational questions, write code, and author new chapters of ‘Harry Potter’. And this is all the tip of the iceberg.
Generative language models are not sentient. They're not even very intelligent.
But, they simulate intelligence extremely well and can be used in any situation in which human text was previously needed. And that's essentially everywhere: writing articles, stories, scripts, games, and papers. Creating chatbots and social media content… even fake news!
Notably, GPT-3 is not perfect, as Sam Altman has previously acknowledged.
There are many other popular large language models available to the public, such as GPT-Neo and GPT-J for generating text, and BERT, which is geared toward understanding text rather than generating it. And this isn't even touching on the gigantic internal models being developed at FAANGs (MAAAMs?).
AI authoring text is incredibly cool, and has tons of potential. But isn’t an image worth a thousand words?
My PC is a painter: Generative images
Turns out, AI is a better artist than I ever was; all the images in this article were created by AI. I generated several using Stable Diffusion, an open-source alternative to image generation models such as DALL-E.
Generative image models accept a prompt such as “Pokemon cards of the 1960’s” and output art, as can be seen in this incredibly lengthy thread creating new Pokemon back to 4000 BCE.
As with text generation, there are many different creation tools under development; Midjourney and Craiyon are other popular options that have different levels of creativity and visual styles.
Many of the works people have used these technologies to create are astounding, such as this animation of a neon cyberpunk city:
AI-generated images may fundamentally reshape the graphic design industry. It’s possible that these new and powerful creative tools will enable existing designers to do more in the same amount of time. Alternatively, instead of hiring a professional or freelancer, any non-artist may soon be able to generate dozens of versions of the logo or asset they require at minimal costs.
However, the idea that AI will supplant the creativity of human artists is misleading. Generative image models still need humans to supply prompts. While being technically skilled with Adobe products may become less essential, human creativity will still be needed.
What’s the catch?
This all sounds too good to be true, right? In some ways, it is.
There are several challenges we have yet to overcome for large language models to reach their full potential:
Speed. Gigantic models are powerful, but they're also slow. Faster response times will be necessary as we aim to apply these technologies in more production settings.
Hallucinations. When given a prompt it doesn’t understand, AI will often respond assertively with content that is certified bull-shark. Importantly, it’s difficult to check the validity of such text. This is a critical issue for LLMs that makes it difficult to use them for fact-checking, chatbots, and essay writing.
Plagiarism. Because the most notable LLMs are trained on hundreds of gigabytes of text, they have plenty of examples to draw from. While many of the generated responses are legitimately original, there are also some that can be found verbatim on the web. This makes LLMs problematic when used in academic settings. We must construct improved mechanisms for detecting and handling AI plagiarism.
Undesirable content. For most of the web’s history, the majority of content has been limited to what a human could author. LLMs remove that restriction. Suddenly, people can create as many “fake news” articles as they like, or more realistic-appearing bot spam. Also, rule 34 - people are already using this tech to create porn. Content generation unrestricted by time may be a net-negative for society. Filtering out low quality, racist, sexist, homophobic, and otherwise toxic content must be a high priority.
Bias. LLMs are trained on data from the internet. Unfortunately, that includes a lot of content warped by our human subjectivity. This leads to LLMs returning results that reflect a lack of diversity, as well as being capable of generating extremist content. Even software developers with the best intentions will unconsciously make choices that cause AI to portray similar world-views. Going forward, it will be important to keep our biases in mind when building and to identify bad actors.
Copyright/trademark. When someone uses AI to write an article, or create a painting, or sing a song, to whom does it belong? Can the person who generated it claim ownership? There are a lot of new legal questions that have yet to be answered by our aging laws. For example, if I train a model on your paintings and then it makes more that look similar, do I need to compensate you for that? Or if I use samples of Taylor Swift's music to generate more Pop/Country, do I owe her a songwriting credit? My current view (shared by an awesome professor who specializes in copyright) is that as of now, AI-generated images will be considered derivative works of whatever they were trained on.
Alignment. Are LLMs aligned with human values and following human intent? We will need to continue training AI systems using human feedback to ensure that the AI systems we're building bring about more positive than negative consequences.
People who are much, much more intelligent than I are working their hardest on the issues I discussed above. Here are some of the ways researchers are currently thinking of approaching these challenges:
Using higher-quality data. It's clear from examples including InstructGPT that the quality of training data and model output are correlated. Models trained on domain-specific reasoning become better at performing that task, such as math problems in the case of Minerva.
Improving use of context. One of the biggest constraints of using LLMs is their limited context; in the case of GPT-3, it's ~4000 tokens or around 3000 words. “Chaining” together context is a way researchers are aiming to make LLMs more effective for complex tasks. This turns the output of one step into the input for the next. There are several ways people are approaching this, including subgoal search, selection-inference, least-to-most prompting, and just straight up increasing the context size.
Prompt engineering. Simple alterations to the prompts fed to LLMs can have a massive impact on the quality of responses. However, due to the black-box nature of many models, it is often difficult to predict what these may be. As the founder of Replit joked on Twitter, this can get a little ridiculous at times - adding phrases like "let's take this step by step", or "I hope it's correct".
Retrieval augmentation. Recent research, such as WebGPT, LaMDA, and REALM, shows the possibilities for increasing accuracy by retrieving contextual documents from external datasets as part of a model's execution. Citing the sources an LLM is using as its context could also be a path to increasing the transparency of opaque models.
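Several of the ideas above (chaining, prompt tweaks, and retrieval) are easier to picture in code. What follows is a rough Python sketch under loud assumptions: every function name is hypothetical, `fake_llm` is a stand-in for a real model call, and the word-overlap ranking is a deliberately crude substitute for the learned retrievers that systems like REALM actually use.

```python
import re


def retrieve(question, documents, k=2):
    """Return the k documents sharing the most words with the question."""
    question_words = set(re.findall(r"\w+", question.lower()))

    def overlap(doc):
        return len(question_words & set(re.findall(r"\w+", doc.lower())))

    # sorted() is stable, so ties keep their original order.
    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(question, sources, suffix="Let's take this step by step."):
    """Number the sources so an answer can cite them, then add a prompt nudge."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return f"Sources:\n{numbered}\n\nQuestion: {question}\n{suffix}"


def run_chain(llm, step_templates, first_input):
    """Chaining: feed each step's output into the next step's prompt."""
    text = first_input
    for template in step_templates:
        text = llm(template.format(input=text))
    return text


def fake_llm(prompt):
    """Placeholder for a real model call; it just wraps the prompt in brackets."""
    return f"<{prompt}>"


# Hypothetical document store built from facts mentioned in this post.
documents = [
    "GPT-3 has 175 billion parameters.",
    "Stable Diffusion generates images from text prompts.",
    "Minerva was trained to solve math problems.",
]
question = "How many parameters does GPT-3 have?"
print(build_prompt(question, retrieve(question, documents, k=1)))
print(run_chain(fake_llm, ["Summarize: {input}", "Shorten: {input}"], "hello"))
```

Numbering the sources is the part that helps with transparency: if the model is asked to answer with references like [1], a reader can check the claim against the retrieved text instead of taking the output on faith.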
NLP is accelerating rapidly. Literally every other day I’m seeing something new and awesome; this past week it was DiffusionBee and Amazon’s new language model.
Between text, images, and audio, NLP models are being developed in many areas. It’s possible that we’ll see more models that combine all of these capabilities; imagine a world in which any person can create a new movie at will, using AI to generate the script, visuals, lyrics, and sound.
Importantly, new open source models now exist that you can take advantage of to build transformative tools for the next generation. Stable Diffusion, in particular, is upending the status quo and making image generation way more accessible (for better or worse).
It’s not just hype. While there are many aspects that need to be improved, there’s real substance here.
Web3 has potential and there are awesome people building in that space. But I encourage you to explore and learn more about NLP!
Thanks for reading :)