What Is ChatGPT Doing … and Why Does It Work?

It’s Just Adding One Word at a Time
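As a toy illustration of "adding one word at a time" (not ChatGPT's actual mechanism; the probability table below is entirely made up), one can repeatedly sample the next word from a table of next-word probabilities:

```python
import random

# A made-up table of next-word probabilities, standing in for what a
# real language model would compute from the whole text so far.
next_word_probs = {
    "the":  {"cat": 0.5, "best": 0.3, "end": 0.2},
    "cat":  {"sat": 0.6, "is": 0.4},
    "sat":  {"on": 1.0},
    "on":   {"the": 1.0},
    "best": {"thing": 1.0},
}

def generate(start, n_words=8):
    words = [start]
    for _ in range(n_words):
        dist = next_word_probs.get(words[-1])
        if dist is None:           # no known continuation: stop
            break
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))   # e.g. "the cat sat on the best thing"
```

A real model computes these probabilities afresh from the whole text so far, not just from the last word, but the word-by-word loop is the same.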

Where Do the Probabilities Come From?

Take a sample of English text, and calculate how often different letters occur in it.
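For instance, here is a minimal Python sketch of that calculation (the sample string is just a stand-in for a larger corpus):

```python
from collections import Counter

sample = ("Take a sample of English text, and calculate how often "
          "different letters occur in it.")

# Keep only letters, ignoring case, then count them.
letters = [c.lower() for c in sample if c.isalpha()]
counts = Counter(letters)
total = sum(counts.values())

# Turn counts into probabilities, most common letters first.
for letter, n in counts.most_common(5):
    print(f"{letter}: {n / total:.3f}")
```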

“Surely a Network That’s Big Enough Can Do Anything!”

There are things that can be figured out by formal processes, but aren’t readily accessible to immediate human thinking.

Inside ChatGPT

The goal is to continue text in a reasonable way, based on what it's seen from the training it's had.

It operates in three basic stages: embedding the tokens, processing the embeddings through the network's layers, and producing probabilities for the next token.

Every part of this pipeline is implemented by a neural network, whose weights are determined by end-to-end training of the network.
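Here is a rough sketch of those three stages, with tiny made-up sizes and random weights standing in for the real trained model (whose layers are far more elaborate):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up sizes for illustration only; the real model is vastly larger.
vocab_size, embed_dim, seq_len = 1000, 64, 10

# Stage 1: turn each token ID into an embedding vector.
embedding_matrix = rng.normal(size=(vocab_size, embed_dim))
tokens = rng.integers(0, vocab_size, size=seq_len)
x = embedding_matrix[tokens]                 # shape (seq_len, embed_dim)

# Stage 2: pass the embeddings through the network's layers (a single
# random linear layer + nonlinearity stands in for the many
# attention / feed-forward blocks of the real model).
W = rng.normal(size=(embed_dim, embed_dim))
x = np.tanh(x @ W)

# Stage 3: from the embedding of the last token, produce a probability
# for every possible next token (softmax over the logits).
W_out = rng.normal(size=(embed_dim, vocab_size))
logits = x[-1] @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print("most likely next token ID:", probs.argmax())
```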

The Training of ChatGPT

The result of large-scale training, based on a huge corpus of text (on the web, in books, etc.) written by humans.

A neural net with 175 billion weights can make a "reasonable model" of text humans write.

Given all this data, how does one train a neural net from it?

Machine Learning, and the Training of Neural Nets

Numerical analysis provides a variety of techniques for finding the minimum in cases like this.
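The technique most used in practice for neural nets is gradient descent: repeatedly step "downhill" along the derivative. A one-variable sketch, for f(w) = (w - 3)^2, whose minimum is at w = 3:

```python
def f(w):
    return (w - 3) ** 2

def df(w):               # derivative of f
    return 2 * (w - 3)

w = 0.0                  # arbitrary starting point
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * df(w)   # step downhill

print(w)                 # converges toward 3.0
```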

The point is that the trained network “generalizes” from the particular examples it’s shown.

How does neural net training actually work?

Essentially what we're always trying to do is to find weights that make the neural net successfully reproduce the examples we've given it.

The idea is to find out "how far away we are" from getting the function we want, and then to update the weights in such a way as to get closer.
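A minimal version of that loop, where the "network" is just y = w*x + b and a mean-squared loss measures "how far away we are" (all numbers made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=50)
ys = 2 * xs + 1                    # examples we want to reproduce

w, b = 0.0, 0.0                    # initial weights
lr = 0.1                           # learning rate
for step in range(500):
    pred = w * xs + b
    err = pred - ys
    loss = np.mean(err ** 2)       # "how far away we are"
    # Gradients of the loss with respect to w and b,
    # used to update the weights so as to get closer.
    w -= lr * np.mean(2 * err * xs)
    b -= lr * np.mean(2 * err)

print(w, b)   # approaches (2, 1)
```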

What Is a Model?

Any model you use has some particular underlying structure, plus a certain set of "knobs you can turn" (i.e. parameters you can set) to fit your data.
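For example, if the underlying structure is a degree-2 polynomial, the "knobs" are its three coefficients, and fitting means setting them to match the data (the observations below are made up):

```python
import numpy as np

# Underlying structure: a*x**2 + b*x + c. Knobs: the coefficients a, b, c.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 2.9, 9.2, 19.1, 32.8])   # made-up observations

a, b, c = np.polyfit(xs, ys, deg=2)          # turn the knobs to fit
print(a, b, c)                                # the fitted parameter values
print(np.polyval([a, b, c], 5.0))             # model's prediction at x = 5
```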

The underlying structure of ChatGPT, with "just" its 175 billion parameters, is sufficient to make a model that computes next-word probabilities "well enough" to give us reasonable essay-length pieces of text.

The Practice and Lore of Neural Net Training

Over the past decade, there have been many advances in the art of training neural nets.

The tasks we're trying to get neural nets to do are human-like ones, and neural nets can capture quite general "human-like processes."

For example, in converting speech to text, it was once thought that one should first analyze the audio of the speech, break it into phonemes, and so on. But it turns out it is usually better just to train the neural net on the "end-to-end problem".

The Concept of Embeddings

An embedding is a way to try to represent the “essence” of something by an array of numbers with the property that “nearby things” are represented by nearby numbers.

How can we construct such an embedding?
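One simple construction (a sketch only; not what ChatGPT itself uses) starts from word co-occurrence counts: words used in similar contexts get similar count patterns, which can then be compressed into a few numbers per word:

```python
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

# Count how often each pair of words appears side by side.
cooc = np.zeros((len(vocab), len(vocab)))
for a, b in zip(corpus, corpus[1:]):
    cooc[index[a], index[b]] += 1
    cooc[index[b], index[a]] += 1

# Compress each word's co-occurrence row to a 2-number embedding via SVD.
U, S, _ = np.linalg.svd(cooc)
embeddings = U[:, :2] * S[:2]

for w in ("cat", "dog"):
    print(w, embeddings[index[w]])   # "cat" and "dog" come out nearby
```

Here "cat" and "dog" end up with nearby vectors precisely because they occur in the same contexts, which is the property an embedding is meant to capture.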

Models for Human-Like Tasks

For ChatGPT, we have to make a model of human-language text of the kind produced by a human brain.

Neural Nets

A neural net is a connected collection of idealized "neurons", usually arranged in layers; a simple example is sketched below.
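A minimal sketch of such a layered arrangement, with random placeholder weights just to show the structure:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_out):
    # Each idealized neuron takes a weighted sum of the previous layer's
    # outputs plus a bias, then applies a simple nonlinearity.
    W = rng.normal(size=(len(x), n_out))   # connection weights (placeholders)
    b = rng.normal(size=n_out)             # biases (placeholders)
    return np.tanh(x @ W + b)

x = np.array([0.5, -1.0, 2.0])   # 3 inputs
h = layer(x, 4)                   # hidden layer of 4 neurons
y = layer(h, 1)                   # single output neuron
print(y)
```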
