AlphaFold - The Single Most Important AI Breakthrough

To celebrate the 5th anniversary of #AlphaFold, I was invited by Google DeepMind to interview Nobel Prize Winner and Distinguished Scientist, John Jumper. Note that we have no business ties with them. Thank you so much to John for being so kind and insightful, and to the film crew as well - they all did an incredible job. AlphaFold: https://deepmind.google/science/alphafold/

Tue Dec 02 2025 - Written by: Two Minute Papers

“Oh, AlphaFold. It’s easy.” It almost felt too easy. It felt like too many ideas were working. It felt like it was going up. And I remember talking to Tim, the engineering lead, going, “This is really feeling too easy. We’re having too much success. This problem can’t be this easy. Are we leaking the test set?” Right? You know, are we doing the classic machine learning sin?

A Conversation with John Jumper

Fellow scholars, I don’t really like to be on camera, but there is a big reason I am here today. You see, I met Nobel Prize winning chemist John Jumper last year, and we talked for an hour. And in that hour, I learned more than I thought I would learn in a year. It was unbelievable. And today, I have the opportunity to give you this amazing gift, too. So, with that said, hey, John.

Hello.

I’m really, really grateful to have you here today. I have goosebumps, which I have carefully hidden under this lab coat.

So what is AlphaFold and why is it important?

So AlphaFold is a neural network, which makes it relatively appropriate for this podcast, but it is a deep learning system that predicts the result of a specific scientific experiment. And to tell you about that, I should tell you about the domain that it’s in: proteins.

So proteins are the nanomachines that basically drive your cell. A couple thousand atoms each. They’re coded for by your DNA. When we say that DNA is an instruction manual for the cell, a lot of what it’s telling you is how and when to build proteins. And so three letters of your DNA map to one of 20 chemical groups. Those chemical groups are basically just little collections of atoms, you know. And there’s a machine, another protein in the body, that reads the DNA in a relatively complicated process and kind of builds out the protein one step at a time, joining links in a chain or a rope. So it takes this chemical group, attaches that one, attaches that one, attaches that one, basically the same way each time, and builds out a string of these; maybe 300 is a reasonably typical length.
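The codon-to-amino-acid mapping John describes can be sketched in a few lines of Python. This is only an illustration: the table below covers just a handful of the 64 codons in the standard genetic code, and a real translator would use the full table.

```python
# Minimal sketch of translation: DNA codons -> amino acids.
# Only a few entries of the standard genetic code are shown.
CODON_TABLE = {
    "ATG": "M",  # methionine, the usual start codon
    "TTT": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GAA": "E",  # glutamate
    "TAA": "*",  # stop codon
}

def translate(dna: str) -> str:
    """Read DNA three letters at a time, mapping each codon to
    one of the 20 amino acids, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the chain
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGTTTGGCTAAGAA"))  # -> MFG
```

The cell’s ribosome does essentially this lookup, joining each amino acid onto the growing chain the same way each time.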

And then what happens when your cell builds this thing? Of course, it’s not yet a machine; most of them don’t function as floppy ropes. Some parts of it are greasy. Some parts of it are positively charged. Some parts are negative. So it will fold up. It will make helices. It will make sheets. It will pack into a relatively compact 3D object that is kind of the assembled working machine. So these are machines that build themselves, joined in 1D. And of course our DNA is 1D and our world is 3D. So this is kind of how the body solves this. It builds these things. They fold up into this incredibly intricate shape. And this happens for about 20,000 human proteins. There are hundreds of millions, even billions, of known proteins across all organisms.

And one part of what I described is really, really easy to measure. It’s really easy to read our DNA thanks to the genomics revolution. You can think of it as pennies to read the sequence of a protein, the DNA that becomes the linked amino acids. It takes a year to get the structure of a protein with really hard experiments and they often fail; it’s just extraordinarily difficult. If you want to put an economic value on it, maybe $100,000.

So scientists do this experiment where they start from DNA, but they really want to understand how this machine works. So they need to see a picture of it. And so they determine the structure experimentally. They use enormous synchrotrons the size of small villages in order to do this. People have done it a lot. There’s been enormous societal investment because it’s really important to understand this to understand disease, to do drug development. There are about now 200,000 known protein structures, about 140,000 when we did AlphaFold.

And we developed a deep learning system that goes from amino acid sequence, DNA sequence, to the structure of a protein in five or 10 minutes instead of a year, and does this with accuracy close to—not quite as good but very close to—experimental accuracy, and it’s been used enormously. So we’ve predicted the structure of about 200 million proteins. Every protein from an organism whose full genome has been sequenced. Scientists are using it for drug development, to understand the body, everything else.

And I think from a machine learning point of view, it’s kind of the first problem really, really transformed by AI. It’s an extraordinarily practical system that scientists are using. I think it’s something like three million scientists have used our database of predictions. People make predictions every day with this. And it’s also this kind of promise that we’re going to use AI not just to do things that humans can do or solve human problems, but to do things at a kind of superhuman level. There are no humans that are good at getting the structure of a protein by eye; they do it with experiment. That we can use this to transform science, that we can build new tools that fundamentally advance our science.

The Feeling of Success

Now, I remember asking you last year, how did it feel when it first started working?

The time I really remember is when it first started, or really what would happen is, AlphaFold is built iteratively. It’s not yesterday we didn’t have AlphaFold, today we did. It was maybe two years and probably 30, 40 different kind of individual ideas that worked along the way; some grand ideas, some small ideas, but each one kind of inching up the performance. And I remember maybe a year into building AlphaFold 2, the one that was really very successful.

It almost felt too easy. It felt like too many ideas were working. It felt like it was going up. And I remember talking to Tim, the engineering lead, going, “This is really feeling too easy. We’re having too much success. This problem can’t be this easy. Are we leaking the test set?” Right? You know, are we doing the classic machine learning sin?

And he was sitting there going, “I don’t think we are.” And we went back, we double checked, we zeroed coordinates in our eval set to make sure we weren’t actually leaking. We couldn’t really ever find a leak, but it felt too easy. It felt like nature shouldn’t yield this easily to our efforts.

And I remember I wasn’t really totally sure until actually we did some structure predictions for SARS-CoV-2 proteins related to COVID, and then the experiment came out afterwards. That we were really, really sure, okay, we were really not leaking anything. But it was wild.

Wow, that’s crazy. But that is also the hallmark of a pro scientist, you know, because during a research project, you miss a thousand balls, and when you finally hit one, you don’t ask questions, you celebrate. But that’s not what you did; you picked apart the performance immediately instead. So that’s amazing. I mean, pro athletes, when they miss, they’re always interrogating, fixing, thinking. Like you. This is a craft; machine learning is a craft, and you have to be a craftsperson to do it.

Mhm.

The Nature of Progress

All right. Now, the score didn’t jump from zero to 100 in just one magic trick. So this was a sum of many brilliant little puzzle pieces and each of these contribute a little to the score. You add another puzzle piece, you get another few points. And what I’m wondering is, this sounds like linear progress. You know, you’re climbing step by step. So why is it so surprising when you get to the peak?

So you know, much like Moore’s law was a succession of ideas and breakthroughs that in total gave the appearance of inevitability. And that inevitability, in the case of Moore’s law, was driven by exponential growth and investment as well. When you do this, you never know if you’re going to get the next win, right? In fact, we have charts of progress and they don’t actually go like this, right? The ideas that we list maybe go like that, but the actual progress went flat, flat, flat. Oh, what about this idea? Idea, idea, idea. Flat, flat, flat, flat. Idea, idea, idea.

And in fact, the flat versus up… At the time, DeepMind was kind of on six-month cycles. So every six months you formally continued your project and you presented your results to the whole company. And I remember the first three months we would always try our wildest ideas, and it would mostly not work and we would get very scared. And then about halfway through we’d be like, “Okay guys, we’ve got to get serious, we need to not have no progress.” And then suddenly some idea would hit, and then a bunch of ideas would hit. So it was always an alternation of elation and terror. It’s only when you make it really blurry and you squint and zoom out: oh, it went up linearly. Yeah, it’s like overnight successes 10 years in the making, right?

Yeah, it’s that sort of thing.

Intuition and Surprises

All right. Can you build an intuition on what proteins would look like when folded up into a 3D structure? And also, did you have a protein structure where you looked at the 3D result and said that cannot be right and it turned out to be right? Does that happen?

Okay, I’ll tell two stories on this. I mean, an intuition—you mean can I build an intuition, not on an individual protein? So sometimes you can say, oh, this looks really similar to this other protein, and therefore I bet it’s going to have about the same structure. That’s what humans can do, and that’s what people call homology modeling. It’s a very fancy name for saying, well, the sequence is similar, so probably the structure is similar. So you can do that. And sometimes you can notice individual motifs. There were all these papers that would list all these motifs. Like helices are a very common element in proteins. And I remember a paper on, well, the last element of a helix is going to be one of these three amino acids, the one before that’s going to be some… you know, so there are some regularities and human rules that they’ve cataloged, and you can kind of use that. But ultimately it only works a little bit and doesn’t give you the kind of precision you need to do drug development at all.

In terms of things that actually surprised… actually a real surprise came from machine learning. I shouldn’t have been surprised, but I was. I mean, there were two big surprises. One was sometimes we would have proteins with giant voids or cavities in the middle, or a protein that was like C-shaped. And you know, the atoms in proteins are really up against each other; it’s a very dense object. And I said it doesn’t look right. But the model was extremely confident. And we looked at the experimental structure and immediately realized what had happened.

So, AlphaFold 2, the original AlphaFold 2, was trained only on single proteins. But often when a protein is solved, sometimes multiple copies of itself will appear, what’s called a homomer. So maybe three copies actually sometimes densely intertwine with each other to make the actual folded thing; it’s not one copy, it’s the three copies together, a trimer. Or there would be some other protein of a completely different type that it would wrap around, and they only appear together in the body.

And sometimes AlphaFold would realize these patterns and leave these giant voids that look totally wrong, or this spiral which is just floating in air, and I’d be like, “Well that’s wrong.” But it’s extraordinarily confident. And then I would find out, oh, it realized that in fact this protein comes in three copies and so this spiral is one third of that, and if you overlay it it’s perfect. So that even though we didn’t tell AlphaFold about this context, it had learned rules that sometimes there are these geometric patterns, which I can explain.

I think the other big surprise was actually when we ran AlphaFold across random proteins in humans. We would see some bits that looked beautiful and structured, and some really ugly, long, arcing ribbons. “Oh no, that’s wrong.” And I remember we looked at that and we wouldn’t see this very much when we predicted proteins that were experimentally solved. We said, “Oh no, are proteins that have been experimentally solved somehow special, and actually AlphaFold isn’t good on the things we haven’t solved?”

And then Katherine on the team, a little later that day, looks in this UniProt database of various experimental facts about proteins, which will tell you certain regions that are known, for example experimentally, to be disordered. And she starts to realize that where AlphaFold is making these ridiculous, long, arcing predictions that can’t possibly be correct—and they aren’t—it was very low confidence, and those regions were disordered. And what AlphaFold was in fact telling us, kind of implicitly, is that this region doesn’t have a structure. So what we found out is that the lowest AlphaFold confidence was actually pretty much a state-of-the-art predictor of whether a protein was disordered. And so we would find all these things that we kind of knew about proteins but we didn’t feel, because disorder doesn’t appear in this database of protein structures. We would find all these things out just kind of looking at AlphaFold and being surprised.
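The observation above, that very low per-residue confidence marks disordered regions, can be sketched as a simple scan over the confidence track. The threshold of 50 follows the commonly used convention that pLDDT below 50 often indicates disorder; the confidence values below are made up for illustration.

```python
def disordered_regions(plddt, threshold=50.0):
    """Return (start, end) index pairs (end exclusive) for runs of
    residues whose per-residue confidence falls below the threshold,
    treating sustained low confidence as a hint of disorder."""
    regions, start = [], None
    for i, score in enumerate(plddt):
        if score < threshold and start is None:
            start = i                      # a low-confidence run begins
        elif score >= threshold and start is not None:
            regions.append((start, i))     # the run ends here
            start = None
    if start is not None:                  # run extends to the end
        regions.append((start, len(plddt)))
    return regions

# Hypothetical per-residue confidences: an ordered core, a floppy tail.
scores = [92, 88, 95, 40, 35, 30, 85, 20, 15]
print(disordered_regions(scores))  # -> [(3, 6), (7, 9)]
```

This is the bookkeeping side of what Katherine noticed: the model never outputs “disordered” explicitly, but the low-confidence runs line up with it.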

Mhm. Amazing. Amazing.

Favorite Applications

Now I’ll not ask what its most impactful application is because it has now hundreds of thousands of research works building on it in just about five years, which is unbelievable. So which one is your favorite?

I think I have two favorites. One was this giant protein complex, hundreds of protein chains, called the nuclear pore. The nuclear pore is actually the giant gates for the nucleus. The nucleus stores your DNA, right? It’s where your nucleic material is and the rest of the cell is outside the nucleus. And so you need a gatekeeper that decides who can enter and leave and kind of opens and contracts.

And I remember thinking, you know, this is enormous. It’s a thousand times bigger than what AlphaFold can do. So, you know, maybe later we’ll come up with some machine learning that will help with these kinds of problems. And then this paper comes out. The first one I saw was out of the Beck lab, saying we solved the structure of the nuclear pore that we knew something like 30% of before. Now we know 60, 70%, and a lot of the rest is actually disordered, because we combined very low resolution experimental techniques, cryo-ET, with AlphaFold for the individual pieces, running different AlphaFolds and finding all the little joins and compartments, and then we could finally build the model of the nuclear pore.

And in fact, that and some very related papers were a special issue of Science all about the structure of the nuclear pore, and three out of the four made huge use of AlphaFold. I remember searching through these papers and maybe 150 mentions of the word AlphaFold in work that we didn’t do. That all we did was make the software tool that scientists use to make amazing discoveries. And I just felt like, you know, the Nobel is extraordinary. And now I’m waiting for the Nobel of someone who used AlphaFold and their own creativity to discover the next thing.

Yeah. The second order Nobel is the one that I can’t wait for.

And I think the other one was people discovered all these uses of AlphaFold that we didn’t expect to really work. So they would run thousands and thousands of AlphaFold predictions and just see which ones AlphaFold liked.

So the one I really loved, there was a paper on fertilization. How do egg and sperm come together? There are proteins on the egg and proteins on the sperm that kind of join together; they recognize each other and they start fertilization. But it was known that there was a protein in humans that was missing; something didn’t make sense. And there were actually two labs that did this. They took this protein on the egg and 2,000 proteins, every one that appears on the surface of sperm, and just ran 2,000 AlphaFold predictions. And they found one specific protein that AlphaFold thought stuck up against this egg protein.

And then they go to the lab and knock this protein out, and egg and sperm will come together but not start fertilization. They make mutations in the individual regions where these come together, and they find that this blocks fertilization. So they’ve established the biochemistry now. They had no idea which of these 2,000 to look at, and AlphaFold said look at just this one. And sure enough, that was the protein that was essential in this. And I love this notion that we would never do this with experiment; you would never send out 2,000 labs to make 2,000 structures and see which one comes back. We can do new types of science because of the scale we’ve achieved.
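The screening strategy in that story amounts to ranking candidates by the model’s own confidence and looking only at the top of the list. A minimal sketch, with hypothetical candidate names and made-up interface scores standing in for a real batch of AlphaFold runs:

```python
def top_candidates(scores, k=3):
    """Rank candidate binding partners by a model confidence score
    (higher is better) and return the best k for experimental follow-up."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Made-up interface-confidence scores for a few sperm-surface proteins
# screened against one egg protein (illustrative values, not real data).
scores = {"protein_A": 0.21, "protein_B": 0.87, "protein_C": 0.34}
print(top_candidates(scores, k=1))  # -> [('protein_B', 0.87)]
```

The point is the workflow, not the numbers: 2,000 cheap predictions collapse an intractable experimental search down to one or two wet-lab experiments.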

Yeah. Incredible. Any unexpected use cases?

Unexpected Strengths and Weaknesses

All right. So one that really surprised me—I can tell you an unexpected weakness and then an unexpected strength of AlphaFold.

So the unexpected weakness is if you take a protein and you break it, you do something that’s going to cause it to be unstable. One very strong rule of proteins is that positively or negatively charged amino acids don’t appear in the greasy middle part of a protein, right? They don’t like grease. And aspartic acid is a very small charged amino acid; it doesn’t really appear in the center of proteins. So if you take a protein and you mutate one of the inner amino acids to an aspartate, AlphaFold won’t really change its structure. Even though this doesn’t make sense, and there are reasons you can explain it, we say AlphaFold is not extremely point-mutation sensitive. It’s answering a slightly different question. So we said, okay, that’s some future work.

And so there are a lot of people who do protein design and were using AlphaFold to check their designs, to see whether their design method works: does it produce sequences that AlphaFold thinks fold to the structure they were trying to make? And I remember thinking that’s probably not going to work, because AlphaFold isn’t mutation sensitive. It doesn’t have a sensitive enough understanding of the interactions. But I was totally wrong about that. And people found that it was actually really good, when it came to designing proteins, at figuring out which ones might work.

One paper that came out a few months after AlphaFold said that when designing proteins to bind to each other, they get a tenfold increase in success rate if they only make the things that AlphaFold thinks bind. And it’s become really dominant, actually; AlphaFold filtering is one of the secrets of modern protein design. Even though we had built it for natural protein systems, we got this enormous design improvement for free.

Mhm.

The Future Impact

Now, just to showcase the influence of AlphaFold, in my opinion… let me hold on to my papers for this one to make sure I word this properly.

Oh, yeah.

In 20 years, nearly every person with access to modern healthcare will benefit from a tool, diagnostic or drug influenced by AlphaFold. What do you think?

I think that’s pretty fair. I think that it is now a tool of modern biology. And I will say that there are other tools. Every biological discovery today in some way benefits from DNA sequencing, right? DNA synthesis, right? These are tools that underpin the kind of technology of modern biology, and AlphaFold is very certainly one of those. People teach it to grad students; it’s a standard part of the graduate curriculum. “We will learn how to do some things and I will show you how to use AlphaFold because you will probably use it in your research.”

And then people make all these discoveries, and these discoveries compound and grow. That’s the wonderful part of working in research: you have this enormous spreading out of the work you do. It’s not just… it’s wonderful. I think sometimes it’s wonderful to be a doctor, to be someone who very definitely and obviously decides the right treatment for a patient and makes someone healthy. But I also love the thought of being a researcher, that I can build a tool that will help a hundred thousand people, that will help a million, that will help a billion be healthy in the fullness of time as it helps bring forward science. You know, I like to think that AlphaFold maybe made structural biology, which is one of the major fields of biology, five or 10% faster, right? And that’s extraordinary.

Confidence and Being Wrong

Now, AlphaFold gives you a confidence score too, not just a prediction. Can it be confidently incorrect?

Yes. A very simple analogy: if the weather report says there’s a 90% chance of rain today and it doesn’t rain, was it wrong? Some people will say yes, but that’s not obviously correct; you’re supposed to be wrong one time in 10. So what we can really say is that AlphaFold’s confidence is calibrated. If our confidence says 0.9, then the average accuracy will be 0.9 on a certain scale called lDDT, but some individual predictions will be very bad.
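Calibration in this sense can be checked by binning predictions on their stated confidence and comparing each bin’s mean confidence with its mean observed accuracy; for a calibrated model the two track each other. The numbers below are synthetic, just to show the bookkeeping.

```python
def calibration_table(pairs, bin_width=0.1):
    """Group (predicted_confidence, observed_accuracy) pairs into bins
    by confidence, and report (mean predicted, mean observed) per bin.
    A calibrated model has the two means roughly equal in every bin."""
    bins = {}
    for conf, acc in pairs:
        key = round(conf // bin_width * bin_width, 1)  # bin lower edge
        bins.setdefault(key, []).append((conf, acc))
    table = {}
    for key, items in sorted(bins.items()):
        confs = [c for c, _ in items]
        accs = [a for _, a in items]
        table[key] = (sum(confs) / len(confs), sum(accs) / len(accs))
    return table

# Synthetic (confidence, accuracy) pairs, for illustration only.
pairs = [(0.92, 0.95), (0.91, 0.88), (0.55, 0.50), (0.52, 0.60)]
print(calibration_table(pairs))
```

Note that calibration is a statement about averages: the 0.9 bin can match perfectly on average while still containing individual predictions that are badly wrong, which is exactly the failure mode discussed next.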

And actually we know a very interesting failure mode of very high confidence. Sometimes it’s just wrong. But more commonly, for example, a protein will have two structures, and AlphaFold will produce one with high confidence, but you really wanted the other one. And so confidence reflects more whether this structure makes sense as one state of the protein; it doesn’t necessarily say it’s every state of the protein, or the one you care about.

Lightning Round

All right, let’s have a lightning round. I ask you something and you try to answer in one sentence.

Oh, that’s hard for me.

How did AlphaFold 2 improve on the first one?

We did machine learning research at the intersections of protein and ML, not taking ML off the shelf and applying it to proteins.

AlphaFold 3?

We expanded it to do the protein cinematic universe and we adjusted the architecture to make it work.

AlphaProteo?

It developed new techniques to design more efficiently using AlphaFold and other ideas.

Favorite Two Minute Papers episode?

Oh, AlphaFold. It’s easy. Kidding. Yes.

Closing

All right, John. I’ve learned so much again. Huge honor. Thank you so much.

It was a pleasure. Thank you.
