“o3” by Zach Stein-Perlman
EA Forum Podcast (All audio) - A podcast by EA Forum Team - Thursdays
![](https://is1-ssl.mzstatic.com/image/thumb/Podcasts126/v4/f7/3f/12/f73f1299-dc7f-f763-5dec-60470a421548/mza_285343761639087474.jpg/300x300bb-75.jpg)
See livestream, site, OpenAI thread, Nat McAleese thread.

OpenAI announced (but isn't yet releasing) o3 and o3-mini (skipping o2 because of telecom company O2's trademark). "We plan to deploy these models early next year" (source). "o3 is powered by further scaling up RL beyond o1" (source); I don't know whether it's a new base model.

o3 gets 25% on FrontierMath, smashing the previous SoTA. (These are really hard math problems.) Wow. (The dark blue bar, about 7%, is presumably one-attempt; unfortunately OpenAI didn't say what the light blue bar is, but I think it doesn't really matter and the 25% is for real.[1])

o3 also is easily SoTA on SWE-bench Verified and Codeforces.

It's also easily SoTA on ARC-AGI, after doing RL on the public ARC-AGI problems + when spending $4,000 per task on inference (!).[2]

OpenAI has a "new alignment strategy"; looks like Constitutional AI (and just about [...]

The original text contained 4 footnotes which were omitted from this narration.

The original text contained 6 images which were described by AI.

---

First published: December 20th, 2024

Source: https://forum.effectivealtruism.org/posts/aNdg7ctFP9zFcowNd/o3

---

Narrated by TYPE III AUDIO.