How Software Delivery Really Changed at OpenAI

18 Feb 26 · Benjamin Igna · 17 min read

I love talking to engineers. Not the business-media-trained kind who give you polished soundbites, but the ones who just tell you how it is. Recently, I got to sit down with senior engineers at OpenAI at the Pragmatic Summit in San Francisco. What they described wasn’t some futuristic vision. It’s how they work right now. And honestly, if someone had told me this six years ago, I would have thought they were making it up.

Codex has gone from being a helpful coding tool to something the team literally treats as a colleague. Engineers assign it work in their task tracker, expect it to deliver results overnight, and start their mornings reviewing what it produced. More than a million developers now use Codex weekly, and usage has grown 5x since January. Inside OpenAI, nearly every engineer uses it, and teams merge 70% more pull requests each week than a year ago.

But the numbers are just the surface. What’s really interesting is how it’s changing the day-to-day work. Let me break it down...

Agents Are Teammates Now. Seriously.

This is the biggest mental shift. The engineers I spoke to don’t think of Codex as autocomplete or even as an “assistant.” They think of it as a team member. One leader described the progression in the last six months: tool → extension → agent → teammate. And they mean it literally. Engineers name their agents, assign them tasks in Linear, and fully expect them to deliver, just like any other person on the team. Each engineer now effectively has a team of four or five AI agents working alongside them in parallel.

That reframes the whole cost discussion. Instead of thinking

“how many tokens am I burning,”

you start thinking

“what’s the salary of a teammate who works 24/7 and never calls in sick?”

When you do that math, the economics look very different.
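To make that concrete, here’s a back-of-envelope version of the math – a minimal sketch in which every number (token volume, price, salary) is an illustrative assumption, not an OpenAI figure:

```python
# Back-of-envelope: token spend vs. the cost of a human teammate.
# Every number here is an illustrative assumption, not a real figure.

tokens_per_day = 20_000_000        # assumed daily token budget for one agent
usd_per_million_tokens = 10.00     # assumed blended price per 1M tokens
agent_usd_per_month = tokens_per_day * 30 * usd_per_million_tokens / 1_000_000

engineer_usd_per_month = 20_000    # assumed fully loaded engineer cost

print(f"Agent:    ${agent_usd_per_month:,.0f}/month")
print(f"Engineer: ${engineer_usd_per_month:,.0f}/month")
print(f"Ratio:    {engineer_usd_per_month / agent_usd_per_month:.1f}x")
```

Swap in your own numbers; the point is that the comparison stops being about tokens and starts being about headcount-level value.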

There’s a neat parallel here: the OpenAI engineering team onboards Codex the same way they’d onboard a new hire. They feed it context about team norms, product principles, and coding conventions through AGENTS.md files in the repo. One internal team reported that Codex wrote over 90% of its own application code. A small team of just three engineers merged roughly 1,500 pull requests in five months – about 3.5 PRs per engineer per day.
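For teams that want to copy the pattern, the onboarding file might look something like this. To be clear, this is a hypothetical sketch of the AGENTS.md convention, not OpenAI’s actual file:

```markdown
# AGENTS.md – context for coding agents working in this repo

## Team norms
- Small, focused pull requests: one logical change per PR.
- Every change ships with tests; agents should self-test before opening a PR.

## Coding conventions
- Python 3.12, typed; run `ruff` and `pytest` before surfacing results.
- Follow the existing layout under `src/`; don't add new top-level folders.

## Product principles
- Prefer boring, well-understood solutions over clever ones.
- If a requirement is ambiguous, ask in the PR description instead of guessing.
```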

The Morning Starts With a Code Review, Not Code

The Codex team routinely kicks off longer-running tasks – feature implementations, bug fixes, data analysis – at the end of the workday and lets agents work on them overnight. The agents self-test before surfacing their results. When the engineer comes in the next morning, there’s a pull request waiting.

Think about what that means for the work itself. You’re not spending your best morning hours grinding through boilerplate. You’re spending them on the high-judgment stuff: reviewing agent output, spotting edge cases, deciding what to build next, shaping the product. The creative and strategic work moves to the foreground. The mechanical work fades into the background.

And it’s not just overnight runs. During meetings, the team’s data analyst kicks off Codex threads to answer ad-hoc questions in real time. Someone raises a question not covered by the dashboard, and within 20 minutes – still in the same meeting – an agent comes back with the answer. They routinely handle five or six questions this way in a single session. It’s like having a small army of consultants working behind the scenes while you talk.
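A minimal version of that end-of-day kickoff might look like the sketch below. It assumes the Codex CLI’s non-interactive `codex exec` mode; the task list and prompts are hypothetical, and you’d adapt this to whatever agent runner you actually use:

```python
# end_of_day.py - queue up overnight agent runs, one per task.
# Assumes the Codex CLI is installed and `codex exec <prompt>` runs a
# task non-interactively (an assumption to verify against your setup).
import subprocess

# Hypothetical task list an engineer jots down before heading home.
tasks = [
    "Fix the flaky retry logic in the payments webhook; add a regression test.",
    "Migrate the settings page off deprecated lifecycle methods.",
    "Profile the nightly export job and summarize the top three hotspots.",
]

for task in tasks:
    # Each run should end in a self-tested branch or draft PR
    # that is waiting for human review in the morning.
    subprocess.run(["codex", "exec", task], check=True)
```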

Solve the Bottleneck, Then Solve the Next One

When code generation speeds up by 4–5x, coding stops being the bottleneck. What takes its place? Code reviews. Once you handle that, the bottleneck shifts again – to CI/CD, integration testing, deployment. The Codex team has been living this cascade. They solve one constraint and immediately start working on the next. It’s genuinely exciting, as one of them put it, because you’re constantly tackling new problems instead of grinding on the same old ones. For engineering leaders, this is the key takeaway:

"...don’t just optimize for code output and call it a day. If you adopt AI coding tools but your code review process is already stretched, you’ll just create a bigger pile of unreviewed PRs. Plan for the cascade. Invest in review automation, better testing pipelines, and deployment tooling before the bottleneck hits."

New Grads Aren’t Getting Left Behind

One of the most common fears I hear is: what about junior engineers? If seniors can just spin up agents, what’s the point of hiring new grads? I asked about this directly. OpenAI is actively hiring early-career engineers and running a robust internship program – about 100 new grads are joining this summer. The leadership’s bet is that AI-native engineers, people who learn these tools from day one, will develop a fundamentally different kind of leverage. And the early signs are backing this up. They mentioned a new grad who joined the Codex team six months ago and has been, in his words, “absolutely crushing it” – with more independent output than engineers with years of experience.

That said, everyone I spoke to was emphatic that fundamentals still matter. System design, product intuition, the ability to reason about trade-offs – none of that goes away. Vijaye Raji, CTO of Applications, who has 25 years in the industry, drew a parallel to every previous paradigm shift he’s seen:

"...from assembly to C++, from desktop to mobile, from server-side toJava Script. Every time, people worried that the new abstraction would make real skills obsolete. Every time, the people with strong foundations thrived."

Knowledge Sharing at the Speed of Discovery

At OpenAI, they’re discovering these new workflows at roughly the same speed as everyone else; they just happen to get the new models a little sooner. So internal knowledge sharing is critical. They run regular hackathons, show-and-tells, and active Slack channels where teams share what’s working. One product manager on the Codex team organized a one-hour bug bash, then used Codex itself to collect feedback from participants, compile it into a structured doc, and file bug reports and feature requests. The whole loop – from testing to documented feedback – happened in about an hour.

For any organization trying to adopt AI tooling, this is a reminder: the technology alone isn’t enough. You need rituals and channels that let good ideas spread quickly. The teams that share early and often will pull ahead.
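To make that bug-bash loop concrete, here’s what the filing step could look like if you scripted it yourself. This is a hypothetical sketch (file name, note format, and labels are all invented), assuming notes in a local file and an authenticated GitHub CLI; in the story above, Codex handled the collection and structuring itself:

```python
# file_feedback.py - turn raw bug-bash notes into tracked issues.
# Hypothetical input format: one note per line, prefixed "BUG:" or "FEATURE:".
# Assumes the GitHub CLI (`gh`) is installed and authenticated.
import subprocess
from pathlib import Path

for line in Path("bug_bash_notes.txt").read_text().splitlines():
    line = line.strip()
    if not line:
        continue
    kind, _, body = line.partition(":")
    label = "bug" if kind.strip().upper() == "BUG" else "enhancement"
    subprocess.run(
        ["gh", "issue", "create",
         "--title", body.strip()[:72],  # first 72 chars as the title
         "--body", body.strip(),
         "--label", label],
        check=True,
    )
```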

Software Delivery for the Rest of Us

You don’t need to be OpenAI to start applying these patterns. Here’s what I’d take away:

🦞 Treat agents as team members, not autocomplete.

Give them real tasks, real context, and real expectations. Think about onboarding them the way you would a new hire.

🦞 Look for the next bottleneck before it hits.

It’s almost certainly in code review or deployment, not in writing code.

🦞 Don’t sideline your juniors.

Give them AI tools on day one. Their fresh perspective might be your biggest advantage.

🦞 Invest in knowledge-sharing rituals.

Hackathons, Slack channels, show-and-tells. Make sure good practices spread fast.

🦞 Rethink the cost question.

Don’t ask how many tokens you’re using. Ask what the productivity of a 24/7 teammate is worth. Then look at your feature backlog and ask which items are now nearly free to implement.

As Vijaye put it,

"...this moment feels different from the dot-com bubble, from mobile, from social – all of which he lived through. It’s happening on a massive scale and at a speed that’s hard to wrap your head around. And we get to be here for it."

Not Sure Where to Start?

Warp Speed Workshop

In this one-off interactive, gamified workshop, we’ll simulate real-world work scenarios at your organisation via a board game, helping you identify and eliminate bottlenecks, inefficient processes, and unhelpful feedback loops.
