AI Software Engineering is here, and it’s fundamentally reshaping how we build, test, and run software. The journey from assistive code completion tools to autonomous coding agents marks a genuine paradigm shift, and navigating it is the key challenge for engineers and engineering leaders today.
It’s a noisy space, filled with incredible hype alongside tangible early successes that promise significant value for the future. For leaders, the challenge is separating one from the other.
In this post we detail our framework for navigating this change. Before we dive in, let’s highlight a few key learnings:
- Software engineering is changing. As engineers employ multiple agentic coders, less time will be spent writing and debugging code, and more time will be spent understanding the domain, designing end-to-end solutions, orchestrating work, and reviewing generated solutions.
- We are seeing that engineers who adopt these tools are moving faster than engineers who do not. The ability to adapt to this new way of working is essential.
- Engineering skills are in as high demand as ever. As an organisation we always want to achieve more than our staffing allows. These agentic tools can increase productivity, but only in the hands of a skilled software engineer who knows when to use them (and when not to), how to drive them effectively, and how to validate the output for correctness, quality, and scale.
Let’s explore this evolution and what it means for the future of software engineering.
The AI Powered Engineering Maturity Framework
In late 2024, we (The Commonwealth Bank of Australia) started investing in a dedicated AI Powered Software Engineering initiative. The bank had previously rolled out GitHub Copilot to all engineers, with mixed results in utilisation and efficacy. At that stage, Copilot was simply an AI-powered auto-complete engine. We knew this was going to be an evolving space, and we needed to understand where we were and where the industry was headed.
Hence, we developed the following maturity framework, tracking the evolution of AI agents for software engineering. The design goals of this framework were to:
1. Create a common language we can use to track not only our internal adoption, but the industry as a whole.
2. Clearly separate the pragmatic steps we could take to achieve real value today (yellow) from the experimental possibilities of the future (blue).
Level 1: Code Completion & Chat
At this foundational level, AI serves primarily as an assistant to human engineers, offering predictive auto-completion of code and chat functionality. These tools enhance developer productivity by suggesting code snippets based on context. The engineer remains firmly in control, using AI as a predictive coding aid.
Level 2: Human Directed Local Agents
This level represents a significant step forward. Here, engineers direct AI agents in the IDE on their local machines to autonomously design, write, test, and debug code using multi-step workflows.
Tools such as Aider, Cursor, Windsurf, Cline, Roo Code, Claude Code, and GitHub Copilot Agent enable more complex interactions, with the AI handling larger chunks of the development process while still operating under human direction.
These agents can use MCP (Model Context Protocol) tools to open the browser, commit to GitHub, pull down Jira stories, talk to Datadog, and more. They also integrate with RAG (Retrieval-Augmented Generation) systems to connect to your proprietary data sources and company context, meaning they will continue to perform better the more effort you put into connecting them to your ecosystem.
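To illustrate what connecting your own context can look like, here’s a minimal sketch of a local MCP tool server that an agent could call while it works. It assumes the open-source MCP Python SDK; the server name, tool, and in-memory “knowledge base” are illustrative stand-ins for a real retrieval or RAG layer.

```python
# A minimal sketch of exposing company context to a coding agent via MCP.
# Assumes the open-source MCP Python SDK (`pip install mcp`); the tool and the
# in-memory WIKI_SNIPPETS dict stand in for a real RAG/retrieval layer.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("company-context")

# Illustrative stand-in for a proprietary knowledge base.
WIKI_SNIPPETS = {
    "payments service": "The payments service exposes a gRPC API; see the runbook for retry policy.",
    "coding standards": "New services use structured logging and must ship with integration tests.",
}

@mcp.tool()
def search_engineering_wiki(query: str) -> list[str]:
    """Return internal wiki snippets relevant to the query."""
    q = query.lower()
    return [text for topic, text in WIKI_SNIPPETS.items() if q in topic or topic in q]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so an IDE/CLI agent can launch it as a tool server
```

An agent configured to launch this server can then pull that context into its plan, in the same way the commercial integrations above surface Jira, GitHub, or Datadog.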
These tools generally come in two flavours (CLI or IDE based) and two pricing models (a bundled API usage subscription, or bring-your-own API key). In our testing, we have found that the tools that let you bring your own API key generally perform better than those that bundle API pricing, which is expected: they have no incentive to restrict context size and token usage. The downside is that they can be much more expensive to run.
Level 3: Human Supervised Remote Agents
At Level 3, we see AI agents running in the cloud, capable of being triggered by workflow events such as a new bug report or feature request. These agents can design and build pull requests for human review without ongoing interaction. GitHub Copilot Coding Agent (previously Padawan) and OpenAI’s Codex represent this evolving category, where the AI begins to operate with a degree of independence while humans maintain supervisory control.
The leap in value comes once multiple agents are performing tasks in parallel, whilst the engineer is free to work on other things.
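To make the shape of this workflow concrete, here’s a hypothetical sketch of event-driven dispatch to a remote agent. The `RemoteAgentClient`, the event fields, and the returned URL are invented for illustration; real products such as GitHub Copilot Coding Agent or Codex expose their own triggering and review mechanisms.

```python
# A hypothetical sketch of the Level 3 pattern: a workflow event (a new bug report)
# becomes a task for a cloud-hosted coding agent, which raises a pull request that a
# human still reviews. Nothing here maps to a specific vendor API.
from dataclasses import dataclass

@dataclass
class AgentTask:
    repository: str
    title: str
    instructions: str

class RemoteAgentClient:
    """Hypothetical client for a cloud-hosted coding agent."""

    def submit(self, task: AgentTask) -> str:
        # A real implementation would call the vendor's API; here we simulate the
        # agent returning a pull request URL for human review.
        return f"https://example.git.host/{task.repository}/pull/123"

def on_issue_opened(event: dict, agent: RemoteAgentClient) -> str | None:
    """Dispatch labelled bug reports to the agent; leave everything else to humans."""
    if "bug" not in event.get("labels", []):
        return None  # only automate well-bounded toil, not open-ended feature work
    task = AgentTask(
        repository=event["repository"],
        title=event["title"],
        instructions=f"Reproduce and fix: {event['title']}\n\n{event['body']}",
    )
    return agent.submit(task)  # the resulting PR still goes through normal human review

if __name__ == "__main__":
    sample = {"repository": "team/payments", "title": "NPE on refund", "body": "Steps to reproduce...", "labels": ["bug"]}
    print(on_issue_opened(sample, RemoteAgentClient()))
```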
Level 4: Autonomous Engineer
This level introduces the fully automated Autonomous Engineer, an integrated member of the development team. These systems can originate work based on product definitions and roadmaps, joining agile ceremonies and communication channels like Slack, and contributing work without human supervision. Operating this way is not currently possible without human supervision or direction, so we consider this level experimental.
Level 5: Autonomous Teams
The final level represents multiple autonomous agents collaborating on projects, each specialising in different aspects of software delivery. This is a fundamental rethinking of software development, with agentic AI teams delivering alongside human teams. As above, given current technology it’s not possible to achieve this level without human supervision and direction, so we consider it experimental.
Build-Time Agents vs Run-Time Agents
It’s important to make the distinction between agents that generate static, deterministic code for production use, which we call “build-time agents”, and agents that run directly in production to solve customer problems, which we call “run-time agents”.
Run-time agents can provide solutions to problems that we could not readily solve before, such as dynamic customer service bots, real-time transcription and summarisation, intelligent document analysis, etc.
That said, there is significant hype regarding run-time agentic systems. The idea is to plug an agent into your databases, APIs, and UI as a dynamic replacement for traditional code. Implementing and governing this is difficult, and the value is yet to be seen at scale.
I personally do not advocate for this approach for the majority of problem spaces. Given most of the problems we solve as engineers are deterministic, we shouldn’t rush to replace code with run-time agents. Instead, significant increases in end-to-end delivery speed can be realised by replacing manual delivery with AI-generated product and engineering specs, code, and tests written by build-time agents.
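To make the distinction concrete, here’s a toy sketch of the same deterministic problem handled both ways. The fee rule and the commented-out `llm.complete` client are purely illustrative; the point is what you take on by putting a model in the request path instead of the build path.

```python
# Toy contrast between build-time and run-time approaches for a deterministic problem.
# The fee rule is invented for illustration, and the run-time variant is sketched in
# comments because the `llm` client is hypothetical.
from decimal import Decimal

# Build-time: an agent helped write this once, a human reviewed it, and it now runs
# as plain deterministic code in a container.
def international_transfer_fee(amount: Decimal) -> Decimal:
    """Illustrative rule: $6 flat fee plus 0.5% of the amount, capped at $30."""
    fee = Decimal("6.00") + amount * Decimal("0.005")
    return min(fee, Decimal("30.00"))

# Run-time: the same question answered by a model on every request (hypothetical client).
# def international_transfer_fee_via_agent(amount: Decimal) -> Decimal:
#     response = llm.complete(f"What fee applies to a transfer of {amount}? Rules: ...")
#     return Decimal(response.text)  # slower, costs GPU time, and may hallucinate

if __name__ == "__main__":
    print(international_transfer_fee(Decimal("2000")))  # deterministic: the same answer every time
```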
Why do I think that build-time agents and static code should be the answer for most of what we do?
- Cost — GPUs and models are expensive to run; code running in containers is cheap and efficient.
- Deterministic — If you run code, you don’t need to worry about hallucinations or the code doing something you didn’t expect. You also don’t need to build supervisors or governance processes; the code just does what it is supposed to do.
- Performant — Code is fast. No matter what we do, AI agents will always be slower to solve a deterministic problem than a coded algorithm.
- Existing Practices — Why throw the baby out with the bath water? Build-time agents help us improve our existing development practices. Run-time agents require a complete rethink of how we build, test, govern, and scale technology safely.
At the bank, our strategy is two-fold. We are rapidly adopting build-time agents and new delivery workflows in our teams, and we are developing run-time agents in a select number of cases where Gen AI can solve problems that traditional code cannot.
The Reality of Working with Build-Time AI Agents
While the productivity gains are real, working with build-time AI agents isn’t always smooth sailing. There’s a learning curve: in practice it takes engineers several weeks to figure out how to prompt effectively and to determine when to trust the output versus when to step in.
The agents excel at certain types of work such as boilerplate code, test generation, refactoring, and implementing well-defined features. They can, however, get stuck on ambiguous requirements or complex business logic. I’ve watched engineers spend more time trying to explain a nuanced problem to an agent than it would have taken to just code it themselves. The key is recognising these situations quickly and knowing when to switch approaches.
With that in mind, we have found it best for a leading team in each domain (or project) to first build the rules and agent modes/configurations that clearly direct and constrain the agent, and to connect the appropriate knowledge bases (RAG) and tools (MCP), so that the rest of the team can hit the ground running without facing the same roadblocks.
We are seeing the developer workflow change. Instead of thinking “How do I code this?” you start thinking “How do I explain this problem clearly enough for the agent to solve it?” It’s almost like having a very capable junior developer who needs precise instructions but can execute at superhuman speed once they understand the task. The engineers who’ve adapted best treat the agent as a collaborative partner rather than a magic code generator. They have learned to iterate on prompts, validate outputs carefully, and maintain the same engineering rigour they’d apply to any code review.
The frustration usually comes when engineers expect the agent to read their mind or handle poorly defined requirements. But when the problem is clear and the constraints are well-articulated, watching these tools work feels genuinely transformative.
What does this mean for software engineers?
As you can see, the progression of stages in our framework leads to ever more automated coding practices, requiring less human involvement. This raises profound questions about the future role of software engineers. Rather than writing code, will engineers increasingly shift towards problem definition, domain expertise, system architecture, and quality assurance?
At this stage the answer is both yes and no. These tools are productivity boosters in the right hands, used for the right purpose, and it takes time to learn how and when to use them effectively. Regardless of the progress in this space, I firmly believe there will always be a need for experienced software engineers to develop, train, validate, and optimise these agentic coding systems. The same goes for everything else we do as software engineers: defining the problem and solution, designing systems for scale, refactoring for new requirements, quality assurance, and production monitoring and analysis.
As anyone who has used ChatGPT can attest, it is very confident even when it is incorrect. You need the domain and technical expertise to ensure that the output is correct and that the code is scalable and of the right quality.
That said, it is clear that software engineering is evolving. We expect to see less manual coding, more prompt-driven development (not just vibing), and more operational work automated so we can focus on customer outcomes. In this future, perhaps code generated by agents will be considered the “new machine code”, where engineers care less about the specific implementation details and more about the end goal of delivering customer value?
All I can say for certain is that it’s an exciting time to be a software engineer, and the engineers who adapt to using the new tools will be in higher demand than ever.
CommBank publishes our Engineering Job and Skills framework here. We will be writing a longer post in the future about how we see the Software Engineering role evolving with the introduction of Gen AI tools, but we remain convinced that Software Engineers will continue to be at the heart of designing and building new technology.
What does this mean for leaders?
This level of change isn’t just a tools problem; it’s a people, process, and culture problem. As leaders we need to think about how these changes impact our organisation from the perspectives of team structures, performance management, risk & governance, and training & up-skilling.
We must be mindful that there is a barrier to entry, and that there will be hiccups along the way as teams adopt these tools. Allocating the time, and providing the trust and safety teams need to experiment with them, is key.
It’s also important to note that as capabilities advance, the organisational change becomes immense. We need to weigh the disruption of each step against the efficiency it brings, and put plans in place so our organisation can adopt these new development workflows without derailing current delivery.
Adoption — Where are we?
At the bank, our engineers are choosing to use GitHub Copilot Agent, Roo Code, and Cline, along with GitHub Copilot for autocomplete. The engineers who use these tools are showing up to a 3x increase in the number of merged pull requests compared to the cohort who have not adopted them. Whilst these numbers are imperfect, they do provide some insight into the efficiency gains we might expect as we further increase utilisation across the group.
This increase in coding output will put immense pressure on code review and delivery processes, which also need to be updated to support the higher delivery velocity. We are embarking on solving some of these problems in our organisation next.
What’s next?
When I started writing this post, GitHub Copilot Padawan was in pre-preview, and GitHub was the only major player releasing Level 3 tools. A few weeks later we now have GitHub Copilot Coding Agent, OpenAI’s Codex, Google’s Jules, and a few others.
Over the next six months we’ll be trialling and adopting these Level 3 tools in our organisation. Level 2 tools will continue to evolve and compete with each other until they are all relatively similar, at which point the choice will come down to the best pricing for the underlying models.
My prediction is that by this time next year, Level 3 agents will be performing most of the bugfixes, patching, and other work we define as “engineering toil”, managed and supervised by human engineers. The most sought-after engineering skill will not be proficiency in a programming language, but the demonstrated ability to understand the domain, architect complex prompts and workflows, and manage coding agents to solve business problems.
We continue to watch this space closely and are ready to adapt as the industry evolves.
The following sources were used as research for this article:
- https://research.aimultiple.com/ai-agent-tools/
- https://prompt.16x.engineer/blog/ai-coding-l1-l5
- https://jellyfish.co/blog/best-ai-coding-tools/
- https://blog.n8n.io/best-ai-for-coding/
I hope that this has been an interesting read and a useful guide for you if you are new to the space. The field of AI-powered software engineering is rapidly evolving, and staying informed about these developments is crucial for any software professional looking to remain relevant and effective.
If you want to find out more about working as an AI Engineer at CommBank, feel free to message me on LinkedIn.
Brent McKendrick,
Distinguished Engineer — The Commonwealth Bank
Brent is a Distinguished Engineer at The Commonwealth Bank of Australia, where he is responsible for leading the AI Powered Engineering initiative and modernising the group’s engineering platforms & practices. Brent previously led Checkout Engineering at Uber, and led engineering at Zip Co from seed to unicorn status.