AI Software Engineering is here, and it’s fundamentally reshaping how we build, test, and run software. The journey from assistive code completion tools to autonomous coding agents marks a genuine paradigm shift, and navigating it is the key challenge for engineers and engineering leaders today.
It’s a noisy space, filled with incredible hype alongside tangible early successes that promise significant value for the future. For leaders, the challenge is separating one from the other.
In this post we detail our framework for navigating this change. Before we dive in, let’s highlight a few key learnings:
- Software engineering is changing. As engineers employ multiple agentic coders, less time will be spent writing and debugging code, and more time will be spent understanding the domain, designing end-to-end solutions, orchestrating work, and reviewing generated solutions.
- We are seeing that engineers who adopt these tools are moving faster than engineers who do not. The ability to adapt to this new way of working is essential.
- Engineering skills are in as high demand as ever. As an organisation we always want to achieve more than our staffing allows. These agentic tools can increase productivity, but only in the hands of a skilled software engineer who knows when to use them (and when not to), how to drive them effectively, and how to validate the output for correctness, quality, and scale.
Let’s explore this evolution and what it means for the future of software engineering.
The AI Powered Engineering Maturity Framework
In late 2024, we (The Commonwealth Bank of Australia) started investing in a dedicated AI Powered Software Engineering initiative. The bank had previously rolled out GitHub Copilot to all engineers, with mixed results in utilisation and efficacy. At that stage, Copilot was simply an AI-powered auto-complete engine. We knew this was going to be an evolving space, and we needed to understand where we were and where the industry was headed.
Hence, we developed the following maturity framework, tracking the evolution of AI agents for software engineering. The design goals of this framework were to:
1. Create a common language we can use to track not only our internal adoption, but the industry as a whole.
2. Clearly separate the pragmatic steps we could take to achieve real value today (yellow) from the experimental possibilities of the future (blue).
Level 1: Code Completion & Chat
At this foundational level, AI serves primarily as an assistant to human engineers, offering predictive auto-completion of code and chat functionality. These tools enhance developer productivity by suggesting code snippets based on context. The engineer remains firmly in control, using AI as a predictive coding aid.
Level 2: Human Directed Local Agents
This level represents a significant step forward. Here, engineers direct AI agents in the IDE on their local machines to autonomously design, write, test, and debug code using multi-step workflows.
Tools such as Aider, Cursor, Windsurf, Cline, Roo Code, Claude Code, and GitHub Copilot Agent enable more complex interactions, with the AI handling larger chunks of the development process while still operating under human direction.
These agents can use MCP (Model Context Protocol) tools to open the browser, commit to GitHub, pull down Jira stories, talk to Datadog, and more. They also integrate with RAG (Retrieval-Augmented Generation) systems to connect to your proprietary data sources and company context, meaning they will continue to perform better the more effort you put into connecting them to your ecosystem.
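To illustrate what connecting your own context can look like, here’s a minimal sketch of a local MCP tool server that an agent could call while it works. It assumes the open-source MCP Python SDK; the server name, tool, and in-memory “knowledge base” are illustrative stand-ins for a real retrieval or RAG layer.

```python
# A minimal sketch of exposing company context to a coding agent via MCP.
# Assumes the open-source MCP Python SDK (`pip install mcp`); the tool and the
# in-memory WIKI_SNIPPETS dict stand in for a real RAG/retrieval layer.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("company-context")

# Illustrative stand-in for a proprietary knowledge base.
WIKI_SNIPPETS = {
    "payments service": "The payments service exposes a gRPC API; see the runbook for retry policy.",
    "coding standards": "New services use structured logging and must ship with integration tests.",
}

@mcp.tool()
def search_engineering_wiki(query: str) -> list[str]:
    """Return internal wiki snippets relevant to the query."""
    q = query.lower()
    return [text for topic, text in WIKI_SNIPPETS.items() if q in topic or topic in q]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so an IDE/CLI agent can launch it as a tool server
```

An agent configured to launch this server can then pull that context into its plan, in the same way the commercial integrations above surface Jira, GitHub, or Datadog.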
These tools generally come in two flavours (CLI or IDE based) and two pricing models (a bundled API usage subscription, or bring-your-own API key). In our testing, we have found that the tools that let you bring your own API key generally perform better than those that bundle API pricing, which is expected: they have no incentive to restrict context size and token usage. The downside is that they can be much more expensive to run.
Level 3: Human Supervised Remote Agents
At Level 3, we see AI agents running in the cloud, capable of being triggered by workflow events such as a new bug report or feature request. These agents can design and build pull requests for human review without ongoing interaction. GitHub Copilot Coding Agent (previously Padawan) and OpenAI’s Codex represent this evolving category, where the AI begins to operate with a degree of independence while humans maintain supervisory control.
The leap in value comes once multiple agents are performing tasks in parallel, whilst the engineer is free to work on other things.
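To make the shape of this workflow concrete, here’s a hypothetical sketch of event-driven dispatch to a remote agent. The `RemoteAgentClient`, the event fields, and the returned URL are invented for illustration; real products such as GitHub Copilot Coding Agent or Codex expose their own triggering and review mechanisms.

```python
# A hypothetical sketch of the Level 3 pattern: a workflow event (a new bug report)
# becomes a task for a cloud-hosted coding agent, which raises a pull request that a
# human still reviews. Nothing here maps to a specific vendor API.
from dataclasses import dataclass

@dataclass
class AgentTask:
    repository: str
    title: str
    instructions: str

class RemoteAgentClient:
    """Hypothetical client for a cloud-hosted coding agent."""

    def submit(self, task: AgentTask) -> str:
        # A real implementation would call the vendor's API; here we simulate the
        # agent returning a pull request URL for human review.
        return f"https://example.git.host/{task.repository}/pull/123"

def on_issue_opened(event: dict, agent: RemoteAgentClient) -> str | None:
    """Dispatch labelled bug reports to the agent; leave everything else to humans."""
    if "bug" not in event.get("labels", []):
        return None  # only automate well-bounded toil, not open-ended feature work
    task = AgentTask(
        repository=event["repository"],
        title=event["title"],
        instructions=f"Reproduce and fix: {event['title']}\n\n{event['body']}",
    )
    return agent.submit(task)  # the resulting PR still goes through normal human review

if __name__ == "__main__":
    sample = {"repository": "team/payments", "title": "NPE on refund", "body": "Steps to reproduce...", "labels": ["bug"]}
    print(on_issue_opened(sample, RemoteAgentClient()))
```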
Level 4: Autonomous Engineer
This level introduces the fully automated Autonomous Engineer, an integrated member of the development team. These systems can originate work based on product definitions and roadmaps, joining agile ceremonies and communication channels like Slack, and contributing work without human supervision. Operating this way is not currently possible without human supervision or direction, so we consider this level experimental.
Level 5: Autonomous Teams
The final level represents multiple autonomous agents collaborating on projects, each specialising in different aspects of software delivery. This is a fundamental rethinking of software development, with agentic AI teams delivering alongside human teams. As above, given current technology it’s not possible to achieve this level without human supervision and direction, so we consider it experimental.
Build-Time Agents vs Run-Time Agents
It’s important to make the distinction between agents that generate static, deterministic code for production use, which we call “build-time agents”, and agents that run directly in production to solve customer problems, which we call “run-time agents”.
Run-time agents can provide solutions to problems that we could not readily solve before, such as dynamic customer service bots, real-time transcription and summarisation, intelligent document analysis, etc.
That said, there is significant hype regarding run-time agentic systems. The idea is to plug an agent into your databases, APIs, and UI as a dynamic replacement for traditional code. Implementing and governing this is difficult, and the value is yet to be seen at scale.
I personally do not advocate for this approach for the majority of problem spaces. Given most of the problems we solve as engineers are deterministic, we shouldn’t rush to replace code with run-time agents. Instead, significant increases in end-to-end delivery speed can be realised by replacing manual delivery with AI-generated product and engineering specs, code, and tests written by build-time agents.
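To make the distinction concrete, here’s a toy sketch of the same deterministic problem handled both ways. The fee rule and the commented-out `llm.complete` client are purely illustrative; the point is what you take on by putting a model in the request path instead of the build path.

```python
# Toy contrast between build-time and run-time approaches for a deterministic problem.
# The fee rule is invented for illustration, and the run-time variant is sketched in
# comments because the `llm` client is hypothetical.
from decimal import Decimal

# Build-time: an agent helped write this once, a human reviewed it, and it now runs
# as plain deterministic code in a container.
def international_transfer_fee(amount: Decimal) -> Decimal:
    """Illustrative rule: $6 flat fee plus 0.5% of the amount, capped at $30."""
    fee = Decimal("6.00") + amount * Decimal("0.005")
    return min(fee, Decimal("30.00"))

# Run-time: the same question answered by a model on every request (hypothetical client).
# def international_transfer_fee_via_agent(amount: Decimal) -> Decimal:
#     response = llm.complete(f"What fee applies to a transfer of {amount}? Rules: ...")
#     return Decimal(response.text)  # slower, costs GPU time, and may hallucinate

if __name__ == "__main__":
    print(international_transfer_fee(Decimal("2000")))  # deterministic: the same answer every time
```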
Why do I think that build-time agents and static code should be the answer for most of what we do?
- Cost — GPUs and models are expensive to run; code running in containers is cheap and efficient.
- Deterministic — If you run code, you don’t need to worry about hallucinations or the code doing something you didn’t expect. You also don’t need to build supervisors or governance processes; the code just does what it is supposed to do.
- Performant — Code is fast. No matter what we do, AI agents will always be slower to solve a deterministic problem than a coded algorithm.
- Existing Practices — Why throw the baby out with the bath water? Build-time agents help us improve our existing development practices. Run-time agents require a complete rethink of how we build, test, govern, and scale technology safely.
At the bank, our strategy is two-fold. We are rapidly adopting build-time agents and new delivery workflows in our teams, and we are developing run-time agents in a select number of cases where Gen AI can solve problems that traditional code cannot.
The Reality of Working with Build-Time AI Agents
While the productivity gains are real, working with build-time AI agents isn’t always smooth sailing. There’s a learning curve: in practice it takes engineers several weeks to figure out how to prompt effectively and to determine when to trust the output versus when to step in.
The agents excel at certain types of work such as boilerplate code, test generation, refactoring, and implementing well-defined features. They can, however, get stuck on ambiguous requirements or complex business logic. I’ve watched engineers spend more time trying to explain a nuanced problem to an agent than it would have taken to just code it themselves. The key is recognising these situations quickly and knowing when to switch approaches.
With that in mind, we have found it best for a leading team in each domain (or project) to first build the rules and agent modes/configurations that clearly direct and constrain the agent, and to connect the appropriate knowledge bases (RAG) and tools (MCP), so that the rest of the team can hit the ground running without facing the same roadblocks.
We are seeing the developer workflow change. Instead of thinking “How do I code this?” you start thinking “How do I explain this problem clearly enough for the agent to solve it?” It’s almost like having a very capable junior developer who needs precise instructions but can execute at superhuman speed once they understand the task. The engineers who’ve adapted best treat the agent as a collaborative partner rather than a magic code generator. They have learned to iterate on prompts, validate outputs carefully, and maintain the same engineering rigour they’d apply to any code review.
The frustration usually comes when engineers expect the agent to read their mind or handle poorly defined requirements. But when the problem is clear and the constraints are well-articulated, watching these tools work feels genuinely transformative.
What does this mean for software engineers?
As you can see, the progression of stages in our framework leads to ever more automated coding practices, requiring less human involvement. This raises profound questions about the future role of software engineers. Rather than writing code, will engineers increasingly shift towards problem definition, domain expertise, system architecture, and quality assurance?
At this stage the answer is both yes and no. These tools are productivity boosters in the right hands, used for the right purpose, and it takes time to learn how and when to use them effectively. Regardless of the progress in this space, I firmly believe there will always be a need for experienced software engineers to develop, train, validate, and optimise these agentic coding systems. The same goes for everything else we do as software engineers: defining the problem and solution, designing systems for scale, refactoring for new requirements, quality assurance, and production monitoring and analysis.
As anyone who has used ChatGPT can attest, it is very confident even when it is incorrect. You need the domain and technical expertise to ensure that the output is correct and that the code is scalable and of the right quality.
That said, it is clear that software engineering is evolving. We expect to see less manual coding, more prompt-driven development (not just vibing), and more operational work automated so we can focus on customer outcomes. In this future, perhaps code generated by agents will be considered the “new machine code”, where engineers care less about the specific implementation details and more about the end goal of delivering customer value?
All I can say for certain is that it’s an exciting time to be a software engineer, and the engineers who adapt to using the new tools will be in higher demand than ever.
CommBank publishes our Engineering Job and Skills framework here. We will be writing a longer post in the future about how we see the Software Engineering role evolving with the introduction of Gen AI tools, but we remain convinced that Software Engineers will continue to be at the heart of designing and building new technology.
What does this mean for leaders?
This level of change isn’t just a tools problem; it’s a people, process, and culture problem. As leaders we need to think about how these changes impact our organisation from the perspectives of team structures, performance management, risk & governance, and training & up-skilling.
We must be mindful that there is a barrier to entry, and that there will be hiccups along the way as teams adopt these tools. Allocating the time, and providing the trust and safety teams need to experiment with them, is key.
It’s also important to note that as capabilities advance, the organisational change becomes immense. We need to weigh the disruption of each step against the efficiency it brings, and put plans in place so our organisation can adopt these new development workflows without derailing current delivery.
Adoption — Where are we?
At the bank, our engineers are choosing to use GitHub Copilot Agent, Roo Code, and Cline, along with GitHub Copilot for autocomplete. The engineers who use these tools are showing up to a 3x increase in the number of merged pull requests compared to the cohort who have not adopted them. Whilst these numbers are imperfect, they do provide some insight into the efficiency gains we might expect as we further increase utilisation across the group.
This increase in coding output will put immense pressure on code review and delivery processes, which also need to be updated to support the higher delivery velocity. We are embarking on solving some of these problems in our organisation next.
What’s next?
When I started writing this post, GitHub Copilot Padawan was in pre-preview, and GitHub was the only major player releasing Level 3 tools. A few weeks later we now have GitHub Copilot Coding Agent, OpenAI’s Codex, Google’s Jules, and a few others.
Over the next six months we’ll be trialling and adopting these Level 3 tools in our organisation. Level 2 tools will continue to evolve and compete with each other until they are all relatively similar, at which point the choice will come down to the best pricing for the underlying models.
My prediction is that by this time next year, Level 3 agents will be performing most of the bugfixes, patching, and other work we define as “engineering toil”, managed and supervised by human engineers. The most sought-after engineering skill will not be proficiency in a programming language, but the demonstrated ability to understand the domain, architect complex prompts and workflows, and manage coding agents to solve business problems.
We continue to watch this space closely and are ready to adapt as the industry evolves.
The following sources were used as research for this article:
- https://research.aimultiple.com/ai-agent-tools/
- https://prompt.16x.engineer/blog/ai-coding-l1-l5
- https://jellyfish.co/blog/best-ai-coding-tools/
- https://blog.n8n.io/best-ai-for-coding/
I hope that this has been an interesting read and a useful guide for you if you are new to the space. The field of AI-powered software engineering is rapidly evolving, and staying informed about these developments is crucial for any software professional looking to remain relevant and effective.
If you want to find out more about working as an AI Engineer at CommBank, feel free to message me on LinkedIn.
Brent McKendrick,
Distinguished Engineer — The Commonwealth Bank
Brent is a Distinguished Engineer at The Commonwealth Bank of Australia, where he is responsible for leading the AI Powered Engineering initiative and modernising the group’s engineering platforms & practices. Brent previously led Checkout Engineering at Uber, and led engineering at Zip Co from seed to unicorn status.