By Jonathan Corbet
August 7, 2025
Arguably, the current round of debate began with this article on a presentation by Sasha Levin at the Open Source Summit North America in June; his use of an LLM to generate a kernel patch came as a surprise to some developers, including the maintainer who accepted that patch. Since then, David Alan Gilbert has posted a patch proposing requirements for the disclosure of LLM use in kernel development. Levin has posted a series of his own focused on providing configurations for coding assistants and guidelines for their use. Both of these submissions have provoked discussions ranging beyond their relatively narrow objectives.
Gilbert suggested the use of a new patch tag, Generated-by, to identify a tool that was used to create a kernel patch; that tag would be expected not just for LLM-generated patches, but also patches from long-accepted tools like Coccinelle. Levin, instead, suggests using the existing Co-developed-by tag, but takes pains to point out that an LLM should not add the Signed-off-by tag that normally is required alongside Co-developed-by. Either way, the suggestion is the addition of information to the tags section of any patch that was generated by an LLM-based tool.
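To make the difference concrete, here is a sketch of how each proposal's tags might look at the end of a patch's changelog; the developer and tool names here are purely illustrative, not taken from either submission:

    Generated-by: Coccinelle
    Signed-off-by: Jane Developer <jane@example.org>

versus:

    Co-developed-by: <name of the LLM assistant>
    Signed-off-by: Jane Developer <jane@example.org>

In the latter form, the Signed-off-by line comes only from the human submitter; per Levin's guidelines, the assistant named in the Co-developed-by tag must not supply one of its own.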
A step back
While much of the discussion jumped directly into the details of these patches, some developers clearly feel that there is a more fundamental question to answer first: does the kernel community want to accept LLM-developed patches at all? Vlastimil Babka responded that Levin's patch set was "premature", and that there was a need to set the rules for humans to follow before trying to properly configure LLMs:
So without such policy first, I fear just merging this alone would send the message that the kernel is now officially accepting contributions done with coding assistants, and those assistants will do the right things based on these configuration files, and the developers using the assistants don't need to concern themselves with anything more, as it's all covered by the configuration.
Lorenzo Stoakes said that "an official kernel AI policy document" is needed first, and suggested that it would be best discussed at the Maintainers Summit (to be held in December). He agreed with Babka that merging the patches in the absence of such a policy would be equivalent to a public statement that LLM-generated patches are welcome in the kernel community.
A number of developers expressed concerns that these tools will be used to generate patches that are not understood by their submitters and which may contain more than the usual number of subtle bugs. David Hildenbrand worried that he would end up dealing with contributors who simply submit his questions to the tool that generated the patch in the first place, since they are unable to explain the code on their own. He also pointed out the policy adopted by the QEMU project, which essentially bans LLM-generated contributions in that project. Al Viro described LLM-based tools as "a force multiplier" for the numerous developers who have, for years, been submitting machine-generated patches that they don't understand.
Mark Brown, instead, suggested that these tools will be used regardless of the kernel policy:
I'm also concerned about submitters just silently using this stuff anyway regardless of what we say, from that point of view there's something to be said for encouraging people to be open and honest about it so it can be taken into consideration when looking at the changes that get sent.
Levin's point of view is that the current policy for the kernel is that "we accept agent generated contributions without any requirements beyond what applies to regular humans"; his objective is to work out what those extra requirements should be. It should also be noted that some developers clearly feel that these tools are helpful; Kees Cook, for example, argued against any sort of ban, saying it would be "not useful, realistic, nor enforceable". Elsewhere, he has commented that "the tools are finally getting interesting".
Disclosure
If the kernel project were to ban LLM-generated code, then the rest of the discussion would be moot, but that would appear to be an unlikely outcome. If one assumes that there will be (more) LLM-generated code entering the kernel, a number of questions come up, starting with disclosure of tool use. Both Gilbert and Levin propose the addition of patch tags to document this use. A couple of developers disagreed with that idea, though; Konstantin Ryabitsev said that this information belongs in the cover letter of a patch series, rather than in the tags. That is how code generated by tools is described now, and he did not see a reason to change that practice. Jakub Kicinski argued that the information about tools was "only relevant during the review", so putting it into patch changelogs at all "is just free advertising" for the tools in question.
The consensus view, though, would appear to be in favor of including tool information in the patch itself. Cook, who initially favored keeping tool information out of the tags, later acknowledged that it would be useful should the need come to track down all of the patches created by a specific tool. Steve Rostedt said that this information could be useful to find patterns of bugs introduced by a specific tool. Laurent Pinchart noted that formalized patch tags would be useful for tracking down any copyright-related problems as well. Gilbert commented that disclosure "lets the people who worry keep track of what our mechanical overlords are doing".
If one takes the position that tool use must be disclosed, the next question is inevitably: where should the line be drawn? Levin asked whether the use of a code-completion tool requires disclosure, for example. Others have mentioned using compiler diagnostics to find problems or the use of language-sensitive editors. There is clearly a point where requiring disclosure makes no sense, but there does not, yet, appear to be a consensus on where that point is. One possible rule might be this one suggested by Rostedt: "if AI creates any algorithm for you then it must be disclosed".
Meanwhile, Levin's first attempt to disclose LLM usage with a Co-developed-by tag drew an amused response from Andrew Morton, who seemingly had not been following this conversation. Hildenbrand responded that a new tag, such as Assisted-by, would be more appropriate; Ryabitsev has also made that suggestion.
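With that alternative, the tag block sketched above would instead read something like (again, illustrative only):

    Assisted-by: <name of the LLM assistant>
    Signed-off-by: Jane Developer <jane@example.org>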
Copyright and responsibility
The copyright status of LLM-generated code is of concern to many developers; if LLM-generated code ends up being subject to somebody's copyright claim, accepting it into the kernel could set the project up for a future SCO-lawsuit scenario. This, of course, is an issue that goes far beyond the kernel community and will likely take years of court battles worldwide to work out. Meanwhile, though, maintainers will be asked to accept LLM-generated patches, and will have to make decisions long before the legal processes have run their course.
Levin pointed to the generative-AI guidance from the Linux Foundation, saying that it is the policy that the kernel community is implicitly following now. In short, this guidance suggests that developers should ensure that the tool itself does not place restrictions on the code it generates, and that said code does not incorporate any pre-existing, copyrighted material. Levin suggested using this document as a starting point for judging the copyright status of submissions, but that guidance is only so helpful.
Michal Hocko asked how maintainers can be expected to know whether the conditions suggested in that "quite vague" guidance have been met. Levin's answer reflects a theme that came up a few times in the discussion: that is what the Signed-off-by tag applied by the patch submitter is for. By applying that tag, the submitter is indicating that the patch is a legitimate contribution to the kernel. As with any other patch, a contributor needs to be sure they are on solid ground before adding that tag.
That reasoning extends beyond just copyright status to responsibility for the patch at all levels. Rostedt suggested documenting that a signoff is also an indication that the submitter understands the code and can fix problems with it. Viro said that, for any patch regardless of origin, "there must be somebody able to handle active questioning" about it. Levin added that "AI doesn't send patches on its own - humans do", so it is the human behind the patch who will ultimately be responsible for its contents.
The reasoning makes some sense, but may not be entirely comforting to nervous maintainers. The people submitting LLM-generated patches are not likely to be in a better position to judge the copyright status of that work than maintainers are. Meanwhile, maintainers have had to deal with patches from contributors who clearly do not understand what they are doing for many years; documenting that those contributors must understand the output from coding tools seems unlikely to slow down that flood.
Hildenbrand expressed his concern this way: "We cannot keep complaining about maintainer overload and, at the same time, encourage people to bombard us with even more of that stuff". Based on what has been seen in other areas, it would not be surprising to see an order-of-magnitude increase in the flow of low-quality patches; indeed, Greg Kroah-Hartman said that it is already happening.
More discussion
The end result is that the question of how to incorporate LLM-based development tools into the kernel project's workflow is likely to feature prominently in community discussions for some time. While these tools may bring benefits, including finding patterns that are difficult for humans to see and the patient generation of test code, they also have the potential to bring copyright problems, bugs, and added maintainer stress. The pressure to use these tools is not going away, and even the eventual popping of the current AI bubble seems unlikely to change that.
Within a few milliseconds of the posting of the call for topics for the 2025 Maintainers Summit, there were two separate proposals (from Stoakes and Jiri Kosina) on the issue of AI-based tools in the kernel workflow; they have sparked discussions that will surely have progressed significantly by the time this article is published. One does not, it seems, need an LLM to generate vast amounts of text. This conversation is, in other words, just beginning.