Provisional Guidance for Users of LLM-Based Code Generators
I’m sure there will be links like “Court Rules AI Art Can’t Be Copyrighted” aplenty. They will be wrong. The court didn’t rule that AI art can’t be copyrighted. It ruled that copyright requires human authorship, surprising approximately zero copyright lawyers…or people who have read the Wikipedia page.
If you’re looking for a “simple legal rule” so that you can game it, nitpick its terms, or run right up to its line, you’re looking for trouble. Don’t blame me when you find it. But if you’re a realistic player just looking for a sense of the odds so you can place wiser bets, your best intuitive heuristic is probably twofold: how much output you accept from an LLM into your codebase at once, and the extent to which that output makes what look like implementation choices, rather than simply invoking APIs or established boilerplate. Your working sense of whether it looks like code completion, template-based code generation, or what coders unavoidably had to think through and type for themselves before Copilot and the like came around, can serve as a first-pass proxy for legal peril.
If it’s what everybody else checks in to use the same APIs, that’s unlikely to be creative expression anyone can claim to own and see infringed. The more specific, creative routines that go within that boilerplate? Yes, potentially. The rigging, patterns, and boilerplate everybody else is filling in, too? Not so much.
The newer a commercially relevant phenomenon, the less specifically worded, algorithm-like rules determine outcomes at law, and the more important the purposes behind more generally worded rules become. Lawyers call abstractly stated, syllogism-like rules “black letter law” and the more generalized purposes “policies”. When it isn’t clear how to apply black letter law, we cite and fight about policies in arguing how to read it in context.
When you prompt and take big chunks of code from LLMs that rate high on the intuitive completion-generation-authorship scale, document your code input state, prompts, and further edits. Create a written record of your innocent use of LLMs.
If you were going to code a key part of a project ten years ago, and worried you’d be accused of plagiarism, the natural advice would’ve been to document your process. Don’t just phone it in with an “Implemented $foo” commit message. Write a nice long one, and maybe blog work in progress or keep a “lab notebook”, too.
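By the same logic, a commit recording LLM involvement might look like the sketch below. The “LLM-*” trailer names are invented here for illustration; no standard prescribes them.

```shell
# Hypothetical sketch: a commit message that records what was hand-written
# and what was LLM-completed. The "LLM-*" trailers are made-up names.
git commit \
  -m "Implement retry backoff for the sync client" \
  -m "Hand-wrote the control flow and constants. Accepted a Copilot
completion for the argument-parsing boilerplate, then renamed the
variables and added the jitter term myself." \
  -m "LLM-Tool: GitHub Copilot
LLM-Scope: boilerplate completion only"
```

Trailers like these stay greppable in `git log`, which is exactly the kind of written record of innocent use the advice above calls for.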