TL;DR
Picture a freelance AI engineer whose $15,000 LLM integration is in iteration round nine because the contract said acceptance was "based on eval performance" and never defined the eval. That engineer has found the clause that separates AI contracts from every other software contract. When acceptance is tied to a metric nobody wrote down, there is no point at which the work is finished, and "make it better" becomes an unpaid loop. The fix is not more hours. It is a contract that names the eval, the threshold, and the round count before the first commit.
pro tip
Generic software contracts assume deterministic output and a clear "it works" moment. AI work has neither. A freelance AI engineer contract needs four clauses: an eval-acceptance gate that defines acceptance as a measurable metric threshold on a named dataset; an iteration cap that turns extra eval-and-revise rounds into change orders; model IP terms covering the fine-tuned weights, prompts, outputs, and eval harness; and a compute cost pass-through that caps API spend per round. The base model vendor's AS-IS terms give you nothing to lean on, so the contract stands alone.
The base contract layer is in freelance contract essentials, and the general IP framework is in the freelancer IP ownership guide. The eval methodology you committed to in the AI engineer proposal becomes the acceptance language here, the billing side is in the AI engineer invoice guide, and project sizing context is in the 2026 AI engineer rate report.
Why AI Contracts Break Where Software Contracts Hold
A traditional software contract can lean on a binary: the feature works to spec, or it does not. AI work removes that binary in two ways. Outputs are nondeterministic, so the same input can produce different responses, and "correct" is a statistical property rather than a single right answer. And the deliverable is layered: a base model owned by a third party, a fine-tuned layer trained on the client's data, a set of prompts, an eval harness, and the outputs themselves, each a separate IP question.
The market has noticed. Per Morgan Lewis, AI deals increasingly include commitments around "accuracy thresholds (which, although time intensive to create, are increasingly important)," and "where AI is embedded in core business processes and outcomes matter, buyers are increasingly unwilling to accept a warranty-free posture." Their recommendation is to address "output ownership, training data use limitations, and data return and deletion rights explicitly, tailored to the actual use case rather than carried over from a generic SaaS template." For the freelance engineer, that last phrase is the whole point: a copied SaaS or generic dev contract will not carry the clauses AI work needs.
The Eval-Acceptance Gate
This is the load-bearing clause. Acceptance has to be a thing that can be measured and tested, not a feeling. Per ContractKen, an acceptance clause "transforms subjective satisfaction (I'll know it when I see it) into objective, measurable standards," with the example "the system must process 10,000 transactions per second with less than 200ms latency." The hard rule: "if a criterion cannot be tested, it should not be in the acceptance matrix," and phrases like "the system shall be user-friendly" or "performance shall be satisfactory" are explicitly flagged as "not testable criteria."
For an AI build, a testable acceptance gate names four things:
- The metric. Accuracy, F1, precision, recall, ROUGE, exact-match rate, p95 latency, or a task-specific score. Name the one that defines success.
- The dataset. The fixed, held-out test set the metric runs against, agreed before development. Acceptance against a moving target is no acceptance at all.
- The threshold. The number that counts as a pass (for example, "context recall of at least 0.85 on the agreed 200-item test set").
- The pass rule for nondeterminism. Because outputs vary, acceptance is usually a pass rate across the test set ("at least 90% of cases meet the rubric"), not exact output matching on every run.
Real contracts already frame acceptance around testable dimensions. Per a sample on Law Insider, acceptance is structured around "Timeliness," "Completeness," and "Technical accuracy: The Services and Deliverables are accurate as measured against commonly accepted standards." The AI engineer's job is to make "accurate" concrete by attaching the metric, dataset, and threshold. Once the eval passes, acceptance is automatic, which protects the engineer as much as the client.
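The four terms of the gate map directly onto a small check that both parties can run. What follows is a minimal sketch, not a standard harness: the names `AcceptanceGate` and `evaluate_gate`, the dataset identifier, and the choice to score each case separately and then apply a pass rate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AcceptanceGate:
    """Hypothetical container for the four contract terms."""
    metric_name: str       # the one metric that defines success
    dataset_id: str        # identifier of the fixed, held-out test set
    case_threshold: float  # score a single case must reach to count as a pass
    min_pass_rate: float   # share of cases that must pass (the nondeterminism rule)


def evaluate_gate(gate: AcceptanceGate, case_scores: Sequence[float]) -> dict:
    """Return the pass rate and whether the contracted gate is met."""
    if not case_scores:
        raise ValueError("the agreed test set produced no scores")
    passed = sum(1 for score in case_scores if score >= gate.case_threshold)
    pass_rate = passed / len(case_scores)
    return {
        "metric": gate.metric_name,
        "dataset": gate.dataset_id,
        "cases": len(case_scores),
        "passed": passed,
        "pass_rate": pass_rate,
        # Acceptance is a comparison, not an opinion.
        "accepted": pass_rate >= gate.min_pass_rate,
    }


# Illustrative numbers only: 200 cases, 185 of them at or above 0.85.
gate = AcceptanceGate("context_recall", "client-testset-v1", 0.85, 0.90)
report = evaluate_gate(gate, [0.9] * 185 + [0.5] * 15)
print(report["pass_rate"], report["accepted"])
```

Because the gate is frozen and the dataset is referenced by identifier, the report can be attached to the milestone invoice as the acceptance record.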
The Iteration Cap and Change-Order Trigger
The eval gate defines what passing means. The iteration cap defines how many tries are included before more tries cost money. Without it, even a well-defined eval can spawn endless "can we push it higher" rounds.
Write the cap as a fixed number of eval-and-revise cycles inside the project price. One round is: run the agreed eval, report the metric, deliver one revision pass. State plainly what happens at the boundary:
- When the eval meets the contracted threshold, acceptance is automatic and the milestone is paid.
- When the client wants a higher threshold than the contract specified, that is new scope and a change order at the agreed rate, not a free round.
- When the included rounds are used and the threshold is still not met for reasons inside the engineer's control, the engineer continues at no charge; when it is not met because the data or the goalposts changed, it is a change order.
This is the AI-specific version of standard scope control. The general mechanics are in the scope-of-work guide and how to handle scope creep, and the same scope-lock logic appears in the web developer contract guide. The difference for AI is that the trigger is a metric, so the change-order line is unusually clean: the eval number is either at threshold or it is not.
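The boundary rules above reduce to a short decision function. This is a sketch under stated assumptions: the `Outcome` names, the `next_step` and `is_new_scope` helpers, and the rule that an unmet threshold with rounds remaining simply consumes another included round are illustrative, and a real contract would spell out each branch in words.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "acceptance is automatic and the milestone is paid"
    INCLUDED_ROUND = "another eval-and-revise round inside the project price"
    CONTINUE_NO_CHARGE = "engineer continues at no charge"
    CHANGE_ORDER = "change order at the agreed rate"


def next_step(
    score: float,
    contracted_threshold: float,
    rounds_used: int,
    included_rounds: int,
    data_or_goalposts_changed: bool = False,
) -> Outcome:
    """Map one eval result onto the contract's boundary rules."""
    if score >= contracted_threshold:
        return Outcome.ACCEPTED
    if rounds_used < included_rounds:
        # Assumption: rounds still inside the cap are simply used up.
        return Outcome.INCLUDED_ROUND
    if data_or_goalposts_changed:
        # Cap exhausted for reasons outside the engineer's control.
        return Outcome.CHANGE_ORDER
    # Cap exhausted for reasons inside the engineer's control.
    return Outcome.CONTINUE_NO_CHARGE


def is_new_scope(requested_threshold: float, contracted_threshold: float) -> bool:
    """A threshold above the contracted one is new scope, not a free round."""
    return requested_threshold > contracted_threshold


print(next_step(0.80, 0.85, rounds_used=3, included_rounds=3).value)
```

Keeping `is_new_scope` separate from `next_step` reflects the text: a request for a higher number does not undo acceptance of the original milestone, it opens a new one.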
Model IP: Five Assets Generic Contracts Miss
A generic contract assigns "the work product" and stops. AI work has five distinct IP assets, and each needs a line.
- Fine-tuned weights. The layer trained on the client's data. Per Pertama Partners, a standard clause reads: "Any model fine-tuning, customizations, or improvements developed using Customer Data (Customer Model Improvements) shall be owned by Customer."
- Prompts and outputs. Per ContractNerds, the agreement "should expressly identify the customer as the owner of all IP rights in the AI inputs (which the agreement may call prompts) and the output." Prompt libraries are real deliverables and need an explicit owner.
- Improvements, defined broadly. Per DarrowEverett, the contract should "define improvements comprehensively to cover fine-tuning, prompt engineering, derivative datasets, and model outputs," with a sample clause that "all Improvements trained on Licensee Data shall be owned exclusively by Licensee and shall not be used for other customers."
- The base model, carved out. The foundation model belongs to its vendor and is only licensed. The contract assigns the fine-tuned layer and the prompts; it cannot assign what the engineer does not own. Say so, so the client does not believe they bought OpenAI.
- The eval harness and test set. The scripts, graders, and held-out data that prove acceptance are a separate asset from the model. Decide who keeps them, because whoever owns the eval owns the ability to re-verify the system later.
Per ContractNerds, whether the fine-tuned model is "1) owned by the customer, 2) jointly owned between the customer and developer, or 3) does the customer merely obtain a license" has no default answer, so "clear contractual language is essential to define what is owned and by who." The general assignment-versus-license framework is in the freelancer IP ownership guide; this list is the AI-specific extension of it.
Why the Base Model Vendor's Terms Don't Cover You
A tempting shortcut is to assume the LLM provider's terms backstop the project. They do not. Per ContractNerds' audit of LLM vendor agreements, vendors "mostly disclaim all warranties with an AS-IS statement," and "most LLMs go with a limitation of liability cap set at an amount of fees paid prior to the date of the claim or $100-$200." Worse for the downstream party, "most vendors will require that the customer indemnifies them for a broad category of claims arising from use of the service, not limited to IP infringement," and "this is never subject to a dollar cap."
The consequence for a freelance AI engineer is direct: nothing useful flows down from the model provider. The acceptance obligation, the IP assignment, and the liability terms in your engagement have to be written by you, in your contract, because the foundation you are building on disclaims responsibility for the result. That is the real reason an AI contract cannot be a recycled SaaS template.
Compute and API Cost Pass-Through
Unbounded iteration is not just unpaid time; it is unpaid spend. Each eval-and-revise round burns API tokens or GPU hours, and on a long iteration loop those costs are real. The contract should treat compute as a separate, capped line:
- State whether API and compute costs are billed at cost or with a stated markup, and pass through the provider invoices for at-cost billing.
- Cap compute spend per iteration round, with overage requiring written approval, so a runaway eval cannot quietly multiply the bill.
- Tie compute billing to the milestone structure in the AI engineer invoice guide, so the client sees compute as its own line rather than a hidden cost inside the fee.
Naming compute as a pass-through line does the same job for cost that the iteration cap does for time: it puts a boundary where an AI project would otherwise have none.
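The per-round cap and approval rule can be expressed as a small billing calculation. The sketch below is hypothetical throughout: `ComputeTerms`, `bill_round`, and the dollar figures are invented for illustration, and it assumes the cap applies to the provider's cost before any markup, which the contract would need to state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeTerms:
    """Hypothetical compute terms for one engagement."""
    round_cap_usd: float      # maximum provider spend per iteration round
    markup_rate: float = 0.0  # 0.0 means billed at cost


def bill_round(
    terms: ComputeTerms,
    provider_cost_usd: float,
    overage_approved: bool = False,
) -> dict:
    """Compute the billable compute line for one eval-and-revise round."""
    if provider_cost_usd < 0:
        raise ValueError("provider cost cannot be negative")
    overage = max(0.0, provider_cost_usd - terms.round_cap_usd)
    needs_approval = overage > 0 and not overage_approved
    # Without written approval, only spend up to the cap is billable.
    billable_cost = terms.round_cap_usd if needs_approval else provider_cost_usd
    billed = round(billable_cost * (1 + terms.markup_rate), 2)
    return {
        "provider_cost_usd": provider_cost_usd,
        "overage_usd": round(overage, 2),
        "needs_written_approval": needs_approval,
        "billed_usd": billed,
    }


# Illustrative terms: $300 cap per round, 10% stated markup.
terms = ComputeTerms(round_cap_usd=300.0, markup_rate=0.10)
print(bill_round(terms, 420.0))
print(bill_round(terms, 420.0, overage_approved=True))
```

Reporting the provider cost next to the billed amount keeps the line auditable against the provider invoices the contract says will be passed through.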
Copy-Paste Clause Checklist
Build the full contract with these clauses in the free FreelanceDesk contract generator, or start from the best free contract templates roundup and add the eval-acceptance and model-IP language.
References
- Negotiating AI Provisions in Commercial and Technology Contracts: Where the Market Is Heading, Morgan Lewis
- Acceptance Criteria in Contracts: Meaning and Examples, ContractKen
- Acceptance Criteria Clause Samples, Law Insider
- Securing Innovation: Key IP Contract Clauses for AI Deployment and Development, ContractNerds
- Navigating the LLM Contract Jungle: A Lawyer's Findings From an LLM Terms Audit, ContractNerds
- AI IP Ownership in Contracts: Protecting Your Rights, Pertama Partners
- Key IP Licensing Considerations in AI Technology Agreements, DarrowEverett
