TL;DR
On this page
A 2026 freelance AI engineer invoice has a different shape than a generic freelance invoice. The line items have to handle compute pass-through (LLM inference cost, embedding cost, vector DB hosting, GPU hours), the milestones have to align with eval-acceptance gates rather than calendar dates, and the payment terms have to absorb the iteration risk that comes with production AI work. This post is the AI-engineering-specific template covering all three.
The general invoice basics live in how to write a freelance invoice. The hourly + project + retainer rate research that justifies the dollar amounts is in AI engineer freelance rates 2026. The companion proposal that locks deliverables and eval methodology before the invoice goes out is in AI engineer proposal that wins.
Why AI Engineer Invoices Are Different
A copywriter invoice has 1-2 lines. A web developer invoice has 1-12 lines depending on milestones and pass-throughs. An AI engineer invoice often has 6-15 lines because compute pass-through is itself decomposed into multiple cost lines that mature buyers expect to see broken out.
| Profession | Typical lines per invoice | Unique cost element |
|---|---|---|
| Copywriter | 1-2 | Word count and revisions |
| Web developer | 1-12 | Scope changes, hosting passthroughs, platform fees |
| AI engineer | 6-15 | LLM inference + embedding + vector DB + GPU hours + eval gates |
| Videographer | 6+ | Day rate, kit, crew, post, license |
The decomposition matters because of how mature AI buyers think about cost. Per Finout's 2026 FinOps for AI guide, the recommended FinOps practice is to "break down the cost of the RAG pipeline into its parts. For example, in our cost reports we separate 'LLM inference cost', 'Embedding cost', and 'Vector DB cost'." When your invoice mirrors that decomposition, it slots directly into the client's cost-tracking workflow. When your invoice rolls everything into a single "compute" line, it triggers procurement questions and slows payment.
The 4 Billing Models for AI Engineering in 2026
There are four production billing models. Each has a use case and a specific invoice format.
| Model | Use case | Invoice format |
|---|---|---|
| Hourly | Exploration, eval debugging, on-call | Line per task with hours and rate |
| Fixed-bid project | Locked-scope deliverable (e.g., RAG MVP) | One project line + change-order lines + compute pass-through |
| Milestone billing | Multi-stage delivery work over $10K | One invoice per milestone with milestone fee + per-milestone compute |
| Retainer (monthly) | Ongoing dev + maintenance + on-call | Fixed monthly retainer line + overage hours line |
Most senior AI engineering engagements in 2026 are hybrids: fixed-bid milestones for the planned scope plus an explicit hourly rate for change requests outside the bid plus pass-through compute on top. Per Second Talent's 2026 freelance ML engineer hourly rate, the senior median hourly anchor is 185 dollars and specialists in distributed training and MLOps charge 275-450 dollars; per Second Talent's 2026 freelance LLM developer hourly rate, LLM specialists earn 75-700 dollars with senior median 210 dollars and fine-tuning/RLHF specialists 350-700 dollars. Set your change-request hourly rate at the same number you would charge for a standalone hourly engagement, not below.
Sample Hourly Invoice (RAG Eval Debugging)
The hourly format works for exploration, eval debugging, prompt iteration, and incident response. Each task gets its own line.
| Description | Hours | Rate | Amount |
|---|---|---|---|
| Eval failure diagnosis (citation-grounding regression) | 4.5 | $185 | $832.50 |
| Retrieval pipeline tuning (chunk size + reranker config) | 6.0 | $185 | $1,110 |
| New eval set construction + baseline scoring | 5.0 | $185 | $925 |
| Production cutover monitoring | 2.0 | $185 | $370 |
| Subtotal (engineering) | - | - | $3,237.50 |
| LLM inference cost pass-through (Anthropic Claude API, period dashboard attached) | - | - | $187.40 |
| Embedding cost pass-through (Cohere embeddings, period dashboard attached) | - | - | $42.10 |
| Total due | - | - | $3,467 |
Notes on the format:
- Each engineering line names the actual deliverable, not "consulting" or "engineering work" generically. Per Plutio, vague line items are the number-one cause of payment disputes.
- Compute pass-through gets its own subtotal block below the engineering subtotal so the client can see what is your time vs what is their cloud spend.
- Attach the provider dashboard screenshots as PDF backup. Most disputes on pass-through happen because the client doesn't believe the cost figure; the screenshot ends the question.
Sample Milestone Invoice (RAG MVP, 33/33/34 Split)
For a $30,000 RAG MVP project split across three milestones, each milestone produces one invoice. The split is 33% / 33% / 34% per Plutio's 2026 invoice payment terms guide, which recommends 3-stage milestone splits for projects in the $10,000-$50,000 range. This is the second of three invoices.
| Description | Amount |
|---|---|
| Milestone 2 fee - Retrieval relevance threshold passed (per Section 4.2 of MSA) | $9,900 |
| Approved change order CR-002: legal-doc OCR pipeline added | $2,400 |
| LLM inference cost pass-through (training + eval runs, Anthropic dashboard attached) | $612 |
| Embedding cost pass-through (initial corpus + delta indexing) | $148 |
| Vector DB hosting (Pinecone, March 2026, dashboard attached) | $89 |
| GPU hours pass-through (eval batch runs, Modal dashboard attached) | $76 |
| Subtotal | $13,225 |
| Less: Deposit credit (25% of total project value applied to milestones 1+2) | -$3,500 |
| Total due | $9,725 |
| Payment terms: Net 15. Late fee 1.5% per month after due date. | - |
Notes on the format:
- The milestone fee references the contract section number that defines the deliverable. This makes the invoice self-documenting against the MSA and reduces "what did I sign up for again?" disputes.
- Approved change orders get their own line with the CR number. Never bury a change order inside the milestone fee; it triggers procurement audits.
- Each compute pass-through line names the provider AND notes that the dashboard is attached. Mature buyers will not reimburse pass-through without backup.
- Deposit credit is applied as a negative line. The cleanest pattern is to apply 50% of the deposit to milestone 1 and 50% to milestone 2 (so milestone 3 is paid in full with no deposit credit and the client gets the final delivery before the final dollar moves).
Sample Retainer Invoice (Standard AI Maintenance + Dev)
Retainers became standard in 2026 for production AI work where the engineer is on-call for incidents and continuous improvement. The retainer covers a defined hour bank; overages bill at the standard hourly rate.
| Description | Amount |
|---|---|
| Monthly retainer - Standard tier (40 hours included, $9,000) | $9,000 |
| Overage hours (3 hours @ $185/hr) | $555 |
| LLM inference cost pass-through (March 2026 production usage, dashboard attached) | $1,247 |
| Embedding cost pass-through (continuous indexing for new docs) | $214 |
| Vector DB hosting (Pinecone, March 2026) | $89 |
| Total due | $11,105 |
| Payment terms: Net 15. Auto-renews monthly unless cancelled by 15th of prior month. | - |
Notes:
- Retainer fee is constant month over month. Overages are the only variable on the engineering side.
- For specialist retainers (LLM/MLOps/safety), the same structure applies but the retainer fee scales: $15K-$40K/month for specialist on-retainer arrangements with safety, MLOps, or LLM evaluation expertise on-call for incident response.
- Cancellation language matters. The "by 15th of prior month" clause prevents a client from cancelling on the 30th and avoiding the next month's invoice; this is the AI-engineering equivalent of the standard SaaS cancellation cutoff.
Compute Pass-Through Line Items (Reference)
| Cost component | Typical billing unit | Where to source the dashboard for backup |
|---|---|---|
| LLM inference (input + output) | Per million tokens at provider rate | OpenAI, Anthropic, Google, AWS Bedrock dashboards |
| Embedding generation | Per million tokens | Cohere, OpenAI, Voyage AI dashboards |
| Vector DB storage + queries | Per pod-hour or per query | Pinecone, Weaviate Cloud, pgvector hosting |
| GPU hours (training/fine-tune) | Per GPU-hour | Modal, RunPod, Lambda Labs, AWS SageMaker, GCP |
| Cloud egress (model serving) | Per GB egress | AWS, GCP, Azure billing dashboards |
| Monitoring / observability | Per signal volume or event count | Langfuse, Helicone, Datadog AI integrations |
For current per-token rates by model, link your client to OpenAI API pricing directly rather than quoting a number that will be stale within a quarter. Per Finout, there can be 30x-200x cost variance between an unoptimized AI deployment and a well-optimized one, so the per-token rate is the floor; the architecture decisions are where the real money is. That is one reason mature buyers value engineers who can show pass-through line items decomposed by component (it signals you understand where their money goes).
A useful framing from Finout: 80 percent or more of the true cost in agentic deployments lies in hidden areas (compliance, dev hours, maintenance) rather than direct cloud bills. Your invoice's engineering hours line is part of that 80 percent. Naming it explicitly (rather than letting the client compare your engineering hours line to a much smaller compute pass-through line and feel sticker shock) helps frame the value.
Deposit Schedule by Project Size
Per Plutio's 2026 invoice payment terms guide, deposit norms scale with project size and client trust.
| Project size | New client deposit | Established client deposit | Notes |
|---|---|---|---|
| Under $2,000 | 50% (with Due on Receipt or Net 7 terms) | 25% | Plutio explicit recommendation |
| $2,000-$10,000 | 50% with milestones | 25% with milestones | Plutio: 25-50% range with milestone billing |
| Over $10,000 | 25% with milestones | 25% with milestones | Plutio: 25% deposit recommended for large projects |
For AI engineering specifically, the deposit also covers the cost of standing up your dev environment (provider accounts, eval infrastructure, monitoring tooling) before client revenue starts flowing. Without a deposit, you absorb the setup cost AND front the early compute spend on your card. Both add up fast on a non-trivial AI project.
Late Fee + Kill Fee Clauses
Two clauses that should appear on every AI engineering invoice or attached MSA:
- Late fee. 1.5 percent per month on overdue balances per SolidGigs' 2026 freelance payment terms guide, or the maximum allowed by your local laws if higher. Activates on the day after the due date; compounds monthly. The clause exists less for revenue and more as a behavioral lever; clients pay your invoices first when there is a real cost to delay.
- Kill fee. 25 percent of remaining unpaid project value due within 14 days of termination per Plutio's 2026 invoice payment terms guide. The kill fee compensates for lost opportunity cost and ramping-up time when a client cancels mid-engagement. For AI projects this matters more than typical software work because AI engagements often involve provider accounts, eval infrastructure, and tooling investments that don't carry over to the next client.
The practical phrasing for the kill-fee clause: "If Client terminates this agreement after work has begun, a kill fee of 25 percent of the remaining unpaid project value is due within 14 calendar days of termination."
What This Means for Sending Your Next AI Engineer Invoice
Three takeaways for an AI engineer about to send the next invoice:
- Decompose pass-through into the components mature buyers expect. LLM inference, embedding, vector DB, GPU hours as separate lines. Mirror the FinOps cost-decomposition framework. Attach provider dashboards as backup.
- Tie milestone fees to acceptance criteria, not calendar dates. Calendar-date milestones invite disputes when work legitimately slips. Acceptance-criteria milestones (eval threshold met, retrieval relevance passed, production cutover live) are auditable and defensible.
- Build deposit + late fee + kill fee into the contract template once, then never argue about them again per project. The financial mechanisms that make the invoice enforceable belong in the MSA. The invoice references them; it doesn't relitigate them.
The companion rate research that justifies the hourly numbers is in AI engineer freelance rates 2026. The proposal format that locks scope before this invoice goes out is in AI engineer proposal that wins. The general invoice fundamentals are in how to write a freelance invoice. The retainer-vs-hourly framing for related consulting work is in consultant invoice retainer hourly value. The web-development comparison invoice is in web developer invoice 2026. The deeper payment-terms playbook is in freelance payment terms and the late-paying-client playbook is in late-paying clients.
To send this invoice without rebuilding the line-item structure each time, use FreelanceDesk's invoice generator which preserves the compute pass-through structure as a saved template.
References
- Plutio: Invoice Payment Terms for Freelancers 2026
- LedgerUp: Net 15 Payment Terms
- SolidGigs: Freelance Payment Terms to Get Paid
- Finout: FinOps in the Age of AI — LLM Workflows, RAG, AI Agents, Agentic Systems
- OpenAI API Pricing
- Second Talent: Freelance ML Engineer Hourly Rate US 2026
- Second Talent: Freelance LLM Developer Hourly Rate US 2026
