AI Engineer Invoice Template 2026: Hourly, Project, Retainer, and Compute Pass-Through Billing

TL;DR

AI engineer invoice 2026, six line items: engineering hours or milestone fee, LLM inference (input + output tokens), embedding cost, vector DB hosting, GPU hours, eval acceptance. Deposit: 50 percent for new clients, 25 percent for established clients, per Plutio. Net 15 default. Late fee: 1.5 percent per month. Kill fee: 25 percent of the remaining unpaid project value. RAG and fine-tuning projects split across 3 milestones tied to acceptance criteria, not calendar dates. Compute pass-through mirrors the LLM inference + embedding + vector DB cost decomposition mature AI buyers expect.

A 2026 freelance AI engineer invoice has a different shape than a generic freelance invoice. The line items have to handle compute pass-through (LLM inference cost, embedding cost, vector DB hosting, GPU hours), the milestones have to align with eval-acceptance gates rather than calendar dates, and the payment terms have to absorb the iteration risk that comes with production AI work. This post is the AI-engineering-specific template covering all three.

The general invoice basics live in how to write a freelance invoice. The hourly + project + retainer rate research that justifies the dollar amounts is in AI engineer freelance rates 2026. The companion proposal that locks deliverables and eval methodology before the invoice goes out is in AI engineer proposal that wins.

Why AI Engineer Invoices Are Different

A copywriter invoice has 1-2 lines. A web developer invoice has 1-12 lines depending on milestones and pass-throughs. An AI engineer invoice often has 6-15 lines because compute pass-through is itself decomposed into multiple cost lines that mature buyers expect to see broken out.

Profession | Typical lines per invoice | Unique cost element
Copywriter | 1-2 | Word count and revisions
Web developer | 1-12 | Scope changes, hosting pass-throughs, platform fees
AI engineer | 6-15 | LLM inference + embedding + vector DB + GPU hours + eval gates
Videographer | 6+ | Day rate, kit, crew, post, license

The decomposition matters because of how mature AI buyers think about cost. Per Finout's 2026 FinOps for AI guide, the recommended FinOps practice is to "break down the cost of the RAG pipeline into its parts. For example, in our cost reports we separate 'LLM inference cost', 'Embedding cost', and 'Vector DB cost'." When your invoice mirrors that decomposition, it slots directly into the client's cost-tracking workflow. When your invoice rolls everything into a single "compute" line, it triggers procurement questions and slows payment.

The 4 Billing Models for AI Engineering in 2026

There are four production billing models. Each has a use case and a specific invoice format.

Model | Use case | Invoice format
Hourly | Exploration, eval debugging, on-call | Line per task with hours and rate
Fixed-bid project | Locked-scope deliverable (e.g., RAG MVP) | One project line + change-order lines + compute pass-through
Milestone billing | Multi-stage delivery work over $10K | One invoice per milestone with milestone fee + per-milestone compute
Retainer (monthly) | Ongoing dev + maintenance + on-call | Fixed monthly retainer line + overage hours line

Most senior AI engineering engagements in 2026 are hybrids: fixed-bid milestones for the planned scope, an explicit hourly rate for change requests outside the bid, and pass-through compute on top. Per Second Talent's 2026 freelance ML engineer hourly rate data, the senior median hourly anchor is $185, and specialists in distributed training and MLOps charge $275-$450. Per Second Talent's 2026 freelance LLM developer hourly rate data, LLM specialists earn $75-$700 per hour, with a senior median of $210 and fine-tuning/RLHF specialists at $350-$700. Set your change-request hourly rate at the same number you would charge for a standalone hourly engagement, not below.

Sample Hourly Invoice (RAG Eval Debugging)

The hourly format works for exploration, eval debugging, prompt iteration, and incident response. Each task gets its own line.

Description | Hours | Rate | Amount
Eval failure diagnosis (citation-grounding regression) | 4.5 | $185 | $832.50
Retrieval pipeline tuning (chunk size + reranker config) | 6.0 | $185 | $1,110.00
New eval set construction + baseline scoring | 5.0 | $185 | $925.00
Production cutover monitoring | 2.0 | $185 | $370.00
Subtotal (engineering) | - | - | $3,237.50
LLM inference cost pass-through (Anthropic Claude API, period dashboard attached) | - | - | $187.40
Embedding cost pass-through (Cohere embeddings, period dashboard attached) | - | - | $42.10
Total due | - | - | $3,467.00

Notes on the format:

  • Each engineering line names the actual deliverable, not "consulting" or "engineering work" generically. Per Plutio, vague line items are the number-one cause of payment disputes.
  • Compute pass-through gets its own subtotal block below the engineering subtotal so the client can see what is your time vs what is their cloud spend.
  • Attach the provider dashboard screenshots as PDF backup. Most disputes on pass-through happen because the client doesn't believe the cost figure; the screenshot ends the question.
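
The arithmetic behind the sample hourly invoice can be checked with a short script. This is a minimal sketch, not a prescribed tool: the variable names are illustrative, and the hours, rate, and pass-through amounts are copied from the sample above. It uses `Decimal` rather than floats so cents do not drift.

```python
from decimal import Decimal

CENT = Decimal("0.01")
RATE = Decimal("185")  # hourly rate from the sample invoice

# (description, hours) pairs copied from the sample invoice
engineering_lines = [
    ("Eval failure diagnosis", Decimal("4.5")),
    ("Retrieval pipeline tuning", Decimal("6.0")),
    ("New eval set construction + baseline scoring", Decimal("5.0")),
    ("Production cutover monitoring", Decimal("2.0")),
]

# Pass-through amounts as read off the provider dashboards
pass_through_lines = [
    ("LLM inference cost pass-through", Decimal("187.40")),
    ("Embedding cost pass-through", Decimal("42.10")),
]

# Engineering time and client cloud spend are kept as separate subtotals
engineering_subtotal = sum(
    (hours * RATE for _, hours in engineering_lines), Decimal("0")
).quantize(CENT)
pass_through_subtotal = sum(
    (amount for _, amount in pass_through_lines), Decimal("0")
).quantize(CENT)
total_due = engineering_subtotal + pass_through_subtotal

print(f"Engineering subtotal:  ${engineering_subtotal:,.2f}")
print(f"Pass-through subtotal: ${pass_through_subtotal:,.2f}")
print(f"Total due:             ${total_due:,.2f}")
```

Keeping the two subtotals separate in the calculation makes it easy to print them as separate blocks on the invoice.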

Sample Milestone Invoice (RAG MVP, 33/33/34 Split)

For a $30,000 RAG MVP project split across three milestones, each milestone produces one invoice. The split is 33% / 33% / 34% per Plutio's 2026 invoice payment terms guide, which recommends 3-stage milestone splits for projects in the $10,000-$50,000 range. This is the second of three invoices.

Description | Amount
Milestone 2 fee - Retrieval relevance threshold passed (per Section 4.2 of MSA) | $9,900
Approved change order CR-002: legal-doc OCR pipeline added | $2,400
LLM inference cost pass-through (training + eval runs, Anthropic dashboard attached) | $612
Embedding cost pass-through (initial corpus + delta indexing) | $148
Vector DB hosting (Pinecone, March 2026, dashboard attached) | $89
GPU hours pass-through (eval batch runs, Modal dashboard attached) | $76
Subtotal | $13,225
Less: Deposit credit (half of the $7,500 deposit, which is 25% of total project value and is applied across milestones 1+2) | -$3,750
Total due | $9,475

Payment terms: Net 15. Late fee 1.5% per month after due date.

Notes on the format:

  • The milestone fee references the contract section number that defines the deliverable. This makes the invoice self-documenting against the MSA and reduces "what did I sign up for again?" disputes.
  • Approved change orders get their own line with the CR number. Never bury a change order inside the milestone fee; it triggers procurement audits.
  • Each compute pass-through line names the provider AND notes that the dashboard is attached. Mature buyers will not reimburse pass-through without backup.
  • Deposit credit is applied as a negative line. The cleanest pattern is to apply 50% of the deposit to milestone 1 and 50% to milestone 2 (so milestone 3 is paid in full with no deposit credit and the client gets the final delivery before the final dollar moves).
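
The milestone arithmetic described above can be sketched as follows. It assumes the 25% deposit is calculated on the full $30,000 project value and credited in two equal halves against milestones 1 and 2, as the last note describes. All names are illustrative.

```python
from decimal import Decimal

PROJECT_VALUE = Decimal("30000")
SPLIT = [Decimal("0.33"), Decimal("0.33"), Decimal("0.34")]  # 33/33/34 split
DEPOSIT_RATE = Decimal("0.25")  # deposit as a share of total project value

milestone_fees = [PROJECT_VALUE * share for share in SPLIT]
deposit = PROJECT_VALUE * DEPOSIT_RATE

# Half of the deposit is credited on milestone 1, half on milestone 2,
# and nothing on milestone 3, so the final invoice is paid in full.
deposit_credits = [deposit / 2, deposit / 2, Decimal("0")]

# Extras that appear on the milestone 2 invoice in the sample
change_orders = Decimal("2400")  # CR-002
compute_pass_through = sum(
    [Decimal("612"), Decimal("148"), Decimal("89"), Decimal("76")], Decimal("0")
)

subtotal = milestone_fees[1] + change_orders + compute_pass_through
total_due = subtotal - deposit_credits[1]

print(f"Milestone 2 fee: ${milestone_fees[1]:,.2f}")
print(f"Subtotal:        ${subtotal:,.2f}")
print(f"Deposit credit: -${deposit_credits[1]:,.2f}")
print(f"Total due:       ${total_due:,.2f}")
```

Under these assumptions the three milestone fees sum back to the full project value, and the two credits sum back to the full deposit, which is a quick sanity check before any invoice goes out.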

Sample Retainer Invoice (Standard AI Maintenance + Dev)

Retainers became standard in 2026 for production AI work where the engineer is on-call for incidents and continuous improvement. The retainer covers a defined hour bank; overages bill at the standard hourly rate.

Description | Amount
Monthly retainer - Standard tier (40 hours included, $9,000) | $9,000
Overage hours (3 hours @ $185/hr) | $555
LLM inference cost pass-through (March 2026 production usage, dashboard attached) | $1,247
Embedding cost pass-through (continuous indexing for new docs) | $214
Vector DB hosting (Pinecone, March 2026) | $89
Total due | $11,105

Payment terms: Net 15. Auto-renews monthly unless cancelled by 15th of prior month.

Notes:

  • Retainer fee is constant month over month. Overages are the only variable on the engineering side.
  • For specialist retainers (LLM/MLOps/safety), the same structure applies but the retainer fee scales: $15K-$40K/month when safety, MLOps, or LLM evaluation expertise is on call for incident response.
  • Cancellation language matters. The "by 15th of prior month" clause prevents a client from cancelling on the 30th and avoiding the next month's invoice; this is the AI-engineering equivalent of the standard SaaS cancellation cutoff.
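
The overage and cancellation rules above can be expressed in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, and it reads "cancelled by 15th of prior month" as "a notice received on or before the 15th stops the next renewal". Your contract wording decides the real rule.

```python
from datetime import date
from decimal import Decimal
from typing import Optional

RETAINER_FEE = Decimal("9000")   # Standard tier from the sample invoice
INCLUDED_HOURS = Decimal("40")   # hour bank covered by the retainer
OVERAGE_RATE = Decimal("185")    # standard hourly rate for overages

def overage_charge(hours_worked: Decimal) -> Decimal:
    """Bill only the hours beyond the included bank."""
    overage_hours = max(hours_worked - INCLUDED_HOURS, Decimal("0"))
    return overage_hours * OVERAGE_RATE

def renews_for(service_month_start: date, cancelled_on: Optional[date]) -> bool:
    """True if the retainer auto-renews for the month starting on service_month_start.

    Assumption: a cancellation dated on or before the 15th of the prior
    month stops the renewal; anything later does not.
    """
    if cancelled_on is None:
        return True
    if service_month_start.month == 1:
        cutoff = date(service_month_start.year - 1, 12, 15)
    else:
        cutoff = date(service_month_start.year, service_month_start.month - 1, 15)
    return cancelled_on > cutoff

# Reproduce the sample invoice: 43 hours worked against a 40-hour bank
pass_through = Decimal("1247") + Decimal("214") + Decimal("89")
total_due = RETAINER_FEE + overage_charge(Decimal("43")) + pass_through
print(f"Total due: ${total_due:,.2f}")

# A cancellation on March 30 is too late to stop the April invoice
print(renews_for(date(2026, 4, 1), date(2026, 3, 30)))
```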

Compute Pass-Through Line Items (Reference)

Cost component | Typical billing unit | Where to source the dashboard for backup
LLM inference (input + output) | Per million tokens at provider rate | OpenAI, Anthropic, Google, AWS Bedrock dashboards
Embedding generation | Per million tokens | Cohere, OpenAI, Voyage AI dashboards
Vector DB storage + queries | Per pod-hour or per query | Pinecone, Weaviate Cloud, pgvector hosting
GPU hours (training/fine-tune) | Per GPU-hour | Modal, RunPod, Lambda Labs, AWS SageMaker, GCP
Cloud egress (model serving) | Per GB egress | AWS, GCP, Azure billing dashboards
Monitoring / observability | Per signal volume or event count | Langfuse, Helicone, Datadog AI integrations

For current per-token rates by model, link your client to OpenAI API pricing directly rather than quoting a number that will be stale within a quarter. Per Finout, there can be 30x-200x cost variance between an unoptimized AI deployment and a well-optimized one, so the per-token rate is the floor; the architecture decisions are where the real money is. That is one reason mature buyers value engineers who can show pass-through line items decomposed by component (it signals you understand where their money goes).
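
When a pass-through line has to be derived from token counts rather than read straight off a dashboard total, the per-million-token arithmetic looks roughly like this. The rates and usage figures below are hypothetical placeholders, not real provider prices. Substitute the figures from the provider's current pricing page and the client's usage export.

```python
from decimal import Decimal, ROUND_HALF_UP

MILLION = Decimal("1000000")
CENT = Decimal("0.01")

def token_cost(tokens: int, rate_per_million: Decimal) -> Decimal:
    """Cost of a token count at a per-million-token rate, rounded to cents."""
    cost = Decimal(tokens) / MILLION * rate_per_million
    return cost.quantize(CENT, rounding=ROUND_HALF_UP)

# HYPOTHETICAL rates in dollars per million tokens (placeholders only)
INPUT_RATE = Decimal("3.00")
OUTPUT_RATE = Decimal("15.00")
EMBEDDING_RATE = Decimal("0.10")

# HYPOTHETICAL usage for the billing period
input_tokens = 40_000_000
output_tokens = 5_000_000
embedding_tokens = 120_000_000

# Input and output tokens are priced separately, then rolled into one invoice line
llm_inference = token_cost(input_tokens, INPUT_RATE) + token_cost(output_tokens, OUTPUT_RATE)
embedding = token_cost(embedding_tokens, EMBEDDING_RATE)

print(f"LLM inference cost pass-through: ${llm_inference:,.2f}")
print(f"Embedding cost pass-through:     ${embedding:,.2f}")
```

The dashboard screenshot remains the authoritative backup. A calculation like this is only a cross-check that the dashboard total is plausible.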

A useful framing from Finout: 80 percent or more of the true cost in agentic deployments lies in hidden areas (compliance, dev hours, maintenance) rather than direct cloud bills. Your invoice's engineering hours line is part of that 80 percent. Naming it explicitly (rather than letting the client compare your engineering hours line to a much smaller compute pass-through line and feel sticker shock) helps frame the value.

Deposit Schedule by Project Size

Per Plutio's 2026 invoice payment terms guide, deposit norms scale with project size and client trust.

Project size | New client deposit | Established client deposit | Notes
Under $2,000 | 50% (with Due on Receipt or Net 7 terms) | 25% | Plutio explicit recommendation
$2,000-$10,000 | 50% with milestones | 25% with milestones | Plutio: 25-50% range with milestone billing
Over $10,000 | 25% with milestones | 25% with milestones | Plutio: 25% deposit recommended for large projects

For AI engineering specifically, the deposit also covers the cost of standing up your dev environment (provider accounts, eval infrastructure, monitoring tooling) before client revenue starts flowing. Without a deposit, you absorb the setup cost AND front the early compute spend on your card. Both add up fast on a non-trivial AI project.
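
The deposit schedule above reduces to a small lookup. The sketch below encodes the table as written. The function names are illustrative, and treating exactly $10,000 as part of the middle band is an assumption about the boundary.

```python
from decimal import Decimal

def deposit_rate(project_value: Decimal, established_client: bool) -> Decimal:
    """Deposit share of project value, following the schedule in the table."""
    # Established clients pay 25% at every size; so do new clients over $10,000.
    if established_client or project_value > Decimal("10000"):
        return Decimal("0.25")
    # New clients at $10,000 and under pay 50%.
    return Decimal("0.50")

def uses_milestones(project_value: Decimal) -> bool:
    """The table pairs deposits with milestone billing from $2,000 upward."""
    return project_value >= Decimal("2000")

def deposit_amount(project_value: Decimal, established_client: bool) -> Decimal:
    rate = deposit_rate(project_value, established_client)
    return (project_value * rate).quantize(Decimal("0.01"))

for value, established in [
    (Decimal("1500"), False),
    (Decimal("8000"), True),
    (Decimal("30000"), False),
]:
    label = "established" if established else "new"
    print(f"${value:,.0f} ({label} client): deposit ${deposit_amount(value, established):,.2f}")
```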

Late Fee + Kill Fee Clauses

Two clauses that should appear on every AI engineering invoice or attached MSA:

  1. Late fee. 1.5 percent per month on overdue balances per SolidGigs' 2026 freelance payment terms guide, or the maximum allowed by your local laws if that cap is lower. Activates on the day after the due date; compounds monthly. The clause exists less for revenue and more as a behavioral lever; clients pay your invoices first when there is a real cost to delay.
  2. Kill fee. 25 percent of remaining unpaid project value due within 14 days of termination per Plutio's 2026 invoice payment terms guide. The kill fee compensates for opportunity cost and ramp-up time when a client cancels mid-engagement. For AI projects this matters more than typical software work because AI engagements often involve provider accounts, eval infrastructure, and tooling investments that don't carry over to the next client.

The practical phrasing for the kill-fee clause: "If Client terminates this agreement after work has begun, a kill fee of 25 percent of the remaining unpaid project value is due within 14 calendar days of termination."
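
Both clauses are simple to compute once the inputs are agreed. The sketch below assumes the late fee compounds once per full month overdue (how partial months are counted is a contract decision, not something this code settles) and that the kill fee applies to whatever project value remains unpaid at termination. Function names and example figures are illustrative.

```python
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def late_fee(balance: Decimal, months_overdue: int,
             monthly_rate: Decimal = Decimal("0.015")) -> Decimal:
    """Accrued late fee on an overdue balance, compounding monthly."""
    if months_overdue <= 0:
        return Decimal("0.00")
    compounded = balance * (1 + monthly_rate) ** months_overdue
    return (compounded - balance).quantize(CENT, rounding=ROUND_HALF_UP)

def kill_fee(remaining_unpaid: Decimal, rate: Decimal = Decimal("0.25")) -> Decimal:
    """Kill fee as a share of the remaining unpaid project value."""
    return (remaining_unpaid * rate).quantize(CENT, rounding=ROUND_HALF_UP)

def kill_fee_due_date(terminated_on: date) -> date:
    """Kill fee falls due 14 calendar days after termination."""
    return terminated_on + timedelta(days=14)

# Example: a $3,467.00 invoice that is two full months overdue
print(f"Late fee: ${late_fee(Decimal('3467.00'), 2):,.2f}")

# Example: $20,100 of project value still unpaid at termination
print(f"Kill fee: ${kill_fee(Decimal('20100')):,.2f}")
print(f"Due by:   {kill_fee_due_date(date(2026, 3, 10)).isoformat()}")
```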

What This Means for Sending Your Next AI Engineer Invoice

Three takeaways for an AI engineer about to send the next invoice:

  1. Decompose pass-through into the components mature buyers expect. LLM inference, embedding, vector DB, GPU hours as separate lines. Mirror the FinOps cost-decomposition framework. Attach provider dashboards as backup.
  2. Tie milestone fees to acceptance criteria, not calendar dates. Calendar-date milestones invite disputes when work legitimately slips. Acceptance-criteria milestones (eval threshold met, retrieval relevance passed, production cutover live) are auditable and defensible.
  3. Build deposit + late fee + kill fee into the contract template once, then never argue about them again per project. The financial mechanisms that make the invoice enforceable belong in the MSA. The invoice references them; it doesn't relitigate them.
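
An acceptance-criteria milestone is auditable when the gate can be stated as a check against measured eval results. The following is a minimal sketch of such a check. The metric names and thresholds are hypothetical, and the real ones belong in the MSA section that defines the milestone.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class AcceptanceCriterion:
    metric: str       # name of the eval metric agreed in the contract
    threshold: float  # minimum score that counts as a pass

def unmet_criteria(criteria: List[AcceptanceCriterion],
                   measured: Dict[str, float]) -> List[AcceptanceCriterion]:
    """Return the criteria the measured results fail; a missing metric counts as a fail."""
    return [c for c in criteria
            if measured.get(c.metric, float("-inf")) < c.threshold]

def milestone_billable(criteria: List[AcceptanceCriterion],
                       measured: Dict[str, float]) -> bool:
    """A milestone is invoiceable only when every criterion is met."""
    return not unmet_criteria(criteria, measured)

# HYPOTHETICAL criteria for a retrieval-relevance milestone
milestone_2 = [
    AcceptanceCriterion("retrieval_relevance", 0.85),
    AcceptanceCriterion("citation_grounding", 0.90),
]

results = {"retrieval_relevance": 0.88, "citation_grounding": 0.87}
print(milestone_billable(milestone_2, results))
print([c.metric for c in unmet_criteria(milestone_2, results)])
```

Listing the unmet criteria, rather than returning only a pass or fail, gives both sides the same record of what still has to clear before the invoice goes out.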

The companion rate research that justifies the hourly numbers is in AI engineer freelance rates 2026. The proposal format that locks scope before this invoice goes out is in AI engineer proposal that wins. The general invoice fundamentals are in how to write a freelance invoice. The retainer-vs-hourly framing for related consulting work is in consultant invoice retainer hourly value. The web-development comparison invoice is in web developer invoice 2026. The deeper payment-terms playbook is in freelance payment terms and the late-paying-client playbook is in late-paying clients.

To send this invoice without rebuilding the line-item structure each time, use FreelanceDesk's invoice generator which preserves the compute pass-through structure as a saved template.

References

  1. Plutio: Invoice Payment Terms for Freelancers 2026
  2. LedgerUp: Net 15 Payment Terms
  3. SolidGigs: Freelance Payment Terms to Get Paid
  4. Finout: FinOps in the Age of AI — LLM Workflows, RAG, AI Agents, Agentic Systems
  5. OpenAI API Pricing
  6. Second Talent: Freelance ML Engineer Hourly Rate US 2026
  7. Second Talent: Freelance LLM Developer Hourly Rate US 2026
