
AI Project Estimation: Why It's Hard and How to Do It

By Bartosz K. — Published: 7 May 2026 — Updated: 15 May 2026 — 10 min read

Contents

Why AI Projects Are Hard to Estimate
Research vs. Engineering Uncertainty
The Data Risk
Scope Creep in AI Projects
Better Approaches to Estimation
The Discovery Phase
Milestone-Based Contracting
What to Budget For
Red Flags in AI Proposals

"How long will this take?" is the question every client asks and every AI practitioner dreads. Software projects are notoriously difficult to estimate. AI projects are substantially harder. The causes are structural — built into the nature of the work itself — not simply a result of inexperienced teams or poor planning. Understanding why this is the case helps both clients and practitioners build more realistic expectations and more resilient project structures.

Why AI Projects Are Hard to Estimate

Traditional software projects involve building specified functionality. The requirements may be incomplete or changing, but at each stage the team knows what it needs to build. The estimation challenge is primarily about understanding scope and translating it into a work breakdown.

AI projects are fundamentally different because a core part of the work — the modelling — is exploratory. You do not know in advance whether the approach will work, how much data you will need, or what performance level is achievable until you have tried it. The project contains an irreducible research component, and research does not estimate well.

There are several distinct sources of uncertainty in AI projects:

Technical feasibility uncertainty — will the approach work at all? Will it reach the required performance threshold?
Data uncertainty — is the available data sufficient? Is it clean enough? What labelling is required and how much will it cost?
Integration uncertainty — how hard will it be to connect the AI system to existing infrastructure and workflows?
Requirement uncertainty — what does "good enough" actually mean? This is often harder to define for AI outputs than for traditional software features.

Research vs. Engineering Uncertainty

A useful distinction is between the research phase and the engineering phase of an AI project.

The research phase involves experimenting with approaches, validating that the problem is solvable with available data, and identifying an approach that meets the performance requirements. This phase is inherently unpredictable. You may solve it in a week. You may discover after a month that the data is insufficient and the approach needs to change. You may find that the originally specified performance target is not achievable and a negotiation is required.

The engineering phase, which begins once a working approach is validated, is more predictable. Building pipelines, deploying models, and integrating with applications involve engineering challenges much closer to those of traditional software. These can be estimated with reasonable confidence once the research phase has converged.

The key implication: time-boxing the research phase and staging the project is more realistic than attempting to estimate the full project upfront.

The Data Risk

Nothing delays an AI project more reliably than data problems. Data issues come in several categories:

Insufficient data. There is simply not enough labelled training data to train a model that meets the performance requirement. Collecting more data takes time and money; data augmentation and transfer learning can help but have limits.

Poor data quality. The data exists but is incorrect, inconsistent, or missing in critical fields. Discovering data quality issues mid-project is extremely common, and the remedy (cleaning, re-labelling, quarantining bad records) usually takes longer than anticipated.

Data access delays. The data exists but cannot be accessed quickly. Data governance, privacy reviews, technical extraction work, and stakeholder approvals can add weeks or months to a project timeline.

Label acquisition. Supervised learning requires labelled examples. For specialised domains (medical images, legal documents, industrial defects), labelling requires domain experts and is expensive and slow.

A data risk assessment should be one of the first activities in any AI project. The questions to answer: what data exists, where is it, can we access it, is it labelled, is it sufficient, and is it representative of the production scenario?
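A first pass at these questions can be automated. The sketch below is a minimal illustration, not a full audit: it assumes records arrive as a list of dictionaries, and the names audit_records, image_path, and label are hypothetical. It reports volume, label coverage, missing values, and class balance, which covers the "is it labelled" and "is it sufficient" questions. Access and representativeness still need human judgement.

```python
from collections import Counter


def audit_records(records, label_field, required_fields):
    """Summarise volume, label coverage, missing values and class balance."""
    total = len(records)
    # A record counts as labelled only if the label is present and non-empty.
    labelled = [r for r in records if r.get(label_field) not in (None, "")]
    missing = {
        field: sum(1 for r in records if r.get(field) in (None, ""))
        for field in required_fields
    }
    class_counts = Counter(r[label_field] for r in labelled)
    return {
        "total_records": total,
        "labelled_fraction": len(labelled) / total if total else 0.0,
        "missing_by_field": missing,
        "class_counts": dict(class_counts),
    }


# Hypothetical sample: one record lacks an image path, one lacks a label.
sample = [
    {"image_path": "a.png", "label": "defect"},
    {"image_path": "b.png", "label": "ok"},
    {"image_path": None, "label": "ok"},
    {"image_path": "d.png", "label": None},
]
report = audit_records(sample, label_field="label", required_fields=["image_path"])
```

Running a check like this on a sample of the real data in the first week tends to surface labelling gaps and class imbalance before they affect the timeline.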

Scope Creep in AI Projects

AI projects are particularly vulnerable to a specific kind of scope creep: performance target inflation. A project scoped at "detect defects with 90% accuracy" gets approved. The team achieves 89%. The business decides 90% is no longer acceptable; they want 95%. This is not a trivial increment — each percentage point at high accuracy typically requires disproportionately more data, compute, and engineering effort.
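One way to make the size of that jump concrete is to look at the error rate instead of the accuracy. The sketch below uses a hypothetical helper, error_reduction_factor, to show how many times smaller the error rate must become. It is an illustration of the arithmetic only and does not predict effort.

```python
def error_reduction_factor(current_accuracy, target_accuracy):
    """How many times smaller the error rate must become to hit the target."""
    if not 0.0 <= current_accuracy < target_accuracy < 1.0:
        raise ValueError("expected 0 <= current < target < 1")
    return (1.0 - current_accuracy) / (1.0 - target_accuracy)


# Moving the target from 90% to 95% means the error rate falls from 10% to 5%.
factor_from_90 = error_reduction_factor(0.90, 0.95)  # about 2
# Starting from the 89% actually achieved, the error falls from 11% to 5%.
factor_from_89 = error_reduction_factor(0.89, 0.95)  # about 2.2
```

Framed this way, "five more points" reads as "remove more than half of the remaining errors", which is a clearer basis for a scope-change conversation.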

Similarly, "while you're at it" additions accumulate. The system designed to classify defects gets asked to also estimate severity, localise the defect, and generate a report. Each addition is individually reasonable; collectively they represent a project several times the original scope.

The defence is to agree on performance targets and success criteria in writing before the project begins, and to treat changes to these targets as formal scope changes with timeline and budget implications.
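Written criteria are easier to enforce when they are also machine-checkable. The following is a minimal sketch, assuming accuracy and latency are the agreed metrics. The class name SuccessCriteria and the 200 ms latency ceiling are hypothetical; the 90% figure comes from the defect-detection example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    """Agreed targets; frozen so they cannot be changed silently in code."""
    min_accuracy: float
    max_latency_ms: float

    def unmet(self, accuracy, latency_ms):
        """Return a description of each criterion the measured results fail."""
        failures = []
        if accuracy < self.min_accuracy:
            failures.append(f"accuracy {accuracy:.2%} < {self.min_accuracy:.2%}")
        if latency_ms > self.max_latency_ms:
            failures.append(f"latency {latency_ms:.0f} ms > {self.max_latency_ms:.0f} ms")
        return failures


# Hypothetical contract: 90% accuracy, responses within 200 ms.
criteria = SuccessCriteria(min_accuracy=0.90, max_latency_ms=200.0)
shortfalls = criteria.unmet(accuracy=0.89, latency_ms=150.0)
```

Keeping the thresholds in one versioned object means a change to the target shows up as an explicit diff, which fits treating it as a formal scope change.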

Better Approaches to Estimation

Given the inherent uncertainty, how should AI projects be estimated and managed?

Range estimates, not point estimates. "This will take 3 months" is almost certainly wrong. "This will take 2–5 months depending on data quality and whether our initial approach converges" is honest. Present ranges and explain what determines which end of the range is more likely.

Phase-gated projects. Structure the project as a sequence of phases with go/no-go decision points. The first phase (discovery and feasibility) produces enough information to estimate the next phase accurately. This is more honest than estimating the entire project before the research has started.

Pre-commit to learning, not output. In the research phase, the deliverable is not a working model — it is a clear understanding of what is feasible and what it will take to build it. This reframes the research phase as valuable regardless of whether the answer is "we can build this" or "here is why this will not work as specified".

Time-box experiments. Allocate a fixed time (one or two weeks) to explore a specific technical approach. If it has not shown enough promise by the end of the time-box, pivot rather than extend indefinitely.
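Range estimates can be produced systematically instead of by gut feel. The sketch below is one possible approach, not a prescribed method: it assumes each phase can be described by an optimistic, most likely, and pessimistic duration, samples each from a triangular distribution, and reports the 10th, 50th, and 90th percentiles of the total. The phase figures and the function name simulate_duration are hypothetical.

```python
import random


def simulate_duration(phases, trials=10_000, seed=0):
    """Sample total duration; each phase is (low, likely, high) in weeks."""
    rng = random.Random(seed)  # seeded so repeated runs give the same range
    totals = sorted(
        sum(rng.triangular(low, high, likely) for low, likely, high in phases)
        for _ in range(trials)
    )

    def percentile(p):
        return totals[min(trials - 1, int(p * trials))]

    return percentile(0.10), percentile(0.50), percentile(0.90)


# Hypothetical phases: (optimistic, most likely, pessimistic) in weeks.
phases = [
    (2, 3, 4),   # discovery
    (3, 6, 14),  # research and modelling, deliberately wide
    (4, 6, 9),   # engineering and integration
]
p10, p50, p90 = simulate_duration(phases)
print(f"P10 {p10:.1f} / P50 {p50:.1f} / P90 {p90:.1f} weeks")
```

The wide spread on the research phase is what drives the gap between the low and high ends of the result, which is exactly the explanation a client should hear alongside the range.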

The Discovery Phase

A well-run discovery phase — typically 2–4 weeks — dramatically reduces uncertainty for the rest of the project. It should produce:

A clear problem definition with measurable success criteria
A data audit: what data exists, its quality, volume, and accessibility
A technical feasibility assessment: what approaches are candidates, what performance is realistically achievable given the data
A preliminary architecture design
A phased project plan with ranges and explicit assumptions
Identified risks and mitigation strategies

The discovery phase investment (typically a few person-weeks) pays back many times over in avoided surprises during the main project.
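Part of the feasibility assessment is checking that a candidate model beats a trivial baseline on representative data. The sketch below assumes a classification task and uses a majority-class baseline, which is one common choice and not the only one. The function names and the sample labels are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def beats_baseline(predictions, labels, margin=0.0):
    """True if the model exceeds the majority baseline by more than margin."""
    return accuracy(predictions, labels) > majority_baseline_accuracy(labels) + margin


# Hypothetical imbalanced data: 8 of 10 parts are fine.
labels = ["ok"] * 8 + ["defect"] * 2
always_ok = ["ok"] * 10  # a "model" that never flags a defect
```

On imbalanced data like this, a model that never flags a defect already scores 80% accuracy, so a raw accuracy figure says little until it is compared with the baseline.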

Milestone-Based Contracting

Fixed-price contracts for full AI projects are problematic because the contractor bears all the risk of the research uncertainty. Time-and-materials contracts put all the risk on the client. A better structure is milestone-based:

Milestone 1: Feasibility demonstrated (model beats baseline on representative data)
Milestone 2: Performance target achieved on test set
Milestone 3: System integrated and running in staging
Milestone 4: Production deployment with monitoring in place

Each milestone is associated with a budget and a go/no-go decision. The client can exit at any milestone if the project is not delivering expected value. The contractor is incentivised to reach milestones rather than bill hours.
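The client's exposure under this structure is easy to compute. The sketch below assumes each milestone carries a fixed budget and shows the cumulative spend if the client exits after a given milestone. The budget figures and the names Milestone and cumulative_exposure are hypothetical.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class Milestone:
    name: str
    budget: int  # hypothetical figures, in whatever currency the contract uses


def cumulative_exposure(milestones):
    """Total spend committed if the client exits right after each milestone."""
    totals = accumulate(m.budget for m in milestones)
    return [(m.name, total) for m, total in zip(milestones, totals)]


plan = [
    Milestone("Feasibility demonstrated", 20_000),
    Milestone("Performance target achieved", 40_000),
    Milestone("Integrated in staging", 30_000),
    Milestone("Production deployment", 10_000),
]
exposure = cumulative_exposure(plan)
```

With these illustrative numbers, a no-go decision after the feasibility milestone limits the client's spend to the first milestone's budget, a fraction of the full project cost.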

What to Budget For

Many AI project budgets focus on model development and underestimate everything else. A realistic budget for a production AI project typically needs a line for each of the following:

Data collection and labelling — often 20–40% of total effort, frequently underestimated
Model development and evaluation — the part most people budget for
Data pipeline engineering — building production-grade data ingestion and processing
Integration with existing systems — APIs, databases, UIs, authentication
Serving infrastructure — deployment, scaling, monitoring setup
Testing and QA — both technical quality and business acceptance
Ongoing operations — monitoring, retraining, incident response post-launch
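Turning the categories above into a draft budget is simple arithmetic. In the sketch below, only the 20–40% range for data work comes from the list above; every other share is a hypothetical placeholder to be replaced with project-specific figures, as is the total of 200,000.

```python
def allocate_budget(total, shares_percent):
    """Split a total budget by integer percentage shares that must sum to 100."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    return {item: total * pct // 100 for item, pct in shares_percent.items()}


# Hypothetical shares; data work sits in the middle of the 20-40% range.
shares = {
    "data collection and labelling": 30,
    "model development and evaluation": 20,
    "data pipeline engineering": 15,
    "integration with existing systems": 10,
    "serving infrastructure": 10,
    "testing and QA": 10,
    "ongoing operations": 5,
}
budget = allocate_budget(200_000, shares)
```

The check that shares sum to 100 is deliberate: it forces every category to be given an explicit number, so none is left out by omission.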

Red Flags in AI Proposals

When evaluating an AI project proposal from a vendor or internal team, watch for:

No discovery phase proposed. Any vendor who quotes a fixed price for a full AI project without first understanding your data is guessing.
No definition of done. If the proposal does not specify what accuracy, latency, or other metric defines success, there is no objective way to know if the project has succeeded.
No mention of monitoring or retraining. A vendor who is not planning for how the model will be maintained after deployment is not thinking about production.
Point estimates with no ranges. A precise estimate for research-heavy work, given without a range, suggests either overconfidence or optimism bias.
No discussion of data quality. If the proposal does not address how data quality will be assessed and what happens if it is insufficient, this is a serious omission.

Key Takeaways

AI projects are hard to estimate because they contain irreducible research uncertainty — this is structural, not a planning failure.
The research phase and engineering phase should be estimated separately; the engineering phase can be estimated with reasonable confidence only once the research phase has converged.
Data risk is the most common source of delay — assess data quality and accessibility as the first project activity.
Use phase-gated projects with milestone payments; avoid fixed-price contracts for the full project scope.
Budget explicitly for data labelling, integration, serving infrastructure, and ongoing operations — not just model development.