How to Choose an AI Software Development Partner
A checklist you can use before signing anything
A practical checklist for choosing an AI software development partner: the technical signals, the business signals, and the questions that separate real engineering teams from demo polishers.
The stakes
Picking an AI software development partner is not like picking a generic dev shop. The failure modes are different. A web-app build that's 30% wrong is annoying. An AI build that's 30% wrong is a confidently incorrect system making decisions your business will trust without thinking.
The difference between a partner who ships production AI and a partner who ships an impressive demo is largely invisible in a sales cycle. This post is the checklist we'd want a prospect to use on us — honest, specific, and quick enough to run in an afternoon.
Technical signals that matter
When you're vetting a partner, look past the marketing and ask to see:
Real production systems. Not pre-recorded demos, not dashboards with fake numbers — actual apps running with actual users. If they have NDAs, that's fair, but they should be able to talk specifically about architecture, traffic, cost per request, and incidents they've handled.
Their take on evaluation. If they can't describe how they measure whether an AI feature is working, you will find out the hard way that it isn't. The good answer involves a golden set, automated runs on every change, and dashboards the team actually watches. The bad answer is "we tested it manually."
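To make the "golden set plus automated runs" answer concrete, here is a minimal sketch of what such an eval looks like. Everything in it is illustrative: `call_model` stands in for whatever LLM call the feature makes, and the cases and 90% threshold are placeholders, not a recommended bar.

```python
# Minimal golden-set evaluation sketch. `call_model` is a stand-in for
# the real model call; the cases and threshold are illustrative only.

def call_model(prompt: str) -> str:
    # Placeholder: in a real pipeline this would hit your model endpoint.
    return "refund approved" if "refund" in prompt.lower() else "escalate"

GOLDEN_SET = [
    {"input": "Customer asks for a refund on order 1042", "expected": "refund approved"},
    {"input": "Customer reports a data breach", "expected": "escalate"},
]

def run_eval(threshold: float = 0.9) -> float:
    passed = sum(
        1 for case in GOLDEN_SET
        if call_model(case["input"]) == case["expected"]
    )
    score = passed / len(GOLDEN_SET)
    if score < threshold:
        # Fail the build so a regression never ships silently.
        raise SystemExit(f"Eval regression: {score:.0%} < {threshold:.0%}")
    return score
```

The point of the pattern is that `run_eval` runs in CI on every change, so "did we break the AI feature?" is answered by a failing build, not a customer complaint.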
Observability and cost controls. Ask how they detect a regression in production, how they track per-user and per-feature cost, and what happens when a model is slow or returning garbage. "We'd add logging" is not a plan; "we use this stack, here's a sanitised screenshot" is.
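A partner with a real answer here can show you something like the following shape, however it's actually implemented in their stack. This is a toy sketch under stated assumptions: the price constant, token counts, and latency threshold are made up, and a production version would use the provider's real usage metadata and a proper metrics backend.

```python
from collections import defaultdict

# Illustrative per-user / per-feature cost tracker. The price and the
# latency threshold are assumptions, not real provider numbers.
PRICE_PER_1K_TOKENS = 0.01

cost_by_key = defaultdict(float)

def record_call(user: str, feature: str, tokens_used: int, latency_s: float,
                slow_threshold_s: float = 5.0) -> None:
    # Accumulate spend per (user, feature) so cost questions have answers.
    cost = tokens_used / 1000 * PRICE_PER_1K_TOKENS
    cost_by_key[(user, feature)] += cost
    if latency_s > slow_threshold_s:
        # In production this would page or emit a metric, not print.
        print(f"ALERT slow call: {feature} for {user} took {latency_s:.1f}s")

record_call("user-42", "summarise", tokens_used=1500, latency_s=0.8)
record_call("user-42", "summarise", tokens_used=500, latency_s=0.3)
```

The specific tooling matters less than the fact that per-feature cost and latency are recorded on every call, so "which feature is burning money?" takes a query, not an investigation.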
Comfort with model-provider neutrality. A serious partner will have shipped across OpenAI, Anthropic, Azure, Bedrock, and often a self-hosted open-source model. They'll have opinions about which fits which use case. A partner who only knows one provider is a risk.
Retrieval and RAG maturity. Ask how they chunk documents, handle OCR, combine vector and keyword search, and evaluate retrieval relevance. If they say "we just use OpenAI's file upload," they haven't shipped anything non-trivial.
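"Combine vector and keyword search" usually means blending two scores per document. The sketch below is a deliberately tiny illustration: the documents, two-dimensional embeddings, and the 0.7/0.3 weights are all made up, and a real system would use an embedding model and BM25 rather than raw word overlap.

```python
import math

# Toy hybrid-retrieval scorer: blends vector similarity with keyword
# overlap. Vectors, documents, and weights here are illustrative only.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_overlap(query: str, text: str) -> float:
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(query, query_vec, doc, alpha=0.7):
    # alpha weights semantic similarity against exact-term matching.
    return alpha * cosine(query_vec, doc["vec"]) + (1 - alpha) * keyword_overlap(query, doc["text"])

docs = [
    {"text": "invoice payment terms for enterprise contracts", "vec": [0.9, 0.1]},
    {"text": "office holiday party schedule", "vec": [0.1, 0.9]},
]
query = "enterprise invoice terms"
query_vec = [1.0, 0.0]  # pretend embedding of the query

ranked = sorted(docs, key=lambda d: hybrid_score(query, query_vec, d), reverse=True)
```

A partner who has shipped retrieval can explain why they chose their blend weights, how they chunked the source documents, and what their relevance evals showed; one who hasn't will wave at a vendor's file-upload API.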
Agentic systems in production. Not "we've built an agent in a notebook" — "we run agents that take actions for real users, with these guardrails, at this volume." Agentic workflows are where most of the business value sits, and where the most teams have been burned.
Business signals that matter
Technical skill is necessary but not sufficient. You also need a team that works the way your organisation works.
They push back. A partner who agrees with every spec you send is a partner who will happily build the wrong thing. You want someone who reads your brief and comes back with "we'd scope this smaller, here's why" or "this constraint doesn't make sense, can we revisit?"
Clear pricing and scope. Fixed-price for well-defined phases, time-and-materials for exploratory work, and an explicit statement of what "done" means. Anything vaguer is a budget overrun waiting to happen.
Honest about what they don't do. Every team has gaps — too small to run a 24/7 on-call, no mobile team, no designer on staff. Partners who pretend to do everything are less reliable than partners who name their boundaries.
Client references you can actually call. Not logos on a website — people. A 20-minute call with a past client will teach you more than ten proposals.
Communication cadence. Ask how they run projects: weekly demos, written updates, shared task boards, Slack access. If the pitch is "we'll send you an invoice every month," that's not a partnership, it's a black box.
Red flags
A short list of things that have, in our experience, correlated with projects going wrong:
- Vendor lock-in by default. Code they host, models they host, no access to your own vectors or logs. This is a power play, not a service.
- "Just trust the model." Teams that don't want to talk about failure cases are teams that haven't hit them yet.
- AI buzzword bingo. If every sentence has "autonomous," "transformative," and "revolutionary" without a single concrete number, you're looking at a demo shop.
- No senior engineer in the conversation. If the sales team can't bring a hands-on engineer to a technical call, that's who you'll actually be working with.
- Unwillingness to do a paid pilot. A good partner will scope a small, fixed-price piece before asking for a multi-quarter commitment.
Good questions to ask before signing
Copy these into the next vendor call:
- "Show me the repo structure of a production AI app you've shipped."
- "What does your evaluation pipeline look like? Can I see a sanitised eval report?"
- "What's the biggest incident you've had in an AI system, and what did you change?"
- "Walk me through how you'd handle our data and access control, step by step."
- "What would you cut from our scope if the budget were halved?"
- "Who will be doing the actual work? Can I meet them?"
- "What's your handoff plan when the engagement ends?"
The answers tell you more than any deck.
The right first engagement
Assume you find three candidates that look strong. The best way to compare them is a small, paid pilot — not a bake-off and not a long proposal cycle.
A good pilot is:
- One workflow. A real one, with real stakeholders, not a toy.
- Four to eight weeks. Long enough to produce something usable, short enough that failure is affordable.
- Success criteria on paper. "The agent handles 60% of billing emails end-to-end with zero incorrect auto-sends." Not "an AI thing that's cool."
- Clean handoff. At the end, you own the code, the docs, the eval set, and the deployment. Even if you continue with the partner, you could walk away.
We run engagements exactly like this, and we encourage clients to run the pilot with at least one other shop in parallel if they're uncertain. The comparison is cheaper than a bad multi-year commitment.
When to build in-house instead
Sometimes a partner is the wrong answer. Build the team yourself if:
- AI is your product, not a feature — your company's moat depends on getting better at it than competitors.
- You can credibly hire senior AI engineers (hard in 2025-26, but possible in some markets).
- Your timeline can absorb 6–12 months of team assembly before shipping.
- You have the discipline to run an in-house ML ops practice long term.
If most of those aren't true, a partner gets you to production faster, and you can bring the work in-house later once the pattern is proven.
Where to go from here
Choosing well is 80% of the project's outcome. Spend a week doing real diligence — it's a tiny investment next to the cost of a failed engagement.
If you'd like us to be one of the partners you evaluate, we'd welcome the chance. Start with our AI software development service to get a feel for how we scope work, or get in touch with the problem you want to solve.