AI Due Diligence Checklist for M&A

The $2.3M AI Acquisition That Became a Write-Off

Every AI acquisition needs a comprehensive AI due diligence checklist. Without one, you risk buying technical debt disguised as innovation. Here's why:

In 2024, a UK mid-market financial services firm acquired a "cutting-edge AI analytics platform" for $2.3M. The vendor had impressive demos, blue-chip logos on their site, and revenue projections showing 10x growth.

Six months post-acquisition, the acquirer discovered:

The "AI model" was a wrapper around OpenAI's API with no proprietary technology
The codebase had zero tests, no documentation, and was held together with duct tape
Key data pipelines broke weekly, requiring constant firefighting
The entire system relied on a single developer who left after acquisition
Cloud costs were 4x projections due to inefficient architecture
GDPR compliance documentation was entirely fabricated

Total write-off. The acquirer spent another $800k trying to salvage something usable, then shut it down.

This wasn't bad luck. It was bad due diligence.

Most mid-market buyers approach AI acquisitions with the same checklist they'd use for traditional SaaS companies. Revenue multiples, customer retention, team size. But AI systems hide technical debt and operational risk that won't show up in financial statements until it's too late.

This guide gives you a practical AI due diligence framework — the specific questions to ask, technical assessments to run, and red flags that predict disaster. Whether you're evaluating an acquisition target, vetting an AI vendor, or assessing a technology partner, this checklist separates real AI capabilities from expensive snake oil.

Why AI Due Diligence Is Different from Traditional Tech Assessment

Traditional software due diligence focuses on code quality, scalability, and infrastructure. AI systems have all those concerns plus three additional failure modes:

1. Model performance degrades silently

Unlike traditional software that either works or throws errors, AI models can produce increasingly bad outputs without triggering alerts. A recommendation engine slowly drifting toward irrelevance. A fraud detection system with rising false positives. Degradation happens gradually until someone notices revenue is down or customer complaints spike.

2. Data infrastructure is invisible on balance sheets

The value of an AI company isn't just the model; it's also the data pipelines, labelling processes, QA workflows, and retraining infrastructure. A company might have a sophisticated model but manually label training data in spreadsheets. That's not a scalable business; it's a consulting firm pretending to be a product.

3. Compliance risk is higher and harder to audit

AI systems often process personal data, make consequential decisions, and operate in regulatory grey areas. GDPR Article 22 (automated decision-making), AI Act requirements, sector-specific rules — the compliance surface area is massive. Companies often have impressive documentation that doesn't reflect what the system actually does. For comprehensive AI code review and security assessment, Phoenix Shield provides M&A technical due diligence services.

For mid-market buyers, these hidden risks turn acquisitions into liabilities. You think you're buying AI capability. You're actually buying technical debt, operational fragility, and regulatory exposure.

The solution: due diligence that goes deeper than demos and deck reviews.

5 Core Areas to Evaluate in AI Due Diligence

1. Codebase Quality & Maintainability

What you're assessing: Can this system be maintained, extended, and debugged by someone other than the original developer?

Most AI startups prioritize shipping fast over building maintainable systems. That's fine for a prototype. It's a disaster for an acquisition where you need to integrate, scale, and support production systems.

What to look for:

Version control hygiene: Are they using Git properly? Check commit history for meaningful messages (not "fixed stuff" or "test123"). Look for feature branches, not cowboy commits to main. If there's no branching strategy, expect chaos.
Testing infrastructure: What percentage of code has automated tests? Look for unit tests (functions work correctly), integration tests (systems work together), and end-to-end tests (user workflows work). No tests = every change risks breaking production.
Documentation quality: Can a new developer understand the system in a week? Look for architecture docs, API documentation, runbooks for common issues, and decision logs explaining why things were built this way. If knowledge lives in one person's head, you're buying a single point of failure.
Dependency management: How many external libraries are they using? Are dependencies pinned to specific versions or using wildcards that could break on updates? Check for deprecated libraries or unmaintained packages. Each dependency is a potential security vulnerability or maintenance burden.
Code complexity: Use static analysis tools to measure cyclomatic complexity (how many paths through the code), code duplication, and function length. High complexity means bugs hide easily and changes take longer.
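The dependency-pinning check above is easy to automate for a first pass. Here is a minimal sketch, assuming the target is a Python project with a pip-style requirements file; the function name and sample contents are hypothetical. It treats anything other than a plain `name==version` line as unpinned, so it will also flag lines with environment markers or hashes for manual review.

```python
import re

# A requirement counts as pinned only if it names one exact version,
# e.g. "numpy==1.26.4". Wildcards such as "1.4.*" are treated as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+\s*==\s*[^*\s]+$")


def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


# Hypothetical file contents for illustration.
sample = """
numpy==1.26.4
pandas>=2.0
requests
scikit-learn==1.4.*
"""
print(unpinned_requirements(sample))
# ['pandas>=2.0', 'requests', 'scikit-learn==1.4.*']
```

A long list here isn't a deal-breaker on its own, but it tells you how much the build depends on luck.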

Red flags:

Git history shows a single author for 90%+ of commits (bus factor of 1)
Last significant refactoring was over a year ago (accumulating technical debt)
Comments in code apologizing for hacks or saying "TODO: fix this properly"
Different parts of the system use different languages/frameworks for no clear reason
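You can check the single-author red flag yourself once you have repository access. The sketch below assumes you have already exported one author name per commit (for example with `git log --format=%an`) into a list; the function name and sample data are hypothetical.

```python
from collections import Counter


def top_author_share(authors: list[str]) -> tuple[str, float]:
    """Return the most frequent commit author and their share of all commits."""
    if not authors:
        raise ValueError("no commits supplied")
    name, commits = Counter(authors).most_common(1)[0]
    return name, commits / len(authors)


# Hypothetical history: 46 commits by one person, 4 by everyone else.
history = ["alice"] * 46 + ["bob"] * 3 + ["carol"]
name, share = top_author_share(history)
print(f"{name} wrote {share:.0%} of commits")  # alice wrote 92% of commits
```

Commit counts are a rough proxy. Someone who squashes their work into a few large commits will be under-counted, so treat a high share as a prompt for questions rather than a verdict.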

Questions to ask:

"Walk me through your testing strategy. What's your test coverage percentage?"
"If a critical bug hit production at 3am, what's the runbook? Who can fix it?"
"Show me your most complex module. Why is it built this way?"
"What technical debt are you aware of and planning to address?"

Deep dive: Request a code review session with their senior engineer. Ask them to explain a recent complex change. Watch for whether they can navigate the codebase confidently or need to grep around searching for things.

For organisations evaluating whether to bring in external expertise for this assessment, automated code quality analysis tools specifically designed for AI systems can significantly accelerate the process.

2. Data Infrastructure & Pipelines

What you're assessing: How do they collect, clean, store, and use data? Is this a scalable system or manual spreadsheet wrangling?

Data is the moat for AI companies. But "we have proprietary data" means nothing if that data lives in someone's laptop, gets updated manually, or has no quality control.

What to look for:

Data sources & collection: Where does training data come from? Is it scraped from the web (legal risk), user-generated (privacy risk), purchased from vendors (license terms?), or internally generated? How much manual intervention is required to collect new data?
Data quality processes: How do they handle missing values, outliers, duplicates, and errors? Is there automated validation or does someone eyeball CSV files? Check for data profiling (summary statistics over time), anomaly detection (catch unexpected changes), and versioning (can they reproduce old results?).
Labelling & annotation workflows: If the model requires labelled data, how is labelling done? In-house team? Crowdsourcing platform? Subject matter experts? What's the inter-rater agreement (how often do independent labellers assign the same label to the same item)? How do they handle disagreements?
Pipeline monitoring: Do they track data drift (distribution of inputs changing over time), schema changes (new fields appearing, old ones disappearing), or latency (how long data takes to flow through the system)? No monitoring means failures are discovered by customers, not engineering.
Data storage & access: Where is data stored? Who has access? How is sensitive data protected? Check for encryption at rest, access logs, and data retention policies. Many AI companies have frighteningly open data access with no audit trail.
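If the target can't quote an inter-rater agreement figure, you can compute one from a sample that two labellers annotated independently. The sketch below uses Cohen's kappa, one common agreement statistic for two raters; the source doesn't say which statistic the vendor should use, so treat that choice as an assumption.

```python
from collections import Counter


def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labelled at random with their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Two hypothetical labellers annotating the same four items.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

A kappa near 1 means labellers agree far more than chance would predict; a value near 0 means the labels are close to noise, whatever the headline model accuracy says.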

Red flags:

Training data refresh cycle measured in months (model can't adapt to changing conditions)
Data scientists manually downloading CSVs and uploading processed files (no pipeline automation)
No process for removing data if a customer requests deletion (GDPR violation waiting to happen)
"We have a data lake" but no data catalog or schema documentation (swamp, not lake)

Questions to ask:

"Show me your data pipeline from raw input to model training. What's automated vs manual?"
"How do you ensure data quality? Walk me through your validation process."
"What happens if a data source becomes unavailable or changes format?"
"How long does it take to retrain the model with fresh data?"

Deep dive: Ask for a data lineage diagram showing how data flows through their system. If they can't produce one, they don't understand their own infrastructure.
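Data drift, mentioned under pipeline monitoring above, can be spot-checked on a single numeric feature if the target shares a training-time sample and a recent production sample. The sketch below uses the Population Stability Index (PSI), one of several drift measures; the choice of PSI, the bin count, and the floor applied to empty bins are assumptions made for illustration.

```python
import math


def population_stability_index(reference: list[float], current: list[float],
                               bins: int = 10) -> float:
    """PSI between a reference sample and a current sample of one numeric feature."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins at a tiny value so the logarithm is defined.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical samples: production values have shifted upwards.
training = [i / 100 for i in range(100)]
production = [0.5 + i / 200 for i in range(100)]
print(population_stability_index(training, training))    # 0.0
print(population_stability_index(training, production))  # clearly above zero
```

A commonly quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but thresholds vary by domain. What matters for due diligence is whether the target runs any check like this routinely, not the exact number.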

3. Model Performance & Validation

What you're assessing: Does the model actually work? How do you know? What happens when it's wrong?

Impressive demo accuracy means nothing if it's measured on cherry-picked test data or doesn't reflect real-world conditions. Many AI vendors optimize for demo wow factor over production reliability.

What to look for:

Performance metrics: What metrics do they track? Expect accuracy, precision, recall, and F1 score for classification; MAE, RMSE, and R² for regression; NDCG and MAP for ranking. But more importantly: do they track business metrics? A model with 95% accuracy that causes customers to churn is a bad model.
Train/test/validation split: How do they prevent overfitting? Look for proper separation between training data (used to build the model), validation data (used to tune it), and test data (used to evaluate it). If they use the same data for all three, the numbers are meaningless.
Real-world performance tracking: What's the gap between offline metrics (lab testing) and online metrics (production)? A 2% gap is normal. A 20% gap means the model doesn't work in the wild.
Edge case handling: What happens with inputs outside the training distribution? Do they fail gracefully or produce nonsense? Test with adversarial examples, unusual inputs, and edge cases the model hasn't seen.
Model versioning & rollback: Can they deploy a new model and roll back if it performs worse? Do they do A/B testing or canary deployments? Or do they YOLO push to production and hope?
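One quick test of the train/validation/test separation described above is to look for records that appear in more than one split. The sketch below assumes each record has a stable identifier you can export per split; the function name and IDs are hypothetical. It only catches exact duplicates, not subtler leakage such as the same customer appearing under different record IDs.

```python
def split_overlap(train_ids, validation_ids, test_ids) -> dict[str, set]:
    """Return the record IDs shared between each pair of data splits."""
    train, validation, test = set(train_ids), set(validation_ids), set(test_ids)
    return {
        "train_validation": train & validation,
        "train_test": train & test,
        "validation_test": validation & test,
    }


# Hypothetical IDs: record 3 and record 1 each appear in two splits.
overlaps = split_overlap([1, 2, 3], [3, 4], [5, 1])
for pair, shared in overlaps.items():
    print(pair, sorted(shared))
```

Any non-empty overlap with the test set means the reported test metrics are optimistic.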

Red flags:

They only quote accuracy without precision/recall (hiding class imbalance issues)
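Test set performance is implausibly high, or matches training performance almost exactly (possible data leakage between the splits); a large gap in the other direction, with training far ahead of test, signals overfitting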
No baseline comparison (how much better is the AI than simple rules or human performance?)
They can't explain when the model fails or what inputs cause problems
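The first red flag is easiest to see with numbers. The sketch below computes accuracy, precision, recall, and F1 from raw predictions in plain Python; the fraud scenario is hypothetical. A "model" that never flags fraud scores 99% accuracy while catching nothing.

```python
def classification_report(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict:
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical data: 10 fraud cases in 1,000 transactions, model predicts "not fraud" every time.
actual = [1] * 10 + [0] * 990
always_negative = [0] * 1000
print(classification_report(actual, always_negative))
# accuracy is 0.99, while precision, recall and f1 are all 0.0
```

The same always-negative predictor doubles as a baseline: if the vendor's model doesn't beat it on recall and precision, the accuracy figure is telling you about the class balance, not the model.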

Questions to ask:

"What's your model's performance on production data vs your test set?"
"Show me examples where the model fails. Why does it fail in those cases?"
"How often do you retrain? What triggers a retrain?"
"If I send you adversarial inputs, will your model break?"

Deep dive: Request a week of production predictions with ground truth labels (if available). Calculate metrics yourself. Don't trust their reported numbers.

4. Team Capabilities & Knowledge Transfer

What you're assessing: Can you operate this system after acquisition? Or are you buying a person pretending to be a product?

The nightmare scenario: acquire an AI company, the key engineer leaves, and you discover the entire system was black magic that only they understood. Six months later you're rebuilding from scratch.

What to look for:

Team composition: How many people are researchers vs engineers vs DevOps vs data labellers? A team of 5 PhDs and no engineers can build impressive models but not production systems. A team of engineers with no domain experts can build robust systems that solve the wrong problem.
Knowledge distribution: Ask three different team members to explain how the system works. Do they give consistent answers? Or does everyone have a different mental model? Inconsistency means knowledge lives in silos.
Retention risk: Who are the linchpins? If one person left tomorrow, what breaks? Check LinkedIn for tenure (team that's been together for years vs recent hires), Glassdoor for culture red flags, and vesting schedules (what incentive do they have to stay post-acquisition?).
Documentation culture: Not just "do docs exist" but "does the team actually use them?" Check last-updated dates on wiki pages, whether runbooks match current reality, and if troubleshooting guides exist for common issues.
Training & onboarding process: How long does it take to onboard a new engineer? If they say "2-3 weeks," that's a green flag (they have systems). If they say "6 months," you're inheriting complexity. If they say "we haven't hired recently," you're buying risk.

Red flags:

Founder/CTO is the only person who understands the model architecture
Team members can't explain why certain design decisions were made
High engineer turnover (check LinkedIn for people who left in the last year)
"We're planning to document this" means it's not documented and won't be

Questions to ask:

"If your top engineer left tomorrow, who could maintain the system?"
"Walk me through your onboarding process for new technical hires."
"What's the worst production incident you've had? How did you handle it?"
"Can junior engineers deploy to production or is that senior-only?"

Deep dive: Request time with the engineering team without leadership present. Ask them what frustrates them about the codebase and what they'd fix if they had time. The gaps between their wish list and current state tell you where the pain is.

For comprehensive technical due diligence support, see our guide on choosing the right AI implementation partner.

5. Compliance, Governance & Security

What you're assessing: Are they actually compliant or just pretending? What regulatory risk are you inheriting?

Many AI companies treat compliance as a checkbox exercise. They have a privacy policy copy-pasted from a template, a GDPR page that no one reads, and "AI ethics principles" that aren't reflected in the actual system. Then the regulator comes knocking.

What to look for:

Data processing documentation: For GDPR compliance, they need Records of Processing Activities (ROPAs) documenting what data they collect, why, how long they keep it, and who has access. If they can't produce this in 5 minutes, they don't have it.
Model explainability & auditability: Can they explain why the model made a specific decision? This matters for GDPR Article 22 (restrictions on solely automated decisions, alongside the related right to meaningful information about the logic involved) and AI Act requirements. "The neural network decided" isn't good enough.
Bias testing & fairness audits: Have they tested for discriminatory outcomes across protected characteristics (race, gender, age, disability)? Do they have mitigation strategies? Or is this something they plan to "look into eventually"?
Security practices: How is production infrastructure secured? Check for MFA on cloud accounts, principle of least privilege (not everyone has admin access), secrets management (API keys not hardcoded in repos), and security patching cadence.
Incident response procedures: What happens if there's a data breach, model failure causing harm, or regulatory inquiry? Do they have runbooks, legal counsel on retainer, and insurance? Or will they panic and make it worse?
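For the bias testing item above, a first-pass check is to compare outcome rates across groups on a sample of real decisions. The sketch below computes per-group approval rates and the ratio of the lowest to the highest rate; this is only one of many fairness measures, and the group labels and data are hypothetical. What ratio counts as acceptable is a legal and domain question the code doesn't answer.

```python
from collections import defaultdict


def selection_rates(groups: list[str], approved: list[bool]) -> dict[str, float]:
    """Share of positive decisions for each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, decision in zip(groups, approved):
        totals[group] += 1
        positives[group] += int(decision)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates: dict[str, float]) -> float:
    """Lowest group rate divided by highest group rate (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical decisions for two groups of four applicants each.
groups = ["a"] * 4 + ["b"] * 4
approved = [True, True, True, False, True, False, False, False]
rates = selection_rates(groups, approved)
print(rates)                          # {'a': 0.75, 'b': 0.25}
print(disparate_impact_ratio(rates))  # about 0.33
```

If the target has never run even this level of analysis, budget for a proper fairness audit post-acquisition.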

Red flags:

Privacy policy was last updated 3 years ago (doesn't reflect current practices)
They can't demonstrate model decision-making process (black box system)
Everyone on the engineering team has production database access (no access controls)
"We're SOC 2 Type I compliant" (Type I is a point-in-time review of control design; only Type II tests whether the controls actually operated over a sustained period)

Questions to ask:

"Show me your GDPR Article 30 records. How do you handle data subject access requests?"
"If a regulator asked why your model rejected someone, could you explain it?"
"Walk me through your last security audit. What did you fix?"
"Do you have cyber insurance? What does it cover?"

Deep dive: Request penetration testing reports, compliance audit findings, and legal opinions on AI Act applicability. If they haven't done this work, assume you'll need to do it post-acquisition.
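The secrets management item from the security checklist can be spot-checked during code review. The sketch below scans source text for two illustrative patterns: the well-known `AKIA` prefix format of AWS access key IDs, and quoted values assigned to names containing words like `password` or `token`. Both patterns are assumptions chosen for illustration and will miss plenty; a purpose-built secret scanner run over the full Git history is the real test.

```python
import re

# Illustrative patterns only; real scanners ship hundreds of rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"(?i)(api[_-]?key|secret|password|token)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


# Hypothetical snippet: line 1 hardcodes a credential, line 2 reads one from the environment.
snippet = 'db_password = "hunter2hunter2"\nkey = os.environ["API_KEY"]\n'
print(find_hardcoded_secrets(snippet))  # [(1, 'generic_assignment')]
```

Remember that a secret removed from the current code may still sit in older commits, so ask whether leaked credentials were rotated, not just deleted.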

Critical Red Flags That Predict Disaster

Beyond the five core areas, certain warning signs should make you walk away or demand significant price discounts. These red flags predict post-acquisition pain:

Technical Red Flags

"We plan to open-source the core model": Translation: there's no defensible moat, and the IP is worth less than they claim.

"It's built on [latest hot AI framework]" — If they rewrote everything to use the newest shiny tech in the last 6 months, you're buying bleeding-edge instability, not proven production systems.

"We're migrating from [X] to [Y]" — Mid-migration systems are in limbo. Finish the migration or walk away. You don't want to inherit half-completed rewrites.

"Model performance is improving every quarter" — Sounds good until you realize they're manually tuning it every quarter because the system can't learn automatically. Not scalable.

"We use synthetic data for training" — Synthetic data is useful for augmentation. If it's the primary training source, the model hasn't learned from reality and may not generalize.

Operational Red Flags

"Our customers love the accuracy" — But they can't show you retention metrics, usage stats, or NPS scores. "Customers love it" with no data means they're guessing.

"We're pivoting to [new use case]" — Pivots close to acquisition time suggest current positioning isn't working. What else haven't they told you?

"The technical founder is transitioning to advisory role" — If the key person is checking out before the deal closes, they know something you don't.

"We can scale to 10x volume with current infrastructure" — Ask them to prove it. Load test results, scalability analysis, or architectural diagrams showing how. "Should be fine" isn't due diligence.

Governance Red Flags

"We're compliant with all relevant regulations" — That's not how compliance works. They should name specific regulations (GDPR, AI Act, FCA rules) and show evidence of compliance programs, not vague assurances.

"The AI makes recommendations, humans make decisions" — A common dodge to avoid automated decision-making regulations. Check if "human in the loop" is meaningful review or rubber-stamping AI outputs.

"We'll provide access to training data post-acquisition" — If they don't have clear data ownership documentation now, you may not legally be able to use that data after acquisition.

"Our ethics policy ensures responsible AI": Unless the policy has teeth (enforcement mechanisms, oversight, consequences for violations), it's PR, not governance.

The AI Due Diligence Checklist: Scoring Rubric

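Use this checklist to systematically assess AI acquisition targets or vendor partnerships. Score each item within a category from 0 to 4, then scale each category's raw total (out of 20) to its weight so the five categories sum to 100: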

0 = Critical failure (deal-breaker or major discount required)
1 = Major concerns (requires remediation plan)
2 = Acceptable (standard gaps, manageable risk)
3 = Good (meets best practices)
4 = Excellent (gold standard)

Codebase Quality (Weight: 20%)

Version control: Git hygiene, branching strategy, commit history quality [0-4]
Test coverage: Automated tests, CI/CD, coverage percentage >70% [0-4]
Documentation: Architecture docs, API docs, runbooks exist and are current [0-4]
Code maintainability: Low complexity, minimal duplication, clear structure [0-4]
Dependency management: Updated libraries, pinned versions, security scanning [0-4]

Subtotal: ___ / 20 points

Data Infrastructure (Weight: 25%)

Data sources: Reliable, legal, scalable collection methods [0-4]
Data quality: Automated validation, profiling, anomaly detection [0-4]
Pipeline automation: End-to-end automation, minimal manual intervention [0-4]
Monitoring & observability: Data drift detection, latency tracking, alerting [0-4]
Storage & security: Proper encryption, access controls, audit logging [0-4]

Subtotal: raw score out of 20 × 1.25 = ___ / 25 points

Model Performance (Weight: 20%)

Metrics & validation: Proper train/test split, relevant metrics, baselines [0-4]
Production performance: Online/offline gap <5%, business metrics tracked [0-4]
Edge case handling: Graceful failures, adversarial testing completed [0-4]
Versioning & deployment: A/B testing, canary deployments, rollback capability [0-4]
Continuous improvement: Automated retraining, performance monitoring [0-4]

Subtotal: ___ / 20 points

Team & Knowledge (Weight: 15%)

Team composition: Balanced skills (research + engineering + ops) [0-4]
Knowledge distribution: Multiple people understand each system component [0-4]
Retention risk: Low turnover, reasonable vesting, positive culture signals [0-4]
Documentation culture: Docs are used and maintained, not shelf-ware [0-4]
Onboarding process: Structured onboarding, <30 days to productivity [0-4]

Subtotal: raw score out of 20 × 0.75 = ___ / 15 points

Compliance & Security (Weight: 20%)

Data governance: GDPR ROPAs, data retention policies, deletion workflows [0-4]
Model explainability: Can explain decisions, audit trails exist [0-4]
Bias & fairness: Testing completed, mitigation strategies in place [0-4]
Security practices: MFA, least privilege, secrets management, patching [0-4]
Incident response: Runbooks exist, insurance in place, legal counsel available [0-4]

Subtotal: ___ / 20 points

Total Score: ___ / 100 points
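Each category has five items scored 0-4, so every raw subtotal is out of 20, while the weights range from 15% to 25%. The sketch below shows one way to reconcile the two, assuming each raw subtotal is scaled to its stated weight so the total comes out of 100. The category keys and function name are hypothetical.

```python
# Category weights as stated in the rubric (they sum to 100).
WEIGHTS = {"codebase": 20, "data": 25, "model": 20, "team": 15, "compliance": 20}


def weighted_total(item_scores: dict[str, list[int]]) -> float:
    """Scale each category's five 0-4 item scores to its weight and sum them."""
    total = 0.0
    for category, weight in WEIGHTS.items():
        scores = item_scores[category]
        if len(scores) != 5 or any(not 0 <= s <= 4 for s in scores):
            raise ValueError(f"{category}: expected five scores between 0 and 4")
        total += sum(scores) / 20 * weight  # raw subtotal out of 20, scaled to weight
    return total


# Hypothetical target: strong code, weak data infrastructure.
example = {
    "codebase": [4, 3, 3, 4, 3],
    "data": [1, 1, 0, 1, 2],
    "model": [3, 2, 2, 3, 2],
    "team": [2, 2, 3, 2, 2],
    "compliance": [2, 2, 1, 3, 2],
}
print(weighted_total(example))
```

Because data infrastructure carries the heaviest weight, a weak score there drags the total down more than the same raw score in any other category.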

Interpretation Guide

80-100 points: Green light. Strong technical foundation with manageable risk.

60-79 points: Yellow light. Acceptable but requires post-acquisition investment. Budget 20-30% of acquisition cost for technical cleanup.

40-59 points: Red light. Significant technical debt and operational risk. Only proceed if purchase price reflects the cost to rebuild major components.

Below 40 points: Walk away. You're buying technical liabilities, not assets. The cost to fix exceeds the value.

Additional Considerations

Deal-breakers (regardless of score):

No ability to reproduce model training process
Unclear data ownership or licensing
Key personnel planning to leave immediately post-acquisition
Active regulatory investigation or compliance violation
Critical infrastructure dependencies on founder's personal accounts

Score adjustments:

Add 10 points if they have had an external security audit in the last 12 months
Subtract 10 points if more than 50% of the code was written by people no longer with the company
Subtract 5 points for each critical production incident in the last 6 months with no root cause analysis
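The adjustments, deal-breakers, and interpretation bands can be combined into one small routine. The sketch below is one reading of the rules above: clamping the adjusted score to the 0-100 range is an assumption, as are the function and parameter names.

```python
def adjusted_score(base: float, external_audit_last_12m: bool,
                   majority_code_by_departed: bool,
                   incidents_without_rca: int) -> float:
    """Apply the score adjustments to a base score out of 100."""
    score = base
    if external_audit_last_12m:
        score += 10
    if majority_code_by_departed:
        score -= 10
    score -= 5 * incidents_without_rca
    return max(0, min(100, score))  # clamping is an assumption, not in the rubric


def interpret(score: float, deal_breaker: bool = False) -> str:
    """Map a score to its band; any deal-breaker overrides the score."""
    if deal_breaker:
        return "Deal-breaker"
    if score >= 80:
        return "Green light"
    if score >= 60:
        return "Yellow light"
    if score >= 40:
        return "Red light"
    return "Walk away"


# Hypothetical target: base score 75, recent external audit, one unexplained incident.
final = adjusted_score(75, external_audit_last_12m=True,
                       majority_code_by_departed=False, incidents_without_rca=1)
print(final, interpret(final))  # 80 Green light
```

Note how little it takes to move between bands: one unexplained incident is the difference between green and yellow for a target sitting at 80.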

When to Bring in External Expertise

For mid-market buyers, running thorough technical due diligence in-house is often impractical. Your technical team is busy shipping product. You need AI-specific expertise. And you need it fast — deals move quickly.

What to look for in a technical due diligence partner:

Automated code analysis: The ability to scan repositories for technical debt, security vulnerabilities, and maintainability issues in days, not weeks
Data pipeline auditing: Mapping data flows, identifying quality issues, and assessing GDPR compliance posture
Model performance verification: Independent testing of model claims, bias detection, and production-readiness assessment
Risk scoring & reporting: Executive summary with red flags, risk scores, and remediation cost estimates

You need a partner who can deliver specific, actionable findings tied to deal terms — not generic consulting reports. A typical assessment should take 2-3 weeks with a fixed scope, delivered before deal close.

Phoenix AI Solutions is developing Phoenix Shield to address this need — an automated AI system assessment tool designed for mid-market M&A timelines. Get in touch to discuss your due diligence requirements.

Summary: Due Diligence as Risk Mitigation

Most AI acquisition failures are preventable. The technical debt was there. The compliance gaps were visible. The operational fragility was obvious. Buyers just didn't look closely enough before signing.

The due diligence framework in this guide — five core assessment areas, red flag detection, and systematic scoring — turns vague "AI is hard to evaluate" anxiety into concrete, measurable criteria.

What you should do next:

Download this checklist and customize it for your specific acquisition or vendor evaluation context
Assemble your evaluation team: technical lead, data privacy counsel, domain expert from your business
Request artifacts: code repository access, production metrics, compliance documentation, incident reports
Run the assessment: score systematically, document findings, identify deal-breakers early
Use findings to negotiate: technical debt should reduce valuation or be addressed pre-close

For organisations without in-house AI expertise, consider bringing in specialised assessment support. Phoenix AI Solutions offers technical due diligence services designed for mid-market M&A timelines. Contact us to discuss how we can help with AI due diligence for your next acquisition.

The $2.3M write-off we opened with? It could have been avoided with a 2-week technical assessment costing less than 2% of the deal value (under $46,000). The math is obvious. The question is whether you'll do the work before signing or after the disaster.

Related resources:

Mid-Market AI Consulting Buyers Guide — Complete framework for evaluating AI consultancies and implementation partners
How to Choose an AI Implementation Partner — Vendor evaluation criteria and 12-point scorecard
Best AI Consulting Firms in the UK — Independent comparison of leading UK AI consultancies
AI Implementation Cost Guide UK 2026 — Pricing benchmarks for AI projects and ongoing costs
Phoenix Shield Technical Due Diligence — Automated AI system assessment for M&A

Need help evaluating an AI acquisition target or vendor? Get in touch for a confidential consultation on your specific due diligence requirements.
