Migrating a Monolith: How to Convert 10,000 Files Without Losing Your Mind
You've made the decision. The monolith needs to move to a new language. Maybe it's a 15-year-old PHP application migrating to Python. Maybe it's a Java behemoth headed for Kotlin. Whatever the pair, you're staring at thousands of files, a dependency graph that looks like spaghetti, and a team that's never done this before.
The temptation is to treat this like any other project: estimate the work, divide it up, and start converting. That's how 10,000-file migrations fail. Here's how they succeed.
Triage: Not All Files Are Created Equal
Before converting a single file, map your dependency graph. You need to know which files depend on which, which modules are leaf nodes (no internal dependencies of their own), and which are core utilities that everything imports.
# Python: graph package-level dependencies (image output needs Graphviz installed)
pipdeptree --graph-output png > deps.png
# Java: write DOT dependency graphs into deps/ with jdeps
jdeps --dot-output deps/ -R my-app.jar
This graph determines your conversion order. Start with leaf nodes and shared utilities. Convert inward toward the core. Never convert a file before its dependencies have been converted and validated — otherwise you're building on sand.
Priority tiers:
- Leaf modules: standalone utilities, helpers, data models. No internal dependencies of their own.
- Shared libraries: imported by many files but with few dependencies themselves.
- Business logic: the application core. Convert after its dependencies are stable.
- Entry points: main files, API handlers, CLI tools. Convert last.
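As a rough illustration of how a dependency graph turns into a conversion order, here is a minimal sketch using Python's standard-library graphlib (Python 3.9+). The module names and the shape of the `deps` mapping are hypothetical; in practice the graph would come from whatever dependency tool you use.

```python
from graphlib import CycleError, TopologicalSorter


def conversion_layers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group modules into layers that depend only on earlier layers.

    `deps` maps each module to the set of modules it imports.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph has circular imports
    layers = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies already done.
        ready = sorted(sorter.get_ready())
        layers.append(ready)
        sorter.done(*ready)
    return layers


# Hypothetical module graph, for illustration only.
deps = {
    "app.main": {"app.billing", "lib.http"},
    "app.billing": {"lib.money", "lib.http"},
    "lib.http": {"util.strings"},
    "lib.money": set(),
    "util.strings": set(),
}

layers = conversion_layers(deps)
for number, layer in enumerate(layers, start=1):
    print(f"Layer {number}: {', '.join(layer)}")
```

A CycleError here is useful information in its own right: circular imports have to be broken, or the modules involved converted together, before any ordering exists.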
Batching: Why You Can't Do Everything at Once
Converting 10,000 files in one shot sounds efficient. It's actually catastrophic. If something goes wrong in the conversion of a shared utility, the error cascades into every file that depends on it. You can't test anything because nothing is stable.
Instead, batch by dependency layer:
Batch 1: All leaf modules (maybe 2,000 files). Convert, verify, test in isolation. Fix issues. These become your stable foundation.
Batch 2: Shared libraries that depend only on Batch 1 outputs. Convert, verify, test against the now-stable leaf modules.
Batch 3-N: Work inward through the dependency graph.
Each batch should be small enough that a human can review its quality report in one sitting. If a 500-file batch comes back with 95% of files syntactically correct, you have 25 files to fix manually, which is a manageable workload. At the same rate, a 5,000-file batch leaves 250 files to fix, and triage alone becomes a week-long effort.
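The batch-size arithmetic can be sketched in a few lines. The function names and the 500-file cap below are illustrative assumptions, not a recommendation for every project.

```python
def split_into_batches(files: list[str], max_batch_size: int) -> list[list[str]]:
    """Split one dependency layer into review-sized batches."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        files[start:start + max_batch_size]
        for start in range(0, len(files), max_batch_size)
    ]


def expected_manual_fixes(file_count: int, pass_rate: float) -> int:
    """Estimate how many files will need manual attention at a given pass rate."""
    return round(file_count * (1 - pass_rate))


# Hypothetical layer of 2,000 leaf modules, capped at 500 files per batch.
leaf_layer = [f"leaf_{n}.py" for n in range(2000)]
batches = split_into_batches(leaf_layer, max_batch_size=500)

print(len(batches))                       # number of batches in this layer
print(expected_manual_fixes(500, 0.95))   # manual fixes expected per 500-file batch
print(expected_manual_fixes(5000, 0.95))  # manual fixes for one 5,000-file batch
```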
The Strangler Fig: Your Best Friend
Martin Fowler's Strangler Fig pattern applies perfectly here. Instead of replacing the monolith wholesale, you wrap it. New requests route to the converted modules; anything not yet converted falls through to the original.
Request → Router → [Converted Module] → Response
                 → [Original Module]  → Response (fallback)
This lets you convert incrementally, validate each module in production traffic, and roll back instantly if something breaks. It's slower than a big-bang cutover, but the risk profile is dramatically better.
Shopify used a variant of this approach in their modular monolith journey — they didn't change languages, but the incremental decomposition pattern is identical.
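To make the routing idea concrete, here is a minimal in-process sketch of a strangler-style router. The class, the route strings, and the handler signature are all hypothetical; a real deployment would more likely do this at a reverse proxy or API gateway, with the legacy handler forwarding to the original application.

```python
from typing import Callable

Handler = Callable[[dict], dict]


class StranglerRouter:
    """Sends a route to its converted handler if one exists, else to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._converted: dict[str, Handler] = {}

    def register(self, route: str, handler: Handler) -> None:
        """Start serving `route` from the converted module."""
        self._converted[route] = handler

    def rollback(self, route: str) -> None:
        """Send `route` back to the original module."""
        self._converted.pop(route, None)

    def handle(self, route: str, request: dict) -> dict:
        handler = self._converted.get(route, self._legacy)
        return handler(request)


def legacy_app(request: dict) -> dict:
    # Stand-in for a call into the original monolith.
    return {"served_by": "legacy", "echo": request}


def converted_invoices(request: dict) -> dict:
    # Stand-in for a converted module.
    return {"served_by": "converted", "echo": request}


router = StranglerRouter(legacy=legacy_app)
router.register("/invoices", converted_invoices)
```

Rolling back is a single call to `router.rollback("/invoices")`, which is what makes the incremental approach low-risk.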
Cross-File Dependencies: The Hard Part
The single hardest problem in large-scale conversion is cross-file dependencies. A Python module that imports from 15 other modules needs all 15 to be available in the target language at the right paths with the right interfaces.
Three strategies:
Import mapping. Maintain a map from source-language imports to target-language imports. Update it as each file is converted. The conversion tool should use this map when generating import statements.
Interface stubs. For files not yet converted, generate stub interfaces in the target language. The converted code compiles against the stubs; the stubs get replaced with real implementations later.
Atomic module conversion. Convert all files within a module (directory/package) together as a unit. Don't split a module across batches. This eliminates cross-file dependency issues within the module.
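The first two strategies work together: the import map answers "where does this live now?", and anything it cannot answer is a candidate for a stub. A minimal sketch, assuming a hypothetical PHP-to-Python migration with made-up paths and module names:

```python
from dataclasses import dataclass, field


@dataclass
class ImportResolver:
    """Tracks which source modules already have a target-language equivalent."""

    # Maps a source-language module to its target-language module.
    converted: dict[str, str] = field(default_factory=dict)

    def mark_converted(self, source_module: str, target_module: str) -> None:
        self.converted[source_module] = target_module

    def resolve(self, source_imports: list[str]) -> tuple[list[str], list[str]]:
        """Return (target imports to emit, source modules that still need stubs)."""
        resolved, needs_stub = [], []
        for name in source_imports:
            if name in self.converted:
                resolved.append(self.converted[name])
            else:
                needs_stub.append(name)
        return resolved, needs_stub


# Hypothetical paths, for illustration only.
resolver = ImportResolver()
resolver.mark_converted("lib/money.php", "lib.money")

resolved, needs_stub = resolver.resolve(["lib/money.php", "lib/http.php"])
print(resolved)    # target-language imports available now
print(needs_stub)  # dependencies that need an interface stub for the time being
```

Once `lib/http.php` is converted and marked, the same call returns an empty stub list, which is the signal that its stub can be deleted.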
Progress Tracking and Quality Gates
For a 10,000-file migration, you need dashboards, not spreadsheets.
Track per-batch: total files, conversion success rate, syntax correctness average, semantic accuracy average, files requiring manual intervention. Set quality gates: no batch moves to production until syntax correctness exceeds 95% and all critical-path files pass integration tests.
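A quality gate of this kind is simple to encode. The sketch below assumes a hypothetical per-batch report structure; the field names are illustrative, and the 95% threshold is the one given above.

```python
from dataclasses import dataclass


@dataclass
class BatchReport:
    total_files: int
    converted_files: int
    syntax_correctness: float       # batch average, 0.0 to 1.0
    semantic_accuracy: float        # batch average, 0.0 to 1.0
    manual_intervention_files: int
    critical_path_failures: int     # critical-path files failing integration tests

    @property
    def conversion_success_rate(self) -> float:
        if self.total_files == 0:
            return 0.0
        return self.converted_files / self.total_files


def passes_quality_gate(report: BatchReport, min_syntax: float = 0.95) -> bool:
    """Syntax correctness must exceed the threshold with no critical-path failures."""
    return (
        report.syntax_correctness > min_syntax
        and report.critical_path_failures == 0
    )


# Hypothetical report for one batch.
report = BatchReport(
    total_files=500,
    converted_files=490,
    syntax_correctness=0.97,
    semantic_accuracy=0.91,
    manual_intervention_files=15,
    critical_path_failures=0,
)
print(passes_quality_gate(report))
```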
Platforms like B&G CodeFoundry handle up to 10,000 files per project at the Enterprise tier, with smart batching by token budget, automatic splitting of large files at logical boundaries, and GitHub integration that lets teams point at a repository and process it in layers.
The Mental Model
Think of a large migration as three phases:
- Mechanical translation (automatable): syntax conversion, import mapping, type translation. This is 60-70% of the total file count and the most tedious work.
- Semantic refinement (human-guided): verifying behavior, fixing edge cases, adapting patterns.
- Idiomatic polishing (human-driven): making the code feel native to the target language.
Automate phase 1. Budget human time for phases 2 and 3. And batch everything so you're never debugging 10,000 problems at once.
References: Martin Fowler, "Strangler Fig Application" (2004); Shopify's modular monolith blog series; Basecamp on incremental migration; Sam Newman, "Monolith to Microservices" (O'Reilly).