Quality Scores Explained: How to Read an AI Code Conversion Report
You ran your project through a code conversion tool. You got back converted files, and next to each one there's a set of numbers: Syntax Correctness 94, Semantic Accuracy 87, Code Style 72. What do these actually mean? Which ones should worry you? And how do you use them to prioritize your team's review time?
What "Quality" Means in Code Conversion
Quality in code conversion isn't binary. A converted file can compile perfectly (high syntax correctness), produce wrong results (low semantic accuracy), and look nothing like code a native developer would write (low code style). All three dimensions matter, but they matter differently depending on your immediate goal.
If you need the code to run today: syntax correctness is your gating metric.
If you need the code to produce correct results: semantic accuracy is what you care about.
If you need the code to be maintainable long-term: code style determines how much refactoring your team will need to do.
Syntax Correctness: Does It Compile?
This is the most objective score. Feed the output file to the target language's compiler or parser. Count the errors. Score it.
- 90-100: The file compiles or parses cleanly. No immediate action needed.
- 70-89: Minor issues — missing imports, type mismatches, small syntax errors. Typically fixable in minutes.
- Below 70: Significant structural problems. The conversion struggled with this file, possibly because of unusual source patterns or language-specific constructs without clean equivalents.
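The check itself is mechanical. As an illustrative sketch (the 10-points-per-error scheme here is a made-up scoring scale, not a standard), here is how a syntax score for Python output might be computed with the standard library's `ast` parser:

```python
import ast

def syntax_errors(source: str) -> list[str]:
    """Collect syntax errors by parsing, dropping the offending line,
    and re-parsing. A crude way to find more than one error, since
    ast.parse stops at the first."""
    errors = []
    lines = source.splitlines()
    while True:
        try:
            ast.parse("\n".join(lines))
            return errors
        except SyntaxError as e:
            errors.append(f"line {e.lineno}: {e.msg}")
            if e.lineno is None or e.lineno > len(lines):
                return errors
            del lines[e.lineno - 1]  # drop the bad line and retry

def syntax_score(source: str) -> int:
    # Hypothetical mapping: each distinct error costs 10 points.
    return max(0, 100 - 10 * len(syntax_errors(source)))
```

A file that parses cleanly lands in the 90-100 band; a few fixable errors drop it into the 70-89 band, mirroring the thresholds above.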
A well-designed system runs a repair loop before scoring. If the first attempt produces syntax errors, the system feeds those errors back to the LLM and tries again (typically up to two iterations). The score you see reflects the final output after repair.
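The repair loop is simple control flow around the expensive calls. In this sketch, `convert`, `check`, and `repair` are hypothetical callables standing in for the conversion model, the compiler check, and the error-feedback LLM call; only the loop structure reflects the description above:

```python
MAX_REPAIR_ITERATIONS = 2  # "typically up to two iterations"

def convert_with_repair(source, convert, repair, check):
    """Convert, then retry while the compiler still complains.

    convert(source) -> converted code
    check(code)     -> list of error strings (empty means clean)
    repair(code, errors) -> new code, e.g. an LLM call that is
                            shown the compiler's error output
    """
    code = convert(source)
    for _ in range(MAX_REPAIR_ITERATIONS):
        errors = check(code)
        if not errors:
            break  # clean: stop early, don't burn repair attempts
        code = repair(code, errors)
    return code  # scored after repair, matching what the report shows
```

Whatever comes out of this loop is what gets scored, which is why a syntax score below 70 signals that even repeated repair attempts could not salvage the file.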
Semantic Accuracy: Does It Do the Same Thing?
This is harder to measure and inherently more uncertain. The system analyzes whether the converted code preserves the original's control flow, data transformations, and API call patterns.
- 90-100: High confidence that the converted code preserves original behavior. Logic flow maps cleanly, operations translate directly.
- 70-89: Mostly preserved, but some constructs required interpretation. Review the flagged sections.
- Below 70: Significant semantic gaps. The source language and target language handle the underlying concept differently enough that the automated conversion had to make assumptions.
Semantic accuracy below 70 doesn't mean the code is wrong — it means you should verify it carefully. The system is telling you "I'm not confident about this one."
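One crude proxy for "does the converted code preserve the original's shape" is to compare coarse structural profiles: counts of branches, loops, and call expressions. The sketch below compares two Python snippets; a real cross-language system would map constructs between two grammars, and the 0-100 overlap score here is an illustrative scheme, not the product's actual metric:

```python
import ast
from collections import Counter

def control_flow_profile(source: str) -> Counter:
    """Count control-flow constructs and call expressions."""
    profile = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.Try)):
            profile[type(node).__name__] += 1
        elif isinstance(node, ast.Call):
            profile["Call"] += 1
    return profile

def semantic_similarity(a: str, b: str) -> float:
    """Hypothetical 0-100 score: overlap between the two profiles."""
    pa, pb = control_flow_profile(a), control_flow_profile(b)
    if not pa and not pb:
        return 100.0  # no control flow on either side to compare
    shared = sum((pa & pb).values())   # constructs present in both
    total = sum((pa | pb).values())    # constructs present in either
    return 100 * shared / total
```

Matching profiles don't prove equivalent behavior (two loops can do different things), which is exactly why this dimension is reported as confidence rather than proof.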
Code Style: Is It Idiomatic?
A syntactically correct, semantically accurate file can still feel foreign. Java code converted to Python might use camelCase instead of snake_case, explicit getters/setters instead of properties, or verbose class hierarchies where simple functions would suffice.
- 90-100: The output reads like native code. A developer wouldn't guess it was converted.
- 70-89: Correct and functional, but with noticeable conversion artifacts. The logic is right; the style needs polishing.
- Below 70: Works but reads like the source language wearing the target language's syntax. Will need significant refactoring for long-term maintenance.
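Naming conventions are the easiest style artifact to detect automatically. As a minimal sketch, assuming Python output from a Java source, this flags camelCase identifiers that survived the conversion (the 5-points-per-offender scale is illustrative, not a standard):

```python
import ast
import re

# Lowercase start, at least one interior capital: e.g. getName, userId
CAMEL_CASE = re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)+$")

def style_score(source: str) -> int:
    """Hypothetical style check: penalize camelCase names in Python
    output, a common artifact of Java-to-Python conversion."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.add(node.name)
    offenders = [n for n in names if CAMEL_CASE.match(n)]
    return max(0, 100 - 5 * len(offenders))
```

Real style scoring also looks at structural idioms (getters vs. properties, class hierarchies vs. functions), which need more than a regex, but naming alone catches a lot of the "source language wearing the target's syntax" feel.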
Code style is the lowest-priority score for an initial migration but the highest priority for long-term codebase health.

Using Scores to Prioritize Review
With hundreds or thousands of converted files, you can't review everything equally. Use the scores to triage:
Review immediately: Files with semantic accuracy below 80. These are the ones most likely to contain behavior changes.
Review before merging: Files with syntax correctness below 90. These need fixes before they'll compile.
Review when convenient: Files with low code style but high syntax and semantic scores. They work correctly — they just need polish.
Skip (for now): Files scoring above 90 on all three dimensions. Spot-check a random sample to validate, then trust the scores.
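The triage rules above are mechanical enough to script. This sketch buckets per-file scores into the four queues, using the thresholds as written (the dictionary shape for `files` is an assumed input format):

```python
def triage(files):
    """Bucket files into review queues by their quality scores.

    files: dict mapping filename -> {"syntax": int, "semantic": int,
    "style": int}, each 0-100. Checks run in priority order, so a
    file lands in the most urgent queue it qualifies for.
    """
    queues = {"immediate": [], "before_merge": [],
              "convenient": [], "skip": []}
    for name, s in files.items():
        if s["semantic"] < 80:
            queues["immediate"].append(name)      # likely behavior changes
        elif s["syntax"] < 90:
            queues["before_merge"].append(name)   # won't compile as-is
        elif min(s.values()) > 90:
            queues["skip"].append(name)           # spot-check a sample only
        else:
            queues["convenient"].append(name)     # works, needs polish
    return queues
```

Running this over a project's score report turns a wall of numbers into four concrete work queues, which is the point of scoring in the first place.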
Every conversion on B&G CodeFoundry produces these per-file quality scores, aggregated into a per-project quality report. The report becomes your review roadmap.