John Regehr, March 2 2026.
I didn’t take much of an interest in CCC, “Claude’s C Compiler,” at first. However, after seeing a hint of what happens when you fuzz it using Csmith and YARPGen, I got curious about how wrong this compiler actually is. The results at the GitHub issue—14 miscompiles out of 101 Csmith programs and 5 miscompiles out of 101 YARPGen programs—seemed pretty bad, but consistent with what I’d heard about this compiler, which is that it occupies an odd bit of territory where it’s more sophisticated than something we could get out of a compiler course project, but also that it doesn’t even rate at all on the scale where we consider production-grade artifacts like GCC and Clang/LLVM.
As a bit of background, Csmith and YARPGen are randomized compiler testing tools produced by my research group. They’re each responsible for detecting many hundreds of compiler defects, the most interesting of which are miscompilations, where a compiler silently produces output whose behavior diverges from the set of behaviors allowed by the relevant programming language standard, for some input program. YARPGen has (in effect) a built-in interpreter that allows it to predict the value that should be printed by a program that it generates. Csmith has no such functionality; for it to detect a miscompilation we use differential testing where we compare the behavior of executables generated by two compilers, or two modes of the same compiler (an optimizing and a non-optimizing compile, for example). Although I can’t prove it, I like to think that these tools (and others like them) have helped the production compilers that developers use every day become more robust and solid.
I connected YARPGen version 1 and CCC in a testing loop. As expected, CCC miscompiles a lot of inputs. Each time I found a miscompile I reduced it using C-Vise (which is mostly C-Reduce, but with the core rewritten in Python instead of Perl). It’s not really possible to deal with miscompile bugs triggered by large test cases—test case reduction shrinks miscompile triggers down to (typically) a few lines. Here is an example of a program that CCC miscompiled:
int printf(const char *, ...);
unsigned long long seed;
unsigned a = 3357492005;
int b;
void hash(long long *seed, int v) { *seed ^= v; }
int main() {
  b = a / (long)3;
  hash(&seed, b);
  printf("%llu\n", seed);
}

Next—not being even a little bit of a Rust programmer—I asked Codex to fix each bug and also, of course, to add a regression test. I picked Codex (“gpt-5.3-codex high,” to be exact) not because I have any particular affinity for it, but rather because it’s what my employer currently pays for, for whatever reason. Once it appeared to succeed, I went back and ran YARPGen some more. After 11 bug fixes, an overnight run of YARPGen (around 200,000 individual tests) could not get CCC to miscompile. So I moved on to Csmith, and it turns out that an overnight fuzzing run using Csmith (again, around 200,000 tests) could not get my fixed version of CCC to miscompile, either.
Here are the commits fixing the 11 bugs. Bug summaries are Codex’s.
“Bug in IR narrowing optimization for 64-bit bitwise ops. It
could incorrectly narrow an I64/U64 BinOp to the wrong 32-bit signedness
(for example, narrowing an I64 op to U32 when one operand came from
U32). That changes semantics when the value is later widened back to 64
bits, because sign-extension vs zero-extension differs, causing
miscompilation.”
https://github.com/regehr/claudes-c-compiler/commit/4d9913e7f53be66e6de30869e1a324020ce81777
“Before, -x in constant expressions always used signed-style wrapping_neg (negate_const), so unsigned results could be folded to the wrong numeric value/sign. Example: -8u should be modulo 2^N (for u32, 0xFFFF_FFF8), but old folding could behave like signed -8.” https://github.com/regehr/claudes-c-compiler/commit/32fe7f5e5fe08bb0b7bf3ee7e6bb90234356d29e
“The x86 peephole compare+branch fusion pass tried to turn cmp;
setcc; …; test %rax,%rax; je/jne into one jcc. It was too permissive
about “skippable” instructions between setcc and test. In particular, it
could still fuse even if %rax got reloaded from an unrelated stack slot
before the final test, so the branch depended on the wrong value. It
also accepted setcc to non-%al regs and overly broad movslq/movzb
patterns.”
https://github.com/regehr/claudes-c-compiler/commit/abeb8fbdce8c6f2c99557cf148efc9483b9c902a
“The narrow_cmps optimization pass narrowed 64-bit compares by
stripping widening casts too aggressively. It trusted the cast metadata
(from_ty) without verifying the cast source value actually had that
type. In cases like U16 value cast as I16 -> I64, narrowing could
drop a required sign-changing cast and turn a signed compare into
effectively unsigned behavior. It could also narrow compares to sub-int
widths (I8/I16/U8/U16), which can violate C integer-promotion semantics
for ordered comparisons.”
https://github.com/regehr/claudes-c-compiler/commit/00fbea89eb855a359eea6c2c976b0c2f2fbecd1e
“Shift-narrowing optimization in src/passes/narrow.rs (Phase 4) used to narrow I64 shift ops down to I32 too aggressively. Shl was treated as always safe, and AShr/LShr only checked sign/zero-extension of the LHS. It did not require the shift amount to be valid for the narrowed width.” https://github.com/regehr/claudes-c-compiler/commit/90905856a09bba6ab4df4aade850342078db7850
“This commit fixes a usual arithmetic conversions bug during IR
lowering for integer ops. Bug: lower_expr_with_type() used
get_expr_type(expr) as the cast source type. For integers on LP64,
get_expr_type can reflect a wider storage type (often I64) instead of
the expression’s true C semantic type (I32/U32). That caused wrong
implicit casts before arithmetic, especially signed/unsigned mixes
(here: int promoted to unsigned long before division).”
https://github.com/regehr/claudes-c-compiler/commit/c01bac0f988471855c5422cafe5d3d57e5ed2e58
“This commit fixes a sign-extension bug in explicit integer casts
during lowering (lower_cast). For integer casts, lowering used
get_expr_type(inner) (storage-oriented), which could report a widened
64-bit type instead of the true C semantic type (int/unsigned int). That
could skip or choose the wrong cast kind, so (long)(int_expr) on LP64
could behave like zero-extend/ no-op instead of sign-extend from 32-bit
int.”
https://github.com/regehr/claudes-c-compiler/commit/5b0447eabf19163c90484415d7a292df1781af66
“Unsound narrowing optimization in src/passes/narrow.rs (Phase
4). The pass could replace Cast(BinOp(…, I64), I64->T) with
BinOp(…,T) even when that dropped a required sign-changing cast. That
was especially unsafe for And, and also for some Shl cases with
signedness mismatch on the widened LHS. Result: wrong
sign/zero-extension behavior after optimization (regression output
changed).”
https://github.com/regehr/claudes-c-compiler/commit/b1c97854ffa7b9d3d5f53f93f0a089ca0b56f0f6
“This commit fixes an x86-64 cast codegen bug for U32 -> I32
(same-width unsigned-to-signed casts). In emit_generic_cast,
CastKind::UnsignedToSignedSameSize was treated as a no-op. On x86-64,
values live in 64-bit registers, so after (int)u32_value, the upper 32
bits must reflect the signed 32-bit value when used by later 64-bit
signed ops/promotions. Leaving it as no-op effectively kept/used a
zero-extended value in %rax, which can flip sign-sensitive logic.”
https://github.com/regehr/claudes-c-compiler/commit/acc1b4a5f9618d7e7d9c7e917afe7b622caf346a
“This commit fixes a bad range-analysis assumption in
div_by_const that could pick the wrong signed division rewrite. The pass
tracked whether values were “known i32” ([-2^31, 2^31-1]) to allow I64
signed division to use the faster expand_sdiv32_in_i64 path. It
incorrectly marked many unsigned 32-bit values as is_known_i32 = true.
But U32 values above INT32_MAX do not fit signed i32, so that transform
is invalid there.”
https://github.com/regehr/claudes-c-compiler/commit/ceff82eba63c2b9290370e48fac850a7a709d8f9
“This commit fixes a wrong constant-propagation rule in
cfg_simplify for Cast. CFG simplification tried to resolve branch/switch
conditions to constants. It treated Instruction::Cast as identity
(cast(x) == x) when resolving constants. That is wrong for
narrowing/signed casts (e.g. I64 -> I8), where truncation/sign
behavior changes the value.”
https://github.com/regehr/claudes-c-compiler/commit/9fe29b62241e3e08a82bbe61d752fc0660a6526c
Some of these mix in other changes, and none of them have useful commit messages—I didn’t start out intending to share this little project with anyone.
So, what have we learned here? For one thing, the bug fixing by Codex seems pretty impressive: I gave it zero guidance other than the reduced test cases and good reference compilers (GCC and LLVM). I had half expected Codex to patch CCC in clumsy ways that would lead towards chaos instead of correctness, but that wasn’t the case at all. Codex did go badly wrong in one instance where it tried to fix a poorly-reduced input that contained undefined behavior, but it was easy enough to notice this and discard its work. Are its fixes good ones in a sense other than “they seem to work”? I don’t know! I don’t care to try to understand a vibe-coded C compiler.
Another thing we learned here was that, with respect to the subset of C that Csmith and YARPGen generate, CCC was within some reasonable edit distance (the 11 commits above) of giving a reasonable impression of being correct. Was that a foregone conclusion? Absolutely not. It could easily have been the case that the vibe coded compiler was irrevocably specialized for its initial testing environment, in such a way that it was architecturally incapable of compiling C code in the more general case.
Finally, let’s talk about the character of the bugs that I found using YARPGen. They are mostly the kind of mistake that one would make if one were implementing a C compiler without reading the standard closely and carefully. They’re surface-level bugs that you would simply not find in a serious compiler. I don’t think we ever found a bug like this in GCC. We did find maybe one or two like this in Clang, but this was because we started testing Clang very early in its history: the LLVM community was bringing it up at the same time we developed Csmith. In contrast with these surface-level bugs, the vast majority of the bugs discovered by random testers in production-grade compilers are in their optimizers. There is a vast semantic surface area over which defects can occur in an aggressive optimizer.
What’s the verdict about Claude’s C Compiler? At some level it is impressive. I think there are a whole lot of programmers out there who couldn’t create a compiler this capable in six months. For example, George Necula had this to say about writing a C frontend:
When I (George) started to write CIL I thought it was going to take two weeks. Exactly a year has passed since then and I am still fixing bugs in it. This gross underestimate was due to the fact that I thought parsing and making sense of C is simple. You probably think the same. What I did not expect was how many dark corners this language has, especially if you want to parse real-world programs such as those written for GCC or if you are more ambitious and you want to parse the Linux or Windows NT sources (both of these were written without any respect for the standard and with the expectation that compilers will be changed to accommodate the program).
However, on the other hand, CCC doesn’t optimize and it contained these 11 fairly basic bugs in interpreting C code, and it undoubtedly contains more of these that are outside of Csmith/YARPGen’s scope. From the point of view of people working with production compilers, CCC isn’t even a useful prototype. (If you’d like a more nuanced take, Chris’s piece from a couple weeks ago is good.) Moreover, there are a lot of C compilers available in Codex’s training corpus. I don’t know that any of them are written in Rust, but the modern LLMs seem really good at translating concepts between programming languages.
If anyone feels like continuing to fuzz my fork of CCC, it’s here; make sure to get the yarpgen branch.