Neither human inspectors nor the AI that made the forgery can reliably tell a real receipt from a forged one.
April 29, 2026
Original Paper
When the Forger Is the Judge: GPT-Image-2 Cannot Recognize Its Own Faked Documents
arXiv · 2604.25213
The Takeaway
Forged documents created by GPT-Image-2 have reached a level of fidelity that defeats human visual verification. In testing, human accuracy in identifying fake receipts fell to 50%, which is the chance level when choosing between a real and a forged image. Even the model that generated the forgeries could not reliably distinguish its own creations from authentic documents. This collapse of detection capability suggests that visual inspection is no longer a reliable way to verify financial or legal records. The industry must shift toward cryptographic signatures and watermarking to maintain the integrity of digital documentation.
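To make the "50% equals guessing" point concrete, here is a minimal sketch of how one might check whether a two-alternative forced-choice (2AFC) accuracy is statistically distinguishable from chance. It uses a standard Wilson score interval, which is a choice made here for illustration and not necessarily the analysis used in the paper. The function names are hypothetical, and the count of correct votes in the usage lines is a placeholder assumption: the summary reports only the rounded 50% rate and the abstract reports n=365 pair-votes.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


def consistent_with_chance(correct: int, total: int, chance: float = 0.5) -> bool:
    """True when the chance rate falls inside the Wilson interval."""
    low, high = wilson_interval(correct, total)
    return low <= chance <= high


# Placeholder count: 183 correct out of 365 is an assumption chosen to match
# the rounded 50% figure; it is not a number taken from the paper.
low, high = wilson_interval(183, 365)
print(f"95% interval: {low:.3f} to {high:.3f}")
print("consistent with guessing:", consistent_with_chance(183, 365))
```

With a few hundred votes and an observed rate near one half, the interval spans roughly five percentage points on either side, so an accuracy of about 50% cannot be separated from coin-flipping at this sample size.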
From the abstract
OpenAI's GPT-Image-2 has effectively erased the visual boundary between authentic and AI-edited document images: a single number on a receipt can be replaced in under a second for a few cents. We release AIForge-Doc v2, a paired dataset of 3,066 GPT-Image-2 document forgeries with pixel-precise masks in DocTamper-compatible format, and benchmark four lines of defence: human inspectors (N=120, n=365 pair-votes via the public 2AFC site), TruFor (generic forensic), DocTamper (qcf-568,