Why ThatScanner?

Most AI tools guess. We trace. Here is how it works, in plain English, with no jargon left unexplained.

The problem with AI document analysis

Tools like ChatGPT and Claude are impressive. You can paste in a few pages of text and ask questions about it. But they have a fundamental limitation that most people do not know about: they can only look at a small amount of text at once.

Think of it like a reading window. Popular AI chat plans typically max out at roughly 150-200 pages in that window. A congressional bill can be 1,800 pages. An insurance policy can be 500. A regulatory filing can be thousands. The AI literally cannot see the whole document. It will either refuse, cut your document off partway through without telling you, or make up an analysis based on whatever fragment it did manage to read.

What does "make up" mean? AI models are trained to always sound confident, even when they do not have enough information. If you give an AI half of a contract, it will analyze that half and present its findings as if it read the whole thing. It will not tell you it missed the other half. This is called hallucination -- when an AI generates something that sounds correct but is not based on real information.

But there is a second problem that is even more dangerous: AI tools cannot cross-reference.

In any legal document, what matters is not just what a clause says -- it is how it interacts with every other clause. Section 4(a) on page 12 might grant you a right. Section 9(c) on page 87 might quietly take it away. A human lawyer catches this because they read the whole document and hold it all in their head. AI tools cannot do this. They read text like a conversation, not like a legal document.

This means AI tools miss the most dangerous things: contradictions, overrides, and cross-references between distant sections. These are exactly the things that matter most in contracts, bills, insurance policies, and regulatory filings.

How we solve it

ThatScanner is powered by a technology called Atlas. Atlas is a retrieval engine -- software that reads, indexes, and traces connections across documents of any size. Here is what that means:

Step 1: Your document is broken into pieces. When you upload a document, Atlas splits it into small, overlapping chunks. Each chunk is a few paragraphs long. The chunks overlap so that nothing falls between the cracks -- if a sentence spans two chunks, it appears in both.
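To make the chunking step concrete, here is a minimal Python sketch of overlapping windows. It splits on words, and the default sizes (200-word windows with a 50-word overlap) and the `chunk_text` name are illustrative assumptions, not Atlas's actual settings.

```python
def chunk_text(words: list[str], chunk_size: int = 200, overlap: int = 50) -> list[list[str]]:
    """Split a list of words into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the document
    return chunks

words = "Either party may terminate this Agreement on 30 days written notice".split()
for chunk in chunk_text(words, chunk_size=6, overlap=2):
    print(" ".join(chunk))
```

Because consecutive windows share `overlap` words, any run of words no longer than the overlap that straddles a boundary still appears whole in at least one chunk.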

Step 2: Every piece is indexed. Atlas builds a searchable index of your entire document. Every section number, every clause reference, every defined term is catalogued. This is similar to the index at the back of a textbook, but built automatically and far more detailed.
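The indexing step can be pictured as a lookup table keyed by section label. The sketch below assumes sections start on their own line as `Section 4(a)` or `SEC. 204`, and that definitions follow a `"Term" means` pattern. Both regular expressions and the `build_index` helper are hypothetical simplifications; real bills and contracts format headings in many more ways.

```python
import re

# Hypothetical heading format: a line that starts with "Section 4(a)" or "SEC. 204".
HEADING = re.compile(r"^(?:Section|SEC\.)\s+(\d+(?:\([a-z0-9]+\))*)", re.MULTILINE)
# Hypothetical definition format: '"Provider" means ...'
DEFINITION = re.compile(r'"([^"]+)"\s+means\b')

def build_index(document: str) -> dict[str, dict[str, str]]:
    """Index every section by its label and every defined term by the section that defines it."""
    sections: dict[str, str] = {}
    terms: dict[str, str] = {}
    headings = list(HEADING.finditer(document))
    for i, match in enumerate(headings):
        # A section runs from its heading to the next heading (or the end of the document).
        end = headings[i + 1].start() if i + 1 < len(headings) else len(document)
        label = match.group(1)
        sections[label] = document[match.start():end].strip()
        for term in DEFINITION.findall(sections[label]):
            terms[term] = label
    return {"sections": sections, "terms": terms}

contract = (
    'Section 1(a) "Provider" means Acme Corp.\n'
    "Section 4(a) Customer may cancel at any time.\n"
    "Section 4(b) Fees are due monthly.\n"
)
index = build_index(contract)
print(index["sections"]["4(a)"])  # Section 4(a) Customer may cancel at any time.
print(index["terms"])             # {'Provider': '1(a)'}
```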

Step 3: Cross-references are traced. When Section 12(b) says "notwithstanding the provisions of Section 4(a)," Atlas does not guess what Section 4(a) says. It retrieves the exact text of Section 4(a) and includes it alongside Section 12(b) for analysis. This is called deterministic retrieval -- "deterministic" means the same reference always resolves to the same exact text, every time, rather than to a best guess.
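Here is a minimal sketch of tracing, assuming the same hypothetical `Section 4(a)` citation format. Each citation found in a section's text becomes an exact dictionary lookup, so the result is either the cited section's text or an explicit "missing" entry. The `trace_references` name and the sample clauses are illustrative, not Atlas's API.

```python
import re

# Hypothetical citation format: "Section 4(a)" anywhere in running text.
REFERENCE = re.compile(r"\bSection\s+(\d+(?:\([a-z0-9]+\))*)")

def trace_references(label: str, sections: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Resolve each citation in one section to the exact text of the cited section."""
    found: dict[str, str] = {}
    missing: list[str] = []
    for ref in REFERENCE.findall(sections[label]):
        if ref == label or ref in found or ref in missing:
            continue  # skip the section's own heading and repeated citations
        if ref in sections:
            found[ref] = sections[ref]  # exact key lookup: no ranking, no guessing
        else:
            missing.append(ref)         # cited but not in the index: report it
    return found, missing

sections = {
    "4(a)": "Section 4(a) Customer may terminate on 30 days' notice.",
    "4(b)": "Section 4(b) Provider may terminate on 90 days' notice.",
    "12(b)": "Section 12(b) Notwithstanding the provisions of Section 4(a), "
             "termination requires Provider consent. See also Section 15.",
}
found, missing = trace_references("12(b)", sections)
print(list(found))  # ['4(a)']
print(missing)      # ['15']
```

In this sketch, a citation to a section that is not in the index is reported rather than replaced with the closest match, since a dangling reference is itself worth flagging.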

Why does "deterministic" matter? Most AI search tools use something called embeddings -- a way of finding text that is "similar" to what you are looking for. This works well for general questions, but it fails for legal and regulatory documents. "Section 4(a)" and "Section 4(b)" have nearly identical embeddings, but they could say completely different things. For cross-references, Atlas does not use similarity. It uses exact matching. Section 4(a) always returns Section 4(a). Not 4(b). Not something that looks close enough.
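The difference between similarity and exact matching shows up even with a toy measure. The sketch below uses Python's `difflib.SequenceMatcher` string ratio as a stand-in for embedding similarity (it is not an embedding model). It illustrates the risk described above: a similarity search hands back whatever looks closest, even when the section you asked for is missing from the results.

```python
from difflib import SequenceMatcher
from typing import Optional

# Toy index in which Section 4(a) is absent, e.g. because its chunk was never retrieved.
sections = {
    "4(b)": "Provider may terminate on 90 days' notice.",
    "9(c)": "Customer waives all termination rights.",
}

def similarity_lookup(query: str, index: dict[str, str]) -> str:
    """Return whichever label looks most like the query; it never answers 'not found'."""
    return max(index, key=lambda label: SequenceMatcher(None, query, label).ratio())

def exact_lookup(query: str, index: dict[str, str]) -> Optional[str]:
    """Return the section text only when the label matches exactly."""
    return index.get(query)

print(SequenceMatcher(None, "Section 4(a)", "Section 4(b)").ratio())  # 0.9166..., nearly identical
print(similarity_lookup("4(a)", sections))  # 4(b): a confident wrong answer
print(exact_lookup("4(a)", sections))       # None: an honest miss
```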

Step 4: The AI analyzes with full context. After Atlas has indexed and cross-referenced your document, it feeds each section to the AI along with every related cross-reference. The AI never has to guess what another section says because Atlas has already retrieved it. The AI's job is analysis and judgment. Atlas's job is retrieval and accuracy.
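Step 4 amounts to assembling a labeled bundle for each section before the AI sees it. The sketch below follows citations breadth-first up to a fixed number of hops. The text does not say how many hops Atlas follows, so `max_depth`, the bracketed labels, and the `build_context` name are all assumptions for illustration.

```python
import re
from collections import deque

# Hypothetical citation format: "Section 4(a)" anywhere in running text.
REFERENCE = re.compile(r"\bSection\s+(\d+(?:\([a-z0-9]+\))*)")

def build_context(label: str, sections: dict[str, str], max_depth: int = 2) -> str:
    """Bundle a section with the sections it cites, following citations up to max_depth hops."""
    order = [label]
    seen = {label}
    queue = deque([(label, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow citations any further from here
        for ref in REFERENCE.findall(sections[current]):
            if ref in sections and ref not in seen:
                seen.add(ref)
                order.append(ref)
                queue.append((ref, depth + 1))
    parts = [f"[Section {label} - under review]\n{sections[label]}"]
    parts += [f"[Section {ref} - cited]\n{sections[ref]}" for ref in order[1:]]
    return "\n\n".join(parts)

sections = {
    "2": 'Section 2 "Term" means the twelve months after signing.',
    "4(a)": "Section 4(a) Customer may cancel during the Term defined in Section 2.",
    "12(b)": "Section 12(b) Notwithstanding Section 4(a), cancellation requires Provider consent.",
}
print(build_context("12(b)", sections))
```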

This separation is key. The AI is good at understanding language and forming opinions. It is bad at finding specific text across thousands of pages. Atlas is the opposite: it does not understand language, but it will never fail to find the exact clause you need. Together, they cover each other's weaknesses.

What this looks like in practice

Say you upload a 200-page contract. Here is what happens with a regular AI tool versus ThatScanner:

ChatGPT / Claude / Other AI

You paste in the contract. The tool can only see about 150 pages, so the last 50 pages are silently cut off.

It finds some concerning clauses but has no way to check if they are overridden or modified elsewhere in the document.

It misses an auto-renewal clause on page 187 because that page was cut off.

It presents its analysis confidently, as if it read everything.

You sign the contract thinking it was fully reviewed. It was not.

ThatScanner + Atlas

You upload the contract. Atlas indexes all 200 pages. Nothing is cut off.

Every clause reference is traced to its source. If Clause 8 modifies Clause 3, both are analyzed together.

The auto-renewal on page 187 is flagged as a red finding with the exact quote and a suggested replacement clause.

Every finding cites the exact section and page number. You can verify anything it says.

You see every risk in the contract, including the ones buried 187 pages deep.

What does Atlas actually find?

Every scan produces a graded report with specific findings. Here are real examples of the kinds of things Atlas catches across different document types:

In contracts:

Auto-renewal traps -- A clause on page 14 that automatically renews your agreement for another 3 years unless you send written notice 90 days before expiration. Most people miss the 90-day window.
Liability shifts -- Language that makes you responsible for things the other party should be covering. "Indemnify and hold harmless" sounds like standard legal boilerplate, but the scope of what you are agreeing to indemnify matters enormously.
Cost escalation -- A monthly rate of $228 sounds manageable until Atlas calculates the total over a 25-year term: $68,400. Add a 3% annual escalator and the real total climbs to nearly $100,000.
Missing protections -- Standard clauses that should be in a contract but are not. No limitation of liability, no warranty, no data protection clause. If a protection is missing, Atlas flags it and tells you what language to request.
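The escalator arithmetic is easy to check. This sketch assumes the first year is billed at the base rate and each later year costs 3% more than the one before. Under that assumption the 25-year total lands just under $100,000; if the first increase applied immediately, it would pass $100,000.

```python
def total_cost(monthly: float, years: int, annual_escalator: float = 0.0) -> float:
    """Total paid over the term when the monthly rate rises by `annual_escalator` each year after the first."""
    return sum(monthly * 12 * (1 + annual_escalator) ** year for year in range(years))

flat = total_cost(228, 25)              # 228 * 12 * 25 = 68,400
escalated = total_cost(228, 25, 0.03)   # geometric series: 2,736 * (1.03**25 - 1) / 0.03
print(f"Flat: ${flat:,.0f}")
print(f"With 3% escalator: ${escalated:,.0f}")
```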

In congressional bills:

Buried provisions -- A defense spending bill that includes an unrelated provision on page 1,400 affecting consumer privacy. Atlas finds it because it indexes every section, not just the ones with relevant-sounding headings.
Who pays vs. who benefits -- A tax incentive that sounds good for small businesses but is structured so that only companies with over $50 million in revenue can qualify. Atlas traces the eligibility requirements across multiple sections.
Contradictions -- Section 204 grants a right. Section 1108 takes it away with a "notwithstanding" clause. These are 900 pages apart. No human reads both in the same sitting. Atlas reads both in the same second.

In insurance policies:

Coverage gaps -- Your policy says it covers water damage, but the exclusions section (40 pages later) excludes "ground water, surface water, and water that backs up through sewers or drains." That eliminates most real-world water damage scenarios.
Claim killers -- Requirements buried in the conditions section that, if you fail to follow them (like reporting within 24 hours), void your entire claim.

Every finding includes the exact quote from the document, the section and page number, a severity rating, and -- for anything flagged as a risk -- specific replacement language you can propose. You can verify every finding against the original document.

Why AI is the right tool for this

There is a lot of hype around AI, and a lot of skepticism, much of it justified. AI is not good at everything. It is not a magic solution. But document analysis is one of the things AI is genuinely, measurably excellent at. Here is why.

AI is very good at:

Reading and understanding English -- It can parse complex legal language, identify obligations and rights, and explain them in plain English. This is what large language models were built to do.
Pattern recognition -- It can spot structures it has seen before. "This clause looks like an arbitration clause that waives your right to a class action" is exactly the kind of pattern matching AI excels at.
Consistency -- It applies the same level of scrutiny to page 1,800 that it applies to page 1. It does not get tired, bored, or distracted.

AI is bad at:

Finding specific information in large documents -- If you ask an AI "what does Section 4(a) say?" and Section 4(a) is not in its reading window, it will either make something up or tell you it cannot find it. This is the retrieval problem.
Keeping track of everything at once -- An AI cannot hold an 1,800-page document in its head any more than you can memorize a phone book.

This is why Atlas exists. Atlas handles what AI is bad at (retrieval) so the AI can focus on what it is good at (analysis). The AI never has to search for anything. Atlas has already found it. The AI never has to remember what Section 4(a) says. Atlas has already retrieved it and placed it right next to the section that references it.

This separation of responsibilities is what makes the system reliable. Neither Atlas nor the AI could do this alone. Together, they cover each other's weaknesses perfectly.

What humans miss

A careful human reader can get through about 50 pages per hour when reading legal text closely. That means:

200-page contract -- 4 hours
500-page insurance policy -- 10 hours
1,800-page NDAA -- 36 hours
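The reading-time figures follow from a single division. A quick check using the 50-pages-per-hour estimate above (the `reading_hours` helper is just for illustration):

```python
PAGES_PER_HOUR = 50  # careful reading pace for legal text, from the estimate above

def reading_hours(pages: int, pages_per_hour: int = PAGES_PER_HOUR) -> float:
    return pages / pages_per_hour

for name, pages in [("200-page contract", 200), ("500-page insurance policy", 500), ("1,800-page NDAA", 1800)]:
    print(f"{name}: {reading_hours(pages):g} hours")
```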

Nobody reads an 1,800-page bill cover to cover. Not even the legislators who vote on it. They rely on summaries prepared by staff, lobbyists, and advocacy groups -- each with their own agenda and their own blind spots.

Even when humans do read carefully, they face problems that have nothing to do with intelligence:

Fatigue -- Attention declines after the first hour. By page 300, even experienced lawyers start skimming.
Distance -- A cross-reference between page 42 and page 1,400 requires holding information in your head across hours of reading. Humans are not built for this.
Bias -- Humans find what they are looking for. If you are reviewing a contract to check the payment terms, you might miss the non-compete clause because you were not looking for it.
Cost -- A lawyer reviewing a 200-page contract at $400/hour bills $1,600 minimum. A lobbyist briefing you on the NDAA costs $5,000+. These costs put professional document review out of reach for most people.

ThatScanner is not a replacement for a lawyer. It is a second opinion that catches things humans miss, and a first line of defense for people who cannot afford professional review. Your lawyer should absolutely still review your contract. But now you can walk into that meeting already knowing every red flag in the document, having paid $3-$5 instead of $1,600.

How Atlas handles massive documents

A 1,800-page congressional bill contains roughly 900,000 words. That is far more than most AI models can take in at once. So how does Atlas handle it?
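A rough token count shows the scale. Using the common rule of thumb that one token is about 3/4 of a word, 900,000 words come to about 1.2 million tokens. The 200,000-token window below is an assumption for illustration; actual limits vary by model and plan.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb: one token is roughly 3/4 of a word

def estimate_tokens(word_count: int) -> int:
    return math.ceil(word_count / WORDS_PER_TOKEN)

bill_tokens = estimate_tokens(900_000)   # ~1,800 pages at ~500 words per page
window_tokens = 200_000                  # assumed reading-window size
windows_needed = math.ceil(bill_tokens / window_tokens)
print(f"{bill_tokens:,} tokens, about {windows_needed} full windows")
```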

Atlas uses a proprietary compression format called LoreTokens. Think of it like a ZIP file for meaning. A LoreToken captures the important information from a section of text in a fraction of the space, while preserving the details that matter for analysis: section numbers, defined terms, dollar amounts, dates, obligations, and rights.

This means Atlas can hold the structure of a 1,800-page bill in memory while the AI analyzes it section by section, with every cross-reference resolved in real time. The bill never has to fit inside the AI's reading window all at once because Atlas handles the retrieval layer separately.

Plain-English summary: The AI reads your document in pieces, but it is never confused about what it has not read yet because Atlas has already indexed the whole thing and can instantly pull up any section the AI needs. It is like having a research assistant who has read the entire document and can hand you the exact paragraph you need in under a second.

Is there a size limit?

No. Atlas does not have a maximum document size. It processes documents in pieces, so the total size does not matter. A 10-page lease and a 10,000-page regulatory archive go through the same pipeline. The only difference is time -- a bigger document takes longer to index and analyze, but nothing is skipped or cut off.

We have scanned the full 1,800-page National Defense Authorization Act (NDAA) -- one of the largest and most complex pieces of legislation produced by the U.S. Congress each year. Atlas indexed every section, traced every cross-reference, and delivered a complete analysis. The same engine handles 3-page terms of service agreements.

Most AI tools fail at around 150-200 pages because everything has to fit inside the model's reading window at once. Atlas does not work that way. The document lives in the index. The AI only sees the pieces it needs for each finding, with all relevant cross-references already attached. The reading window never overflows because Atlas manages what goes into it.
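One way to picture "Atlas manages what goes into it" is a fixed token budget filled in priority order. In the sketch below, the section under review always goes in first, related sections are added while they fit, and anything that does not fit is returned for another pass rather than silently dropped. The budget, the 4-tokens-per-3-words estimate, and the `pack_window` helper are assumptions; the source does not describe Atlas's actual packing rules.

```python
import math

def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 tokens for every 3 words.
    return math.ceil(len(text.split()) * 4 / 3)

def pack_window(focus: str, related: list[str], budget: int) -> tuple[list[str], list[str]]:
    """Fill a token budget with the focal section first, then related sections in priority order."""
    used = estimate_tokens(focus)
    if used > budget:
        raise ValueError("focal section exceeds the budget; split it into smaller chunks first")
    packed, deferred = [focus], []
    for piece in related:
        cost = estimate_tokens(piece)
        if used + cost <= budget:
            packed.append(piece)
            used += cost
        else:
            deferred.append(piece)  # kept for a later pass instead of being silently dropped
    return packed, deferred

focus = "Section 12(b) Notwithstanding Section 4(a), cancellation requires consent."
related = ["Section 4(a) Customer may cancel at any time.", "Section 2 " + "filler " * 200]
packed, deferred = pack_window(focus, related, budget=100)
print(len(packed), len(deferred))  # 2 1
```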

In practical terms: If you can upload it, we can scan it. The current upload limit is 20MB, which covers virtually any PDF or Word document you would encounter in business, legal, or government contexts. If you have something larger, contact us.

How the grading system works

Every scan produces a letter grade from A to F, a set of category scores, and a list of individual findings. Here is what each part means:

The letter grade (A through F):

A -- Fair, balanced document. Few or no concerns.
B -- Mostly fair with some unfavorable terms worth negotiating.
C -- Noticeably tilted against you. Several clauses need revision.
D -- Heavily one-sided. Significant risks. Negotiate hard or walk away.
F -- Predatory terms. Do not sign. If you already signed, check your cancellation rights immediately.

Finding severity (color-coded):

Red -- This clause significantly harms you. Do not sign without changing it.
Yellow -- Unfavorable or risky. Should negotiate but not necessarily a dealbreaker.
Blue -- Worth understanding. Not harmful but you should be aware of it.
Green -- This clause protects you. It is good for your interests.
Gray -- Standard boilerplate. Neutral.

Category scores (0-100): Each scanner has category scores specific to its document type. For contracts, these include Fairness Balance, Liability Exposure, Termination Rights, Financial Terms, Legal Protections, and Hidden Risks. A score of 100 means that category is fully in your favor. A score of 0 means it is entirely stacked against you. These scores give you a quick visual snapshot before you dive into the individual findings.

Every finding is verifiable. Each one includes the exact quote from your document, the section or page number where it appears, and -- for red and yellow findings -- specific replacement language you can propose. We do not give you vague warnings like "be careful with this section." We give you the exact clause, why it is a problem, and what to change it to.
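The report format described above maps naturally onto a small record type. This sketch encodes the five severity colors and enforces the rule that red and yellow findings carry replacement language. The `Finding` class, its field names, and the `verify` check (a verbatim substring test against the source) are hypothetical, not ThatScanner's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Severity(Enum):
    RED = "significantly harms you"
    YELLOW = "unfavorable or risky"
    BLUE = "worth understanding"
    GREEN = "protects you"
    GRAY = "standard boilerplate"

@dataclass(frozen=True)
class Finding:
    quote: str                         # exact text copied from the document
    location: str                      # section or page number where the quote appears
    severity: Severity
    explanation: str                   # why it matters, in plain English
    replacement: Optional[str] = None  # proposed language for red and yellow findings

    def __post_init__(self) -> None:
        if self.severity in (Severity.RED, Severity.YELLOW) and not self.replacement:
            raise ValueError("red and yellow findings must include replacement language")

    def verify(self, document: str) -> bool:
        """Check that the quoted text appears verbatim in the source document."""
        return self.quote in document

document = "Section 9(c). This Agreement renews automatically for successive three-year terms."
finding = Finding(
    quote="renews automatically for successive three-year terms",
    location="Section 9(c)",
    severity=Severity.RED,
    explanation="Auto-renewal locks you in for another three years.",
    replacement="This Agreement ends at the close of the initial term unless both parties renew in writing.",
)
print(finding.verify(document))  # True
```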

Who uses this

ThatScanner is designed for anyone who needs to understand a document but does not have the time, expertise, or budget to review it manually. Some examples:

Individuals

A homeowner reviewing a solar installation contract before signing. A renter checking their lease for hidden fees. A consumer making sure their insurance policy actually covers what they think it covers. A job seeker optimizing their resume before applying.

Small businesses

A business owner reviewing a vendor agreement before committing to a 3-year term. A startup checking an investor term sheet before signing away equity. A freelancer reviewing a client contract to make sure they retain their intellectual property.

Legal and HR professionals

A law firm doing first-pass discovery on a box of documents, narrowing 500 pages down to the 20 that matter. An HR team screening 200 resumes in a morning instead of a week. A compliance officer reviewing regulatory filings before submission.

Political offices and advocacy groups

A congressional staffer tracking 40 bills per legislative session. A policy analyst who needs to understand a 1,800-page defense bill before tomorrow's vote. A citizen advocacy group that wants to know what a bill actually does before they support or oppose it.

Six of our scanners are completely free -- including resume scanning, credit reports, lease agreements, medical bills, terms of service, and HOA bylaws. We keep these free because the people who need them most are often the people who can least afford to pay for professional review.

Your documents stay private

When you paste a document into ChatGPT, it goes to OpenAI's servers. When you use Google's AI tools, it goes to Google. You are handing your contracts, financial documents, and legal filings to the largest data companies in the world.

ThatScanner processes your documents on our own infrastructure. Your files are not sent to OpenAI, Google, Anthropic, or any third party. The AI model that analyzes your document runs on our own servers. After the analysis is complete, your document is discarded. We do not store it. We do not train on it. We do not sell it.

This is not a privacy policy checkbox. It is an architecture decision. The AI model, the Atlas engine, and your document are all in the same place at the same time. Nothing leaves.

Same engine, different domain

ThatScanner is built by the same team behind ShipItClean.com -- an AI-powered code security scanner used by software development teams.

ShipItClean uses the exact same Atlas engine to scan software codebases for security vulnerabilities. The same deterministic retrieval that traces Section 4(a) to Section 9(c) in a contract also traces a function call on line 42 to a database query on line 3,800 in source code. Same technology, different documents.
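For source code, the equivalent of tracing a clause citation is tracing a call to its definition. The sketch below does this for Python using the standard `ast` module. It is only an analogy for the idea and makes no claim about how ShipItClean or Atlas implements it; it matches calls by plain name only, so method calls such as `db.run_query(...)` are out of scope.

```python
import ast

def trace_calls(source: str, function_name: str) -> dict[str, list[int]]:
    """Report the line(s) where a function is defined and every line that calls it by name."""
    defined, called = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == function_name:
            defined.append(node.lineno)
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
              and node.func.id == function_name):
            called.append(node.lineno)
    return {"defined": sorted(defined), "called": sorted(called)}

code = (
    "def run_query(sql):\n"
    "    return sql\n"
    "\n"
    "def handler(user_input):\n"
    "    return run_query(user_input)\n"
)
print(trace_calls(code, "run_query"))  # {'defined': [1], 'called': [5]}
```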

Think of Atlas as the engine and ThatScanner and ShipItClean as different cars built on the same chassis. The engine does not care whether it is reading a legal contract or a software program. It indexes, traces references, and retrieves exact matches. The scanners on top decide what to look for: risky clauses in one case, security vulnerabilities in the other.

If our technology is accurate enough to find security flaws that could compromise software used by millions of people, it is accurate enough to find the clause in your contract that could cost you thousands of dollars.

We run our own AI

Most AI products are a wrapper around someone else's technology. They send your data to OpenAI, Google, or Anthropic, get a response back, and present it as their own. This means they are paying per query, they have no control over the model, and your data passes through a third party.

We do not do that. We run our own AI models on our own hardware.

This is an important distinction for three reasons:

Privacy -- Your documents never leave our servers. There is no API call to OpenAI. No data sent to Google Cloud. The model, the retrieval engine, and your document are all running on the same machine. When the scan is done, the document is discarded.
Control -- We choose which models to run, how to configure them, and how to optimize them for document analysis specifically. Companies that rely on the OpenAI API are at the mercy of whatever model OpenAI decides to ship next. We are not.
Cost -- Running our own models means our costs are fixed infrastructure, not per-query API fees. This is how we keep six scanners completely free and keep paid scans at a fraction of what a lawyer would charge.

What does "run our own AI" actually mean? We operate physical and cloud servers with GPUs (specialized processors designed for AI workloads). The AI models are installed directly on these servers. When you upload a document, everything happens on our infrastructure -- the document is processed, analyzed, and the results are generated without any external service being contacted. It is the difference between cooking in your own kitchen and ordering delivery.

This also means we are not locked into a single AI provider's limitations. If a better open-source model is released tomorrow, we can deploy it next week. If an AI company changes their terms of service to train on customer data, it does not affect us. We control the entire stack from hardware to results page.

Proof: We scanned the entire NDAA

The National Defense Authorization Act (NDAA) for Fiscal Year 2026 is one of the largest and most complex pieces of legislation the U.S. Congress produces. It is over 1,800 pages of dense legal text covering military spending, personnel policy, cybersecurity, foreign affairs, and hundreds of other topics.

We scanned the entire thing. Every page. Every section. Every cross-reference.

1,800+ pages scanned. Not summarized. Not sampled. Scanned.

~900,000 words indexed by Atlas, covering every section, subsection, and cross-reference.

Atlas indexed the entire bill, traced cross-references across 1,800 pages, and produced a complete analysis with graded findings from multiple perspective roles -- citizen, business owner, military personnel, government official, and more. Each perspective reveals different impacts from the same legislation.

You can verify this yourself:

Download our full NDAA analysis: shipitclean.com/congress/ndaa-fy2026-results
Read the source bill on Congress.gov: S.1071 -- National Defense Authorization Act for Fiscal Year 2026

Compare our findings against the source text. Every finding cites the exact section and page. Every cross-reference can be verified. This is what deterministic retrieval looks like at scale -- not a summary, not a best guess, but a traceable, verifiable analysis of every provision in an 1,800-page bill.

An open challenge

We have a standing challenge to OpenAI, Google, Anthropic, Meta, and every other company building AI:

Take the 1,800-page NDAA.

Scan every page.

Trace every cross-reference.

Produce a graded, verifiable report.

We already did it. Your move.

ChatGPT cannot do it. Claude cannot do it. Gemini cannot do it. Not because these are bad products -- they are excellent at what they do. But their architecture was not designed for this. They are conversation tools. They are built to chat, answer questions, and generate text. They are not built to index 900,000 words, trace cross-references across 1,800 pages, and produce a deterministic, verifiable analysis.

These companies have tens of billions of dollars in funding, thousands of engineers, and the most powerful AI models ever built. We are a small team with a fraction of their resources. And we can do something they cannot.

That is not because we are smarter. It is because we built a different thing. They built conversation engines. We built a retrieval engine. Atlas does not compete with GPT or Claude -- it does something they were never designed to do. And until they build their own retrieval layer that matches Atlas, that gap will remain.

The results are public. The source bill is public. The challenge is open.

Glossary

Terms used on this page, explained simply:

Atlas -- Our retrieval engine. Software that reads, indexes, and traces connections across documents. It is the core technology behind both ThatScanner and ShipItClean.

Deterministic retrieval -- A way of finding information that gives the exact right answer every time. If you ask for Section 4(a), you get Section 4(a). Not something similar. Not a best guess. The real thing.

Hallucination -- When an AI generates information that sounds correct but is not based on real data. Like a student writing a confident book report about a book they only read half of.

LoreTokens -- Our proprietary compression format. A way of compressing documents so that Atlas can hold their entire structure in memory without losing the details that matter for analysis.

Embeddings -- A technique most AI tools use to find "similar" text. Works well for general questions, but fails when exact matches matter (like legal section numbers). Atlas does not rely on embeddings for cross-referencing.

Tokens -- The unit AI models use to measure text. One token is roughly 3/4 of a word, or about 4 characters. A dense page of around 500 words comes to roughly 650-700 tokens. Scan costs are calculated in tokens.

Cross-referencing -- The process of connecting related sections within a document. When Section 12 says "subject to Section 4," cross-referencing means pulling up Section 4 so they can be analyzed together.

See it for yourself

Upload a resume or credit report for free -- no account needed. See how Atlas analyzes a real document. Then decide if you trust it with the important ones.
