Two-Path De-identification Guide for AI-Assisted Analysis

AAIDD 2026 Annual Meeting — Doug Kerwin, VillageMetrics

This page is take-home material from a poster presentation at the AAIDD 2026 Annual Meeting in Chicago. AAIDD — the American Association on Intellectual and Developmental Disabilities — is the field's primary professional society for clinicians, researchers, and educators. The resources below extend that conference work to anyone applying AI in IDD practice. — Doug Kerwin, Founder, VillageMetrics.

The HIPAA bar is the same for everyone. A solo BCBA, a private OT, a special educator using AI on her caseload, and a 5,000-bed hospital system all operate under the same HIPAA Privacy Rule and Security Rule. There is no lighter standard for small practices. The Security Rule's flexibility-of-approach provision (45 CFR 164.306(b)) lets you scale how you implement controls to your size and resources; it does not scale the standard itself.

To use AI with patient or client material, you need one of two things:

A Business Associate Agreement (BAA) with the AI vendor, or
Data that genuinely meets HIPAA Safe Harbor — all 18 identifiers removed, plus nothing left in the narrative that could re-identify the person.

"Mostly de-identified" is not a HIPAA standard. The middle ground does not legally exist.

This guide walks you through both paths so you can choose the one that fits your workflow.

Path 1 — BAA-Covered Tools

If you sign a BAA with the vendor, you can use AI normally for treatment, operations, and quality work without per-paste de-identification. You still apply minimum-necessary judgment about what to share, but you don't carry the redaction burden each time.

Tools that offer BAAs (verify each vendor's current terms before relying on this):

Healthcare-specific AI wrappers — products built for clinical use that ship with a BAA out of the box. Examples include BastionGPT and Doximity GPT. Per-seat pricing, no minimum seat count, self-serve sign-up. This is the realistic option for solo practitioners and small practices — the only path on this list that doesn't require an institutional sales engagement.
ChatGPT Enterprise — OpenAI's institutional tier. Requires a sales-managed account, a 150-seat minimum, and an annual contract. Sign a separate Healthcare Addendum before processing PHI. The ChatGPT Team and ChatGPT Plus plans do NOT qualify for a BAA, despite running on the same underlying model.
Anthropic Claude Enterprise — Anthropic's institutional tier. Sales-assisted plan with a 20-seat minimum; BAA is click-to-accept in admin settings once you're on the plan. The Claude Free, Pro, Max, and Team plans do NOT qualify for a BAA.

Sources for the above (verified May 2026): OpenAI BAA Help Center · Anthropic HIPAA-Ready Enterprise Plans · Anthropic BAA for Commercial Customers

Questions to ask before relying on a BAA:

Is the BAA executed for your tier and workspace? Many vendors offer BAAs only at specific tiers.
Is training on your conversation data disabled by default for your account?
Do logging, retention, and breach notification meet your state's additional requirements (CA, NY, MA, TX, etc.)?
Are administrative controls — SSO, audit logs, user management — in place if you are a multi-user practice?

This is the lower-friction path. It typically costs more per seat than a consumer plan, but it removes the per-use de-identification burden and is easier to defend to a compliance officer or regulator.

Path 2 — De-identify, Then Use Consumer AI

If you choose to use a consumer AI tool, you are personally responsible for ensuring the text genuinely meets Safe Harbor before you paste. The workflow is two steps: a tool that does the bulk redaction, then a focused human review that catches what the tool can't.

Step 1 — Run the text through a local-only redaction tool

The tool removes the common personally identifiable information (PII) — names, dates, SSNs, addresses, phone, email, URLs, and the other HIPAA Safe Harbor identifiers that follow predictable patterns. That covers most of the 18, but not all — Step 2 below closes the gap. Recommended:

CamoText (camotext.ai) — Mac and Windows, fully offline. Built by a law firm; generic PII focus. Good balance of platform coverage and offline guarantee.
Lacuna (lacunaapp.com) — Mac only (Apple Silicon, macOS 14+). Uses an on-device LLM, which can catch some context-dependent identifiers that pure pattern-matching misses.

Why local-only matters. Uploading PHI to a cloud-based redaction service is itself a HIPAA-relevant disclosure: you'd be back to needing a BAA with the redaction vendor. Both tools above are explicitly designed to run with no network connection during redaction. If you want belt-and-suspenders verification, you can confirm this on first run with a network-monitoring tool (Little Snitch on Mac, the firewall log on Windows).

What the tool reliably catches:

Names — first, last, family-member names
Geographic identifiers — addresses, cities, ZIP codes
Dates related to the individual — birth, admission, all ages 90+
Phone, fax, email, URLs, IP addresses
SSN, account numbers, license numbers
Vehicle and device identifiers
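
To make "predictable patterns" concrete, here is a minimal Python sketch of pattern-based redaction. It is not how CamoText or Lacuna work internally; the patterns, labels, and the function name redact_patterns are illustrative assumptions, and the sketch does not attempt names or street addresses, which need more than fixed patterns. Do not use it as a substitute for a real redaction tool.

```python
import re

# Illustrative patterns only (assumed for this sketch); real tools cover
# many more formats and edge cases than these six.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("URL", re.compile(r"\bhttps?://\S+", re.IGNORECASE)),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("PHONE", re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
]


def redact_patterns(text: str) -> str:
    """Replace each pattern match with a bracketed label such as [PHONE]."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


sample = (
    "Call (555) 010-4477 or email jane.doe@example.org "
    "about the 03/14/2019 visit. SSN 123-45-6789."
)
print(redact_patterns(sample))

# A context-dependent reference has no fixed pattern, so it passes through.
print(redact_patterns("the day program on Oak Street"))
```

The second call shows the limitation the rest of this guide addresses: "the day program on Oak Street" contains nothing a fixed pattern can recognize, so it comes back unchanged.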

What the tool will reliably miss: the context-dependent identifiers covered in Step 2. Your second pass is what makes the output meet Safe Harbor.

Step 2 — IDD-Specific Second-Pass Review

Read the tool-redacted text and look specifically for the six categories below. They are predictable, and they are the part consumer-grade tools cannot do for you.

Institution names — schools, districts, ABA agencies, day programs, group homes, specific clinic names. The tool may catch "Westbrook Elementary" if formatted predictably, but "the program he's been at since kindergarten" or "the day program on Oak Street" will pass through. Replace with role-only references: "his elementary school," "his ABA agency," "the day program."
Therapist, teacher, paraprofessional first names that survived the tool. Tools catch most "Jane Smith" patterns but miss embedded references like "when Jenny works with him" or "Ms. Tara on Tuesdays." Replace with role: "his BCBA," "his RBT," "his classroom aide."
Rare diagnoses (especially combined with regional context). Anything below ~1:10,000 prevalence can re-identify on its own — Smith-Magenis, Phelan-McDermid, Pitt-Hopkins, Cri-du-chat, Angelman, ALG13-CDG, etc. Generalize to "a rare genetic syndrome." If the specific diagnosis is load-bearing for the analysis, that's a signal you should be on Path 1, not Path 2.
Family-structure descriptions. "Twin brother at the same school," "older sister with the same diagnosis," "lives with grandmother who is a retired teacher" — these can re-identify even after every name has been removed. Generalize ("a sibling," "another family member") or omit if not load-bearing.
Distinctive incidents — date+venue combinations, anything that hit local news, anything widely discussed in your professional network. "The IEP meeting at the district office last Tuesday" is identifying even after all names are gone.
Town, neighborhood, or program names below the city level. Tools often redact "Boston" but leave "the Northside campus" or "the Davis Square clinic." If a colleague who knows your region could place it, replace it with a generic descriptor.
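
As a reading aid for this second pass, a short Python sketch can highlight phrases that deserve a closer look. The cue lists and the name flag_soft_identifiers are assumptions made for illustration, drawn from the examples in the categories above; they are far from complete. The output is a list of prompts for human judgment, not a verdict, and generic replacements such as "his elementary school" will still be flagged.

```python
import re

# Hypothetical cue lists based on the examples in this guide; extend for your region.
SOFT_IDENTIFIER_CUES = {
    "institution": re.compile(
        r"\b(?:school|district|agency|day program|group home|clinic|campus)\b",
        re.IGNORECASE,
    ),
    "staff name": re.compile(r"\b(?:Ms|Mr|Mrs|Miss|Dr)\.?\s+[A-Z][a-z]+"),
    "rare diagnosis": re.compile(
        r"\b(?:Smith-Magenis|Phelan-McDermid|Pitt-Hopkins|Cri-du-chat|Angelman|ALG13-CDG)\b",
        re.IGNORECASE,
    ),
    "family structure": re.compile(
        r"\b(?:twin|brother|sister|grandmother|grandfather|sibling)\b",
        re.IGNORECASE,
    ),
    "incident timing": re.compile(
        r"\blast\s+(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday|week|month)\b",
        re.IGNORECASE,
    ),
}


def flag_soft_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (category, matched phrase) pairs for a human reviewer to inspect."""
    flags = []
    for category, pattern in SOFT_IDENTIFIER_CUES.items():
        for match in pattern.finditer(text):
            flags.append((category, match.group(0)))
    return flags


note = (
    "Ms. Tara works with him at the day program on Oak Street; "
    "his twin brother has Angelman syndrome."
)
for category, phrase in flag_soft_identifiers(note):
    print(f"{category}: {phrase}")
```

An empty result does not mean the text is de-identified. Bare first names ("when Jenny works with him") and place names such as "Oak Street" have no reliable cue, which is why the colleague test that follows still applies.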

The Final Test. If a colleague who knows your caseload could read your tool-redacted text plus your second-pass edits and still guess who it's about, it is not yet de-identified. This is the "no actual knowledge of remaining re-identification risk" prong of Safe Harbor — and it is your responsibility, not the tool's.

Pre-Flight Check (Both Paths)

Run before every paste, regardless of which path:

Conversation context. If you mentioned identifying details earlier in the same chat, removing them from your next message doesn't help. Start a fresh conversation.
Account configuration. Confirm the AI tool you are using is the BAA-covered tier — not your personal consumer account that happens to be open in another tab.
Training data controls. On paid plans, confirm your conversations are not used for vendor training. (Most enterprise tiers default this off; verify per vendor.)
File metadata. When uploading documents, check that filenames, document properties, and embedded comments do not carry identifiers.
Photos, video, audio. Never paste these into consumer tools regardless of redaction — biometrics are HIPAA identifiers and tools cannot reliably scrub images.
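
For a multi-user practice that wants to turn this checklist into a habit, the five checks can be written down as a small Python structure. This is a sketch under assumptions: the PasteContext fields and the preflight_blockers function are hypothetical names, and every answer is supplied by the person doing the paste, not detected automatically.

```python
from dataclasses import dataclass


@dataclass
class PasteContext:
    """Self-reported answers to the pre-flight questions (hypothetical structure)."""
    path: int                          # 1 = BAA-covered tool, 2 = consumer tool
    fresh_conversation: bool           # no identifying details earlier in this chat
    baa_tier_confirmed: bool           # the open account is the BAA-covered one
    training_opt_out_confirmed: bool   # conversations not used for vendor training
    has_attachment: bool
    attachment_metadata_checked: bool  # filename, properties, comments reviewed
    has_photo_video_audio: bool


def preflight_blockers(ctx: PasteContext) -> list[str]:
    """Return the checklist items that still block the paste; empty means go."""
    blockers = []
    if not ctx.fresh_conversation:
        blockers.append("Start a fresh conversation")
    if ctx.path == 1 and not ctx.baa_tier_confirmed:
        blockers.append("Confirm the account is the BAA-covered tier")
    # The guide frames this check for paid plans; applying it to every
    # paste is a simplification made in this sketch.
    if not ctx.training_opt_out_confirmed:
        blockers.append("Confirm conversations are not used for vendor training")
    if ctx.has_attachment and not ctx.attachment_metadata_checked:
        blockers.append("Check filename, document properties, and comments")
    if ctx.path == 2 and ctx.has_photo_video_audio:
        blockers.append("Remove photos, video, and audio")
    return blockers


example = PasteContext(
    path=2,
    fresh_conversation=True,
    baa_tier_confirmed=False,
    training_opt_out_confirmed=True,
    has_attachment=True,
    attachment_metadata_checked=False,
    has_photo_video_audio=True,
)
for item in preflight_blockers(example):
    print("BLOCKED:", item)
```

The function returns every open item instead of stopping at the first one, so a reviewer sees the full list in a single pass.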

When Neither Path Is Enough

Some workflows shouldn't go through either consumer or general-purpose BAA-covered AI:

Real-time clinical documentation tied to identifiable records — use your EHR's native AI features if available, with full BAA coverage
Aggregate analysis across multiple identified clients — this is database-and-pipeline work, not chat-prompt work
Anything saved back into the medical or educational record for clinical decision-making
Anything used for billing, treatment authorization, or formal clinical documentation

These need purpose-built, BAA-covered, audit-logged infrastructure — not chat prompts in a general-purpose tool, even an enterprise one.

A Note for Special Educators

Not everyone in IDD work is a HIPAA-covered entity. School-employed special educators are typically covered by FERPA (Family Educational Rights and Privacy Act), not HIPAA — different law, similar instincts but different specifics. BCBAs in private practice usually are HIPAA-covered. If you're in a school setting, the same two-path framing still works in principle (district-approved AI tool with appropriate data agreement vs. de-identified text in a consumer tool), but the legal authorities and specific requirements differ. Check with your district's privacy officer.

Reference: The 18 HIPAA Safe Harbor Identifiers

For completeness — these are the 18 categories Safe Harbor requires you to remove. A redaction tool should catch most of these reliably; the second-pass review handles the rest plus the IDD soft identifiers above.

Names
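Geographic subdivisions smaller than a state (the first three ZIP digits may remain when that three-digit area contains more than 20,000 people)
All elements of dates, except year, directly related to the individual; all ages over 89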
Phone numbers
Fax numbers
Email addresses
Social Security numbers
Medical record numbers
Health plan beneficiary numbers
Account numbers
Certificate / license numbers
Vehicle identifiers and serial numbers
Device identifiers and serial numbers
Web URLs
IP addresses
Biometric identifiers
Full-face photographs and comparable images
Any other unique identifying number, characteristic, or code — the catch-all that captures most IDD soft identifiers when they re-identify
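
One way to see where the second pass has to carry the load is to compare this list with the "What the tool reliably catches" list in Step 1. The Python sketch below does that comparison. It assumes the Step 1 list is exhaustive, which may not hold for any particular tool, and the names SAFE_HARBOR_IDENTIFIERS, TOOL_COVERED, and manual_review_categories are illustrative only.

```python
# The 18 categories, in the order listed in this guide.
SAFE_HARBOR_IDENTIFIERS = [
    "Names",
    "Geographic subdivisions smaller than a state",
    "Dates related to the individual; ages 90+",
    "Phone numbers",
    "Fax numbers",
    "Email addresses",
    "Social Security numbers",
    "Medical record numbers",
    "Health plan beneficiary numbers",
    "Account numbers",
    "Certificate / license numbers",
    "Vehicle identifiers and serial numbers",
    "Device identifiers and serial numbers",
    "Web URLs",
    "IP addresses",
    "Biometric identifiers",
    "Full-face photographs and comparable images",
    "Any other unique identifying number, characteristic, or code",
]

# 1-based positions that appear in the Step 1 "reliably catches" list.
# Assumption: that list is complete; verify against your own tool.
TOOL_COVERED = {1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15}


def manual_review_categories() -> list[str]:
    """Categories not named in the Step 1 list, left to the human reviewer."""
    return [
        name
        for position, name in enumerate(SAFE_HARBOR_IDENTIFIERS, start=1)
        if position not in TOOL_COVERED
    ]


for name in manual_review_categories():
    print("Review by hand:", name)
```

Under that assumption, five categories fall outside the Step 1 list: medical record numbers, health plan beneficiary numbers, biometric identifiers, full-face photographs, and the catch-all.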

— Doug Kerwin · doug@villagemetrics.com · villagemetrics.com