Medical AI Compliance
Deploying autonomous AI agents in healthcare is high-stakes. Unlike a chatbot that summarises emails, a medical agent might interpret lab results, flag drug interactions, or process insurance claims. A single error can lead to patient harm or massive regulatory fines.
Regulations in this sector focus on Patient Safety (FDA/MDR), Data Privacy (HIPAA/GDPR), and Algorithmic Fairness.
Below are the key regulations governing medical AI deployment. For each, we walk through a real-world scenario, the points where AI agents typically hit compliance walls, and how ContextGate provides a compliant path forward.
Table of Contents
I. Privacy & Security
II. Medical Device
III. Clinical & Ethics
IV. Life Sciences (GxP)
I. PRIVACY & SECURITY
HIPAA
What the Regulation Covers
HIPAA's Privacy Rule protects all "Individually Identifiable Health Information" (PHI) — including names, dates, Medical Record Numbers (MRNs), diagnoses, and more. Its Security Rule mandates administrative, physical, and technical safeguards for any system that stores or transmits PHI. For AI deployments, two provisions are critical: the Minimum Necessary Standard (only access the data the task strictly requires) and the Business Associate Agreement (BAA) requirement (any third-party system handling PHI must sign one).
Real-World Scenario
A GP surgery deploys an AI assistant to help doctors prepare for consultations. Before each appointment, the doctor asks: "Summarise this patient's medical history and list their current medications." The agent reads from the EHR and generates a concise briefing note, saving the doctor several minutes per patient. Straightforward in concept — but deeply problematic without the right guardrails.
Where AI Agents Get Blocked
- ▸PHI transmitted to a public LLM: The agent sends the patient's full record — name, DOB, diagnosis history, medications — to a cloud-hosted LLM with no BAA. This is a reportable data breach, even if the response is never stored.
- ▸Over-fetching violates Minimum Necessary: To find the current medication list, the agent reads the entire 10-year patient record. Regulators consider this a violation even if the extra data is never used in the response.
- ▸No audit trail: If a breach occurs, HIPAA requires you to prove exactly what data was accessed, by whom, and when. A vanilla LLM integration provides none of this.
How ContextGate Helps
- ✦Redaction Firewall: ContextGate sits as a proxy between your EHR and the LLM. Before the prompt leaves your secure environment, it automatically detects and strips all 18 HIPAA identifiers — replacing "John Smith, DOB 12/03/1965, MRN 47291" with "[PATIENT_A]". The LLM reasons on anonymised tokens; ContextGate re-hydrates the final answer locally. The cloud-hosted model never sees real PHI.
- ✦Scoped Tool Access: Rather than letting the agent read the full record, ContextGate restricts it to a `get_current_medications(patient_id)` tool that returns only the medication table. Minimum Necessary is enforced at the infrastructure level, not by hoping the model behaves.
- ✦Immutable Access Logs: Every query, every tool call, and every data access is logged with the requesting user, timestamp, and exact payload — giving you the forensic record HIPAA requires if questions arise later.
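The tokenise-then-rehydrate pattern behind a redaction firewall can be sketched in a few lines. This is an illustrative sketch only, not ContextGate's actual implementation: the `redact`/`rehydrate` names are hypothetical, and the regex patterns stand in for a real detector covering all 18 HIPAA identifier categories.

```python
import re

# Illustrative patterns only — a production firewall uses far more robust
# detection (NER models, dictionaries) across all 18 HIPAA identifiers.
PHI_PATTERNS = {
    "MRN": re.compile(r"\bMRN\s*\d+\b"),
    "DOB": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
    "NAME": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}

def redact(text):
    """Replace detected PHI with opaque tokens; return safe text + mapping."""
    mapping = {}
    counts = {}
    for label, pattern in PHI_PATTERNS.items():
        for match in pattern.findall(text):
            counts[label] = counts.get(label, 0) + 1
            token = f"[{label}_{counts[label]}]"
            mapping[token] = match
            text = text.replace(match, token)
    return text, mapping

def rehydrate(text, mapping):
    """Restore real values locally, after the cloud LLM has responded."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text

prompt = "Summarise history for John Smith, DOB 12/03/1965, MRN 47291."
safe_prompt, mapping = redact(prompt)
# safe_prompt now contains only anonymised tokens; PHI never leaves the
# secure environment, and the mapping stays behind the firewall.
```

The LLM reasons over `[NAME_1]`, `[DOB_1]`, and `[MRN_1]`; `rehydrate` is applied to its answer inside your own infrastructure.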
HITECH Act
What the Regulation Covers
HITECH dramatically strengthened HIPAA enforcement by extending liability directly to Business Associates and by mandating breach notification. If a breach exposes more than 500 records, you must notify affected patients, the Secretary of HHS, and in many cases the media — within 60 days of discovery. HITECH also introduced tiered financial penalties, with wilful neglect carrying fines up to $1.9 million per category per year. For AI systems, the critical implication is that a compromised or manipulated agent is a reportable breach event, not just a technical failure.
Real-World Scenario
A hospital deploys an AI assistant to handle appointment scheduling, follow-up reminders, and basic clinical queries. The agent has read access to patient records and can send messages on behalf of clinical coordinators. A malicious actor crafts a prompt injection — embedding hidden instructions in a patient-submitted form — that tricks the agent into exporting a bulk patient list to an external address.
Where AI Agents Get Blocked
- ▸Prompt injection as a breach vector: An LLM that can read patient data and make outbound calls is a prime target. A successful injection that exfiltrates even a small number of records triggers HITECH's mandatory notification regime.
- ▸Unrestricted data scope: If an agent has broad read access "for convenience," a single successful attack becomes a large-scale breach. An agent that can query any table is an agent that can expose any table.
- ▸No detection capability: Without monitoring on tool calls and output payloads, you may not know the breach occurred until you receive an external complaint — starting the 60-day clock retroactively.
How ContextGate Helps
- ✦Blast Radius Containment: ContextGate enforces hard access policies on every tool the agent can use. The scheduling agent gets `READ_ONLY` access to appointment slots with a row limit of 1 per query. Even if a prompt injection instructs it to dump the full patient database, the proxy blocks it — it physically cannot execute beyond the defined policy scope.
- ✦Output Inspection: ContextGate can scan outbound responses for anomalous patterns — such as bulk lists of names, MRNs, or addresses — and block or flag them before they leave the system.
- ✦Real-Time Alerting: Unusual tool call patterns (e.g., 200 consecutive patient lookups in 30 seconds) trigger immediate alerts, allowing your security team to intervene before a breach becomes a notification obligation.
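Blast radius containment amounts to a hard policy check on every tool call before it executes. The sketch below shows the idea under assumed names (`POLICIES`, `execute_tool`, the `get_appointment_slots` tool); it is not ContextGate's real configuration schema.

```python
# Hypothetical per-tool policy table — an allow-list with hard limits.
POLICIES = {
    "get_appointment_slots": {"mode": "READ_ONLY", "max_rows": 1},
}

class PolicyViolation(Exception):
    """Raised by the proxy; the underlying call is never executed."""

def execute_tool(name, requested_rows, write=False):
    policy = POLICIES.get(name)
    if policy is None:
        # Tools outside the allow-list simply do not exist to the agent.
        raise PolicyViolation(f"tool '{name}' is not in the allow-list")
    if write and policy["mode"] == "READ_ONLY":
        raise PolicyViolation(f"tool '{name}' is read-only")
    if requested_rows > policy["max_rows"]:
        # A prompt-injected "dump everything" request dies here.
        raise PolicyViolation(
            f"tool '{name}' capped at {policy['max_rows']} row(s)")
    return f"OK: {name} returned {requested_rows} row(s)"
```

The key property is that the limit lives in the proxy, not in the prompt: no instruction the model receives can raise `max_rows`.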
GDPR
What the Regulation Covers
Under GDPR, health data is classified as "Special Category Data" — the most sensitive tier. Processing it requires either explicit patient consent or strict necessity under a recognised legal basis. Article 22 adds another dimension for AI specifically: individuals have the right not to be subject to solely automated decisions that produce significant effects on them. Organisations must also honour the Right to Erasure — if a patient requests deletion of their data, it must be fully removed.
Real-World Scenario
A European hospital deploys an AI agent to assist oncology teams — reading patient records, scan reports, and pathology notes to help draft treatment summaries. Six months in, a patient exercises their Right to Erasure. The data protection officer must confirm the patient's data has been completely purged — including from any AI systems that may have processed it.
Where AI Agents Get Blocked
- ▸Fine-tuning on patient data: Training or fine-tuning a model on patient records — even internally — requires explicit consent. Without it, this is an unlawful processing activity regardless of clinical benefit.
- ▸Right to Erasure becomes unenforceable: If patient data has been used to update model weights, "forgetting" the patient becomes technically intractable. You cannot delete a person from a neural network's parameters.
- ▸Sole automated decision-making: An agent that autonomously selects a treatment pathway without a human in the loop likely triggers Article 22 rights, requiring the patient to be able to request human review.
How ContextGate Helps
- ✦Stateless Architecture by Design: ContextGate uses RAG and SQL-based retrieval. The underlying LLM is never trained on patient data — it reads records ephemerally at query time, answers, and discards them. Erasure means deleting the database record. The AI is immediately compliant because no trace was ever baked into model weights.
- ✦Human-in-the-Loop Enforcement: ContextGate's governance layer can require explicit human confirmation before any "significant decision" — flagging outputs as recommendations requiring clinician sign-off rather than autonomous actions, satisfying Article 22.
- ✦Processing Justification Logs: Every data access is logged with the legal basis under which it was retrieved (e.g., "Article 9(2)(h) — provision of healthcare"). This provides the documentation needed during a GDPR audit or supervisory authority investigation.
II. MEDICAL DEVICE
FDA SaMD
What the Regulation Covers
The FDA classifies software as a Medical Device (SaMD) if it is intended to diagnose, treat, mitigate, or prevent disease — or to inform clinical management in a way that could affect patient outcomes. This covers a wider range of AI tools than most developers expect. Once classified, the software must meet Design Controls (21 CFR 820.30), including documented requirements, risk analysis, and software validation. The FDA's AI/ML-based SaMD guidance also requires a Predetermined Change Control Plan — a roadmap for how the system will be updated without losing regulatory clearance.
Real-World Scenario
A clinical team deploys an AI agent to assist with prescribing. A physician asks: "Based on this patient's renal function results and current medications, what is the recommended maximum daily dose of metformin?" The agent reads the lab results and formulates a dosage recommendation. This output directly influences a prescribing decision — making it squarely SaMD territory.
Where AI Agents Get Blocked
- ▸Non-determinism: Ask the same dosage question twice and get different answers. The FDA requires validated, reproducible outputs — a property that standard LLM generation fundamentally lacks.
- ▸No traceable logic: "The neural network said so" is not a valid Design Control justification. Regulators need to see requirement → implementation → test evidence, not a probability distribution over tokens.
- ▸Model updates break clearance: Updating (or your LLM provider silently updating) the underlying model constitutes a device change, potentially invalidating your 510(k) clearance if not pre-approved.
How ContextGate Helps
- ✦Deterministic Clinical Tools: ContextGate lets you define a `calculate_metformin_dose(creatinine_clearance, weight)` SQL tool that runs a validated algorithm. The LLM's role is limited to natural language parsing — the actual clinical calculation is handled by deterministic, version-controlled logic you can validate to Part 820.
- ✦Validation Strategy: You separate the system into two components: (1) the NLP interface — not SaMD, just a UI — and (2) the clinical calculation tools — fully validated and traceable. This hybrid architecture makes FDA clearance feasible without validating the entire LLM.
- ✦Version-Pinned Tool Registry: ContextGate's tool definitions are version-controlled. When a tool is updated, the change is logged and governed — giving you the Predetermined Change Control Plan evidence the FDA requires.
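A deterministic clinical tool looks something like the sketch below. Everything here is illustrative: the function name, version string, and above all the thresholds and doses are placeholders to show the pattern, not clinical guidance — a real tool encodes a clinician-approved, validated algorithm under design controls.

```python
TOOL_VERSION = "1.0.0"  # pinned and logged with every call (change-controlled)

def calculate_max_daily_dose(creatinine_clearance_ml_min):
    """Deterministic lookup: the same input always yields the same output,
    which is the property the FDA's validation requirements demand.

    WARNING: thresholds and doses below are PLACEHOLDERS for illustration,
    NOT clinical guidance.
    """
    if creatinine_clearance_ml_min < 0:
        raise ValueError("creatinine clearance cannot be negative")
    if creatinine_clearance_ml_min < 30:
        return {"dose_mg": 0, "note": "contraindicated (placeholder rule)"}
    if creatinine_clearance_ml_min < 45:
        return {"dose_mg": 1000, "note": "reduced dose (placeholder rule)"}
    return {"dose_mg": 2000, "note": "standard dose (placeholder rule)"}
```

The LLM's only job is to extract `creatinine_clearance_ml_min` from the conversation and call the tool; the number the clinician sees never comes from token sampling.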
EU MDR
What the Regulation Covers
The EU MDR imposes rigorous Clinical Evaluation requirements — you must demonstrate clinical evidence that your device is safe and effective throughout its entire lifecycle, not just at launch. Post-Market Surveillance (PMS) is mandatory and ongoing: you must continuously collect and analyse real-world performance data. For AI-based SaMD, this includes monitoring for model drift — the gradual degradation of accuracy as the real-world distribution of patients or clinical language shifts away from validation conditions.
Real-World Scenario
A radiology department in Germany deploys an AI agent that analyses free-text radiologist reports and flags cases where follow-up imaging may be warranted. At launch, the agent performs well. But over 18 months, clinical language shifts as new junior staff join and a new scanning protocol changes how findings are described. The agent's flag rate quietly drops — missing cases it should have caught.
Where AI Agents Get Blocked
- ▸Model drift goes undetected: Without continuous monitoring, performance degradation is invisible until a clinical incident occurs. At that point, the lack of ongoing PMS data is itself a compliance failure.
- ▸No mechanism for PSURs: The MDR requires documented Periodic Safety Update Reports. Without structured logging of agent behaviour, there is nothing to put in them.
- ▸Provider-side model updates: If you rely on a third-party LLM, silent model updates can alter your device's behaviour without triggering your change control process — a direct MDR violation.
How ContextGate Helps
- ✦Automated PMS via Query Logging: ContextGate logs every agent interaction. You can run scheduled statistical probes — synthetic benchmark queries with known expected outputs — through the proxy daily. Deviations in response quality are detected automatically, giving you the real-world PMS data the MDR demands.
- ✦PSUR-Ready Reporting: The structured logs form the raw material for your Periodic Safety Update Reports — query volumes, tool usage patterns, anomaly flags, and model version history are all captured without additional instrumentation.
- ✦Model Version Control: ContextGate pins the LLM version at the proxy layer. Your device runs the validated version until you explicitly update it through a governed change control process.
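A scheduled drift probe can be as simple as replaying a benchmark of queries with known expected flags and alerting when agreement drops. The benchmark contents, threshold, and function names below are assumptions for illustration, not a real PMS protocol.

```python
# Synthetic benchmark: (report text, expected follow-up flag).
# A real benchmark would be a curated, clinician-signed-off set.
BENCHMARK = [
    ("Report mentions 3mm nodule, recommend follow-up CT", True),
    ("Normal chest radiograph, no acute findings", False),
]

def run_drift_probe(classify, threshold=0.9):
    """classify: the deployed agent's flagging function (report -> bool).
    Returns agreement with expected outputs and whether to raise an alert."""
    correct = sum(1 for report, expected in BENCHMARK
                  if classify(report) == expected)
    agreement = correct / len(BENCHMARK)
    return {"agreement": agreement, "drift_alert": agreement < threshold}
```

Run daily through the proxy, the probe's history doubles as structured evidence for PSURs: each run records the model version, the benchmark version, and the agreement score.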
IEC 62304
What the Regulation Covers
IEC 62304 defines the software development lifecycle for medical devices. It assigns three safety classes — A (no injury possible), B (non-serious injury), and C (death or serious injury possible) — each with increasing rigour. For Class C, you must demonstrate complete traceability from every safety requirement, through the implementation, to a corresponding test case. Every safety-critical function must be traceable to a specific, testable line of code.
Real-World Scenario
A hospital builds an AI-powered prescribing assistant. A documented safety requirement states: "The system shall check for known drug allergies before any prescription is generated." During a regulatory audit, the assessor asks the development team to demonstrate exactly where this requirement is enforced, and to show test evidence that it has never been bypassed.
Where AI Agents Get Blocked
- ▸Untraceable logic: You cannot map a safety requirement to a specific neural network weight. There is no line of code an auditor can point to that implements "check for allergies" — the behaviour emerges from billions of parameters, non-deterministically.
- ▸No enforceable guarantees: Even if the agent behaves correctly in testing, nothing prevents it from skipping the allergy check when given an unusual prompt in production. The behaviour is probabilistic, not contractual.
- ▸Test evidence gap: Class C requires systematic test evidence. Testing a probabilistic model against a fixed requirement is fundamentally different from testing a deterministic function, and the evidence is weaker by design.
How ContextGate Helps
- ✦Requirements Mapped to Tools: The safety requirement "check for allergies" is implemented as a `check_allergy(patient_id, drug_id)` tool — deterministic SQL against your allergy database. The traceability chain is: Requirement → Tool Definition → SQL Script → Test Cases. Fully auditable under IEC 62304.
- ✦Policy-Enforced Execution: ContextGate's governance layer can be configured to require that the allergy tool is called before any prescription action is permitted. The AI cannot skip it — the proxy blocks the downstream action until the required tool has been invoked.
- ✦Audit Log as Test Evidence: The complete history of every allergy check — query, parameters, result, and timestamp — is available in ContextGate's logs. This is the systematic test evidence IEC 62304 demands, generated automatically in production.
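Policy-enforced execution means the proxy tracks which tools have run in a session and refuses gated actions until their prerequisites are satisfied. The sketch below illustrates the mechanism with assumed names (`SafetyGate`, `check_allergy`, `generate_prescription`); it is not ContextGate's real governance API.

```python
class SafetyGate:
    # Hypothetical policy: prescription generation requires a prior
    # allergy check in the same session.
    REQUIRED_BEFORE = {"generate_prescription": "check_allergy"}

    def __init__(self):
        self.called = set()

    def invoke(self, tool_name):
        required = self.REQUIRED_BEFORE.get(tool_name)
        if required and required not in self.called:
            # The agent cannot talk its way past this: the downstream
            # action is blocked at the proxy, not by prompt instructions.
            raise PermissionError(
                f"'{tool_name}' blocked: '{required}' has not been called")
        self.called.add(tool_name)
        return f"{tool_name} executed"
```

This gives the auditor a concrete answer: the requirement "check for allergies before prescribing" maps to one named policy entry, and every enforcement event appears in the log.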
III. CLINICAL & ETHICS
21st Century Cures Act
What the Regulation Covers
The 21st Century Cures Act mandates health data interoperability through FHIR (Fast Healthcare Interoperability Resources) standards and explicitly prohibits "information blocking" — any practice that interferes with the access, exchange, or use of electronic health information. For AI deployments, systems must not trap data in proprietary formats, fail to surface available information, or obstruct access to records — even unintentionally.
Real-World Scenario
A hospital uses an AI agent to consolidate patient records before discharge, pulling from multiple providers — a primary care EHR, a specialist clinic, and an external lab network. A care coordinator asks: "Get me a full picture of what happened to this patient across all their providers in the last 12 months." The agent must retrieve from multiple FHIR endpoints and present a coherent summary — without losing or blocking any source information.
Where AI Agents Get Blocked
- ▸Context window silently drops data: A naive agent loads all available records into its context window. If combined records exceed the window, earlier data is silently dropped — the agent unknowingly blocks information without any system-level constraint.
- ▸Proprietary reformatting: An agent that reformats FHIR resources into its own internal representation may lose structured field metadata, breaking downstream interoperability and potentially constituting information blocking.
- ▸Failure to surface all sources: If the agent only queries one EHR when multiple are available, it effectively blocks access to the others — a compliance violation even if no malicious intent exists.
How ContextGate Helps
- ✦FHIR-Native Tool Layer: ContextGate exposes a `get_fhir_resource(patient_id, resource_type, source)` tool that queries FHIR endpoints in a standardised way. Data is never reformatted or compressed through the LLM — it is retrieved as-is, preserving interoperability.
- ✦Exhaustive Source Enumeration: The tool can be configured to systematically query all registered FHIR endpoints for a patient, not just the first one that responds. Every source is documented in the audit log — providing evidence that information blocking did not occur.
- ✦Structured Output Preservation: The agent works with structured references to FHIR resources rather than raw text, ensuring that clinical data fidelity is maintained throughout the workflow.
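Exhaustive source enumeration is straightforward to express: iterate over every registered endpoint, log each attempt, and return the raw resources keyed by source. Endpoint names, the `fetch` callable, and the function signature below are assumptions for illustration.

```python
# Hypothetical registry of the hospital's FHIR endpoints.
REGISTERED_ENDPOINTS = ["primary_care_ehr", "specialist_clinic", "lab_network"]

def get_all_fhir_resources(patient_id, resource_type, fetch, audit_log):
    """Query EVERY registered endpoint — never just the first responder —
    and record each query so the audit log proves no source was skipped.

    fetch: callable (source, patient_id, resource_type) -> list of raw
    FHIR resources, returned as-is with no reformatting.
    """
    results = {}
    for source in REGISTERED_ENDPOINTS:
        audit_log.append({"patient": patient_id, "source": source,
                          "resource": resource_type})
        results[source] = fetch(source, patient_id, resource_type)
    return results
```

Because resources are passed through untouched and grouped by source, the summarising agent works with structured references rather than lossy paraphrase, and the log shows every endpoint was consulted.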
HHS Section 1557
What the Regulation Covers
Section 1557 prohibits discrimination on the basis of race, colour, national origin, sex, age, or disability in health programmes receiving federal funding. Updated HHS guidance explicitly applies this to clinical decision-support tools and AI systems used in patient care. Any algorithm that produces systematically different outcomes for protected groups — even without discriminatory intent — creates civil rights liability. Providers are responsible for the discriminatory effects of algorithms they deploy, including commercial tools.
Real-World Scenario
A busy outpatient clinic deploys an AI agent to manage triage and appointment scheduling, ranking requests by "clinical urgency." Over time, the clinic notices patients with certain insurance types are being scheduled significantly faster. The algorithm was never told to consider insurance status — but the LLM had absorbed correlations from historical data where privately insured patients were historically prioritised by staff.
Where AI Agents Get Blocked
- ▸Latent bias in training data: LLMs trained on historical healthcare data can absorb and replicate historical biases. Insurance type, postcode, or language are proxies for protected characteristics, and an agent using these signals — even implicitly — creates liability.
- ▸Unexplainable prioritisation: If a regulator asks "why was my patient's appointment scheduled later?", the answer "the AI decided" is not legally defensible under Section 1557.
- ▸No audit mechanism: Without a record of the exact logic used to rank patients, demonstrating that discrimination did not occur is impossible.
How ContextGate Helps
- ✦Deterministic Ranking via SQL: ContextGate routes clinical prioritisation logic through an explicit SQL tool: `rank_patients_by_urgency(criteria)`. The query is written by your clinical and compliance teams — `ORDER BY triage_score DESC` — with protected characteristics explicitly excluded from the schema the agent can access.
- ✦Schema-Level Bias Exclusion: ContextGate's data access layer can strip insurance type, postcode, and other proxy fields from the data the agent sees. If the information isn't in the context, it cannot influence the decision — by construction, not by policy guidance.
- ✦Regulatory Evidence: The complete audit log of every scheduling decision — including the exact SQL executed and the resulting rank order — can be produced to regulators on request, demonstrating prioritisation was based solely on clinical criteria.
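Schema-level exclusion plus deterministic ranking can be demonstrated with a database view: the agent only ever queries a view from which proxy fields are absent. The schema, sample data, and view name below are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE patients (
    id INTEGER, insurance_type TEXT, postcode TEXT, triage_score INTEGER)""")
conn.executemany("INSERT INTO patients VALUES (?,?,?,?)", [
    (1, "private", "SW1", 40),
    (2, "public",  "E5",  90),
    (3, "public",  "N1",  70),
])
# The agent's data access layer exposes ONLY this view: insurance type
# and postcode do not exist in the schema the agent can see.
conn.execute("""CREATE VIEW agent_patients AS
    SELECT id, triage_score FROM patients""")

def rank_patients_by_urgency():
    """Explicit, auditable ranking — clinical criteria only."""
    rows = conn.execute(
        "SELECT id FROM agent_patients ORDER BY triage_score DESC").fetchall()
    return [row[0] for row in rows]
```

The exact SQL executed and the resulting order are what go into the audit log, so "why was this patient scheduled later?" has a one-line, legally defensible answer.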
IV. LIFE SCIENCES (GXP)
21 CFR Part 11
What the Regulation Covers
21 CFR Part 11 establishes that electronic records and signatures in FDA-regulated industries (pharma, biotech, medical devices) must be trustworthy, reliable, and legally equivalent to paper records and handwritten signatures. It requires strict audit trails showing who created, modified, or approved records and when. Any computer system used to create or maintain these records must be validated, access-controlled, and equipped with an audit trail that cannot be altered or disabled.
Real-World Scenario
A pharmaceutical manufacturer deploys an AI agent to assist QA staff with batch record review. A QA scientist asks the agent to review a production batch record and flag any anomalies. The agent identifies three deviations and recommends approving the batch with documented exceptions. The scientist types "confirm." Under 21 CFR Part 11, this constitutes an electronic approval of a regulated GMP record — every step must be accountable to a specific authenticated individual, with an immutable trail.
Where AI Agents Get Blocked
- ▸No human attribution: Standard AI agents produce outputs attributed to "the system," not a specific employee. If an agent reviews and effectively endorses a batch record, tracing that decision to a regulated, accountable human is essential under Part 11.
- ▸Mutable or absent audit trails: If the agent's reasoning process, the data it reviewed, and the actions it took are not captured in an immutable, timestamped log, the electronic record is non-compliant regardless of the outcome.
- ▸Unvalidated systems: Part 11 requires computer system validation. Deploying an LLM without a structured validation protocol and documented evidence is a direct compliance gap in a regulated environment.
How ContextGate Helps
- ✦User-Attributed Action Logs: Every agent action is tied to the authenticated human user who initiated it. When the QA scientist confirms the recommendation, ContextGate's log records: user identity, the specific tool called (e.g., `review_batch_record(batch_id)`), the exact data accessed, the agent's output, and the confirmation timestamp.
- ✦Immutable Audit Trail: ContextGate's logs are append-only and tamper-evident. Records cannot be modified after the fact — satisfying the audit trail requirements of Part 11 Section 11.10(e).
- ✦Validation-Friendly Architecture: Because critical actions are mediated through versioned, deterministic tools rather than free-form LLM generation, the validation scope is bounded and manageable — focusing on tool definitions rather than attempting to validate an entire language model.
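One common way to make a log tamper-evident is hash chaining: each entry stores a hash of the previous entry, so editing any past record breaks the chain. The class and field names below are an illustrative sketch of the technique, not ContextGate's actual log schema.

```python
import hashlib
import json

class AuditLog:
    """Append-only log where each entry commits to its predecessor."""

    def __init__(self):
        self.entries = []

    def append(self, user, tool, payload):
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"user": user, "tool": tool, "payload": payload,
                "prev": prev_hash}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append({**body, "hash": digest})

    def verify(self):
        """Recompute the chain; any after-the-fact edit is detected."""
        prev = "genesis"
        for entry in self.entries:
            body = {k: entry[k] for k in ("user", "tool", "payload", "prev")}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

A production system would add timestamps and anchor the chain head externally (e.g., to write-once storage), but the detection property shown here is the core of "cannot be altered without trace".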
ICH E6 (GCP)
What the Regulation Covers
ICH E6 (R3) is the international standard for Good Clinical Practice in clinical trials. Its core principles are participant protection, data credibility, and trial record integrity. The R3 revision explicitly addresses computerised systems and data integrity — requiring that any system used to generate, process, or store trial data meets the ALCOA+ framework (Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, and Available), with audit trails for all data changes.
Real-World Scenario
A Contract Research Organisation uses an AI agent to assist with clinical trial data cleaning. A data manager asks: "Review the latest data transfer from Site 07. Flag any values that look like data entry errors and suggest corrections." The agent identifies 14 anomalous values and proposes corrections. The data manager reviews and approves them — and these corrections will become part of the official trial dataset submitted to regulators for drug approval.
Where AI Agents Get Blocked
- ▸Hallucinated corrections compromise trial integrity: If the agent invents a "corrected" value rather than flagging the anomaly for human review, it introduces fabricated data into a regulated trial dataset. This can invalidate the entire trial.
- ▸No ALCOA+ compliance: The "A" in ALCOA stands for Attributable — every data point must be traceable to the person who created or changed it. An AI correction with no human attribution fails this requirement.
- ▸Unvalidated computerised system: GCP requires that any computerised system used in a trial is validated to demonstrate it performs as intended. An unvalidated LLM integration is a finding in any regulatory inspection.
How ContextGate Helps
- ✦Source-of-Truth Grounding: ContextGate's Cognitive Cortex restricts the agent to querying and flagging existing data — it cannot insert or modify database rows directly. Proposed corrections are surfaced as structured outputs for human review, preserving the "Original" and "Accurate" principles of ALCOA+.
- ✦ALCOA-Compliant Audit Trail: When the data manager approves a correction, ContextGate records the full attribution chain — user identity, timestamp, the agent's proposed value, and the human confirmation — satisfying the attributable, contemporaneous, and original requirements of GCP.
- ✦Validation-Scoped System Boundary: Because critical data operations are implemented as explicit, version-controlled SQL tools, the validated system boundary is clearly defined. The LLM is a flagging interface; the data tools are the validated components — significantly simplifying GCP system validation documentation.
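The propose/approve split can be sketched as two functions with an asymmetry of privilege: the agent-facing one can only append to a proposals queue, and only the human-facing one writes to the dataset. All names and the sample record are illustrative assumptions.

```python
# Illustrative trial dataset and proposals queue.
dataset = {("site07", "row14"): "18O"}   # suspected letter-O/zero entry error
proposals = []

def propose_correction(record_id, proposed_value, reason):
    """Agent-side: flag only. This function has no write access to the
    dataset, so a hallucinated 'correction' can never enter trial data."""
    proposals.append({"record": record_id, "proposed": proposed_value,
                      "reason": reason, "status": "pending",
                      "approver": None})

def approve(index, approver):
    """Human-side: the only code path that mutates the dataset, and it
    records WHO approved — satisfying the 'Attributable' in ALCOA+."""
    proposal = proposals[index]
    proposal.update(status="approved", approver=approver)
    dataset[proposal["record"]] = proposal["proposed"]
```

The approval record (proposed value, reason, approver, status) is exactly the attribution chain a GCP inspector asks to see for each changed data point.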
| Regulatory Domain | Key Risk | The ContextGate Fix |
|---|---|---|
| Privacy (HIPAA/GDPR) | PHI leakage to public LLMs. | Proxy Redaction: Strips names/MRNs before they leave the firewall. |
| Breach Liability (HITECH) | Prompt injection as data exfiltration vector. | Tool Scope Limits: Hard row caps and output scanning block bulk exfiltration. |
| Data Rights (GDPR Art. 9) | Patient data baked into model weights; Right to Erasure unenforceable. | Stateless Architecture: LLM never trained on patient data; erasure = database delete. |
| Safety (FDA SaMD/MDR) | Hallucinated medical calculations. | Deterministic SQL Tools: Clinical logic separated from LLM and fully validated. |
| Traceability (IEC 62304) | Safety requirements untraceable to code. | Tool-Based Requirements: Safety checks implemented as named, auditable tools. |
| Interoperability (Cures Act) | Context window silently drops available records. | FHIR Tool Layer: Structured retrieval from all sources, logged comprehensively. |
| Ethics (HHS 1557) | Hidden algorithmic bias in scheduling/prioritisation. | Auditable SQL Ranking: Explicit, explainable logic with protected fields excluded. |
| Records (21 CFR Part 11) | AI actions not attributed to a human user. | Immutable Logs: Every action tied to authenticated user with full payload record. |
| Data Integrity (GCP/ICH E6) | AI invents data corrections in trial datasets. | Read-Only Grounding: Agent flags anomalies; corrections require human approval. |