A cardiologist in São Paulo asked us this on a pilot call last month: "If I let this thing tell me to hold warfarin and the patient bleeds out, who explains that to the family?" She was not being rhetorical. She had spent eighteen months with two different "AI assistants" bolted onto her EHR. Her department quietly disabled both, one after it suggested a contraindicated dose and the other after it auto-populated a wrong allergy. Her question is the only question that matters. The architecture has to answer it before the marketing does.
So here is the rule we built Cortex around. Every agent in the system — Bleeding Risk, Cardiac Risk, Medication Reconciliation, Anesthesia Routing, the rest of the bench — runs in shadow mode by default. They watch the encounter. They read the transcript, the chart context, the labs, the prior notes. They produce findings, each tagged with a confidence score, a chain-of-thought, and citations to the source data they used. Those findings appear in the co-pilot panel next to you. Nothing is committed to the patient record. Nothing fires an order. Nothing reaches the patient. Until you, the clinician, run the Conclude-Session ceremony, the agents are observers and nothing more.
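Here is a minimal Python sketch of the shadow-mode boundary. It shows what a finding might carry and where it sits before sign-off. Every name in it (`Finding`, `Citation`, `ShadowPanel`, `EncounterRecord`) is hypothetical and illustrative, not Cortex's actual code. The point is structural: the panel that collects proposals holds no reference to the record, so proposing a finding has no path to commit anything.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Citation:
    source: str  # which input the agent read, e.g. "labs" or "transcript"
    ref: str     # pointer into that source


@dataclass(frozen=True)
class Finding:
    agent: str                       # e.g. "Bleeding Risk"
    summary: str
    confidence: float                # 0.0 to 1.0
    reasoning: str                   # the agent's chain-of-thought
    citations: Tuple[Citation, ...]  # source data the finding relies on


@dataclass
class EncounterRecord:
    # Only the Conclude-Session step (not shown here) is meant to write to this.
    committed: List[Finding] = field(default_factory=list)


@dataclass
class ShadowPanel:
    # Deliberately holds no reference to EncounterRecord: proposing a finding
    # can display it, but there is no path from here to commit, order, or notify.
    pending: List[Finding] = field(default_factory=list)

    def propose(self, finding: Finding) -> None:
        if not 0.0 <= finding.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not finding.citations:
            raise ValueError("a finding must cite the data it was derived from")
        self.pending.append(finding)
```

The panel rejects any proposal that lacks citations or has an out-of-range confidence. This is one way to guarantee that every finding on the panel traces back to the data it used.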
The Conclude-Session ceremony is deliberately a ceremony. You review the findings one by one and confirm or dismiss each. You can attach a note, such as "dismissed: prior MI in external chart", and you sign with your PIN. At that moment, and only at that moment, three things happen together. The confirmed findings commit to the encounter record. The patient phase advances (for example, Diagnostics moves to Assessment, unlocking the next phase's agents). And the audit log appends one entry for every finding, confirmed or dismissed. Each entry records your identity, your PIN-verified intent, the timestamp, the model version, and the inputs the agent saw.
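The sketch below models the ceremony's all-or-nothing behavior in memory. It is hypothetical and makes three assumptions: each finding gets one `Decision`, a PIN check that is not shown produces a `pin_verified` flag, and the phase map contains only the Diagnostics-to-Assessment step the text mentions. The code validates everything first and applies the three effects only after that, which stands in for what a production system would do inside a single database transaction.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# Hypothetical phase map: the text names only the Diagnostics -> Assessment step.
NEXT_PHASE: Dict[str, str] = {"Diagnostics": "Assessment"}


@dataclass(frozen=True)
class Finding:
    finding_id: str
    agent: str
    summary: str
    model_version: str
    inputs: Tuple[str, ...]  # references to the data the agent saw


@dataclass(frozen=True)
class Decision:
    finding_id: str
    confirmed: bool  # True = confirm, False = dismiss
    note: str = ""   # e.g. "dismissed: prior MI in external chart"


@dataclass
class Encounter:
    phase: str
    committed: List[Finding] = field(default_factory=list)
    audit_log: List[dict] = field(default_factory=list)


def conclude_session(
    encounter: Encounter,
    findings: List[Finding],
    decisions: List[Decision],
    clinician_id: str,
    pin_verified: bool,
    now: Optional[datetime] = None,
) -> None:
    # 1. Validate everything before touching state, so a failed ceremony changes nothing.
    if not pin_verified:
        raise PermissionError("PIN verification failed; nothing was committed")
    by_id = {f.finding_id: f for f in findings}
    decided = {d.finding_id for d in decisions}
    if len(decisions) != len(decided) or decided != set(by_id):
        raise ValueError("each finding needs exactly one decision")
    if encounter.phase not in NEXT_PHASE:
        raise ValueError(f"no phase defined after {encounter.phase!r}")

    timestamp = (now or datetime.now(timezone.utc)).isoformat()

    # 2. Build the outputs: confirmed findings, plus one audit entry per decision.
    accepted = [by_id[d.finding_id] for d in decisions if d.confirmed]
    entries = [
        {
            "clinician_id": clinician_id,
            "pin_verified": True,
            "finding_id": d.finding_id,
            "decision": "confirmed" if d.confirmed else "dismissed",
            "note": d.note,
            "timestamp": timestamp,
            "model_version": by_id[d.finding_id].model_version,
            "inputs": list(by_id[d.finding_id].inputs),
        }
        for d in decisions
    ]

    # 3. Apply all three effects together. A real system would wrap this in one
    #    database transaction; here, the up-front validation stands in for that.
    encounter.committed.extend(accepted)
    encounter.phase = NEXT_PHASE[encounter.phase]
    encounter.audit_log.extend(entries)
```

Dismissed findings still produce audit entries, so the log records what the clinician rejected as well as what they accepted.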
That audit log is hash-chained. Each new entry's hash is computed from the previous entry's hash plus the new entry's content. Altering any entry therefore changes its hash and breaks the chain from that point forward. We run /api/cron/audit-chain-verify nightly. If anyone or anything alters a row, whether an operator, a bug, or something else, the verification run fails and an alert fires. Cortex then enters a degraded read-only state until the discrepancy is resolved. This is not a compliance checkbox. It is a load-bearing piece of trust: when you sign, the system can later prove, to you and to a regulator, exactly what you saw and exactly what you decided.
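The sketch below illustrates the chaining and the nightly check with SHA-256 from Python's standard `hashlib`. The entry layout, `GENESIS_HASH`, and the `on_failure` hook are assumptions made for illustration. The text does not describe the internals of /api/cron/audit-chain-verify, and this sketch is not its implementation.

```python
import copy
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional

Entry = Dict[str, Any]
GENESIS_HASH = "0" * 64  # hypothetical fixed value that the first entry links to


def entry_hash(prev_hash: str, content: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, compact separators) so identical content
    # always serializes to identical bytes before hashing.
    payload = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: List[Entry], content: Dict[str, Any]) -> Entry:
    prev = chain[-1]["hash"] if chain else GENESIS_HASH
    stored = copy.deepcopy(content)  # later edits to the caller's dict cannot leak in
    entry = {"prev_hash": prev, "content": stored, "hash": entry_hash(prev, stored)}
    chain.append(entry)
    return entry


def first_broken_index(chain: List[Entry]) -> Optional[int]:
    """Return the index of the first entry whose link or hash fails to
    recompute, or None if the whole chain verifies."""
    prev = GENESIS_HASH
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["content"]):
            return i
        prev = entry["hash"]
    return None


def nightly_verify(chain: List[Entry], on_failure: Callable[[int], None]) -> bool:
    # Stand-in for the scheduled job: on failure, pass the broken index to a
    # hook that could fire an alert and switch the system to read-only.
    broken_at = first_broken_index(chain)
    if broken_at is None:
        return True
    on_failure(broken_at)
    return False
```

Verification starts at the first entry and walks forward, so it pinpoints where the chain first breaks. The check relies only on data stored in the chain itself. Someone able to rewrite the table and recompute every later hash could therefore produce a chain that still verifies. A common complement is to anchor the latest hash somewhere that writer cannot reach.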
There is one more piece worth describing, because it is what most "AI assistants" get wrong. When an agent's confidence falls below our floor of 85%, it does not guess and it does not hide. It surfaces a review-needed flag with a specific clarification question. Instead of "Cardiac Risk: Elevated", you see "Cardiac Risk: Confirm prior MI documented elsewhere? Current chart shows AFib only." That is a question you can answer in one click — yes, no, see external — instead of a recommendation you have to second-guess. Low confidence becomes a question, not a quiet error.
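The routing rule fits in a few lines. The 0.85 threshold comes from the text. `PanelItem`, `to_panel_item`, and the answer options are hypothetical names chosen for illustration. A finding at or above the floor appears as a label. A finding below it must supply a clarification question, or the function refuses it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CONFIDENCE_FLOOR = 0.85  # findings below this become questions
ANSWER_OPTIONS: Tuple[str, ...] = ("yes", "no", "see external")  # one-click replies


@dataclass(frozen=True)
class PanelItem:
    text: str
    review_needed: bool
    options: Tuple[str, ...] = ()


def to_panel_item(
    agent: str,
    confidence: float,
    label: str,
    clarification: Optional[str] = None,
) -> PanelItem:
    # At or above the floor: show the agent's label as an ordinary finding.
    if confidence >= CONFIDENCE_FLOOR:
        return PanelItem(text=f"{agent}: {label}", review_needed=False)
    # Below the floor: never show the label; require a specific question instead.
    if not clarification:
        raise ValueError(f"{agent} is below the confidence floor and must ask a question")
    return PanelItem(
        text=f"{agent}: {clarification}",
        review_needed=True,
        options=ANSWER_OPTIONS,
    )
```

Take the Cardiac Risk example above with a hypothetical confidence of 0.62. The call `to_panel_item("Cardiac Risk", 0.62, "Elevated", "Confirm prior MI documented elsewhere? Current chart shows AFib only.")` returns a review-needed item whose text is the question, not the label.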
We know the objection. The market wants autonomous agents: booking, ordering, prescribing, all of it on rails. Autonomy works for low-stakes tasks such as calendar slots, retrieval, summarization, and intake forms. It does not work where a wrong action has irreversible consequences for a patient. Holi Labs will not ship an autonomous-action loop that touches a patient, not in this version and not in the next. If you want a tool that prescribes for you, Cortex is not that tool, and you would be right to be uncomfortable with anything that claims to be.
The regulators are converging on the same view. Article 14 of the EU AI Act requires effective human oversight of high-risk AI systems. AI systems that are medical devices requiring third-party conformity assessment under the EU Medical Device Regulation fall into that high-risk category. In Brazil, we operate the deterministic decision layer under ANVISA Class I, which constrains the kinds of automation that can sit between a clinician and a patient action. Doctor-in-the-loop is not a marketing posture for us. It is an architectural commitment we made on day zero. When the audit comes, the answer is already in the code, the audit log, and the sign-off ceremony you ran twelve minutes ago for Mrs. Oliveira.
That is the whole shape of it. The agents propose. You sign. The chain holds. If a patient bleeds, you can say what you saw, what was suggested, what you decided, and why — and the system can prove every word of it. That is the version of clinical AI worth building.