One Tuesday morning, a production healthcare app that I build and maintain started failing, server-side, on every call to the French national INSi teleservice. Patient identity verification was completely broken.
On the user side, the software did what it was supposed to do: it did not show a stack trace; it indicated that INS verification had not completed. The patient was there, the care flow could continue, but the identity remained unverified.
The underlying error, captured in Sentry, was a single line:
OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0
certificate verify failed (self-signed certificate in certificate chain)
Diagnosis took twenty minutes; the clean fix took two hours. But the point was not to rework the UX or the business behavior. The INSi integration was already running in production with strict TLS verification and correct behavior on failure. The incident mostly exposed one operational concern that had to be made explicit: the IGC-Santé trust chain.
The service had to work again, without weakening TLS, while preserving an important property: an external failure should not block clinical work.
There is a wrong answer to this problem, copy-pasted everywhere on forums, that you absolutely must avoid when you ship patient identity data over the wire. It fixes TLS about as well as removing the engine light fixes a car.
The Actual Need
The business need was not “complete a TLS handshake”. It was simpler:
- Verify a patient’s INS identity when the teleservice responds.
- Let the clinician keep working when the teleservice does not respond.
- Clearly mark the identity data as provisional until it has been verified.
- Know before the next outage that a certificate is about to expire.
TLS was only one part of the problem. Important, but not enough. A critical integration also needs an explicit answer for the days when the external dependency does not respond.
What INSi Does
INSi is the CNAM (French health insurance) teleservice that lets a clinical app fetch or verify a patient’s National Health Identifier from civil traits: name, first name, date and place of birth. It is a central building block for producing, exchanging, and archiving healthcare data against the right patient identity.
The endpoint lives on services-ps-tlsm.ameli.fr. It’s SOAP, authenticated by a client certificate issued through the ANS Portail de Confiance. On the server side, the TLS chain roots in the IGC-Santé root CA, the French state authority for healthcare PKI.
That root is in no default system trust store. Not Debian, not Ubuntu, not macOS, not the Mozilla bundle embedded in most HTTP libraries. It’s a specialized government PKI, separate from the public web PKI. In other words, if you do not add it explicitly, OpenSSL will not discover it through administrative telepathy.
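To make the "not in any default trust store" point concrete, here is a minimal sketch using Ruby's standard openssl library. It generates a throwaway private root and a leaf certificate locally (both hypothetical, unrelated to the real IGC-Santé chain), then shows that verification only succeeds once the root is added explicitly. The helper names `build_cert` and `trusted?` are mine, for illustration only.

```ruby
require "openssl"

# Build a throwaway certificate. With no issuer given, it is self-signed.
# These certificates are generated locally for the demo only.
def build_cert(subject:, issuer_cert: nil, issuer_key: nil, ca: false)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3
  cert.serial = rand(1..1_000_000)
  cert.subject = OpenSSL::X509::Name.parse(subject)
  cert.issuer = issuer_cert ? issuer_cert.subject : cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600

  ef = OpenSSL::X509::ExtensionFactory.new
  ef.subject_certificate = cert
  ef.issuer_certificate = issuer_cert || cert
  cert.add_extension(ef.create_extension("basicConstraints", ca ? "CA:TRUE" : "CA:FALSE", true))
  cert.add_extension(ef.create_extension("keyUsage", "keyCertSign,cRLSign", true)) if ca

  cert.sign(issuer_key || key, OpenSSL::Digest.new("SHA256"))
  [cert, key]
end

# Verify a certificate against an explicit list of trust anchors.
# The store starts empty: system defaults are deliberately not loaded.
def trusted?(cert, anchors)
  store = OpenSSL::X509::Store.new
  anchors.each { |anchor| store.add_cert(anchor) }
  store.verify(cert)
end

root, root_key = build_cert(subject: "/CN=Demo Private Root", ca: true)
leaf, _leaf_key = build_cert(subject: "/CN=demo.example", issuer_cert: root, issuer_key: root_key)

puts trusted?(leaf, [])      # no anchor: verification fails
puts trusted?(leaf, [root])  # anchor added explicitly: verification succeeds
```

The same logic applies to any private PKI: the chain is perfectly valid, but only for a client that has been told which root to trust.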
The Bad Fix
If you search “OpenSSL self-signed certificate in certificate chain Ruby”, you’ll find ten StackOverflow answers all saying the same thing:
# DO NOT do this on a patient identity flow
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
On a personal script scraping a site with a self-signed certificate, that is between you and your conscience. On a flow that ships a patient’s social security number, name, and date of birth to a service authenticated by client certificate? Absolutely not.
Setting verify_mode to VERIFY_NONE means:
- Accepting any certificate on the other side, including a MitM attacker on the network.
- Punching a hole in the only cryptographic guarantee that the machine you’re talking to is in fact CNAM’s.
- Carving into the codebase, durably, a security regression no one will ever revisit.
And for nothing: the real answer is quick, documentable, and keeps security in place. A useful detail when the payload is patient identity.
The Clean Fix
ANS publishes the IGC-Santé chain publicly on its site (root + intermediates, PEM format). The right reflex:
- Download the official chain from the ANS portal.
- Verify fingerprints: SHA-256 of the downloaded cert compared against the one served by the live endpoint, and against the one published by ANS. Three sources, one hash. If any of the three disagrees, stop.
- Bundle it into the project, under version control, distinct from client certificates (which stay in encrypted credentials).
- Point OpenSSL at it when building the SOAP client.
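The fingerprint comparison in the second step can be scripted. What follows is a sketch under assumptions, not the exact tooling I used: the helper names are hypothetical, and the published and served fingerprints are expected to be supplied by you (copied from the ANS publication and from the certificates the endpoint presents).

```ruby
require "openssl"
require "digest"

PEM_BLOCK = /-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----/m

# Split a PEM bundle into individual certificate objects.
def certificates_in(pem_text)
  pem_text.scan(PEM_BLOCK).map { |pem| OpenSSL::X509::Certificate.new(pem) }
end

# SHA-256 over the DER encoding, as colon-separated uppercase hex pairs.
def sha256_fingerprint(cert)
  Digest::SHA256.hexdigest(cert.to_der).upcase.scan(/../).join(":")
end

# True only when every supplied fingerprint is the same value,
# ignoring case and colon separators. No input at all counts as a mismatch.
def fingerprints_agree?(*fingerprints)
  normalized = fingerprints.map { |fp| fp.delete(":").downcase }
  normalized.uniq.size == 1
end
```

For each certificate in the bundle, you would call `fingerprints_agree?` with the three values (downloaded, served, published) and refuse to commit the bundle if it returns false.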
In a Rails app, simplified, this is what Ameli::Insi ends up looking like:
class Ameli::Insi
  def http_client
    Savon.client(
      wsdl: wsdl_path,
      ssl_cert_file: client_cert_path,
      ssl_cert_key_file: client_key_path,
      ssl_ca_cert_file: igc_sante_bundle_path, # bundled public chain
      ssl_verify_mode: :peer                   # strict verification stays on
    )
  end

  private

  def igc_sante_bundle_path
    Rails.root.join("lib", "certs", "igc_sante.pem").to_s
  end
end
To validate end-to-end before even redeploying, openssl does the job:
openssl s_client \
  -connect services-ps-tlsm.ameli.fr:443 \
  -CAfile lib/certs/igc_sante.pem \
  -servername services-ps-tlsm.ameli.fr \
  -showcerts < /dev/null 2>&1 | grep "Verify return code"
Expected output: Verify return code: 0 (ok). From there, the Ruby client works again, in VERIFY_PEER, with no security downgrade.
The Useful Behavior
Once the root cause was fixed, one interesting point was worth making explicit: why had the incident not blocked the clinician?
Because the software did not treat an INSi failure as an exception to throw on screen, but as a business state. When INSi answered, patient identity was verified. When the teleservice did not answer, the technical error stayed in Sentry and the interface told the clinician that INS verification could not be completed.
The application boundary looks like this, simplified:
def fetch_by_traits(patient)
  response = http_client.call(:fetch_by_traits, message: payload_for(patient))
  Ameli::Insi::Identity.from_soap(response)
rescue HTTPI::Error, Savon::Error => e
  Sentry.capture_exception(e, extra: { patient_id: patient.id })
  raise Ameli::Insi::TeleserviceUnavailable
end
The calling controller rescues TeleserviceUnavailable and keeps a provisional identity (the civil traits typed by the clinician, tagged “INS not verified”), with a banner explaining INSi is down and that the record must be re-verified later. The clinician keeps working, the patient isn’t blocked, and the data will be reconciled at the next successful call.
That was already the useful behavior. Not spectacular, not great demo material, but it is what matters in production: an external dependency going down should not stop the care chain. It should become an explicit degraded mode.
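The caller side of that degraded mode can be sketched in plain Ruby. This is an illustration under assumptions, not the production controller: `IdentityResult`, `resolve_identity`, and the local `TeleserviceUnavailable` class are hypothetical stand-ins so the example runs on its own, and the lookup is passed in as a callable instead of a real SOAP client.

```ruby
# Hypothetical stand-in for the teleservice error raised at the boundary.
class TeleserviceUnavailable < StandardError; end

# An identity is either verified (with an INS) or provisional (traits only).
IdentityResult = Struct.new(:traits, :status, :ins, keyword_init: true) do
  def verified?
    status == :verified
  end
end

# Returns a verified identity when the lookup succeeds, and a provisional one
# (the traits as typed, flagged for later re-verification) when the
# teleservice is unavailable. Any other error is left to propagate.
def resolve_identity(traits, lookup:)
  ins = lookup.call(traits)
  IdentityResult.new(traits: traits, status: :verified, ins: ins)
rescue TeleserviceUnavailable
  IdentityResult.new(traits: traits, status: :provisional, ins: nil)
end
```

The design choice is that "unverified" is a value the rest of the app can store, display, and reconcile later, not an exception that escapes to the screen.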
The Invisible Work
This incident also brought back an operational point that is easy to underestimate. INSi, MSSanté, and Pro Santé Connect client certificates are renewed manually on the ANS Portail de Confiance, then stored in encrypted credentials. The app can work perfectly for months, until one deadline arrives silently.
The day one expires, you get the same outage again, with a familiar smell and a perfectly reasonable urge to insult a calendar.
So, right after the fix, I added a daily job that walks the inventory of bundled certificates: public chains and client certificates. It reports to Sentry any certificate within 30 days of expiry, then escalates at 7 days or once a certificate has already expired.
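The core of such a job is small. Here is a minimal sketch of the classification logic, assuming the 30-day and 7-day thresholds described above; the names (`expiry_level`, `certificate_alerts`) and the input shape (a hash of label to PEM text) are assumptions for illustration, and the actual reporting call is left out.

```ruby
require "openssl"

WARNING_DAYS = 30
CRITICAL_DAYS = 7
SECONDS_PER_DAY = 86_400
PEM_CERT = /-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----/m

# Classify one certificate by the whole days left before its notAfter date.
# Already-expired certificates have a negative count and are critical too.
def expiry_level(cert, now: Time.now)
  days_left = ((cert.not_after - now) / SECONDS_PER_DAY).floor
  return :critical if days_left <= CRITICAL_DAYS
  return :warning if days_left <= WARNING_DAYS
  :ok
end

# Walk an inventory of PEM texts (bundles may hold several certificates)
# and return one alert per certificate that is not :ok.
def certificate_alerts(pem_by_name, now: Time.now)
  pem_by_name.flat_map do |name, pem_text|
    pem_text.scan(PEM_CERT).filter_map do |pem|
      cert = OpenSSL::X509::Certificate.new(pem)
      level = expiry_level(cert, now: now)
      next if level == :ok

      { name: name, subject: cert.subject.to_s, not_after: cert.not_after, level: level }
    end
  end
end
```

A scheduled job would build the inventory from the bundled chains and the decrypted client certificates, then send each alert to the error tracker with a severity matching its level.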
I also added a renewal runbook versioned in the repo (lib/certs/README.md), with the PFC procedure step by step. No one will notice any of this as long as everything works. When a certificate threatens to expire, the result will be an actionable warning, not a surprise outage.
A Quick Map of French Healthcare PKI
For anyone discovering the ecosystem, the building blocks look alike and it’s easy to get lost:
- INS: the patient’s national health identifier (NIR or NIA + traits). The data.
- INSi: the CNAM teleservice that lets you fetch / verify an INS. The API.
- MSSanté: secure messaging between professionals and with the patient. Separate PKI, certificates issued by MSSanté operators.
- Pro Santé Connect: the state SSO for healthcare professionals (dematerialized CPS card, OIDC).
- IGC-Santé: the shared PKI root that signs most of the ANS server side.
Each block is documented, each chain is public and verifiable. The trap is treating this as an annoying technical detail when it is part of the product: identity, security, continuity of care, and operations.
What I Keep From This
When a TLS integration breaks in production, the right answer is almost never “disable verification”. It’s: find the missing trust anchor, make it explicit, version it, audit it over time. The VERIFY_NONE reflex saves ten minutes now and leaves a security debt to age quietly in the codebase.
The other point, more product than TLS, is that any critical integration needs a degraded mode. An external teleservice going down is normal over a decade of operations. Here, that decision already existed on the user side: INS verification failed, the clinician was informed, and care could continue.
The real work was the whole loop: understand the clinical need, make the trust chain explicit, preserve security, rely on the existing business fallback, then monitor certificates so we do not replay the same scene three months later in a different outfit.
If You Have One of These at Work
French healthcare integrations (INSi, MSSanté, Pro Santé Connect, DMP) are a mix of PKI, certificates, WSDL files, and ANS procedures that require more patience than genius. They work. But you need to frame the business need as much as the technical connection.
This is the kind of work I handle at SXN Labs: understand what actually needs to keep working, simplify the scope, connect it properly, and leave behind a system people can operate. If you have a healthcare teleservice stuck somewhere, drop me a line.