One Tuesday morning, every call from a production healthcare app to INSi, the French national teleservice for patient identity, started failing, and the app answered each one with a 500. Patient identity verification was completely broken.
The interface, with admirable dedication to making things worse, showed a raw error to the clinician. The patient was there, the care flow had to continue, and the software effectively answered: “OpenSSL is having feelings.”
The underlying error, captured in Sentry, was a single line:
```
OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 certificate verify failed (self-signed certificate in certificate chain)
```
Diagnosis took twenty minutes. The clean fix, two hours. But the real problem was not just making OpenSSL quiet again. The service had to work, without weakening TLS, and an external failure should not block clinical work.
There is a wrong answer to this problem, copy-pasted everywhere on forums, that you absolutely must avoid when you ship patient identity data over the wire. It fixes TLS about as well as removing the engine light fixes a car.
The Actual Need
The business need was not “complete a TLS handshake”. It was simpler:
- Verify a patient’s INS identity when the teleservice responds.
- Let the clinician keep working when the teleservice does not respond.
- Clearly mark the identity data as provisional until it has been verified.
- Know before the next outage that a certificate is about to expire.
TLS was only one part of the problem. Important, but not enough. A critical integration that works again without a degraded mode is just a future 500 with a better certificate.
What INSi Does
INSi is the CNAM (French health insurance) teleservice that lets a clinical app fetch or verify a patient’s National Health Identifier from civil traits: name, first name, date and place of birth. It is a central building block for producing, exchanging, and archiving healthcare data against the right patient identity.
The endpoint lives on services-ps-tlsm.ameli.fr. It’s SOAP, authenticated by a client certificate issued through the ANS Portail de Confiance. On the server side, the TLS chain roots in the IGC-Santé root CA, the French state authority for healthcare PKI.
That root is in no default system trust store. Not Debian, not Ubuntu, not macOS, not the Mozilla bundle embedded in most HTTP libraries. It’s a specialized government PKI, separate from the public web PKI. In other words, if you do not add it explicitly, OpenSSL will not discover it through administrative telepathy.
The Bad Fix
If you search “OpenSSL self-signed certificate in certificate chain Ruby”, you’ll find ten Stack Overflow answers all saying the same thing:
```ruby
# DO NOT do this on a patient identity flow
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
```
On a personal script scraping a site with a self-signed certificate, that is between you and your conscience. On a flow that ships a patient’s social security number, name, and date of birth to a service authenticated by client certificate? Absolutely not.
Disabling verify_mode means:
- Accepting any certificate on the other side, including a MitM attacker on the network.
- Punching a hole in the only cryptographic guarantee that the machine you’re talking to is in fact CNAM’s.
- Carving a permanent security regression into the codebase, one that no one will ever revisit.
And for nothing: the real answer is quick, documentable, and keeps security in place. A useful detail when the payload is patient identity.
The Clean Fix
ANS publishes the IGC-Santé chain publicly on its site (root + intermediates, PEM format). The right reflex:
- Download the official chain from the ANS portal.
- Verify fingerprints: compare the SHA-256 fingerprint of the downloaded certificate against the one served by the live endpoint and the one published by ANS. Three sources, one hash. If any of the three disagrees, stop.
- Bundle it into the project, under version control, distinct from client certificates (which stay in encrypted credentials).
- Point OpenSSL at it when building the SOAP client.
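The fingerprint step is easy to script so it can be repeated at every renewal. A minimal sketch using only Ruby’s openssl library, assuming the ANS bundle has been saved locally as a PEM file and that the live and published fingerprints are pasted in as strings; the method names are hypothetical:

```ruby
require "openssl"

# Matches each PEM block in a bundle (root + intermediates in one file).
PEM_CERTIFICATE = /-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----/m

# Parse every certificate contained in a PEM bundle.
def certificates_in(pem_bundle)
  pem_bundle.scan(PEM_CERTIFICATE).map { |pem| OpenSSL::X509::Certificate.new(pem) }
end

# SHA-256 over the DER encoding, as uppercase colon-separated hex.
def sha256_fingerprint(cert)
  OpenSSL::Digest.new("SHA256").hexdigest(cert.to_der).upcase.scan(/../).join(":")
end

# True only if every source yields the same hash (ignoring case and colons).
def fingerprints_agree?(*fingerprints)
  normalized = fingerprints.map { |fp| fp.delete(":").downcase }
  !normalized.empty? && normalized.uniq.size == 1
end
```

The formatted fingerprint uses the same uppercase, colon-separated hex that `openssl x509 -noout -fingerprint -sha256` prints, so the three values can be compared side by side; normalizing before comparing avoids false alarms caused only by formatting.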
In a Rails app, simplified, this is what Ameli::Insi ends up looking like:
```ruby
class Ameli::Insi
  def http_client
    Savon.client(
      wsdl: wsdl_path,
      ssl_cert_file: client_cert_path,
      ssl_cert_key_file: client_key_path,
      ssl_ca_cert_file: igc_sante_bundle_path, # bundled public chain
      ssl_verify_mode: :peer                   # strict verification stays on
    )
  end

  private

  def igc_sante_bundle_path
    Rails.root.join("lib", "certs", "igc_sante.pem").to_s
  end
end
```
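Savon passes these options down to HTTPI. For readers not on Savon, the same trust setup on Ruby’s stdlib Net::HTTP shows concretely what “point OpenSSL at it” means. A sketch only: `insi_http` and the timeout values are assumptions, and nothing connects until a request is made:

```ruby
require "net/http"
require "openssl"

# Hypothetical helper: strict TLS to an endpoint whose root is not in the system store.
def insi_http(host, ca_bundle:, client_cert: nil, client_key: nil)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER # full chain + hostname checks stay on
  http.ca_file = ca_bundle                     # trust anchor: the bundled public chain
  http.cert = client_cert if client_cert       # OpenSSL::X509::Certificate for mutual TLS
  http.key = client_key if client_key          # matching OpenSSL::PKey
  http.open_timeout = 5                        # assumed values: fail fast, then degrade
  http.read_timeout = 15
  http
end
```

Setting `ca_file` replaces the default trust store for this connection. If the same client also had to reach public web endpoints, an `OpenSSL::X509::Store` built with `set_default_paths` plus `add_file(bundle)` could be assigned to `http.cert_store` instead.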
To validate end-to-end before even redeploying, openssl does the job:
```shell
openssl s_client \
  -connect services-ps-tlsm.ameli.fr:443 \
  -CAfile lib/certs/igc_sante.pem \
  -servername services-ps-tlsm.ameli.fr \
  -showcerts < /dev/null 2>&1 | grep "Verify return code"
```
Expected output: Verify return code: 0 (ok). From there, the Ruby client works again, in VERIFY_PEER, with no security downgrade.
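The same failure can also be reproduced offline, for example in a regression test. Ruby’s `OpenSSL::X509::Store` plays the role of the trust store OpenSSL consults during the handshake. The sketch below fabricates a throwaway root and server certificate (demo data, unrelated to IGC-Santé) and checks one chain twice: with the root only in the server-sent chain, then with the root as an explicit trust anchor:

```ruby
require "openssl"

# Throwaway certificate factory for the demo (NOT the IGC-Santé chain).
def make_cert(subject, key, issuer: nil, issuer_key: key, ca: false)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3
  cert.serial = rand(1 << 64)
  cert.subject = OpenSSL::X509::Name.parse(subject)
  cert.issuer = issuer ? issuer.subject : cert.subject # self-signed when no issuer
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 86_400
  ext = OpenSSL::X509::ExtensionFactory.new
  ext.subject_certificate = cert
  ext.issuer_certificate = issuer || cert
  cert.add_extension(ext.create_extension("basicConstraints", ca ? "CA:TRUE" : "CA:FALSE", true))
  usage = ca ? "keyCertSign,cRLSign" : "digitalSignature,keyEncipherment"
  cert.add_extension(ext.create_extension("keyUsage", usage, true))
  cert.sign(issuer_key, OpenSSL::Digest.new("SHA256"))
  cert
end

# The chain sent by the server is untrusted input; only certificates
# added to the store count as trust anchors.
def verify_chain(leaf, untrusted_chain, trust_anchors)
  store = OpenSSL::X509::Store.new
  trust_anchors.each { |anchor| store.add_cert(anchor) }
  ok = store.verify(leaf, untrusted_chain)
  [ok, store.error_string]
end

root_key = OpenSSL::PKey::RSA.new(2048)
root = make_cert("/CN=Demo Root CA", root_key, ca: true)
leaf = make_cert("/CN=demo.example", OpenSSL::PKey::RSA.new(2048), issuer: root, issuer_key: root_key)

p verify_chain(leaf, [root], [])     # root not trusted: rejected
p verify_chain(leaf, [root], [root]) # root pinned as anchor: accepted
```

The first check should fail with OpenSSL’s “self-signed certificate in certificate chain” error (the exact wording varies between OpenSSL versions), which is the production symptom. The second passes with verification fully enabled.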
The Useful Fix
Once the root cause was fixed, one more interesting question remained: why did a transport-level error on an external teleservice produce a raw 500 in the clinician’s face?
Answer: the Ameli::Insi service didn’t rescue HTTPI errors. Any network failure, any time-out, any TLS error bubbled up to the Rails controller, which had nothing better to do than return a 500. That is not business behavior. It is an unmade product decision with a stacktrace at the end.
The fix is a few lines per SOAP method:
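```ruby
def fetch_by_traits(patient)
  response = http_client.call(:fetch_by_traits, message: payload_for(patient))
  Ameli::Insi::Identity.from_soap(response)
rescue HTTPI::Error, Savon::Error, Timeout::Error, SocketError, SystemCallError => e
  # Depending on the HTTPI adapter, timeouts and socket errors can surface
  # unwrapped, so they are listed next to the HTTPI/Savon errors.
  Sentry.capture_exception(e, extra: { patient_id: patient.id })
  raise Ameli::Insi::TeleserviceUnavailable # original error kept as #cause
end
```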
The calling controller rescues TeleserviceUnavailable and switches to a provisional identity (the civil traits typed by the clinician, tagged “INS not verified”), with a banner explaining INSi is down and that the record must be re-verified later. The clinician keeps working, the patient isn’t blocked, and the data will be reconciled at the next successful call.
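The fallback decision can be sketched in plain Ruby. Every name here is hypothetical: `TeleserviceUnavailable` stands in for `Ameli::Insi::TeleserviceUnavailable`, `Identity` and `IdentityResolver` are illustrative, and the real app may keep this logic in the controller. A successful call yields a verified identity; an unavailable teleservice yields a provisional one built from the typed traits:

```ruby
# Stand-in for Ameli::Insi::TeleserviceUnavailable from the service above.
class TeleserviceUnavailable < StandardError; end

# Hypothetical value object: an identity is either verified by INSi or provisional.
Identity = Struct.new(:traits, :ins, :status, keyword_init: true) do
  def verified?
    status == :verified
  end

  def provisional?
    status == :provisional # UI shows "INS not verified" + banner
  end
end

# Hypothetical resolver the controller would call instead of rescuing inline.
class IdentityResolver
  def initialize(teleservice)
    @teleservice = teleservice # anything responding to #fetch_by_traits(traits)
  end

  def resolve(traits)
    ins = @teleservice.fetch_by_traits(traits)
    Identity.new(traits: traits, ins: ins, status: :verified)
  rescue TeleserviceUnavailable
    # Degraded mode: keep the clinician's typed traits, no INS, flagged for
    # re-verification at the next successful call.
    Identity.new(traits: traits, ins: nil, status: :provisional)
  end
end
```

Keeping the decision in one small object makes the degraded mode testable with a fake teleservice, and the controller only has to render the banner when `identity.provisional?`.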
That is the useful behavior. Not spectacular, not great demo material, but it is what matters in production: an external dependency going down should not stop the care chain. It should fall back to an explicit degraded mode.
The Invisible Work
This incident also surfaced a blind spot. INSi, MSSanté, and Pro Santé Connect client certificates are renewed manually on the ANS Portail de Confiance, pasted into encrypted credentials, and then nobody looks at them again. Artisanal, but not in the charming way.
The day one expires, you get the same outage again, with a familiar smell and a perfectly reasonable urge to insult a calendar.
So, right after the fix, a daily job walks the inventory of bundled certificates: public chains and client certificates. It reports to Sentry any certificate within 30 days of expiry, then escalates at 7 days or once a certificate has already expired.
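The expiry check itself needs nothing beyond Ruby’s openssl library. A minimal sketch, assuming the certificates are available as PEM files (bundles may hold several) and using the 30-day and 7-day thresholds above; the method names and the shape of the returned hashes are hypothetical:

```ruby
require "openssl"

WARN_DAYS = 30     # report to Sentry
ESCALATE_DAYS = 7  # escalate

# Classify one certificate; days_left is negative once it has expired.
def expiry_level(cert, now: Time.now)
  days_left = ((cert.not_after - now) / 86_400).floor
  level =
    if days_left < 0 then :expired
    elsif days_left <= ESCALATE_DAYS then :critical
    elsif days_left <= WARN_DAYS then :warning
    else :ok
    end
  [level, days_left]
end

# Walk PEM files and return only the certificates that need attention.
def expiring_certificates(paths, now: Time.now)
  paths.flat_map do |path|
    File.read(path).scan(/-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----/m).map do |pem|
      cert = OpenSSL::X509::Certificate.new(pem)
      level, days_left = expiry_level(cert, now: now)
      { path: path, subject: cert.subject.to_s, not_after: cert.not_after,
        days_left: days_left, level: level }
    end
  end.reject { |finding| finding[:level] == :ok }
end
```

In a daily job, each finding could become a `Sentry.capture_message(..., level: :warning)` or `level: :error` depending on its level. Client certificates kept in encrypted credentials would be parsed from the credential string rather than read from disk. Both wiring details are assumptions, not the production job.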
I also added a renewal runbook, versioned in the repo (lib/certs/README.md), with the Portail de Confiance (PFC) procedure step by step. Nobody will notice any of this while the certificates are valid. When one is about to lapse, the team gets an actionable warning instead of a surprise outage.
A Quick Map of French Healthcare PKI
For anyone discovering the ecosystem, the building blocks look alike and it’s easy to get lost:
- INS: the patient’s national health identifier (NIR or NIA + traits). The data.
- INSi: the CNAM teleservice that lets you fetch / verify an INS. The API.
- MSSanté: secure messaging between professionals and with the patient. Separate PKI, certificates issued by MSSanté operators.
- Pro Santé Connect: the state SSO for healthcare professionals (dematerialized CPS card, OIDC).
- IGC-Santé: the shared PKI root that signs most of the ANS server side.
Each block is documented, each chain is public and verifiable. The trap is treating this as an annoying technical detail when it is part of the product: identity, security, continuity of care, and operations.
What I Keep From This
When a TLS integration breaks in production, the right answer is almost never “disable verification”. It’s: find the missing trust anchor, make it explicit, version it, audit it over time. The VERIFY_NONE reflex saves ten minutes now and leaves a security debt to age quietly in the codebase.
The other point, more product than TLS, is that any critical integration needs a degraded mode. An external teleservice going down is normal over a decade of operations. A blunt 500 that blocks the business is a design choice. Often implicit. Rarely good.
The real work was the whole loop: understand the clinical need, restore the trust chain, preserve security, add a business fallback, then monitor certificates so we do not replay the same scene three months later in a different outfit.
If You Have One of These at Work
French healthcare integrations (INSi, MSSanté, Pro Santé Connect, DMP) are a mix of PKI, certificates, WSDL files, and ANS procedures that require more patience than genius. They work. But you need to frame the business need as much as the technical connection.
This is the kind of work I handle at SXN Labs: understand what actually needs to keep working, simplify the scope, connect it properly, and leave behind a system people can operate. If you have a healthcare teleservice stuck somewhere, drop me a line.