Just check expiration of the active certificate; if it’s under a threshold (say 1 week, assuming you auto-renew it when it’s 3 weeks to expiry; still serving a cert when it’s 1 week to expiration is enough signal that something went wrong) then you alert.
Then you just need to test that your alerting system is reliable. No need to use your users as canaries.
In real life, I guess there are people who don't monitor at all. For them failing requests would go unnoticed ... for the others monitoring must be easy.
But I think the core thing might be to make monitoring SSL lifetime the "obvious" default: All the grafana dashboards etc should have such an entry.
Then as soon as I setup a monitoring stack I get that reminder as well.
The canary narrows the blast radius and time-to-detection.
This has given me some interesting food for thought. I wonder how feasible it would be to create a toy webserver that did exactly this (failing an increasing percentage of requests as the deadline approaches)? My thought would be to start failing some requests as the deadline approaches a point where most would consider it "far too late" (e.g. 4 hours before `notAfter`). At this point, start responding to some percentage of requests with a custom HTTP status code (599 for the sake of example).
Probably a lot less useful than just monitoring each webserver endpoint's TLS cert using synthetics, but it's given me an idea for a fun project if nothing else.