Eliminating Customer-Facing SSL Failures Through Proactive Monitoring
Self-initiated solution that transformed reactive firefighting into proactive prevention.
The company managed 1,400+ domains with a reactive, manual monitoring process. Three to four SSL certificates expired every week, and they were usually discovered by Account Managers, or by customers after browser security warnings appeared. Each incident damaged our reputation.
The legacy Python script took 30+ minutes to complete a scan and frequently crashed. Because its input list was static, newly purchased domains were often missed entirely.
There was no single source of truth, and developers had to be chased down manually to renew certificates, creating communication gaps.
"The problem wasn't just checking certificates. The problem was a broken feedback loop between Monitoring and Development."
// PM Insight
I treated this as an internal product, gathering requirements from its users (the team) and its buyers (leadership).
Needed a tool that didn't crash, ran quickly, and was easy to read. Dark mode preferred for long shifts.
Needed clear, actionable lists of what to fix—not raw logs.
Required strict access control and security logging for infrastructure data.
A web-based dashboard that centralized domain health, shaped by the key PM-driven decisions below.
Decision: Replaced static text file with dynamic API feed.
Outcome: Eliminated the risk of missing newly purchased domains.
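A minimal sketch of that feed swap, assuming a hypothetical inventory endpoint (the URL and response shape here are illustrative, not our actual API):

```python
import requests

# Hypothetical endpoint; the real URL and response shape come from
# the domain inventory system, not this sketch.
DOMAIN_API = "https://inventory.example.invalid/api/domains"

def fetch_domains() -> list[str]:
    """Pull the live domain list so newly purchased domains are
    picked up on the very next run, with no manual list edits."""
    resp = requests.get(DOMAIN_API, timeout=10)
    resp.raise_for_status()
    return [d["name"] for d in resp.json()["domains"]]
```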
Decision: Built structured Excel export with auto-dated labels.
Outcome: Friction-free weekly workflow. Zero ambiguity.
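Roughly how that export might look, as a sketch assuming pandas with openpyxl installed (the column set is whatever the checker produces):

```python
from datetime import date
import pandas as pd  # assumes pandas + openpyxl are available

def export_report(results: list[dict]) -> str:
    """Write the run's results to an auto-dated workbook,
    e.g. ssl_report_2025-01-06.xlsx, so weekly files never
    collide and the label needs no manual typing."""
    filename = f"ssl_report_{date.today().isoformat()}.xlsx"
    pd.DataFrame(results).to_excel(filename, index=False, sheet_name="SSL Status")
    return filename
```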
Context: CTO flagged security concerns in v0.3.4.
Action: Pivoted roadmap to prioritize Auth in v0.4.0.
Solution: Batching (20 concurrent requests) for speed.
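A sketch of how that batching can work: a semaphore caps the scan at 20 in-flight handshakes while each blocking check runs in a worker thread. The helper names are mine, not the production code:

```python
import asyncio
import socket
import ssl
from datetime import datetime, timezone

MAX_CONCURRENT = 20  # the batch size chosen above

def _cert_expiry(domain: str) -> datetime:
    """Blocking TLS handshake; returns the certificate's notAfter date."""
    ctx = ssl.create_default_context()
    with socket.create_connection((domain, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=domain) as tls:
            cert = tls.getpeercert()
    return datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )

async def check_all(domains: list[str]) -> dict:
    """Check every domain with at most MAX_CONCURRENT in flight;
    one failing domain never crashes the whole scan."""
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def check(domain: str):
        async with sem:
            try:
                return domain, await asyncio.to_thread(_cert_expiry, domain)
            except Exception as exc:  # record the failure, keep scanning
                return domain, exc

    return dict(await asyncio.gather(*(check(d) for d in domains)))

# Usage: results = asyncio.run(check_all(domains))
```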
UX: Vercel-inspired dark mode, with empathy for tired eyes.
From a static, crash-prone script to a production-ready platform in four major iterations.
Proved the core logic worked. Failed on UX and reliability.
Team could now run checks without touching code. Major adoption increase.
Implemented secure login following CTO feedback. Satisfied exec requirements.
Discovered valid SSLs pointing to wrong servers. Caught misconfigurations.
We discovered that some domains had valid SSL certificates but pointed to the wrong server IP due to client changes. This caught infrastructure misconfigurations that standard SSL checks missed.
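The check behind that finding is simple to sketch, assuming the expected IP comes from a (hypothetical) inventory record for each domain:

```python
import socket

def points_to_expected_host(domain: str, expected_ip: str) -> bool:
    """A certificate can be perfectly valid while DNS sends traffic
    to the wrong box; comparing the resolved A record against the
    inventory's expected IP surfaces exactly that failure mode."""
    return socket.gethostbyname(domain) == expected_ip
```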
From reactive firefighting to proactive prevention.
"The 'Monday Morning Panic' is gone. The team trusts the dashboard."
// Qualitative Impact
What I would do differently and where the product goes next.
Implement automated alerting (Cliq/email webhooks) earlier. The dashboard currently requires users to "pull" data; moving to "push" would reduce friction further.
1. Automated Cliq notifications, so no manual login is needed (see the sketch after this list).
2. Historical analytics to track which developers let certificates lapse.
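For item 1, a minimal push sketch. The webhook URL is a placeholder, and the exact payload schema should be verified against Cliq's incoming-webhook documentation:

```python
import requests

# Placeholder; in practice this is the incoming-webhook URL
# configured for the team's Cliq channel.
CLIQ_WEBHOOK = "https://example.invalid/cliq/incoming-webhook"

def push_alert(domain: str, days_left: int) -> None:
    """Push an expiry warning into chat so nobody has to log in
    and "pull" the dashboard to find out."""
    text = f"SSL certificate for {domain} expires in {days_left} days"
    requests.post(CLIQ_WEBHOOK, json={"text": text}, timeout=10).raise_for_status()
```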