In this post, we look at a critical incident at Safepay in which both our sandbox and production URLs/APIs unexpectedly stopped working. After careful investigation and troubleshooting, we traced the root cause to an outdated version of the Nginx Ingress Controller. Below, we walk through the steps we took to update our Nginx Ingress definitions and resolve the downtime.
Identifying the Issue: When we discovered that the URLs/APIs were failing, our initial troubleshooting led us to the Nginx Ingress Controller. Log analysis and further investigation revealed that an outdated version of the controller was causing the problem.
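Logs are one way to spot this class of problem. Another is to check the manifests themselves for API versions that newer Kubernetes releases no longer serve. The sketch below is a hypothetical helper, not a tool described in this incident. It assumes the manifests have already been parsed into Python dicts (for example with a YAML loader), and its lookup table lists only a few removals relevant here rather than the full deprecation list.

```python
from typing import Iterable, List

# (apiVersion, kind) pairs removed from Kubernetes, mapped to the release that
# removed them. Deliberately incomplete: only entries relevant to Ingress setups.
REMOVED_APIS = {
    ("extensions/v1beta1", "Ingress"): "1.22",
    ("networking.k8s.io/v1beta1", "Ingress"): "1.22",
    ("extensions/v1beta1", "Deployment"): "1.16",
    ("apps/v1beta1", "Deployment"): "1.16",
    ("apps/v1beta2", "Deployment"): "1.16",
}


def find_removed_apis(manifests: Iterable[dict]) -> List[str]:
    """Return one human-readable finding per manifest that uses a removed API."""
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion", ""), manifest.get("kind", ""))
        removed_in = REMOVED_APIS.get(key)
        if removed_in is not None:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append(
                f"{key[1]}/{name} uses {key[0]} (removed in Kubernetes {removed_in})"
            )
    return findings
```

For example, an Ingress named `api` declared with `apiVersion: extensions/v1beta1` is reported, while an `apps/v1` Deployment passes. The resource names are illustrative.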
Updating Nginx Ingress Definitions: To resolve the issue, we needed to move our Ingress definitions off the deprecated extensions/v1beta1 API version. Ingress resources now use networking.k8s.io/v1; apps/v1 is the stable API version for workload objects such as the controller's Deployment. Here are the steps we followed:
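To make the shape of this migration concrete: when an Ingress moves from extensions/v1beta1 to networking.k8s.io/v1, `serviceName`/`servicePort` become a nested `service` object, `spec.backend` is renamed to `spec.defaultBackend`, and every path must declare a `pathType`. The sketch below performs those field changes on a manifest already loaded as a Python dict. It is an illustration, not the exact procedure we ran, and the default `pathType` of `Prefix` is an assumption you should review per path.

```python
import copy


def _convert_backend(old: dict) -> dict:
    """Map a v1beta1 backend (serviceName/servicePort) to the v1 'service' shape."""
    port = old["servicePort"]
    # In v1, numeric ports go under "number" and named ports under "name".
    port_spec = {"number": port} if isinstance(port, int) else {"name": port}
    return {"service": {"name": old["serviceName"], "port": port_spec}}


def convert_ingress(old: dict, default_path_type: str = "Prefix") -> dict:
    """Return a networking.k8s.io/v1 copy of an extensions/v1beta1 Ingress dict.

    The input is left untouched. default_path_type is an assumption: v1 requires
    a pathType on every path, and the right value depends on your routing.
    """
    new = copy.deepcopy(old)
    new["apiVersion"] = "networking.k8s.io/v1"
    spec = new.get("spec", {})
    # spec.backend was renamed to spec.defaultBackend in v1.
    if "backend" in spec:
        spec["defaultBackend"] = _convert_backend(spec.pop("backend"))
    for rule in spec.get("rules", []):
        for path in rule.get("http", {}).get("paths", []):
            path["backend"] = _convert_backend(path["backend"])
            # Keep an explicit pathType if one is already set.
            path.setdefault("pathType", default_path_type)
    return new
```

Running `convert_ingress` on a v1beta1 Ingress that routes `/` to a hypothetical `api` service on port 80 yields a path with `pathType: Prefix` and `backend.service.port.number: 80`. Serializing the result back to YAML is left to whichever loader you already use.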
Updating the Nginx Ingress Controller: To further improve stability, we replaced the existing Nginx Ingress Controller with an up-to-date version. Here's what we did:
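One way to catch an outdated controller before it causes an outage is to check the image tag your cluster is running. The sketch below assumes you are on the community ingress-nginx project, whose v1.0.0 release line is the one that serves only networking.k8s.io/v1 Ingress objects. It treats anything older as needing an upgrade. The threshold and the helper names are assumptions for illustration.

```python
import re
from typing import Tuple

# Assumption: ingress-nginx v1.0.0 is the first release line that works only
# with networking.k8s.io/v1 Ingress objects; older controllers are flagged.
MIN_VERSION = (1, 0, 0)


def parse_tag(image: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from an image like '.../controller:v1.8.1'."""
    # Drop an optional '@sha256:...' digest, then take the text after the last ':'.
    tag = image.split("@", 1)[0].rsplit(":", 1)[-1]
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognised image tag: {tag!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def needs_upgrade(image: str, minimum: Tuple[int, int, int] = MIN_VERSION) -> bool:
    """True when the controller image is older than the minimum version."""
    return parse_tag(image) < minimum
```

Comparing tuples keeps the version logic simple: `(0, 26, 1) < (1, 0, 0)` is true, so an old `0.26.1` controller is flagged, while a `v1.8.1` image is not. The image string itself would come from the controller Deployment's container spec.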
Addressing Routing Errors: Even after the DNS adjustments, we encountered routing errors. To fix them, we made the following changes to our Ingress file:
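The specific Ingress changes aren't reproduced here, but one common source of routing surprises after moving to networking.k8s.io/v1 is the now-mandatory `pathType` field. The sketch below models the matching rules as the Kubernetes documentation describes them. `Exact` compares the whole path. `Prefix` compares `/`-separated elements, so `/api` matches `/api/v1` but not `/apiv2`. When several paths match, the longest wins and `Exact` beats `Prefix` on a tie. It models the documented semantics only; ingress-nginx's internal nginx configuration may differ in edge cases, and the paths and backend names are hypothetical.

```python
from typing import List, Optional


def _segments(path: str) -> List[str]:
    """Split a URL path into its non-empty '/'-separated elements."""
    return [part for part in path.split("/") if part]


def path_matches(rule_path: str, path_type: str, request_path: str) -> bool:
    """Apply Kubernetes Exact/Prefix matching to a single Ingress path rule."""
    if path_type == "Exact":
        return request_path == rule_path
    if path_type == "Prefix":
        # Element-wise prefix match; a trailing slash on either side is ignored.
        prefix = _segments(rule_path)
        return _segments(request_path)[: len(prefix)] == prefix
    raise ValueError(f"unsupported pathType: {path_type}")


def pick_backend(paths: List[dict], request_path: str) -> Optional[str]:
    """Pick a backend: longest matching rule path first, then Exact over Prefix."""
    candidates = [
        p for p in paths if path_matches(p["path"], p["pathType"], request_path)
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda p: (len(p["path"]), p["pathType"] == "Exact"))
    return best["backend"]
```

With rules `/` (Prefix, `frontend`), `/api` (Prefix, `api`) and `/api/health` (Exact, `health`), a request for `/api/v1/orders` goes to `api`, `/api/health` goes to `health`, and `/apiv2` falls through to `frontend`. A missing catch-all `/` rule is one way a request ends up with no backend at all.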
These adjustments helped resolve the routing errors and restore normal operation of the system.
Monitoring and Recommendations: After the update, we closely monitored the system for any issues arising from the API version change. We recommend performing updates like this in a test or staging environment before applying them to production.
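Alongside external uptime monitoring, a quick post-change check can confirm that both environments answer again. The sketch below uses only the Python standard library and assumes each environment exposes a URL that returns a 2xx status when healthy. The URLs you pass in are placeholders. The fetch function is injectable so the check can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if it could not be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return err.code
    except OSError:
        # URLError, connection failures and timeouts are all OSError subclasses.
        return 0


def failing_endpoints(
    urls: Iterable[str], fetch: Callable[[str], int] = fetch_status
) -> List[str]:
    """Return the URLs whose status is not in the 2xx range."""
    return [url for url in urls if not 200 <= fetch(url) < 300]
```

A call such as `failing_endpoints(["https://sandbox.example.com/health", "https://api.example.com/health"])` (hypothetical URLs) returns an empty list once both environments are healthy again, which makes it easy to wire into a deploy script or a scheduled job.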
Conclusion: By applying the necessary updates to the Nginx Ingress definitions and the Nginx Ingress Controller, we successfully resolved the critical issue that caused downtime at Safepay.
Our Uptime notifications played a vital role in promptly identifying the problem and minimizing the impact on our users. To stay up to date with our system's availability and receive real-time updates, we encourage you to visit our dedicated status page at https://safepay.betteruptime.com.
By putting these measures in place and keeping our infrastructure regularly updated, we strive to provide a stable and reliable platform for our users.