GUIDEcx - Login Instability Issues – Incident details

Login Instability Issues

Resolved
Degraded performance
Started 4 months ago
Lasted about 1 hour

Affected

Web Application

Operational from 6:21 PM to 6:53 PM, Degraded performance from 6:53 PM to 7:35 PM, Operational from 7:35 PM to 11:00 PM

Project Management

Operational from 6:21 PM to 6:53 PM, Degraded performance from 6:53 PM to 7:35 PM, Operational from 7:35 PM to 11:00 PM

Compass Customer Portal

Operational from 6:21 PM to 11:00 PM

Resource Management

Operational from 6:21 PM to 11:00 PM

Advanced Time Tracking

Operational from 6:21 PM to 11:00 PM

Report Navigator and Report Builder

Operational from 6:21 PM to 11:00 PM

Updates
  • Update

    Summary

    During a routine upgrade of our system infrastructure, an external service rate-limited our image downloads. The rate limit prevented essential services from starting, which caused a temporary outage that affected the availability of certain features.

    What Happened

    During the upgrade, multiple system components restarted rapidly and at the same time, which produced a higher-than-usual number of download requests within a short time frame. That volume exceeded the limits set by the external service provider, and critical services could not start.
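    The effect described above can be illustrated with a minimal sketch of a sliding-window rate limit. This is not our provider's actual algorithm: the function name, the limit of 10 requests per 600 seconds, and the count of 30 components are hypothetical values chosen only to show why a burst of simultaneous restarts is rejected while the same number of staggered restarts is not.

```python
from collections import deque


def count_rejected(request_times, limit, window_seconds):
    """Count the requests a sliding-window limiter would reject.

    request_times: sorted timestamps (in seconds) of download requests.
    limit and window_seconds are hypothetical; the report does not
    state the provider's real limits.
    """
    accepted = deque()  # timestamps of accepted requests still inside the window
    rejected = 0
    for t in request_times:
        # Forget accepted requests that have aged out of the window.
        while accepted and t - accepted[0] >= window_seconds:
            accepted.popleft()
        if len(accepted) < limit:
            accepted.append(t)
        else:
            rejected += 1
    return rejected


# Hypothetical scenario: 30 components each request one image.
burst = [0.0] * 30                          # all restart at the same moment
staggered = [i * 60.0 for i in range(30)]   # one restart per minute

print(count_rejected(burst, limit=10, window_seconds=600))
print(count_rejected(staggered, limit=10, window_seconds=600))
```

    In this toy model the burst loses most of its requests to the limit, while the staggered sequence stays within it, which mirrors why simultaneous restarts were the trigger.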

    How We Fixed It

    • Enhanced Access: At 3:10 PM ET, we upgraded our access to the external service, allowing for higher download limits. A new access credential was created and applied, which allowed the impacted services to restart successfully.

    • Configuration Update: We updated our system configurations to ensure more reliable access to required components in the future, reducing the likelihood of similar issues.

    What We've Done to Prevent This in the Future

    To prevent this from happening again, we took the following steps:

    • Image Caching & Version Control: We now cache frequently used components and pin them to specific versions, so they no longer need to be downloaded repeatedly from external sources. This lowers the risk of hitting rate limits in the future.

    • Upgraded Service Plan: We upgraded our plan with the external service provider to a higher tier, increasing our allowed download capacity and providing more robust support for future operations.
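
    As a rough illustration of the caching and version-pinning idea above, the following sketch keeps one local copy per pinned component version and only contacts the external source on a cache miss. It is a simplified model, not our production setup: the class name, the `fetch` callable, and the component name and version are all hypothetical.

```python
class PinnedImageCache:
    """Toy cache keyed by (name, version); hypothetical, for illustration only."""

    def __init__(self, fetch):
        # fetch(name, version) -> bytes stands in for a download from the
        # external service; it is an assumption, not a real API.
        self._fetch = fetch
        self._store = {}
        self.downloads = 0  # number of times the external source was contacted

    def get(self, name, version):
        # Require an explicit version so a floating tag cannot trigger
        # an unexpected new download.
        if not version or version == "latest":
            raise ValueError(f"{name}: pin an explicit version")
        key = (name, version)
        if key not in self._store:
            self._store[key] = self._fetch(name, version)
            self.downloads += 1
        return self._store[key]


def fake_fetch(name, version):
    # Stand-in for the external download.
    return f"{name}:{version}".encode()


cache = PinnedImageCache(fake_fetch)
for _ in range(50):  # 50 restarts requesting the same pinned component
    cache.get("example-service", "1.4.2")
print(cache.downloads)
```

    With pinning and a local cache, repeated restarts reuse the stored copy, so the number of external downloads no longer grows with the number of restarts.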

  • Resolved

    The login fix was successful. Monitoring shows the system is stable, and all access is restored.

  • Monitoring
    We implemented a fix and are currently monitoring the result.
  • Identified

    Diagnosis is complete. We are now working to resolve the main login issue.

  • Investigating

    We are currently investigating an incident that is causing intermittent login issues.