GUIDEcx - Notice history

All systems operational

100% - uptime

Workato Website - Operational

Workato Email notifications - Operational

Workbot for Teams - Operational

Workbot for Slack - Operational

Recipe runtime for job execution - Operational

Recipe Webhook ingestion - Operational

Recipe API gateway - Operational

Notice history

Aug 2024

Login Instability Issues
  • Update

    Summary

    During a routine upgrade of our system infrastructure, we encountered an issue related to the rate-limiting of image downloads from an external service. This rate limit disrupted the startup of essential services, leading to a temporary outage that affected the availability of certain features.

    What Happened

    The issue occurred during the upgrade process when the rapid and simultaneous restarting of multiple system components led to a higher-than-usual number of download requests within a short time frame. This exceeded the limits set by the external service provider, disrupting the startup of critical services.
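
    As a rough illustration of why restarting everything at once can trip a download limit, the sketch below counts how many pulls a fixed-window limit would reject. The limit, window, and component count are hypothetical numbers chosen for the example, not values from this incident, and the staggered schedule is shown only for contrast; it is not a description of the fix that was applied.

```python
from collections import Counter


def rejected_pulls(pull_times, limit, window):
    """Count pulls rejected by a fixed-window limit of `limit` pulls per `window` seconds."""
    per_window = Counter(int(t // window) for t in pull_times)
    return sum(max(0, n - limit) for n in per_window.values())


# Hypothetical numbers: 30 components, limit of 10 pulls per 60-second window.
LIMIT, WINDOW, COMPONENTS = 10, 60, 30

# All components restart within the same few seconds.
simultaneous = [i * 0.1 for i in range(COMPONENTS)]
# The same restarts spread out, one every 6 seconds.
staggered = [i * 6.0 for i in range(COMPONENTS)]

burst_rejected = rejected_pulls(simultaneous, LIMIT, WINDOW)
staggered_rejected = rejected_pulls(staggered, LIMIT, WINDOW)
print(burst_rejected, staggered_rejected)
```

    With these assumed numbers, the burst exceeds the limit while the spread-out schedule stays within it.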

    How We Fixed It

    • Enhanced Access: At 3:10 PM ET, we upgraded our access to the external service, allowing for higher download limits. A new access credential was created and applied, which allowed the impacted services to restart successfully.

    • Configuration Update: We updated our system configurations to ensure more reliable access to required components in the future, reducing the likelihood of similar issues.

    What We've Done to Prevent This in the Future

    To prevent this from happening again, we took several steps:

    • Image Caching & Version Control: We implemented changes to cache frequently used components and pin them to specific versions, reducing the need to download them from external sources repeatedly and avoiding future rate limits.

    • Upgraded Service Plan: We upgraded our plan with the external service provider to a higher tier, increasing our allowed download capacity and providing more robust support for future operations.
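
    The caching and version-pinning idea above can be sketched as a toy cache keyed by name and pinned version. This is a minimal sketch under stated assumptions: `PinnedImageCache`, `fake_fetch`, and the image name and version are hypothetical names invented for illustration, not part of our actual tooling.

```python
class PinnedImageCache:
    """Toy cache: each (name, version) is fetched from the external source at most once."""

    def __init__(self, fetch):
        self._fetch = fetch          # hypothetical callable that downloads an image
        self._store = {}
        self.external_downloads = 0

    def get(self, name, version):
        key = (name, version)        # pinned version, never a moving tag like "latest"
        if key not in self._store:
            self._store[key] = self._fetch(name, version)
            self.external_downloads += 1
        return self._store[key]


def fake_fetch(name, version):
    # Stand-in for a real download; returns placeholder bytes.
    return f"{name}:{version}".encode()


cache = PinnedImageCache(fake_fetch)
for _ in range(30):                  # 30 component restarts, same pinned image
    cache.get("worker", "1.4.2")
print(cache.external_downloads)
```

    Because the version is pinned, repeated restarts reuse the cached copy and do not count against the external download limit.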

  • Resolved

    The login fix was successful. Monitoring shows that logins are stable, and all access is restored.

  • Monitoring
    We implemented a fix and are currently monitoring the result.
  • Identified

    Diagnosis is complete. We are working on solving the main login issue now.

  • Investigating

    We are currently investigating an incident that is causing intermittent login issues.

System unavailable
  • Resolved

    Vercel has applied a fix. Access has been restored and is stable. All systems are operational.

  • Update

    Access is back again. We will monitor to ensure it remains stable.

  • Update

    Vercel has identified the deeper root cause on their end and is working to resolve it. Thousands of sites and systems around the world are affected by the same issue, so we expect it to be treated as urgent and resolved within the next few minutes.

  • Update

    Vercel's status has been integrated into the top section of our status page, called "Web Application". Real-time updates on their progress in restoring hosting of the front-end interface can be followed in the "Edge Functions" and "Edge Middleware" components.

  • Update

    Vercel is unstable right now, and access has been temporarily lost again. We are continuing to escalate with them.

  • Monitoring

    Vercel is back up again. The web interface is available. We will monitor the status for a few more minutes.

  • Identified

    Vercel is having an outage. GUIDEcx uses Vercel to host the front-end web interface. We are escalating with them now.

  • Investigating

    We are investigating the unavailability of the web interface. The API and recipes are still working. No data has been lost.

Jun 2024

Project creation causing template duplication
  • Update

    Summary

    We recently encountered an issue related to a new feature release, which temporarily affected project creation and led to slower performance in some areas of our platform. Specifically, customers saw projects stuck in the "Creating" state with a loading screen that wouldn't go away.

    What Happened

    During the release of a new feature, a part of our project creation process encountered an unexpected issue. While projects were being created successfully, a background process responsible for completing the setup encountered an error, causing it to retry the process multiple times. This led to the creation of duplicate templates within projects and prevented some projects from completing their setup.

    Additionally, the repeated attempts to finalize project creation increased the workload on our system, leading to slower response times for certain actions, such as syncing data with external tools like Jira and Salesforce.
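
    To illustrate the failure mode described above, the sketch below simulates a setup job that attaches a template and then fails, so every retry attaches it again. It also shows one common safeguard, an idempotent attach step that checks before adding. All names here (`run_setup`, `attach_naive`, `attach_idempotent`, the template name) are hypothetical, and the idempotent variant is a generic technique shown for illustration, not a description of the fix that was deployed.

```python
def attach_naive(project, template):
    # Appends on every call, so each retry adds another copy.
    project["templates"].append(template)


def attach_idempotent(project, template):
    # Adds the template only if it is not already attached.
    if template not in project["templates"]:
        project["templates"].append(template)


def run_setup(project, attach, failures_before_success):
    """Run setup, retrying until the step after `attach` stops failing."""
    attempts = 0
    while True:
        attempts += 1
        attach(project, "onboarding-template")
        if attempts > failures_before_success:
            project["state"] = "Ready"
            return attempts
        # Simulated error after the template was attached; the job is retried.


naive = {"state": "Creating", "templates": []}
safe = {"state": "Creating", "templates": []}
run_setup(naive, attach_naive, failures_before_success=3)
run_setup(safe, attach_idempotent, failures_before_success=3)
print(len(naive["templates"]), len(safe["templates"]))
```

    In this simulation, the naive job ends with one copy of the template per attempt, while the idempotent job ends with a single copy regardless of how many retries occur.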

    How We Fixed It

    1. Rapid Response: Within 15 minutes of the issue being reported, our team implemented a fix to ensure that new projects could complete their setup process without any issues.

    2. Clean-Up: The cleanup of impacted jobs followed this schedule:

      • 2:30 PM ET: We completed tests of a script to remove duplicate templates. A total of 735 duplicates were removed, along with duplicate milestones, tasks, and attachments.

      • 5:46 PM ET: All projects stuck in a "Creating" state were updated, and we confirmed that known customer projects were accessible.

      • 5:56 PM ET: The system processed the remaining background tasks, and overall performance returned to normal as the load on our database decreased throughout the day.

    What We've Done to Prevent This in the Future

    To ensure this doesn't happen again, we took several steps:

    1. Enhanced Quality Control: We reviewed our release processes and introduced additional quality checks and safeguards, ensuring that similar issues would be caught before they reach production.

    2. Improved Monitoring: We reinstated and enhanced our monitoring systems to quickly detect and alert us to any issues affecting critical processes like project creation.

    3. System Improvements: We also made improvements to how our background processes handle spikes in activity, ensuring that our system remains responsive even during high-demand periods.

    We understand the impact this may have had on your experience, and we are committed to learning from this event to provide you with a more reliable service moving forward.

  • Resolved

    The fix was confirmed as successful. Any duplicate templates that were inadvertently added to projects created over the past few minutes will be automatically cleaned up this morning.

  • Monitoring

    The fix has been applied and we are monitoring to confirm it was fully effective.

  • Identified

    The cause has been diagnosed and we are in the process of applying a fix.

  • Investigating
    We are currently investigating this incident.

Jun 2024 to Aug 2024
