OMS Service Interruption and Performance Degradation (11 February 2026) – Final Post‑Mortem

Status
Closed (19 February 2026)

Executive summary

On 11 February 2026, the OMS application was unavailable for 45 minutes (09:10 to 09:55 CET) and then remained reachable but unstable and slow for some users. The initial outage was caused by inconsistent private DNS resolution. After the DNS correction restored availability, performance issues persisted because of configuration mismatches following a recent server migration and inefficient real-time connection behaviour on specific endpoints. A series of fixes (including authentication and WebSocket-related changes) and a capacity upgrade resolved the instability.

Customer impact

  1. OMS was unavailable starting at 09:10 CET on 11 February 2026 and was restored at 09:55 CET after DNS correction.
  2. After restoration, OMS was reachable, but some users experienced intermittent slowness and instability until the final fixes and capacity changes were completed. The incident was confirmed resolved on 19 February 2026.

Timeline (CET)

  1. 09:10 – OMS became unavailable.
  2. 09:55 – DNS issue corrected, functionality restored.
  3. After restoration – Performance remained slow; investigation identified migration-related configuration mismatches.
  4. Later that day – Configuration changes attempted; some changes caused application restarts; remaining corrections planned outside business hours.
  5. 19 February 2026 – Incident confirmed resolved after hotfixes (WebSocket/header and 2FA fallback) and production capacity scaling due to CPU saturation.

Root cause(s)

This incident had multiple contributing causes:
  1. Inconsistent private DNS resolution
    1. This caused the initial OMS unavailability. The DNS issue was corrected and availability returned.
  2. Post-migration configuration mismatches
    1. After DNS correction, the system remained slow due to configuration mismatches following a recent server migration.
  3. Inefficient real-time connection behaviour on specific endpoints
    1. Investigation pointed to authentication failures on specific endpoints and an inefficient fallback that created avoidable load. Requests to the /socket.io endpoints showed very long durations consistent with long polling, which increased CPU and memory pressure and degraded responsiveness.
  4. CPU exhaustion on the production instance
    1. The remaining degradation was ultimately attributed to CPU saturation; additional capacity (vertical scaling) stabilized performance.
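
The long-polling pattern described in cause 3 can be illustrated with a minimal sketch. It is not the tooling used during this incident. It assumes access-log entries are available as (url, duration in seconds) pairs, which is a hypothetical shape, and relies on Socket.IO clients naming their transport in the `transport` query parameter. The function name and sample values are invented for illustration.

```python
from collections import defaultdict
from statistics import median
from urllib.parse import urlsplit, parse_qs


def summarize_socketio_transports(records):
    """Group /socket.io requests by transport; report count and median duration.

    `records` is an iterable of (url, duration_seconds) pairs. This input
    shape is an assumption standing in for whatever the access log provides.
    """
    durations = defaultdict(list)
    for url, duration in records:
        parts = urlsplit(url)
        if not parts.path.startswith("/socket.io"):
            continue  # ignore unrelated endpoints
        # Socket.IO clients name the transport in the query string.
        transport = parse_qs(parts.query).get("transport", ["unknown"])[0]
        durations[transport].append(duration)
    return {
        name: {"count": len(values), "median_s": median(values)}
        for name, values in durations.items()
    }


# Invented sample data, for illustration only.
sample = [
    ("/socket.io/?EIO=4&transport=polling&t=abc", 25.0),
    ("/socket.io/?EIO=4&transport=polling&t=abd", 27.0),
    ("/socket.io/?EIO=4&transport=websocket", 0.2),
    ("/api/orders", 0.1),
]
summary = summarize_socketio_transports(sample)
```

A high share of `polling` requests with long median durations, compared with `websocket`, would be consistent with clients falling back to long polling.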

Resolution

Actions taken to resolve and stabilize OMS:
  • Corrected the DNS issue to restore availability.
  • Implemented fixes related to WebSocket/header behaviour and 2FA fallback.
  • Increased production capacity (vertical scaling) after confirming CPU saturation.
  • Improved resource assignment to reduce CPU spikes and improve OMS performance.

Preventive measures (what we changed / are improving)

We are implementing and validating the following improvements to reduce recurrence risk:
  • Strengthen real-time connection authentication handling and token behaviour validation.
  • Expand monitoring and improve early detection of abnormal patterns (so performance regressions are visible quickly).
  • Ensure monitoring baselines are available during deployments to enable before/after comparison.
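
A before/after comparison against a baseline could look like the following minimal sketch. It assumes response-time samples (in milliseconds) are collected before and after a deployment, and flags a regression when the 95th percentile grows by more than 20%. The percentile choice, the 20% tolerance, and all names are hypothetical.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; `pct` is an integer 1..100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding issues.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def regressed(before, after, pct=95, max_ratio=1.2):
    """Return True if the `pct` percentile after deployment exceeds the baseline by more than `max_ratio`."""
    baseline = percentile(before, pct)
    current = percentile(after, pct)
    return current > baseline * max_ratio


# Invented latency samples in milliseconds.
before = [100] * 19 + [200]
after = [100] * 18 + [300, 300]
```

Here the baseline 95th percentile is 100 ms and the post-deployment value is 300 ms, so `regressed(before, after)` reports a regression. Without a stored baseline there is nothing to compare against, which is why baseline availability during deployments matters.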

What you need to do

No customer action is required.
If you still experience slowness or errors, contact support and include:
  • Approximate time of occurrence
  • What action you were performing
  • Any error message or screenshot
