Current Status

All Systems Operational


Components

Device APIs

Operational

Issuing API

Operational

Customer APIs

Operational

Dashboard

Operational

Crypto APIs

Operational

External Provider

Operational

Crypto Web

Operational

Recent Incidents

Elevated latency for EU instance API

minor

Jun 2, 2026 · resolved Jun 2

### Summary

On June 2, 2026, some customers experienced elevated latency on `/v1/issuing/risks` and `/v1/customers` requests in `prod-eu`. We’re sorry for the disruption and for any delays this caused in customer risk decisions.

### What Happened

A subset of issuing risk requests became slower than expected due to delayed customer data lookups. During the main impact window, affected requests could take up to approximately 5 seconds before continuing through a soft-fail path.

### Why It Happened

The issue was caused by a customer lookup path not enforcing its intended short timeout. When some customer lookups stalled, they waited on the broader request deadline instead of failing fast at the operation-level timeout.

### What We’re Doing About It

We deployed a hotfix to reduce the customer-load timeout and limit how long issuing requests can wait on stalled lookups. We are also adding follow-up improvements to reduce Datastore dependency in low-latency paths and investigate the underlying cause of the lookup stalls.
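To illustrate the failure mode described in this incident, here is a minimal sketch of an operation-level timeout nested inside a broader request deadline, with a soft-fail path when the lookup stalls. It is not Sardine's implementation: the function names, the 50 ms lookup budget, and the 5-second request deadline are all assumptions chosen for illustration.

```python
import asyncio

# Hypothetical budgets, chosen only for illustration.
CUSTOMER_LOAD_TIMEOUT_S = 0.05  # operation-level timeout for the lookup
REQUEST_DEADLINE_S = 5.0        # broader deadline for the whole request


async def load_customer(customer_id: str, delay_s: float) -> dict:
    """Stand-in for a customer data lookup; delay_s simulates a stall."""
    await asyncio.sleep(delay_s)
    return {"id": customer_id}


async def score_risk(customer_id: str, lookup_delay_s: float) -> dict:
    """Score a request, soft-failing if the lookup exceeds its own timeout."""
    try:
        # The lookup is bounded by its own short timeout, so a stall
        # cannot consume the full request deadline.
        customer = await asyncio.wait_for(
            load_customer(customer_id, lookup_delay_s),
            timeout=CUSTOMER_LOAD_TIMEOUT_S,
        )
    except asyncio.TimeoutError:
        customer = None  # soft-fail: continue without customer data
    return {
        "customer_id": customer_id,
        "customer_data_used": customer is not None,
    }


async def handle_request(customer_id: str, lookup_delay_s: float) -> dict:
    """Apply the broader request deadline around the whole operation."""
    return await asyncio.wait_for(
        score_risk(customer_id, lookup_delay_s),
        timeout=REQUEST_DEADLINE_S,
    )


if __name__ == "__main__":
    print(asyncio.run(handle_request("fast-customer", 0.0)))
    print(asyncio.run(handle_request("stalled-customer", 2.0)))
```

In this sketch the stalled lookup is abandoned after the short operation-level budget rather than after the full request deadline, which is the behavior the intended timeout was meant to provide.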

Degraded API Latency

none

May 22, 2026 · resolved May 22

#### Summary

Our cache service experienced intermittent latency issues on May 22, 2026. Service has been fully restored.

#### What Happened

Our cache infrastructure experienced three distinct periods of elevated latency:

* 22:23-22:31 UTC
* 22:46-22:52 UTC
* 23:00-23:05 UTC

The system partially recovered between these periods but experienced cascading failures before full restoration at 23:05 UTC.

#### Why It Happened

The incident began with an unusual spike in cache sets and cache deletes that stressed the caching infrastructure.

#### What We're Doing About It

* Implementing infrastructure improvements to prevent similar incidents caused by noisy-neighbor patterns
* Enhancing cache monitoring, alerting, and runbooks
* Working with SMEs from our cloud provider to identify and address any contributing factors

We apologize for the disruption and appreciate your patience.
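For readers who want the total impact from the three windows listed in this incident, the short calculation below adds them up. The window times come from the report; the helper name `minutes_between` is hypothetical.

```python
from datetime import datetime

# Elevated-latency windows (UTC) as listed in the incident report.
WINDOWS_UTC = [("22:23", "22:31"), ("22:46", "22:52"), ("23:00", "23:05")]


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


durations = [minutes_between(s, e) for s, e in WINDOWS_UTC]
total_degraded = sum(durations)
overall_span = minutes_between(WINDOWS_UTC[0][0], WINDOWS_UTC[-1][1])

print(durations)       # [8, 6, 5]
print(total_degraded)  # 19
print(overall_span)    # 42
```

The three windows add up to 19 minutes of elevated latency within a 42-minute span from first onset to full restoration.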

Documentation is temporarily not available

minor

May 20, 2026 · resolved May 20

This incident has been resolved; documentation is available again.

Dashboard instability might occur while loading certain entities

minor

May 12, 2026 · resolved May 13

**Impact:** During the incident window:

* **Customer Intelligence Search** latency was degraded for queries spanning **>30 days** of data.
* **Session Details** and **Customer Details** page loads were slow.
* **Connections Graph** and **Timeline** features were also impacted.

## Executive Summary

As part of infrastructure optimization, our development team performed multiple operations on our search databases to optimize index structure and data storage. This left our warm data cluster inefficiently provisioned, which degraded performance. The team ultimately resolved the incident by updating the data cluster configuration. Because of the volume of data, a simple rollback was not possible, which prolonged the incident.

## Incident Details

### What Happened

Our development team performed multiple operations on our search databases to optimize index structure and data storage. Due to a bug in the migration script, we migrated more data than initially anticipated. The destination cluster didn’t have sufficient storage and computing resources assigned.

Latency started rising slowly as more data was migrated. This was initially dismissed as expected, because we were moving older data to separate clusters that are indeed slower but should remain within acceptable bounds.

Two days later, on May 12, as the warm indices filled up and the migration completed, users began reporting that dashboard search was very slow. We then attempted to upsize the cluster, but the upsize could not complete due to high traffic and the large amount of data.

The incident was resolved when our team manually reverted some of the operations.

## Timeline

| Time (PT, May 12) | Event |
| --- | --- |
| **May 10, 23:38** | Automated data migration operation was initiated; the team was monitoring and didn’t report any issue |
| **May 11, 00:00** | Latency starts climbing. Alerts were triggered but assumed to be expected. |
| **May 12, 6:02 AM** | Support reports dashboard slowness; on-call begins investigation |
| **9:04 AM** | Incident formally created |
| **10:56 AM** | First code fix deployed for customer details + session details |
| **11:18 AM** | Deploy complete, pages still slow |
| **12:25 PM** | Removed search dependency on Customer Profile + Session Details. Page loads improved; Network Graph + Customer search still slow. |
| **1:09 PM** | Root cause identified: indices incorrectly in warm tier; direct hot-tier migration initiated (~10h estimated) |
| **3:05 PM** | Warm tier upsized aggressively; migration still not converging |
| **7:00–7:08 PM** | Search cluster repeatedly auto-cancels in-flight shard recovery; direct migration abandoned |
| **7:19 PM** | Switched to another approach: spinning up a new cluster |
| **7:41 PM** | April indices restored from snapshot; last-30d queries drop to ~15ms |
| **9:03 PM** | February + March index restores complete |
| **10:14 PM** | Replicas added to hot copies; search queue drops to 0. Incident resolved. |

## Action Items

Immediate:

* Manually roll back the problematic resource allocation
* Ensure all node pools have enough resources

Medium-Term Process Improvements:

* Runbook and migration process for search database upgrade operations
* Better review process for infrastructure changes
* Runbook for monitoring upgrades and immediate rollback
* Observability to determine whether a latency increase is expected
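The two contributing factors in this incident (a destination cluster without enough resources, and a migration that moved more data than planned) can both be expressed as simple guard checks. The sketch below is illustrative only and assumes nothing about the actual search cluster tooling: `TierCapacity`, `can_migrate`, `exceeds_plan`, the 20% headroom, and the 10% tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TierCapacity:
    """Hypothetical storage figures for a destination data tier."""
    total_gb: float
    used_gb: float


def can_migrate(dest: TierCapacity, planned_gb: float, headroom: float = 0.2) -> bool:
    """True if the planned data fits while leaving `headroom` of the disk free."""
    usable_gb = dest.total_gb * (1 - headroom)
    return dest.used_gb + planned_gb <= usable_gb


def exceeds_plan(migrated_gb: float, planned_gb: float, tolerance: float = 0.1) -> bool:
    """True if the migration has moved more than the plan plus a tolerance."""
    return migrated_gb > planned_gb * (1 + tolerance)


if __name__ == "__main__":
    warm_tier = TierCapacity(total_gb=1000, used_gb=300)
    print(can_migrate(warm_tier, planned_gb=400))         # True
    print(can_migrate(warm_tier, planned_gb=600))         # False
    print(exceeds_plan(migrated_gb=450, planned_gb=400))  # True
```

A pre-flight check like `can_migrate` could block a migration before it starts, while `exceeds_plan` could run during the migration to signal that observed volume has drifted from the plan, which is one way to tell expected latency from unexpected latency.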

Dashboard latency when accessing certain items

minor

May 12, 2026 · resolved May 12

This incident has been resolved.
