Social Plus: Current Status
All Systems Operational
Recent Incidents
Elevated errors on core and API systems (US)
minor · Jun 7, 2026 · resolved Jun 7
Date: Jun 7, 2026 · Window: 06:58 – 07:11 AM GMT+07:00 (~13 minutes)

**What happened**

For approximately 13 minutes, parts of our core and API systems in the US region were degraded, and clients experienced elevated error rates. A backend database reached its storage capacity, which briefly prevented it from completing write operations.

**Root cause**

An atypical workload consumed database storage faster than our systems had previously encountered. Automated scaling did respond, but it could not add capacity quickly enough to keep pace with the abnormal consumption rate before storage was exhausted.

**Resolution**

Additional storage capacity was brought online, after which error rates returned to normal and all systems fully recovered.
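The root cause above is a race between how fast storage is consumed and how long a storage expansion takes to land. The write-up does not say how scaling is triggered, so the following is only a minimal sketch, assuming a hypothetical rate-aware check: every name and number here (`StorageState`, `should_scale_now`, the 1.5 safety factor, the example sizes) is illustrative, not Social Plus's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class StorageState:
    """Hypothetical snapshot of a database volume; all fields are illustrative."""
    free_gb: float               # storage still available on the volume
    fill_rate_gb_per_min: float  # observed consumption rate right now
    scale_lead_time_min: float   # time for a storage expansion to take effect


def minutes_until_full(state: StorageState) -> float:
    """Time left before writes start failing at the current fill rate."""
    if state.fill_rate_gb_per_min <= 0:
        return float("inf")
    return state.free_gb / state.fill_rate_gb_per_min


def exhausts_before_scaling_completes(state: StorageState) -> bool:
    """True if the volume fills up even when a scale-up starts immediately."""
    return minutes_until_full(state) < state.scale_lead_time_min


def should_scale_now(state: StorageState, safety_factor: float = 1.5) -> bool:
    """Rate-aware trigger: start expanding while the expansion can still finish in time.

    safety_factor pads the lead time so that a sudden rise in the fill
    rate does not immediately outrun the expansion.
    """
    return minutes_until_full(state) <= state.scale_lead_time_min * safety_factor
```

With a hypothetical 200 GB free and a 10-minute lead time, a normal rate of 0.5 GB/min leaves 400 minutes and no trigger fires. An abnormal 25 GB/min leaves only 8 minutes: the trigger fires, but `exhausts_before_scaling_completes` is also true, so the volume fills before the expansion lands. That is the situation the write-up describes, and it is why a trigger based only on percent-used, which ignores the fill rate, can react too late.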
Partially Degraded Performance [US region]
major · Jan 21, 2026 · resolved Jan 21
**Incident Date:** 2026-01-21

**Impact:** System degradation and intermittent downtime.

**Primary Cause:** Infrastructure **resource exhaustion** triggered by an unprecedented high-volume traffic surge.

## 1. Summary

On January 21, an unprecedented traffic surge peaked at **450,000 requests per minute (~56.25x baseline)**. While the application servers autoscaled successfully, the Core Database became the bottleneck. Despite two manual vertical scaling interventions, the system went through two periods of degradation before stabilizing once database capacity finally matched demand.

## 2. Root Cause

The root cause of the incident was **infrastructure resource exhaustion**: the database lacked sufficient headroom to absorb a sudden traffic spike.

* **Traffic Volume:** An unprecedented surge in external demand drove platform traffic far beyond predicted growth, from a baseline of **8,000 req/min** to a peak of **450,000 req/min (a ~56.25x increase)**.
* **Scaling Operation Time:** Each vertical scaling event on the Core Database took **10–30 minutes** to complete. During these intervals the system remained degraded, because incoming demand outpaced both the available capacity and the speed of recovery.

## 3. Optimizations & Corrective Actions

Based on the investigation, we will implement the following technical safeguards:

#### **A. Transition Impacted Queries to Secondary Nodes**

* **Action:** Reconfigure the remaining impacted database queries to target Secondary (Read) Replicas rather than the Primary node.
* **Goal:** Offload significant pressure from the Primary database. With less load on the Primary node, it retains enough resource headroom to scale and recover faster. This keeps the Primary from being choked by contention during a surge and lets it complete vertical scaling operations much faster.

#### **B. Optimize Autoscaling Performance (Server & Database)**

* **Action:** Review and tune autoscaling policies for both the App Tier and the Database Tier, specifically to reduce scaling operation time.
* **Goal:** Decrease the "Time-to-Ready" for new resources. By optimizing scaling triggers and resource warm-up procedures, we can provision capacity more rapidly and improve the system's overall recovery time during a sudden spike.
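Corrective action A moves read traffic off the Primary. The report does not describe how queries will be routed, so this is only a minimal sketch of a common pattern, assuming a hypothetical `QueryRouter` and placeholder host names: writes stay on the Primary, and reads rotate round-robin across the read replicas.

```python
import itertools


class QueryRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    # Simplified classification by leading keyword; statements such as
    # SELECT ... FOR UPDATE would need more careful handling in practice.
    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary: str, replicas: list) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        # Round-robin iterator over replicas; None when no replica is configured.
        self._next_replica = itertools.cycle(self.replicas) if self.replicas else None

    def is_write(self, sql: str) -> bool:
        return sql.lstrip().upper().startswith(self.WRITE_PREFIXES)

    def route(self, sql: str, needs_fresh_read: bool = False) -> str:
        # Writes, reads that must see their own recent writes (replicas can lag),
        # and setups without replicas all go to the primary.
        if self.is_write(sql) or needs_fresh_read or self._next_replica is None:
            return self.primary
        return next(self._next_replica)
```

For example, `QueryRouter("primary", ["replica-1", "replica-2"])` sends an `UPDATE` to `"primary"` and alternates plain `SELECT`s between the two replicas. The `needs_fresh_read` escape hatch reflects the usual trade-off of this design: replicas relieve the Primary, but replication lag means read-after-write paths may still need the Primary.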
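The root-cause figures show why corrective action B targets "Time-to-Ready": every minute that new capacity is not yet serving is a minute of peak traffic hitting the old capacity. The sketch below reproduces the report's arithmetic and adds a hypothetical breakdown of time-to-ready into detection, provisioning, and warm-up stages. The stage names and durations are placeholders, since the report does not break the figure down.

```python
from dataclasses import dataclass

# Figures taken from the incident report.
BASELINE_RPM = 8_000
PEAK_RPM = 450_000


def surge_multiple(peak_rpm: float, baseline_rpm: float) -> float:
    """How many times larger the peak was than normal traffic."""
    return peak_rpm / baseline_rpm


def requests_during_window(rpm: float, window_min: float) -> float:
    """Requests that arrive while new capacity is still being brought up."""
    return rpm * window_min


@dataclass
class ScalingTimeline:
    """Hypothetical breakdown of "Time-to-Ready"; the stage names are illustrative."""
    detect_min: float     # until the scaling trigger fires
    provision_min: float  # until the new capacity is allocated
    warm_up_min: float    # until the new capacity serves traffic normally

    @property
    def time_to_ready_min(self) -> float:
        return self.detect_min + self.provision_min + self.warm_up_min


def exposure(rpm: float, timeline: ScalingTimeline) -> float:
    """Requests arriving before the added capacity is ready (an upper bound on impact)."""
    return requests_during_window(rpm, timeline.time_to_ready_min)
```

`surge_multiple(PEAK_RPM, BASELINE_RPM)` returns 56.25, matching the report, and at peak a single 10–30 minute vertical scaling operation spans 4.5 to 13.5 million requests. With the placeholder timelines `ScalingTimeline(2, 10, 3)` and `ScalingTimeline(1, 5, 2)`, cutting time-to-ready from 15 to 8 minutes shrinks the exposure at the same peak from 6.75 million to 3.6 million requests.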
Partially Degraded Performance [SG region]
major · Dec 30, 2024 · resolved Dec 30
This incident has been resolved.
Partially Degraded Performance [SG region]
major · Dec 12, 2024 · resolved Dec 12
This incident has been resolved.
Partially Degraded Performance [SG region]
major · Dec 11, 2024 · resolved Dec 11
This incident has been resolved.





