Current Status
All Systems Operational
Recent Incidents
Replication failures
none · Jun 10, 2026 · resolved Jun 10
## Problem Description, Impact, and Resolution

At 19:50 UTC on June 10, 2026, we observed a small fraction of publishes originating from US-EAST-1 failing to replicate to subscribers globally. We removed the degraded publisher pod from service and the issue was resolved at 21:21 UTC on June 10, 2026.

The incident was caused by a single process that fell into a degraded state in which it continued receiving inbound traffic and passing health checks, but traffic sent outbound from the process was failing at an abnormally high rate. Our automated health check/recovery system did not detect and replace the degraded process because its health check API reported the process as healthy.

## Mitigation Steps and Recommended Future Preventative Measures

To prevent a similar issue from occurring in the future, we are improving the data within our health check APIs to return more complete performance metrics over a rolling time window. We are also enhancing the issue detection logic to detect more patterns that indicate process failure, even if the process itself is reporting as healthy.
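As an illustration of the rolling-window idea described in this report, the following is a minimal sketch of a health report that accounts for outbound failures, not only whether the process is up. It is not PubNub's implementation: the class name `RollingOutboundHealth`, the thresholds, and the report fields are all hypothetical.

```python
from collections import deque
import time


class RollingOutboundHealth:
    """Hypothetical tracker of outbound send outcomes over a rolling window."""

    def __init__(self, window_seconds=60.0, max_failure_rate=0.05,
                 min_samples=20, clock=time.monotonic):
        # All defaults are illustrative assumptions, not values from the report.
        self.window_seconds = window_seconds
        self.max_failure_rate = max_failure_rate
        self.min_samples = min_samples
        self._clock = clock
        self._events = deque()  # (timestamp, succeeded) pairs, oldest first
        self._failures = 0

    def record(self, succeeded):
        """Record the outcome of one outbound send."""
        self._evict()
        self._events.append((self._clock(), succeeded))
        if not succeeded:
            self._failures += 1

    def _evict(self):
        """Drop events that have aged out of the rolling window."""
        cutoff = self._clock() - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            _, succeeded = self._events.popleft()
            if not succeeded:
                self._failures -= 1

    def report(self):
        """Return a health report that includes the outbound failure rate."""
        self._evict()
        total = len(self._events)
        rate = self._failures / total if total else 0.0
        # With too few samples the rate is not trusted, so report healthy.
        healthy = total < self.min_samples or rate <= self.max_failure_rate
        return {"healthy": healthy, "samples": total,
                "outbound_failure_rate": rate}
```

A process that still accepts inbound traffic but fails most outbound sends would report `healthy: False` under this sketch once enough samples accumulate, which is the kind of signal a recovery system could act on.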
Global Errors and Failures with Publish and Functions, Delays with Events & Actions
none · Jun 9, 2026 · resolved Jun 9
### **Problem Description, Impact, and Resolution**

At approximately **13:20 UTC on June 9, 2026**, we observed elevated publish errors and message replication failures in our publish/subscribe service, which also caused latency in other PubNub services globally. Customers may have experienced increased publish error rates, delayed or missed message delivery, delayed message persistence, and increased latency for Functions and Events & Actions workflows.

The root cause of the incident was an unusually large concentration of global publish traffic that was not limited by our throttling layers. That traffic created resource pressure in the publish and replication layers, increased load on storage systems, and caused downstream processing delays in dependent services.

We mitigated the issue by adjusting targeted traffic controls, increasing capacity for affected publish and replication components, and isolating the high-volume traffic pattern to reduce broader platform impact. The issue was resolved at approximately **13:48 UTC on June 9, 2026**.

This issue occurred because we did not have sufficient automated controls and isolation processes in place to protect shared infrastructure from this type of exceptional traffic pattern. As a result, the increased load affected multiple services before mitigation could be fully applied.

### **Mitigation Steps and Recommended Future Preventative Measures**

To prevent a similar issue from occurring in the future, we have isolated the identified high-volume traffic pattern onto dedicated infrastructure and also increased baseline capacity for the affected components across our PoPs. We are also further strengthening our traffic detection and management processes for exceptional load patterns so they can be identified and contained earlier without cascading impact across dependent services.
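For readers unfamiliar with throttling layers of the kind this report mentions, the sketch below shows one common technique, a per-source token bucket, which caps how much traffic any single source can push into shared infrastructure. The report does not say which algorithm PubNub uses; the class names, rates, and the notion of a `source_id` are assumptions made for illustration only.

```python
import time


class TokenBucket:
    """Classic token bucket: refills at a fixed rate up to a burst capacity."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self._clock = clock
        self._tokens = float(burst)  # start full so short bursts are allowed
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the request fits, else False."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerSourceThrottle:
    """Hypothetical throttle keeping one bucket per traffic source."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._args = (rate_per_sec, burst, clock)
        self._buckets = {}

    def allow(self, source_id):
        """Check the bucket for this source, creating it on first use."""
        bucket = self._buckets.get(source_id)
        if bucket is None:
            bucket = self._buckets[source_id] = TokenBucket(*self._args)
        return bucket.allow()
```

Because each source has its own bucket, one source exhausting its budget does not reduce the budget available to others, which is the isolation property the report identifies as missing.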
Connectivity Issues Affecting a Subset of Subscriptions
none · Mar 24, 2026 · resolved Mar 24
### **Problem Description, Impact, and Resolution**

On March 24, 2026, at 19:27 UTC, one network shard experienced intermittent connectivity affecting a subset of customers. The affected users may have experienced elevated latency and temporary error responses related to their subscription requests.

The instability was caused by an atypical surge in message volume within a shared processing environment that had improperly configured resource limits. This led to high resource utilization and triggered automated system restarts.

PubNub Engineering resolved the issue by implementing the proper limits after expanding infrastructure capacity to accommodate the increased load. Service was fully stabilized once the environment was tuned to the new traffic profile.

### **Mitigation Steps and Recommended Future Preventative Measures**

**Infrastructure Tuning:** Adjusted automated scaling parameters to provide greater headroom for rapid traffic fluctuations.

**Enhanced Traffic Management:** Deployed refined monitoring heuristics to better isolate and manage high-volume traffic patterns without impacting shared resources.

**Dynamic Resource Allocation:** Accelerating the rollout of enhanced vertical scaling technology to allow individual processing nodes to adapt more fluidly to demand spikes.

**Operational Coordination:** Strengthening internal protocols for high-capacity events to ensure large-scale traffic shifts are proactively transitioned to dedicated environments.
Delay in Publishing Messages to Storage Globally
minor · Jan 1, 2026 · resolved Jan 1
### **Problem Description, Impact, and Resolution**

On January 1, 2026 at 00:00 UTC, we observed elevated latency in our History service across multiple regions. Customers may have experienced delays in message persistence and history availability during this period.

The issue was caused by a mismatch in newly created persistence tables. Specifically, required columns for message metadata were missing from the new tables, resulting in failed write operations and backed-up queues. This created downstream pressure on our storage systems, leading to higher latency in history processing.

We mitigated the issue by manually applying the correct updates across all affected persistence spaces. After the updates were applied, message processing returned to normal and queue latency cleared.

This issue occurred because we did not have proper controls in place to ensure schema consistency for newly generated monthly persistence tables.

### **Mitigation Steps and Recommended Future Preventative Measures**

To resolve the issue, we manually applied the required schema updates globally. In the coming days, we will update our change management processes to ensure schema changes are correctly applied to all future monthly tables. We are also auditing our schema tracking and automating validation to prevent inconsistencies across environments. These improvements will ensure that future table generation includes all necessary columns and reduce the risk of similar issues impacting History service performance.
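The automated validation mentioned in this report could take many forms. The following is a minimal sketch of one: comparing each newly generated table's columns against a required schema before the table starts receiving writes. The function names, table names, and column names are hypothetical, and the sketch assumes the column definitions have already been read from the datastore into plain dictionaries.

```python
def column_problems(required, actual):
    """Return required columns that are missing or have an unexpected type.

    Both arguments map column name -> type name.
    """
    problems = {}
    for name, col_type in required.items():
        if name not in actual:
            problems[name] = "missing"
        elif actual[name] != col_type:
            problems[name] = f"expected {col_type}, found {actual[name]}"
    return problems


def validate_tables(required, tables):
    """Check every table and return only those with problems."""
    report = {}
    for table_name, columns in tables.items():
        problems = column_problems(required, columns)
        if problems:
            report[table_name] = problems
    return report


if __name__ == "__main__":
    # Hypothetical schema and tables, for illustration only.
    required = {"channel": "text", "payload": "blob", "meta": "text"}
    tables = {
        "history_2025_12": {"channel": "text", "payload": "blob", "meta": "text"},
        "history_2026_01": {"channel": "text", "payload": "blob"},
    }
    print(validate_tables(required, tables))
```

Running a check like this as part of table generation, and failing the rollout when the report is non-empty, would surface a missing metadata column before writes begin to fail.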
Increased errors observed and resolved
minor · Nov 20, 2025 · resolved Nov 20
**Problem Description, Impact, and Resolution**

Starting at 16:46 UTC on Nov. 20, 2025, we noticed a small number of errors with the publish API in the North American and Asia Pacific regions. The system automatically recovered with all functionality fully restored by 16:50 UTC on Nov. 20, 2025.