Current Status

All Systems Operational

Components

Sigma Identity Fraud

Operational

Global Watchlist Screening with Monitoring

Operational

SigmaDevice iOS

Operational

Watchlist Standard

Operational

Sigma Synthetic Fraud

Operational

SigmaDevice Android

Operational

Predictive DocV

Operational

Watchlist Plus

Operational

DocV Android

Operational

Admin Dashboard

Operational

KYC

Operational

Watchlist Premier

Operational

DocV iOS

Operational

Developer Hub

Operational

Email RiskScore

Operational

Device Risk - WebSDK

Operational

Recent Incidents

Document Request Link Generation Degradation

minor

Jun 16, 2026 · resolved Jun 16

**Root Cause Analysis**

### Incident: Unable to generate DocV links via dashboard

* **Date Range:** Tue, Jun 16, 2026, 8:52 AM – 10:12 AM ET
* **Impact:** Document Verification link generation via Admin Dashboard
* **Status:** Resolved

### **1. Summary**

On June 16, 2026, a scheduled deployment introducing stricter access control logic inadvertently broke the Document Verification (DocV) link generation workflow for a subset of client configurations. The issue persisted for approximately 1 hour and 20 minutes after it was first reported by a client and was mitigated with a full rollback. We are overhauling our release validation protocols and implementing shadow-mode deployment capabilities so that enterprise permission configurations are thoroughly tested before hard access restrictions are enforced in the future.

### **2. Timeline**

| **Time (ET)** | **Event** |
| --- | --- |
| **June 16, 5:16 AM** | A production deployment was completed, introducing updates to the system's permission logic. This logic contained a defect that rendered the DocV request links feature unusable. |
| **8:52 AM** | Clients began reporting that DocV link generation in the Admin Dashboard was stuck loading for some users. |
| **9:04–9:24 AM** | After determining that the problem stemmed from recent updates to role and permission provisioning in the Admin Dashboard, the team decided to revert the change. |
| **10:00–10:08 AM** | The rollback was verified in a staging environment before being deployed to production, followed by thorough regression and sanity testing. |
| **10:10 AM** | The client confirmed that the DocV link generation workflow was fully restored. |
| **10:11 AM** | The incident was resolved. |

### **3. Root Cause**

* **Primary Root Cause:** Stricter permission validation logic, intended to make our permission model more consistent across pages, was deployed that morning. The update unintentionally disrupted specific role setups, preventing authorized users from accessing the document request link generation process.

### **4. Resolution**

* To restore functionality, the team rolled back the stricter permission enforcement logic. After the rollback was deployed, we validated the Admin Dashboard's link generation flow and verified that the service was operating correctly.

### **5. Corrective and Preventive Actions**

| **Action** | **Description** | **ETA / Status** |
| --- | --- | --- |
| Revisit stricter access control with a safer rollout approach | Reintroduce the permission enforcement in shadow mode first, using a safer rollout approach (e.g., staged rollout and additional validation) to prevent unintended impact to existing customer role configurations. Communicate with impacted customers to guide them in updating their permission settings. | **07/03** |
| Add monitoring for errors in document link generation | Add monitoring for errors in document link generation so that future issues are detected and addressed proactively. | **07/03** |
| Expand validation of access control changes to cover all user configurations | Update our release validation process so that access-control and permission changes are tested against a broader range of role setups (including environment-scoped scenarios) before production release. | **07/03** |

### **6. Lessons Learned**

* Permission and environment enforcement needs testing across representative customer role setups. Dark-launching the check in shadow mode to log who would be blocked, communicating with customers, and establishing a rollout plan is a safer path than immediate hard enforcement.

### **7. Next Steps & Ongoing Commitment**

We are committed to completing the _Corrective and Preventive Actions_ noted above.
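The shadow-mode approach described in the lessons learned can be sketched as follows. This is a minimal, hypothetical illustration: the `User` type, the role name `docv_links`, the action name `generate_docv_link`, and both rule functions are assumptions, not Socure's actual permission model. The current rule keeps deciding access while the stricter rule is evaluated alongside it, and every case where the stricter rule would deny access is recorded rather than enforced.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("permission_shadow")


@dataclass(frozen=True)
class User:
    # Hypothetical user model: role names and the environments the user is scoped to.
    user_id: str
    roles: frozenset
    environments: frozenset


def current_rule(user: User, action: str) -> bool:
    # Hypothetical existing rule: holding the role is enough.
    return action == "generate_docv_link" and "docv_links" in user.roles


def stricter_rule(user: User, action: str, environment: str) -> bool:
    # Hypothetical stricter rule: the user must also be scoped to this environment.
    return current_rule(user, action) and environment in user.environments


def is_allowed(user: User, action: str, environment: str, divergences: list) -> bool:
    """Enforce the current rule; evaluate the stricter rule in shadow mode only."""
    allowed = current_rule(user, action)
    would_allow = stricter_rule(user, action, environment)
    if allowed and not would_allow:
        # Record who *would* be blocked, without actually blocking them.
        divergences.append((user.user_id, action, environment))
        logger.info("shadow deny: user=%s action=%s env=%s",
                    user.user_id, action, environment)
    return allowed
```

Once the divergence log stays empty across representative role setups, or affected customers have updated their permission settings, enforcement can be switched from the current rule to the stricter one with far less risk of locking out authorized users.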

RETROACTIVE: RiskOS API Degradation

none

Jun 16, 2026 · resolved Jun 16

**Root Cause Analysis**

* **Incident:** RiskOS API Services Degradation During NAT Gateway Remediation
* **Date Range:** 06-16-2026
* **Impact:** Intermittent degradation for a subset of API requests
* **Status:** Recovered

## **1. Summary**

On June 16, 2026, RiskOS API services experienced intermittent degradation for a subset of API requests between **3:46 AM and 4:02 AM EDT**. During this period, some requests failed while network connectivity was being transitioned as part of a controlled remediation.

Earlier that day, Socure observed elevated connection error alerts and identified critically high Cloud NAT port utilization in the RiskOS commercial environment. The condition was not affecting existing customer transactions when it was detected, but it created a significant outbound connectivity capacity risk if additional workloads restarted, scaled, or deployed.

To reduce the risk of broader impact, Socure performed a controlled network remediation during a lower-workload window. The remediation improved capacity handling and reduced shared NAT resource contention across origin IP blocks. During the transition, some existing connections were briefly interrupted and had to be re-established, resulting in intermittent request failures.

The remediation activity completed in approximately 5 minutes, and overall recovery and stabilization completed within approximately 15 minutes. No customer action was required after connectivity was restored.

## **2. Timeline**

| **June 16, 2026 / Time** | **Event** |
| --- | --- |
| 12:00 AM EDT | The Socure Engineering team observed elevated connection error alerts and began an investigation. No customer transaction impact was identified at this time. |
| 12:30 AM EDT | The investigation identified critically high Cloud NAT port utilization, creating a capacity risk for outbound connectivity. SRE confirmed that existing customer transactions were not being impacted at that time. |
| 12:30–3:45 AM EDT | The Socure Engineering team evaluated remediation options, assessed the risk of leaving the configuration unchanged, and planned the remediation for a lower-workload window to reduce potential customer impact. |
| 3:45 AM EDT | The engineering team proceeded with the planned network capacity remediation. |
| 3:46 AM EDT | Intermittent request failures began as some existing network connections were briefly interrupted during the transition to the updated configuration. |
| 3:52–4:00 AM EDT | Network configuration updates were applied to improve capacity handling and reduce shared NAT resource contention across origin IP blocks. |
| 4:02 AM EDT | Intermittent request failures ended after traffic completed the transition to the updated configuration and connections were re-established. |
| ~4:02 AM EDT | Platform connectivity stabilized. |
| ~4:15 AM EDT | The incident was considered resolved after validation and monitoring confirmed recovery. |

## **3. Root Cause**

**Primary Root Cause:** The incident was caused by critically high Cloud NAT port utilization in the RiskOS commercial environment. The NAT configuration lacked sufficient available port capacity to provide safe headroom for normal connection churn, workload restarts, scaling activity, or deployments.

**Contributing Factors:**

* The residual NAT capacity risk was a consequence of emergency recovery activities undertaken after the June 9 RiskOS service outage, when the primary focus was restoring service availability.
* NAT port utilization reached approximately 99–100%, creating a significant risk of outbound connectivity degradation.
* NAT capacity was shared across origin IP blocks, which increased the likelihood of port exhaustion under workload pressure.
* During remediation, some existing connections were reset while traffic transitioned to the updated network configuration, resulting in temporary intermittent request failures.

## **4. Resolution**

Socure updated the NAT configuration to improve capacity handling and reduce shared-resource contention across origin IP blocks. The remediation was performed because the existing NAT utilization level presented a significant reliability risk, and delaying the change could have increased the chance of broader service degradation.

During the transition, some existing connections were reset and had to be re-established. Once the updated configuration took effect, intermittent request failures stopped and platform connectivity stabilized. No customer action was required after connectivity was restored, and requests retried after recovery were expected to process normally.

## **5. Corrective and Preventive Actions**

| **Action** | **Description** | **ETA / Status** |
| --- | --- | --- |
| Reduce NAT capacity contention | NAT usage was separated to reduce shared capacity risk and limit the blast radius of future NAT capacity issues. | Completed |
| Enable dynamic port allocation | Dynamic port allocation was enabled to improve NAT port capacity management and reduce exhaustion risk. | Completed |
| Improve NAT utilization monitoring | Monitoring for NAT port utilization, dropped packets, and connection errors was added or tuned, with actionable alert thresholds that fire before utilization reaches critical levels. | Completed |
| Move network configuration to Infrastructure as Code | Continue moving RiskOS network configuration into version-controlled Infrastructure as Code, as part of the AWS migration, to improve reviewability, repeatability, and recovery. | Planned |

## **6. Lessons Learned**

* NAT port utilization should be treated as a critical reliability signal for workloads that depend on outbound connectivity.
* NAT capacity should have sufficient headroom for normal connection churn, workload restarts, scaling events, and deployments.
* Remediating high-risk network conditions can still cause temporary impact when existing connections are reset, so change planning should include clear validation and communication steps.
* Network changes should include pre-change checks for current utilization, available NAT capacity, expected connection reset behavior, and rollback options.
* Continued investment in environment isolation, Infrastructure as Code, and stronger monitoring will reduce the likelihood and impact of similar incidents.

## **7. Next Steps & Ongoing Commitment**

Socure recognizes that customers rely on RiskOS to be available and reliable. We take accountability for the disruption caused during this remediation and are continuing to improve our network architecture and operational controls. To reduce the likelihood of recurrence, Socure is pursuing the following:

1. **NAT Capacity Controls:** We are improving monitoring and alerting for NAT utilization, connection drops, and port exhaustion risk so that capacity issues can be identified and remediated before they impact customer traffic.
2. **Infrastructure as Code:** We are continuing to move RiskOS network configuration to a version-controlled model so that every change is reviewed, repeatable, and quickly recoverable.
3. **Environment Isolation:** We are separating network resources and reducing shared capacity dependencies so that the impact of any capacity issue or network change is contained.
4. **Safer Network Operations:** We are strengthening pre-change validation, rollback planning, and maintenance window selection for network changes that may affect customer traffic.
5. **Operational Runbooks:** We are updating operational runbooks for NAT capacity exhaustion and Cloud NAT changes so that response and recovery steps are repeatable and clearly documented.
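The utilization and headroom checks named in the corrective actions and lessons learned can be expressed as a small calculation. This is a hedged sketch: the 70% warning and 85% critical thresholds, the function names, and the per-VM port figures in the example are illustrative assumptions, not Socure's configuration. The figure of 64,512 usable source ports per NAT IP (ports 1024–65535) matches Google Cloud NAT's per-IP limit but should be confirmed for the provider in use.

```python
from enum import Enum

# Usable source ports per NAT IP address, per protocol (ports 1024-65535).
# Matches Google Cloud NAT's per-IP limit; confirm for your provider.
PORTS_PER_NAT_IP = 64_512


class Level(Enum):
    OK = "ok"
    WARN = "warn"
    CRITICAL = "critical"


def utilization(allocated_ports: int, nat_ip_count: int) -> float:
    """Fraction of total NAT source-port capacity currently allocated."""
    if nat_ip_count <= 0:
        raise ValueError("nat_ip_count must be positive")
    return allocated_ports / (nat_ip_count * PORTS_PER_NAT_IP)


def classify(util: float, warn: float = 0.70, critical: float = 0.85) -> Level:
    # Illustrative thresholds: alert well before exhaustion, not at 99-100%.
    if util >= critical:
        return Level.CRITICAL
    if util >= warn:
        return Level.WARN
    return Level.OK


def has_headroom(allocated_ports: int, nat_ip_count: int, extra_vms: int,
                 ports_per_vm: int, max_util: float = 0.85) -> bool:
    """Would restarting or adding `extra_vms` VMs keep utilization at or below `max_util`?"""
    projected = allocated_ports + extra_vms * ports_per_vm
    return utilization(projected, nat_ip_count) <= max_util
```

For example, with one NAT IP and 40,000 ports already allocated, adding 10 VMs at 1,024 ports each projects to about 78% utilization and passes the headroom check, while adding 20 such VMs projects to about 94% and fails it.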

RETROACTIVE: [Sev 2] OTP Verifications Degradation

major

Jun 15, 2026 · resolved Jun 15

On June 15, 2026, between 12:15 AM and 1:00 AM EDT, the OTP Verification endpoint experienced elevated error rates. The system recovered automatically at 1:00 AM EDT. No action is required from customers. Our investigation confirmed the issue was transient. All systems are operating normally. A Root Cause Analysis (RCA) will be published once our investigation is complete.
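For transient errors like these, a common client-side pattern is to retry with capped exponential backoff and jitter. The sketch below is a generic illustration, not part of any Socure SDK: `TransientError` and `call_with_retries` are hypothetical names, and the attempt count and delays are assumptions. Only retry requests that are safe to repeat; in an OTP flow, for instance, retrying a send request may deliver an additional code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 5xx or a reset connection)."""


def call_with_retries(fn: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 0.5, max_delay: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # full jitter spreads retries out over time
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff can be tested without real delays; in production the default `time.sleep` is used.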

RETROACTIVE: Increased Latency for DocV

minor

Jun 13, 2026 · resolved Jun 13

Between 1:22 PM and 1:31 PM ET, DocV experienced a latency spike caused by an internal node upgrade. During this 9-minute window, some customers may have experienced increased latency or timeout errors. The change was reverted immediately after the latency spike, and DocV performance has returned to normal. We apologize for any inconvenience this may have caused.

Admin Dashboard Transaction Search Latency

minor

Jun 11, 2026 · resolved Jun 11

The issue causing increased latency and intermittent timeouts on the Transaction Search page within the Admin Dashboard has been resolved, and functionality has returned to normal. API requests and transaction processing were not impacted during this incident. We apologize for any inconvenience and appreciate your patience.
