Lucidworks Platform Status Page

Healthcare IT · monitored by Alert24

lucidworks.com
All Systems Operational

Current Status

All Systems Operational

View Lucidworks Platform status page ↗

Components

Lucidworks Platform: Operational
Lucidworks AI: Operational
Agent Studio: Operational
Analytics Studio: Operational
Commerce Studio: Operational
UI Studio: Operational
Connected Search: Operational

Recent Incidents

Commerce Studio Rules Interface Disruption

minor

Jun 19, 2026 · resolved Jun 19

The issue affecting the Commerce Studio Rules interface has been resolved, and all previously affected services are now operating normally. Commerce Studio users can interact with the Rules interface and page through large datasets without experiencing page load errors. There remains no direct end-user impact on search experiences. The root cause of the incident was database contention triggered by interactions involving a large number of rules. To resolve the issue, we scaled up the database, which normalized performance and stabilized the system.

Lucidworks AI Hosted LLM Service Disruption

minor

May 28, 2026 · resolved May 28

## Summary

On May 28, 2026, between 16:18 UTC and 17:53 UTC, Lucidworks AI hosted LLMs were unavailable in the `us-southcarolina` region. Customers using hosted LLM inference (`llama-3-8b-instruct`, `llama-3v2-3b-instruct`, and `phi-4-multimodal-instruct`) received 500 or 429 errors when attempting to query these models. Other SaaS Platform services, including search, embedding models, and the Lucidworks Platform UI, were not affected.

Lucidworks Engineering declared a Sev1 incident at 16:54 UTC and began remediation efforts. All hosted LLM models were fully restored and operational at 17:53 UTC.

## Root Cause

The incident was caused by a routine Kubernetes patch upgrade on a cluster in the `us-southcarolina` region. LWAI-hosted models are served via [Ray Serve](https://docs.ray.io/en/latest/serve/index.html), which uses both "head" and "worker" nodes as part of its deployment system for routing inference requests. The Kubernetes upgrade cycled node pools, causing all cluster head nodes and worker nodes to restart simultaneously.

Under normal conditions, the Ray cluster can tolerate a head node restart because worker nodes continue serving requests. However, the node pool upgrades utilize a surge strategy where our platform waits for pods to leave the old node (be evicted) but not for them to be running on the new node. The platform considers the "drain" successful once the pod is gone from the old node and moves on to the next one, even if the pod is stuck in an "Initializing" state on the new node. This meant that all node pools were cycled in rapid succession, and both head and worker pods were evicted before any had finished initializing on their replacement nodes, resulting in a complete cluster outage.

Recovery was prolonged by multiple compounding factors. First, one of the replacement head nodes was in a degraded state and unable to pull container images, requiring manual intervention to delete the node. Second, the LLM container images (6-11 GB in size) experienced abnormally slow Docker image transfer, taking 30-52 minutes compared to the typical 2-4 minutes observed in normal operation. Additionally, in a separate operation, new models were being brought online to expand our LWAI offering, and this caused the Ray operator's blue-green deployment strategy to require the existing _and_ replacement LLM deployments to be healthy before switching traffic, which extended the outage until the slower image pulls completed on both blue and green deployments.

Lucidworks Engineering deleted the degraded node, waited for image pulls to complete on replacement nodes, and verified that all hosted models were responding to queries. The incident was verified as resolved at 18:07 UTC.

## Lucidworks Actions

Lucidworks will take the following actions as a result of this incident:

* Implement sequenced Kubernetes upgrade procedures for clusters hosting LLM workloads, ensuring each node pool is fully healthy before the next pool is upgraded.
* Investigate Docker image pre-loading strategies (such as pre-baked disks or image streaming) to eliminate long container image pull times for large ML model images.
* Open a support ticket with our cloud provider to investigate the abnormal Docker pull times.
* Establish a notification protocol to coordinate Kubernetes maintenance windows with LLM service owners to avoid conflicts with ongoing deployments.
* Improve tooling around our Ray clusters to allow Lucidworks Engineering to force a failover to a blue or green state instead of waiting for Ray to automatically resolve the new and old deployments.

## Recommended Client Actions

Lucidworks recommends that clients subscribe to Lucidworks status updates to receive real-time notifications about Lucidworks SaaS Platform incidents. To enable this feature, click **Subscribe to Updates** at [status.lucidworks.com](http://status.lucidworks.com).
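
The sequenced upgrade procedure named in the actions for this incident can be illustrated with a minimal sketch. It is not Lucidworks' tooling: `upgrade_pools_in_sequence`, `start_upgrade`, and `pool_is_healthy` are hypothetical names standing in for whatever cluster automation is actually used. The point it shows is the gate the root cause says was missing: a pool counts as done only when its replacement pods report healthy, not merely when the old pods have been evicted.

```python
import time
from typing import Callable, Iterable, List


def upgrade_pools_in_sequence(
    pools: Iterable[str],
    start_upgrade: Callable[[str], None],
    pool_is_healthy: Callable[[str], bool],
    timeout_s: float = 3600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Upgrade node pools one at a time, gating on replacement-pod health.

    Hypothetical helper: `start_upgrade` and `pool_is_healthy` are
    placeholders for real cluster tooling. Eviction alone never counts as
    success; the next pool is touched only after the current pool reports
    healthy again.
    """
    upgraded: List[str] = []
    for pool in pools:
        start_upgrade(pool)
        waited = 0.0
        # Poll until the pods on the replacement nodes are actually serving.
        while not pool_is_healthy(pool):
            if waited >= timeout_s:
                # Stop here so the remaining pools keep serving traffic.
                raise TimeoutError(
                    f"pool {pool!r} not healthy after {timeout_s}s; "
                    f"halting before remaining pools (done: {upgraded})"
                )
            sleep(poll_s)
            waited += poll_s
        upgraded.append(pool)
    return upgraded
```

Under this assumption, a slow image pull on the head pool would stall the upgrade at that pool and leave the worker pools untouched and serving, rather than cycling every pool in rapid succession.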

SaaS Platform Access Disruption

critical

May 12, 2026 · resolved May 13

## Summary

On May 12, 2026, at 21:19 UTC, the Lucidworks SaaS Platform experienced an access disruption. Users encountered errors when attempting to use the Lucidworks Platform UI ([platform.lucidworks.com](http://platform.lucidworks.com)). Other SaaS Platform services, including all search and Lucidworks AI functionality, were not affected.

Lucidworks Engineering identified the issue at 21:25 UTC and began remediation efforts. The platform was fully restored and operational at 23:41 UTC.

## Root Cause

The incident was caused by a database schema change that was included alongside an application code update. The Lucidworks Platform runs multiple instances of each application simultaneously, and new code is rolled out incrementally: new application instances are brought online while existing instances continue serving traffic. In this case, the schema change removed a database column before all running instances of the application had been updated to use the new schema. During the rollout window, existing application instances attempted to query the removed column, resulting in errors that prevented users from accessing the platform.

Lucidworks Engineering deployed a fix that restored the affected database column and repopulated the necessary data from the most recent database backup. The incident was verified as resolved at 23:41 UTC.

## Lucidworks Actions

Lucidworks will take the following actions as a result of this incident:

* Enforce phased rollout procedures for database schema changes, ensuring destructive modifications are deployed separately from application code updates to maintain backward compatibility through the rollout window.
* Update automated code review tooling to detect and flag destructive database operations, adding an additional layer of defense during the review process.
* Evaluate automated incident creation from critical alerts to reduce the time between detection and formal incident response.

The Lucidworks engineering team is committed to ensuring this type of incident does not recur. These enhancements will strengthen the platform's resilience and reliability for all customers.

## Recommended Client Actions

Lucidworks recommends that clients subscribe to Lucidworks status updates to receive real-time notifications about Lucidworks SaaS Platform incidents. To enable this feature, click **Subscribe to Updates** at [status.lucidworks.com](http://status.lucidworks.com).
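
Two of the actions for this incident, flagging destructive database operations in review and keeping them out of the rollout window, can be sketched roughly as below. This is an illustration under stated assumptions, not Lucidworks' review tooling: the function names, the regular expression, and the use of version tuples are all hypothetical, and the statement splitter is deliberately naive.

```python
import re
from typing import Iterable, List, Tuple

Version = Tuple[int, ...]

# Hypothetical pattern list; a real checker would parse SQL properly.
_DESTRUCTIVE = re.compile(
    r"\bDROP\s+(?:TABLE|COLUMN)\b"
    r"|\bALTER\s+TABLE\s+\S+\s+DROP\b"
    r"|\bRENAME\s+COLUMN\b"
    r"|\bTRUNCATE\b",
    re.IGNORECASE,
)


def find_destructive_statements(migration_sql: str) -> List[str]:
    """Return the statements that remove or rename schema objects.

    Naive split on ';' (it ignores semicolons inside string literals),
    which is good enough to illustrate the review-time flag.
    """
    statements = [s.strip() for s in migration_sql.split(";") if s.strip()]
    return [s for s in statements if _DESTRUCTIVE.search(s)]


def safe_to_apply(
    migration_sql: str,
    running_versions: Iterable[Version],
    first_compatible_version: Version,
) -> bool:
    """Allow destructive changes only once no old instance remains.

    Additive migrations pass immediately. Destructive ones pass only when
    every running application instance is at or above the first version
    that no longer reads the removed schema objects.
    """
    if not find_destructive_statements(migration_sql):
        return True
    return all(v >= first_compatible_version for v in running_versions)
```

With a check like this in place, a migration that drops a column would be held back while any instance from the previous release is still serving traffic, which is the window in which this incident occurred.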

SaaS Platform Service Routing Disruption

none

Apr 24, 2026 · resolved Apr 24

## Summary

On April 24, 2026, at 16:05 UTC, the Lucidworks SaaS Platform experienced an issue that made the entire platform temporarily unavailable; all requests returned IO errors. Lucidworks Engineering was made aware of the issue and reverted the change at 16:08 UTC. The reversion took a few minutes to propagate through the system, and successful responses were restored by 16:13 UTC.

## Root Cause

The incident was caused by a configuration change to the SaaS Platform gateway components that was deployed to production at 16:05 UTC on April 24, 2026. This change introduced an error in the configuration template that caused it to render syntactically invalid output in the production environment. When the deployment system processed this invalid configuration, it interpreted the malformed output as an indication that the gateway components (routing and load-balancing infrastructure) were no longer needed and removed these components from the platform. This caused all incoming requests to fail with IO errors, as there were no gateway instances available to handle traffic.

Lucidworks engineers were actively monitoring this deployment as it rolled out and immediately noticed the problem it introduced. The configuration change was reverted at 16:08 UTC. The deployment infrastructure then correctly reinstated the gateway components, which became fully operational by 16:13 UTC and restored normal platform operation.

## Lucidworks Actions

Lucidworks has taken the following actions as a result of this incident:

* Added pre-deployment validation to verify that configuration templates render syntactically valid output before deployment, and to ensure that changes of this nature are deployed to a development environment before they are deployed to production.
* Modified the deployment workflow to require additional verification steps before pull requests can be merged.

## Recommended Client Actions

Lucidworks recommends that clients subscribe to Lucidworks status updates to receive real-time notifications about Lucidworks SaaS Platform incidents. To enable this feature, click **Subscribe to Updates** at [status.lucidworks.com](http://status.lucidworks.com).
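
The pre-deployment validation described for this incident might look something like the following sketch. The report does not state the template engine or configuration format, so this example assumes JSON output and Python's `string.Template` purely for illustration; `render_and_validate`, `RenderError`, and the `gateways` key are hypothetical. It checks two things: that the rendered output parses, and that it still declares the components whose absence the deployment system would otherwise read as a removal request.

```python
import json
from string import Template
from typing import Any, Dict, Mapping, Sequence


class RenderError(ValueError):
    """Raised when a rendered configuration must not be deployed."""


def render_and_validate(
    template_text: str,
    values: Mapping[str, Any],
    required_keys: Sequence[str] = ("gateways",),  # hypothetical key name
) -> Dict[str, Any]:
    """Render a config template and reject invalid or empty output."""
    try:
        # substitute() raises on missing or malformed placeholders.
        rendered = Template(template_text).substitute(values)
    except (KeyError, ValueError) as exc:
        raise RenderError(f"template failed to render: {exc}") from exc

    try:
        config = json.loads(rendered)
    except json.JSONDecodeError as exc:
        raise RenderError(f"rendered output is not valid JSON: {exc}") from exc

    if not isinstance(config, dict):
        raise RenderError("rendered config must be a JSON object")

    # An empty or missing section would look like "remove these components".
    for key in required_keys:
        if not config.get(key):
            raise RenderError(f"rendered config has no {key!r}; refusing to deploy")
    return config
```

Run as a step before merge or deploy, a check of this shape fails the pipeline on malformed output instead of handing that output to the deployment system.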

platform.lucidworks.com user interface unavailable

critical

Mar 6, 2026 · resolved Mar 6

## Summary

On March 6, 2026, at 05:05 UTC, the Lucidworks SaaS Platform experienced a user interface disruption. Users were consistently redirected to a 404 error page when attempting to access [platform.lucidworks.com](http://platform.lucidworks.com). Other SaaS Platform services, including all search and LWAI functionality, were not affected.

Lucidworks Engineering became aware of the issue at 05:40 UTC and restored access to the user interface by 05:45 UTC by reverting the most recently deployed UI change. At 06:13 UTC, Lucidworks Engineering confirmed that access to the user interface had been fully restored and the platform was operational in its entirety.

## Root Cause

The incident was caused by a configuration change to one of our currently in-development UI services that inadvertently affected routing across the entire platform. This change was carried out at 04:57 UTC. At 05:40 UTC, during routine verification of this deployment, Lucidworks Engineering became aware that the SaaS Platform user interface was unavailable.

The deployment configuration setting for that UI service was left blank, which caused the service to claim routing priority for all platform traffic instead of only its intended path. The resulting global path misconfiguration caused the platform to become unavailable and redirected users to a 404 error page.

Reverting this change restored the SaaS Platform to its previous configuration. The incident was verified as resolved at 06:13 UTC.

## Lucidworks Actions

Lucidworks will take the following actions as a result of this incident:

* Establish clear guidelines for infrastructure-sensitive changes.
* Add more explicit warnings to configuration files for critical settings and update internal documentation to highlight high-risk configuration areas.

Additionally, Lucidworks will take the following actions to further enhance our ability to detect, withstand, and respond to similar incidents in the future:

* Enhance automated pre-deployment testing by integrating validation tools that run before any deployment to ensure errors are visible to engineering teams earlier in the process.
* Enhance real-time monitoring for routing anomalies, adding alerts for unexpected traffic patterns and creating dashboards to provide visibility into platform routing health.

The Lucidworks engineering team is committed to ensuring this type of incident does not recur. Lucidworks is implementing multiple layers of prevention, detection, and response improvements. These enhancements will strengthen our platform's resilience and reliability for all customers.

## Recommended Client Actions

Lucidworks recommends that clients subscribe to Lucidworks status updates to receive real-time notifications about Lucidworks SaaS Platform incidents. To enable this feature, click **Subscribe to Updates** on [status.lucidworks.com](http://status.lucidworks.com).
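
The blank-setting failure described in this incident is the kind of error an automated pre-deployment check could surface. The sketch below is one possible shape for such a check, not the platform's actual validation: the route table format, the function name `find_catch_all_routes`, and the service names in the usage note are assumptions made for illustration. It flags any service whose path prefix is empty or matches everything, unless that service is explicitly allowed to own the root path.

```python
from typing import FrozenSet, List, Mapping, Optional

# Prefixes that would match all platform traffic (illustrative set).
_CATCH_ALL = ("", "/", "/*")


def find_catch_all_routes(
    routes: Mapping[str, Optional[str]],
    allowed_root: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return services that would claim routing for all traffic.

    `routes` maps a service name to its configured path prefix. A blank or
    missing prefix is treated the same as '/', because that is how the
    misconfiguration in this incident behaved.
    """
    offenders = []
    for service, prefix in routes.items():
        normalized = (prefix or "").strip()
        if normalized in _CATCH_ALL and service not in allowed_root:
            offenders.append(service)
    return sorted(offenders)
```

For example, with hypothetical routes `{"platform-ui": "/", "new-ui": "", "search": "/search"}` and `allowed_root=frozenset({"platform-ui"})`, only `new-ui` is reported, and a deployment pipeline could fail on any non-empty result.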
