Recently I was using ECS on AWS with attached EFS storage, running a spiky workload. We ran into slow load times and failing redeployments, which we resolved by switching EFS to Elastic Throughput mode.
The Problem: Slow Storage Layer
Our ECS instance behaved normally during regular operation but struggled during redeployments, with prolonged load times and deployments that failed to complete. In some cases the ECS instance became trapped in a death loop, repeatedly failing health checks each time the service restarted. Our investigation traced the root cause to the EFS storage layer. The default Bursting throughput mode ties throughput to the file system's size, and on a small file system that limit was throttling our application whenever a redeployment produced a burst of I/O.
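To make the size dependency concrete, here is a rough sketch of how Bursting throughput could be estimated. The constants are assumptions taken from the EFS performance documentation at the time of writing (50 KiB/s of baseline per GiB stored, bursting to 100 MiB/s per TiB with a 100 MiB/s floor), and the function names are hypothetical. It also ignores any differences in how reads and writes are metered, so treat the output as an order-of-magnitude estimate only.

```python
# Assumed figures from the EFS performance documentation; verify before use.
BASELINE_KIBPS_PER_GIB = 50   # baseline throughput earned per GiB stored
BURST_MIBPS_PER_TIB = 100     # burst throughput per TiB stored
MIN_BURST_MIBPS = 100         # floor on burst throughput for small file systems


def baseline_mibps(stored_gib: float) -> float:
    """Throughput (MiB/s) the file system can sustain indefinitely."""
    return stored_gib * BASELINE_KIBPS_PER_GIB / 1024


def burst_mibps(stored_gib: float) -> float:
    """Throughput (MiB/s) available while burst credits remain."""
    return max(MIN_BURST_MIBPS, stored_gib / 1024 * BURST_MIBPS_PER_TIB)


def seconds_until_credits_exhausted(
    credit_bytes: float, stored_gib: float, demand_mibps: float
) -> float:
    """How long a sustained demand can run before burst credits run out."""
    # Demand above the burst ceiling is capped at the ceiling.
    drive = min(demand_mibps, burst_mibps(stored_gib))
    # Credits drain only at the rate by which usage exceeds the baseline.
    net_drain = drive - baseline_mibps(stored_gib)
    if net_drain <= 0:
        return float("inf")
    return credit_bytes / (net_drain * 1024 * 1024)


if __name__ == "__main__":
    # A 10 GiB file system earns well under 1 MiB/s of baseline throughput.
    print(f"baseline: {baseline_mibps(10):.2f} MiB/s")
    print(f"burst:    {burst_mibps(10):.0f} MiB/s")
```

Under these assumptions a small file system can burst quickly for a while, but once the credits are gone it falls back to a baseline that is far too low for a deployment that reads a lot of data at startup.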
Bursting throughput utilization was reaching its limits.
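One way to confirm this kind of exhaustion is to look at the `BurstCreditBalance` metric that EFS publishes to CloudWatch. The sketch below assumes boto3 is installed and AWS credentials are configured; the helper names and the file system ID are hypothetical, and the 24-hour window and 5-minute period are arbitrary choices.

```python
from datetime import datetime, timedelta, timezone


def burst_credit_query(file_system_id, hours=24, now=None):
    """Build get_metric_statistics parameters for the EFS burst credit balance."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EFS",
        "MetricName": "BurstCreditBalance",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # one datapoint per 5 minutes
        "Statistics": ["Minimum"],
    }


def lowest_credit_balance(datapoints):
    """Return the smallest 'Minimum' value in the datapoints, or None if empty."""
    values = [point["Minimum"] for point in datapoints]
    return min(values) if values else None


def fetch_lowest_credit_balance(file_system_id, hours=24):
    """Query CloudWatch and return the lowest balance (bytes) in the window."""
    import boto3  # imported lazily so the helpers above work without it

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **burst_credit_query(file_system_id, hours)
    )
    return lowest_credit_balance(response["Datapoints"])


if __name__ == "__main__":
    # "fs-12345678" is a placeholder, not a real file system ID.
    print(fetch_lowest_credit_balance("fs-12345678"))
```

A balance that drops toward zero around each deployment would be consistent with the throttling described above.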
The Solution: Elastic Throughput Mode
To address the issue, we switched EFS to Elastic Throughput mode, which AWS recommends for workloads with unpredictable or spiky characteristics. This mode scales throughput up and down automatically with workload activity, rather than tying it to the amount of data stored. Further details can be found here: https://docs.aws.amazon.com/efs/latest/ug/performance.html#throughput-modes
With Elastic Throughput, we no longer see any "utilization warning."
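For anyone who prefers to script the change rather than use the console, the following is a minimal sketch of switching an existing file system with boto3's `update_file_system` call. It is not how we necessarily made the change ourselves; the helper names and the file system ID are hypothetical, and it assumes boto3 is installed with credentials that allow updating the file system.

```python
VALID_MODES = {"bursting", "provisioned", "elastic"}


def throughput_mode_request(file_system_id, mode="elastic", provisioned_mibps=None):
    """Build the keyword arguments for an EFS update_file_system call."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown throughput mode: {mode!r}")
    request = {"FileSystemId": file_system_id, "ThroughputMode": mode}
    if mode == "provisioned":
        # Provisioned mode needs an explicit throughput figure.
        if provisioned_mibps is None:
            raise ValueError("provisioned mode requires provisioned_mibps")
        request["ProvisionedThroughputInMibps"] = float(provisioned_mibps)
    return request


def switch_to_elastic(file_system_id):
    """Switch the given file system to Elastic Throughput mode."""
    import boto3  # imported lazily so the request builder works without it

    efs = boto3.client("efs")
    return efs.update_file_system(**throughput_mode_request(file_system_id))


if __name__ == "__main__":
    # "fs-12345678" is a placeholder, not a real file system ID.
    print(switch_to_elastic("fs-12345678"))
```

Keeping the request builder separate from the API call makes it easy to check the parameters before anything touches a live file system.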
The Results: Improved Load Time and Redeployment
After configuring EFS to use Elastic Throughput mode, load and redeployment times improved substantially. The ECS task now takes approximately 8 minutes to redeploy, down from more than 15 minutes previously. The web gateway (service) restart time now matches the ECS task boot time at around 2.5 minutes. The failed health checks and the ECS task terminating itself have also stopped.
Conclusion - EFS Elastic Mode for the Win
We are very pleased with the results of adopting Elastic Throughput mode for EFS. It cut our load and redeployment times by nearly half and made the application more reliable. If you run AWS ECS with EFS storage and spiky workloads, we highly recommend trying Elastic Throughput mode and seeing the difference for yourself.