1. Ticket Details
Ticket ID | AMM-1402 |
Severity | Critical |
Category | Performance |
Affected Module / Feature | 104\ECD |
2. Issue Description
In the Assam production environment, users were unable to perform any operations and could not close calls (both inbound and outbound). CPU utilization was also above 70-80%. Executing the ECD monthly reports drove the high CPU utilization and the resulting performance issues.
3. Root Cause Analysis (RCA)
Use one or more of the following techniques for RCA:
5 Whys
Fishbone (Ishikawa) Diagram
Fault Tree Analysis
Log/Trace review
Primary Root Cause Identified: A very large dataset was extracted from the production server, and MySQL connections in the Sleep state were not being closed properly.
Why it happened: The extracted ECD monthly reports contained more than 1 lakh (100,000) records.
Why wasn't it caught earlier? Bulk report data extraction was not tested in the Dev and Preprod environments.
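The sleep-connection part of the root cause can be checked with a small script. The following is a minimal sketch, assuming the rows come from SHOW FULL PROCESSLIST and are fetched as dictionaries by whichever MySQL client is in use. The function names, the 300-second idle threshold, and the sample rows are illustrative assumptions, not values from this incident.

```python
def find_stale_sleepers(rows, max_idle_seconds=300):
    """Return processlist rows that have been in the Sleep state too long.

    Each row is a dict keyed by the SHOW FULL PROCESSLIST column names
    (Id, User, Host, db, Command, Time, State, Info).
    """
    stale = [
        row for row in rows
        if row["Command"] == "Sleep" and int(row["Time"]) > max_idle_seconds
    ]
    # Longest-idle connections first, so the worst offenders are reviewed first.
    return sorted(stale, key=lambda row: int(row["Time"]), reverse=True)


def kill_statements(rows):
    """Build one KILL statement per connection; review them before running any."""
    return [f"KILL {int(row['Id'])};" for row in rows]


# Illustrative rows only (hypothetical users, hosts, and schema name).
sample_rows = [
    {"Id": 101, "User": "app", "Host": "10.0.0.5:51234", "db": "ecd",
     "Command": "Sleep", "Time": 4200, "State": "", "Info": None},
    {"Id": 102, "User": "app", "Host": "10.0.0.5:51240", "db": "ecd",
     "Command": "Query", "Time": 12, "State": "Sending data", "Info": "SELECT ..."},
    {"Id": 103, "User": "report", "Host": "10.0.0.9:40022", "db": "ecd",
     "Command": "Sleep", "Time": 30, "State": "", "Info": None},
]

stale = find_stale_sleepers(sample_rows)
print(kill_statements(stale))
```

Killing connections only treats the symptom. Depending on how the application pools its connections, lowering the server's wait_timeout so that idle sessions are dropped automatically may be the more durable option.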
4. Corrective Actions (Fixes for this instance)
Action | Owner | Target Date | Status |
---|---|---|---|
(1) Advised the Operations team not to extract bulk report data from the primary production servers. (2) Enabled the general log in production to further analyze the queries impacting server performance: #general-log=1 #general_log_file="ASSAM-ECD.log". (3) Analyzed the slow query log and optimized the affected queries with the required indexes. | Anil | 2025-05-06 | Closed |
Immediate actions to resolve the issue and restore service.
5. Preventive Actions (To prevent recurrence)
Action | Owner | Target Date | Status |
---|---|---|---|
(1) Plan to dump the frequently used monthly reports into tables through DB events during non-peak production hours. (2) Extract report data from the replication server. | Anil | | |
Systemic changes, monitoring, alerting, process updates, or automation.
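As one way to picture the first preventive action, the sketch below builds a MySQL CREATE EVENT statement that would refresh a pre-computed report table once a month at an off-peak time. It rests on several assumptions: the event name, table names, columns, SELECT statement, and the 02:00 start time are all hypothetical placeholders, and the server's event scheduler (event_scheduler=ON) would need to be enabled for any event to run.

```python
from datetime import datetime


def build_monthly_report_event(event_name, target_table, columns, select_sql, first_run):
    """Return a CREATE EVENT statement that upserts report rows every month.

    REPLACE only overwrites existing rows if target_table has a primary or
    unique key; otherwise it behaves like a plain INSERT.
    Identifiers are interpolated as-is, so they must come from trusted
    configuration, never from user input.
    """
    starts = first_run.strftime("%Y-%m-%d %H:%M:%S")
    column_list = ", ".join(columns)
    return (
        f"CREATE EVENT IF NOT EXISTS {event_name}\n"
        f"ON SCHEDULE EVERY 1 MONTH STARTS '{starts}'\n"
        f"DO\n"
        f"  REPLACE INTO {target_table} ({column_list})\n"
        f"  {select_sql};"
    )


# Hypothetical source query: aggregate the previous calendar month once,
# so report users read the small summary table instead of the raw data.
select_sql = (
    "SELECT DATE_FORMAT(call_time, '%Y-%m') AS report_month, agent_id, COUNT(*)\n"
    "  FROM ecd_calls\n"
    "  WHERE call_time >= DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01')\n"
    "    AND call_time < DATE_FORMAT(CURRENT_DATE, '%Y-%m-01')\n"
    "  GROUP BY report_month, agent_id"
)

print(build_monthly_report_event(
    "ev_ecd_monthly_report",                       # hypothetical event name
    "ecd_monthly_report_summary",                  # hypothetical summary table
    ["report_month", "agent_id", "total_calls"],   # hypothetical columns
    select_sql,
    datetime(2025, 6, 1, 2, 0, 0),                 # assumed off-peak start: 02:00
))
```

Generating the statement from a script keeps the schedule and table names in one reviewable place. The statement itself would still be applied by a DBA after testing in Dev and Preprod.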
6. Verification of Effectiveness
We will monitor queries that impact server performance in the “Slow Query Log” and the “General Log”.
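Part of this monitoring can be automated. The following is a minimal sketch, assuming the standard text format that MySQL writes to the slow query log file (a "# Query_time: ..." header line followed by the statement). The function names and the sample log excerpt, including its table names, are illustrative and not taken from the Assam server.

```python
import re

# Matches the statistics header MySQL writes before each slow statement.
HEADER = re.compile(
    r"^# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)"
    r"\s+Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


def parse_slow_log(text):
    """Parse slow query log text into a list of dicts, one per logged statement."""
    entries = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "query_time": float(match["query_time"]),
                "rows_sent": int(match["rows_sent"]),
                "rows_examined": int(match["rows_examined"]),
                "sql": "",
            }
            entries.append(current)
        elif current is not None and line.startswith("#"):
            # A new "# Time:" or "# User@Host:" header ends the current statement.
            current = None
        elif current is not None and not line.startswith(("SET timestamp=", "use ")):
            current["sql"] = (current["sql"] + " " + line.strip()).strip()
    return entries


def top_slow_queries(entries, limit=5):
    """Return the slowest statements first."""
    return sorted(entries, key=lambda e: e["query_time"], reverse=True)[:limit]


# Hypothetical excerpt in the slow query log file format.
SAMPLE_LOG = """\
# Time: 2025-05-06T10:15:30.123456Z
# User@Host: report[report] @ localhost []  Id:    42
# Query_time: 12.500000  Lock_time: 0.000100 Rows_sent: 100000  Rows_examined: 2500000
SET timestamp=1746526530;
SELECT * FROM monthly_report_source;
# Time: 2025-05-06T10:16:02.000000Z
# User@Host: app[app] @ localhost []  Id:    43
# Query_time: 1.200000  Lock_time: 0.000050 Rows_sent: 1  Rows_examined: 800
SET timestamp=1746526562;
SELECT id FROM calls WHERE id = 7;
"""

for entry in top_slow_queries(parse_slow_log(SAMPLE_LOG), limit=3):
    print(entry["query_time"], entry["rows_examined"], entry["sql"])
```

A large gap between Rows_examined and Rows_sent is usually the first hint that a query is missing an index, which ties back to the index optimization done as a corrective action.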
7. Lessons Learned
Do not extract bulk reports from the production server during peak hours.