Whoops, your production server ran out of space. This is just a quick guide on how to fix it without downtime.
First of all, this happened because an EC2 instance does not report disk or memory usage to CloudWatch by default. This might be because every operating system has its own special way of monitoring these metrics; if you have ever tried to get these values from the system programmatically, you will understand. So let’s start by resizing the EBS volume of the EC2 instance.
Expanding the EBS Volume
In the EC2 console, go to Volumes, select your volume and click Modify. Then increase the size of the volume; this operation will take a while. Once the console reports “completed (100%)”, we can continue with this AWS Guide. Below are the steps I followed; refer back to the guide for more information.
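If you prefer the command line, the same resize can be requested with the AWS CLI. The sketch below is a hypothetical helper (resize_ebs_volume is a made-up name) that assumes the CLI is configured with permission for ec2:ModifyVolume and ec2:DescribeVolumesModifications. It asks for the new size and then polls until the modification reports "completed", just like waiting in the console.

```shell
# Hypothetical helper: grow an EBS volume and wait for the modification to finish.
# Assumptions: AWS CLI installed and configured; volume ID and size are placeholders.
resize_ebs_volume() {
  local volume_id="$1" new_size_gib="$2" state
  # Request the new size (in GiB); stop if the request itself fails.
  aws ec2 modify-volume --volume-id "$volume_id" --size "$new_size_gib" >/dev/null || return 1
  while true; do
    # ModificationState is one of: modifying, optimizing, completed, failed.
    state="$(aws ec2 describe-volumes-modifications \
      --volume-ids "$volume_id" \
      --query 'VolumesModifications[0].ModificationState' \
      --output text)" || return 1
    echo "modification state: $state"
    case "$state" in
      completed) return 0 ;;
      failed) return 1 ;;
    esac
    sleep "${POLL_SECONDS:-30}"
  done
}
```

For example, resize_ebs_volume vol-0123456789abcdef0 50 would grow that (placeholder) volume to 50 GiB and return once AWS reports the change as completed.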
Check which file system you have (FYI, this was done on a CentOS Linux distro):
# sudo file -s /dev/xvd*
Mine reported XFS. Check how much disk space each file system currently has with:
# df -h
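To check the same thing from a script, for example before deciding on an alarm threshold, here is a small sketch that prints only the used percentage of one file system. disk_used_pct is a hypothetical helper name; it relies only on the POSIX output format of df -P.

```shell
# Print the used-space percentage (without the % sign) of the file system
# that holds the given path (default /).
# df -P keeps each file system on a single line, so long device names
# cannot wrap and shift the columns that awk reads.
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Usage: [ "$(disk_used_pct /)" -ge 85 ] && echo "root file system is at least 85% full"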
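Now install the XFS tools, which provide xfs_growfs (growpart itself comes from the cloud-utils-growpart package, if it is not already installed):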
# sudo yum install xfsprogs
Here my root partition is xvda1 (partition 1 on the xvda disk), so we extend that partition:
# sudo growpart /dev/xvda 1
Then extend the file system (see the guide for an explanation):
# sudo xfs_growfs -d /
Verify that it worked with:
# df -h
That’s it! The magic of the cloud: expanding a hard disk while it is running.
Log Disk Usage and Memory Usage to CloudWatch
There are numerous methods to monitor these metrics and send them to CloudWatch. Instead of reinventing the wheel and writing your own script that runs on a cron, consider this AWS Guide. It looks like a lot of work, and depending on which OS you are running it actually is, but these are scripts and tools AWS built specifically to solve this problem.
Why AWS does not report disk and memory usage by default is beyond me.
Once again, the steps below just expand a bit on the guide and add some extra notes, so please refer back to it. These commands were run against a CentOS Linux 6.9 distro, so we will follow the Red Hat part of the guide.
First install the required packages (this is going to take a while):
# sudo yum install perl-DateTime perl-CPAN perl-Net-SSLeay perl-IO-Socket-SSL perl-Digest-SHA gcc -y
# sudo yum install zip unzip
Next, start the CPAN shell as root:
# sudo cpan
Press Enter through the configuration prompts until you reach the cpan> prompt, then install the four modules one at a time:
cpan> install YAML
cpan> install LWP::Protocol::https
cpan> install Sys::Syslog
cpan> install Switch
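Before moving on, you can confirm that the modules actually load. A minimal sketch, where check_perl_modules is a hypothetical helper name: it tries to load each module with perl -M and reports the ones that are missing.

```shell
# Try to load each Perl module; print its status and return non-zero
# if at least one module cannot be loaded.
check_perl_modules() {
  local missing=0 module
  for module in "$@"; do
    if perl -M"$module" -e 1 2>/dev/null; then
      echo "ok      $module"
    else
      echo "MISSING $module"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_perl_modules YAML LWP::Protocol::https Sys::Syslog Switch DateTime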
Then install the monitoring scripts in /apps/monitoring:
# sudo mkdir -p /apps/monitoring
# cd /apps/monitoring
# sudo curl https://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.2.zip -O
# sudo unzip CloudWatchMonitoringScripts-1.2.2.zip && sudo rm CloudWatchMonitoringScripts-1.2.2.zip
# cd /apps/monitoring/aws-scripts-mon
# sudo cp awscreds.template awscreds.conf
Now jump to the AWS Console, go to IAM, create a new user and give it the following policy:
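A minimal sketch of such a policy, assuming the user only needs to publish metrics. The snippet writes it to a file (cloudwatch-put-metrics.json is a made-up name) whose contents you can paste into the IAM policy editor. The AWS guide lists a few additional read permissions for other script options, so check it if you use those.

```shell
# Write a minimal IAM policy that only allows publishing custom metrics.
# PutMetricData does not support resource-level permissions, hence "*".
cat > cloudwatch-put-metrics.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["cloudwatch:PutMetricData"],
      "Resource": "*"
    }
  ]
}
EOF
```

If you prefer the CLI over the console, the same file can be attached with aws iam put-user-policy --user-name <user> --policy-name <name> --policy-document file://cloudwatch-put-metrics.json.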
This only allows the user to put metrics to CloudWatch. Make sure to select Programmatic access only, then save the access key ID and secret access key. Next, open the scripts’ credentials file and fill in the new key and secret:
# sudo nano awscreds.conf
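If you script the setup instead of editing the file in nano, a sketch like the one below writes the same two keys that awscreds.template contains. write_awscreds is a hypothetical helper name; run it from /apps/monitoring/aws-scripts-mon in a root shell (for example after sudo -s), since that directory was created with sudo.

```shell
# Hypothetical helper: fill in the scripts' credentials file.
# Arguments: access key ID, secret access key, optional target file.
write_awscreds() {
  local key_id="$1" secret_key="$2" conf="${3:-awscreds.conf}"
  # printf '%s' writes the values literally, even if they contain % or \.
  printf 'AWSAccessKeyId=%s\nAWSSecretKey=%s\n' "$key_id" "$secret_key" > "$conf"
}
```

Usage: write_awscreds "$NEW_KEY_ID" "$NEW_SECRET_KEY", where both variables are placeholders for the values you saved from IAM.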
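Now test whether the scripts actually work. The --verify flag does a dry run: it collects the metric and checks that credentials are provided, but does not send anything to CloudWatch:
# ./mon-put-instance-data.pl --mem-util --verify --verbose
If that looks good, drop --verify to actually send the metric:
# ./mon-put-instance-data.pl --mem-util --verbose
The command reports back with the ID of the API request that put the metric on CloudWatch. In the Console, go to CloudWatch, Metrics, and under Custom Namespaces (System/Linux) you will find the memory utilization it sent.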
Finally, create a cron job that sends the memory and disk space metrics to CloudWatch every 5 minutes:
# sudo EDITOR=nano crontab -e
*/5 * * * * /apps/monitoring/aws-scripts-mon/mon-put-instance-data.pl --mem-used-incl-cache-buff --mem-util --mem-used --mem-avail --disk-space-util --disk-space-used --disk-space-avail --disk-path=/ --from-cron
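crontab -e is interactive. When you automate the setup, a filter like the hypothetical ensure_line below appends the cron entry only if it is not already present, so running the setup twice does not schedule the job twice. It reads a crontab on stdin and writes the updated crontab to stdout.

```shell
# Copy stdin to stdout and append LINE only if no existing line matches it exactly.
ensure_line() {
  local line="$1" current
  current="$(cat)"
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Usage (CRON_LINE holds the */5 entry shown above; crontab -l fails
# when root has no crontab yet, hence the "|| true"):
# (sudo crontab -l 2>/dev/null || true) | ensure_line "$CRON_LINE" | sudo crontab -
```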
That’s it. You can now set up alarms in CloudWatch to notify you if disk utilization goes above, say, 85% and memory utilization above 97%.
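FYI, the memory threshold is considerably higher because Linux (and the programs running on it) uses otherwise idle memory, for example for caching. Since the cron job passes --mem-used-incl-cache-buff, cache and buffers count as used, so utilization routinely runs high and it is hard to tell when you really have memory problems.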
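The alarms can also be created from the AWS CLI. A sketch, assuming the CLI runs with credentials allowed to call cloudwatch:PutMetricAlarm (the metrics-only user created above is not), that an SNS topic for notifications already exists, and that create_linux_alarm is a hypothetical helper name. The dimensions you pass must match what the scripts publish; for the disk metric that is InstanceId, MountPath and Filesystem.

```shell
# Hypothetical helper: alarm when a System/Linux metric from
# mon-put-instance-data.pl stays above a threshold.
create_linux_alarm() {
  local alarm_name="$1" metric="$2" threshold="$3" sns_topic_arn="$4"
  shift 4
  # Remaining arguments are Name=...,Value=... dimension pairs.
  # A 300-second period matches the 5-minute cron schedule.
  aws cloudwatch put-metric-alarm \
    --alarm-name "$alarm_name" \
    --namespace "System/Linux" \
    --metric-name "$metric" \
    --dimensions "$@" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --threshold "$threshold" \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$sns_topic_arn"
}

# Example usage (instance ID, device and topic ARN are placeholders):
# create_linux_alarm root-disk-full DiskSpaceUtilization 85 "$TOPIC_ARN" \
#   Name=InstanceId,Value=i-0123456789abcdef0 Name=MountPath,Value=/ Name=Filesystem,Value=/dev/xvda1
# create_linux_alarm memory-high MemoryUtilization 97 "$TOPIC_ARN" \
#   Name=InstanceId,Value=i-0123456789abcdef0
```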