Investigating Out Of Storage Errors On Linux
One Thursday morning at work (and of course it's a Thursday), QA reported that background jobs were not running on our Laravel application in the staging environment.
We're using Horizon to manage the queues for us. Upon visiting Horizon's dashboard, I saw it was "Inactive", meaning the Horizon process wasn't running. Supervisor is supposed to restart it whenever it exits, so if it stays inactive, the process is most likely crashing again as soon as it comes back up.
So the first thing I needed to check was the Supervisor logs. Now, when I define a Supervisor program, I usually add a couple of lines to route the process's logs to a file, something like this:
```
redirect_stderr=true
stdout_logfile=/var/log/horizon/myapp.log
stderr_logfile=/var/log/horizon/myapp_error.log
```
This ensures that if something happens within the process, I can check what's going on.
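With that in place, a quick way to see what's going on is to ask Supervisor for the process status and then tail the log file. A minimal sketch, assuming the program is named myapp-horizon (yours will differ):

```
# ask Supervisor whether the process is running, exited, or stuck in a crash loop
$ sudo supervisorctl status myapp-horizon

# with redirect_stderr=true, both stdout and stderr end up in the stdout logfile
$ tail -n 100 /var/log/horizon/myapp.log
```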
Consider adding a logrotate rule for these files if you have a busy server; I wrote about how to set up logrotate to keep log files under control here.
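As a rough sketch, a drop-in file such as /etc/logrotate.d/horizon could look like this; the rotation schedule and retention count are just example values:

```
/var/log/horizon/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```

copytruncate matters here: the worker keeps its log file open, so logrotate copies the contents and empties the original in place instead of moving it out from under the process.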
Back to the investigation: in this case, the process log made it clear what was wrong, the disk was full. Now that we know it's a storage issue, we can confirm it by running df, a shell command that reports disk space usage per filesystem; the -h flag just tells it to print sizes in human-readable units (G, M) instead of raw 1K blocks.
```
$ df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sda1        46G   46G     0G  100%  /

# The rest of the devices omitted for brevity
```
Ahh! It seems our main storage mount point is at 100% usage, but what's using all that space?
For this, we can use df's sister command, du. While df reports usage per device, du reports disk usage for directories and files, which lets us drill down and see exactly what's eating up the space.
We'll start with the same mount point that df reported as full, which in this case (and probably in yours too) is /. To keep the output manageable, we'll tell du to go only one level deep; otherwise it would list every nested directory and bury the useful numbers in noise. Then we pipe the result into sort so we get the directories ordered from largest to smallest.
```
$ du / --max-depth=1 -h | sort -hr
46G   /
23G   /var
13G   /home
6.0G  /usr
1.2G  /snap
324M  /opt
271M  /root
243M  /boot
121M  /tmp
85M   /run
9.2M  /etc
```
Ahh, we can see that /var is taking up a lot of space, so now we can check that directory:
```
$ du /var --max-depth=1 -h | sort -hr
23G   /var
23G   /var/log
...
```
Logs... let's see which logs:
```
$ du /var/log --max-depth=1 -h | sort -hr
23G    /var/log
19.9G  /var/log/nginx
...
```
Hmmm, that's weird; nginx logs aren't supposed to take up 19 gigs. Let's see what's happening there.
We'll run ls to list all the files; adding S to the options sorts them by size, largest first.
```
$ ls -lhS /var/log/nginx
total 19.9G
-rw-r--r-- 1 www-data www-data 19G Dec  4 11:39 myapp-error.log
```
Yep, it's the app's error log. After inspecting it, it turns out we had an issue with the app a few days back and it spewed out a ton of errors.
From here, we can archive the log file somewhere if we need to retain it, and then just empty it out:
```
$ echo '' > /var/log/nginx/myapp-error.log
```
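truncate does the same job without writing a stray newline; either way works because nginx opens its logs in append mode, so it keeps writing to the same (now empty) file:

```
$ truncate -s 0 /var/log/nginx/myapp-error.log
```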
Now you can check df to verify you've reclaimed that space. You can use the same technique to investigate other directories that are taking too much space; a common offender for Dockerized environments is /var/lib/docker/overlay2.
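If Docker turns out to be the one hogging the disk, the docker CLI can break the usage down and reclaim what's unused. A quick sketch (prune deletes stopped containers, dangling images, and unused networks, so read its prompt before confirming):

```
# per-category disk usage: images, containers, local volumes, build cache
$ docker system df

# interactively remove unused data
$ docker system prune
```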
Until the next one 🤙