Investigating Out Of Storage Errors On Linux

One Thursday morning at work — and of course it's a Thursday — QA reported that background jobs are not running on our Laravel application on the staging environment.

We're using Horizon to manage the queues for us. Upon visiting Horizon's dashboard, I saw it was "Inactive", which means the process is constantly crashing; otherwise Supervisor would've restarted it.
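
Before digging into log files, you can also ask Supervisor directly what state the process is in. A quick sketch, assuming the program is called horizon in your Supervisor config:

$ sudo supervisorctl status horizon
# RUNNING means Supervisor thinks the process is fine; BACKOFF means it
# keeps crashing and is being retried; FATAL means Supervisor gave up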

So the first thing I needed to check was the Supervisor logs. Now, when I define a Supervisor program, I usually add a couple of lines to route process logs to a file, something like this

redirect_stderr=true
stdout_logfile=/var/log/horizon/myapp.log

This ensures that if something happens within the process, I can check what's going on.
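
For context, those lines sit inside a regular Supervisor program definition. Here's a rough sketch of what the whole thing could look like; the program name, command path, and user are just placeholders for whatever your setup uses:

[program:horizon]
process_name=%(program_name)s
command=php /var/www/myapp/artisan horizon
user=www-data
autostart=true
autorestart=true
stopwaitsecs=3600
redirect_stderr=true
stdout_logfile=/var/log/horizon/myapp.log

autorestart=true is what makes Supervisor bring the process back up when it dies, which is why a dashboard stuck on "Inactive" is such a strong hint that it keeps dying.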

Consider adding a logrotate rule for these files if you have a busy server. I wrote about how to set up logrotate to keep log files under control here.
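
As a rough sketch, a rule like this dropped into /etc/logrotate.d/ would keep those files in check; the rotation frequency and retention count are just examples, tune them to your traffic:

/var/log/horizon/*.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}

copytruncate is handy here because Horizon keeps its log file open; logrotate copies the file aside and truncates the original instead of moving it away.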

Sure enough, the logs pointed at the disk being full. Now that we know it's a storage issue, we can confirm it by running df, a shell command that reports disk space usage per filesystem; the -h flag just tells it to report sizes in a human-readable format instead of raw 1K blocks.

$ df -h
Filesystem      Size  Used  Avail Use% Mounted on
/dev/sda1        46G   46G     0G 100% /

# The rest of the devices omitted for brevity

Ahh! It seems our main storage mount point is at 100% usage, but what's using all that space?

For this, we can use df's sister command, du. du reports disk usage for directories and files rather than devices, so it lets us drill down and see exactly where the space is going.

We'll start with the same mount point that df reported as full, which in this case (and probably in yours too) is /. To make our lives a bit easier, we'll tell du to only go one level deep into directories; otherwise the results would include every nested level, adding a lot more noise than we care about. Then we pipe the result into sort so we get the directories ordered from largest to smallest.

$ du / --max-depth=1 -h | sort -hr
46G   /
23G   /var
13G   /home
6.0G  /usr
1.2G  /snap
324M  /opt
271M  /root
243M  /boot
121M  /tmp
85M   /run
9.2M  /etc

Ahh, we can see that /var is taking a lot of space, so now we can check that directory

$ du /var --max-depth=1 -h | sort -hr
23G   /var
23G   /var/log
...

Logs... let's see which logs

$ du /var/log --max-depth=1 -h | sort -hr
23G    /var/log
19.9G  /var/log/nginx
...

Hmmm, that's weird; nginx logs aren't supposed to take up 19 gigs, so let's see what's happening there.

We'll run ls to list all the files; we can also add S to the options list to sort them by size, largest first

$ ls -lhS /var/log/nginx
total 19.9G
-rw-r--r-- 1 www-data www-data 19G Dec 4 11:39 myapp-error.log

Yepp, it's the app's error logs. After inspecting the log file, it seems we had an issue with the app, and we returned a ton of errors a few days back.
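
A file that size is too big to open in an editor, so the usual shell tools help to get a quick picture of what's in there. Something like this, assuming the default nginx error log format where each line starts with the date, shows when the errors piled up:

$ tail -n 20 /var/log/nginx/myapp-error.log
# peek at the most recent entries

$ awk '{print $1}' /var/log/nginx/myapp-error.log | sort | uniq -c | sort -rn | head
# count log lines per day, so spikes stand out immediately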

From here, we can archive the log file somewhere if we need to retain it. Note that we want to empty the file in place rather than delete it: nginx keeps the file handle open, so removing the file wouldn't actually free the space until nginx is restarted or told to reopen its logs. So let's just empty it out

$ echo '' > /var/log/nginx/myapp-error.log
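
If the log is actually worth keeping, compress it to a mount point that still has free space before emptying it; the destination path here is just an example:

$ gzip -c /var/log/nginx/myapp-error.log > /mnt/backup/myapp-error.log.gz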

Now you can check df to verify you've reclaimed that space. You can use the same technique to investigate other directories that are taking too much space; a common offender for Dockerized environments is /var/lib/docker/overlay2.
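
If /var/lib/docker/overlay2 turns out to be the one ballooning, Docker can report and reclaim its own space; just keep in mind that prune deletes anything currently unused:

$ docker system df
# breakdown of space used by images, containers, local volumes and build cache

$ docker system prune
# removes stopped containers, unused networks, dangling images and build cache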

Until the next one 🤙