Hunting for 20GB of missing disk space where the usual culprits turned out to be innocent...
The server in question is an Azure VM running CentOS 7 that does batch processing of large log files. Logs are streamed to the box via `rsyslogd`, processed, then archived as hourly compressed files courtesy of logrotate. All very straightforward stuff.
I’d seen that the available disk space had started to go down dramatically. Usually the machine runs with well over 50% disk free and has a very predictable disk usage pattern, but now it was eating through the available buffer space at an alarming rate.
`df` showed me a couple of GBs free when we'd typically have far more:
```
$ df -h
Filesystem      Size   Used  Avail Use% Mounted on
devtmpfs        1.7G      0   1.7G   0% /dev
tmpfs           1.7G      0   1.7G   0% /dev/shm
tmpfs           1.7G   968K   1.7G   1% /run
tmpfs           1.7G      0   1.7G   0% /sys/fs/cgroup
/dev/sda2        30G  27.8G   2.1G  94% /              <-- unexpected!
/dev/sda1       497M    64M   433M  13% /boot
/dev/sdb1        50G     4G    44G   8% /mnt/resource
tmpfs           345M      0   345M   0% /run/user/1002
```
Next I ran `du` over the root directory and sorted the result to find the largest entries, e.g. `du -chs /* | sort -h`. I had perhaps expected to find some large set of log files copied out to a user directory or similar. That didn't happen; instead `du` gave me a total disk used of 8GB. A little discrepancy is expected, but a 20GB gap is not.
It is not completely unexpected to find that the results of `du` and `df` don't match. You might think `df`/disk-free should complement `du`/disk-used, and they sort of do, but they arrive at their values in different ways.
- `df` shows the blocks allocated in the entire file system, including inodes and other metadata.
- `du` shows the blocks actually allocated to an individual file.
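As a quick illustration that "disk used" depends on how you count, a sparse file makes the difference visible without needing root (a minimal sketch; `sparse.img` is just an illustrative name):

```shell
# Create a 100MB sparse file: large apparent size, almost no blocks allocated.
truncate -s 100M sparse.img

# Blocks actually allocated on disk (what du normally counts): ~0
du -k sparse.img

# Apparent file size (what ls -l reports): 102400 KB
du -k --apparent-size sparse.img

rm sparse.img
```

The two numbers disagree wildly for the same file, which is a small-scale version of the same "whose accounting do you trust?" question.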
One typical reason for the discrepancy is one or more processes hanging on to file handles that are attached to deleted files. You can easily confirm this with `lsof`. The fix is often a process restart (or sending a suitable signal) to whichever process has the handle…
```
$ lsof | grep deleted
$
```
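You can reproduce the deleted-but-still-open situation safely without root (a sketch; `demo.log` is an arbitrary name, and the `/proc` check makes this Linux-specific):

```shell
# Open a file on descriptor 3, write to it, then unlink it.
exec 3> demo.log
echo "some data" >&3
rm demo.log

# The blocks stay allocated while the fd is open; the kernel marks
# the entry "(deleted)", which is what lsof reports too.
ls -l /proc/$$/fd | grep deleted

# Closing the descriptor finally releases the space.
exec 3>&-
```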
However that is not our issue this time.
Where else can disk space go?
Clutching at straws a little, I wondered if the disk was very fragmented, although losing multiple GBs of space to fragmentation seemed a big stretch. I also don't think I have ever had to defrag an XFS file system. But a bad plan is better than no plan, and indeed that proved true here, as it pointed me in the right direction.
`xfs_fsr` is the tool required. After a couple of minutes of running, it threw an error about a file too large to move. The error listed the large file's inode number and size. The size was around 20GB. Suspicious…
`xfs_fsr` works by copying files to new (disk) locations, so this error makes sense when our total disk is only 30GB. It also tells us that we really do have a very large file somewhere.
Trying to go from the inode to the file path proved unsuccessful:
```
$ find / -xdev -inum 17313616
$
```
Accidentally Hiding Data
To recap so far:
- We don’t have any deleted files still being used.
- We do appear to have an inaccessible large file with an inode we can’t convert to a file path.
Searching online for some pointers, I came across a forum post about disk mounts that can obscure other file locations. That's an interesting idea.
This server does have two disks in use: one standard disk mounted at `/`, and one temp disk (runtime only, data lost on reboot) mounted at `/mnt/resource`.
I was able to prove this was the issue by remounting the primary disk at a second location with `mount`'s bind option. I tried this with `mount --bind / /tmp/tmpmount`, so any path `/foo` was also available as `/tmp/tmpmount/foo`. Crucially, a plain bind mount of `/` exposes the root file system without the other mounts layered on top of it.
`du -chs /tmp/tmpmount` then gave me 28GB, not 8GB, and sure enough it is because `/tmp/tmpmount/mnt/resource` existed with a big 20GB temporary log file inside.
I could fix it by deleting the file from `/tmp/tmpmount/mnt/resource`:
```
$ df -h
Filesystem      Size   Used  Avail Use% Mounted on
devtmpfs        1.7G      0   1.7G   0% /dev
tmpfs           1.7G      0   1.7G   0% /dev/shm
tmpfs           1.7G   968K   1.7G   1% /run
tmpfs           1.7G      0   1.7G   0% /sys/fs/cgroup
/dev/sda2        30G   4.7G    25G  16% /              <-- more like it!
/dev/sda1       497M    64M   433M  13% /boot
/dev/sdb1        50G    17G    30G  36% /mnt/resource
tmpfs           345M      0   345M   0% /run/user/1002
```
How did this happen?
Here is the theory:
- The virtual machine starts up/reboots.
- `rsyslogd` started listening for data and streaming it out to a log file under `/mnt/resource`.
- It did this before the temp disk had been mounted, so the file was actually created on the root disk.
- Once the mount took place the original file was hidden.
- `du` works by walking all visible files and summing their sizes, so it won't see a file hidden beneath a mount point.
- The file then continued to grow via the still-open handle, while each hour logrotate ran and could not see any files to operate on.
- A restart of `rsyslogd` would have half resolved the issue: files would then start appearing in the right place, but the existing large file would be left hidden and stuck there forever.
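The hiding step is easy to reproduce in a scratch VM as root (`/mnt/demo` is an arbitrary path; any mount, tmpfs here, will do):

```
$ mkdir -p /mnt/demo
$ echo "written before the mount" > /mnt/demo/app.log
$ mount -t tmpfs tmpfs /mnt/demo   # mount anything over the directory
$ ls /mnt/demo                     # app.log has vanished from view
$ umount /mnt/demo
$ ls /mnt/demo                     # app.log is back, untouched
```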
Fixing The Bad Mount
systemd has support for saying ‘only start this service after X condition is
met’. Often we think of that as another service, but you can include mount
points as well (run
systemctl list-units --type=mount to see what’s
available). If this is news to you then check out Digital Ocean’s
systemd page which is a helpful condensed guide. We also don’t need to
override the system installed
rsyslog unit file, which might get lost on
upgrade. We can instead place a drop-in file in a new sub-directory. Our fix below assumes the core unit file is at `/usr/lib/systemd/system/rsyslog.service`.
```
$ cat /usr/lib/systemd/system/rsyslog.service.d/wait-for-mount.conf
# Ensure our temp disk for log storage is ready to go before starting
[Unit]
After=mnt-resource.mount
```
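After placing the drop-in, tell systemd to pick it up and confirm the dependency registered:

```
$ systemctl daemon-reload
$ systemctl restart rsyslog
$ systemctl show rsyslog -p After   # mnt-resource.mount should now appear in the list
```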
Mystery solved and fixed for the future!
Postscript - Is this really fixed?
Coming back to this draft ahead of hitting publish, it turns out the fix might not be as straightforward in all cases. Let me explain a little…
The Azure temporary disk mount appears to be a little different to normal disk mounts. `waagent.service` (the Azure guest agent) is responsible for performing the mount (rather than an entry in `/etc/fstab`, which would simply fail in this case). However, setting `After=waagent.service` in your unit config doesn't resolve the issue, as systemd considers the `waagent` service being "running" to be the same as being "ready".
I had wondered about using systemd auto-restart on `rsyslogd` together with a condition line based on the mount directory, but if the condition isn't met systemd won't then try to load the unit again (it's different to having a process fail and get auto-restarted).
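For reference, the kind of condition line I mean is systemd's `ConditionPathIsMountPoint=`. When a condition fails, the unit is silently skipped rather than retried, which is exactly the behaviour described above:

```
# Hypothetical drop-in: only start if the path is already a mount point.
[Unit]
ConditionPathIsMountPoint=/mnt/resource
```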
Some online suggestions are along the lines of having a script that checks for the mount on a timer and then kick-starts the required service. I decided to go for a simpler hack of a solution and just added a 30 second startup delay (`ExecStartPre=/bin/sleep 30`) to the rsyslog extra unit file. It does the job, and considering the wider processing happening here is temporary anyway, it doesn't seem worth creating anything more elaborate.
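For completeness, here is a sketch of what the combined drop-in could look like, using `ExecStartPre=/bin/sleep` as the delay mechanism (the 30 seconds is an arbitrary grace period):

```
$ cat /usr/lib/systemd/system/rsyslog.service.d/wait-for-mount.conf
# Ensure the temp disk for log storage is ready before starting,
# and give the Azure agent a grace period on boot
[Unit]
After=mnt-resource.mount

[Service]
ExecStartPre=/bin/sleep 30
```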