14 Dec 2018

What is actually on this VPS?

Every couple of years I rebuild my main development VPS for one reason or another. Months ago I realised that I had gradually worked my way up the DigitalOcean spec list but was now using a fraction of the resources. I held off rebuilding though, as I was relying on the server's assigned IP address for access to a particular work network. Now that situation is resolved with something far less brittle, it's time to get on with it.

Never An Easy Task

Rebuilding this VPS has always been a hassle. The mix of development and ‘production’ stuff means I have to actually stop and consider all the different moving parts it handles. With that in mind, once I get my head around untangling this mess, I’m going to try to split it up so future development will just live on some local VM (or directly on my laptop). Everything that needs to be public and consistently available will live somewhere else, ideally deployed in a non-hand-operated way so it’s easy to manage (and move!) in the future.

The Current State

As it stands the VPS is sort of a dumping ground of stuff. When writing code I find it easy enough to make the effort setting up a git repo and creating basic docs, partly because the process of making commits helps break up the work into logical sections. The issue is everything else, most of which are experiments, projects and sites that aren’t used or needed anymore. I need a way to figure out what to move, what to archive and what to delete.

Step 1 - Tackling Apache

90% of the development on this VPS is web stuff. I figure I should look at which virtual hosts Apache is currently serving and use that as a starting point. apachectl -S gives this information.

root@job:~# apachectl -S 2>&1 | grep namevhost | wc -l
27

27 is a few more than I expected; I could list maybe 5! The output is given per port though, so there are a few duplicates in there for sites running on both HTTP and HTTPS. Even after taking that into account there are still plenty of sites I’d forgotten about. For the sites I want to keep I copy their configuration into a migration folder in my home directory, mirroring the original location (e.g. ~/migration/etc/apache2/sites-enabled).
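A rough sketch of how the de-duplication and the mirrored copy could be scripted. It assumes the usual "port N namevhost NAME (file:line)" layout of apachectl -S output; the function names unique_vhosts and stash_config are made up for illustration.

```shell
# Read `apachectl -S` output on stdin and print each site name once,
# however many ports it is served on.
# Assumes lines shaped like: "port 443 namevhost example.org (/path/site.conf:2)"
unique_vhosts() {
    awk '$3 == "namevhost" { print $4 }' | sort -u
}

# Copy one config file (given as an absolute path) under a destination
# folder, keeping its original directory layout.
stash_config() {
    src=$1
    dest=$2
    mkdir -p "$dest$(dirname "$src")" && cp "$src" "$dest$src"
}
```

Used as `apachectl -S 2>&1 | unique_vhosts | wc -l` for a count without the per-port duplicates, and `stash_config /etc/apache2/sites-enabled/example.conf ~/migration` for the copy (example.conf is a placeholder name).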

Some of the sites are just placeholders for domains used for non-web projects. For each of those I set up a static Netlify site. That way I don’t have to keep manually managing them on some future server.

Step 2 - Ports…

Aside from Apache on 80/443, what else is running and listening for connections that I set up and have long since forgotten about? Find out with lsof -i -P. The -i is because I’m only interested in internet network files (lsof is ‘List Open Files’, and all the things are files on unix). The -P is because I want numeric port numbers rather than having lsof name the ports for me. I may well have picked a port for my project that, unbeknown to me, is actually well-known for protocol X.

The port listing gave me the following:

22 - SSH
25 - Postfix
80, 443 - Apache covered already…
3000 - Some mojolicious app?
3306 - mysqld
8081 - Apache again?
9222, 9223 - Chrome browser
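A listing like that can be boiled down from the raw lsof output with a little awk. This is only a sketch: the function name listening_ports is hypothetical, it assumes the standard lsof column layout where the last two fields are the address and "(LISTEN)", and it skips UDP sockets because they never show a LISTEN state.

```shell
# Read `lsof -i -P` output on stdin and print "PORT COMMAND" once per
# listening TCP port and process name, ordered by port number.
listening_ports() {
    awk '$NF == "(LISTEN)" {
        # The address field looks like *:22, 127.0.0.1:3306 or [::1]:25,
        # so the port is whatever follows the last colon.
        n = split($(NF - 1), parts, ":")
        print parts[n], $1
    }' | sort -u | sort -n
}
```

Used as `lsof -i -P | listening_ports`. The `sort -u | sort -n` pair removes exact duplicates (IPv4 and IPv6 sockets for the same daemon) before ordering numerically.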

SSH and Postfix have no special configuration on this server, so nothing to keep track of there.

The 3000 traced to a now unused project that uses the Mojolicious perl web framework. That whole chunk of code is already neatly in a repo, I’d just forgotten it was running.

The 8081 reminded me of a whole set of CI/build stuff I’d run ages ago. I removed most of it but must have left the underlying Apache Listen directive in place. None of the active VirtualHosts were using the port, so it didn’t show up in my earlier Apache investigation.
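Leftover Listen directives like this can be hunted down by listing every port Apache is told to open and comparing that against the ports the virtual hosts actually use. A minimal sketch, with a hypothetical function name listen_ports; on Debian/Ubuntu layouts the directives usually sit in /etc/apache2/ports.conf, but that location is an assumption about this server.

```shell
# Print the port of every active Listen directive in the given config files.
# Handles "Listen 80", "Listen 192.0.2.1:8081" and "Listen 443 https";
# commented-out directives are ignored.
listen_ports() {
    grep -hiE '^[[:space:]]*Listen[[:space:]]' "$@" |
        awk '{ n = split($2, parts, ":"); print parts[n] }' |
        sort -n -u
}
```

Used as `listen_ports /etc/apache2/ports.conf`. Any port printed here that never appears in the apachectl -S output is a candidate for removal.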

The 9222/9223 was part of a web-tracking project from some time ago where I was monitoring sites for interesting content changes. That, it turns out, was not very neatly in a repo. I put a bit of effort into fixing the state of this: writing down everything I could remember about the project in a README file, copying configuration files and scripts, and then pushing to Bitbucket for another day. I doubt I’ll pick this particular project up again with this codebase, but there's no harm in keeping it around.

Step 3 - All About That Data(base)

I cannot think of a project on this VPS where I needed a local database and didn’t use either MySQL or a SQLite file. My tendency with SQLite files is to store them in the same folder as the code, sometimes even as part of the repo itself if the data is that essential (probably a blog post for another day). That just leaves mysql to probe further.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
[...]
+--------------------+
30 rows in set (0.00 sec)

Of those 30, four are just MySQL’s own databases (information_schema, mysql, performance_schema and sys). 19 had ‘_wp’ as a suffix, which I use to signify a WordPress site. That leaves 7 others.
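The same tally can be produced without counting by eye. A small sketch, assuming one database name per line on stdin; count_databases is a made-up name and the ‘_wp’ rule is just the naming convention described above.

```shell
# Read database names on stdin (one per line) and count them by kind:
# MySQL's own schemas, WordPress sites (names ending in _wp), and the rest.
count_databases() {
    awk '
        /^(information_schema|mysql|performance_schema|sys)$/ { system_dbs++; next }
        /_wp$/ { wordpress++; next }
        NF { other++ }
        END { printf "system=%d wordpress=%d other=%d\n", system_dbs, wordpress, other }
    '
}
```

Used as `mysql -N -e 'show databases' | count_databases`, where -N drops the column header so only the names are counted.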

For most of those, running show tables jogged my memory enough about the project and whether it was worth keeping. One database however didn’t ring any bells. I poked around a bit and saw it had next to no data in it. It looked like only a subset of WordPress tables - a bit strange! Running a local mysql instance means I can look at the underlying data files on the filesystem and see their last modified time. That suggested the database in question hadn’t been updated for over two years. I decided to just delete it.
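Checking the last modified time can be a one-liner. This sketch assumes GNU find (for -printf) and that each database has its own folder of table files under the data directory; newest_file is a hypothetical name. It is only a rough signal, since a server configured to keep all tables in one shared tablespace would not touch the per-database files on every write.

```shell
# Print the modification date and path of the most recently changed file
# under a directory. Relies on GNU find's -printf.
newest_file() {
    find "$1" -type f -printf '%TY-%Tm-%Td %p\n' | sort | tail -n 1
}
```

Used as root along the lines of `newest_file /var/lib/mysql/some_db`, where /var/lib/mysql is the usual Ubuntu data directory and some_db is a placeholder.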

I trust mysqldump to get databases into a portable file I can reload later. I expect to only reload a few, and none of them have a large amount of activity, meaning there's no need to do anything more scalable (e.g. master/slave to get a copy elsewhere then switch the roles etc).
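For a handful of quiet databases, a loop around mysqldump is about all that is needed. A sketch under a few assumptions: credentials come from something like ~/.my.cnf, the tables are InnoDB (which is what makes --single-transaction give a consistent snapshot without locking), and dump_databases is a made-up name.

```shell
# Read database names on stdin and dump each one to OUT_DIR/<name>.sql.
# --databases puts CREATE DATABASE and USE statements in the dump, so each
# file can later be replayed with a plain `mysql < name.sql`.
dump_databases() {
    out_dir=$1
    mkdir -p "$out_dir" || return 1
    while IFS= read -r db; do
        [ -n "$db" ] || continue   # skip blank lines
        mysqldump --single-transaction --databases "$db" > "$out_dir/$db.sql" || return 1
    done
}
```

Used as `printf '%s\n' first_db second_db | dump_databases ~/migration/mysql`, with the database names and target folder as placeholders.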

Step 4 - Home Directory Files

Almost anything else of interest is in my home directory. There is a general rule in computer science that time spent sorting at the point of storage means quicker retrieval (and vice versa). I have never found this more obviously true than when working through the files in my home directory.

I try to put the effort in at the point of storage. I name files with a .temp extension (or store them in a ~/temp/ folder) when they really can be deleted at any time. I also try to keep everything that is a repo under a ~/dev/ folder. In theory that just leaves interesting files remaining, which mostly works out.
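With those conventions in place, the "interesting" remainder can be listed directly. A minimal sketch: leftovers is a hypothetical name, skipping dotfiles is an extra assumption rather than part of the conventions above, and -mindepth/-maxdepth need GNU or BSD find.

```shell
# List top-level entries in a directory that are not throwaway (*.temp or
# the temp folder), not repos (the dev folder) and not hidden dotfiles.
leftovers() {
    find "$1" -mindepth 1 -maxdepth 1 \
        ! -name '*.temp' ! -name temp ! -name dev ! -name '.*' | sort
}
```

Used as `leftovers "$HOME"`.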

Step 5 - Packages

Just before calling time I thought checking the installed packages could be useful to remind me of other interesting things this VPS does. On Ubuntu, /var/log/apt/* and /var/log/dpkg.log* can be grepped to give a list of installed packages. This is only semi-useful though, as the files are deleted by logrotate after a year (at least by default), so on old servers you can’t rely on this as a complete record of all time.
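Pulling package names out of the dpkg log could look something like this. It assumes the usual dpkg.log line layout of "date time action package:arch old-version new-version"; installed_packages is a made-up name.

```shell
# Read dpkg.log lines on stdin and print each package that has an
# "install" entry, with the ":arch" suffix stripped.
installed_packages() {
    awk '$3 == "install" { sub(/:.*/, "", $4); print $4 }' | sort -u
}
```

Used as `zcat -f /var/log/dpkg.log* | installed_packages`; the -f lets zcat pass the current uncompressed log through alongside the gzipped rotations.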

Looking Ahead

I’m toying with the idea of using a kubernetes cluster for everything that’s production, i.e. the sites/projects that are public facing and likely to be around for a while. In a way it’s overkill, but an increasing use of kubernetes at work means the move doesn’t have a large learning curve. It should also keep the production stuff easy to re-deploy in the future, or scale up/down as required. I like the idea of being able to do this, even if it’s unlikely in practice to actually be required for the code I’m running.

DigitalOcean are now doing managed kubernetes clusters, which appeals. I definitely don’t want to manage my own cluster on my own servers. Whilst it could be interesting to do, overall I think it’s probably just a lot of hassle for the minimum load I’d put on it.

For everything else, which is all the mini development projects, the intention is to use a local virtual machine. I’ll try to keep this in a state where it never acts as the sole copy of something I want to keep. That way it should be easier to wipe out, or migrate, in a short timeframe without me stressing that I might have nuked something really important.

Time will tell if this future Falkus utopia development environment will actually happen. I was going to set the challenge of finishing this project by the end of 2018. Old VPS completely deleted. Production kubernetes cluster all up and running. However, considering how long even writing this article took I think I probably need to scale back that goal. My new goal is to get this site, falkus.co, moved before the new year. If you see any weird errors or big outages between now and then you’ll know why!

Dev Kubernetes MySQL Netlify Perl SysAdmin