On August 19, 2019 In Blog, Tech

Discourse Memory on DigitalOcean Droplet Swap

I've had my hands busy playing with lots of new stuff since the last major tech tutorial was posted: more fun with Python, a little bit of Angular (TypeScript/JavaScript), and less Linux bash.

Most notably, the Synology SNMP Network Monitoring with LibreNMS Docker tutorial post has gathered 6,000 views!

So why did I start off talking about SNMP?

Discourse Forum

I have a closed Discourse forum for a group of friends where we discuss various matters, tutorials and system setups.

Check Discourse out if you are looking for a forum that doesn't rely on one of those few famous PHP systems.

My Discourse forum is hosted on DigitalOcean, using a Droplet with 2GB RAM.

Recent problem

It all started a while back (2-3 months ago?), when I began getting ICMP ping and SNMP down events in the LibreNMS event log, but I didn't dig further as they weren't consistent.

Maybe it's normal, as it's a remote VPS rather than a server on the local network?

Not.

Most of my servers/systems are monitored, and Discord started giving me more downtime notifications…

This had never happened until recently.

Discourse on Discord Notification

Now that's weird. So I decided to check the LibreNMS Event Logs and found that the down events were occurring even more consistently than a week earlier.

Discourse LibreNMS Eventlog

That definitely doesn’t look good at all!

My LibreNMS alert settings are configured with a delay/interval check of 10 minutes, so any device that comes back up within that 10-minute window does not trigger a notification to Discord. Every alert that did reach Discord therefore meant at least 10 minutes of downtime.
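
To make that delay rule concrete, here is a minimal Python sketch of the logic. It is only a model of the behaviour described above, not LibreNMS code; the function name `should_notify` and its parameters are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: mirrors the 10-minute delay described in the post.
ALERT_DELAY = timedelta(minutes=10)


def should_notify(went_down_at, came_up_at, now, delay=ALERT_DELAY):
    """Return True only if the outage has lasted at least `delay`.

    `came_up_at` is None while the device is still down.
    """
    if came_up_at is not None and came_up_at - went_down_at < delay:
        # Recovered inside the delay window: stay quiet.
        return False
    return now - went_down_at >= delay


# Hypothetical timestamps, purely for illustration.
t0 = datetime(2019, 8, 19, 12, 0)
short_blip = should_notify(t0, t0 + timedelta(minutes=3), t0 + timedelta(minutes=15))
long_outage = should_notify(t0, None, t0 + timedelta(minutes=12))
```

A three-minute blip is suppressed, while a device that is still down after twelve minutes triggers a notification.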

CDN Implementation

Just a few days ago I added CDN support to Discourse so it would serve content faster to different countries.

After the implementation, the first issue appeared: the forum would get stuck halfway through loading. I thought either DNS propagation was taking longer than it should (ping/traceroute pointed back to the CDN server), or the CDN was still pulling static assets off Discourse… wrong, wrong, wrong!

Time to dig and research

Since guessing at the problem isn't very efficient, I went through all the LibreNMS RRD graphs in detail to try to pinpoint it.

The one that stood out right away was this:

Discourse RRD - Detailed Memory Usage

Even without an abundance of experience deploying Linux systems, it's enough to know that RAM is important for servers and shouldn't be walking a tightrope. With only 5% of RAM free, trouble was definitely on the way, especially as my Droplet was not using a swapfile.
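
If you don't have graphs to look at, the same figure can be read straight off the server. The following is a minimal Python sketch, assuming a Linux host that exposes `/proc/meminfo`; the helper name `free_ram_percent` and the sample values are illustrative only and are not taken from my Droplet.

```python
def free_ram_percent(meminfo_text):
    """Percentage of RAM still available, from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    # MemAvailable is missing on very old kernels; fall back to MemFree.
    available = fields.get("MemAvailable", fields.get("MemFree", 0))
    return 100.0 * available / total


# Made-up sample resembling a 2GB machine with about 5% of RAM available.
SAMPLE_MEMINFO = """\
MemTotal:        2000000 kB
MemFree:           60000 kB
MemAvailable:     100000 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

sample_percent = free_ram_percent(SAMPLE_MEMINFO)
```

On a real server you would pass in the contents of `/proc/meminfo`, for example `free_ram_percent(open("/proc/meminfo").read())`. A `SwapTotal` of 0 in that file is also a quick way to confirm that no swap is configured.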

The server ran out of RAM?

So let’s Google search for Discourse memory problems!

I found this thread from 2017: https://meta.discourse.org/t/last-couple-updates-consuming-more-memory/70208

and a related post: https://meta.discourse.org/t/discourse-swap-memory-1gb-warning/90480

SWAP…? But… DigitalOcean says…

DigitalOcean strongly discourages the use of swap on its Droplets:

https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-16-04

DigitalOcean against the use of swap

Right, so this private Discourse forum already costs $10 USD per month, and adding more RAM would push that cost up.

I will stick to using a swapfile for now, and if things really do get out of hand, I might have to dig deeper and reduce the number of Discourse workers.

swapon!

And so, following DigitalOcean's own tutorial on enabling a swapfile (https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-16-04), I created a 1GB swapfile to see how things would work out.
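
For reference, creating a swapfile on Linux usually comes down to four standard utilities: `fallocate`, `chmod`, `mkswap` and `swapon`. The post does not list the exact commands that were run, so treat the Python sketch below as an assumption about the usual sequence rather than a record of what I typed; the function names are hypothetical and the `/swapfile` path and `1G` size are simply defaults matching the size mentioned above.

```python
import subprocess


def swapfile_commands(path="/swapfile", size="1G"):
    """Return the usual command sequence for creating and enabling a swapfile."""
    return [
        ["fallocate", "-l", size, path],  # reserve the space on disk
        ["chmod", "600", path],           # readable and writable by root only
        ["mkswap", path],                 # format the file as swap space
        ["swapon", path],                 # start using it immediately
    ]


def create_swapfile(path="/swapfile", size="1G", dry_run=True):
    """Print the commands, or run them when dry_run is False (requires root)."""
    for cmd in swapfile_commands(path, size):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


commands = swapfile_commands()
```

Calling `create_swapfile()` with the default `dry_run=True` only prints the commands. Note that this sketch does not add an `/etc/fstab` entry, so the swapfile would not be re-enabled after a reboot unless you add one.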

Discourse after enabling swap

The red box marks the point where the swapfile was added; you can see the drop in RAM usage right after it.

Low Swap I/O

Kinda fixed

The Discourse forum is back to normal and LibreNMS has stopped throwing "SNMP is down" events. I will need to keep an eye on the swap I/O activity in case it gets out of hand.
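
One way to watch swap I/O directly on the server, independent of LibreNMS, is to sample the `pswpin` and `pswpout` counters in `/proc/vmstat`, which count pages swapped in and out since boot. Below is a minimal Python sketch assuming a Linux host; the function names and sample numbers are made up for illustration.

```python
def swap_counters(vmstat_text):
    """Extract the pswpin/pswpout page counters from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Pages swapped in and out per second between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {
        name: (after[name] - before[name]) / seconds
        for name in ("pswpin", "pswpout")
    }


# Two made-up samples taken 60 seconds apart.
first = swap_counters("nr_free_pages 1000\npswpin 100\npswpout 250\n")
second = swap_counters("nr_free_pages 900\npswpin 160\npswpout 250\n")
rates = swap_rate(first, second, 60)
```

Rates that stay near zero mean the swapfile is mostly acting as a safety net. Sustained high rates would suggest the server is thrashing and really needs more RAM or fewer workers.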

SNMP to the Rescue

By now you can tell how important SNMP and other monitoring tools are. LibreNMS provides historical RRD graphs, so there is no need to log into the server and run htop or top, waiting to catch errors or CPU spikes.

Having a monitoring solution gives you an idea of what could have gone wrong, or what actually did. Without LibreNMS sending me the notifications, I wouldn't have guessed that Discourse was going through RAM troubles and restarting itself.

If you are on a similar setup and Discourse is giving you problems, try enabling swap if you have not already. It might just fix all your problems.
