Part 1. Introduction
htop is the prettier, more colorful, and slightly more up-to-date version of top. A few metrics such as steal and iowait are easier to see in top, but for most other purposes, htop may be the better tool for troubleshooting server performance issues.
If you haven’t already read our article on top, we’d recommend you start there and then come back to htop.
Learning to read these tools is an excellent skill to develop and will help you diagnose any performance issues on your websites and server. htop is easy to use once you know the basics.
The goal of this article is to get you to a place where you can run the htop command on your server and understand the information that it’s displaying. Armed with this information, you’ll be able to troubleshoot any performance issues you discover.
htop will display a real-time overview of what’s happening on your server. In a nutshell, you’ll be able to see the following at a glance:
- The number of individual CPUs on your server and their resource usage broken down individually
- RAM usage broken down by processes, buffer and disk cache
- Swap usage
- Total number of tasks, thread count, and the number of running tasks
- 1 minute, 5 minutes, and 15 minutes load averages
- System uptime and current time (by default this is set to UTC time)
- A breakdown of running processes
A note on system users: we highly recommend putting each of your websites on its own system user. Not only is this important for keeping your websites isolated from each other (a hacked or malware-infected website can’t affect sites on a different system user), it also shows in the USER column, so you can easily identify any site with high resource usage using either top or htop. For example, if my website was assigned to a system user called “steve”, you’ll see the user “steve” in the USER column and know exactly which site you need to look into. If all your sites were on the default “gridpane” user, you wouldn’t be able to tell which one is responsible.
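To illustrate why one system user per site helps, here is a minimal Python sketch that totals CPU and memory per user from `ps` output. The function names (`usage_by_user`, `live_ps_output`) are our own and purely illustrative. Note that `ps` reports CPU averaged over each process’s lifetime, so the figures will not match htop’s live CPU% exactly.

```python
import subprocess
from collections import defaultdict


def usage_by_user(ps_output):
    """Sum %CPU and %MEM per system user from `ps -eo user,pcpu,pmem` text."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for line in ps_output.strip().splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        user, cpu, mem = parts
        try:
            cpu, mem = float(cpu), float(mem)
        except ValueError:
            continue  # skip the header row (USER %CPU %MEM)
        totals[user][0] += cpu
        totals[user][1] += mem
    # Return (user, cpu, mem) tuples, highest CPU first
    return sorted(((u, c, m) for u, (c, m) in totals.items()),
                  key=lambda row: row[1], reverse=True)


def live_ps_output():
    """Hypothetical helper: fetch the three columns from a Linux server."""
    result = subprocess.run(["ps", "-eo", "user,pcpu,pmem"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On a server you could call `usage_by_user(live_ps_output())` and print the rows. If every site runs under the same user, everything collapses into a single row, which is exactly the problem described above.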
To run the htop command you will need to SSH into your server. Please see the following articles to get started:
Generate your SSH Key:
Add your SSH Key to GridPane:
Connect to your server:
Once inside your server, run:
htop
To exit, use Q or CTRL+C.
Part 2. The htop summary area – what it all means
The summary area at the beginning is the system overview. Below we’ll cover what each part means, section by section.
Here, with the numbers 1 and 2, we can see this is a 2 core VPS. A 4 core will have 4 bars, 8 core = 8 bars, and so on. Each number/bar represents one CPU. Each bar has a % on the right-hand side indicating how much CPU is in use.
Specific CPU usage is then broken down by processes via the following color code:
- Blue: The % of CPU used by low priority processes
- Green: The % of CPU used for user processes
- Red: The % of CPU used by system/kernel processes
Bonus points: –
- Yellow/Orange: The % of CPU used by IRQ time
- Magenta: The % of CPU used by Soft IRQ time
- Grey: The % of CPU used by IO Wait time
- Cyan: The % of CPU used by Steal
top may be easier to understand in terms of what exactly is consuming CPU, but what we’re looking for here is just how busy the CPUs are. In this example, the bars are mostly empty, which means the CPU is mostly idle.
The higher your CPU usage (full bars, higher percentage), the busier your system is, and this will affect the performance of your websites.
Bursts of high usage are normal and nothing to be concerned about. Sustained high CPU usage is something that requires further attention.
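If you’d like to see where numbers like these come from, the sketch below computes a similar breakdown from two samples of Linux’s /proc/stat counters. It is an illustration under our own assumptions, not htop’s actual code. The helper names are hypothetical, and the mapping to bar colors (nice = blue, user = green, system = red, and so on) follows the list above.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn one 'cpuN ...' line from /proc/stat into a dict of tick counters."""
    values = [int(v) for v in line.split()[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # pad if a field is missing
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Percentage of elapsed CPU time per category between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_samples(path="/proc/stat"):
    """Read every 'cpu' line (the overall total plus one per core)."""
    with open(path) as fh:
        return {line.split()[0]: parse_cpu_line(line)
                for line in fh if line.startswith("cpu")}
```

Calling `read_cpu_samples()` twice, about a second apart, and passing each core’s pair of samples to `cpu_breakdown` gives a rough per-core percentage for each category. Everything that is not idle corresponds to the filled part of the bar.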
RAM and Swap Usage
This section shows information regarding the memory usage of the system. We have RAM and Swap – Swap is part of your hard disk that’s used by your server like it uses RAM. This is why having spare disk space is important for your system, and why performance issues can arise when your disk space usage gets past 90%. If you use software such as Photoshop, Illustrator, or other heavy Adobe Creative Cloud apps you’ll likely have experienced the same thing on your computer.
RAM usage is broken down by colors.
- Green: Used memory pages
- Blue: Buffer pages
- Yellow: Cache pages
Swap space is your safety net if you run out of RAM. High swap and RAM usage will affect your system’s ability to process tasks efficiently, potentially resulting in 504 timeout errors. Processes will get slower and backed up while they wait for memory to become available.
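For the curious, the memory and swap figures can be approximated from Linux’s /proc/meminfo. The sketch below is a simplified illustration with hypothetical function names. It treats “used” as total minus free, buffers, and cache, and htop’s exact accounting varies between versions, so expect small differences from what the bars show.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Split RAM into used, buffers, and cache, and report swap in use."""
    buffers = info.get("Buffers", 0)
    cache = info.get("Cached", 0)
    used = info["MemTotal"] - info["MemFree"] - buffers - cache
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {"used": used, "buffers": buffers,
            "cache": cache, "swap_used": swap_used}
```

On a server, `memory_summary(parse_meminfo(open("/proc/meminfo").read()))` returns the four figures in kB. A large `swap_used` alongside high `used` is the situation described above.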
For clarity, tasks and processes are the same thing for our purposes. Here we have 44 total tasks, with 1 currently running, and “100 thr” means 100 threads. So here we have 44 tasks that are broken up into 100 threads.
This is the exact same breakdown that the top command gives. If you haven’t already read that article, please check it out (link at the top and bottom of this article). In a nutshell, here we can see the average “load” over one, five, and fifteen minutes.
Load is reported as a total across all CPUs, so if you have more than one core, you will need to divide what you see by the number of cores your server is running. For example, a load of 2.00 on a 2 core server works out to 1.00 per core. Sustained high load means your server is consistently busy. This could be due to a number of reasons, such as high website traffic or a brute force attack. The rest of the data provided by htop should help fill in the gaps.
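The division described above can be sketched in a few lines of Python. The function names are our own. `os.getloadavg()` and `os.cpu_count()` are standard library calls that return the same three load averages htop shows and the number of CPUs the system reports.

```python
import os


def load_per_core(load_averages, cores):
    """Divide the 1, 5, and 15 minute load averages by the core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return tuple(load / cores for load in load_averages)


def current_load_per_core():
    """Per-core load for the machine this script runs on."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return load_per_core(os.getloadavg(), cores)
```

As a rough guide, a per-core value that stays around or above 1.0 suggests the server has more work queued than its CPUs can keep up with.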
System Uptime and Current System Time
Here you can see how long the server has been online and the current system time. For our purposes, this isn’t particularly important.
Part 3. Active processes
So you now have an overview of what’s happening on your server at a glance. If you’re seeing high resource usage, we now need to look at the processes themselves to see what’s responsible.
The active process list is the lower part of the overview. This is highlighted below:
Below is a quick breakdown of what each column displays:
- PID: Displays a task’s unique process ID
- USER: The username of the user who started the task
- PRI: Shows the scheduling priority of the task.
- NI: Represents the nice value of the task. This relates to the low priority (blue) CPU usage covered in part 2. Nice affects the priority of a task: 19 is the lowest priority and -20 is the highest
- VIRT: Shows the total amount of virtual memory requested by the task
- RES: Shows the memory consumed by the process in RAM (in KB)
- SHR: Represents the amount of shared memory (in KB) used by the task
- S: The current status of the process (zombie, sleeping, running, uninterruptible sleep, or traced)
- CPU%: Represents the percentage of CPU used by each process
- MEM%: Shows the percentage of total available RAM used by each process
- TIME+: The total amount of CPU Time used by the task, shown in hundredths of a second.
- COMMAND: The name of the running process
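The TIME+ column can be confusing at first because it counts CPU time rather than wall-clock time. As a small illustration, the sketch below (with a hypothetical function name) converts a raw count of hundredths of a second into the minutes:seconds.hundredths form. htop switches to a more compact layout for very long-running processes, which this sketch does not attempt to reproduce.

```python
def format_time_plus(hundredths):
    """Format CPU time given in hundredths of a second as M:SS.hh."""
    total_seconds, hh = divmod(hundredths, 100)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}.{hh:02d}"
```

For example, `format_time_plus(8345)` gives `1:23.45`, meaning the task has used 1 minute and 23.45 seconds of CPU time in total.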
In the footer, below the process list, are htop’s menu items. More on these below.
Before anything else, you can scroll the process list with the arrow keys – vertically and horizontally. This will come in handy as you get to know the basics.
The bottom menu
You can exit any of the following commands by hitting Esc.
- F1 Help: Opens a basic intro page detailing a lot of what’s in this very section
- F2 Setup: Customise features – you can probably leave it alone.
- F3 Search: Open up a search bar and run a search for processes
- F4 Filter: Filter processes by typing (e.g. filter all mysql tasks by typing “mysql”)
- F5 Tree: Show processes in a tree view
- F6 SortBy: Sort processes by specific columns
- F7 Nice –: Increase the priority of a task by clicking on it or navigating to it with your arrow keys and hitting F7
- F8 Nice +: Decrease the priority of a task by clicking on it or navigating to it with your arrow keys and hitting F8
- F9 Kill: Kill a process
- F10 Quit: Close htop
You can use any of the following shortcuts simply by pressing them on your keyboard. There are more available (press F1 for more details while inside htop), but for troubleshooting purposes, these are the main ones you’ll want to learn.
- u: Display all processes owned by a specific user
- Shift + p: Sort processes by CPU usage
- Shift + m: Sort processes by memory usage
- Shift + t: Sort processes by time
- Space: Navigate to a process and hit space to tag (highlight) it
- Shift + u: Remove all tags
- Shift + f: Highlight and follow a process
Part 5. Using information inside htop to troubleshoot performance issues
The commands in part 4 can yield a lot of information to help you pinpoint performance issues. You’ll likely find that F3, F4, F9, u, and the three sort shortcuts are the ones you use most when you’re inside htop.
What we’re looking for here is what tasks in the COMMAND column are using the most CPU% and MEM%. Once we know this we can begin further diagnosis.
For example, if you’re seeing high PHP usage, it’s possible your server is experiencing a brute force attack. Using the above commands you should be able to narrow it down to a specific website and take action accordingly.
High MySQL usage may indicate database table locking or a lack of proper caching.
If you haven’t already read the top article, we highly recommend you do so here:
You may also want to check out the Security and Performance sections of our knowledgebase here:
And more specifically, our Diagnosing Performance Issues and 504 Timeouts article here:
And how to use WordPress Debug and Query Monitor:
You can also learn more about top directly on your server with: