As we run processes (software) on our systems, we start to consume the resources available to us. For example, if we have four CPU cores, each running at 3.0 GHz, then we have a lot of compute power available to us, but it's still finite. If we have 16 GB of RAM, then we have access to a lot of RAM, but it's still finite. The same applies to disk (storage) and network resources: they can only handle so much work in parallel, even though they can keep running workloads one after another with ease.
Although the resources we have available can keep running indefinitely, or until a hardware failure, they can only do so much at any one time. That's why we measure the performance of our systems: to make sure we're running just the right amount of workload on them, so that they're being used efficiently but not so heavily that we get crashes, bottlenecks or poor performance.
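To make "finite" concrete, here is a minimal sketch that asks the operating system how much of each resource the machine actually has. It uses only the Python standard library. The function name `resource_inventory` is our own invention for illustration, and the RAM lookup assumes a Unix-like system that exposes the `SC_PHYS_PAGES` and `SC_PAGE_SIZE` settings; elsewhere it simply reports `None`.

```python
import os
import shutil


def resource_inventory(path=os.path.abspath(os.sep)):
    """Return the fixed capacity of this machine as a plain dict.

    `resource_inventory` is a hypothetical helper written for this example.
    """
    inventory = {
        # Logical CPU cores visible to the OS (can be None if undetermined).
        "cpu_cores": os.cpu_count(),
        # Total size, in bytes, of the filesystem that contains `path`.
        "disk_total_bytes": shutil.disk_usage(path).total,
        # Filled in below where the platform supports it.
        "ram_total_bytes": None,
    }
    try:
        # Physical RAM = number of pages * bytes per page.
        # Assumption: these sysconf names exist (true on Linux, not on Windows).
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
        inventory["ram_total_bytes"] = pages * page_size
    except (AttributeError, ValueError, OSError):
        pass  # Leave RAM as None rather than guess.
    return inventory


if __name__ == "__main__":
    for name, value in resource_inventory().items():
        print(f"{name}: {value}")
```

The numbers this prints are the ceilings for the machine; the measurements we take later only make sense relative to them.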
The metrics we're interested in at this point include:
- The percentage of CPU time used by a process or group of processes
- The amount of RAM being used by a process or processes
- How much disk I/O we're seeing from a process
- Network throughput per process
These are the basic metrics we want to be able to measure so that we can get an idea of what's happening on a system and what's causing it.
With this information we can find processes that are causing problems and attempt to optimise things.
We'll explore these metrics using simple tools that are available free of charge and, in some cases, built into the operating system.
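Before reaching for those tools, it can help to see what the first two metrics mean in practice. The sketch below is only an illustration, not a replacement for a proper monitoring tool: it measures the CPU time used by the current Python process as a percentage of elapsed time, and reads its peak memory use. The helper names `cpu_percent_of` and `peak_memory_raw` are made up for this example, and the `resource` module it relies on is only available on Unix-like systems. Per-process disk I/O and network throughput are not covered here, since the standard library has no portable way to read them.

```python
import resource  # Unix-only standard library module
import time


def cpu_percent_of(work):
    """Run work() and return the CPU time it used as a % of wall-clock time.

    Hypothetical helper: roughly 100 means one core was kept fully busy,
    close to 0 means the process spent its time waiting.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU seconds, this process
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return 100.0 * cpu_used / wall_used if wall_used > 0 else 0.0


def peak_memory_raw():
    """Peak resident memory of this process, in platform units.

    Note: ru_maxrss is reported in kilobytes on Linux but bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


if __name__ == "__main__":
    busy = cpu_percent_of(lambda: sum(i * i for i in range(2_000_000)))
    idle = cpu_percent_of(lambda: time.sleep(0.2))
    print(f"CPU while calculating: {busy:.0f}%")
    print(f"CPU while sleeping:    {idle:.0f}%")
    print(f"Peak memory (platform units): {peak_memory_raw()}")
```

The calculating workload should report a much higher percentage than the sleeping one, which is exactly the kind of difference we'll be looking for when comparing real processes.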