
systemd#

When a Linux system boots up, a lot of services need to be started to make the whole system work as one big unit. These services need to be brought up in the right order, and they can depend on other services, on hardware status, or on network access.

Systemd takes care of all of that for us. Let's visualise systemd's architecture:

Systemd architecture

(Shmuel Csaba Otto Traian, CC BY-SA 3.0, via Wikimedia Commons)

There's a lot going on here, so let's just cover the primary means by which we're going to interact with systemd: "systemd Utilities".

Units#

First though, let's understand what a "unit" is with regard to systemd.

A unit is an object that represents something systemd manages. It can be a task or an action. Systemd uses units to manage all kinds of things: services, processes, filesystem mounts, and so on. As administrators, or users of systemd, we define units and use them to tell systemd what to manage for us.

Here's the unit file for our nginx service: cat /lib/systemd/system/nginx.service

[Unit]
Description=nginx - high performance web server
Documentation=https://nginx.org/en/docs/
After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/var/run/nginx.pid
ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf
ExecReload=/bin/sh -c "/bin/kill -s HUP $(/bin/cat /var/run/nginx.pid)"
ExecStop=/bin/sh -c "/bin/kill -s TERM $(/bin/cat /var/run/nginx.pid)"

[Install]
WantedBy=multi-user.target

We can see a [Unit] section in here, along with two other sections: [Service] and [Install].

If we look under [Service] we can see a few interesting items:

  • ExecStart
  • ExecReload
  • ExecStop

These give us information we can find useful, for example they tell us where the nginx binary is located (although there are better ways of finding this out.) We can use this .service file to understand that if we run systemctl start nginx, this will be executed: /usr/sbin/nginx -c /etc/nginx/nginx.conf.
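
If we only care about those three directives, a small filter saves us reading the whole file. This is a minimal sketch: the function name unit_exec_lines is made up for illustration, and the path in the usage comment is the one from the listing above.

```shell
#!/bin/sh
# unit_exec_lines FILE
# Print only the ExecStart, ExecReload and ExecStop lines of a unit file.
# (Hypothetical helper name, used for illustration only.)
unit_exec_lines() {
    grep -E '^Exec(Start|Reload|Stop)=' "$1"
}

# Usage:
#   unit_exec_lines /lib/systemd/system/nginx.service
```

On a live system, systemctl cat nginx prints the unit file systemd actually loaded, which is another way to get at the same text.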

What happens if we do this:

superman@develop:~$ sudo systemctl stop nginx
superman@develop:~$ curl http://localhost/
curl: (7) Failed to connect to localhost port 80: Connection refused
superman@develop:~$ sudo /usr/sbin/nginx -c /etc/nginx/nginx.conf
superman@develop:~$ curl http://localhost/
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

We got the nginx welcome page back. When we used systemctl to stop the service, nginx shut down and closed its network socket, which is why the first curl was refused. We then started nginx again by running the /usr/sbin/nginx binary directly, using the same command we found in ExecStart.

Running that command for us, among many other things, is what systemd does when we start the service.

Let's move on to utilities now.

Utilities#

Systemd's architecture is complex and there are a lot of moving parts, but you, as an administrator, only really need to concern yourself with two of these utilities at this point in time: systemctl and journalctl.

You'll find a need for other utilities in the future, but today all you really need to do is manage services and read the logs from them to diagnose problems.

systemctl#

This utility is used to control systemd and the services it manages. On the architectural diagram above, it controls the systemd and manager components shown under "systemd Core".

Let's review the sub-commands you'll use most in your career:

  • status
  • start
  • restart
  • reload
  • stop
  • enable/disable

We can go over each of these briefly and explore them with our nginx service.

The format for using systemctl is simple: sudo systemctl <command> <unit>. The <unit> part is referring to the systemd units we discussed earlier.

status#

Before we start manipulating services, let's see how we can get information about services to begin with. To do this, we simply run: systemctl status nginx

superman@develop:~$ systemctl status nginx
● nginx.service - nginx - high performance web server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-03-29 04:40:54 UTC; 1min 35s ago
       Docs: https://nginx.org/en/docs/
    Process: 48818 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=0/SUCCESS)
   Main PID: 48819 (nginx)
      Tasks: 2 (limit: 4613)
     Memory: 1.8M
     CGroup: /system.slice/nginx.service
             ├─48819 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
             └─48829 nginx: worker process

Quite a bit of information here. The most important thing here is the Active line: Active: active (running) since Tue 2022-03-29 04:40:54 UTC; 1min 35s ago. This tells us the service is active and has been for 1min 35s. Easy.
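
Reading the Active line by eye is fine at a terminal, but in a script it's easier to ask systemd a yes/no question. Here's a rough sketch using systemctl is-active, which exits with 0 only when the unit is active. The report_service function name is an assumption made for this example.

```shell
#!/bin/sh
# report_service UNIT
# Print a one-line summary of whether UNIT is currently active.
# (Hypothetical helper name, used for illustration only.)
report_service() {
    unit="$1"
    # is-active exits 0 only when the unit is active;
    # --quiet suppresses the state it would normally print.
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is running"
    else
        echo "$unit is NOT running"
    fi
}

# Usage:
#   report_service nginx
```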

There's other information too: Main PID is the literal ID of the process. There are additional processes/threads too. Those processes can be seen at the end of the output:

CGroup: /system.slice/nginx.service
        ├─48819 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
        └─48829 nginx: worker process

These are the master and worker processes for nginx. Note that not every service will have these. Software can behave however the developer wants it to, and this is how nginx works: it creates other processes for handling different things.

We can use those process IDs to check out what resources the nginx processes are using. Let's briefly do that now: top -p 48819

Note

The process ID for your master process will be different to mine. Make sure you're using the right ID.

top - 05:31:04 up 3 days, 23:14,  2 users,  load average: 0.04, 0.02, 0.00
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3932.0 total,   1693.0 free,    193.2 used,   2045.8 buff/cache
MiB Swap:   3936.0 total,   3936.0 free,      0.0 used.   3457.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  48819 root      20   0    8756    868     12 S   0.0   0.0   0:00.00 nginx

We won't go into detail here as we'll do that in the next chapter. We can see the memory usage, CPU usage, and more, though. Interesting stuff.
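
Copying the process ID out of the status output by hand works, but we can also ask systemd for it directly. The sketch below uses systemctl show with the MainPID property; the main_pid function name is hypothetical, and it assumes a systemd version that supports the --value flag.

```shell
#!/bin/sh
# main_pid UNIT
# Print the main process ID that systemd has recorded for UNIT.
# (Hypothetical helper name, used for illustration only.)
main_pid() {
    # --property picks a single property, --value prints it without the "MainPID=" prefix.
    systemctl show --property=MainPID --value "$1"
}

# Usage:
#   top -p "$(main_pid nginx)"
```

This way the right ID is used even after the service has been restarted and the number has changed.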

Now that we know how to check if a service is active or not, let's move on to the sub-commands we can use to manipulate services (objects in the systemd "domain").

start#

We looked at start earlier as the way to fire up the nginx service. Once nginx has started, there is a process listening on TCP port 80.

I think it's a simple concept to understand, so we'll move on.

restart#

Sometimes we need to tell a service to stop, but then start again. This is known as a restart. It's common to restart a service when you've made changes to configuration files or some other component that the service only loads once.

We can restart our nginx service: sudo systemctl restart nginx

superman@develop:~$ sudo systemctl restart nginx
superman@develop:~$

Not very interesting, but if we use status:

superman@develop:~$ sudo systemctl status nginx
● nginx.service - nginx - high performance web server
     Active: active (running) since Tue 2022-03-29 05:32:54 UTC; 17s ago
   Main PID: 48927 (nginx)
     CGroup: /system.slice/nginx.service
             ├─48927 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
             └─48928 nginx: worker process

Mar 29 05:32:54 develop systemd[1]: nginx.service: Succeeded.
Mar 29 05:32:54 develop systemd[1]: Stopped nginx - high performance web server.
Mar 29 05:32:54 develop systemd[1]: Starting nginx - high performance web server...
Mar 29 05:32:54 develop systemd[1]: nginx.service: Can't open PID file /run/nginx.pid (yet?) after start: Operation not permitted
Mar 29 05:32:54 develop systemd[1]: Started nginx - high performance web server.

We can see log entries at the bottom of the output now. They show us that things stopped and started.

Our service is still active, as we would expect. The Main PID and master/worker process IDs have changed. That's because we restarted the process and it created new ones as a result.

reload#

The reload command is very similar to the restart command, but it sends a different signal to the process. In this case, it tells the process that it should reload any files or configuration, or anything else it needed during start up, all over again. Let's reload our nginx service with sudo systemctl reload nginx and then check its status:

superman@develop:~$ sudo systemctl status nginx
● nginx.service - nginx - high performance web server
     Active: active (running) since Tue 2022-03-29 05:32:54 UTC; 5min ago
    Process: 48913 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=0/SUCCESS)
    Process: 48942 ExecReload=/bin/sh -c /bin/kill -s HUP $(/bin/cat /var/run/nginx.pid) (code=exited, status=0/SUCCESS)
   Main PID: 48927 (nginx)
     CGroup: /system.slice/nginx.service
             ├─48927 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
             └─48948 nginx: worker process

Mar 29 05:32:54 develop systemd[1]: nginx.service: Succeeded.
Mar 29 05:32:54 develop systemd[1]: Stopped nginx - high performance web server.
Mar 29 05:32:54 develop systemd[1]: Starting nginx - high performance web server...
Mar 29 05:32:54 develop systemd[1]: nginx.service: Can't open PID file /run/nginx.pid (yet?) after start: Operation not permitted
Mar 29 05:32:54 develop systemd[1]: Started nginx - high performance web server.
Mar 29 05:37:56 develop systemd[1]: Reloading nginx - high performance web server.
Mar 29 05:37:56 develop systemd[1]: Reloaded nginx - high performance web server.

The logs show that we reloaded the service, as expected. The Main PID has stayed the same this time (only the worker process was replaced) and there's an interesting new entry:

Process: 48942 ExecReload=/bin/sh -c /bin/kill -s HUP $(/bin/cat /var/run/nginx.pid) (code=exited, status=0/SUCCESS)

This used the ExecReload setting in the nginx unit to run a shell command that sent a special signal to the nginx master process: HUP. We're not going to go into this right now. It's too advanced.

What we ended up with is the same master process but with refreshed config files.
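
To make the difference between restart and reload concrete, we can compare the Main PID systemd reports before and after each action. This is only a sketch: pid_unchanged_after is a hypothetical helper, it needs to run as root, and it assumes the service keeps a single main process the way nginx does.

```shell
#!/bin/sh
# pid_unchanged_after ACTION UNIT
# Run "systemctl ACTION UNIT" and succeed (exit 0) only if the unit's
# Main PID is the same before and after the action.
# (Hypothetical helper name, used for illustration only.)
pid_unchanged_after() {
    action="$1"
    unit="$2"
    before=$(systemctl show --property=MainPID --value "$unit")
    systemctl "$action" "$unit" || return 1
    after=$(systemctl show --property=MainPID --value "$unit")
    echo "$action: Main PID $before -> $after"
    [ "$before" = "$after" ]
}

# Usage (as root):
#   pid_unchanged_after reload nginx    # expected to succeed: same master process
#   pid_unchanged_after restart nginx   # expected to fail: a new master process
```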

stop#

Like start, the stop sub-command is pretty simple to understand: it stops the service and therefore shuts down the processes. Let's stop nginx:

superman@develop:~$ sudo systemctl stop nginx
superman@develop:~$ sudo systemctl status nginx
● nginx.service - nginx - high performance web server
     Active: inactive (dead) since Tue 2022-03-29 05:49:07 UTC; 9s ago
    Process: 48913 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=0/SUCCESS)
    Process: 48942 ExecReload=/bin/sh -c /bin/kill -s HUP $(/bin/cat /var/run/nginx.pid) (code=exited, status=0/SUCCESS)
    Process: 48959 ExecStop=/bin/sh -c /bin/kill -s TERM $(/bin/cat /var/run/nginx.pid) (code=exited, status=0/SUCCESS)
   Main PID: 48927 (code=exited, status=0/SUCCESS)

Mar 29 05:32:54 develop systemd[1]: nginx.service: Succeeded.
Mar 29 05:32:54 develop systemd[1]: Stopped nginx - high performance web server.
Mar 29 05:32:54 develop systemd[1]: Starting nginx - high performance web server...
Mar 29 05:32:54 develop systemd[1]: nginx.service: Can't open PID file /run/nginx.pid (yet?) after start: Operation not permitted
Mar 29 05:32:54 develop systemd[1]: Started nginx - high performance web server.
Mar 29 05:37:56 develop systemd[1]: Reloading nginx - high performance web server.
Mar 29 05:37:56 develop systemd[1]: Reloaded nginx - high performance web server.
Mar 29 05:49:07 develop systemd[1]: Stopping nginx - high performance web server...
Mar 29 05:49:07 develop systemd[1]: nginx.service: Succeeded.
Mar 29 05:49:07 develop systemd[1]: Stopped nginx - high performance web server.

Now all the processes show as code=exited and we can see in the log entries that we Stopped nginx. Our previous curl can prove this to us:

superman@develop:~$ curl http://localhost/
curl: (7) Failed to connect to localhost port 80: Connection refused

That covers starting, stopping, restarting, reloading, and checking the status of services managed by systemd.

We have one last sub-command to cover.

enable#

Check this line out from our nginx service:

superman@develop:~$ systemctl status nginx
● nginx.service - nginx - high performance web server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)

The Loaded line tells us something: enabled. This means something simple: if you restarted your server, the nginx service would start again during boot time. If we did this: sudo systemctl disable nginx

superman@develop:~$ sudo systemctl disable nginx
Synchronizing state of nginx.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable nginx
Removed /etc/systemd/system/multi-user.target.wants/nginx.service.

And then checked the status again:

superman@develop:~$ sudo systemctl status nginx
● nginx.service - nginx - high performance web server
     Loaded: loaded (/lib/systemd/system/nginx.service; disabled; vendor preset: enabled)

Now it shows disabled. If you restarted now, nginx would not start at boot time. Simple as that.

If you install something new, make sure it's enabled before you walk away from the system: if the system reboots, a disabled service won't come back up.
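
That check is easy to script. The sketch below uses systemctl is-enabled, which exits with 0 when the unit is enabled, and only calls systemctl enable when it has to. The ensure_enabled function name is made up for this example, and the enable step needs root.

```shell
#!/bin/sh
# ensure_enabled UNIT
# Enable UNIT for boot if it is not enabled already.
# (Hypothetical helper name, used for illustration only.)
ensure_enabled() {
    unit="$1"
    # is-enabled exits 0 when the unit is enabled; --quiet hides the printed state.
    if systemctl is-enabled --quiet "$unit"; then
        echo "$unit is already enabled"
    else
        systemctl enable "$unit" && echo "$unit is now enabled"
    fi
}

# Usage (as root):
#   ensure_enabled nginx
```

Note that enabling a service only affects what happens at boot. It doesn't start the service right now, so you would still use start for that.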

journalctl#

Now we're going to learn some basic tricks for looking at the logs that come out of the software we're running on our system. These aren't specifically the logs that the software produces during its day-to-day operation, but the logs that it produces when it's starting up.

Note

Log files are complicated, but just know that journalctl isn't going to be a one-stop shop containing all logs from all services/software. You'll have to work with raw files and other systems too.

Let's start our nginx service again: sudo systemctl start nginx. Now let's look at the logs that came out of the service: sudo journalctl -u nginx

superman@develop:~$ sudo journalctl -u nginx
-- Logs begin at Fri 2022-03-18 07:22:17 UTC, end at Tue 2022-03-29 06:07:46 UTC. --
Mar 29 02:21:47 develop systemd[1]: Starting nginx - high performance web server...
Mar 29 02:21:47 develop systemd[1]: nginx.service: Can't open PID file /run/nginx.pid (yet?) after start: Operation not permitted
Mar 29 02:21:47 develop systemd[1]: Started nginx - high performance web server.
Mar 29 04:31:47 develop systemd[1]: Stopping nginx - high performance web server...
Mar 29 04:31:47 develop systemd[1]: nginx.service: Succeeded.
Mar 29 04:31:47 develop systemd[1]: Stopped nginx - high performance web server.
Mar 29 04:40:54 develop systemd[1]: Starting nginx - high performance web server...
Mar 29 04:40:54 develop systemd[1]: nginx.service: Can't open PID file /run/nginx.pid (yet?) after start: Operation not permitted
Mar 29 04:40:54 develop systemd[1]: Started nginx - high performance web server.
Mar 29 05:32:54 develop systemd[1]: Stopping nginx - high performance web server...
Mar 29 05:32:54 develop systemd[1]: nginx.service: Succeeded.
Mar 29 05:32:54 develop systemd[1]: Stopped nginx - high performance web server.
Mar 29 05:32:54 develop systemd[1]: Starting nginx - high performance web server...
Mar 29 05:32:54 develop systemd[1]: nginx.service: Can't open PID file /run/nginx.pid (yet?) after start: Operation not permitted
Mar 29 05:32:54 develop systemd[1]: Started nginx - high performance web server.
Mar 29 05:37:56 develop systemd[1]: Reloading nginx - high performance web server.
Mar 29 05:37:56 develop systemd[1]: Reloaded nginx - high performance web server.
Mar 29 05:49:07 develop systemd[1]: Stopping nginx - high performance web server...
Mar 29 05:49:07 develop systemd[1]: nginx.service: Succeeded.
Mar 29 05:49:07 develop systemd[1]: Stopped nginx - high performance web server.
Mar 29 06:05:47 develop systemd[1]: Starting nginx - high performance web server...
Mar 29 06:05:47 develop systemd[1]: nginx.service: Can't open PID file /run/nginx.pid (yet?) after start: Operation not permitted
Mar 29 06:05:47 develop systemd[1]: Started nginx - high performance web server.

These logs are all from our various interactions with the service - starting it, stopping it, reloading it - so it's a "busy" log. I used the -u nginx flag because I wanted journalctl to target the nginx unit (-u). Without that flag we get a lot of information about everything that's running on our system:

-- Logs begin at Fri 2022-03-18 07:22:17 UTC, end at Tue 2022-03-29 06:27:20 UTC. --
Mar 18 07:22:17 develop kernel: Linux version 5.4.0-104-generic (buildd@ubuntu) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #118-Ub>
Mar 18 07:22:17 develop kernel: Command line: BOOT_IMAGE=/vmlinuz-5.4.0-104-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro maybe-ubiqui>
Mar 18 07:22:17 develop kernel: KERNEL supported cpus:
Mar 18 07:22:17 develop kernel:   Intel GenuineIntel
Mar 18 07:22:17 develop kernel:   AMD AuthenticAMD
Mar 18 07:22:17 develop kernel:   Hygon HygonGenuine
Mar 18 07:22:17 develop kernel:   Centaur CentaurHauls
Mar 18 07:22:17 develop kernel:   zhaoxin   Shanghai
Mar 18 07:22:17 develop kernel: [Firmware Bug]: TSC doesn't count with P0 frequency!
Mar 18 07:22:17 develop kernel: x86/fpu: x87 FPU will use FXSAVE
Mar 18 07:22:17 develop kernel: BIOS-provided physical RAM map:
Mar 18 07:22:17 develop kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
Mar 18 07:22:17 develop kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
...
lines 1-31

I got close to 3,500 lines in my output, hence using -u nginx.
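
Even with -u nginx, the output grows over time, so it helps to narrow it down by time as well. Here's a minimal sketch that combines -u with journalctl's --since flag. The recent_unit_logs function name and the one-hour default are assumptions made for this example.

```shell
#!/bin/sh
# recent_unit_logs UNIT [SINCE]
# Print the journal entries for UNIT since SINCE (default: 1 hour ago).
# (Hypothetical helper name, used for illustration only.)
recent_unit_logs() {
    unit="$1"
    since="${2:-1 hour ago}"
    # --no-pager prints straight to the terminal instead of opening a pager.
    journalctl --unit "$unit" --since "$since" --no-pager
}

# Usage:
#   sudo sh -c '. ./recent_unit_logs.sh; recent_unit_logs nginx "today"'
```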

You're mainly going to be using journalctl to check on the logs coming out of a service if it fails to start. We can simulate a failure easily enough to demonstrate this. Run nano and use it to edit the main nginx.conf file: sudo nano /etc/nginx/nginx.conf

You'll see this content at the top:

user nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log notice;
pid        /var/run/nginx.pid;

Ignore everything else for now. Just move to the end of the line user nginx; and remove the ;, so that it looks like this: user nginx. Now save the file (Ctrl+O), hit Enter to confirm the filename, and exit nano (Ctrl+X). Now do this: sudo systemctl restart nginx

superman@develop:~$ sudo systemctl restart nginx
Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.

Uh oh! We broke nginx! Let's check out the two commands it's suggesting we run:

  1. systemctl status nginx.service
  2. journalctl -xe

First the status of our nginx service:

superman@develop:~$ systemctl status nginx.service
● nginx.service - nginx - high performance web server
     Loaded: loaded (/lib/systemd/system/nginx.service; disabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Tue 2022-03-29 06:38:03 UTC; 1min 22s ago
       Docs: https://nginx.org/en/docs/
    Process: 49416 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=1/FAILURE)

I don't like the looks of failed. We can use the second option to view some more information: sudo journalctl -xeu nginx

Question

You know why I added the -u nginx, right?

...
Mar 29 06:38:03 develop nginx[49416]: nginx: [emerg] invalid number of arguments in "user" directive in /etc/nginx/nginx.conf:3
...

It wasn't super clear, but I was able to find the above line. It tells us that nginx found an invalid number of arguments in the "user" directive and gave up at line 3 of /etc/nginx/nginx.conf. That's the directive we edited. Let's fix it up. Load up nano again, put the ; back into place, and give nginx a restart.

That's why journalctl is a powerful tool to know about, but that's enough of that for now.
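
As a parting tip, that broken restart could have been avoided. nginx can test a config file with its -t flag, so a cautious restart might look like the sketch below. safe_restart_nginx is a hypothetical helper, it assumes the config path used in this chapter, and it expects to run as root.

```shell
#!/bin/sh
# safe_restart_nginx [CONF]
# Restart nginx through systemd only if its configuration file parses.
# (Hypothetical helper name, used for illustration only.)
safe_restart_nginx() {
    conf="${1:-/etc/nginx/nginx.conf}"
    # nginx -t tests the configuration and exits non-zero on an error.
    if nginx -t -c "$conf"; then
        systemctl restart nginx
    else
        echo "config test failed; nginx was not restarted" >&2
        return 1
    fi
}

# Usage (as root):
#   safe_restart_nginx
```

With the missing ; from our experiment, the test step would fail and the running nginx would be left alone.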