Gleam OTP: Using Supervisors
gleam_otp v1.20, December 2025
tl;dr
old tutorials are all broken because this part of the language changed recently
the gleam_otp and gleam_erlang docs are not enough to learn from but are good references
supervisors only work with actors, not basic processes
use named actors - their subject gets the same name automatically
names are not just strings, you have to pass around the original name variable
pass the name in the run function used in the child_spec
if an actor crashes while being call()ed then the caller crashes too - so the caller probably needs to be a supervised actor as well
you can’t supervise main() so don’t do much in it
main has to process.sleep_forever(). If it returns, everything shuts down.
if you want an infinite loop - you use tail-call recursion.
if you need that loop to start as soon as the actor does, then send() your startup message from inside the run function of the child_spec that builds the actor
give your actors their own supervisors unless you know they won’t fail-cascade
yeah, that’s a lot. but it’s the minimum you need to know.
Good Long; Much Read
Erlang OTP is one of the most exciting things about Gleam. If you want to build durable and scalable systems, Erlang is the holy grail. Supervisor networks and worker-process restart policies sound very cool - but it took me a lot of effort to learn them.
I am going to save you a ton of hassle.
The official packages:
gleam_otp (the actors and supervisors)
gleam_erlang (the underlying processes)
Their docs do not give complete examples and, while accurate, are so sparse they are at best a reference once you already know the material.
I went through a dozen or so tutorials (blog posts, articles, GitHub repos, videos) and NONE of them worked. Before spending time on a lesson, make sure its code still runs. Gleam has changed a lot in the last few years and many old examples are now broken.
Below is a small, practical example that you can run, with the whole system continuing to work after crashes in its worker processes.
Also - for those others who have been frustrated - here are some of the key mistakes I was making along the way:
Supervisors only work with Actors
Despite descriptions saying that supervisors manage “processes”, they actually only manage Actors (which the docs sometimes call Workers). You can spawn off a basic process, but you can’t get your supervisor to mind it for you. So say you have a function that you would have run as a process…
You have to wrap your process up inside an Actor. That means making a handler that kicks off your function. The handler requires a Message type, so make a little one for it. It’s a lot of boilerplate, but you get extra flexibility if you later want multiple message variants.
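Something like this - a sketch, where Message, Begin, do_dangerous_work and handle_message are placeholder names of mine, and I’m assuming the gleam_otp 1.x actor API (the handler takes the state first):

```gleam
import gleam/otp/actor

// A stand-in for the function you would have spawned as a plain process.
fn do_dangerous_work() -> Nil {
  // ... the work that might panic ...
  Nil
}

// The message type for the actor. Begin is a made-up variant meaning
// "run the dangerous function now".
pub type Message {
  Begin
}

// The handler runs your function when asked. If the function panics,
// this actor crashes, and a supervisor can restart it.
fn handle_message(state: Nil, message: Message) {
  case message {
    Begin -> {
      do_dangerous_work()
      actor.continue(state)
    }
  }
}
```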
To start up your function you need to send a message to your Actor’s Subject - which you don’t have, since the Actor was started inside the supervisor.
So you want a “named” Subject for your Actor - and that is a tiny easter egg hidden in the docs: when you name an Actor, its Subject gets the same name! (This was not at all obvious to me.)
You need to pass that name into the function in the child spec that the supervisor calls to start the worker, and apply it to the Actor as it’s being built. Names are not just the string you create them with; you have to pass the original Name value around and cannot re-create it from the string.
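Roughly like this (a sketch assuming the 1.x builder API with actor.named, process.new_name and process.named_subject; start_worker is just my name for it):

```gleam
import gleam/erlang/process.{type Name}
import gleam/otp/actor

// Start the worker actor, registered under a Name value that was
// created once (in main, with process.new_name) and passed in here.
pub fn start_worker(name: Name(Message)) {
  actor.new(Nil)
  |> actor.on_message(handle_message)
  |> actor.named(name)
  |> actor.start
}

// Anyone holding the same Name value can build a Subject for whatever
// actor is currently registered under it, even after a restart.
pub fn worker_subject(name: Name(Message)) {
  process.named_subject(name)
}
```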
And, and, and! If your process/function is an endless tail-call-recursion “loop” that you want running all the time, then start it going by sending your Actor the Message that begins the recursion from within the code in the child_spec that the Supervisor calls to start the Worker. hahahaha!
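In child-spec form, that kick-off looks something like this (a sketch; worker_spec is my name for it, it reuses start_worker and Begin from the sketches above, and it leans on supervision.worker and result.map from the current packages):

```gleam
import gleam/erlang/process
import gleam/otp/supervision
import gleam/result

// The child spec the supervisor will use. The closure captures the
// Name, starts the actor, then sends the first Begin so the recursive
// loop is already running every time the supervisor starts (or
// restarts) this worker.
pub fn worker_spec(name) {
  supervision.worker(fn() {
    use started <- result.map(start_worker(name))
    process.send(process.named_subject(name), Begin)
    started
  })
}
```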
With that, you can have a Supervisor start (and restart) a Worker (your Actor) and still pass messages to it from the outside. If the process your Actor created crashes, the Actor crashes too. Whatever the reason the Actor crashes, the Supervisor restarts it INSTANTLY. Often I see the message about a restart before the crash report even comes out. sweet.
… but my whole program is still crashing …
Users of Supervised Actors must also be Supervised Actors
If your Actor crashes while someone is using it to answer an actor.call(), then the CALLER crashes too. If that caller was your main process, everything comes down. If the caller was some unsupervised process, it crashes, and if that process was still linked to main(), then main() crashes too. It’s in the docs, for sure, but I had to bump into it before I understood.
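To make it concrete, here is a sketch of the calling side. GetResult is a made-up request variant added to the worker’s Message type from earlier, and I’m assuming the 1.x actor.call signature with waiting/sending labelled arguments:

```gleam
import gleam/erlang/process.{type Name, type Subject}
import gleam/otp/actor

// The worker's Message type, now with a request variant that carries
// a Subject the worker should reply on.
pub type Message {
  Begin
  GetResult(reply_to: Subject(Int))
}

pub fn ask_worker(worker_name: Name(Message)) -> Int {
  let worker = process.named_subject(worker_name)
  // If the worker crashes before replying (or never replies within the
  // timeout), this call panics, taking the calling process down too.
  actor.call(worker, waiting: 1000, sending: GetResult)
}
```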
So for your ‘user’ thingamajig you need another:
Supervisor to restart whatever your user process is
… which needs to be an Actor
with a message_handler
a message type
a startup function
a named subject
a name you pass in from the outside
also the name to use to talk to the other actor under the other supervisor
passed in through a closure calling the startup function
in the child spec from the worker() function.
if you need it running from the get-go, start your process/function from within the startup function of the child_spec for the actor (see the sketch after this list).
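Put together, a sketch of that second actor. CallerMessage and Tick are made-up names, it reuses Message and GetResult from the earlier sketches, and name1/name2 match the outline below:

```gleam
import gleam/erlang/process.{type Name}
import gleam/otp/actor
import gleam/otp/supervision
import gleam/result

// The caller actor's message type; Tick is a made-up "do one round of
// work" message.
pub type CallerMessage {
  Tick
}

// The caller's state is the worker's name, so the handler can reach
// the worker actor living under the other supervisor.
fn handle_caller(worker_name: Name(Message), message: CallerMessage) {
  case message {
    Tick -> {
      // The dangerous call: if the worker dies mid-call, this actor
      // dies too, and its own supervisor restarts it.
      let _result =
        actor.call(
          process.named_subject(worker_name),
          waiting: 1000,
          sending: GetResult,
        )
      actor.continue(worker_name)
    }
  }
}

// Child spec for the caller. Both names come in from main() through
// this closure: name2 registers the caller, name1 points at the worker.
pub fn caller_spec(name2: Name(CallerMessage), name1: Name(Message)) {
  supervision.worker(fn() {
    let started =
      actor.new(name1)
      |> actor.on_message(handle_caller)
      |> actor.named(name2)
      |> actor.start
    // Get it working from the get-go: send the first Tick right here.
    use started <- result.map(started)
    process.send(process.named_subject(name2), Tick)
    started
  })
}
```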
NOW you can survive and continue to do work. Here’s what it looks like as an outline:
main()
  name1  // name to access the Subject to talk to actor1
  name2  // name to access the Subject to talk to actor2
  supervisor1
    actor1
      handler running that dangerous job that can fail
  supervisor2
    actor2
      task that needs the result from the dangerous job
flow:
  main sets up the supervisors
  the supervisors start up actor1 and actor2
  actor2 contacts actor1 requesting the dangerous result
  actor1 might pass the result back, or crash and take actor2 down with it
  if they crash, the supervisors restart the actors right away
  main() does not interact with the Actors!
Your main function can’t really do anything except set up the supervisors. Everything needs to get moving automatically when a supervisor starts (or restarts) a worker actor. Main does need to “not exit”, though; process.sleep_forever() will do.
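A sketch of that main, assuming the static_supervisor module from gleam_otp 1.x and the worker_spec/caller_spec sketches from earlier:

```gleam
import gleam/erlang/process
import gleam/otp/static_supervisor as supervisor

pub fn main() -> Nil {
  // The Name values are created exactly once, here, and passed to
  // everything that needs them.
  let name1 = process.new_name("dangerous_worker")
  let name2 = process.new_name("caller")

  // One supervisor per actor, so their crashes do not count against
  // each other's restart limits.
  let assert Ok(_) =
    supervisor.new(supervisor.OneForOne)
    |> supervisor.add(worker_spec(name1))
    |> supervisor.start

  let assert Ok(_) =
    supervisor.new(supervisor.OneForOne)
    |> supervisor.add(caller_spec(name2, name1))
    |> supervisor.start

  // main does nothing else. If it returned, the program would exit
  // and take the supervisors with it.
  process.sleep_forever()
}
```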
Why not just one supervisor?
You can put multiple workers of multiple types under one supervisor, but you can easily get reached_max_restart_intensity errors if they cause each other to crash. When a supervisor sees too many crashes too quickly, it refuses to go on. This sounds a little scary - what if it’s hard to predict how crashes will spread? I DON’T KNOW - but those Erlang people talk about hierarchies of supervisors supervising supervisors. oh my.
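I have not needed the nesting yet, but it is not much extra code. A sketch, assuming static_supervisor’s supervised() helper for turning one supervisor’s builder into a child spec for another:

```gleam
import gleam/otp/static_supervisor as supervisor

// One top-level supervisor whose children are the two supervisors from
// the outline above. If either child supervisor hits its own restart
// limit and gives up, the parent gets a chance to restart it.
pub fn supervision_tree(name1, name2) {
  let dangerous_side =
    supervisor.new(supervisor.OneForOne)
    |> supervisor.add(worker_spec(name1))

  let caller_side =
    supervisor.new(supervisor.OneForOne)
    |> supervisor.add(caller_spec(name2, name1))

  supervisor.new(supervisor.OneForOne)
  |> supervisor.add(supervisor.supervised(dangerous_side))
  |> supervisor.add(supervisor.supervised(caller_side))
  |> supervisor.start
}
```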
100 lines of code.
It’s a lot of boilerplate, but there is not much I could remove while keeping it anything like a useful program.
The main disappointment was that I had to turn the little loop that calls the worker actor into its own actor. The original recursive loop was 8 lines; in actor form it took more than 30. If actor.call() could just return a Result for me to handle, instead of crashing the caller, it would be way better.
If you orchestrate things so you never have to use actor.call(), and instead use actor.send(), then your ‘caller’ will not crash when the worker crashes, so maybe the caller doesn’t need a supervisor - but… now you need another way to get the results of what the worker does, which is probably a named subject and an actor with messages of its own. Not a clear win for simplicity.

