Have you ever heard of the Kuramoto Model? The Kuramoto Model Wikipedia page has an impressive video showing out of phase metronomes synchronizing:
Could two or more computers synchronize in a similar fashion? What would be the common “fabric” between the machines? In the clip with the metronomes, the base board is crucial in bringing the metronomes into phase. Perhaps you could use GPIO signals to achieve a similar link between two computers.
Architecting a Test
At a high level, you want to solve the problem of synchronizing two identical, cyclic tasks running on separate but identical hardware. A 1Hz task that blinks an LED would be an appropriate test program. The goal would be to run the blink program on both machines and, through the magic of the Kuramoto Model, the two LEDs would eventually blink in unison.
How would one computer communicate when it last ran to other? You can connect the output GPIO that drives the LED to an input GPIO on the peer board! Below is a sketch of the setup:
+-------------------------------------------+
|BeagleBone Black 1 |
BBB2_GPIO_OUT | +----------+ +---------+ | BBB1_GPIO_OUT
----------------+-->| gtimer | | gsync +--+--------------->
| +----+-----+ +----^----+ |
| | | |
| | | |
| +----v--------------------------+----+ |
| | Shared Memory | |
| +------------------------------------+ |
| |
+-------------------------------------------+
+-------------------------------------------+
|BeagleBone Black 2 |
BBB1_GPIO_OUT | +----------+ +---------+ | BBB2_GPIO_OUT
----------------+-->| gtimer | | gsync +--+--------------->
| +----+-----+ +----^----+ |
| | | |
| | | |
| +----v--------------------------+----+ |
| | Shared Memory | |
| +------------------------------------+ |
| |
+-------------------------------------------+
Each BBB would host two processes: gtimer
and gsync
. The gtimer
process
monitors an input GPIO. gtimer
blocks on the GPIO waiting for a rising edge
event. When the input GPIO goes high, gtimer
logs the time when the signal
arrived in shared memory. Here’s a flowchart showing how gtimer
does its
thing:
+------------+ +-----------------+ +----------------------+ +----------------------+
| Init Shmem +---->| Init Input GPIO +---->| Wait for Rising Edge +---->| Write Wakeup Time to |
+------------+ +-----------------+ | Event | | Shmem |
+----------^-----------+ +----------+-----------+
| |
+----------------------------+
gsync
is essentially the blink program. gsync
runs at a configurable rate,
in this case 1Hz. gsync
will immediately signal to its peer wakeup has
occurred using an output GPIO. gsync
then proceeds to read shared memory to
know when its peer last ran. Using its own wakeup time and peer wakeup time,
gsync
can run the Kuramoto Model to compute a wakeup timer delta. The next
wakeup time will be closer to bringing the process into sync with its peer.
Here’s a flowchart showing how gsync
works:
+-------------------+ +------------------+
| Attached to Shmem +------>| Init Output GPIO |
+-------------------+ +--------+---------+
|
+-----------v--------------+
+----> Get CLOCK_MONOTONIC Time |
| +-----------+--------------+
| |
| +------------v---------------+
| | Send Wakeup Signal to Peer |
| +------------+---------------+
| |
| +-----------v--------------+
| | Compute Next Wakeup Time |
| +-----------+--------------+
| |
| +-----------v-------------+
+----+ Sleep Until Next Wakeup |
+-------------------------+
The best implementation of gsync
and gtimer
isn’t immediately obvious.
However, the hardware setup is pretty straightforward so lets look at that
first.
Hardware Test Setup
You need two computers with which to test. The Beaglebone Black (BBB) single board computer is a good choice. The BBB is a good candidate for the following reasons:
- High availability.
- The BBB has a ton of unallocated GPIOs.
- The BBB runs Linux and so you can use the usual dev tools to build and deploy software.
The circuit below describes the hardware interconnect:
P9_15
is the input GPIO and P9_23
is the output GPIO. You can choose other
pins if you like. Notice that the input and output GPIOs cross. That is,
BBB1’s output GPIO is BBB2’s input GPIO and vice versa. A 470 Ohm resistor
limits current to the LED. The resistor also serves as short circuit protection
in case the GPIOs mistakenly are both outputs with one side set high and the
other set low.
Speaking of GPIO configuration, many of the pins on the BBB support multiple
functions. Chapter 6 of the book “Exploring BeagleBone” gives nice coverage
of how to configure the GPIOs on the BBB. Verify P9_15
and P9_23
are free.
You must configure the pins as GPIO with internal pull down resistors enabled
(mux mode 7). If you choose to use different pins, make sure you configure
them correctly before powering the circuit!
Doing Things Real-time
To achieve half decent sync results, execute gtimer
and gsync
as real-time
processes on a Linux kernel supporting preemption. The task of configuring and
building an RT Linux kernel is nontrivial even in 2023. The
bbb_kernel_builder project streamlines the process of building a Linux
kernel for the BBB with the PREEMPT_RT
patches applied.
There’s more to setting up a real-time Linux application than configuring and
building the kernel. A lot more. “Real-Time Linux App Development” goes
into the details. The article provides a checklist of all the tweaks you can
make at the system and source code level to achieve deterministic behavior. Both
the gtimer
and gsync
implementations follow the guidelines given in the
linked article:
- Prefault heap and stack memory.
- Lock pages to memory and disable
mmap
usage. - Configure inter-process mutexes with the
PTHREAD_PRIO_INHERIT
andPTHREAD_PROCESS_SHARED
attributes. - Set cyclic tasks to use absolute time values as their next wakeup. Reference
the
CLOCK_MONOTONIC
clock for time.
You don’t need to hardcode the scheduling policy and priorities in the source.
Instead, you can use the chrt
utility to set those parameters up from the run
script:
#!/bin/bash
GPIO_DEVNAME="/dev/gpiochip1"
GPIO_IN_OFFSET=16
GPIO_OUT_OFFSET=17
SHMEMKEY=57005
GTIMER_PRIO=80
GSYNC_PRIO=70
FREQ_HZ=1
COUPLING_CONST=0.5
chrt --fifo $GTIMER_PRIO ./gtimer $GPIO_DEVNAME $GPIO_IN_OFFSET $SHMEMKEY &
chrt --fifo $GSYNC_PRIO \
./gsync -f $FREQ_HZ -k $COUPLING_CONST $GPIO_DEVNAME $GPIO_OUT_OFFSET \
$SHMEMKEY &
SCHED_FIFO
was the most appropriate RT scheduling policy for this application.
gtimer
has a higher priority than gsync
because you want to log
the time when the peer signal arrives ASAP. That means that gtimer
may have to
preempt a running gsync
.
One thing you’d usually do on a multicore system is allocate your cores among
the real-time processes. The BBB is a single core system meaning your RT
processes get to run on the same core as all the SCHED_OTHER
tasks. There’s
not much you can do about this. That said, the code is portable to other systems
running Linux. An interesting follow-up experiment would be porting this project
to a multicore platform. You could dedicate a core to each RT process and
compare the measured latencies with those presented at the end of this article.
GPIO Woes
One of the more tedious parts of implementing this sync concept is getting
software control of the GPIOs right. You might start with the legacy sysfs API
for GPIO control. The general idea is that you can control the behavior and
state of a GPIO pin by writing/reading data in a number of different text files.
You can export GPIOs to /sys/class/gpio/export
. You compute the GPIO number
using the formula GPIO_NUM = (32 * CHIPNUM) + OFFSET
. After exporting the
GPIO, you get a nice file structure like the one shown below:
root@gsync:~# ls /sys/class/gpio/gpio48
active_low device direction edge label power subsystem uevent value
In the snippet, you have the property files for gpio48
AKA GPIO1_16
(chip
1/offset 16) associated with the BBB header pin labeled P9_15
. If you wanted
to set the pin to be an output pin, you could write out
to the direction
file:
echo out > /sys/class/gpio/gpio48/direction
If you wanted to set the pin high, you could write 1
to the value
file:
echo 1 > /sys/class/gpio/gpio48/value
You get the idea. The sysfs method of GPIO control is nice for one-and-done configurations. That said, you can imagine that opening and closing a file every time you want to toggle a pin is pretty inefficient. You might get away with it running at 1Hz but if you ever decide to up the rate, repeated file IO is going to hurt performance.
So what’s the best way of controlling GPIOs from userspace these days? The
answer is libgpiod
. libgpiod
uses the character device interface to the
GPIOs. You don’t lose any of the functionality you had with the sysfs API and
you don’t have to deal with the ioctl-based kernel-userpace interaction
directly. It even gets bonus points for coming with C++ bindings and a set of
useful examples. It’s hard to misuse the API since it throws exceptions for
every imaginable error. The time efficiency in toggling a GPIO is also optimal.
The Kuramoto Model
Finally, it’s time to implement the Kuramoto Model. Step one is to translate the equation from the wiki page to something manageable in the code. Here’s the original equation:
\[ {d\theta \over dt} = \omega_i + {K \over N}\sum_{i=1}^N \sin(\theta_j - \theta_i) \]
Phase Angles and Time
One thing is immediately apparent: you need a way to convert time to phase angles and vice versa. Your blink task has a known frequency of 1 Hertz meaning one complete task cycle takes 1 second. You can map portions of the cycle in seconds to angles on the unit circle. For example,
- 0.00 -> \( 0 \) radians
- 0.25 -> \( \pi \over 2 \) radians
- 0.50 -> \( \pi \) radians
- 0.75 -> \( 3 \pi \over 2 \) radians
- 1.00 -> \( 2 \pi \) radians
The relationship between time, \(t\), and frequency, \(F\), is \( t = {1 \over F} \). You’re interested in the time it takes to move some angle \( \theta \) in radians. What you find is that
\[ t = {\theta \over {2 \pi F}} \]
To go from time to angle you can solve for \(\theta\):
\[ \theta = {2 \pi F t} \]
Note, the \( t \) is in units of seconds. The gsync
implementation tracks
time in units of nanoseconds. The conversion equations used in the code
account for the units change.
The Wakeup Delta
At this point, you’re ready to plug some numbers into the base equation to compute \( d\theta \over dt \)! Fill in the blanks on the terms:
- \( \omega_i \) -> This is your base frequency. In radians, the base frequency is \( 2 \pi \).
- \( K \) -> This is the coupling constant and is a tuneable parameter.
gsync
defaults to \( K = 0.5 \). - \( N \) -> This is the number of participants. Since you are syncing 2 computers, \( N = 2 \).
- \( \theta_j \) -> This is your peer’s phase offset from the ideal base frequency. You can compute \( \theta_j \) by taking your peer’s reported wakeup time and converting to a phase angle using the conversion function previously derived.
- \( \theta_i \) -> This is your own phase offset. Similar to \( \theta_j \), \( \theta_i \) converts your actual wakeup time to a phase angle. To explain a bit further, you have an expected and an actual wakeup time on the computer. The expected time is the time you would execute if there were no additional latencies imposed by the system. The actual wakeup time is the measured time after you resume execution. In short, you’re off phase from the desired base frequency and the model uses \( \theta_i \) to account for that.
In the code, the sync function takes as input the computer’s actual wakeup time
and the last reported peer wakeup time extracted from shared memory. Using the
latter information, along with frequency and coupling constant info given at
program startup, gsync
computes \( d\theta \over dt\) and converts it to a
time in nanoseconds. That time is an offset to the next gsync
wakeup time.
As an example, suppose gsync
ran with a frequency of 1 Hz or every 1 seconds.
Also suppose the sync function returned time deltas in seconds. If the sync
function returned a time delta of \( -0.5 \), then gsync
would next sleep
for \( 1 - 0.5 = 0.5 \) seconds (that is, gsync
will wakeup earlier by
half a second). Maybe the sync function over shot. In the next run, the sync
function returns a delta of \( 0.8 \), then gsync
will sleep for \( 1.0 +
0.8 = 1.8 \) seconds (that is, it will wakeup later). Essentially, the delta
in the wakeup time of the computers oscillates about \( 0 \)! The smaller the
oscillations, the better the sync.
The End Result
In the end, what do you see? Well, running gtimer
and gsync
on both BBBs
with the frequency of gtimer
set to 1 Hz and the coupling constant set to \(
0.5\), you see two LEDs blinking synchronously. It takes maybe 3 to 4 cycles
(blinks) before they flash in unison. Running both processes for a day and
doesn’t produce any noticeable hiccups in the sync!
You can also play a bit with the coupling constant to see what sort of effect it has. You can increment the coupling constant in steps of \( 0.1 \) starting at \( K = 0.1 \). What you’ll find is that if \( K \) is too low, the LEDs never seem to synchronize. After crossing a threshold value, synchronization always seems to occur.
So the sync at 1Hz “looks good enough.” Still, it would be interesting to measure using an oscilloscope the delay between the rising edge of one computer’s signal versus the other’s. You can experiment with a number of different rates starting at 1Hz and ramping up to about 500Hz in increments of 50Hz. Below is a histogram of the time deltas you would encounter on a 100Hz run:
With about 20,000 samples, the average delay was ~100 usec. More interesting than the average is the absolute maximum delay which was approximately 572 usec. These observations more or less held true for all test runs in the range [1, 400] Hertz.
Beyond the 400Hz run rate, you start to see some oddball results. Below is a capture of a 500Hz run:
The average delta was still approximately 100 usec. The maximum delta saw huge
spikes around 2.7 ms. Worse yet, these were more than just a few outliers, there
were multiple hits in the 2.5 ms range. Are the spikes driven by weak coupling?
Is there a timing bug between gtimer
and gsync
? Questions for another day.
Conclusion
Synchronizing at least two computers linked via only GPIO is possible. Better yet, the Kuramoto Model used to bring the two machines into phase is relatively straightforward to code and reason about. Moreover, submillisecond synchronization is achievable for rates below 500Hz on bargain hardware using free and open source software.
The complete project source with build instructions, usage, etc. is available on GitHub under gsync.