GPS, Galileo & More: How do they work & what happened during the big outage?

By bert hubert bert@hubertnet.nl / @PowerDNS_Bert / https://galmon.eu/

Mid-July 2019, Galileo, “the European GPS”, suffered a week-long outage. I’m a proud European, and I think we should have our own well-functioning navigation system, so I tried to figure out what was going on. Surely someone was monitoring this stuff in public? I come from the internet, where we monitor all the things, whether someone asked for it or not.

This led me on a journey to monitor Galileo, but quickly also GPS, the Russian GLONASS and Chinese BeiDou systems. Along the way, I found out how positioning satellites really work. This also helped me understand what went wrong with Galileo, more about which later.

In this post, I want to share what I learned, firstly because it is fascinating, but secondly because it serves as documentation of what the monitoring website “galmon.eu” is actually showing.

Galmon, which we should really rename to Navmon, is a lot like the RIPE Atlas probes, but for space. Based on a network of volunteers literally around the world, we monitor the output of each and every navigation satellite and make the results openly available as a pretty website, as JSON, and as raw data (messages). Galmon is GPL-licensed open source and lives on GitHub.

The Galmon monitoring network (with an outage in Brazil, sadly)

I want to thank several navigation, satellite and Galileo specialists for proofreading this post & providing suggestions and improvements. The section on the Galileo outage was not proofread by anyone & all speculation is mine.

Note: this page is already very long and will sadly skip over the utterly fascinating modulation details that make satellite navigation possible. But you can read about that on this excellent page from Delft University.

How do navigation satellites work?

Let’s imagine we launch a bunch of metronomes, musical devices that tick at a precise frequency. We’ll make them tick precisely once a second and, unlike a regular metronome, we’ll also make them tick exactly on “whole seconds”. So they emit a tick at 0 seconds past the hour, 1 second past the hour etc, one tick every second, on the second.

We put the metronomes in different orbits around the earth, so at any time, some of them are further away than others. Then, we listen to their ticks, which are conveniently transmitted over radio.

Because of the finite speed of light, a metronome that is 30000 km away will send out its tick on a whole second, but that tick will reach us 100 milliseconds later. A metronome that is closer to us, say 25000 km away, will have its tick arrive slightly earlier (after about 83 milliseconds). These differences are large enough that if we put the ticks on a loudspeaker, we could hear them.

Because we can measure the precise delay, we can tell exactly how far away each satellite is from us. In itself however, this does not help us determine our location, because we don’t know where the metronomes are!
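The arithmetic behind these numbers is simply distance divided by the speed of light. Here is a minimal Python sketch, assuming signals travel at the vacuum speed of light with no atmospheric delay. The function names are my own:

```python
# Speed of light in vacuum, in meters per second (exact by definition).
C = 299_792_458.0


def delay_from_distance(distance_m: float) -> float:
    """Seconds a tick needs to cover distance_m, ignoring the atmosphere."""
    return distance_m / C


def distance_from_delay(delay_s: float) -> float:
    """Distance in meters implied by a measured delay, ignoring the atmosphere."""
    return delay_s * C


if __name__ == "__main__":
    for km in (30_000, 25_000):
        ms = delay_from_distance(km * 1_000) * 1_000
        print(f"{km} km -> {ms:.1f} ms")
    # One nanosecond of timing error is roughly 30 cm of range.
    print(f"1 ns -> {distance_from_delay(1e-9) * 100:.1f} cm")
```

This prints about 100.1 ms and 83.4 ms for the two metronomes, and 30.0 cm for one nanosecond.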

Ephemeris

Let’s add a further twist: not only do the metronomes tick, they also send out a description of their orbit. This tells us at every tick exactly where the satellite is.

This description is called an “ephemeris”, and not only does it detail where the satellite is now, it also allows us to calculate where it will be in the near future. The way the GNSS (“Global Navigation Satellite System”) vehicles describe their orbits is based on the work from Johannes Kepler in the 1620s - with some modernization.

Now that we know where the satellites are, plus how long their ticks take to reach us, we can calculate where we are:

In two dimensions, we only need two satellites to do this, assuming we already had an accurate clock that also ticks once a second, on the second. In three dimensions, we need three satellites ticking away at us.

(Note that in the figure above, we could also be in a second position where the circles intersect - we can rule out that solution by assuming we are not actually in space ourselves).

But, at a very elementary level, this is how GNSS works: satellites tell us where they are, and they send out a ‘tick’ exactly every second (and on the second), and by timing how late that tick is in arriving at our location, we know how far away the satellite is. And by drawing some circles (actually spheres), we can discover where we are. This technique is called multilateration.
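The two-dimensional case from the figure can be written down directly. This sketch uses a hypothetical `intersect_circles` helper and made-up coordinates in kilometers, with a flat toy “ground” at y = 0. It returns both intersection points, and we discard the one far out in space:

```python
import math


def intersect_circles(c1, r1, c2, r2):
    """Return the 0, 1 or 2 points where two circles in the plane meet.

    c1, c2 are (x, y) centres (satellite positions); r1, r2 are the
    distances derived from the measured tick delays.
    """
    (x1, y1), (x2, y2) = c1, c2
    dx, dy = x2 - x1, y2 - y1
    d = math.hypot(dx, dy)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []  # concentric, too far apart, or one circle inside the other
    # Distance from c1, along the line to c2, to the chord through both crossings.
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))
    # Point where that chord crosses the line c1 -> c2.
    mx, my = x1 + a * dx / d, y1 + a * dy / d
    if h == 0:
        return [(mx, my)]
    # Step perpendicular to c1 -> c2 by h in both directions.
    ox, oy = -dy * h / d, dx * h / d
    return [(mx + ox, my + oy), (mx - ox, my - oy)]


if __name__ == "__main__":
    sats = [(0.0, 20_000.0), (10_000.0, 20_000.0)]   # toy satellites, km
    truth = (3_000.0, 0.0)                           # receiver on the ground
    ranges = [math.dist(s, truth) for s in sats]
    candidates = intersect_circles(sats[0], ranges[0], sats[1], ranges[1])
    # Two answers: keep the one near the ground, not the one out in space.
    print(min(candidates, key=lambda p: abs(p[1])))
```

Real receivers work with spheres in three dimensions and do not compute intersections this way, but the two-solution ambiguity is the same.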

What if we don’t have an accurate clock?

We want to use GNSS to figure out where we are, even if we aren’t dragging an accurate clock with us. Because of the speed of light, every nanosecond our clock is off puts our position off by about 30 centimeters. Nanosecond-accurate clocks are delicate machines that do not fit in phones.

Luckily, through some clever math, it is possible to use the satellites themselves as an accurate clock. To do so, we need one additional satellite and a guess of the time. Such a guess could be derived by taking the average of all GNSS clocks received, combined with rough knowledge of how far away such a satellite could be. Armed with this rough guess, we might end up with:

Note that the three circles do not all cross in a single point. Because our rough clock estimate was wrong (let’s say it runs a bit ahead), all satellites appear to be a little bit further away than they actually are. In other words, all the circles are a little bit too big, which keeps them from intersecting in one point.

Based on this observation, a receiver can adjust its internal clock until all circles intersect in one point:

Once this happens, we know that the correct time has been derived. Incidentally, this trick has turned our cheap GNSS receiver into a highly accurate clock, and GPS devices are frequently used exactly for this purpose.
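The same clock-adjustment idea can be sketched with Newton’s method. We treat the receiver clock error as a third unknown next to x and y, then keep nudging all three until every circle passes through the same point. This is a two-dimensional toy that assumes three satellites and ranges that are exact apart from the clock error. `fix_2d_with_clock` and the numbers (in kilometers) are illustrative only:

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, rhs):
    """Solve a 3x3 linear system with Cramer's rule (fine for a toy)."""
    d = det3(a)
    result = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        result.append(det3(m) / d)
    return result


def fix_2d_with_clock(sats, pseudoranges, guess=(0.0, 0.0), iterations=10):
    """Return (x, y, b): a position where all circles meet once each measured
    range is shortened by b, the receiver clock error expressed as a distance."""
    x, y, b = guess[0], guess[1], 0.0
    for _ in range(iterations):
        jac, miss = [], []
        for (sx, sy), pr in zip(sats, pseudoranges):
            r = math.hypot(x - sx, y - sy)
            miss.append(pr - (r + b))          # how far this circle misses us
            jac.append([(x - sx) / r, (y - sy) / r, 1.0])
        dx, dy, db = solve3(jac, miss)         # Newton step for (x, y, b)
        x, y, b = x + dx, y + dy, b + db
    return x, y, b


if __name__ == "__main__":
    sats = [(-15_000.0, 20_000.0), (0.0, 22_000.0), (15_000.0, 20_000.0)]  # km
    truth, clock_error_km = (1_000.0, 0.0), 0.3   # 0.3 km is about 1 microsecond
    pseudoranges = [math.dist(s, truth) + clock_error_km for s in sats]
    print(fix_2d_with_clock(sats, pseudoranges))
```

The recovered b, divided by the speed of light, is exactly the clock correction that turns the receiver into an accurate clock.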

Now down to the details

As described up to here, our positioning would only be accurate to within a kilometer or so. This is not enough for our phone to discover which shop we are at so the operator of the phone can sell advertising that is exactly targeted at us. So we clearly need to do better.

From the above it is clear that two things are very important for determining our position: that the ticks are sent exactly on the whole second, and that we know the location of the satellite very precisely.

We’ll start with the second detail. Johannes Kepler figured out around 1617 that to describe a satellite orbit accurately, you need 7 elements. In his idealized model, an orbit sticks to a 2D plane, and is always an ellipse. You can then pin down an orbit exactly using the following elements:

  1. The size of the ellipse, given by its semi-major axis (\(A\)); in practice the satellites broadcast its square root, \(\sqrt{A}\)
  2. The eccentricity of the ellipse (\(e\)), which describes how elongated it is
  3. The inclination of the orbital plane (\(i_0\))
  4. The longitude of the ascending node (\(\Omega_0\))
  5. The argument of periapsis (\(\omega\)); together with the inclination and the longitude of the ascending node, this describes how the orbit is oriented in space
  6. How far along the satellite is in its ellipse at T=0 (“the mean anomaly” or \(M_0\))
  7. When T=0 is (\(t_{0e}\))
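These elements already let us place a satellite within its orbital plane. The one nontrivial step is Kepler’s equation, \(M = E - e \sin E\), which links the mean anomaly \(M\) (which grows linearly with time) to the eccentric anomaly \(E\) and has no closed-form solution. Here is a sketch using Newton iteration. It assumes the earth’s gravitational parameter \(\mu \approx 3.986 \times 10^{14}\) m³/s² and leaves out the rotation into 3D using \(i_0\), \(\Omega_0\) and \(\omega\). The function names are hypothetical:

```python
import math

MU = 3.986004418e14  # earth's gravitational parameter, m^3/s^2


def eccentric_anomaly(mean_anomaly: float, e: float, tol: float = 1e-12) -> float:
    """Solve Kepler's equation M = E - e*sin(E) for E using Newton's method."""
    E = mean_anomaly if e < 0.8 else math.pi
    for _ in range(50):
        step = (E - e * math.sin(E) - mean_anomaly) / (1 - e * math.cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E


def position_in_orbital_plane(a, e, m0, t, t0e):
    """(x, y) in meters within the orbital plane: earth's centre at the origin,
    x-axis pointing at periapsis. a is the semi-major axis A in meters."""
    n = math.sqrt(MU / a ** 3)        # mean motion, radians per second
    M = m0 + n * (t - t0e)            # mean anomaly grows linearly with time
    E = eccentric_anomaly(M, e)
    x = a * (math.cos(E) - e)
    y = a * math.sqrt(1 - e * e) * math.sin(E)
    return x, y
```

For a GPS-like orbit (\(A\) around 26,560 km, small \(e\)), the distance from the earth’s centre stays between \(A(1-e)\) and \(A(1+e)\).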

This was good enough in Kepler’s time (in fact, it was very good), but it is not good enough for global positioning. It turns out that the earth is not a perfect sphere, nor is its gravitational field uniform. In real life, satellites diverge by kilometers from this perfect orbit.

To cope with this, the big brains that designed GPS back in the 1970s came up with a scheme to enhance these 7 parameters with nine more: three terms that describe how the orbit changes over time, and six harmonic corrections that vary with the position within the orbit.

I call these people big brains for a reason. It appears they positively nailed it. Satellites have only a tiny amount of bandwidth to tell us about their orbits, so it is very important to generate the best possible description in the number of bits available.

When China and Europe decided to build their own navigation systems in the 1990s, they had the opportunity to improve on the GPS orbital descriptions (ephemerides). And guess what? They didn’t. This is partially because of interoperability concerns, but also because GPS did an incredible job 20+ years earlier picking the right parameters and bit allocations.

So in short: every satellite beams down its 7 Keplerian elements, plus corrections, so we get a very accurate description of where the satellite hangs out in space.
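For the curious, here is a sketch of turning such an ephemeris into an earth-fixed (ECEF) position. It follows the structure of the user algorithm in the GPS interface specification (IS-GPS-200), using the GPS constants as I recall them. Galileo uses the same structure with a slightly different \(\mu\), so check the respective interface document before relying on any constant. The `Ephemeris` field names are my own:

```python
import math
from dataclasses import dataclass

MU = 3.986005e14            # GPS value of the earth's gravitational constant, m^3/s^2
OMEGA_E = 7.2921151467e-5   # GPS value of the earth's rotation rate, rad/s
WEEK = 604_800.0


@dataclass
class Ephemeris:
    sqrt_a: float           # square root of the semi-major axis, sqrt(m)
    e: float                # eccentricity
    i0: float               # inclination at t0e, rad
    omega0: float           # longitude of ascending node at start of week, rad
    omega: float            # argument of perigee, rad
    m0: float               # mean anomaly at t0e, rad
    t0e: float              # reference time, seconds of week
    delta_n: float = 0.0    # correction to the computed mean motion, rad/s
    idot: float = 0.0       # rate of change of inclination, rad/s
    omega_dot: float = 0.0  # rate of change of the ascending node, rad/s
    cuc: float = 0.0        # harmonic corrections to the argument of latitude, rad
    cus: float = 0.0
    crc: float = 0.0        # harmonic corrections to the orbit radius, m
    crs: float = 0.0
    cic: float = 0.0        # harmonic corrections to the inclination, rad
    cis: float = 0.0


def satellite_ecef(eph: Ephemeris, t: float):
    """Satellite (x, y, z) in meters, earth-fixed, at seconds-of-week t."""
    a = eph.sqrt_a ** 2
    tk = t - eph.t0e
    if tk > WEEK / 2:               # pick the interpretation of t0e closest to t
        tk -= WEEK
    elif tk < -WEEK / 2:
        tk += WEEK
    n = math.sqrt(MU / a ** 3) + eph.delta_n
    mk = eph.m0 + n * tk
    ek = mk
    for _ in range(30):             # Newton iteration on Kepler's equation
        ek -= (ek - eph.e * math.sin(ek) - mk) / (1 - eph.e * math.cos(ek))
    vk = math.atan2(math.sqrt(1 - eph.e ** 2) * math.sin(ek), math.cos(ek) - eph.e)
    phi = vk + eph.omega            # argument of latitude
    s2, c2 = math.sin(2 * phi), math.cos(2 * phi)
    uk = phi + eph.cus * s2 + eph.cuc * c2
    rk = a * (1 - eph.e * math.cos(ek)) + eph.crs * s2 + eph.crc * c2
    ik = eph.i0 + eph.cis * s2 + eph.cic * c2 + eph.idot * tk
    xp, yp = rk * math.cos(uk), rk * math.sin(uk)   # position in orbital plane
    # Ascending node longitude, corrected for the earth rotating underneath.
    om = eph.omega0 + (eph.omega_dot - OMEGA_E) * tk - OMEGA_E * eph.t0e
    x = xp * math.cos(om) - yp * math.cos(ik) * math.sin(om)
    y = xp * math.sin(om) + yp * math.cos(ik) * math.cos(om)
    z = yp * math.sin(ik)
    return x, y, z
```

With all correction terms left at zero, this reduces to a plain Keplerian orbit seen from a rotating earth.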

Time is of the essence

As noted earlier, one nanosecond equals 30 centimeters at the speed of light. This means that if we strive for ‘meter’ level accuracy, or better, we need nanosecond-level accurate clocks. The problem is that such clocks barely exist, let alone ones we can launch into space and rely on to function for 10 years.

Atomic clocks are very exciting things and in lab conditions achieve incredible stability. But a satellite in space sits there in full sunlight, or suddenly not, and only has a limited power budget. We simply can’t launch perfect atomic clocks.

To solve these problems, the less precise (but still very good!) atomic clocks in space are monitored from the ground and compared to some of these lab-grade time references. The good news is that while the clocks in space may not be as precise, it turns out their deviations can be modeled very well.

The ‘ground segment’ of a GNSS therefore transmits three correction terms to the satellites:

  1. Clock is off by \(a_{f0}\) nanoseconds
  2. Clock drift rate is \(a_{f1}\) nanoseconds/second
  3. The drift rate is increasing by \(a_{f2}\) nanoseconds/second/second

In addition, a \(t_{0c}\) term is transmitted that anchors these terms to a specific point in time.

Instead of adjusting the clock directly, the satellites leave the atomic clocks as is, but send down the correction terms so receivers can do their own correction.
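The receiver-side correction is a short polynomial evaluation. This sketch works in seconds throughout. As far as I know, the GPS and Galileo interface documents broadcast \(a_{f0}\), \(a_{f1}\) and \(a_{f2}\) in s, s/s and s/s², rather than the nanosecond units used above for readability. It leaves out the relativistic and inter-frequency group delay terms a real receiver also applies. The function names are hypothetical:

```python
WEEK = 604_800.0


def satellite_clock_error(t: float, t0c: float, af0: float, af1: float, af2: float) -> float:
    """Modelled satellite clock error in seconds at time t (seconds of week).

    af0 in s, af1 in s/s, af2 in s/s^2; t0c anchors the polynomial in time.
    """
    dt = t - t0c
    # Undo a week rollover between t0c and t (both are seconds of week).
    if dt > WEEK / 2:
        dt -= WEEK
    elif dt < -WEEK / 2:
        dt += WEEK
    return af0 + af1 * dt + af2 * dt * dt


def corrected_transmit_time(t_sv: float, t0c: float, af0: float, af1: float,
                            af2: float) -> float:
    """Turn the satellite's own timestamp into system time by subtracting its
    modelled error. Evaluating the polynomial at t_sv instead of the true time
    makes a negligible difference at these magnitudes."""
    return t_sv - satellite_clock_error(t_sv, t0c, af0, af1, af2)
```

For scale: an uncorrected offset of 25 microseconds would already amount to about 7.5 km of range error.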

Combining everything

GNSS satellites do actually send us ‘ticks’, like the metronomes from above, but these ticks are the start (‘first chip’) of navigation messages. These go by different names in different systems (pages, words, strings), but each message starts on a whole second and contains the parameters we need to know to perform accurate positioning.

There are a lot of different message types. Some tell us about the orbit, some about the atomic clock model, others inform us about the health of the satellite, the ionospheric conditions, how precise it thinks its orbit is determined, the offset between “GPS/Galileo time” and “UTC time”. Interestingly, each satellite will also tell us where all the other satellites are, and how they are doing (the almanac). Finally, there are also messages that tell us about internal cable lengths & other things that can cause a delay in signals between multiple frequency bands.

But in essence, the process for determining a ‘fix’ is:

  1. Receive ticks from at least 4 satellites
  2. Receive enough messages that we know the orbits, the corrections and the atomic clock corrections (or cheat, and get them from the internet)
  3. Measure the delay in ticks for all these satellites
  4. Run the math, report the fix
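Step 4 usually means iterated least squares over position and clock error together. Below is a minimal sketch under strong assumptions: satellite positions are already known in ECEF meters, satellite clock errors are already corrected, and the atmosphere and the earth’s rotation during signal travel are ignored. `compute_fix` and `solve_linear` are illustrative names, not taken from any real receiver:

```python
import math


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def compute_fix(sat_positions, pseudoranges, iterations=10):
    """Least-squares (x, y, z, b) from four or more pseudoranges.

    b is the receiver clock error times the speed of light, in meters.
    Starting from the earth's centre usually converges for ground receivers.
    """
    if len(sat_positions) < 4:
        raise ValueError("need at least 4 satellites")
    est = [0.0, 0.0, 0.0, 0.0]
    for _ in range(iterations):
        h, resid = [], []
        for s, pr in zip(sat_positions, pseudoranges):
            diff = [est[k] - s[k] for k in range(3)]
            r = math.sqrt(sum(d * d for d in diff))
            resid.append(pr - (r + est[3]))
            h.append([d / r for d in diff] + [1.0])    # linearised geometry row
        # Normal equations: (H^T H) delta = H^T resid
        hth = [[sum(row[i] * row[j] for row in h) for j in range(4)] for i in range(4)]
        htr = [sum(row[i] * res for row, res in zip(h, resid)) for i in range(4)]
        delta = solve_linear(hth, htr)
        est = [e + d for e, d in zip(est, delta)]
    return est
```

With more than four satellites, the system is overdetermined and least squares picks the best compromise. This is also what makes the pseudorange residuals discussed later meaningful.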

Ground segment

As noted above, the atomic clocks on board the satellites are monitored from the ground & corrections are sent back up. Similarly, the orbit of the satellites is determined accurately and the current best Keplerian elements + their corrections are transmitted to the satellites periodically.

Different systems do this at different rates. “Classic” GPS sends out an update every two hours, on the hour. The older an orbital description and atomic clock model, the larger the divergence between prediction and reality.

After two hours, the “new” position of a GPS satellite is often dozens of centimeters away from where it was predicted to be. Galileo and BeiDou send out far more frequent updates and therefore see far smaller divergence in orbital position (in meters):

https://ds9a.nl/galileo/eph-jumps.png (Histogram, x-axis in meters, y-axis count. This corresponds to ‘latest-disco’ on galmon.eu).

Similarly, the atomic clocks receive updates, and these frequently lead to significant corrections. Here are histograms of the atomic clock jumps, where it is important to note that the x-axes are very different between GPS, Galileo, GLONASS and BeiDou:

https://ds9a.nl/galileo/eph-clock-jumps.png (Histogram, x-axis in nanoseconds, y-axis count. This graph corresponds to ‘time-disco’ on galmon.eu).

What went wrong during the Galileo outage?

Only a very limited number of people were able to comment publicly during the outage. A movie with data was distributed by the Finnish National Land Survey (‘maanmittauslaitos’).

A research group called NavSAS (famous for performing the first ever Galileo fix) also documented the breakdown & recovery as they happened.

Finally, GNSS engineer & researcher Daniel Estevez has gotten to the bottom of what happened in several different posts.

The very brief summary from the excellent work linked above is that Galileo first stopped transmitting its regular updates to the satellites. Normal updates are numbered 0 to 127 in the IOD field (for which see below). At the beginning of the event, much higher numbers were suddenly used, with big increments. This continued until the IOD reached 958, while the 10-bit field can hold at most 1023. The next logical increment would have taken it beyond that maximum.

The satellites kept transmitting the same ephemeris (orbit details) and atomic clock corrections continuously, until the 17th of July.

The Galileo Service Definition Document states that an ephemeris older than 4 hours is not precise enough, but receivers could of course still use such an ephemeris.

Real problems, including ‘impossible location fixes’, started after 3.5 days. Due to a technicality, an ephemeris that is more than 3.5 days old is suddenly interpreted as being dated up to 3.5 days in the future. This is because the \(t_{0e}\) of an ephemeris is specified in ‘seconds since start of the week’ (which is at the start of Sunday, in Galileo’s own system time). Crucially, no week number is supplied.

This means that after 3.5 days, it becomes ambiguous if the \(t_{0e}\) refers to a few days ago or a few days in the future.
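The wrap-around rule behind this can be sketched in a few lines. It mirrors the half-week correction that the GPS interface specification prescribes for the time elapsed since \(t_{0e}\), and I assume Galileo receivers behave the same way. `ephemeris_age` is a hypothetical helper:

```python
SECONDS_PER_WEEK = 604_800
HALF_WEEK = SECONDS_PER_WEEK // 2


def ephemeris_age(now_sow: float, t0e: float) -> float:
    """Age of an ephemeris in seconds, given only seconds-of-week values.

    Without a week number, the receiver picks the interpretation of t0e
    closest to now. Positive means t0e lies in the past, negative in the future.
    """
    age = now_sow - t0e
    if age > HALF_WEEK:
        age -= SECONDS_PER_WEEK
    elif age < -HALF_WEEK:
        age += SECONDS_PER_WEEK
    return age
```

With \(t_{0e}\) frozen, an ephemeris that is really 3 days old is reported as 3 days old, but one that is really 4 days old comes out as 3 days in the future.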

So why did the regular flow of updates stop? There has been no real clarity on this beyond ‘equipment malfunction’, although some communications seemed to imply firewall problems during an attempt to fail over to a backup control center. What is certain is that the pipeline of measuring orbits, determining atomic clock corrections and sending these to the satellites is not trivial. If it derails, restarting it may not be trivial either.

A Galileo Service Notice phrased it as:

The technical incident originated by an equipment malfunction in the Galileo control centres that calculate time and orbit predictions, and which are used to compute the navigation message. The malfunction affected different elements on both centres

We’ll hopefully learn more once the Galileo independent inquiry board reports its initial recommendations, likely in October.

GLONASS

GLONASS is a fascinating Soviet-era design that made several important choices differently from the other systems. For one, GLONASS does not use UTC internally, but is actually defined in ‘Moscow Time’. Moscow went through a period with Daylight Saving Time, which made the Soviet-era definition documents incorrect for a while; currently they are correct again.

In addition, all other systems define time as the number of seconds since a defined starting moment. GLONASS sends out time as a day number within a four year calendar, plus the actual hours, minutes and seconds a clock in Moscow would show. In this way it neatly attempts to sidestep the thorny issue of leap seconds.
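Converting such a timestamp back to a calendar date is straightforward. This sketch assumes, as I understand the GLONASS ICD, that days are counted from January 1st of the leap year that opens each four-year period, and that GLONASS time runs three hours ahead of UTC. It ignores leap seconds entirely. The function name is hypothetical:

```python
from datetime import date, datetime, timedelta, timezone

# GLONASS time follows Moscow time, i.e. UTC(SU) plus three hours.
MOSCOW_OFFSET = timedelta(hours=3)


def glonass_to_utc(first_year_of_period: int, day_in_period: int,
                   hours: int, minutes: int, seconds: float) -> datetime:
    """Convert a GLONASS-style day number plus Moscow wall-clock time to UTC.

    first_year_of_period is the leap year that opens the four-year period;
    day_in_period is 1 on January 1st of that year. Leap seconds are ignored.
    """
    day = date(first_year_of_period, 1, 1) + timedelta(days=day_in_period - 1)
    moscow = datetime(day.year, day.month, day.day, hours, minutes)
    moscow += timedelta(seconds=seconds)
    return (moscow - MOSCOW_OFFSET).replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    # Day 367 of the period starting in 2016 is January 1st, 2017 (2016 has 366 days).
    print(glonass_to_utc(2016, 367, 2, 30, 0))
```

In this example, 02:30 Moscow time on January 1st, 2017 comes out as 23:30 UTC on December 31st, 2016.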

Finally, GLONASS sends out its location in space in a unique way. It encodes the exact position of a satellite at a specific time, plus its velocity in three dimensions. In addition, it describes the acceleration caused by the moon and the sun at that place and moment. Calculating the exact position of a GLONASS satellite then involves actually integrating the differential equations of motion, including taking the oblate shape of the earth’s gravitational field into account.

I can only conclude that the designers of GLONASS had very high expectations of what the developers of receivers would be able to do.
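To give an idea of what receivers must do, here is a deliberately simplified sketch. It integrates position and velocity with fourth-order Runge-Kutta (a common choice for this job) under point-mass gravity plus the \(J_2\) oblateness term, and adds the broadcast lunisolar acceleration as a constant. Unlike the real ICD procedure, it treats the state as if it were given in a non-rotating frame and omits the earth-rotation terms. The constants are PZ-90 values as I recall them, so verify them against the ICD. All names are hypothetical:

```python
import math

# PZ-90 constants as I recall them from the GLONASS ICD; verify before real use.
GM = 398_600.4418e9     # earth's gravitational constant, m^3/s^2
J2 = 1_082_625.75e-9    # second zonal harmonic (earth's oblateness)
AE = 6_378_136.0        # earth's equatorial radius, m


def acceleration(pos, lunisolar):
    """Point-mass gravity plus the J2 oblateness term, plus a constant
    acceleration standing in for the broadcast moon/sun contribution."""
    x, y, z = pos
    r2 = x * x + y * y + z * z
    r = math.sqrt(r2)
    k = 1.5 * J2 * AE * AE / r2
    zz = 5.0 * z * z / r2
    g = -GM / (r2 * r)
    return (g * x * (1 + k * (1 - zz)) + lunisolar[0],
            g * y * (1 + k * (1 - zz)) + lunisolar[1],
            g * z * (1 + k * (3 - zz)) + lunisolar[2])


def derivative(state, lunisolar):
    """Time derivative of the state (x, y, z, vx, vy, vz)."""
    return list(state[3:]) + list(acceleration(state[:3], lunisolar))


def propagate(state, seconds, lunisolar=(0.0, 0.0, 0.0), step=30.0):
    """Move a position/velocity state `seconds` forward (or backward if
    negative) with fixed-step fourth-order Runge-Kutta."""
    n = max(1, math.ceil(abs(seconds) / step))
    h = seconds / n
    s = list(state)
    for _ in range(n):
        k1 = derivative(s, lunisolar)
        k2 = derivative([a + h / 2 * d for a, d in zip(s, k1)], lunisolar)
        k3 = derivative([a + h / 2 * d for a, d in zip(s, k2)], lunisolar)
        k4 = derivative([a + h * d for a, d in zip(s, k3)], lunisolar)
        s = [a + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
             for a, d1, d2, d3, d4 in zip(s, k1, k2, k3, k4)]
    return s
```

Because each broadcast state is only meant to be used for a short interval around its reference time, a small fixed step like this stays cheap.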

UTC difference, GPS difference, Leap seconds

Each GNSS is anchored to a room full of atomic clocks on earth. The systems attempt to synchronise their satellites with their own clock constellation. Inevitably however, “GPS time”, “BeiDou time”, “Galileo time” and “GLONASS time” will drift away somewhat from UTC.

Because of this, each system broadcasts how far it has currently diverged from UTC, and also how fast it is drifting. These differences tend to be in the ‘several nanosecond’ range, with a divergence lower than 1 nanosecond a day.

https://ds9a.nl/galileo/utc.png

Receivers on the ground that wish to use GNSS for UTC timekeeping can use these correction terms to convert ‘GPS Time’ or ‘Galileo Time’ into UTC.

In addition, Galileo and GLONASS also broadcast their difference with GPS. BeiDou can also broadcast this difference, but in practice doesn’t.

Leap seconds are ignored by GPS, Galileo and BeiDou, except that as a service to terrestrials who care about such things, the offset between ‘GNSS Time’ and the leap-second adjusted UTC time is broadcast. Every leap second is announced well in advance.
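Applying the broadcast terms is a one-liner. This sketch follows the general shape of the GPS and Galileo conversion: an integer leap-second offset plus a small bias \(A_0\) and drift \(A_1\) anchored at a reference time \(t_{ot}\). It assumes every time is already on one continuous seconds scale, so it skips the week-number bookkeeping and the special handling around a scheduled leap second that the interface documents describe. The names are hypothetical:

```python
def gnss_to_utc_seconds(t_gnss: float, leap_seconds: int, a0: float,
                        a1: float, t_ot: float) -> float:
    """Convert GNSS system time (seconds) to UTC seconds on the same scale.

    leap_seconds: broadcast integer offset between GNSS time and UTC
    a0, a1:       broadcast bias (s) and drift (s/s) of the remaining offset
    t_ot:         reference time for a0 and a1, same scale as t_gnss
    """
    offset = leap_seconds + a0 + a1 * (t_gnss - t_ot)
    return t_gnss - offset
```

For GPS, the leap-second part of this offset has been 18 seconds since the start of 2017. The \(A_0\) and \(A_1\) terms account for the nanosecond-level divergence shown in the graph above.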

GLONASS in theory sidesteps leap seconds by using Moscow time, but anecdotal evidence suggests that in practice the GLONASS system wobbles a bit on a leap second day.

Ionospheric corrections

The reader who has made it this far down the page may have wondered about the speed of light in air versus vacuum, and whether the atmosphere might play a role in positioning accuracy. That reader would be right. In practice, the troposphere (with air) is only a dozen kilometers high, and it adds comparatively little delay (a couple of meters’ worth when looking straight up), except at very low elevations.

The ionosphere however extends from 60 km to 2000 km high. This part of the atmosphere has an electron density that lowers the effective speed of the radio signal, which can disrupt timing to the point where positions are off by tens of meters.

To counter this, GPS, Galileo and BeiDou send out parameters describing current ionospheric conditions, with increasing levels of sophistication. GLONASS does not provide such details, but a receiver can apply the corrections learned from (say) BeiDou to the GLONASS signal.

https://ds9a.nl/galileo/ionosphere.png

Modern GNSS vehicles transmit information via several different frequencies. The delay created by the ionosphere is inversely proportional to the square of the frequency of the signal, so lower frequencies are delayed more. A receiver that is able to receive on multiple bands can therefore use the difference in arrival times (“TDOA”) between two bands to derive the total delay incurred and eliminate it.

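So if the signal on one band arrives x nanoseconds later than on another band, and the ratio of the two frequencies is (say) 1.5, then the lower band must have been delayed \(1.5^2 = 2.25\) times as much as the higher one. The difference of x nanoseconds therefore equals 1.25 times the delay of the higher band, which allows us to calculate exactly that the higher band was delayed by 0.8x nanoseconds and the lower band by 1.8x nanoseconds.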

This removes over 99.9% of the error introduced by the ionosphere, without performing further modeling.
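The same reasoning in code, for pseudoranges \(P_1\) and \(P_2\) measured on frequencies \(f_1 > f_2\). This is a sketch of the standard first-order “ionosphere-free” combination. It does not remove higher-order ionospheric terms or the (frequency-independent) tropospheric delay. The function names are my own:

```python
def iono_delay_on_f1(p1: float, p2: float, f1: float, f2: float) -> float:
    """Ionospheric delay on the f1 signal (same unit as p1 and p2), derived
    from the extra delay p2 - p1 on the lower frequency f2, using delay ~ 1/f^2."""
    return (p2 - p1) * f2 ** 2 / (f1 ** 2 - f2 ** 2)


def iono_free(p1: float, p2: float, f1: float, f2: float) -> float:
    """First-order ionosphere-free combination of two pseudoranges."""
    return (f1 ** 2 * p1 - f2 ** 2 * p2) / (f1 ** 2 - f2 ** 2)
```

With a frequency ratio of 1.5 and a difference of x = 10 ns, `iono_delay_on_f1(0.0, 10.0, 1.5, 1.0)` gives the 0.8x = 8 ns worked out above. For real signals you would plug in, for example, the GPS L1 and L5 frequencies of 1575.42 MHz and 1176.45 MHz.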

Some further details from galmon.eu

The galmon.eu website lists many of the parameters discussed above. On the main page, you will find a column called ‘IOD’, a number used by GPS, Galileo and, to some extent, GLONASS to identify the version of the data currently served. It is useful for debugging.

“best-tle” is derived from the US 18th Space Control Squadron database of objects in space, and is the name of their object closest to the reported position of a GNSS satellite. “tle-dist” meanwhile is the distance in kilometers between the not very precise “TLE”-derived position and the ephemeris position.

As noted, satellites also transmit an almanac with a rough ephemeris for all other satellites. “alma-dist” denotes the distance between the precise position and the almanac one.

“eph-age-m” describes the difference between the current time and the \(t_{0e}\) of the ephemeris.

Galileo always sends out ephemerides with a \(t_{0e}\) in the past, GPS always has one in the future. For GLONASS this time ranges from ‘15 minutes in the future’ to ‘15 minutes ago’.

“SISA” is a field that describes how accurate a GNSS thinks the positioning of a satellite is. So if SISA is set to 200 cm, this means the operator thinks that the combined clock and position errors should allow the range to this satellite to be determined with better than 2 meter precision. A receiver can use this number to determine how heavily to weigh a satellite in calculating a fix. A satellite can also broadcast that it does not know its SISA, which leads receivers to stay away from using it. SISA is called URA in GPS parlance.
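One common way to turn such accuracy figures into weights is inverse-variance weighting: a satellite claiming twice the error gets a quarter of the weight. This sketch shows that generic technique, not what any particular receiver or galmon does. `None` stands in for “SISA unknown”:

```python
def sisa_weights(sisa_cm):
    """Turn per-satellite SISA/URA values (cm) into least-squares weights.

    Uses inverse-variance weighting, 1 / sigma^2 with sigma in meters.
    Satellites reporting no usable SISA (None or <= 0) get weight 0.
    """
    weights = []
    for s in sisa_cm:
        if s is None or s <= 0:
            weights.append(0.0)
        else:
            sigma_m = s / 100.0
            weights.append(1.0 / sigma_m ** 2)
    return weights
```

Such weights would then multiply each satellite’s row in a least-squares position solution.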

In “health” we can find how a satellite tells us it is doing. Sometimes this status is fine-grained (as with Galileo and GPS), sometimes it is a binary indicator (GLONASS and BeiDou).

“sources” shows the station numbers currently receiving this satellite, and “db” shows how clearly they receive it (as a carrier-to-noise density in dB-Hz).

Every satellite broadcasts on a very precise frequency, but since a satellite is generally moving towards or away from us, the Doppler effect makes the received frequency different. On the ground, the galmon software calculates what this Doppler shift should be and compares it to what is received. This difference is shown as “ΔHz” and is only measured by some receivers. It should hover around 0 Hz; the measurement is only accurate to ± a few Hz.
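The “expected” side of that comparison can be sketched from the satellite’s position and velocity (for example, derived from its ephemeris) and the receiver position. This first-order sketch assumes a receiver that is stationary in the same frame as the satellite velocity, and it ignores the receiver’s own clock drift. It is not galmon’s actual code:

```python
import math

C = 299_792_458.0  # speed of light, m/s


def expected_doppler_hz(carrier_hz, sat_pos, sat_vel, rx_pos):
    """First-order Doppler shift for a stationary receiver.

    Negative when the satellite recedes (range growing), positive when it approaches.
    """
    los = [s - r for s, r in zip(sat_pos, rx_pos)]
    dist = math.sqrt(sum(c * c for c in los))
    range_rate = sum(v * c for v, c in zip(sat_vel, los)) / dist  # m/s, > 0 receding
    return -carrier_hz * range_rate / C
```

A satellite receding at 1 km/s shifts a 1575.42 MHz carrier down by roughly 5.3 kHz, which shows how small the few-Hz residuals in “ΔHz” are by comparison.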

“prres” stands for pseudorange residual, and it is one of the few numbers we report that are actually “the opinion of the receiver”. As noted above, a location/time fix can be generated using 4 satellites. These days, however, a receiver will often have a dozen or more satellites to choose from, which means our position is overdetermined. A receiver can use this to calculate the “best” position and then report, for every satellite, how far its measured range is from what that best position predicts. Consistently low or high pseudorange residuals may mean a satellite has a clock or an ephemeris problem. Another cause is the reception of reflected signals, which may make the radio wave take a longer (indirect) route to reach us.
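Given a solved fix, computing residuals takes one loop. This sketch assumes the fix is (x, y, z, b) in ECEF meters, with b the receiver clock error expressed as a distance, as in a plain least-squares solution. A real receiver’s reported residuals also depend on its weighting and on corrections it applies internally. The function name is hypothetical:

```python
import math


def pseudorange_residuals(fix, sat_positions, pseudoranges):
    """Per-satellite residual: measured pseudorange minus the range predicted
    from the solved position plus clock error (all in meters)."""
    x, y, z, b = fix
    out = []
    for sat, pr in zip(sat_positions, pseudoranges):
        predicted = math.dist((x, y, z), sat) + b
        out.append(pr - predicted)
    return out
```

A satellite whose residual stays positive by several meters across many fixes and stations is a candidate for the clock or ephemeris problems mentioned above. A residual that fluctuates at only one station points more towards reflections.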

“elev” meanwhile stands for the elevation above the horizon, as seen by each receiver. Low-elevation signals pass through far more atmosphere and are more likely to be blocked or reflected by buildings, so these measurements are likely to be of worse quality. Due to a simplistic algorithm, elevations may sometimes be slightly negative; this can happen with a satellite that is no longer actually being received, but where we still report where it was. In addition, our elevation algorithm assumes a spherical earth, which is not the case.
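A spherical-earth elevation calculation takes only a few lines. This sketch illustrates that simplification and is not galmon’s actual code. “Up” is taken to be the direction from the earth’s centre to the receiver, whereas the true local vertical on an ellipsoid differs by a fraction of a degree at most latitudes. Positions are ECEF meters:

```python
import math


def elevation_deg(rx_pos, sat_pos):
    """Elevation of a satellite above the local horizon, assuming a spherical
    earth: 'up' is simply the direction from the earth's centre to the receiver."""
    rn = math.sqrt(sum(c * c for c in rx_pos))
    up = [c / rn for c in rx_pos]
    los = [s - r for s, r in zip(sat_pos, rx_pos)]
    ln = math.sqrt(sum(c * c for c in los))
    sin_el = sum(u * l for u, l in zip(up, los)) / ln
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_el))))
```

A satellite that has already set simply yields a negative angle here, which matches the slightly negative elevations described above.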

Is that all?

For ‘meter level accuracy’, this page has listed most relevant details. GNSS however is a fiendishly difficult subject. A great resource is the giant Springer Handbook of Global Navigation Satellite Systems, but it will set you back over 250 euros. Some university libraries offer access to the e-book.

As noted, we did not discuss the fascinating RF modulation details, nor the wonders of “assisted GPS”, nor the precision-enhancing space-based augmentation systems like EGNOS, WAAS, QZSS or GAGAN. This may be for a future post.

A great read is the Galileo Signal-in-Space Interface Control Document which has all the nuts and bolts.

Finally, the README of the Galmon project has links to documents on GLONASS, GPS, BeiDou and several receiver chipsets.