Dutch COVID-19 Data Sources & Their Interpretation
As the world focuses on the terrible (but improving) Dutch COVID-19 numbers, I also see a lot of people interpreting the numbers badly. But we can hardly blame folks since most data is only described in Dutch. And even in Dutch the descriptions can be very confusing.
So here goes.
NOTE: Corrections and additional links & sources are very welcome on bert@hubertnet.nl or @bert_hu_bert.
The main sources
The RIVM is the “Dutch CDC”. COVID-19 is a notifiable disease, and you have to notify RIVM of cases (if you are a registered health professional or work for one; see below). Notifications are tracked in a database called OSIRIS, which is in effect a list of everyone who has ever had a positive COVID-19 test in The Netherlands. Every day at 14:15 local time, an extract of OSIRIS is published. This data is good for knowing how many people are infected and roughly where. It also tracks which patients were ever hospitalized and which ones died, but this metadata lags behind other data sources. The semantics of the released data are odd and often lead to confusion; see below.
The National Coordination Center for Patient Distribution (Landelijk Coördinatiecentrum Patiënten Spreiding, LCPS) keeps track of the capacity of all hospitals and their intensive care units. The numbers they report describe how much of that capacity is in use (occupied beds), not individual admissions or discharges.
The NICE Foundation (Stichting NICE) also tracks hospitalizations, and reports both totals and admissions (but not discharges). So if 50 patients get admitted and 20 get discharged, NICE will report +50 for the day, but LCPS will report +30.
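To make the difference concrete, here is a minimal sketch of the two kinds of numbers, using the figures from the example above. An occupancy figure (what LCPS reports) moves by admissions minus discharges, while an admissions-only figure (what NICE reports) only counts inflow. The function names are hypothetical, made up for illustration.

```python
def occupancy_change(admitted: int, discharged: int) -> int:
    """Net change in occupied beds: what an occupancy figure (LCPS-style) shows."""
    return admitted - discharged


def admissions_reported(admitted: int, discharged: int) -> int:
    """Admissions-only figure (NICE-style).

    The discharged argument is accepted only to mirror occupancy_change;
    it is deliberately ignored, because discharges are not counted.
    """
    return admitted


# The example from the text: 50 admitted, 20 discharged on the same day.
print(admissions_reported(50, 20))  # NICE-style: 50
print(occupancy_change(50, 20))     # LCPS-style: 30
```

Note that on a day with 10 admissions and 10 discharges, the occupancy figure does not move at all, while the admissions figure still shows +10.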
Note that since reporting timeframes differ between RIVM, LCPS and NICE, in practice the numbers will never match up on a day-to-day basis.
The GGDs, municipal health services, provide a weekly update on Monday about how many tests they have performed and the positivity rate. The GGDs perform the majority of Dutch tests, but hospitals do them too, so the numbers don’t line up exactly with RIVM data. The weekly data can be found through https://ggdghor.nl/nieuws/.
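As a reminder, the positivity rate is the share of performed tests that came back positive. A minimal sketch with illustrative numbers (not actual GGD figures); the function name is hypothetical:

```python
def positivity_rate(positive_tests: int, total_tests: int) -> float:
    """Percentage of performed tests that came back positive."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return 100.0 * positive_tests / total_tests


# Illustrative numbers only, not actual GGD data.
print(positivity_rate(15_000, 100_000))  # 15.0
```

Because hospitals test too, multiplying the GGD test count by the GGD positivity rate will not reproduce the RIVM case count.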
Update: Massive but unknown numbers of tests are now being performed by commercial rapid-testing providers. According to very credible data, around 250,000 antigen tests are now being sold in The Netherlands every week. This does not mean they are all being used, though. A big problem is that while COVID-19 is a notifiable disease, the law is somewhat ambiguous on whether the duty to notify applies to medical professionals alone or to anyone doing a test. To make matters worse, the GGD is ALSO being ambiguous, and some parts of the municipal health system have indicated they won’t even accept positive test reports from non-traditional labs. The upshot is that there is a sizeable parallel testing effort going on, and it seems likely most of its data is not making it into the national reporting. This is a very frustrating situation, since we just don’t know. In the absence of official sources, know that Bart Bolkestein collects data on commercial testing providers on his Corona Locator Nederland site.
Some individual municipal health services provide more data, for example The Hague and Amsterdam.
The Netherlands also reports its daily numbers to the European Centre for Disease Prevention and Control, and it appears the numbers there may sometimes be more up to date than what is published locally.
Derived data
There is a very pretty government-operated dashboard (in English even) that is based on the data sources above. Sometimes it disagrees with them; when in doubt, prefer the primary sources.
Every week on Tuesday, a 60-page PDF is released containing the data described above, but also a ton of data on the number of tests, the positivity rate, an estimate of R, data about other diseases, etc. This data, too, is available only in Dutch. You can find it linked from here.
RIVM data
The daily data release contains fields describing whether a patient was hospitalized or, sadly, died. In addition, there is date information: depending on what is known, the date when the lab returned a positive result, the date when RIVM was notified, or the suspected date of symptom onset.
Perhaps for privacy reasons, the data from OSIRIS is pre-processed somewhat, and once released, it contains:
- Age-group while alive: 0-9, 10-19, ..., 90+. It also, mysteriously, has a <50 value, plus “Unknown”. Many of the early deaths are reported as <50.
- Sex
- Province (based on the place of residence of the patient)
- Hospital_admission: Yes, No, Unknown. For recent data, this field is set to ‘Yes’ only for patients explicitly admitted for COVID-19. People admitted for other reasons, incidentally having COVID-19, are reported as ‘No’. This field is sometimes only changed whole weeks after the actual hospitalization. It is not reset when the patient is discharged.
- Deceased: Unknown, Yes, No. This field is sometimes only changed whole weeks after someone died.
- Week of Death: The ISO week number in 2020; ISO weeks start on Monday. Presumably this will go to week 53, 54, etc. in 2021.
- Municipal_health_service: The GGD (municipal health service) that reported this patient. Note that although it is called municipal, many GGDs span lots of cities and towns. Some towns actually have multiple reporting municipal health services (like Amsterdam).
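The ISO week convention is available in Python’s standard library, which makes it easy to check how dates map to week numbers. This sketch is independent of the RIVM data (the helper name is hypothetical); it shows that ISO weeks start on Monday, and that the first days of 2021 still belong to ISO week 53 of 2020, so a bare week number is ambiguous without its year:

```python
from datetime import date


def iso_year_week(d: date) -> tuple[int, int]:
    """Return (ISO year, ISO week number) for a date; ISO weeks start on Monday."""
    iso = d.isocalendar()
    return iso[0], iso[1]


print(iso_year_week(date(2020, 12, 28)))  # (2020, 53), a Monday
print(iso_year_week(date(2021, 1, 3)))    # (2020, 53), a Sunday
print(iso_year_week(date(2021, 1, 4)))    # (2021, 1), the next Monday
```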
The most confusing field is the “Date_statistics” one. Lamentably, this is an overloaded field that can contain different kinds of dates. This has led to massive confusion. Depending on the “Date_statistics_type” field, this field can stand for:
- DOO (Date of Onset): Date of first symptoms, if known
- DPL (Date of Positive Lab result): Date when the lab reported a positive test
- DON (Date of Notification): Date when the infection was reported to RIVM
Over time, most records are updated to a DOO date. However, very recent data typically only shows DPL or DON. It would be MASSIVELY useful if RIVM reported all three dates where available, but they don’t.
Because of this overloaded date field, any graphs purporting to show testing delay may show all kinds of things, but not necessarily testing delay. Also, because patients effectively ‘move backwards in time’ over the days, it is extremely difficult to derive any kind of trends from the data. If a graph suddenly goes up for a certain date, this may simply mean the ‘DOO’ field was finally updated.
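One practical consequence is that Date_statistics values of different types should never be pooled into one series. A minimal sketch, assuming the records have already been loaded as dictionaries keyed by the CSV column names (Date_statistics, Date_statistics_type); the helper name and the sample records are hypothetical:

```python
from collections import Counter, defaultdict


def counts_per_date_by_type(records):
    """Count records per Date_statistics value, kept separate per Date_statistics_type.

    Returns {type: Counter({date: count})}, with keys such as "DOO", "DPL", "DON".
    """
    result = defaultdict(Counter)
    for rec in records:
        result[rec["Date_statistics_type"]][rec["Date_statistics"]] += 1
    return dict(result)


sample = [  # hypothetical records, not real data
    {"Date_statistics": "2020-10-20", "Date_statistics_type": "DOO"},
    {"Date_statistics": "2020-10-20", "Date_statistics_type": "DPL"},
    {"Date_statistics": "2020-10-22", "Date_statistics_type": "DOO"},
    {"Date_statistics": "2020-10-20", "Date_statistics_type": "DOO"},
]
by_type = counts_per_date_by_type(sample)
print(by_type["DOO"]["2020-10-20"])  # 2
print(by_type["DPL"]["2020-10-20"])  # 1
```

Running this on successive daily releases and comparing the same date will show records migrating from DPL or DON to DOO, which is exactly the ‘moving backwards in time’ effect described above.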
It is important to note that no one is ever removed from the OSIRIS data release. Recoveries are not tracked.
The data described above is made available every day at 14:15 in a file called COVID-19_casus_landelijk.csv (there is also a JSON version).
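A minimal loading sketch using only the standard library. It assumes the CSV is semicolon-delimited and that the file sits in the data.rivm.nl directory listed further down; both are assumptions to check against the actual file. Since nobody is ever removed from the release, the “Yes” counts are cumulative “ever hospitalized” and “ever died” totals, not current ones. All function names are hypothetical.

```python
import csv
import io
import urllib.request
from collections import Counter

# Assumed location: the data.rivm.nl directory plus the file name above.
CASUS_URL = "https://data.rivm.nl/covid-19/COVID-19_casus_landelijk.csv"


def load_casus(text: str, delimiter: str = ";") -> list[dict]:
    """Parse the case file; the ';' delimiter is an assumption, adjust if needed."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def fetch_casus(url: str = CASUS_URL) -> list[dict]:
    """Download and parse the daily release (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        # utf-8-sig also copes with a byte-order mark, should the file have one.
        return load_casus(resp.read().decode("utf-8-sig"))


def summarise(records: list[dict]) -> dict:
    """Cumulative totals: records are never removed, so these only ever grow."""
    hosp = Counter(r["Hospital_admission"] for r in records)
    dead = Counter(r["Deceased"] for r in records)
    return {
        "cases": len(records),
        "ever_hospitalized": hosp["Yes"],
        "deceased": dead["Yes"],
    }


# Hypothetical two-record sample with only the columns used here.
sample = (
    "Date_statistics;Date_statistics_type;Hospital_admission;Deceased\n"
    "2020-10-20;DOO;Yes;No\n"
    "2020-10-21;DPL;No;No\n"
)
print(summarise(load_casus(sample)))  # {'cases': 2, 'ever_hospitalized': 1, 'deceased': 0}
```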
There are derived files which give daily and cumulative numbers of positive tests, hospitalizations and deaths, split out per city or town, but these do not provide age groups.
RIVM further offers data files where they estimate R and the total number of infectious patients.
Finally, there are lots of numbers from the wastewater surveillance, but it is not clear how these should be interpreted. These may be more valuable for monitoring a new outbreak than a rampant pandemic.
All files, including possible new additions, can be found on https://data.rivm.nl/covid-19/.
LCPS data
The LCPS data, which describes the capacity for COVID and non-COVID care on wards and intensive care units, can be found on https://lcps.nu/wp-content/uploads/covid-19.csv. A description, in Dutch, is here.
There are six fields:
- Datum: DD-MM-YYYY for which these statistics are valid
- IC_Bedden_COVID: Intensive Care Unit capacity used nationally for COVID-19
- IC_Bedden_Non_COVID: Intensive Care Unit capacity used nationally for non-COVID-19 care
- Kliniek_Bedden: number of general ward beds occupied by COVID-19 patients
- IC_Nieuwe_Opnames_COVID: since 17th of October, number of new COVID-19 ICU hospitalizations
- Kliniek_Nieuwe_Opnames_COVID: since 17th of October, number of new general ward COVID-19 hospitalizations
As noted above, the ‘Bedden’ (beds) numbers are the absolute number of occupied beds. So if 10 people get admitted and 10 people get discharged, the number reported does not change. This is different for the ‘Nieuwe’ fields, which represent admissions.
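A minimal parsing sketch for the LCPS file, assuming it is comma-delimited (verify this against the real file). It converts the DD-MM-YYYY Datum field into real dates, treats the ‘Nieuwe’ admissions fields as optional since they only exist from 17 October onwards, and computes the day-over-day change in occupied ward beds. The helper names and sample numbers are hypothetical.

```python
import csv
import io
from datetime import datetime


def _optional_int(value):
    """Empty cells (e.g. admissions before 17 October) become None."""
    return int(value) if value and value.strip() else None


def parse_lcps(text: str) -> list[dict]:
    """Parse LCPS rows and sort them by date; assumes a comma-delimited file."""
    rows = []
    for r in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": datetime.strptime(r["Datum"], "%d-%m-%Y").date(),
            "ward_beds": int(r["Kliniek_Bedden"]),
            "ward_admissions": _optional_int(r["Kliniek_Nieuwe_Opnames_COVID"]),
        })
    return sorted(rows, key=lambda row: row["date"])


def ward_net_change(rows):
    """Change in occupied ward beds versus the previous row (assumes no missing days)."""
    return [(cur["date"], cur["ward_beds"] - prev["ward_beds"])
            for prev, cur in zip(rows, rows[1:])]


sample = (  # hypothetical numbers, not real LCPS data
    "Datum,IC_Bedden_COVID,IC_Bedden_Non_COVID,Kliniek_Bedden,"
    "IC_Nieuwe_Opnames_COVID,Kliniek_Nieuwe_Opnames_COVID\n"
    "02-11-2020,600,700,1900,40,200\n"
    "01-11-2020,590,700,1890,,\n"
)
rows = parse_lcps(sample)
print(ward_net_change(rows))       # [(datetime.date(2020, 11, 2), 10)]
print(rows[1]["ward_admissions"])  # 200
```

In this made-up example, 200 new ward admissions coexist with an occupancy increase of only 10 beds, which is the beds-versus-admissions distinction in action.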
Every day, LCPS also releases a news item with graphs that contain additional data, like the available ICU capacity and the number of non-COVID-19 patients on general wards. These graphs, in Dutch, can be found through https://lcps.nu/nieuws/.
Stichting NICE
The data from the NICE foundation, which reports hospitalizations (but not discharges), can be found through the many graphs they present on their website. By default the page shows statistics for the Dutch intensive care units. There is also a tab called “COVID-19 op de verpleegafdeling” which shows graphs for general wards.
I have not found an official data feed from NICE, but their graphs are sourced from this piece of JavaScript. It requires some sleuthing to figure out which data corresponds to which graph. Be very careful interpreting this raw data, since its meaning might change unexpectedly; it is not an official feed.
Of specific note for NICE data is that it includes suspected COVID-19 cases, which often later turn out not to be COVID-19. This makes their most recent data somewhat harder to interpret.
The NICE foundation website contains a wealth of data and also analyses on duration of hospital stay, age histograms etc.