Tracking-free audience statistics

As regular readers of this blog will know, I am no friend of the surveillance capitalism that currently powers the web. It is becoming well-nigh impossible to develop any software or host any content without some component tracking your users & sending data to third parties.

Even though I try very hard not to fall for this, periodically I discover that my sites also embed things that embed things that potentially leak your details to third parties. This week I found out that the math rendering that comes with the excellent site building software I use is actually served by Cloudflare, something I try to avoid.

Many of us have become inured to every page on the Internet talking to all kinds of parties that neither the reader nor the author ever consciously wanted to share data with. But I for one still don’t like it.

Even those aware of all this leaking may rationalise it as the price to pay for getting the things we need. But it turns out you can get a lot of what you need without snitching on your audience to random servers on the Internet.

Metrics for your articles

Tons of websites now report their visitors to Google Analytics. Even very privacy-sensitive places (including governments) have lost the battle against their marketing departments & caved. The deal here is that as a site operator, you gain insight into your audience. The price you pay is that you share your visitor data with Google.

Before I saw the light, I spent quite some time looking at such analytics, and the graphs sure are pretty. But they rarely told me anything actionable. I doubt the price is worth it. Ask yourself: have any of those fancy maps of where visitors come from ever changed your behaviour?

Also, was it worth the GDPR cookie warning?

Some things you do need to know

But the thing is, if you host or write content, you would like to know if people are actually reading it. Lots of “hits” turn out to be crawlers, bots or scripts. It is also nice to know if your human visitors are making it to the end of your articles, or if they are bailing after 25%.

Last I checked, most of the “pay with your visitors’ data” analytics platforms don’t actually tell you this.

Every author is under pressure to make their articles as short as possible. “Kill your darlings” they say, and this is true. Just because you typed it in doesn’t mean it is worth reading.

At the same time, it would be great to have actual data showing whether articles are too long or too short. If 100% of your readers are making it to the end of your story, it is a reasonable bet they would’ve liked to read more, for example.

Audience-minutes.js

A few weeks ago I instrumented berthub.eu/articles with a small bit of JavaScript that samples a small proportion of the reading minutes spent on the site. Complete details on how this works, and the privacy considerations, can be found on the audience-minutes GitHub page. This is all open source.

A set of three simple scripts generates graphs like this one:

These are the reading statistics of a rather long-winded article I wrote on the CureVac SARS-CoV-2 vaccine.

The audience-minutes.js script reports a proportion of the minutes readers were active on a page. And with these reports, it also notes how far the user had scrolled at that point.
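To make the idea concrete, here is a minimal sketch of how sampling reading minutes together with scroll depth could look. This is not the actual audience-minutes.js code: the 10% sample rate, the function names, the "/report" endpoint and the use of page visibility as a stand-in for "active" are all assumptions made for illustration. The real details are on the GitHub page.

```javascript
// Hypothetical sketch, NOT the real audience-minutes.js implementation.

// Assumed value: report roughly 1 in 10 active minutes.
const SAMPLE_RATE = 0.1;

// Decide whether the current minute gets reported. The random source is
// injectable so the decision can be checked deterministically.
function shouldReport(sampleRate, rand = Math.random) {
  return rand() < sampleRate;
}

// How far down the page the bottom of the viewport is, as a whole
// percentage between 0 and 100.
function scrollPercent(scrollTop, viewportHeight, pageHeight) {
  if (pageHeight <= 0) return 0;
  const pct = (100 * (scrollTop + viewportHeight)) / pageHeight;
  return Math.max(0, Math.min(100, Math.round(pct)));
}

// The report carries only the page path and the scroll depth: no user
// identifier, no cookie, nothing written to local storage.
function buildReport(path, scrollTop, viewportHeight, pageHeight) {
  return {
    path,
    scrollPerc: scrollPercent(scrollTop, viewportHeight, pageHeight),
  };
}

// Browser wiring, skipped when there is no DOM (for example under Node).
if (typeof window !== "undefined") {
  setInterval(() => {
    // Assumption: a visible tab counts as an "active" minute.
    if (document.visibilityState !== "visible") return;
    if (!shouldReport(SAMPLE_RATE)) return;
    const report = buildReport(
      location.pathname,
      window.scrollY,
      window.innerHeight,
      document.documentElement.scrollHeight
    );
    // "/report" is a placeholder for an endpoint on your own server.
    navigator.sendBeacon("/report", JSON.stringify(report));
  }, 60 * 1000);
}
```

Because each minute is sampled independently, nothing has to be remembered between minutes or between visits, which is what lets a scheme like this get by without cookies or local storage.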

What we can see in the graph is that a lot of the samples happened in the first 10% of the page. This could represent “bounces”, folks that land on the article and decide it is not for them.

Secondly, we can see that around half the remaining readers bail at the 40%-50% mark.

And finally, we can also conclude that the remaining readers probably liked what they saw, because there was no further drop-off, straight until the end.
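A reading like the one above boils down to counting samples per slice of the page. The sketch below shows one way to do that in JavaScript. It is not one of the three scripts mentioned earlier, the function names are hypothetical, and the sample values are made up purely for illustration.

```javascript
// Hypothetical sketch: bucket scroll-depth samples into 10%-wide slices.

// Index 0 holds samples at 0-9%, index 1 at 10-19%, and so on;
// index 9 holds 90-100%.
function decileHistogram(scrollPercents) {
  const buckets = new Array(10).fill(0);
  for (const p of scrollPercents) {
    const idx = Math.min(9, Math.max(0, Math.floor(p / 10)));
    buckets[idx] += 1;
  }
  return buckets;
}

// Share of all samples that fall in the first slice of the page,
// a rough indicator of "bounces".
function firstBucketShare(buckets) {
  const total = buckets.reduce((sum, n) => sum + n, 0);
  return total === 0 ? 0 : buckets[0] / total;
}

// Made-up samples, not real measurements.
const samples = [2, 5, 8, 9, 15, 25, 35, 45, 55, 65, 75, 85, 95, 100];
const histogram = decileHistogram(samples);
console.log(histogram);
console.log(firstBucketShare(histogram).toFixed(2));
```

Keep in mind that every sample stands for a minute of reading rather than for a reader, so a tall bucket may mean many readers passed through it, or that a few readers lingered there.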

More discussion on what these graphs look like & what they might mean can be found in the README file.

Summarising

The majority of sites on the Internet snitch on their visitors to many third parties, for questionable gains. With some simple scripts it is possible to gather statistics on how many minutes are spent on your site, while also learning if readers are making it to the end of your articles or not. Crucially, this software involves no tracking, no cookies, no local storage and no third parties.

The audience-minutes.js script can be dropped into most websites, is open source, and can be found on my GitHub page.