Home Grown Analytics
December 2023
Let's say you want some idea of how much traffic your site is getting. Let's also say you're weird in the sense that you don't want to just drop Google Analytics on your page, which is essentially the industry standard. Maybe you're looking at alternatives like Plausible, Cloudflare's Web Analytics, or Fathom. Maybe you're like me and you're really wondering how much of the feature set of those tools you actually need. What if you whittled it down to the core of what you want to track and just rolled your own analytics? That's what I have done for Life Computed, and now I'm going to talk about it on the internet.
Wait... should you really do this?
Sure, why not.
I'm usually not one to try to reinvent the wheel, and this is a domain where there are lots of existing solutions. Life Computed is a hobby, though, and mostly just an excuse to hack around with a project where there are no consequences. I have no marketing department to answer to. No product manager expecting certain types of data to be in specific formats. It's just me, a lone developer, looking to have some fun building things. All I care about at this point is tracking page views and sessions in an aggregated, privacy conscious manner.
But in all seriousness, probably don't build this yourself.
Please realize this is mostly just food for thought and not something I'm recommending for production systems at scale. It's my baby step in getting some visibility and transparency on this site's traffic, and I just wanted to share. If you are following along trying to implement some tracking for a "real life business" website, I do believe you'd probably be better off just finding something you can use off the shelf.
How does it work?
So let's touch on the requirements again, which are fairly basic thus far. I want to know how many sessions my website gets each day. I also want to know which pages are receiving the most traffic. I don't care much about a specific user's journey, where they came from, or anything like that, even though some people would consider those fairly standard requirements. At the core, I just want to know roughly how many people are visiting the site and which pages they are viewing. This should be plenty to give me an idea of which content or tools work and which don't. It'll also be great for looking at longer term trends.
A note on privacy
Earlier I mentioned not caring much about a specific user's journey and caring more about trends. This helps a lot in letting me implement something which is fairly privacy conscious. Regulatory requirements continue to get stricter here and users generally don't like you being a creep. So my version of rolling my own analytics here doesn't retain any tie back to a specific user. The desired data we want to retain is just an aggregate or count.
Data models
Given those requirements, let's reason about the data we want to store. This is a Ruby on Rails application, so I'm making some Active Record models for persistent storage to a database. I'm going to try to talk about this in a framework/language agnostic way, though. There's no reason you couldn't build this anywhere.
I've introduced a few models / tables for persisting data related to user sessions and page views:
- AnalyticsSession - this represents a session, which is essentially a collection of page views by a single client
- AnalyticsView - this represents a GET request for any particular page
- AnalyticsRollup - groups up and counts the previous two models
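To make the shape of that data concrete, here is a minimal plain-Ruby sketch of the three records. These are Struct stand-ins rather than my actual Active Record models, and they only carry the fields that show up in the code later in this article. The id and created_at columns are an assumption based on what Rails gives you by default, and the sample values are made up.

```ruby
# Plain-Ruby stand-ins for the three tables (not the real Active Record models).
# A session is just an ID and a timestamp: nothing ties it back to a person.
AnalyticsSession = Struct.new(:id, :created_at, keyword_init: true)

# A view is one GET request: the path requested and the session it belongs to.
AnalyticsView = Struct.new(:id, :path, :analytics_session_id, :created_at, keyword_init: true)

# A rollup is a count of one record type for one date.
AnalyticsRollup = Struct.new(:analytic_type, :date, :count, keyword_init: true)

# Made-up sample data, purely for illustration.
session = AnalyticsSession.new(id: 1, created_at: Time.now)
view = AnalyticsView.new(id: 1, path: "/", analytics_session_id: session.id, created_at: Time.now)
rollup = AnalyticsRollup.new(analytic_type: "AnalyticsSession", date: "2023-12-01", count: 1)
```

The important part is the rollup: once the counts live there, the session and view rows can be thrown away.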
Tracking sessions and views
In our application, we have a base controller ApplicationController that we can tweak to inspect every request made to the application. In this we can add two before_action hooks, one for tracking sessions and one for tracking views.
Let's review the hook for tracking sessions first. Here we rely on Rails' existing concept of a session, and simply stash the unique identifier for our tracked session record in there. The method has a condition to see if we already stashed that tracking record ID on the session. If it already exists, the method doesn't do anything. If it doesn't exist, we just create a new AnalyticsSession record and stash that ID. Easy enough.
    class ApplicationController < ActionController::Base
      before_action :session_tracking

      def session_tracking
        return if session.key?(:analytics_session_id)

        analytics_session = AnalyticsSession.create
        session[:analytics_session_id] = analytics_session.id
      end
    end
Let's take a look at the view tracking hook; it's fairly simple too. This one has a condition to bail out if the request isn't a GET. I'm really just looking to track what is effectively the concept of a page view, so I don't want to record form submissions or whatever else. If it is a GET request, the method creates a new AnalyticsView record with the path requested by the client and the client's session ID. As far as order of operations goes, notice that this depends on the session ID being present. Thus we need to make sure session_tracking runs before view_tracking.
    class ApplicationController < ActionController::Base
      before_action :view_tracking

      def view_tracking
        return unless request.get?

        AnalyticsView.create(path: request.path, analytics_session_id: session[:analytics_session_id])
      end
    end
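Since I promised to keep this framework agnostic, here is a rough sketch of the same two hooks in plain Ruby with no Rails involved. Everything in it is a stand-in I made up for illustration: the Tracker class, the in-memory arrays that play the role of the tables, the plain Hash that plays the role of the session, and the /contact path. It isn't the code running on this site, just the same logic in miniature.

```ruby
require "securerandom"

# Hypothetical in-memory tracker; the arrays stand in for database tables.
class Tracker
  attr_readers = %i[sessions views]
  attr_reader(*attr_readers)

  def initialize
    @sessions = []
    @views = []
  end

  # Runs the two hooks in the order that matters: session first, then view.
  def handle_request(session, http_method:, path:)
    track_session(session)
    track_view(session, http_method: http_method, path: path)
  end

  private

  # Create a tracking record only the first time we see this client session.
  def track_session(session)
    return if session.key?(:analytics_session_id)

    id = SecureRandom.uuid
    @sessions << { id: id, created_at: Time.now }
    session[:analytics_session_id] = id
  end

  # Record page views only, so anything that isn't a GET is skipped.
  def track_view(session, http_method:, path:)
    return unless http_method == "GET"

    @views << { path: path, analytics_session_id: session[:analytics_session_id], created_at: Time.now }
  end
end

tracker = Tracker.new
browser_session = {} # stands in for the session the framework hands you

tracker.handle_request(browser_session, http_method: "GET", path: "/")
tracker.handle_request(browser_session, http_method: "GET", path: "/analytics")
tracker.handle_request(browser_session, http_method: "POST", path: "/contact")
```

After those three requests the tracker holds one session and two views. The POST is ignored, and both views carry the same session ID because the session hook ran first.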
Aggregating and anonymizing
Those two types of records we're creating now to track sessions and views are quite verbose and uniquely identify a user. Both of these can be a problem. Since I hope to look at a count of these records by day over time, read performance stands to get worse over time as the volume of data piles up. I also stated earlier it's my hope to implement something privacy conscious. I don't care about a specific user; I'm more worried about trends here. So let's introduce a job that can asynchronously aggregate these records into a more useful format for our requirements.
I named my job RollupAnalyticsJob. It will aggregate (or roll up) the view records and then clean up old ones. Then it will do the same for the session records. The rollup groups everything by date (since I want to review daily counts) and the cleanup purges records after 7 days.
    class RollupAnalyticsJob < ApplicationJob
      queue_as :default

      def perform(*_args)
        rollup_views
        cleanup_views
        rollup_sessions
        cleanup_sessions
      end

      # rollup_views and cleanup_views omitted for brevity, basically the same as we do for sessions

      def rollup_sessions
        # Count the raw session records per calendar day.
        sessions_by_date = AnalyticsSession.order('date(created_at) desc').group('date(created_at)').count

        # Create or update one rollup row per date with the latest count.
        sessions_by_date.each do |date, count|
          session_rollup = AnalyticsRollup.find_or_create_by(analytic_type: 'AnalyticsSession', date:)
          session_rollup.count = count
          session_rollup.save
        end
      end

      def cleanup_sessions
        # Purge raw records older than 7 days; their counts live on in the rollups.
        AnalyticsSession.where('date(created_at) < date(?)', 7.days.ago).find_each(&:destroy)
      end
    end
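If you want to see the rollup and cleanup idea without a database, here is a small plain-Ruby sketch of it. The method names, the hash that stands in for the AnalyticsRollup table, and the sample dates are all made up for illustration. It assumes Ruby 2.7 or newer for tally.

```ruby
require "date"

# Count records per calendar day, similar to GROUP BY date(created_at).
def rollup_by_date(records)
  records.map { |record| record[:created_at].to_date }.tally
end

# Drop records dated more than keep_days before today.
def cleanup(records, today:, keep_days: 7)
  cutoff = today - keep_days
  records.reject { |record| record[:created_at].to_date < cutoff }
end

# Made-up sample data: one old session and two recent ones.
today = Date.new(2023, 12, 10)
sessions = [
  { id: 1, created_at: Time.new(2023, 12, 1, 9, 0, 0) },
  { id: 2, created_at: Time.new(2023, 12, 9, 10, 0, 0) },
  { id: 3, created_at: Time.new(2023, 12, 9, 18, 30, 0) }
]

# This hash stands in for the rollup table: date => count.
stored_rollups = {}

# First run: roll up, then purge the old raw records.
stored_rollups.merge!(rollup_by_date(sessions))
sessions = cleanup(sessions, today: today)

# Second run: only dates still present in the raw data get rewritten.
stored_rollups.merge!(rollup_by_date(sessions))
```

The thing to notice is that the count for December 1 survives the second run even though its raw session record was purged. The rollup only overwrites dates that still have raw records, which is what lets the job delete old data without losing the historical totals.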
Viewing your analytics data
Now we have session counts and page views in a table that can be read fairly easily. I have added a fairly basic page, accessible over at /analytics, where you can see it in action. The page is comically unscalable since it just tries to make a bulleted list of all the dates, sessions, and pages it can find in the table. It works OK for now since the website only has two pages and no traffic. That's all for today, but I'm hoping to be back with another article on building a sexier page to view all this data. Thanks for reading!