Home Grown Analytics

December 2023

Let's say you want some idea of how much traffic your site is getting. Let's also say you're weird in the sense that you don't want to just drop Google Analytics, essentially the industry standard, on your page. Maybe you're looking at alternatives like Plausible, Cloudflare's Web Analytics, or Fathom. Maybe you're like me and you're wondering how much of the feature set of those tools you actually need. What if you whittled it down to the core of what you want to track and just rolled your own analytics? That's what I have done for Life Computed, and now I'm going to talk about it on the internet.

Wait... should you really do this?

Sure, why not.

I'm usually not one to try to reinvent the wheel, and this is a domain where there are lots of existing solutions. Life Computed is a hobby, though, and mostly just an excuse to hack around with a project where there are no consequences. I have no marketing department to answer to. No product manager expecting certain types of data to be in specific formats. It's just me, a lone developer, looking to have some fun building things. All I care about at this point is tracking page views and sessions in an aggregated, privacy conscious manner.

But in all seriousness, probably don't build this yourself.

Please realize this is mostly just food for thought and not something I'm recommending for production systems at scale. It's my baby step in getting some visibility and transparency on this site's traffic, and I just wanted to share. If you are following along trying to implement some tracking for a "real life business" website, I do believe you'd probably be better off just finding something you can use off the shelf.

How does it work?

So let's touch on requirements again, which are fairly basic thus far. I want to know how many sessions my website gets each day. I also want to know which pages are receiving the most traffic. I don't care much about a specific user's journey, where they came from, or anything like that, even though some people would consider those fairly standard requirements. At the core, I just want to know roughly how many people are visiting the site and which pages they are viewing. This should be plenty to get an idea of which content or tools work and which don't. It'll also be great for looking at longer term trends.

A note on privacy

Earlier I mentioned not caring much about a specific user's journey and caring more about trends. This helps a lot in letting me implement something which is fairly privacy conscious. Regulatory requirements continue to get stricter here and users generally don't like you being a creep. So my version of rolling my own analytics doesn't retain any tie back to a specific user. The only data we want to retain is an aggregate count.

Data models

Given those requirements, let's reason about the data we want to store. This is a Ruby on Rails application, so I'm making some Active Record models for persistent storage to a database. I'm going to try to talk about this in a framework/language agnostic way, though. There's no reason you couldn't build this anywhere.

[Image: diagram about persisted data]

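I've introduced a few models / tables for persisting data related to user sessions and page views:

AnalyticsSession: one record per visitor session.

AnalyticsView: one record per page view, storing the requested path and the ID of the session it belongs to.

AnalyticsRollup: a daily count for a given analytic type (sessions or views).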
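To make the shape of that data concrete without tying it to Rails, here is a minimal plain-Ruby sketch using Struct. The model names and the path, analytics_session_id, analytic_type, date, and count fields come from the code later in this post; the id and created_at fields are what Active Record would normally supply, and the sample values are made up for illustration.

```ruby
require 'date'

# Plain-Ruby stand-ins for the three tables. These are NOT the Active Record
# models from the post, just a sketch of the fields each one carries.
AnalyticsSession = Struct.new(:id, :created_at, keyword_init: true)
AnalyticsView    = Struct.new(:id, :path, :analytics_session_id, :created_at, keyword_init: true)
AnalyticsRollup  = Struct.new(:analytic_type, :date, :count, keyword_init: true)

# Hypothetical sample rows: one session, one view in that session, and the
# daily rollup row that would summarize the session.
session = AnalyticsSession.new(id: 1, created_at: Time.now)
view    = AnalyticsView.new(id: 1, path: '/about', analytics_session_id: session.id, created_at: Time.now)
rollup  = AnalyticsRollup.new(analytic_type: 'AnalyticsSession', date: Date.today, count: 1)
```

Only the rollup rows are meant to live long term; the session and view rows are raw material for them.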

Tracking sessions and views

In our application, we have a base controller ApplicationController that we can tweak to inspect every request made to the application. In this we can add two before_action hooks, one for tracking sessions and one for tracking views.

Let's review the hook for tracking sessions first. Here we rely on Rails' existing concept of a session, and simply stash the unique identifier for our tracked session record in there. The method first checks whether we have already stashed that tracking record ID on the session. If it already exists, the method doesn't do anything. If it doesn't exist, we just create a new AnalyticsSession record and stash that ID. Easy enough.

class ApplicationController < ActionController::Base
  before_action :session_tracking

  # Create one AnalyticsSession per Rails session and remember its ID.
  def session_tracking
    return if session.key?(:analytics_session_id)

    analytics_session = AnalyticsSession.create
    session[:analytics_session_id] = analytics_session.id
  end
end

Let's take a look at the view tracking hook; it's fairly simple too. This one has a condition to bail out if the request isn't a GET. I'm really just looking to track what is effectively a page view, so I don't want to record form submissions or whatever else. If it is a GET request, the method creates a new AnalyticsView record with the path requested by the client and the ID of the tracked session. As far as order of operations goes, notice that this depends on the session ID being present. Thus we should make sure session_tracking runs before view_tracking. Rails runs before_action callbacks in the order they are declared, so declaring session_tracking first is enough.

class ApplicationController < ActionController::Base
  before_action :view_tracking

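  # Record a page view for GET requests only; relies on session_tracking
  # having already stored :analytics_session_id.
  def view_tracking
    return unless request.get?

    AnalyticsView.create(path: request.path, analytics_session_id: session[:analytics_session_id])
  end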
end
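Since the idea isn't tied to Rails, here is a rough framework-agnostic sketch of the two hooks running back to back. Tracker, Request, and the in-memory arrays are hypothetical stand-ins for the controller, the Rails request object, and the database tables; only the two method names and their guard conditions mirror the code above.

```ruby
# Hypothetical request object; `verb` holds the HTTP method as a string.
Request = Struct.new(:verb, :path, keyword_init: true)

class Tracker
  attr_reader :sessions, :views

  def initialize
    @sessions = [] # stands in for the AnalyticsSession table
    @views = []    # stands in for the AnalyticsView table
  end

  # Runs both hooks for one request, sessions first so the view can use the ID.
  def handle(request, session)
    session_tracking(session)
    view_tracking(request, session)
  end

  private

  def session_tracking(session)
    return if session.key?(:analytics_session_id)

    id = @sessions.size + 1
    @sessions << { id: id, created_at: Time.now }
    session[:analytics_session_id] = id
  end

  def view_tracking(request, session)
    return unless request.verb == 'GET'

    @views << { path: request.path,
                analytics_session_id: session[:analytics_session_id],
                created_at: Time.now }
  end
end

tracker = Tracker.new
visitor_session = {} # plays the role of the Rails session for one visitor
tracker.handle(Request.new(verb: 'GET', path: '/'), visitor_session)
tracker.handle(Request.new(verb: 'POST', path: '/contact'), visitor_session)
tracker.handle(Request.new(verb: 'GET', path: '/about'), visitor_session)
```

Three requests from the same visitor leave one session record and two view records behind, because the POST is skipped and the session is only created on the first request.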

Aggregating and anonymizing

Those two types of records we're creating now to track sessions and views are quite verbose and uniquely identify a user. Both of these can be a problem. Since I hope to look at a count of these records by day over time, read performance stands to get worse over time as the volume of data piles up. I also stated earlier it's my hope to implement something privacy conscious. I don't care about a specific user; I'm more worried about trends here. So let's introduce a job that can asynchronously aggregate these records into a more useful format for our requirements.

I named my job RollupAnalyticsJob. It will aggregate (or roll up) the view records and then clean up old ones. Then it will do the same for the session records. The rollup groups everything by date (since I want to review daily counts) and the cleanup purges raw records older than 7 days.

class RollupAnalyticsJob < ApplicationJob
  queue_as :default

  def perform(*_args)
    rollup_views
    cleanup_views

    rollup_sessions
    cleanup_sessions
  end

  # rollup_views and cleanup_views omitted for brevity; they are basically the same as what we do for sessions

  def rollup_sessions
    sessions_by_date = AnalyticsSession.order('date(created_at) desc').group('date(created_at)').count
    sessions_by_date.each do |date, count|
      session_rollup = AnalyticsRollup.find_or_create_by(analytic_type: 'AnalyticsSession', date:)
      session_rollup.count = count
      session_rollup.save
    end
  end

  def cleanup_sessions
    AnalyticsSession.where('date(created_at) < date(?)', 7.days.ago).find_each(&:destroy)
  end
end
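The job above leaves out the view methods. As a rough illustration of the same roll-up-then-purge idea applied to views, here is a plain-Ruby sketch over in-memory hashes rather than Active Record. Grouping by both date and path is an assumption here (it fits the goal of seeing which pages get traffic); the sample records, the hash-based rollup store, and the method signatures are all hypothetical.

```ruby
require 'date'

# Hypothetical raw view records; in the post these would be AnalyticsView rows.
views = [
  { path: '/',      created_at: Time.new(2023, 12, 1, 9, 0, 0) },
  { path: '/',      created_at: Time.new(2023, 12, 1, 17, 30, 0) },
  { path: '/about', created_at: Time.new(2023, 12, 1, 18, 0, 0) },
  { path: '/',      created_at: Time.new(2023, 12, 2, 8, 15, 0) }
]

# Roll up: count views per [date, path]. Re-running overwrites the same keys,
# much like find_or_create_by followed by assigning count in the job above.
def rollup_views(views, rollups = {})
  views.group_by { |v| [v[:created_at].to_date, v[:path]] }
       .each { |key, group| rollups[key] = group.size }
  rollups
end

# Clean up: drop raw records whose date is more than `days` days before `today`.
def cleanup_views(views, today:, days: 7)
  cutoff = today - days
  views.reject { |v| v[:created_at].to_date < cutoff }
end

rollups = rollup_views(views)
remaining = cleanup_views(views, today: Date.new(2023, 12, 9))
```

Because the cleanup compares whole dates, a day's raw records are either all present or all gone. A later rollup run therefore never overwrites an old day's count with a partial one; it simply stops seeing that day.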

Viewing your analytics data

Now we have session counts and page views in a table that can be read fairly easily. I have added a basic page, accessible over at /analytics, where you can see it in action. The page is comically unscalable since it just tries to make a bulleted list of all the dates, sessions, and pages it can find in the table. It works OK for now since the website only has two pages and no traffic. That's all for today, but I'm hoping to be back with another article on building a sexier page to view all this data. Thanks for reading!

[Image: our basic page to view analytics]