How Would You Track User Behavior with Plack and Catalyst?


One of the persistent questions that keeps entrepreneurs on edge is "Are we building the right thing?"

In the first web bubble, the Silly side of Silicon Valley chased vanity metrics such as "the number of eyeballs on the site" and "brand awareness" and "unique visitors". Those numbers are only interesting when you can correlate them to producing value for customers and bringing in real cash in the form of revenue.

I've enjoyed the book The Lean Startup by Eric Ries because he offers a much better mechanism for tracking the success or failure of any attempt to produce real value for customers. While split testing (or A/B testing) is useful for seeing how small changes lead to different customer behaviors, Ries recommends cohort analysis, where you follow the behavior of real customers through the sales funnel over time and correlate points along the X-axis with individual changes to your business or product.
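To make the cohort idea concrete, here is a minimal sketch, assuming each user record carries a signup date and the set of funnel steps that user reached. The step names (`signup`, `activate`, `pay`), the weekly bucketing, and the `cohort_funnel` helper are hypothetical illustrations, not Ries's own method or any library's API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical funnel steps, in order; adapt to your own event log.
FUNNEL = ["signup", "activate", "pay"]


def week_start(d: date) -> date:
    """Bucket a date into the Monday that starts its week."""
    return date.fromordinal(d.toordinal() - d.weekday())


def cohort_funnel(users):
    """Group users by signup week; return the fraction reaching each step.

    `users` is an iterable of (signup_date, set_of_steps_reached).
    Returns {cohort_week: {step: fraction_of_cohort}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for signup, steps in users:
        cohort = week_start(signup)
        sizes[cohort] += 1
        for step in FUNNEL:
            if step in steps:
                counts[cohort][step] += 1
    return {
        cohort: {step: counts[cohort][step] / sizes[cohort] for step in FUNNEL}
        for cohort in sorted(sizes)
    }


users = [
    (date(2011, 12, 5), {"signup", "activate"}),
    (date(2011, 12, 6), {"signup"}),
    (date(2011, 12, 12), {"signup", "activate", "pay"}),
]
report = cohort_funnel(users)
```

Because each cohort is one week of signups, a change shipped during a given week shows up as a shift in that week's row instead of being averaged into site-wide totals.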

That means tracking customer behavior. If you're building some sort of software as a service product, and if the mechanism of delivery of that product is primarily a web site, you probably already know the punchline.

Assume I already know how to identify and log events for each salient customer action type. (I've built that kind of system before.) Assume I don't want to collect personally identifiable information (I don't). Assume I'm using Plack and its middleware heavily, and assume I'm happy using Catalyst as a web framework.

How can I identify unique users (with and without accounts) on a daily basis, anonymize them, but group their actions across the site such that my automated daily cohort graphs correspond with reality?
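One possible approach, sketched under assumptions: keep a random visitor ID in a cookie, but log only a keyed hash (HMAC-SHA256) of it. The same visitor always maps to the same token, so their actions group together across days, while the logs can't be matched back to live cookies without the server-side secret. `SECRET_KEY`, `pseudonym`, and `daily_uniques` are hypothetical names.

```python
import hashlib
import hmac
from collections import defaultdict
from datetime import date

# Hypothetical server-side secret; keep it out of the logs themselves.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonym(visitor_id: str) -> str:
    """Map a raw cookie/session ID to a stable, non-reversible token."""
    return hmac.new(SECRET_KEY, visitor_id.encode("utf-8"), hashlib.sha256).hexdigest()


def daily_uniques(events):
    """Count distinct visitors per day.

    `events` is an iterable of (day, visitor_id, action); only the
    pseudonym is kept in memory, never the raw ID.
    """
    seen = defaultdict(set)
    for day, visitor_id, _action in events:
        seen[day].add(pseudonym(visitor_id))
    return {day: len(tokens) for day, tokens in seen.items()}


events = [
    (date(2011, 12, 19), "abc", "view"),
    (date(2011, 12, 19), "abc", "signup"),
    (date(2011, 12, 19), "xyz", "view"),
    (date(2011, 12, 20), "abc", "view"),
]
counts = daily_uniques(events)
```

Daily unique counts fall out of counting distinct tokens per day, and because the token doesn't rotate, the same tokens can feed the cohort grouping.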

So far I've identified a few points of possible contention. I can rely on browser cookies for unique identification of users if I know that user sessions have unique identifiers within a 24-hour period. (I could generate GUIDs for this, but that may be overdoing things.) I think I also have to track the transition from anonymous visitor to authenticated user, but I might be able to convince myself that either replacing the current session or simply subtracting successful login events from the total number of unique anonymous visitors would give the right numbers.
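As a sketch of a third option besides replacing the session or subtracting logins: record an alias from the anonymous visitor ID to the account ID at login, then resolve every event to a canonical ID at analysis time. Generating a random UUID per visitor is cheap, so it may not be overdoing things. `IdentityMap`, `new_visitor_id`, and `unique_users` are hypothetical names, not part of Catalyst or Plack.

```python
import uuid


def new_visitor_id() -> str:
    """A random version-4 UUID; cheap to generate, unique in practice."""
    return str(uuid.uuid4())


class IdentityMap:
    """Hypothetical alias table linking anonymous visitor IDs to accounts."""

    def __init__(self):
        self._alias = {}  # visitor_id -> account_id

    def record_login(self, visitor_id: str, account_id: str) -> None:
        # This browser's anonymous history now belongs to the account.
        self._alias[visitor_id] = account_id

    def canonical(self, visitor_id: str) -> str:
        # Logged-in visitors resolve to their account; others stay anonymous.
        return self._alias.get(visitor_id, visitor_id)


def unique_users(events, identities: IdentityMap) -> int:
    """Count distinct people, merging pre- and post-login activity."""
    return len({identities.canonical(vid) for vid, _action in events})


events = [
    ("visitor-1", "view"),
    ("visitor-1", "signup"),
    ("account-42", "purchase"),
    ("visitor-2", "view"),
]
ids = IdentityMap()
before_login = unique_users(events, ids)
ids.record_login("visitor-1", "account-42")
after_login = unique_users(events, ids)
```

Resolving aliases when the numbers are computed, rather than when the session changes, means an anonymous visitor who later logs in is counted once, and their pre-login actions stay attached to the same person.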

(I also haven't dived much into how Catalyst 5.9 and Plack interact in terms of session and cookie handling. Everything's just worked, so I've ignored the details until now.)

I don't mind building such a system if necessary, but if all of the pieces are out there and available—or if someone's already built this and can give guidance—so much the better.

Have you solved this problem? If so, how did you do it? If not, how would you do it? Would you handle logging at the Plack level or the application level? Would you worry about tracking session changes? Does Catalyst need to know about this?

10 Comments

I think this is something you'd want to ask on stackoverflow. :)

If you are just looking for a general overview of all users, I would leverage Google Analytics (GA); it's often 'good enough' and has all the right hooks for sales funnels and other user-path stuff.

That said, if you are looking to do per-user tracking, GA is less than stellar. Depending on the user load, you could just use something like Redis to build what would amount to a user action history keyed on a UUID that expires after a week. If you want the data, it's there, and you can back it up if you want to extract it, but this way you don't overflow with useless data. Going this route you can store any events you want, though it will require building a small JSON API in front of Redis, which should be simple enough.
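A minimal in-memory stand-in for that Redis idea, so it runs without a server: each UUID keys a list of actions, and the whole list expires a week after its last write. In real Redis this would roughly correspond to `RPUSH` followed by `EXPIRE key 604800`; the `ActionHistory` class and its methods here are hypothetical, and the injectable clock exists only to make expiry testable.

```python
import time
from typing import Callable, Dict, List, Tuple

WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800, the TTL you'd pass to EXPIRE


class ActionHistory:
    """Hypothetical per-user action lists that expire a week after the last write."""

    def __init__(self, ttl: int = WEEK_SECONDS,
                 clock: Callable[[], float] = time.time):
        self.ttl = ttl
        self.clock = clock
        # user_uuid -> (expires_at, actions)
        self._data: Dict[str, Tuple[float, List[str]]] = {}

    def history(self, user_uuid: str) -> List[str]:
        """Return a copy of the user's actions, or [] if none or expired."""
        entry = self._data.get(user_uuid)
        if entry is None:
            return []
        expires_at, actions = entry
        if self.clock() >= expires_at:
            del self._data[user_uuid]  # lazily drop expired keys
            return []
        return list(actions)

    def record(self, user_uuid: str, action: str) -> None:
        # Append and push the expiry out again, like RPUSH + EXPIRE.
        actions = self.history(user_uuid)
        actions.append(action)
        self._data[user_uuid] = (self.clock() + self.ttl, actions)
```

Refreshing the expiry on every write keeps active users' histories around while idle ones age out, which matches the "don't overflow with useless data" goal.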

I've seen other suggestions for external tracking, and my hesitation comes down to two things.

First, I block GA and other tracking mechanisms whenever I can, so I don't want to foist them on other people, and I do question their accuracy. Only something on the server side under my control can be accurate at all.

Second, I don't know that I'll have the customization possibilities I want in such a campaign. Sure, my business probably isn't in the business of building generalized tracking and analysis systems, but we know the details of our business and may be able to get much better data if we tune for what we want. (Conversely, we may not get what we need if we don't think about it, whereas with an external service we might get that by default.)

I'm still not sure what to do.

I don't really have an answer, but I'm very interested in one.

I tend to block JS too. I run a local DNS server instead of pointing at 8.8.8.8 (DNS may track you), and I even have an offline network. Still, my cable modem has an internal certificate that may track me by any MAC/IP pair, until it's flashed or until I'm on some wifi link "owned by others". I'm the kind of user who puts duct tape over the webcam and prefers an old 8-bit-style Nokia phone to an iPhone.

At my (ex-)$work I've seen far too much tracking software, not only GA. The list is very long: software that "records" your mouse movements and replays them, "heat" maps, the venerable empty-pixel tracking, and so on.

Today's web has too much "client-side" logic; there are even MVC and standalone frameworks in JS.

So for now, I think the solutions are really hardwired to each application. Tracking may require third-party APIs, or backend or infrastructure usage tracking; sometimes you want to track a database query, sometimes you report the GETs for a JS file, and sometimes none of these apply.

I hope someone experienced with such things can help us by explaining a strategy for what you describe.

Well, now I've reread the last paragraphs of your post, and I can't delete my comment xD

Sorry.

> Only something on the server side under my control can be accurate at all.

Kissmetrics and Mixpanel are both API-based, and they let you send events and metrics from your server side, not only from client-side JavaScript. Here's how to integrate Django and Rails session/user management with such tools.

http://support.kissmetrics.com/apis/ruby-specific
https://github.com/votizen/django-kissmetrics/blob/master/django_kissmetrics/base.py

I've no doubt you can do the same with Catalyst's session stuff.

> I don't know that I'll have the customization possibilities I want in such a campaign.

If you don't trust the Software-as-a-Service industry and want to roll your own and spend more time building tracking software, then that's your choice.

I'll look at Kissmetrics and Mixpanel--glad to hear they have server-side events.

I don't particularly want to build my own tracking software, but if that's the fastest way to get what I need, I'll do it. This project is an exercise in minimal expense and effort--definitely not what I would recommend for most clients, but it's been quite enlightening so far.

I'd like to have a simple Plack::Middleware for web analytics. AWStats and W3Perl are both written in Perl, so you might be able to port them to Plack. Piwik is more similar to Google Analytics, but written in PHP and it requires MySQL.
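PSGI/Plack was inspired by Python's WSGI and Ruby's Rack, so a rough WSGI analogue can illustrate what such a middleware might do: ensure every request carries a visitor cookie, set one if it's missing, expose the ID to the application, and log one event per request. This is a sketch, not Plack's API; the `visitor_id` cookie name, the `tracking.visitor_id` environment key, and the `log` callback are hypothetical.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


class TrackingMiddleware:
    """Wrap a WSGI app, assign a visitor cookie, and log one event per request."""

    def __init__(self, app, log):
        self.app = app
        self.log = log  # hypothetical callable receiving one dict per request

    def __call__(self, environ, start_response):
        cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        morsel = cookie.get(COOKIE_NAME)
        visitor = morsel.value if morsel else None
        is_new = visitor is None
        if is_new:
            visitor = str(uuid.uuid4())
        environ["tracking.visitor_id"] = visitor  # let the app see it too

        def tracking_start_response(status, headers, exc_info=None):
            if is_new:
                headers = list(headers) + [
                    ("Set-Cookie", f"{COOKIE_NAME}={visitor}; Path=/; HttpOnly")
                ]
            self.log({
                "visitor": visitor,
                "method": environ.get("REQUEST_METHOD"),
                "path": environ.get("PATH_INFO"),
                "status": status.split(" ", 1)[0],
            })
            return start_response(status, headers, exc_info)

        return self.app(environ, tracking_start_response)


def demo_app(environ, start_response):
    # Stand-in for the real application (Catalyst, in the post's case).
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["tracking.visitor_id"].encode("utf-8")]
```

Doing this at the middleware layer keeps the framework unaware of tracking; the application only needs to read the visitor ID from the environment when it logs something like a login event.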

> AWStats and W3Perl are both written in Perl, so you might be able to port them to Plack

Porting AWStats to Plack? I hope you're kidding, or else you haven't looked at the code.
http://awstats.cvs.sourceforge.net/viewvc/awstats/awstats/wwwroot/cgi-bin/awstats.pl?revision=1.976&view=markup

Google Analytics has been mentioned quite a few times now, and AWStats is obviously awful. There is an open source 'alternative' to Google Analytics that you can host yourself: Piwik. It's really pretty good.

I'm not sure if it will meet your needs but I do think it might be interesting for some other readers of the blog: http://piwik.org

About this Entry

This page contains a single entry by chromatic published on December 19, 2011 1:41 PM.
