Between Apple's new privacy protections on their devices, Safari and Firefox blocking third-party cookies by default (with Chrome following close behind in 2022), and the proliferation of cookie consent banners because of GDPR (not to mention the California Consumer Privacy Act), our ability to track users for analytics, especially top of funnel when those users are almost always anonymous, is moving from complicated to almost impossible.
This problem isn't going anywhere: getting only 20% acceptance on your cookie-consent banner is considered good! If you're unable to set a consistent cookie across your user's many sessions (especially for a high-retention business like e-commerce), or your JavaScript conversion events (Google Tag Manager, for example) are being blocked, your user's historical behavior will be extremely difficult to stitch together over time.
The idea is simple. Once a user converts we know who they are (e.g. they've filled out a signup form and entered their email). We can then look up past anonymous data that matches and assign it to that user.
As a diagram:
For reference, we've been using this strategy at Narrator, and in most cases we're able to attribute 95% of our clients' anonymous sessions to conversions. Said another way, our customers are able to stitch historical anonymous data to 95% of their converted users, and that holds even in the last few weeks, after all the Apple device and browser privacy updates.
Many customer interactions are easy to attribute. You'll always know who converted, who received emails, or who submitted an order.
Pageviews, on the other hand, are almost always anonymous: think of a landing page, instead of an app where users are already logged in. Unfortunately, pageviews also contain the most important data for attribution and analysis: utm sources, referral URLs, etc. We need to know what our users were doing right before they converted, and the best place for that is page view data.
Though it's incredibly important, it's difficult to tie identifiable conversion events (purchases, emails, bookings, subscriptions) to the user's previous behavior. Most analytics tools simply don't have access to all the data. They'll have just page views (Google Analytics), just leads (your CRM), or just emails (your Email Service Provider), and even when they do have more, they depend on unreliable JavaScript conversion pixels.
Imagine you're the owner of a high retention e-commerce business (good for you!), and you're spending on Google and Facebook/Instagram ads to get more traffic to your site.
If you're able to filter out the RETURNING customers from your targeted advertising, you could focus your spending entirely on NEW customers.
RETURNING customers were going to convert anyway (since you have a high retention business) so spending on advertising to "retain" them is a waste vs re-allocating that spend to capture NEW customers.
At a high level:
(The key ingredient is a conversion URL that carries an identifier, e.g. mysite.com/checkout?order_id=192381923.)
First, ensure that individual page views are sent to your data warehouse.
The services below allow you to maintain consistent anonymous identifiers across individual sessions, and if the user allows cookies, across multiple sessions as well!
NOTE: If you already use Google Analytics, be careful as Universal Analytics is not the same as Google Analytics 4!!!
When users first come to your site they're anonymous. When they convert — submit a form, buy a product, log in, etc — you know who they are, so it's important to send that information down to your analytics system.
Some examples
Anonymous:
Identifiable:
Whenever a user tells you who they are on a site when they're anonymous, you have two options:
We highly recommend option #1 above because you'll always notice when page view tracking is off, which makes it far more consistent and reliable. We often see JavaScript-based "identify" calls break or get caught by Chrome extensions, so you lose days (sometimes weeks) of data before realizing it.
The way to do this is to find a unique identifier for the order, signup form, email opened, etc. These should be consistent — most 3rd party tools have an id available (e.g. a Shopify order id).
The identifier shouldn't directly identify the user. It should instead come from the actual action taken. In other words, don't put the user's email address in the URL. It's harder to manage and leaks personally identifiable information.
Here's the URL approach for a few different conversion types:
Completing an Order: example.com/confirmation?order_id=192381923 (Shopify already does this with its unique checkout URLs)
Signing up for a Subscription: example.com/confirmation?subscription_id=192381923
Joining a Newsletter: example.com/confirmation?contact_id=192381923
Clicking on a link from an email: example.com/product_page?contact_id=192381923
Scheduling a meeting: example.com/confirmation?booking_id=192381923
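A minimal sketch of building these URLs, assuming a Python backend handles the redirect after a conversion. The helper name confirmation_url and its email check are illustrative assumptions, not part of any particular framework or of Narrator:

```python
from urllib.parse import urlencode


def confirmation_url(base_url: str, id_name: str, id_value: str) -> str:
    """Append an action identifier (order_id, booking_id, ...) to a URL.

    Hypothetical helper: it rejects values that look like an email address
    so personally identifiable information never lands in the URL.
    """
    value = str(id_value)
    if "@" in value:
        raise ValueError("use an action id (e.g. an order id), not an email")
    # Keep any query params (utm tags, etc.) that are already on the URL.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{urlencode({id_name: value})}"


print(confirmation_url("https://example.com/confirmation", "order_id", "192381923"))
```

Routing every conversion redirect through one helper like this keeps the parameter names consistent, which is what the later lookup depends on.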
Now you should have a warehouse with:
It's time to stitch them together. All you have to do is find the page views with an identifier in the URL. The trick here is that those page views also carry an anonymous id. Simply look up the user from the identifier, note the anonymous id, and replace the anonymous id with the real user in the data.
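Using the "Completing an Order" example from above:
Find the page views with an order_id in the URL.
Use that order_id to look up the order and the user who placed it.
For every page view with the same anonymous id as that confirmation page view (e.g. example.com/confirmation?order_id=192381923), go back in time and overwrite the anonymous user_id with that user's email.
Remember the earlier graphic? It's this flow, just on your data warehouse: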
Example SQL for folks who like queries:
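-- Map each anonymous_id to the email on the order whose id appears in the URL.
SELECT
    p.anonymous_id,
    o.email
FROM website.pages p
LEFT JOIN order_service.order o
    -- 'order_id=' is 9 characters long, so the value starts at position 10
    ON o.id = NULLIF(SUBSTRING(REGEXP_SUBSTR(LOWER(p.search::varchar), 'order_id=[^&]*'), 10), '')
WHERE p.search ILIKE '%order_id=%'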
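For readers who prefer to prototype outside the warehouse, here is a rough Python equivalent of the lookup. It is only a sketch: the function names and the sample rows are made up, and it assumes page views arrive as dictionaries with anonymous_id and search keys, like the columns in the query.

```python
from urllib.parse import parse_qs


def extract_order_id(search):
    """Return the order_id from a query string like '?order_id=123', or None."""
    params = parse_qs((search or "").lstrip("?"))
    values = params.get("order_id")  # blank values are dropped by parse_qs
    return values[0] if values else None


def anonymous_id_to_email(page_views, orders):
    """Map each anonymous_id to the email on the order named in its URL."""
    mapping = {}
    for view in page_views:
        order_id = extract_order_id(view.get("search"))
        email = orders.get(order_id)
        if email is not None:
            mapping[view["anonymous_id"]] = email
    return mapping


# Made-up sample rows for illustration only.
page_views = [
    {"anonymous_id": "anon-1", "search": "?utm_source=google"},
    {"anonymous_id": "anon-1", "search": "?order_id=192381923"},
    {"anonymous_id": "anon-2", "search": ""},
]
orders = {"192381923": "jane@example.com"}

print(anonymous_id_to_email(page_views, orders))
```

Parsing the query string instead of slicing it by position avoids off-by-one mistakes when the parameter name changes.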
So why does this work even if tracking cookies are blocked?
Page views will still have a unique identifier per session (since that doesn't require cookies). As long as the user did one identifiable action during that session we'll be able to attribute their page views. If the user has cookies turned on then the anonymous_id will stay consistent across sessions.
I know I sound like a broken record, but it's important that you have a strategy for EVERY time a user comes to your site anonymously. If you have lots of returning users (E-Commerce for example), your users will come back on their phones, tablets, computers, etc... This means an individual user will have MANY anonymous identifiers to stitch together.
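To make the multi-device case concrete, here is a small sketch of the backfill step. It assumes you already have a mapping from anonymous ids to emails (built from identifiers in URLs, as described earlier); the function name stitch and the sample data are hypothetical.

```python
def stitch(page_views, id_map):
    """Return copies of page_views with a user_id filled in where known.

    id_map maps anonymous_id -> email. Views whose anonymous_id is not in
    the map keep the anonymous_id as their user_id.
    """
    return [
        {**view, "user_id": id_map.get(view["anonymous_id"], view["anonymous_id"])}
        for view in page_views
    ]


# One person on two devices, plus one visitor who never converted.
id_map = {"anon-phone": "jane@example.com", "anon-laptop": "jane@example.com"}
page_views = [
    {"anonymous_id": "anon-phone", "path": "/landing"},
    {"anonymous_id": "anon-laptop", "path": "/confirmation"},
    {"anonymous_id": "anon-other", "path": "/landing"},
]

for view in stitch(page_views, id_map):
    print(view["path"], view["user_id"])
```

Because the original rows are never modified, the same step can be re-run whenever new conversions add entries to the mapping.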
This also means that you'll need to run the user attribution queries on a regular basis. Building out a data platform that can easily manage this is outside the scope of this post. That said, data platform tools like Narrator can help. We attribute anonymous visits to users automatically and transparently.
By following the strategy above, we've seen Narrator clients achieve over 95% attribution on their anonymous page views, even in the multi-touch/multi-device world we live in.
It all comes down to being diligent with URLs with the necessary identifiers in the query params. Once you have them you can easily identify anonymous page views and stitch the user together with your other data sources.