--


In this article, I want to share a useful method for identifying more of your ‘direct’ traffic using Google Analytics. 

Problems with ‘direct’ traffic in Google Analytics have been well known for some time, but on a client call recently, I discovered a new way to identify traffic sources that would previously have been labeled as direct. 

Depending on the type of site you have and how much mobile traffic you typically attract, this solution could allow you to identify previously unknown sources of traffic and conversions, and help you better understand your audience.  

Google Analytics is by no means perfect, but it is useful: businesses rely on it to monitor performance, and its data feeds into key decisions around marketing budgets. 

First, we’ll look at what direct traffic is, how Google describes it, some of the common reasons traffic from other sources is mislabelled as direct, and how to fix these issues. 

Then we’ll look at some of the problems I’ve uncovered with Android-based traffic, and why this isn’t always picked up in Google Analytics. 

Finally, I’ll share with you the custom dimension I’ve created to identify and reclassify Android traffic so you can see more of your traffic sources and create more accurate reports. 

What is direct traffic in Google Analytics and why is it a problem? 

According to Google, direct traffic comes from ‘users that typed your URL directly into their browser, or who had bookmarked your site’. 

That may be true in some cases, but in practice direct traffic is a catch-all label for all the traffic whose referral source Google is unable to identify. 

Essentially, direct traffic is any pageview sent to Google Analytics without either an ‘HTTP referrer’ or UTM tracking applied to it. 
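
As a rough illustration of that definition, here is a minimal Python sketch. It is a simplification, not how Google Analytics actually processes hits (real processing also considers things like earlier campaigns and ad click IDs), and the function name is_direct is made up for this example.

```python
from urllib.parse import urlparse, parse_qs


def is_direct(landing_url: str, referrer: str) -> bool:
    """Return True when a pageview has neither a referrer nor UTM tags.

    Simplified assumption: only the referrer string and utm_* query
    parameters on the landing URL are considered.
    """
    query = parse_qs(urlparse(landing_url).query)
    has_utm = any(key.startswith("utm_") for key in query)
    return not referrer and not has_utm
```

A pageview on https://example.com/ with an empty referrer would count as direct here, while the same pageview with ?utm_source=newsletter on the URL would not.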

In a world where marketers are looking to be more data-driven when making business decisions, direct traffic is a blind spot. 

In this post, I’m going to share a way to recover a large chunk of referral data, which would otherwise have been thrown in the direct pile. 

It can reveal previously unknown or underestimated sources of traffic which can help businesses to attribute traffic and sales to the correct channels, and to allocate budget where it is most effective. 

How to identify lost traffic sources in Google Analytics

There are two different methods to identify more of your direct traffic in Google Analytics. I outlined the first method of fixing GA traffic sources in a recent article.

That method uses UTM tags to label all non-HTTP traffic sources. These include:

  • Email campaigns.
  • Open Graph URLs.
  • Downloadable assets and links within PDFs or other non-HTML documents such as Microsoft Word or PowerPoint files (B2B white papers and case studies, for example). 
  • Employee signatures in emails. 
  • Social share buttons (added to the shared URL automatically). 
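
For links you control, such as those in PDFs or email signatures, the tagging itself can be automated. The following is a minimal Python sketch; the function name add_utm and the example values are assumptions for illustration, and you would use your own naming scheme for source, medium and campaign.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode


def add_utm(url: str, source: str, medium: str, campaign: str) -> str:
    """Append UTM parameters to a URL, keeping existing query parameters.

    Note: repeated query keys in the original URL are collapsed to the
    last value, which is fine for simple landing-page links.
    """
    parts = urlparse(url)
    params = dict(parse_qsl(parts.query))
    params.update(
        {"utm_source": source, "utm_medium": medium, "utm_campaign": campaign}
    )
    return urlunparse(parts._replace(query=urlencode(params)))
```

For example, add_utm("https://example.com/whitepaper?ref=1", "whitepaper-pdf", "pdf", "q1-report") keeps ref=1 and appends the three UTM parameters.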

This method enables you to pick up a good portion of unidentified direct traffic. The second method, which is the focus of this article, recovers traffic data from Android sources, which can include chunks of organic search and social traffic that would otherwise be unknown. 

The problem with Android-app:// based traffic

Many Android-based traffic sources pass a referrer that starts with android-app:// rather than http:// or https://. Google Analytics doesn’t pick this up as a referral, so much of this traffic just ends up in the catch-all direct channel. 

As I’ve discovered, this traffic is not direct at all, but is often a mixture of search and social traffic from sources like Slack chats, Facebook, and Android Quick Search. 
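
To show what such a referrer looks like when taken apart, here is a small Python sketch using the standard library URL parser. The function name describe_referrer is hypothetical; the Quick Search package name in the example is the one used in the filter later in this article.

```python
from urllib.parse import urlparse


def describe_referrer(referrer: str) -> dict:
    """Split a referrer into its scheme and origin (host or app package)."""
    parts = urlparse(referrer)
    return {
        "scheme": parts.scheme,
        "origin": parts.netloc,
        "is_android_app": parts.scheme == "android-app",
    }
```

Passing android-app://com.google.android.googlequicksearchbox/https/www.google.com returns the scheme android-app with the app package as the origin, whereas a normal web referrer returns http or https with a hostname.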

The solution I’m outlining here uses session-based tracking, which records the previous page visited as the referral source. For this article, it’s useful to define what is meant by hit and session-based scope. 

Hit-based scope covers any single action on a website, such as a pageview or an event triggered by watching a video or downloading a PDF. Every individual hit that is stored has hit-based scope.

Here’s the explanation from the official Google Analytics Documentation:

Hit-level scope

When a custom dimension has hit-level scope, the value is applied only to the hit with which the value was set. This is demonstrated in Figure A, Figure B, and Figure C below:

Figure A: User sends two hits (H1, H2). H2 has a Custom Dimension1 value of A. That value is only applied to H2. 

Figure B: User sends a third hit (H3). H3 has no Custom Dimension value.

Figure C: User sends a fourth hit (H4). H4 has a Custom Dimension1 value of B. That value is only applied to H4.

Session-level scope

When two values with session scope are set at the same index in a session, the last value set gets precedence and is applied to all hits in that session. In Figure D below, the latest value set overwrites any previous values for that index:

Figure A: User sends a hit (H1) with no Custom Dimension value. 

Figure B: In the same session, the user sends a second hit (H2) with Custom Dimension1 value set to A. Session scope causes value A to also be applied to H1. 

Figure C: User sends a third hit (H3). Although no Custom Dimension 1 value is sent with H3, session scope causes value A to be automatically applied to H3.

Figure D: User sends a fourth hit (H4) with a new Custom Dimension 1 value B. Session-scope applies value B to all the hits in the session, overwriting value A in the previous hits.
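
The difference between the two scopes can be mimicked in a few lines of Python. This is only a toy model of the behaviour described in the figures above, not Google’s implementation, and the function names are invented for the example. Each list holds the custom dimension value sent with hits H1 to H4, with None meaning no value was sent.

```python
from typing import List, Optional


def apply_hit_scope(values: List[Optional[str]]) -> List[Optional[str]]:
    """Hit scope: each hit keeps only the value that was sent with it."""
    return list(values)


def apply_session_scope(values: List[Optional[str]]) -> List[Optional[str]]:
    """Session scope: the last value set is applied to every hit in the session."""
    set_values = [v for v in values if v is not None]
    last = set_values[-1] if set_values else None
    return [last for _ in values]


# H1 has no value, H2 sends A, H3 has no value, H4 sends B
hits = [None, "A", None, "B"]
hit_scoped = apply_hit_scope(hits)          # value stays on H2 and H4 only
session_scoped = apply_session_scope(hits)  # B is applied to all four hits
```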

A session-based scope covers a whole session, starting from the first hit stored within the session time frame. One person’s actions on your site during a single browsing session, such as the pages they load and the files they download, are all connected into one session scope.

For example, if a user enters checkout and is then asked to log in with Facebook, or perhaps to pay using PayPal, Amazon, or some other third party, then the first page they load after returning (in the hit scope) gets that third party as its referral source if the previous source was (direct / none). 

The problem here is that Facebook or PayPal can suddenly become your best-converting source when you view analytics data. If business decisions, such as spending more on Facebook marketing, are made on the back of this data, budget can be misdirected. 

Google Analytics does have some default referral exclusions, whereby you can exclude traffic measurement from a certain source, but it’s unlikely you’ll want to do this for a source like Facebook because this might screw up other reporting too (the normal referrers).

There is a solution, by using hit-based tracking and ignoring the referral source during checkout, but this isn’t well known. I guess we will tackle this in another blog post.

Many people use Google Analytics on a daily basis to track performance and various internal KPIs, but they don’t always look behind the data to question what it says or to test how reliable it is. 

This happens at lots of large companies. Google Analytics data is used to support decisions such as where to direct marketing budgets, but the validity of the data is rarely questioned. I’ve seen many cases where the Google Analytics ML (Machine Learning) would say you should increase (direct / none) traffic because it converts really well.

The (direct / none) traffic source / medium fix in Google Analytics

Looking at a client’s data, I could see a lot of traffic classified as (direct) / (none). This represents a significant amount of referral data that can’t be understood. 

In some cases, the referral source for hundreds of thousands of site visits can be lost. 

So, I created a session-scoped custom dimension, which I’ll share further down the page, and this dimension can pick up referral data we simply couldn’t see before. In this case, we are picking it up using the {{Referrer}} variable in Google Tag Manager.

Creating a session-scoped custom dimension for the full referrer

  • Step 1. Go to the Admin part of Google Analytics.
Google Analytics Admin view
  • Step 2. Scroll down the property options to Custom Definitions and open Custom Dimensions:
Custom Dimensions in Google Analytics
Setting a custom dimension in Google Analytics
Setting the custom dimension

Now set up these two custom dimensions, one with hit scope and one with session scope:

Hit and session based full referrer hits in Google Analytics
Setting the scope level hits and sessions

Next up is to fill this custom dimension with the data. In this case, we use the {{Referrer}} field in Google Tag Manager. Referrer is a built-in variable, so make sure you turn it on.

Setting the GTM full referrer field
GTM referrer field

Then, in your Google Analytics Settings in Google Tag Manager, set the variable for the custom dimension. Each custom dimension has an index number (from 1 to 20). In the Index field, add the number of the custom dimension, and in the Dimension Value field, add {{Referrer}}. Test the container change carefully. If it works, you can publish the container.
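
To make the tag configuration more concrete, here is a minimal Python sketch of roughly what a pageview hit looks like once the custom dimension is populated. It uses Universal Analytics Measurement Protocol parameter names (dr for the document referrer, cd plus the index for a custom dimension). The function name, the tracking ID, the client ID and index 5 are placeholders chosen for illustration, and nothing is actually sent to Google.

```python
from urllib.parse import urlencode


def build_pageview_payload(
    tracking_id: str,
    client_id: str,
    page_path: str,
    referrer: str,
    dimension_index: int,
) -> str:
    """Build a URL-encoded pageview payload with the full referrer
    copied into a custom dimension. Illustrative only: it does not
    send the hit anywhere."""
    if not 1 <= dimension_index <= 20:
        raise ValueError("custom dimension index must be between 1 and 20")
    params = {
        "v": "1",                 # protocol version
        "tid": tracking_id,       # placeholder property ID
        "cid": client_id,         # placeholder client ID
        "t": "pageview",          # hit type
        "dp": page_path,          # page path
        "dr": referrer,           # document referrer as reported by the browser
        f"cd{dimension_index}": referrer,  # same value stored in the custom dimension
    }
    return urlencode(params)


payload = build_pageview_payload(
    "UA-XXXXX-Y",
    "555",
    "/article",
    "android-app://com.google.android.googlequicksearchbox/",
    5,
)
```

The point of the sketch is that the referrer travels in the custom dimension as a plain string, so it stays visible in reports even when the session itself is classified as (direct) / (none).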

This custom dimension then starts to identify traffic from sources such as Android Quick Search – the default search bar on many phones. 

This isn’t direct traffic. It’s SEO or PPC traffic that goes into (direct) / (none) because the referrer doesn’t use the HTTP protocol, so Google doesn’t pick it up. 

In this scenario, a site’s search optimization or paid search strategy may actually be working well and delivering traffic, but the site owner can’t see it because it’s mislabelled. 

Google Quicksearch box example

This is the case for lots of PWAs (Progressive Web Apps), which use browsers but send the referral data under a different protocol. 

Quick Search on Android falls into this category, and Google Analytics ignores this and similar referral sources by default, so the data is lost. 

This is what a site looked like before applying the custom dimension – up to 40,000 visits a day classified as (direct) / (none). 

Direct / none source / medium in Google Analytics

Once the custom dimension is applied, we can see the difference it makes. Suddenly we can view referral data from sources like Android Quick Search, Facebook, Twitter, and a range of other sources. In some cases, I’ve seen this method account for up to 30% of direct traffic, though the impact will vary. 

Referrers for direct / none Google Analytics traffic
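
If you export this data, the same breakdown can be reproduced offline. The sketch below is a hedged Python example that assumes you have rows of (source / medium, full referrer) per session. That row shape, the function name and the sample values are all assumptions for illustration, not real client data.

```python
from collections import Counter
from urllib.parse import urlparse


def referrers_behind_direct(sessions):
    """Count full-referrer origins for sessions reported as (direct) / (none).

    sessions: iterable of (source_medium, full_referrer) tuples.
    """
    counts = Counter()
    for source_medium, full_referrer in sessions:
        if source_medium != "(direct) / (none)":
            continue  # only look at sessions GA labelled as direct
        if not full_referrer:
            counts["(no referrer)"] += 1  # genuinely unknown origin
            continue
        parts = urlparse(full_referrer)
        counts[f"{parts.scheme}://{parts.netloc}"] += 1
    return counts


# Hypothetical sample rows
sample = [
    ("(direct) / (none)", "android-app://com.google.android.googlequicksearchbox/https/www.google.com"),
    ("(direct) / (none)", ""),
    ("google / organic", "https://www.google.com/"),
    ("(direct) / (none)", "android-app://com.google.android.googlequicksearchbox/"),
]
breakdown = referrers_behind_direct(sample)
```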

We can begin to see things like content shared within Slack groups or shared on social via Android apps. 

With the ability to view this previously unknown data, marketers can see how well content performs on social media, which in turn can justify further effort or budget being directed towards these channels. 

Other traffic classified as direct shows up as a series of weird numbers. In many cases, there were significant amounts of traffic behind these numbers. 

Traffic sources from direct / none in GA

I started to Google some of these numbers to see what I could find, adding APK to the search (APK stands for Android Application Package, the package file format used by the Android operating system). 

Number +apk serp discovery

In many cases, as you can see from the Google results, this was actually Google News traffic via Android which had been classified as direct, so all these visitors had arrived at the site having found an article via Google News. 

Again, as with the Quick Search traffic, this is an important source of traffic – one which many publishers specifically target – which has been misreported. 

Revealing the true sources of traffic enables publishers to understand where their audiences are, to learn how successful they’ve been in terms of targeting certain sources, and to learn from this. 

How to reclassify direct traffic in Google Analytics

To label direct traffic correctly in Google Analytics, I devised an Advanced Filter that would change referral codes to the real source. 

Filters in GA for rewriting

This filter can also rewrite the medium. So, for example, direct traffic from Android Quick Search can be correctly classified as Organic. 

Then you do the same for the medium (from none > organic):

Here’s the regular expression I used for this:

Regex:

android-app://com\.google\.android\.googlequicksearchbox/https/www\.google\.com|android-app://com\.google\.android\.googlequicksearchbox/?

In this way, huge amounts of data can be attributed correctly. In the case of sites I’ve looked at, this has been large portions of organic traffic, as well as Google News referrals. 
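
The logic of the filter can be tried out before touching a real view. Here is a minimal Python sketch of the rewrite. It is not the Google Analytics filter itself: the pattern is a simplified, anchored variant of the expression above, and rewriting the source to google is my assumption for the example (the medium change from none to organic is the one described in this article).

```python
import re

# Simplified, anchored variant of the Quick Search pattern (an assumption)
QUICK_SEARCH = re.compile(
    r"^android-app://com\.google\.android\.googlequicksearchbox(/|$)"
)


def reclassify(source: str, medium: str, full_referrer: str):
    """Rewrite (direct) / (none) to google / organic when the full
    referrer shows the visit came from Android Quick Search."""
    is_direct = source == "(direct)" and medium == "(none)"
    if is_direct and QUICK_SEARCH.search(full_referrer or ""):
        return ("google", "organic")
    return (source, medium)
```

Sessions that already have a source and medium, or that have no matching referrer, are returned unchanged, which mirrors the idea of only touching the traffic that was mislabelled.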

It’s important to use a test view first when applying this filter, or the other fixes mentioned in this article. This allows you to check you’ve applied it correctly before making the change more permanent. 

So, why should you apply the direct / none traffic fix?

For years, many marketers have accepted that direct traffic is simply what the name implies – traffic coming directly to your site from browsers, bookmarks, and so on. 

Even where there’s an understanding that direct traffic is a little more complex than that, there’s been an acceptance that this traffic will remain a mystery. 

However, the truth is that you can identify much of this traffic and gain a greater view of the reach of your website and the ways that people are finding your site and content. 

The good news for many marketers is that search optimisation efforts and social campaigns may have been far more effective than analytics data is telling you. 

The fix I’ve outlined here has been tested on a publisher’s website, and in this case, has recovered a lot of useful data. 

The same principle may work on other sites. For example, online retailers may be able to identify referral sources they were previously unaware of, enabling them to target marketing efforts more effectively. 

I’d love to see how it works on other sites, so if you try this on your own website, it would be great to see the results you get.

Feel free to share your thoughts by tweeting to @hellemans. And a massive thanks to Simon Vreeman.

How to recover lost Android-based (direct / none) traffic in Google Analytics

Arnout Hellemans

Arnout Hellemans is an online (search) marketer from Amsterdam with a huge focus on conversions. I have a passion for making the web more usable and helping businesses get the most out of their online adventures.


--