Guide: How to measure PDF files as web pages in your digital analytics tools

An easy and free solution to gather PDF file usage statistics and improve the visitor experience

PDF files can play an important role in recruitment, marketing and sales processes. Who isn't interested in knowing whether a prospect or customer read a whitepaper or a product specification sheet? However, many organizations don't go beyond measuring PDF file downloads. And when downloads are measured, the data is often dispersed across many tools. The data on PDF downloads from web pages is collected in web analytics tools, while downloads from emails are collected in email and marketing automation tools. Sometimes the data is spread across even more tools, if PDF files are also shared through sales prospecting tools like LinkedIn Sales Navigator. This article presents a solution in which you load your PDF files on a web page, allowing you to implement digital analytics tools on that page like you normally would. The solution uses the open-source library PDF.js, which comes with several benefits.


PDF.js solution benefits

The PDF.js library is extensive and offers many applications. This article focuses on the most basic and simple to implement feature: the PDF file viewer. Through the viewer a PDF file is embedded in an iFrame on the page. This solution comes with several user experience and digital analytics benefits:

  • The pages are hosted on your domain and you can load your digital analytics tools on the page like on any other page. So, you collect the data in all tools running on the page.
  • You can use the URL path that you like. The PDF filename is not visible in the URL itself, which sometimes can be a distraction due to weird naming conventions used by companies. You also don't have to worry about search engine indexing settings through the robots.txt file or the X-Robots-Tag header, as you can set the meta robots tag on a page level.
  • The PDF file pages behave consistently for all visitors and usually render better on mobile (in my experience). If you include your PDF file as a download link, it will open in a new tab for some visitors and automatically download for others. Besides that, the files often do not read well on mobile, as the text size may be small. If you embed the PDF in an iFrame on a page, it opens as a web page for all users and, in my experience, the text is more readable on mobile.
  • The measurements go beyond download clicks. You are able to track if the file is actually opened, and how often. While you have no statistics on visitors sharing PDF files, you do continue to track statistics when a person opens a URL that is shared with them.
  • The digital analytics tools can measure traffic from all sources like the previous webpage, email, search, social media, and sales prospecting tools. And it is possible to add campaign tracking parameters to the URL.
  • You can manage the reference to the actual PDF file in just one place. When there is a need to update the PDF to a newer version, you can just update the reference to the PDF file location in the CMS or HTML/JavaScript code (depending on how you implemented this solution). This saves the effort of changing the link to the file in all places where it is used, like emails, web pages, sales prospecting tools, and the like. It also reduces the risk of having multiple versions of the PDF file in use because it was updated in some places but not all. The option also comes with a great opportunity, as it offers the flexibility to show different files to different visitors by using dynamic file references.
  • In your marketing automation and journey orchestration tool(s) you can simply build rules based on the URL of the page. There is no need to update rules when a file changes, or when you show different files to different users based on the segment or audience they are in.
  • You can implement all kinds of additional tools and measures, like measuring the time the visitor actually has the file open (by using a ping), or showing a satisfaction survey to visitors who have the file open for more than X minutes.
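
The ping mentioned in the last benefit could be sketched roughly as follows. This is a minimal illustration, not part of the example page: the helper name createPingTracker, the 15-second interval, and the console.log stand-in for the analytics call are all assumptions, and you would replace the callback with whatever event call your analytics tool provides.

```javascript
'use strict';

/* Hypothetical helper: adds up visible time in fixed steps and reports it. */
const createPingTracker = (sendPing, intervalSeconds = 15) => {
  let visibleSeconds = 0;

  return {
    /* Call once per interval; time is only counted while the page is visible. */
    tick: (isPageVisible) => {
      if (isPageVisible) {
        visibleSeconds += intervalSeconds;
        sendPing(visibleSeconds);
      }
      return visibleSeconds;
    }
  };
};

/* Browser wiring; this part is skipped when no document is available. */
if (typeof document !== 'undefined') {
  const intervalSeconds = 15; /* assumed ping interval */
  const tracker = createPingTracker((seconds) => {
    /* Stand-in for a call to your analytics tool. */
    console.log(`PDF page visible for ${seconds} seconds`);
  }, intervalSeconds);

  setInterval(() => {
    tracker.tick(document.visibilityState === 'visible');
  }, intervalSeconds * 1000);
}
```

Because the tracker only counts intervals in which the tab is visible, a file left open in a background tab does not inflate the reading time. The same counter could be used to trigger a survey once a threshold of X minutes is reached.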

There are also some disadvantages to keep in mind:

  • If you use the most basic implementation as suggested in this article, you cannot track which page of the PDF file the visitor is on, nor how far they have scrolled down. This is possible through a more advanced implementation of the PDF.js library, but that goes beyond the scope of this article.
  • If your organization uses a special custom font in PDF files, the solution may not work properly. So please test it thoroughly across all major browsers, device types, and operating systems before rolling it out.
  • In my experience, the solution does not handle some PDF files and custom fonts well in old Microsoft browsers. This issue also exists when you use the legacy version of PDF.js. The solution does not work well in any version of Internet Explorer, nor in any version of Edge that is not based on the Chromium codebase. You can recognize these browsers by the user agent, which contains “MSIE”, “Trident”, or “Edge”. All current (Chromium) versions of Edge contain the string “Edg” in the user agent, which on mobile/tablet is merged with an OS reference (e.g., “EdgA” on Android and “EdgiOS” on iOS). If you want to support these older Microsoft browsers, you can safely provide a fallback solution based on the user agent. My personal recommendation for these older browsers is to not use PDF.js, but to use a fallback with a link to the file instead.
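
The user agent check described in the last point could look roughly like the sketch below. The function names isLegacyMicrosoftBrowser and createFallbackLink, the link text, and the file path are hypothetical and only serve as illustration. The code is deliberately written in ES5 syntax, since it has to run in the old browsers it is meant to detect.

```javascript
/* Hypothetical helper: returns true for Internet Explorer and for
   pre-Chromium Edge, based on the user agent string. */
function isLegacyMicrosoftBrowser(userAgent) {
  /* "Edge/" marks legacy Edge. Chromium-based Edge uses "Edg/",
     "EdgA/" or "EdgiOS/", none of which match this pattern. */
  return /MSIE |Trident\/|Edge\//.test(userAgent);
}

/* Hypothetical helper: builds a plain link to the PDF file as the fallback. */
function createFallbackLink(pdfUrl, linkText) {
  var link = document.createElement('a');
  link.setAttribute('href', pdfUrl);
  link.appendChild(document.createTextNode(linkText));
  return link;
}

/* Browser wiring; assumes the script runs after the body element exists. */
if (typeof document !== 'undefined' &&
    isLegacyMicrosoftBrowser(navigator.userAgent)) {
  document.body.appendChild(
    createFallbackLink('/files/pdf/my-pdf.pdf', 'Open the PDF file')
  );
}
```

In a real implementation you would skip the iFrame injection for these browsers and show only the fallback link.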

Example page

Now that we have a high-level overview of the benefits and disadvantages, let's look at an example PDF page. There is no need to dig into the source code yet, as the steps to implement it are explained in the following sections. So, have a look at this example page with digital analytics integration.


Implementing the solution in four simple steps

This section walks through the basic solution and shows how the implementation setup works. The sections after this one share approaches for implementing it in a more scalable manner. The basic solution used on the example page is easy to implement, as you can just follow these four steps:

  1. Determine where you would like to host the pages, PDF files and PDF.js viewer files. The pages can for example sit on your main domain, your marketing automation platform (Salesforce, HubSpot, Eloqua, etc.), or a dedicated subdomain. The PDF files and PDF.js are ideally stored on the same (sub)domain. You can store the library and the files on your website hosting, the marketing automation tool, or a separate CDN location. To keep it easy, I hosted the pages on my main domain and I used my website hosting to store the files and PDF.js library.
  2. Download the PDF.js files and store them on your web server or CDN. It is recommended to store them on the same (sub)domain as the PDF files you want to load, to prevent warnings or errors due to cross-origin requests. Go to the PDF.js website and open the Download page. There you can download the latest prebuilt stable version. Note that they also offer a legacy version, which you can use as the fallback / differential bundle.
     For those of you not familiar with differential bundling: you can embed both the modern ES6+ and the legacy ES5 versions on the page, and the browser will load and execute the applicable version. You do so by wrapping the modern ES6+ version in a <script type="module"></script> tag, and the ES5 version in a <script nomodule></script> tag.
  3. Add an iFrame to the page with the correct link. The link consists of two parts:
     • The link to the viewer.html on your web hosting or CDN, e.g., "https://www.mydomain.com/pdfjs/web/viewer.html"
     • The link to your PDF file, e.g., "https://www.mydomain.com/files/pdf/my-pdf.pdf"
    You create the full link by adding the link to the PDF file in the "file" parameter of the PDF viewer link. Example:
    https://www.mydomain.com/pdfjs/web/viewer.html?file=https://www.mydomain.com/files/pdf/my-pdf.pdf
  4. Make the PDF file full screen by adding the following CSS declarations to the iFrame:

    position: absolute;
    left: 0;
    top: 0;
    width: 100% !important;
    height: 100% !important;


The whole iFrame HTML element should now look like this:

          
<iframe src="https://www.mydomain.com/pdfjs/web/viewer.html?file=https://www.mydomain.com/files/pdf/my-pdf.pdf" style="position: absolute; left: 0; top: 0; width: 100% !important; height: 100% !important;"></iframe>
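
One detail worth considering when building this link: the PDF location is itself a URL placed inside a query parameter. The plain form above works for simple paths, but a filename containing spaces, "&" or "#" could break the link. A minimal sketch of a safer approach, assuming the viewer decodes the "file" parameter as a normal query string value, is to encode it with encodeURIComponent. The helper name buildViewerUrl and the example filename are hypothetical.

```javascript
'use strict';

/* Hypothetical helper: builds the viewer link with an encoded "file" value. */
const buildViewerUrl = (viewerUrl, pdfUrl) =>
  `${viewerUrl}?file=${encodeURIComponent(pdfUrl)}`;

/* Example values; the filename with a space and "&" is made up to show
   characters that would otherwise break the query string. */
const exampleViewerUrl = 'https://www.mydomain.com/pdfjs/web/viewer.html';
const examplePdfUrl = 'https://www.mydomain.com/files/pdf/my pdf & notes.pdf';

const exampleIframeSrc = buildViewerUrl(exampleViewerUrl, examplePdfUrl);
console.log(exampleIframeSrc);
```

Please test the encoded link with your own PDF.js version before relying on it.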
          
        

As you can see, the solution itself is very simple to implement. However, it is not yet scalable. Now that we understand what the HTML element should look like, we can make the solution scalable with JavaScript.


More scalable implementation with JavaScript

For the purpose of this article, I included the CSS styling and JavaScript in the HTML code of the example page. This is far from ideal for a real implementation. With every update one would need to touch the HTML code of the page, which is often a tedious, time-consuming and error-prone task. This section shares the code used to create the example page, so you can easily embed it in your code base. Or, if you quickly want to test it out without development support, you can just as easily host it in your tag manager (not recommended as a long-term solution).

The two code sections below show the CSS and JavaScript code used on the example page. You can also inspect this in the source code included at the bottom of the HTML of the example page, which also contains a placeholder for an ES5 fallback.


CSS styling:

          
iframe#pdf-viewer {
  position: absolute;
  left: 0;
  top: 0;
  width: 100% !important;
  height: 100% !important;
}
         
      

JavaScript:

          
'use strict';

/* Support function to create elements */
const createElement = (type, attributes = {}) => {
  const element = document.createElement(type);

  Object.keys(attributes).forEach((key) => {
    element.setAttribute(key, attributes[key]);
  });

  return element;
};

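/* Support function to get the iFrame target URL */
const getIframeUrl = (pdfFilePathAndName, pdfViewerFilePath, origin) => {
  /* The template string must start on the same line as "return":
     a line break directly after "return" makes the function return undefined. */
  return `${origin}${pdfViewerFilePath}?file=${origin}${pdfFilePathAndName}`;
};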

/* Function to inject the iFrame in the page */
const injectIframe = (
  pdfFilePathAndName,
  pdfViewerFilePath,
  origin = window.location.origin,
  cssSelectorInsertLocation = 'body'
) => {
  const iFrame = createElement('iframe', {
    id: 'pdf-viewer',
    src: getIframeUrl(pdfFilePathAndName, pdfViewerFilePath, origin)
  });
  document.querySelector(cssSelectorInsertLocation).appendChild(iFrame);
};

/**
 * CONFIG variables
 * Please update these to your respective implementation.
 * It is recommended that you update these with dynamic values.
 * I included them here as static values for clarity.
 */
/* PDF file path and name */
const pdfFilePathAndName = '/pdf/files/example-pdf-page.pdf';
/* Path to the PDF.js HTML viewer file */
const pdfViewerFilePath = '/pdf/pdfjs-2.14.305/web/viewer.html';

injectIframe(pdfFilePathAndName, pdfViewerFilePath);
          
        



The following assumptions and notes are applicable to the code:

  • Both the PDF.js and PDF files sit in the same origin (protocol + domain name, e.g., https://www.example.com). If that is not the case, you need to make some minor tweaks to the code.
  • The PDF.js and PDF files sit in the same main domain as the website. If that is not the case, the correct origin must be passed as the third argument when invoking the injectIframe function.
  • CSS is added separately in the HTML code. It is recommended to add this to your style sheet.
  • The iFrame is now loaded through JavaScript. It is recommended to dynamically insert the filename and even the PDF.js version, so you can easily update these when needed. The PDF.js version can for example be set as a config variable in your JavaScript repository, while there are many ways to dynamically set the filename.
  • The code assumes there is an HTML body tag. If that is not the case, pass another CSS selector as the fourth argument when invoking the injectIframe function.
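
As an illustration of the dynamic values mentioned above, the sketch below shows one of the many possible approaches: a lookup table that maps the page path to a PDF file, combined with a single config value for the PDF.js version. The names pdfFilesByPagePath, resolvePdfFile, pdfJsVersion and getViewerPath, as well as the page paths and filenames in the table, are hypothetical. Only the viewer path pattern is taken from the example page.

```javascript
'use strict';

/* Hypothetical lookup table: page path -> PDF file path and name. */
const pdfFilesByPagePath = {
  '/whitepapers/analytics-guide': '/pdf/files/analytics-guide.pdf',
  '/products/spec-sheet': '/pdf/files/spec-sheet.pdf'
};

/* Returns the PDF file for a page path, or the fallback when none is set. */
const resolvePdfFile = (pagePath, filesByPagePath, fallbackFile = null) => {
  /* Ignore trailing slashes so "/a/b/" and "/a/b" resolve to the same file. */
  const normalizedPath =
    pagePath.length > 1 ? pagePath.replace(/\/+$/, '') : pagePath;

  return Object.prototype.hasOwnProperty.call(filesByPagePath, normalizedPath)
    ? filesByPagePath[normalizedPath]
    : fallbackFile;
};

/* PDF.js version kept in one place, so an upgrade is a one-line change. */
const pdfJsVersion = '2.14.305';
const getViewerPath = (version) => `/pdf/pdfjs-${version}/web/viewer.html`;
```

On the page itself, the result could then be passed to the function shown earlier, for example injectIframe(resolvePdfFile(window.location.pathname, pdfFilesByPagePath), getViewerPath(pdfJsVersion)), after checking that a file was actually found.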

And there you have it. A simple solution to load PDF files as web pages, allowing you to load the digital analytics tools you like on the page.

If you have questions or comments regarding this article, feel free to reach out on LinkedIn.
