Chapter 1 Marketing Analytics Foundation
1.1 Definitions
Retail Media
Retail media refers to advertising and promotional activities that take place within retail environments, both online and offline. It involves brands and advertisers collaborating with retailers to reach their target audience directly through various advertising channels within the retail space. The goal is to influence consumer behavior, drive sales, and enhance brand visibility at the point of purchase.
In the context of e-commerce, retail media often involves advertising on the websites or platforms of online retailers. Brands may pay for sponsored product listings, display ads, or other promotional placements on these platforms to increase their products’ visibility to potential customers.
In physical retail settings, retail media can include in-store displays, product placements, and other advertising methods within brick-and-mortar stores. This form of advertising is designed to capture the attention of shoppers while they are in the process of making purchasing decisions.
Overall, retail media aims to leverage the retail environment to deliver targeted and relevant advertising messages to consumers, ultimately driving sales and creating a mutually beneficial relationship between brands and retailers.
1.2 Introduction to Marketing Analytics
1.2.1 Some Marketing terms
1.2.1.1 A brick-and-mortar store
A brick-and-mortar store is a physical retail location where business is conducted in person with customers. The term “brick-and-mortar” is used to distinguish these traditional stores from online or e-commerce businesses. In a brick-and-mortar store, customers can visit the physical location, browse products on shelves or displays, make purchases in person, and interact with store staff.
The term “brick-and-mortar” comes from the materials used to build physical structures: bricks and mortar. This contrasts with online or virtual stores, which operate on the internet without a physical presence. Brick-and-mortar stores have been a common and traditional way of conducting retail business for many years, but the rise of e-commerce has led to increased competition and changes in the retail landscape. Some businesses also use a combination of both brick-and-mortar and online channels, known as omnichannel retailing, to reach a broader customer base.
1.2.1.2 Retail Media
“Retail media” refers to the use of media channels, often within a retail environment, to promote and advertise products or services. It involves advertising and marketing efforts directly within the retail space, whether in physical stores or online platforms.
In a retail media strategy, businesses leverage various channels such as in-store displays, digital signage, sponsored product listings on e-commerce websites, and other promotional methods to reach their target audience. This approach allows retailers to monetize their own media properties and engage with customers in a more targeted and contextually relevant manner.
The concept of retail media has become increasingly important as e-commerce and digital marketing have grown in prominence. It provides retailers with an additional revenue stream and allows brands to connect with consumers at key points in the purchasing process.
For example, a retail media strategy in an online marketplace might involve brands paying to have their products featured prominently in search results or on category pages, increasing visibility and potentially influencing purchasing decisions.
1.2.2 Role of marketer
If you spend some time online, you’ve probably noticed that quite a few of the ads you see are for products you’re genuinely interested in and some of these ads may have prompted you to buy. Well, that’s no coincidence. You likely saw these ads because some underlying data and marketing analysis made the advertiser understand that you might be interested in the product advertised.
That’s just one of the many ways in which marketing analytics powers marketing. In fact, it’s fair to say that all marketing benefits from data and analytics.
Marketing is responsible for promoting and selling the products that a company makes. And that usually involves the following steps:
First, the marketers identify the right customers for the product. They think about and research who might be interested in buying the product. They may interview people and use questionnaires to understand their needs better and whether these needs might be met by the product they sell. They will also try to form a better picture of the lives of the people that may want their product. All this is referred to as market or consumer research, and it’s an important part of marketing. Marketing analytics is essential in this phase to help analyze the data from the research.
Based on this research, marketers will create a message and a story about the products they sell. They will also decide where that message should go so it can reach the customers who may be interested. This may mean creating a website, a Facebook page, or an Instagram account, and in some cases, creating ads that could be put in magazines, on TV, on the web, and so on. This is the creative part of the marketer’s job.
Many companies will rely on advertising or creative agencies to help them. Coming up with the right message and the right imagery to complement that message is a real art. But marketing analytics is used in this phase as well.
Often marketers will test their messages out and gather data about the ads or promotions they run to try and understand what works best for their audience.
To get marketing messages out, marketers need a plan. Marketers will carefully select the different places where they talk about their product.
And if they plan to advertise, they will work on strategies that help them to most effectively spend their advertising budgets. Marketing analysts have an important role to play here. By using data and analytics to create their plan, marketers can save time and money.
Once campaigns and promotions start to run, marketers will watch them carefully and focus on evaluating the results they see, like clicks on ads from campaigns, sales resulting from ads, and so on. Based on that information, marketers will adapt their advertising and their marketing plans. Given that a large share of marketing budget is spent online these days, real-time updates are possible, and good marketers will make use of the opportunity by optimizing in real time as well. But to do it effectively, they need marketing analytics.
After an advertising campaign runs its course, marketers will want to evaluate whether their money was well spent. They will also have to report back to other people in the business on the success of their efforts, so they perform more analysis. And while reporting is essential, this analysis will also help the marketer get better and optimize their message and strategy for their next campaign. The better you are at marketing analytics, the better you’ll be able to optimize and get your message out to the right people in a more efficient way.
Analytics plays a crucial role in marketing. A marketer equipped with good data and analytics will be a better marketer. Online actions generate loads of data that can be a goldmine for marketers if used well. As a result, marketing analytics skills are in high demand, and great marketers today rely on analytics to make decisions.
1.2.3 What is marketing analytics?
Marketing analytics is the practice of measuring and analyzing data to inform, evaluate, and improve the performance of your marketing initiatives. In short, it is all about gathering and analyzing data to make your marketing better.
5 Main Uses of Marketing Analytics
- Identifying the target audience
- Planning and forecasting
- Evaluating marketing effectiveness
- Marketing optimization
- Optimizing the sales funnel
Now let’s take a look at where the role of the marketing analyst comes in, or how marketing analytics can support and improve marketing.
1. Identifying the target audience: Marketers have limited budgets, so they want to make sure that their marketing message reaches a receptive audience. In other words, they want to talk to people who may have an interest in buying their product. In marketing, that group is referred to as the target audience. So how do marketers find that audience? Well, they use research.
Imagine a marketer for a mattress company. They could use surveys to get a better understanding of when people buy mattresses, what they find important when buying a mattress, how much they would spend on a mattress, and so on. They might also use some databases that exist about mattress sales, and the demographics of the people who buy them. All of this will help describe who the best target audience is for the mattresses.
- What are their characteristics?
- Where do they live?
- What phase of their life are they in, and so on?
All the information the marketer gathers comes in the form of data. And marketing analytics will help to make sense of that data and paint the detailed picture of the target audience the marketer needs to create the marketing message and get in front of the interested people.
2. Planning and forecasting: Marketing and advertising can be expensive. So before marketers spend their budgets, they evaluate where and when they should put their marketing message. They will carefully think about how much of their time and how much of their budgets should be spent on social media, ads on TV, radio, search engines, and so on. Often, marketers will take a look back at their previous marketing efforts, and on the basis of the success they have had with campaigns in all of these different places, they may decide to spend more or less money in some of them this time around.
The mattress company may use data from the previous year to determine the mix of advertising for the coming year. They may have learned, for instance, that advertising on TV and social media really worked for them, but that there was little payoff from the ads on the radio. Often, this exercise comes with a detailed forecast of the sales they can expect based on the budget they plan to spend. This is where marketing analytics comes in. The more a marketer can rely on data and analysis in this phase, the better the results of their marketing will be.
3. Evaluating marketing effectiveness: After a marketing campaign is over, marketers will take a look back and ask themselves whether their campaign was effective. It helps them to learn more about what worked and what didn’t work. But they’re also doing it to report back to other people in the company. Marketers are given a budget, and management typically expects a report on how the budget was spent and how successful the marketing was. For our mattress marketer, there may be a quarterly management meeting in which the marketer reports the sales that resulted from the marketing campaign. The marketer may also use this opportunity to show data on the ads that worked best, the placement of these ads that generated the largest audience and sales, and so on.
Showing that the marketing was effective is crucial for most marketers, as it will help them to get the budget they need for success in the future.
4. Marketing optimization: Even after very careful planning and forecasting, things can always be improved. This is true with marketing too. Once a marketing campaign is up and running and you collect or gather data on how it is performing, there is an opportunity to optimize. And the more data you have, and the more you know about analytics, the better you’ll be able to fine-tune and optimize your marketing. As soon as you get some information on how your ads are performing, you can use that information to adjust your marketing.
Say for our mattress company, if you learn from the initial data that certain search engine ads are delivering more sales than others, you could decide to spend a bit more money on the ads that perform well, and less on the ads that don’t. If you can take these kinds of actions, you optimize how your money is spent. Data and analytics are really powerful that way, and can save you quite a bit of money.
5. Optimizing the sales funnel: Finally, there’s a specific task of making sure that people who want to purchase a product or have shown an interest can easily do so. Marketers will refer to that as optimizing the sales funnel. That is because they think of the sales process in a few steps. For instance, you could think of the sales process in four steps.
Awareness – Interest – Decision – Action
Awareness is when a person first hears about a product. Interest is when a person is interested in the product and may try to learn a bit more about it. Then there is the Decision step, when a person decides that they want to get your product. And finally there is the Action step, when they buy your product.
Marketers describe this process as a funnel because, as people go through this process, some people drop off and don’t take the next step. So when you draw the groups of people that go through the four steps, the groups get gradually smaller.
And when you put these groups together, it looks like a funnel. When marketers talk about optimizing the funnel, they really talk about reducing the number of people that drop out.
Say for our mattress marketer, it’s important that as many people as possible who are aware of the mattress become interested. And of the people who are interested, you want as many as possible to decide that they want your mattress. And then you want them to take action and buy your mattress. Since the marketer spends money on getting people to consider the mattress, they want as many people as possible to take all four steps and check out.
Data and analytics will help inform marketers about how healthy their sales funnel is, how many people drop out and don’t take the next step, and why that may be.
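To make the idea of funnel health concrete, here is a minimal sketch of how stage-to-stage conversion and drop-off could be computed. The stage counts are invented for illustration, and the helper name `stage_conversion` is hypothetical; real numbers would come from your own ad and website reports.

```python
# Hypothetical counts of people reaching each funnel stage (illustrative only).
funnel = [
    ("Awareness", 10_000),
    ("Interest", 2_500),
    ("Decision", 500),
    ("Action", 200),
]


def stage_conversion(stages):
    """Return (from_stage, to_stage, conversion_rate, drop_off_rate) per step."""
    steps = []
    for (name_a, count_a), (name_b, count_b) in zip(stages, stages[1:]):
        rate = count_b / count_a if count_a else 0.0
        steps.append((name_a, name_b, rate, 1.0 - rate))
    return steps


steps = stage_conversion(funnel)
for src, dst, rate, drop in steps:
    print(f"{src} -> {dst}: {rate:.1%} continue, {drop:.1%} drop off")

# The step with the highest drop-off rate is a candidate point of friction.
worst = max(steps, key=lambda s: s[3])
print(f"Largest drop-off: {worst[0]} -> {worst[1]}")

# Share of people who were aware of the product and ended up buying it.
overall = funnel[-1][1] / funnel[0][1]
print(f"Overall conversion: {overall:.1%}")
```

The drop-off rates only show where people leave the funnel. Finding out why they leave usually takes further investigation, such as reviewing the ad message or the checkout flow.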
1.2.4 The Future of Marketing is Data
The future of marketing will be defined by analytics. Marketers must use data to more accurately understand who their customer is and predict what their future behavior will be.
Marketing is all about personal connection. It's about filling a need or desire a customer has with a product or service you offer, and providing value to that customer while doing so.
Marketing connects people with a problem they want solved to the business that can solve it. But too often, companies are simply guessing at who may need their product or service, and may advertise the features of their products without articulating any of the benefits to their customer. Or, they may just assume that everyone will want their product and never put the effort into identifying a target audience of those who will not only benefit from the product or service they offer, but who can become brand ambassadors in the future.
Additionally, consumers today expect personalized experiences with the brands they interact with, from unique online and brick-and-mortar experiences, to customized product offerings, to a personable voice on social media. Just as a blanket advertising approach doesn’t work for organizations, a one-size-fits-all experience doesn’t work for customers.
But marketers don’t just need better awareness about making personal connections with their audience; they need tools to do it as well. There’s a shift from looking at how well each product performs to looking at how valuable each customer is, and marketers can only get to know their customer and create that personal connection with data.
With data, marketers can learn:
- what products each customer has bought in the past
- what social media content is resulting in the most purchases
- what keywords customers are searching for
- age, gender, and location
- what their interests are
- how to market to those customers in the future
This is why the future of marketing will be defined by analytics, or using data in order to more accurately understand who the customer is, what their past behavior has been, and what their future behavior will be. The future of business growth and success will belong to organizations who will use data in smart, insightful, applicable ways.
1.2.5 Marketing Trends for a Data-Driven Future
As we see an increase in data-driven customer connections, here are some trends we’ll see in the future of marketing.
Smarter use of data
Marketing teams will increase their use of data in order to learn more about their customers, which will create better targeting, personalization, and connection. More personalized marketing means more returns, as 80% of consumers say they’d be more likely to purchase from a brand that offers personalized experiences, and 72% say they only engage with marketing messages that are specific to their interests.
Growth of AI
There will be an increase in the use of AI, machine learning, and algorithms in marketing, as they can predict patterns and make recommendations based on the insights they see. Already, algorithms choose 70% of what viewers watch on YouTube and 75% of what viewers watch on Netflix, and recommendation algorithms drive 35% of sales on Amazon. Automation will increase as well, with Gartner predicting that “By 2023, autonomous marketing systems will issue 55% of multichannel marketing messages based on marketer criteria and real-time consumer behavior, resulting in a 25% increase in response rates.”
Shift to first-party data
As Google is phasing out its use of cookies, organizations will shift from third-party data to first- and second-party data, which will help them understand their customer more precisely.
Giving customers transparency into how data is collected will be a priority as well. According to Kevin Cochrane, the CMO of SAP Customer Experience, “To initiate a more trustworthy relationship, organizations must start by eliminating internal processes of acquiring third-party data. They must use only data that they have earned through explicit customer consent. … Moving forward, consumers should (and will) have full visibility into how extensively their personal data is being monetized.”
More online data
As the COVID-19 pandemic shifted life online, organizations suddenly saw an increase in new data they could collect about their customers, and the trend will only increase. “In the absence of the face-to-face, we had to lean in on digital, and that allowed us to have the information, the data, to then better serve our customers,” notes Kevin Warren, CMO of UPS. “This year really has revealed that strategic importance of analytics that maybe wasn’t quite there pre-COVID.”
Better budget optimization
As marketing teams are able to better target their customers, they’ll be able to better optimize their budgets. In wanting to optimize its marketing budget, DoorDash first looked at which ads brought in new customers, drew channel-level cost curves based on the data, and created better ways to deliver. “Accurate, timely, and fine-grained attribution data is the key to understanding and optimizing our marketing,” they explain.
1.2.6 An Application
We are going to refer to an imaginary company, DCB Cleaning Services. They are a company that provides office cleaning services in the San Francisco Bay Area. James manages marketing for the company. Recently he presented his management with an idea for a new product, Snackwall. It’s a service where the company installs a snack wall in an office. Customers subscribe to the service and select their snacks and refills through an app. Management was very excited about his idea and developed the product. James was given a marketing budget to launch the new service. We’ll follow James and look at some of the decisions he needs to make.
1.2.6.1 Audience
James at DCB Cleaning is excited to bring Snackwall to market and he’s gearing up to launch. With a quickly approaching launch date and a limited budget, James knows that it is essential to get the message about Snackwall in front of the right audience: people who may be interested in the service. He wants to make sure not to waste any of his money on people in offices where Snackwall would likely not be a good fit.
James needs to define his target audience. James believes that it will be best to promote Snackwall among the existing DCB Cleaning clients. But he does not think all the clients will be interested.
To get a better understanding of who may have an interest, James decides to create a survey. He divides his survey into three parts.
In the 1st part, he asks questions about the office space.
- How many people work in the office?
- Is most of the work desk work?
- Where’s the office located?
- How far is the office from the convenience stores?
- Does the office provide lunch, etc.?
In the 2nd part, James asks about the employees and the work they do.
- What is the average age of the employees at the location?
- What’s the education level?
- What industry is the company in?
In the final part of the questionnaire, James describes Snackwall and asks about the level of interest in the product, from not at all interested to very interested.
James’ survey goes out to all 300 clients of DCB Cleaning. It’s an online survey and 200 clients fill it out. Now that James has the answers to his questions, he goes to his marketing analyst, Alia, who had suggested the survey in the first place and who will help James analyze the results. Alia sorts all the responses in a spreadsheet. Then, she runs a segmentation analysis on the data. Segmentation is a technique we will learn more about later in this program. It’s a way to sort people into groups based on characteristics they have in common. In this case, Alia uses segmentation to get a better understanding of the characteristics that people who are interested in Snackwall have in common.
After Alia runs her analysis, she finds that there are two distinct groups that have a very high interest in Snackwall. She presents her results to James. The 1st group she names “Focused Tech”. This group includes companies that have over 30 employees, provide lunch on-site, are active in the tech industry, primarily employ engineers, and have a young workforce. The 2nd group Alia refers to as “Isolated Office”. This group includes companies with between 30 and 50 employees located outside of urban areas with little access to food. They don’t provide lunch for their workers. They are in various industries and their workforce is young. Alia also tells James that the Focused Tech segment is quite a bit larger than the Isolated Office group.
This information is super helpful for James. Using the information provided in the survey responses, James does a bit more work on describing these two audience groups. He plans to target his first advertising campaign to the Focused Tech audience. He will advertise to his current clients, but he will also use this information to find more companies that are similar to the audience in the Focused Tech segment.
The segmentation that James’ marketing analyst Alia did for him helped James define his target audience. He knows that narrowing his advertising to this target audience will help him spend his budget wisely since it will increase the chance that his ads will be seen by people who are interested in his product. For any marketer who’s trying to define their target audience, marketing analytics is crucial.
Segmentation helps to group the people interested in Snackwall and describe their common characteristics
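The text does not say which segmentation method Alia used, so the following is only a rough sketch of the general idea, using simple rule-based grouping rather than a statistical clustering technique. The survey rows, the 1 to 5 interest scale, and the bucketing rules in `segment_key` are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey rows: (industry, provides_lunch, employees, interest 1-5).
responses = [
    ("tech", True, 45, 5),
    ("tech", True, 60, 4),
    ("tech", True, 35, 5),
    ("legal", True, 20, 2),
    ("logistics", False, 40, 5),
    ("manufacturing", False, 35, 4),
    ("legal", False, 12, 1),
    ("retail", True, 15, 2),
]


def segment_key(row):
    """Bucket a response by a few shared characteristics (assumed rules)."""
    industry, provides_lunch, employees, _ = row
    size = "30+" if employees >= 30 else "<30"
    sector = "tech" if industry == "tech" else "other"
    return (sector, "lunch" if provides_lunch else "no lunch", size)


# Collect the interest scores of all responses that share a segment key.
groups = defaultdict(list)
for row in responses:
    groups[segment_key(row)].append(row[3])

# Rank segments by average interest, highest first.
ranked = sorted(
    ((key, mean(scores), len(scores)) for key, scores in groups.items()),
    key=lambda item: item[1],
    reverse=True,
)
for key, avg, n in ranked:
    print(f"{key}: average interest {avg:.2f} across {n} responses")
```

With real survey data, an analyst would more likely let a clustering method discover the groups instead of fixing the rules in advance. The output has the same shape either way: a handful of segments, each described by shared characteristics, its size, and its level of interest.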
1.2.6.2 Planning and Forecasting
A second crucial task where marketing analytics supports marketing is the planning and forecasting phase. James has a budget that he can use to market Snackwall. He has identified a target audience, but now he needs to come up with a marketing plan. He needs to make sure that he makes the best possible use of his budget. On top of that, his management would like to get a sense of the expected sales.
There are a few things James needs to consider in his plan. First, he needs to select the platforms he will use to advertise. James believes that social media would be great, but he needs to decide which platforms to include. Alia, who runs marketing analytics for DCB Cleaning and the Snackwall product, has access to comScore data, a database with information about the different online media people use, their browsing behavior, and demographics. Alia shows James that the age group of the Focused Tech segment he is targeting uses Facebook and LinkedIn a lot, more so than Instagram, for instance. James agrees that these would be good platforms for his ads.
Now he needs to decide how much of his money should be spent on each platform. Should it be 50-50? Alia advises against that. She shows James that, based on the company’s advertising experience, Facebook tends to be cheaper if you target your ads well, or in other words, if you put the right message in front of the right people. But she has also found that LinkedIn helps to generate awareness among professionals. Even if fewer sales come directly from the LinkedIn ads, they tend to create good leads. Based on this historical information, Alia has created a formula that she can use to help James divide up his budget. James ends up putting 70 percent of his ad budget in Facebook and 30 percent in LinkedIn.
Now, James has one more step to take. He needs to present his plan to his management, and his boss expects to see a forecast of the sales. Alia suggests that James rely on the historical data they have about the results of the campaigns the company ran for its cleaning services. They both know that this product is different, but since the product is new, this is the best data they have. Alia shows James how she can use the data to predict the number of people that will click on his ads in Facebook, and how many of those people will then go on to subscribe to the service. She can do the same for LinkedIn. This sounds simple enough, but in fact, the model Alia uses to forecast is a bit more complex. It also involves factors like the types of ads James is planning to use (how many video ads, single image ads, etc.), the time of year when the campaign will run, the people James will target, and so on. All of this information helps Alia build a better model to help James forecast the sales for Snackwall.
As you can see from this example, data and analytics play an important role in the planning and forecasting phase. Of course, James could decide how to spend his budget without using analytics, but the more data and information James uses to plan, the better his strategy will be. Alia used what the company learned from the past, and that helps James to build his marketing approach and forecast sales. If you don’t have data from the past that can help guide you, marketing analysts will often use data they can purchase, like the comScore data about browsing behavior we referred to. In any case, using marketing analytics will make your strategy and forecasts better.
Historical data for similar events in the past can help predict events in the future and can thus guide the planning process
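The simplest version of the forecast described in this section can be sketched as follows. Only the 70/30 split comes from the example. The total budget, the cost per click, and the subscribe rates are hypothetical placeholders standing in for the historical figures Alia would use, and her actual model includes more factors than this.

```python
# All figures below are hypothetical placeholders, not data from DCB Cleaning.
total_budget = 10_000.0  # assumed ad budget in dollars

# The 70/30 split comes from the example; the other rates are assumptions.
channels = {
    "Facebook": {"share": 0.70, "cost_per_click": 2.00, "subscribe_rate": 0.02},
    "LinkedIn": {"share": 0.30, "cost_per_click": 5.00, "subscribe_rate": 0.04},
}


def forecast(budget, plan):
    """Estimate clicks and subscriptions per channel from historical rates."""
    results = {}
    for name, p in plan.items():
        spend = budget * p["share"]
        clicks = spend / p["cost_per_click"]
        subscriptions = clicks * p["subscribe_rate"]
        results[name] = {
            "spend": spend,
            "clicks": clicks,
            "subscriptions": subscriptions,
        }
    return results


results = forecast(total_budget, channels)
for name, r in results.items():
    print(
        f"{name}: ${r['spend']:,.0f} -> {r['clicks']:,.0f} clicks -> "
        f"{r['subscriptions']:.0f} subscriptions"
    )

total_subscriptions = sum(r["subscriptions"] for r in results.values())
print(f"Forecast total: {total_subscriptions:.0f} subscriptions")
```

Changing the `share` values and rerunning is a quick way to compare budget splits, for example 50-50 against 70-30, under the same assumed rates.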
1.2.6.3 Evaluate Advertising Effectiveness
Marketing analysts play an important role in helping to evaluate marketing effectiveness, or in other words, whether the marketing budget was spent well. An important part of that is evaluating the effectiveness of advertising.
Marketing analysts will try to answer the question: did the advertising campaign pay off? A good starting point for answering that question is return on ad spend, or ROAS.
ROAS is simply a way to find out how much revenue you made on your advertising versus what you spent on it. The calculation for ROAS is the revenue made from the ads divided by the advertising costs of those ads. For example, if you spent $1 on advertising and made $10 off of that advertising, your ROAS is 10. This can also be expressed as 10 to 1 or 10x. So this advertising made 10 times what we spent on it, which would be pretty good.
Marketing analysts will use ROAS to compare how ads on different platforms perform. Or they may use it to compare campaigns they ran in the past with new campaigns, and so on. All of this will help them evaluate whether they are getting the most for the advertising dollars they’re spending, or whether there is room for improvement.
Let’s go back again to James and his campaigns for Snackwall. After his campaign has run for two weeks, he finds that ROAS on his Facebook ads is 12. This means that for every dollar he spends on advertising on Facebook, he sees 12 dollars in revenue for the Snackwall product. That doesn’t sound too bad. But James knows that the advertising campaigns he ran on Facebook in the past for the cleaning service had a ROAS of 23. So now James is a bit worried. He doesn’t think that the product is less attractive, but he thinks he could optimize his campaign. He decides to create new ads with more images and a clearer explanation of the service. Two weeks later, James sees that ROAS for his ads is up to 19. James feels a lot better about that. He thinks there’s still some room for improvement, but this is definitely going in the right direction.
Marketing analysts will use different data and tools to calculate ROAS. Most advertising platforms provide detailed reports to calculate and track the effectiveness of ads. Of course, the example we walked through here is a bit simpler than what analysts encounter in real life, as there are a few more considerations that go into evaluating the effectiveness of ads. In this program, we will cover several techniques that you can use to evaluate whether your campaigns paid off. As you’ll find throughout these courses, ad effectiveness evaluation is a crucial task for marketing analysts.
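The ROAS calculation is simple enough to write as a small function. In the sketch below, the function name `roas` and the revenue and cost figures for the three campaigns are hypothetical, chosen only so that the results match the ROAS values of 12, 19, and 23 mentioned in the example. The text does not give James’s actual spend or revenue.

```python
def roas(revenue, ad_cost):
    """Return on ad spend: revenue attributed to ads divided by their cost."""
    if ad_cost <= 0:
        raise ValueError("ad_cost must be positive")
    return revenue / ad_cost


# The worked example from the text: $10 of revenue on $1 of ad spend.
print(roas(10, 1))  # 10.0

# Hypothetical (revenue, cost) pairs chosen to reproduce the ROAS values
# in the example; the real spend and revenue figures are not given.
campaigns = {
    "Snackwall, first two weeks": (6_000, 500),
    "Snackwall, new ads": (9_500, 500),
    "Cleaning service, past campaigns": (11_500, 500),
}
for name, (revenue, cost) in campaigns.items():
    print(f"{name}: ROAS = {roas(revenue, cost):.0f}")
```

Because ROAS is a ratio, it lets you compare campaigns with very different budgets, which is why it works for comparing platforms and for comparing past campaigns with new ones.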
1.2.6.4 Optimize your Marketing Strategy
We discussed that a lot of planning goes into marketing from identifying the right audience, to creating the right message, to selecting the right channels for that message. But once you actually start your marketing, you get a lot more data that provides you with helpful information that you can use to adjust your plan and further optimize.
That’s the beauty of digital marketing in particular: as soon as you start running campaigns online, you get data, and you can use that data to change course and adjust in real time. That’s super powerful, especially if you know marketing analytics. Here are some of the things marketing analytics can help with when it comes to optimizing your marketing campaigns.
First, if you’re running ads in different channels like Facebook, YouTube, Twitter, Google, and so on, you may find that as the initial data on the effectiveness of your ads come back that some of the ads work better than others. Some channels may be more effective for your message and for your target audience than others. That insight can help you adjust your marketing mix. You may decide to shift some of your remaining budget to the channels that perform best.
Second, you may see that within a channel your ads aren’t exactly delivering the results you were expecting. That may prompt you to change your advertising a bit or in other words, optimize within a channel. You could change your message or the ad creative you’re using or you may adjust who you’re targeting your ads to, in other words, the audience that will get to see your ad.
And third, while you make these changes in your ad strategy, you may decide to test your new ads or the new audience that you will target. You can do that by running your new ads alongside the initial ads and testing them against one another. That way, you can see whether your new ad does indeed work better than the old ad. Testing is a powerful way to make your decisions and adapt your marketing based on data rather than intuition.
Let’s go back to James. Remember how Alia had suggested, based on historical data, that he put 70% of his budget in Facebook ads and 30% in LinkedIn. Well, after the ads ran for two weeks, James noticed that he got better results from LinkedIn than he and Alia expected. So, he decided to put a little bit of the Facebook budget into LinkedIn instead. And when James saw that the ads in Facebook delivered fewer results than he had hoped for in the first three weeks, he decided to change the ads to include a clearer message and more images. However, instead of just stopping the first campaign altogether and switching to these new ads, James decided to run both ads and see which one delivered the best results. Based on the data, he learned that the new ad with a clearer message and more images was the better one. So he decided to put all his Facebook budget towards this new ad and stop the old version from running.
This type of fine-tuning of your marketing strategy based on data can have huge payoffs. It helps you to get the most results for your budget. And that can mean big savings and a big difference in the payoff from your marketing. Because a larger share of marketing budgets these days is spent on digital marketing, and because digital marketing makes this type of optimization possible, marketing analytics has become increasingly important over the past decade. Businesses have learned that if you let the data speak, you get a lot more out of the marketing budget.
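The text does not say how James judged which ad was better. One common way to check that a difference between two ads is more than noise is a two-proportion z-test, sketched here with only the standard library. The conversion and impression counts are made up, and the function name `two_proportion_z_test` is a hypothetical helper, not part of any advertising platform.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the assumption that both ads convert equally well.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, z, p_value


# Hypothetical results: conversions and impressions for each ad version.
p_old, p_new, z, p_value = two_proportion_z_test(
    conv_a=40, n_a=2_000, conv_b=70, n_b=2_000
)
print(f"old ad: {p_old:.2%}, new ad: {p_new:.2%}, z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the new ad’s higher conversion rate is unlikely to be due to chance alone. With only a handful of conversions per ad, the test will rarely be conclusive, so it helps to let both versions run long enough to collect sufficient data.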
1.2.6.5 Optimize the Sales Funnel
We discussed how marketers pay close attention to the sales funnel and how they might split the purchase process into four different steps:
- awareness,
- interest,
- decision,
- and action.
As you can imagine, getting people to the awareness stage usually involves quite a bit of time and effort, and often budget. If you are introducing a new product, for instance, you need to get the word out to an audience that may have an interest in your product. Often that involves developing an ad, buying advertising space, and so on. It’s no wonder then that marketers want to closely study what happens next. That’s where marketing analytics comes in to help them understand if and how people make it through the sales funnel. Specifically, they want to identify points of friction: points where people stop moving or leave the funnel altogether.
Friction can happen at different points of the funnel. For instance, people may be aware of your product, but it may be hard to get them to become interested. That could happen if you just don’t get your message across or don’t explain the benefits of your product very well. That may mean you have to change your advertising message. You may also find that people who have become interested don’t bite the bullet and decide to buy your product. That could be because a competitor has a more attractive offer or a better message, and people decide to buy their product instead. In some cases, you may see people who’ve decided to get your product but fail to take action. That could be because there’s friction in the checkout process, for instance a website that’s hard to navigate or that does not work well on a mobile phone.
By studying the data on the number of people that make it from one stage to the next, or by evaluating the flow of how people navigate your website, these points of friction come to light. That shows a marketer where they should focus their efforts and adjust their marketing strategy or, in some cases, parts of the product or the sales process.
Indeed, anything that hampers the online checkout process or leaves the user with questions during the process can cause friction.
When James was evaluating the sales funnel for SnackWall, he saw that there was a substantial drop-off from the decision to the action phase. Of the people that made it to the SnackWall website, only 11 percent ended up enrolling and thus purchasing the product. While James figured that some drop-off is natural, he and Aaliyah did a little bit more digging. In fact, they turned to Google Analytics and evaluated how people flow through the different pages in the online checkout process. This is the type of report they were studying. We will learn more about these reports and Google Analytics later in this program.
They saw that people who landed on the website did click on the “Enroll Now” button, but they did not submit their information when they landed on the next page. This puzzled James and Aaliyah, and they decided to interview a few target customers. They interviewed five people and had them go through the checkout process with them. As they were going through the website, all five of them told James and Aaliyah that they did not feel comfortable providing their name and company information without a clear idea of what a SnackWall subscription would cost them.
As a result, James and Aaliyah created a quick cost calculator. Instead of the “Enroll Now” button, they now have a “Learn More” button. That leads people to a page where they are asked how many employees they have, and they get a quick calculation of the estimated costs of the SnackWall subscription. Then there is an “Enroll Now” button. After making this change to the website, James saw the enrollment percentage go up to 20 percent.
As I’m sure you can imagine, James was glad he did some digging into the sales funnel data. Analyzing the sales funnel can highlight weak links in your sales and marketing process and give you clues as to what part of the process may warrant more research and where there may be opportunities for optimization.
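The kind of funnel analysis described above can be sketched as a stage-to-stage conversion calculation. The stage counts below are hypothetical. Only the 11 percent rate from decision to action mirrors the SnackWall example, and the function name `stage_conversions` is an assumption made for this illustration.

```python
# Hypothetical counts of people reaching each funnel stage.
funnel = [
    ("awareness", 20_000),
    ("interest", 5_000),
    ("decision", 1_000),   # e.g. visited the website
    ("action", 110),       # e.g. enrolled
]


def stage_conversions(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    results = []
    for (name_a, count_a), (name_b, count_b) in zip(stages, stages[1:]):
        rate = count_b / count_a if count_a else 0.0
        results.append((name_a, name_b, rate))
    return results


conversions = stage_conversions(funnel)
for start, end, rate in conversions:
    print(f"{start} -> {end}: {rate:.0%}")

# The step with the lowest rate is a candidate point of friction.
weakest = min(conversions, key=lambda step: step[2])
print("Largest drop-off:", weakest[0], "->", weakest[1])
```

A low rate only flags where to look. As in the SnackWall story, finding the cause usually takes further digging, such as customer interviews.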
1.2.6.6 Discussion: Friction in the purchase funnel
While browsing online or on social media, have you noticed an ad campaign that caused too much friction and resulted in you abandoning the sales funnel?
Was it an aspect of the campaign that caused you to lose interest? If so, describe what caused you to lose interest.
Did a competitor have a better offer or perhaps a better message? Who and why?
Was their website difficult to navigate? If so, describe what made it difficult.
1.3 Marketing Data Sources
1.3.1 What Data do Marketers use
Marketing analytics can help you plan your marketing efforts, evaluate your effectiveness, and optimize your marketing and your purchase funnel, but none of that can be done without data. Data can come from different sources, can be of varying quality and can be helpful or unhelpful to you depending on what it is, but without data and the ability to understand and analyze it, marketing efforts can only go so far.
As you know, all our online interactions generate data, but depending on the device you use to go online, the way data is collected differs. You may have heard of cookies and pixels used to track online behavior for instance. It’s important to understand the mechanics of how this works as it will help you understand the nature of the digital data you’re working with and the limitations.
Then, we’ll walk through an example of how a marketing team collects digital data and uses it for their campaigns.
1.3.1.3 Sampled or non-Sampled Data
As an analyst, you need to learn which types of data are useful to your business, which methods work best for tracking data, and how to interpret both raw data and individual data points to gain insights. One more thing to be aware of is whether you want to use sampled or unsampled data.
Sampled data is exactly what it sounds like, a sample or selection of the larger data set that represents the whole of the data set. Why would you use sampled data? If you want to get insights about your customers, you’d ideally want to ask all of them, but often that’s not possible. So, you may take a sample of your customer base knowing that their insights and feedback will represent the whole.
For example, a movie theater launches a new self service kiosk for tickets and 1000 people use it on the first day. The manager wants to get a sense of how the experience went, but it’s too big of a job to talk to all 1000 people who used the kiosk. What the manager would do is talk to a few users, or a sample, to get their feedback. The movie theater would get the data and insights they want from the sample without needing to ask every single person. From the sample, the movie theater is able to make inferences on how to leverage and promote the new self service kiosk.
Here’s another example: do you remember the media consumption data from Nielsen we referred to in an earlier video? This is a good example of a data set that uses sampling. To monitor people’s TV viewing habits, Nielsen recruits households whose TV viewing behavior it will monitor. These people agree to be monitored and they also fill out a questionnaire that provides Nielsen with demographic and interest data. Nielsen then uses the information they gather from those households to build a data set that represents the total population.
You may also use sampling when your data set is too large to handle, which would slow down generating any insights. Using a sample of the data would give you a more manageable data set with faster analysis time. And since the sample represents the whole, it should give you the same insights.
For example, a marketer collects customer data but finds that they have thousands and thousands of entries to analyze. Working with that many pieces of data will take a lot of time and effort, yet pulling out a sample of that data, only 1000 entries or a few hundred for instance, would give the same insights, just at a smaller scale, and would be much easier to work with.
Google Analytics is another example. Marketers use this tool to monitor people’s browsing behavior on their website or app. If you use Google Analytics on a very large website that is visited by many people, then Google Analytics will use sampling, so it can give you access to your data and reports faster.
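To illustrate the idea that a sample of a large data set can give roughly the same insight as the whole, here is a minimal sketch in Python. The data set is randomly generated and purely hypothetical (order values for 50,000 customers), and the sample size of 1,000 is chosen only to match the example above.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical "full" data set: order values for 50,000 customers.
population = [random.uniform(5, 100) for _ in range(50_000)]

# Draw a random sample of 1,000 entries without replacement.
sample = random.sample(population, k=1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
```

The two averages should land close to each other, though not exactly on the same value. A sample estimates the full data set rather than reproducing it.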
What is non sampled data? Well, it’s simply the whole population or data set in its raw state and there are certainly times when you use the entire population for your analysis purposes. For instance, if you’re looking for events that don’t occur frequently, a sample may not be a good way to go as you may not catch the event you were looking for, if you only look at a small selection of data.
You probably noticed when I talk about sampled data that I refer to the sample as a data selection that represents the total data set. It means that we assume that the characteristics of the sample are the same as the characteristics of my total data set. And thus it’s okay to look at the sample and draw conclusions about the total data set.
There are a few different methods you can use to get a representative sample. We will take a closer look at them later in this program when we dive into statistics. For now, I will just mention the most common method: random sampling, which simply involves picking members or entries at random. This can be done using a random number generator and gives everyone a fair chance of being chosen.
Selecting a representative sample is important. Let’s go back to our movie theater example and their self service kiosk. If I want to know what people think of the self service kiosk experience and I plan to take a sample of 20 people, it’s not a great idea to select the first 20 people in line. By doing that, I may, for instance, pick only families with kids who are catching the morning show of a Disney movie. The self service experience may be different for them than for the crowd that shows up later at night to see a romantic comedy. By randomly selecting 20 people throughout the entire day, I have a better chance of selecting a group that represents the full spectrum of theater visitors.
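The difference between taking the first 20 people in line and picking 20 people at random can be sketched as follows. The visitor list is hypothetical: it assumes, for illustration only, that the first 400 visitors of the day are families and the remaining 600 are couples.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical visitor log for one day, in order of arrival:
# morning visitors are mostly families, evening visitors mostly couples.
visitors = (
    [{"id": i, "group": "family"} for i in range(400)]
    + [{"id": i, "group": "couple"} for i in range(400, 1000)]
)

first_20 = visitors[:20]                     # convenience sample
random_20 = random.sample(visitors, k=20)    # simple random sample


def group_counts(people):
    """Count how many sampled visitors fall into each group."""
    counts = {}
    for person in people:
        counts[person["group"]] = counts.get(person["group"], 0) + 1
    return counts


print("First 20 in line:", group_counts(first_20))
print("Random 20:       ", group_counts(random_20))
```

The first 20 in line contain only families, while the random selection draws from the whole day and so has a chance of reflecting both groups.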
1.3.1.4 First party - Second Party - Third party Data
This section covers the different types and quality of data you may use as a marketer or analyst, as determined by how you collect it. If you ever had to write a report for school that involved research, you know that primary sources like journals or firsthand accounts were always the best sources to use, because they got you the closest to what actually happened. Secondary or tertiary sources were fine, but weren’t as factual or helpful as primary sources. The same is true when you’re working with first-party, second-party, and third-party data.
Not all data is equally useful or helpful. So it’s important to know the difference between the three, how you obtain each, and when you would use each one to help in your marketing efforts.
We’ll start with first-party data, since that type of data is going to be the most valuable and relevant to you. First-party data is simply data you’ve collected about your customers directly from your customers.
- Rewards sign up
- Unique visitors
- Newsletter interests
- Ad click
- Purchase data
This can be through the offline and online sources we mentioned in previous videos, like tracking site visits, social media follows, or in-store sign-ups. Examples of first-party data include the name and address of a customer who signs up for a rewards card at a store location, the number of unique visitors your homepage received last year, the information a customer provides about their interests when they fill out a newsletter sign-up form, the average age of Facebook users who clicked on one of your ads, or the customers who purchased a specific item last month.
Because first-party data directly reflects your audience or customer base, it will show you customers’ behavior and actions, purchase history, and how your audience engages with your brand. This kind of information can help you plan campaigns and new strategies, and continue deepening the relationship with your current audience. Additionally, because first-party data is data directly collected from your customers, you know that it’s going to be accurate, and provide a lot of rich insights.
But one of the drawbacks is that it’s limited to your own audience, so you aren’t necessarily getting insights from potential audiences or able to engage with new customers.
Still, first-party data is going to be the most useful to you, and the most accurate representation of your audience.
Second-party data is essentially second hand data, or data you did not collect yourself. Think of it as another business’s first-party data that they’re sharing with you. Or, if you share your first-party data with another business, that would be their second-party data. Why would businesses willingly share data with one another? If two businesses cater to a similar demographic or audience, they may partner in sharing their data, which may give insights into potential new audiences or trends.
For example, there’s a coffee shop on one block, and a bakery on the next block, and the owners are friends. The coffee shop only focuses on coffee, no baked goods. But sometimes, people come in with bags from the bakery. Similarly, the bakery just focuses on baked goods, and doesn’t want to branch into coffee. But they often see customers come in with cups from the coffee shop. Even though there is some overlap, there are still bakery customers that may not yet have tried the coffee shop, but may enjoy it. And there are coffee shop customers that may not yet have tried the bakery. The businesses then share their first-party data with each other, so that each business can now target more people who may be interested in their products and bring in more new customers.
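The coffee shop and bakery example can be sketched with simple set operations. The customer identifiers below are made up, and the sketch assumes both businesses identify customers in the same way, which real shared data sets often do not.

```python
# Hypothetical customer identifiers shared by the two businesses.
coffee_shop_customers = {"ana", "ben", "chris", "dana"}
bakery_customers = {"chris", "dana", "eli", "fay"}

# Customers both businesses already have in common.
overlap = coffee_shop_customers & bakery_customers

# Bakery customers the coffee shop has not reached yet, and vice versa.
new_for_coffee_shop = bakery_customers - coffee_shop_customers
new_for_bakery = coffee_shop_customers - bakery_customers

print("Overlap:", sorted(overlap))
print("New audience for coffee shop:", sorted(new_for_coffee_shop))
print("New audience for bakery:", sorted(new_for_bakery))
```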
Businesses may also choose to purchase second-party data they believe could help them in their marketing efforts. This approach is certainly easier, but unless you’re able to preview the data first, you may not be getting a hold of useful data for your business. You may also run into privacy issues, based on how the data was originally collected.
Finally, third-party data is collected not by you or another business you partnered with, but by a third party not directly linked to the end customer. That data is then sold to businesses who can use it to expand their targeting efforts. Third-party data is collected using similar approaches to first-party data, like through customer surveys, feedback, or tracking of online behavior.
But the data is typically collected through random sampling. In other words, it won’t be your particular audience, but across the general population. For example, a new restaurant opens in a neighborhood, and they want to target an audience within a specific zip code. They may buy data about everyone within that particular area. They, of course, wouldn’t know if the population on the list would even like their restaurant, or be in their target audience. But they can use the data to do an all encompassing campaign, or add the data to their current customer data to increase their reach.
When you buy third-party data, you may not have as much insight into how the data was collected. And the data may be incomplete, or of lower quality than first-party data.
Additionally, any organization can purchase the data, so it wouldn’t be uniquely yours. There may also be further privacy risk around how third-party data was collected as well. While third-party data may not be the primary data set a marketer uses, third-party data can be useful to round out first or second-party data, or for comparison.
So acquiring third-party data may be worth it. What data will be best for you? You will probably find reasons to use all three in your marketing efforts and planning. But it’s going to be best to emphasize first-party data, since that will give you the most accurate reflection of who your customers are, and provide you the most accurate insight into their behaviors.
1.3.2 Sources of Digital Data
There are different types of data that marketers and marketing analysts work with, and these days a lot of that data is related to the use of digital media.
We get data about different websites people visit, the way they navigate on websites, the products they purchase online, apps they download, ads they see, and so on. In order to get the most out of the data that’s available to you as a marketer, it’s important to understand how this data is collected.
First, we’ll go over how online interactions generate data. Then, we’ll take a look at data collection on websites using cookies. We will also look at the use of tags or pixels. Then, we’ll take a closer look at the use of software development kits, or SDKs, for data collection in mobile apps.
We’ll also discuss the use of Platform APIs to help connect data that a company may already have to the advertising platforms they may want to use. Finally, we will look at the use of UIDs.
1.3.2.1 How Online Interactions Generate Data
To understand how data fuels digital marketing and how data and advertising are connected, it’s important to have a closer look at where the data comes from.
How does that come about when you visit content online? Let’s look at how interactions happen online.
As an example, let’s look at what happens when someone interacts with the publisher’s website.
Every publisher’s website starts as a blank canvas made of code and stored on a web server. Think of your favorite news site, for instance. In your mind, strip away all of the content. The shell that’s left is the blank canvas the website started with.
To fill this blank canvas with content, the publisher uses a tool known as a content management system, or CMS for short. Publishers use their CMS to store, create and manage content on their websites. So, imagine your favorite news site: they use such a CMS. It’s typically a system that makes it possible for many people to easily create and manage the content without needing to know how to code.
Or in other words, it’s an easy system for writing a news article that you will see appear on the news site.
The publisher will leave some space on the website for advertising.
A separate server will place the ads. This server is referred to as the ad server.
So note that two different systems handle the content and the ads. The ads come from advertisers. To get the right ad in front of the right people, the publisher sends a signal to connect with the advertiser’s ad server and retrieve the creative for the ad that needs to be displayed.
Again, when we think about our favorite news website, the people who are writing the articles that you see aren’t deciding on the ads that you see on the site.
The ads you get to see are coming in from the Ad server. To publish, this website will connect with the Ad server and the Ad server fills the advertising spaces. Now let’s look at what happens when someone accesses a website.
As soon as the person’s browser requests a web page from the publisher, some data about a person is sent to the publisher. That information is used to bring the right content to the person, but it’s also sent to the Ad server to make sure a relevant Ad is displayed.
So, data is exchanged between the person and both the publisher and the advertisers’ servers.
The publisher and the advertiser both store data. Every interaction like accessing a page, clicking on a link, clicking on an ad or making a purchase leaves a record. The publisher and the advertiser categorize and store some of that information to personalize the content and adapt the ads people see.
So we now know that a lot of data is generated as people interact with online content.
But what do we really mean when we talk about data?
As you probably know, every website is made of code. That code is stored on a publisher server. Every time you interact with the website, you tell its server which piece of the code to display. Every request you make for a piece of code or elements of the website, leaves a record in the server. That record is referred to as the web server log.
A web server log consists of strings of code like this one. This code may look foreign to us but it’s not too difficult to understand its components.
First, you see a series of numbers. That’s the person’s IP address. It tells the server where to send the data.
Next is a unique identifier. This is how the server recognizes who is asking for the information. This identifier is typically pulled from the person’s browser and is usually a sequence of characters.
Note that this is not personally identifiable information, also referred to as PII. So, no names or physical addresses are stored here.
Next, if the website requires the person to log in, there may be a user name here.
Next, there’s a date and a time stamp of when the information was requested from the server. After that is the string of code that identifies what information the person is requesting. This is how the server finds the right piece of data to return. This string of code is embedded in the links on websites so that you can click on a link and give an instruction to the server at the same time.
Next, you’ll see a number that tells us whether the information was successfully provided to the person. 200 means successfully delivered, while 404 means the requested item could not be found.
Finally, another number reflects the size of the content file the person received. While every interaction leaves a trace in the publisher and advertiser servers, the servers often also send some information back to the user and store it in their browser as a cookie.
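To make the components of a web server log entry more concrete, here is a minimal sketch that splits one entry into the fields described above. The log line is made up for illustration. Its layout follows the widely used Common Log Format, which is an assumption, since the text does not name a specific format, and the field names in the pattern are chosen for this example.

```python
import re

# A made-up log line that follows the field order described above
# (the layout matches the widely used Common Log Format).
log_line = (
    '203.0.113.7 a1b2c3d4 - [10/Oct/2023:13:55:36 +0000] '
    '"GET /news/article-42.html HTTP/1.1" 200 5123'
)

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) '                 # IP address of the visitor
    r'(?P<identifier>\S+) '         # identifier of the client
    r'(?P<user>\S+) '               # user name if logged in, "-" otherwise
    r'\[(?P<timestamp>[^\]]+)\] '   # date and time of the request
    r'"(?P<request>[^"]*)" '        # what was requested
    r'(?P<status>\d{3}) '           # status code, e.g. 200 or 404
    r'(?P<size>\d+|-)'              # size of the response in bytes
)

match = LOG_PATTERN.match(log_line)
if match is None:
    raise ValueError("log line does not match the expected layout")
record = match.groupdict()

for field, value in record.items():
    print(f"{field}: {value}")
```

Real servers can be configured to log additional or different fields, so a pattern like this would need to be adapted to the log format actually in use.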
1.3.2.5 Software Development Kits (SDKs) for Mobile Apps
So far, we’ve learned about several ways in which marketers can access data related to online behavior. We talked about web server logs, cookies, pixels, and tags.
Everything we described so far has been related to data collected from interactions with websites. As you know, a lot of the online user behavior happens on mobile devices, and a large portion of mobile activity is on apps. Apps are different from websites, and tools like cookies, pixels, and tags don’t work in the same way on apps. Of course, we can still get data related to app usage, but we need to use different tools to collect the data. That’s where SDKs come in. In this video, you’ll learn what that’s all about.
SDK stands for Software Development Kit (also called a software developer kit). You can think of it as a toolbox for software developers. The toolbox contains code developers can install to help create applications. In some ways, you can think of it as a library of ready-made code that makes the life of the developer easier. Instead of having to manually code every piece of an app, for instance, they can plug in pieces of ready-made code from an SDK to achieve certain functions.
Here’s an example: I’m sure you’ve downloaded apps on your phone where you were asked to log in with your Google or Facebook account. That makes it easier on you as you don’t need to create a new login. The developer of that app would have used an SDK to make that work. Instead of manually programming a way for you to log in, developers would use the Facebook SDK, which has code they can implement in their app to let you log in with your Facebook credentials. It’s a bit like copying and pasting code. The code inside the Facebook SDK is especially written to make it easy for outside developers to integrate their applications with the Facebook functionality. Many platforms like Facebook provide an SDK for a number of different functions. You could use an SDK, for instance, to allow people to use certain filters in their images, or to create a smooth check-out when buying a product, and so on. SDKs exist for all kinds of software development, not just for mobile apps, but they are definitely heavily used by mobile app developers.
But what does this have to do with data about user behavior? Well, when you use an SDK from an advertising platform like Facebook or Google, for instance, you’ll have the option to connect data from your advertising to actions that happen in your mobile app. Imagine you have a gaming app. You may decide to use the Facebook SDK so your users can log in with their Facebook credentials. By having your users log in with their Facebook ID using the SDK, some data is sent back to Facebook. This makes it possible for you to see things like whether people who saw an ad on Facebook for your gaming app went on to download your app. And you could even see whether these people are more likely to make in-app purchases, like purchasing extra powers inside your game, for instance. The code in an SDK could instruct your app to send certain information over to the platform that created the SDK. That data can then be connected to other actions marketers take using the platform, like advertising, for instance.
As a marketing analyst, it’s good to know that SDKs are the way to go when you need to connect marketing platforms to your app. Most likely you won’t need to install any of this code yourself. But it helps to know that you can ask your developers to build in some code that can help you track the connection between your advertising and the actions people take on your app.
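To give a feel for how SDK code inside an app can report actions back to a platform, here is a toy sketch in Python. Everything in it is hypothetical: `ToyAdPlatformSDK`, `log_event`, and `flush` are invented names and do not correspond to the Facebook SDK, the Google SDK, or any real vendor API. Real mobile SDKs are also typically written for iOS or Android rather than in Python.

```python
import json
import time


class ToyAdPlatformSDK:
    """A made-up stand-in for a platform SDK; not a real vendor API."""

    def __init__(self, app_id: str):
        self.app_id = app_id
        self.outbox = []  # events waiting to be sent to the platform

    def log_event(self, user_id: str, event_name: str, **details):
        """Record an in-app action so the platform can link it to its ads."""
        event = {
            "app_id": self.app_id,
            "user_id": user_id,
            "event": event_name,
            "details": details,
            "timestamp": int(time.time()),
        }
        self.outbox.append(event)
        return event

    def flush(self) -> str:
        """Serialize queued events; a real SDK would transmit them instead."""
        payload = json.dumps(self.outbox)
        self.outbox = []
        return payload


# How app code might call the SDK at key moments.
sdk = ToyAdPlatformSDK(app_id="my-gaming-app")
sdk.log_event("user-123", "app_install")
sdk.log_event("user-123", "in_app_purchase", item="extra_powers", value=1.99)
payload = sdk.flush()
print(payload)
```

The point of the sketch is the division of labor: the app developer adds a few calls at the right moments, and the SDK takes care of packaging the data for the platform.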
1.3.2.6 Connecting Data Through APIs
You probably gathered from the previous videos that often, the data gathering we go through as marketing analysts has to do with connecting systems. That’s because we want to measure the effectiveness of advertising along the customer journey and across platforms. Marketers want to influence behavior, but the data related to the marketing and the data related to the behavior aren’t always easily connected. The marketing takes place using one system, like advertising using Google for instance. But the behavior we want to influence takes place elsewhere: a purchase on a website, an app download, a purchase in a store, and so on. Many of the systems we talked about so far will help marketers to make connections. But these systems often have dependencies that are a bit out of our control. We already saw how users can delete or block cookies, for instance, or how browsers may prevent certain data collection. That’s why we’ll often rely on making more direct connections through the use of APIs. Let’s explore what those are.
An API, or Application Programming Interface, is a tool that establishes a connection between two pieces of software. An API allows two applications to talk to each other. The API is a little bit like a courier who transports information, requests and so on from one system to another. Think of something as simple as sharing a news article on your Twitter feed. You are reading the news article and you would like to share it on Twitter. An API will ensure that your request to share goes to Twitter, who will then in turn make sure the article appears on your Twitter feed. APIs really fuel our online experience today. Loads of connections are being made constantly between different websites and different systems online.
How exactly does the API play a role when it comes to data? Well, APIs make it possible to share data directly with certain marketing or advertising platforms. For instance, I can use the Facebook API to directly provide Facebook with data about what happened on my website, on my app, in my stores, and so on. Why does that matter? Well, I can send data to Facebook about purchases, for instance. Facebook can then help me figure out whether the ads I ran on their platform led to those purchases. I can use the API to pass on information about website purchases, but I can also pass on information from in-store purchases. Through an API, I can establish a connection and send the data that’s relevant over that connection. Later in this program, we will dig deeper to see how APIs are relevant for marketers.
To use an API, you would usually involve a developer who can help make the connection between your software and the software of the platform or tool you would like to connect with. Platforms like Facebook, Google, Twitter, and so on, will provide a developer with all the information and code needed to integrate the API.
APIs are an excellent tool for marketers to connect information from different platforms. They are especially powerful because they establish a direct connection between the publisher of a site or app, and the platform it wants to connect to without depending on other tools like browsers or operating systems. As a marketing analyst, using APIs to connect information from different platforms will give you more reliable information and prevent broken data connections.
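As a rough illustration of what passing purchase data to a platform through an API can look like, here is a minimal sketch using only the Python standard library. The endpoint URL, the access token, and the field names in the payload are all hypothetical. A real platform such as Facebook or Google documents its own endpoints and required fields. The sketch only prepares the request and does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint and token; a real platform documents its own.
API_URL = "https://api.example.com/v1/events"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"


def build_purchase_request(order_id: str, value: float, currency: str, source: str):
    """Prepare (but do not send) a request reporting one purchase."""
    body = json.dumps({
        "event": "purchase",
        "order_id": order_id,
        "value": value,
        "currency": currency,
        "source": source,  # e.g. "website" or "store"
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {ACCESS_TOKEN}",
        },
    )


request = build_purchase_request("order-1001", 24.50, "EUR", "store")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending would be: urllib.request.urlopen(request)  (omitted in this sketch)
```

Note that the same function can report a website purchase or an in-store purchase. Only the `source` value changes, which reflects the point that an API connection does not depend on a browser being involved.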
1.3.2.7 Use of UIDs
UID stands for Unique User ID. It is a unique text or number string that identifies a person and it is created when the user logs in. So there are Facebook UIDs, Google UIDs and many others. There are also device IDs associating a unique person with a device like a phone, for instance.
When you create an account on an online platform like Facebook, Google, Amazon and so on, you provide them with some information and you create a login: a user ID and a password. Let’s say you create an account with Google, for example. Google will create a text or number string to associate with your unique account and link that string to the information you provided. This text or number string is your unique ID, in this case your Google ID, and it will help the company associate data with your account. Now, no matter through which browser or device you access Google, all your behavior can be associated with that ID as long as you’re logged in. That’s super helpful as it enables the platform, in this case Google, to connect what you do on your computer with your usage on mobile, for instance. So anywhere you use your Google login, your behavior can be stored and linked to your Google ID.
For publishers or platforms that use UIDs, it helps them to overcome some of the challenges they face when they use cookies. Remember that cookies are stored in the browser, but sometimes people use different browsers, or they access a site on a computer and later again on a mobile phone, for instance. The UID makes it possible for publishers to link behavior from a person on different browsers and on apps so that it’s possible to get a clearer and more complete picture of the users across platforms. With a UID, interests, behaviors, demographics and other information can be stored. Anytime a person logs in, behaviors can be associated with them.
In the introduction to this video, I also referred to device IDs. These are also important for marketers. In both iOS and Android, an ID is stored in the settings of the device and that ID can be accessed by advertisers if the user allows it. When users interact with advertising in an app and they have opted in to share their device ID with advertisers, it makes it possible for the advertisers to advertise to them across different apps. That way, they can show ads and track behavior on one app, like Instagram for instance, and then use that information to advertise to you while you’re using a different app. This type of tracking used to be a default setting in most phones. Nowadays, apps on Apple devices have to prompt their users to ask for permission to track their behavior and use that data across apps. You may have seen this message pop up on your phone when you are opening an app. It specifically asks you whether you are okay with this app tracking you and sharing your data with advertisers. This is a recent effort by Apple to protect users’ privacy.
It’s important to note that UIDs use a number or text string, but they don’t store personally identifiable information. Personally identifiable information is any information that would help someone to identify a person, like your name for instance. You will hear marketers also refer to this as PII. Actually, these days most companies go to great lengths to protect your personal data. Companies that work with online data will often use data hashing for that. In data hashing, the original data item gets translated into a hashed data item by applying an algorithm. As a result, what’s stored is unrecognizable. Hashing is designed to be one-way: the original value can’t practically be recovered from the hash, yet the same input always produces the same hash, so records can still be matched. This is an extra safety measure to help keep people’s privacy intact.
Now you know the main ways in which behavioral data is collected online. We looked at web server logs, cookies, pixels or tags, SDKs and APIs. And now you also understand what UIDs are.
That was a lot to cover, and these concepts can be a bit confusing at times. Don’t worry. We’ll repeat them again in other parts of the program and gradually their different applications will become increasingly clear. In the next lesson, we’ll go through an example of how a company might use all of these different ways to collect data. It’s a good opportunity to further practice these concepts. I’ll see you there.
1.3.3 Collecting Data for Marketing: Application
1.3.3.1 Intro
We have looked at web server logs, cookies, tags or pixels, SDKs, APIs and we have also talked about UIDs.
Each of these methods can help you achieve slightly different things, and I think they can be quite confusing. To make them clearer, we’re going to take a look at another fictitious company, Calla & Ivy. Calla & Ivy is a flower shop in Amsterdam; Imra is the owner of the store. She has always loved fresh flowers, but she is particularly known for her handbound bouquets. A few years ago Imra started selling these bouquets online and her business has expanded quite a bit as a result. She now employs a few people who help her focus on the website, and they also help her with her marketing. Recently, Imra introduced a flower subscription service. Subscribers can schedule monthly deliveries of bouquets to their home. In the next video, we’ll explore how Imra and her team collect and use data to help them market their products.
1.3.3.2 Collecting Data
There are different ways in which data is collected online. Different methods serve different purposes. And in this video we’ll walk through an example of how these different methods can be used in a real life scenario.
For the scenario, I’ll refer to our florist in Amsterdam, Calla & Ivy. Imra, the owner of Calla & Ivy, is introducing a new product, a flower subscription. People can subscribe on the website or on the Calla & Ivy app to receive a flower delivery every month. It’s a way for people to get the fresh flower bouquets Imra is known for delivered on a regular basis and add some seasonal color to their homes. As the first step in making the product available, Imra and her team created a landing page on the Calla & Ivy website where they explain what is included in the subscription and where people can enroll.
When people click to enroll, they provide all their details for shipping and payment. It’s important to Imra and her team to track the appeal of this new product. She wants to know how many people check out the landing page and how many people subscribe.
In theory, she could get this information from her web server logs. Remember, the web server logs keep track of all the interactions between people and the website. But in practice, accessing and going through those logs is not easy or practical. Instead, the team at Calla & Ivy tracks website traffic and website behavior using Google Analytics.
For Imra and her team to use Google Analytics to see the interaction of people on their website, they first need to add the Google Analytics tag to the website. This tag or pixel is a piece of code that gets added to every page of the Calla & Ivy website, which sends information over to Google Analytics. The information sent by the tag includes things like how many times the landing page for the subscription service was viewed, how many people clicked on the subscription button, how many people filled out the subscription information and so on.
Google Analytics sorts all this information into neat reports that help Imra understand whether the new product is a success and whether its popularity is growing. From studying the Google Analytics reports, Imra learns that many people visit the landing page but far fewer subscribe.
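The kind of report Imra is reading can be sketched in a few lines of Python. The example below is only an illustration: the event names, visitor IDs and the `funnel_counts` helper are hypothetical, and a real analytics tool does this aggregation for you.

```python
# Hypothetical event log: (visitor_id, event_name) pairs, as a tag might report them.
events = [
    ("v1", "landing_page_view"),
    ("v1", "subscribe_click"),
    ("v1", "subscription_completed"),
    ("v2", "landing_page_view"),
    ("v3", "landing_page_view"),
    ("v3", "subscribe_click"),
    ("v4", "landing_page_view"),
]

FUNNEL_STEPS = ["landing_page_view", "subscribe_click", "subscription_completed"]


def funnel_counts(events):
    """Count the distinct visitors who reached each funnel step."""
    visitors_per_step = {step: set() for step in FUNNEL_STEPS}
    for visitor_id, event_name in events:
        if event_name in visitors_per_step:
            visitors_per_step[event_name].add(visitor_id)
    return {step: len(visitors) for step, visitors in visitors_per_step.items()}


counts = funnel_counts(events)
landing_visitors = counts[FUNNEL_STEPS[0]]
for step in FUNNEL_STEPS:
    share = counts[step] / landing_visitors
    print(f"{step}: {counts[step]} visitors ({share:.0%} of landing page visitors)")
```

With this made-up data, four visitors see the landing page, two click the subscribe button and one completes the subscription, which is the same pattern Imra notices: many visits, far fewer subscriptions.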
After discussing this with her team, they decide that it may be a good idea to present people who hesitate with a coupon on their next visit, making their first bouquet free when they subscribe. To make this work, Imra decides to use a cookie.
Now, when people access the Calla & Ivy website, a cookie, or a piece of formatted text, is added to their browser. It stores information about a user’s visit to the site, including whether they accessed the page that describes the subscription. When people leave the site and come back at a later point, the cookie helps the site recall that this person had already visited and had shown an interest in the subscription. If that is the case, a large overlay with the coupon comes up on the page the user is visiting, encouraging them to subscribe and get the free bouquet.
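The cookie logic described above can be sketched with Python’s standard `http.cookies` module. This is a simplified illustration, not how the Calla & Ivy site is actually built: the cookie name `viewed_subscription`, the 30-day lifetime and both function names are assumptions made for the example.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "viewed_subscription"  # hypothetical cookie name


def remember_subscription_view() -> str:
    """Build the Set-Cookie value that records a visit to the subscription page."""
    cookies = SimpleCookie()
    cookies[COOKIE_NAME] = "1"
    cookies[COOKIE_NAME]["max-age"] = 30 * 24 * 60 * 60  # keep for 30 days (assumed)
    cookies[COOKIE_NAME]["path"] = "/"
    return cookies[COOKIE_NAME].OutputString()


def should_show_coupon(cookie_header: str) -> bool:
    """Return True if the browser's cookies show an earlier visit to the subscription page."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    return COOKIE_NAME in cookies and cookies[COOKIE_NAME].value == "1"


# First visit: the site sends this value in a Set-Cookie response header.
print(remember_subscription_view())

# Later visit: the browser sends its stored cookies back in the Cookie request header.
print(should_show_coupon("viewed_subscription=1; theme=dark"))  # True
print(should_show_coupon("theme=dark"))                         # False
```

Because the cookie lives in the browser, the coupon only appears when the visitor returns with the same browser, which is exactly the limitation that UIDs help to overcome.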
Calla & Ivy also has an app that makes it easy for people to order flowers or subscribe, and to monitor and manage their subscription. Of course, Imra wants to understand the behavior of people who use the app as well. So, her team installed the Google Analytics SDK.
Remember that an SDK, or software development kit, is a library of different pieces of code that you can integrate in your app to make certain functions possible. By installing the Google Analytics SDK, the interactions of people with the app can be sent to Google Analytics so Imra can get a full picture of all the online usage, whether on the app or on the website. On the website, she can use the Google Analytics tag, but on the app that doesn’t work, hence the SDK.
For those people who live in Amsterdam, there’s a good chance that they dropped by Imra’s physical store to buy flowers. In some cases, people may learn about some of the new seasonal bouquets from ads the marketing team is running on Instagram. Instead of clicking on the ads to buy online, they decide to buy the flowers in person. Imra’s marketing team is eager to connect the dots here. They would really like to know how effective their ads on Instagram are and they want to count all the resulting purchases, not just the purchases that happened online. That’s where the marketing team relies on an API, or a system that lets them connect two pieces of software.
In this case, they connect their in-store customer management software to the Facebook API. That way, every time a purchase is made in-store, data can be sent to Facebook. Based on some information that may be known about the purchaser, like the email address for instance, it may be possible to link the purchaser to a known Facebook user ID. Imra’s team can then assess whether the purchaser saw an Instagram ad and may thus have been influenced by that ad to make a purchase. It is important for the marketing team to have access to this information so that they can better understand their ads’ effectiveness.
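The matching step can be illustrated with a small Python sketch. In reality the matching happens on the platform’s side after the data is sent through its API; the code below only simulates the idea locally and does not call any real Facebook API. All names, email addresses and amounts are made up, and matching on a hashed email is an assumption chosen for the example.

```python
import hashlib


def hash_email(email: str) -> str:
    """Hash a normalized email address with SHA-256."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical in-store purchases exported from the customer management software.
in_store_purchases = [
    {"email": "anna@example.com", "amount_eur": 35.0},
    {"email": "Bram@Example.com", "amount_eur": 20.0},
    {"email": "chris@example.com", "amount_eur": 50.0},
]

# Hypothetical hashed emails of people who were shown the Instagram ad.
ad_viewers = {hash_email("anna@example.com"), hash_email("bram@example.com")}


def attribute_purchases(purchases, viewers):
    """Split purchases into those made by ad viewers and all others."""
    matched, unmatched = [], []
    for purchase in purchases:
        target = matched if hash_email(purchase["email"]) in viewers else unmatched
        target.append(purchase)
    return matched, unmatched


matched, unmatched = attribute_purchases(in_store_purchases, ad_viewers)
matched_revenue = sum(purchase["amount_eur"] for purchase in matched)
print(len(matched), "in-store purchases by ad viewers, worth", matched_revenue, "EUR")
```

A match only shows that the purchaser saw the ad, not that the ad caused the purchase, which is why the text above says the purchaser "may" have been influenced.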
This is just a brief view into what may go on in a company and its marketing department on a daily basis. And as you can see, data plays a crucial role in many tasks that get executed on a regular basis. Marketers rely on different methods to get to the data they need.
1.3.3.3 Implementing the Facebook Pixel, SDK and API
Implementing Data Collection Tools
In previous lessons, we’ve learned about the many different types of data that can be collected, from the different websites people visit to the ads they see along the way. Just like there are many different data points, there are various ways to collect data, each with its own features and purposes. Although much of the information these tools collect and organize for you can be found in the web server logs for your site, working with those logs is not always easy or practical. In this reading, we’ll take a look at three tools that will help you collect data.
Facebook Pixel
Pixels, also referred to as tags, are used for tracking, measurement and advertising. As mentioned in the introduction above, it’s not always easy or practical to look at data from your web server logs or even cookies. Luckily, you can work with companies that help you track user behavior or advertise products. A pixel is a small piece of code that you add to your website and that instructs it to send some information to an identified third party, that is, one of the companies helping you make use of your data. The Facebook pixel is one example. Here, the information is used to help connect advertising on Facebook with actions taken on the site, allowing you to check how effective your advertising on Facebook was.
SDKs
An SDK, or software development kit, is a library of pieces of code that you can integrate into your app to add certain functions. Where on a website you can use a pixel or tag, you would use an SDK for your app. The SDK sends the information about people’s interaction with the app to, for example, Google Analytics where both website and app information can be aggregated. A great example of an SDK that you might see every day is an app that asks you to sign in with your Google or Facebook account. For your website, you might use an SDK to create a smooth checkout experience for customers, as well as, of course, tracking various data points of your browsers’ activity. If users have signed into your app using their Facebook account, for instance, you can also see whether an ad they saw on Facebook inspired them to download your app and make in-app purchases.
APIs
An API, or application programming interface, is a tool that establishes a connection between two pieces of software. Remember the example of sharing a news article on Twitter from the previous lesson? In the same way APIs make it possible to share your article, they make it possible to share data directly with certain marketing or advertising platforms. This is useful because you can then use these connections to learn more about the results of your marketing activities. For example, you can send purchase data to Facebook, which can then help you figure out whether the ads you placed on their platform led to the purchases.
Implementing the tools
How to do it
Using these tools is often as easy as integrating them into your website’s already existing code. Most of these tools, for example the Facebook pixel and the Google Ads remarketing tag, have the code readily available in the help documentation or other information for web developers. This allows you or your developers to install the code easily so you can start tracking.
Why they’re useful
Implementing tools such as these can allow you to encourage browsers of your website and app to purchase, subscribe and more. The Facebook pixel, for example, can also create custom target audiences consisting of people who have engaged with the website and who you would like to target with more specific advertising messages.
These tools also allow you to integrate your website experience with that of your app. For example, you can use an SDK to instruct your app to send certain information over to the platform that created the SDK, and that data can then be connected to other actions marketers take using the platform, like advertising.
1.3.3.4 Review
First, you learned about different sources of data marketers use. You now know that marketers use both offline and online data. And you also understand that in some cases, it isn’t realistic to work with all the data for a particular event, in which case you can turn to sampling. You also learned what marketers mean when they talk about first, second, and third party data.
And you know which tools marketing analysts use to collect data about online user behavior. You learned about browser cookies used to collect web browsing behavior data, pixels or tags to collect event data on specific websites, SDKs for data on app usage, and you learned how APIs are used to connect data from different sources to online platforms.
And finally, you saw how Imra at Calla & Ivy uses all these tools to collect the data she needs for her marketing. Now, you know which data sources you can tap into for different data needs, and you know which tools to use to collect online data for different use cases. Given how many different data sources there are, knowing when different data collection tools apply is incredibly helpful. Now, you’re ready to start looking into the tools you can use to categorize and analyze all that data. That’s what’s next. See you there!
1.3.4 Analyzing and Visualizing Data
Now that you know about the importance of marketing analytics and the many sources of data collection, it’s time to learn about the tools of the trade.
These are the software applications and techniques used by marketing analysts all over the world every day.
In our first lesson, we’ll look at some common tools that are used to analyze and visualize data. We’ll talk about spreadsheets and some popular visualization tools.
Then in Lesson 2, we’ll take a closer look at specific tools analysts use to evaluate online data, like Google Analytics for instance.
Finally, in Lesson 3, we will look at common ways in which marketing analysts evaluate the success of their marketing campaigns. We’ll look at specific reports that are provided by big marketing platforms like Facebook Ads Manager or Google Ads for instance.
At the end of this week, you’ll have a good idea of the tools that marketing analysts use every day. Later in the program, we’ll cover them in a lot more detail. But it’s always good to start with the big picture. Marketing analysts have a suite of very powerful tools at their disposal, and I hope some of them will pique your interest. Let’s get started.
1.3.4.1 Spreadsheets
Spreadsheets are a staple for marketing analysts everywhere. They are the most basic way to access, sort and categorize data, report results, and even run analyses.
We will dive deeper into spreadsheets later in this course, but for now we will introduce you to the basic concepts. Today we’ll cover labelling, sorting and filtering, calculated cells, and visualizations.
Before we start, there are several different software programs that allow you to access and manipulate spreadsheets, but the two main programs are Microsoft Excel and Google Sheets. Both programs work well and are similar enough that if you can use one, you can use both.
Microsoft Excel has more features, but Google Sheets is a free program. No matter which you choose, the basic format is the same.
1.3.5 Tools to Evaluate Digital Data
1.3.5.1 Website Data
Your website is one of the biggest assets your business will have. New customers will learn about you through your website and old customers return because they’re engaged with your brand.
Your marketing efforts, both organic and paid, are going to funnel to your website. Even if you have a brick-and-mortar store, you’re probably also generating sales on your website, and many businesses today are online only and do all of their business through their website. This means that you need to thoroughly understand how users interact with your website. Doing so allows you to optimize your site content not only to provide the best value and experience for your visitors, but also to maximize your conversions, from newsletter sign-ups to sales. Key questions include:
Who is your audience?
How do they get to the website?
What content do they engage with?
Are they staying or leaving?
This is why analyzing data associated with your website can help you with your marketing efforts. Marketers need to know where to find website data and analytics, how to read them, and how to glean insights from them, so they can make data-backed decisions on whether to change marketing efforts going forward, or whether what they’re doing now is paying off.
Google will provide you with a tracking code to add to your website, known as the Google Analytics global site tag, to collect data from your website. Then you can simply log into the Google Analytics dashboard to view metrics. Once the code is embedded in your site, anytime someone visits, Google Analytics attaches a unique identifier to the user, usually in the form of a cookie, and tracks their movement throughout the site. Each uniquely identified visitor is called a user, and each time they visit your site is called a session.
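The difference between users and sessions is easier to see with a small worked example. The Python sketch below counts both from a list of page views. It assumes that a session ends after 30 minutes without activity, which is a common convention but is treated here as an assumption; the visitor IDs, timestamps and the `count_users_and_sessions` function are made up for illustration.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff

# Hypothetical page views: (identifier stored in the visitor's cookie, time of the view).
page_views = [
    ("u1", datetime(2024, 5, 1, 9, 0)),
    ("u1", datetime(2024, 5, 1, 9, 10)),
    ("u1", datetime(2024, 5, 1, 14, 0)),  # hours later, so this starts a new session
    ("u2", datetime(2024, 5, 1, 9, 5)),
]


def count_users_and_sessions(page_views):
    """Return (number of users, number of sessions) for a list of page views."""
    last_seen = {}
    sessions = 0
    for user_id, timestamp in sorted(page_views, key=lambda view: view[1]):
        previous = last_seen.get(user_id)
        # A first visit, or a visit after a long gap, starts a new session.
        if previous is None or timestamp - previous > SESSION_TIMEOUT:
            sessions += 1
        last_seen[user_id] = timestamp
    return len(last_seen), sessions


users, sessions = count_users_and_sessions(page_views)
print(users, "users,", sessions, "sessions")  # 2 users, 3 sessions
```

Here the first visitor comes back in the afternoon, so two users produce three sessions.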
1.3.5.2 Terminology
A/B Testing: A method of testing two different versions of a web page or mobile app to determine which version performs better with users.
Abandon/Abandonment: When a user or customer leaves an action on a web page uncompleted and doesn’t return in the session, like a contact form or a purchase.
Acknowledgement/Thank You/Receipt Page or Pop-Up: A page or pop-up that displays after a user has completed an action, like completing a form or submitting a purchase. This signals the end of a conversion event that may be tracked.
Acquisition: The different ways in which customers find a website, whether it be by entering a URL directly, through a social media link, through an email link, through another website referral, or other methods.
Ad Click: The act of clicking on an ad on a webpage.
Ad View: The act of viewing an ad on a webpage.
Audience: The visitors that come to a website, and their demographics, interests, location, behavior, and more.
Banner Ad: An ad that is embedded on a webpage. It can present as full width, within the sidebar, or as other sizes on the page.
Behavior: The way in which a visitor moves through a website.
Benchmark: A standard measurement for a metric — for example, the average number of monthly visitors — against which to measure other measurements of the same metric. Can be within a site or across an industry.
Bounce Rate: The measurement of the percentage of visitors to a website that leave the website after viewing only one page, and don’t navigate to other pages within the site.
Cascading Style Sheets (CSS): A style sheet language for websites that dictates the site-wide styles and presentation.
Click Through/Click Through Rate (CTR): The act of clicking on an ad to access a website or landing page. The rate is determined by dividing the number of clicks an ad received by the number of times the ad was shown.
Content Management System (CMS): Software that manages the contents and back-end of a website.
Conversion/Conversion Rate: When a customer completes an action, like makes a purchase, subscribes to a newsletter, or other actions. The conversion rate is the percentage of visitors to a website that complete a conversion.
Cookie: A piece of data used to track a visitor’s behavior throughout a website. Cookies are used to track and store information about a user’s online behavior and preferences. They play a significant role in digital marketing by providing insights into user interactions with websites and helping marketers deliver personalized and targeted content.
Cost Per Click (CPC): How much a business pays for one click on their ad.
Cost Per Mille (CPM): How much a business pays for their ad to be shown 1,000 (mille) times, that is, the cost per 1,000 impressions.
Crawl/Crawler: A method by which search engines “crawl” the internet, and read pages for indexing.
Creative: The contents and design of an ad.
Demographics: The age, gender, location, and other individual identifiers of a website’s visitors in general, or target audiences in particular.
Direct Referral: One of the ways a visitor accesses a website, by typing in the URL directly into their web browser or accessing the URL via a bookmark.
Domain: The web address of a website, specified by the name in the URL.
eCommerce: Selling products or services online only, as opposed to in a brick-and-mortar store.
Entry Page: The first page a visitor sees when they get to a website (i.e., the entry point).
Exit Page: The page from which a visitor exits the website after navigating through the site.
Hit: Any time an image or other file is requested from the web server. Not to be confused with a page view (i.e., there could be multiple hits per one page view).
Home Page: The main or starting page of a website, typically located at the root URL.
Hypertext Transfer Protocol (HTTP): A protocol for the way data and hypermedia are transferred between a web server and a web browser.
Hypertext Markup Language (HTML): A markup language that defines the structure, text formatting and hyperlinks of a web page.
Impression: One view of a piece of content, whether it be an ad, a social media post, or some other post or call to action.
Inbound Links/Back Link: Links from other websites into a website, which are evaluated by search engines when ranking.
Internet Protocol (IP) Address: A unique address that identifies a computer or other device connected to the internet.
Keyword: A word or phrase that not only describes the contents of a particular website, but that search engine users can search that will result in that website appearing in the results.
Landing Page: A single webpage used to detail a product or offering (typically different from a home page).
Link Referral: A way by which visitors find a website through links from other websites.
Load Time: The time it takes for a web page to load in the browser.
Meta Tags: Text that can be added to a page via hidden HTML to help with search engine ranking.
Navigation: The way in which visitors move throughout a website, via a menu or navigation bar.
New Visitor: A visitor who is accessing a website for the first time, and has no previous sessions.
Organic Traffic: A way in which visitors find a website via organic means, like through a search engine, organic posts on social media, or via non-paid links.
Outbound Links: Links going out to other websites from a website.
Page Duration: The time spent by a visitor browsing one page.
Page View/Page Views Per Visit: One point-in-time access of a webpage by a visitor, or one rendering or request of a web page. Page Views Per Visit is how many pages were accessed by one visitor during a single visit.
Page: A static page or full HTML document on a website.
Paid Referrals: A way to access a website by a paid referral via an ad, sponsored content, or other paid call to action.
Path: The way a visitor travels through a website.
Reach: The number of people reached by a specific ad, social media post, or website; typically a size of an audience.
Redirect: Sending a visitor from one URL to another to find the page they’re looking for; too many redirects can reflect negatively on a website’s rankings.
Return on Ad Spend (ROAS): The revenue generated by an ad campaign divided by the amount spent on it; used to evaluate how cost effective the campaign was.
Return Visitor: A visitor who returns multiple times to a website, identified either through cookies or through a log-in authentication.
Sampling: A selection of an audience that represents the whole, to use for data analysis.
Search Engine Optimization (SEO): An approach by which a website can better optimize itself for higher rankings and search returns by using keywords, header tags, links, and more.
Search Engine: A service that allows users to search the web for keywords and phrases, and returns both paid and organic website listings from their indexing that matches the search.
Session: The time in which one visitor browses through a website.
Site Content: A list of all content, including pages or other assets, that are contained within a website.
Site Performance: The overall success of a website in terms of conversions, page visits, content views, and more.
Site Search: The search within a website (not a search engine).
Site Traffic: The number of visitors that access a website over a given time period.
Social Referrals: A way in which visitors find a website via social media links.
Stickiness: The ability of a website to keep visitors from leaving, and to continue to navigate within the site instead of exiting.
Traffic: Visitors to a website.
Uniform Resource Locator (URL): A specific, named address with which to identify and access a website.
Unique Visitor: One specific visitor to a website, identified by a cookie, authentication, or IP address.
Visits: An individual visit to a website or a web page; may not be a unique visitor, but the number of visits (of which a unique visitor may be counted multiple times).
Web Analytics: The measure of visitors on a website, which pages they visit, how they accessed the website, and more, in order to give insight into site success or improvements.
Web Analytics Dashboard: An online dashboard (via Google Analytics or a native web platform) that displays all visitor and usage metrics for a website.
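Several of the terms above are simple ratios, and a short worked example can help keep them apart. The Python sketch below calculates click-through rate, cost per click, cost per mille, bounce rate, conversion rate and return on ad spend. All the input numbers are made up, and the `rate` helper is just a convenience for this example.

```python
def rate(part: float, whole: float) -> float:
    """Return part / whole, or 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


# Hypothetical totals for one ad campaign and one month of website traffic.
impressions = 20_000        # times the ad was shown
clicks = 400                # clicks on the ad
ad_spend = 200.0            # amount paid for the campaign
revenue = 800.0             # revenue attributed to the campaign
visitors = 380              # visitors to the website
single_page_visitors = 152  # visitors who left after viewing only one page
conversions = 19            # visitors who completed a conversion

ctr = rate(clicks, impressions)              # Click Through Rate
cpc = rate(ad_spend, clicks)                 # Cost Per Click
cpm = rate(ad_spend, impressions) * 1000     # Cost Per Mille (per 1,000 impressions)
bounce_rate = rate(single_page_visitors, visitors)
conversion_rate = rate(conversions, visitors)
roas = rate(revenue, ad_spend)               # Return on Ad Spend

print(f"CTR: {ctr:.1%}")
print(f"CPC: {cpc:.2f}")
print(f"CPM: {cpm:.2f}")
print(f"Bounce rate: {bounce_rate:.1%}")
print(f"Conversion rate: {conversion_rate:.1%}")
print(f"ROAS: {roas:.1f}")
```

With these numbers, the campaign has a 2.0% click-through rate, costs 0.50 per click and 10.00 per thousand impressions, and returns 4.0 in revenue for every 1.0 spent.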
1.3.6 Data and Privacy
Access to data is important for marketers. It makes it possible to deliver advertising messages to the right audience, optimize marketing campaigns, and measure the outcome of marketing actions. But we shouldn’t forget that all that data is related to user behavior, and while users may be okay with sharing some data, there are limits.
1.3.6.1 Consumer Perspective
Our online experience heavily relies on advertising, with 86 percent of the media consumed in the US being supported by advertisements.
Many of these ads are customized to individual preferences using data provided by users like you and me. This data is instrumental in making inexpensive and free content available online. However, it’s important to recognize that not all data is equal in this context.
Advertising serves as the cornerstone of the majority of our online interactions and typically either partially or fully funds our digital experiences. This business model is not new; it has long been prevalent in various forms of media such as newspapers, magazines, TV, and radio. While users may pay for access to these platforms, advertising remains the primary source of revenue supporting content creation.
While some of us might wish for an ad-free experience, the reality is that without advertising, accessing content would likely come at a cost or be less freely available. Many individuals have come to accept ads as an integral part of their online experience. As you browse the web, you’ve probably encountered ads that felt tailored to your interests and others that seemed irrelevant. The spectrum of online advertisements ranges from captivating to irritating.
Often, the ads that resonate most with us are those customized to our specific interests. Surprisingly, these personalized ads also tend to be the most effective. However, for some individuals, even these tailored ads can evoke a sense of intrusion.
As we’ve come to understand, personalized ads tailor their content to our individual interests by leveraging data collected from our online activities and privacy settings.
A clear differentiation can be made between data consciously provided by consumers, such as interests or demographic details filled out during the sign-up process for social media platforms, and data derived from individuals’ browsing habits, which is often collected without the user actively providing it. It is this latter type of data that consumers are typically most concerned about when considering their online privacy.
Based on a survey conducted by the Pew Research Center, 72% of Americans express a sense of being constantly monitored by advertisers, technology firms, or other companies while engaging in online activities or using their cell phones. Additionally, a significant 81% of Americans believe that the potential downsides of data collection by companies concerning their personal information outweigh any benefits.
Seventy-nine percent of adults express concern, to varying degrees, about how companies utilize the data they gather about them.
A study published in the Harvard Business Review discovered that consumers exhibit less apprehension when companies utilize information directly provided by them, in contrast to conclusions drawn from their browsing activity. The distinction between engaging, personalized advertisements and intrusive ones is exceedingly delicate.
As you embark on your journey as a marketing analyst, it’s crucial to prioritize consumer privacy concerns. While access to data in advertising is undoubtedly advantageous and enhances marketing effectiveness, it’s equally essential to maintain a balance between data utilization and consumer trust.
Consumers are increasingly aware that their online activities are being tracked, yet they often feel uneasy about the sharing and utilization of their browsing behavior data for advertising purposes. This discomfort may prompt some consumers to take action in response to these concerns.
As a marketing analyst, being aware of how consumers can assert control over their own data and content is essential, as it provides insight into the limitations of the data available for analysis.
Consumers have access to several tools that enable them to restrict the information they share online and control how it is utilized. Three of the most prevalent and easily accessible tools include ad blockers, cookie blockers, and VPNs (Virtual Private Networks). These tools empower consumers to manage their online privacy effectively.
Ad blockers are software designed to prevent advertisements from displaying on webpages. Typically, these tools are browser plugins that users can install to block ads while browsing.
For instance, two popular ad blockers are Adblock Plus and uBlock Origin, both of which can be easily added to commonly used web browsers. While consumers enjoy the ad-free browsing experience these tools offer, it’s important to note that a significant portion of content publishers and creators rely on ads to fund their work.
Both Adblock Plus and uBlock Origin offer the option for users to whitelist specific sites, allowing ads to be displayed on those sites while still blocking ads on others. Some publishers and websites are transparent about this process and directly request users to disable their ad blocker or whitelist their site to support ad revenue.
Another tool consumers can utilize to control the data they share online is a cookie blocker.
Similar to ad blockers, cookie blockers are browser plugins that prevent data from being stored through cookies. Unlike ad blockers, which primarily focus on blocking ads, cookie blockers limit or entirely prevent the collection of data about users’ browsing behavior. This reduces the amount of information advertisers receive about users and their online habits, even though users may still see ads tailored to their interests.
Two popular cookie blockers are Privacy Badger and Ghostery, both of which are browser plugins that users can add to enhance their online privacy.
As mentioned earlier, the landscape of cookies and their utility is evolving alongside changes in browser settings and user acceptance. Consequently, the perceived value of cookies to advertisers is shifting, and over time, consumers may find less necessity in blocking them.
Another privacy tool consumers can utilize is a VPN, or virtual private network. A VPN routes the internet traffic leaving a user’s device through an intermediary server, which hides the user’s IP address, and the approximate location derived from it, from the websites they visit. However, it’s important to note that VPNs may still allow for tracking based on browsing habits and patterns.
There are numerous VPN options available, including both free and paid services. Some VPNs are integrated directly into devices and browsers. Despite the variety of options, the fundamental functionality of all VPNs remains the same.
Next, let’s delve into an overview of some settings consumers can adjust on their devices to enhance their privacy. If you’ve ever explored the settings on your phone, you’re likely aware of the numerous specific settings and privacy preferences available on all major devices.
For our discussion, we’ll focus on three overarching concepts: location, tracking, and permissions. Consider the multitude of modern devices capable of tracking a consumer’s location, including cell phones, tablets, laptops, and smart-watches. On each of these devices, users have the ability to control how, when, and to whom their location data is accessible. Typically, these settings can be found within the location services menu.
Another setting consumers have some control over is how and by whom they are tracked online. While these settings may vary across platforms, in Apple’s iOS, for example, users can access a tracking menu under the Privacy settings. This menu enables users to specify which apps are permitted to track their activity.
Permissions, as an umbrella term on most devices, allow users to fine-tune what information or functions apps and websites can access. For example, you may want your favorite photo-sharing app to access your camera, but you might not want to grant the same permission to a grocery shopping app.
Similarly, various online platforms offer a range of privacy controls. On social media, users have control over their content, communication, and interactions. They can typically delete posts, disable incoming messages, and customize the personal information visible to the public. These specific controls complement the platform’s privacy policy, offering users greater autonomy over their online presence.
Websites and apps that collect data about users are required to have a privacy policy, typically accessible alongside the terms of service. Social media platforms recognize the importance of consumer privacy and often provide tools to facilitate user control.
For instance, Facebook offers a privacy checkup tool within its app, making it convenient for users to manage their settings. Despite advertising being a significant component of social media platforms, users can still determine the extent to which their data is used for advertising purposes. These controls are usually found in personal settings.
As consumers online, individuals have considerable control over the information they share. You might wonder why a course on marketing analytics would emphasize ways to block or control data sharing. While marketers generally prefer access to more data, they also recognize the importance of maintaining user trust.
Understanding the limitations and gaps in the data available for analysis is crucial for a marketing analyst. Recognizing these limitations resulting from consumer controls provides insights into the boundaries of data accessibility and helps maintain transparency and trust in marketing practices.
1.3.6.2 Advertiser’s Perspective
Now, let’s shift our focus to the advertiser’s perspective and examine what it means to be a responsible advertiser when working with consumer data.
First, we’ll delve into the concept of responsible advertising and the obligations advertisers have towards consumers when utilizing their data.
Then, we’ll explore the advertising ecosystem, highlighting the various parties involved in handling user data and their shared responsibilities.
Lastly, we’ll discuss the ownership and management of data, shedding light on who ultimately controls and oversees the data used in advertising practices.
This shift in perspective will underscore the importance of data in advertising and elucidate the ethical responsibilities that advertisers bear when leveraging consumer data.
Since the early 2000s, advertisers have made a significant shift in their approach. They’ve moved away from relying solely on contextual advertising, which involves placing ads on pages or in locations with relevant content. Instead, they’ve embraced data-based advertising, which targets consumers based on their individual interests and behaviors.
To illustrate, imagine a contextual advertising scenario where an ad for running shoes appears on a website dedicated to running enthusiasts.
However, with data-based advertising, the targeting becomes much more refined. For instance, an ad for running shoes could pop up on a news website, but it would be shown specifically to a consumer who has previously shown interest in running. In this model, the context of where the ad is displayed becomes less critical than ensuring it reaches the right audience.
The effectiveness of data-based advertising relies heavily on the availability of data. Without access to information about consumers’ interests, habits, or needs, advertisers would be limited to using only contextual advertising strategies.
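The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not how any real ad platform works: the `Ad` and `User` classes, the topic labels, and both matching functions are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Ad:
    name: str
    topic: str  # the interest this ad is about, e.g. "running"


@dataclass
class User:
    user_id: str
    # Known interests; empty when no data about the user is available.
    interests: set[str] = field(default_factory=set)


def contextual_match(ad: Ad, page_topic: str) -> bool:
    """Contextual advertising: show the ad only on pages with relevant content."""
    return ad.topic == page_topic


def data_based_match(ad: Ad, user: User) -> bool:
    """Data-based advertising: show the ad to users whose known interests fit,
    regardless of the page they are viewing."""
    return ad.topic in user.interests


shoe_ad = Ad(name="Running shoes", topic="running")
runner = User(user_id="u1", interests={"running", "cooking"})
unknown_visitor = User(user_id="u2")  # has shared no interest data

print(contextual_match(shoe_ad, page_topic="running"))  # True
print(contextual_match(shoe_ad, page_topic="news"))     # False
print(data_based_match(shoe_ad, runner))                # True
print(data_based_match(shoe_ad, unknown_visitor))       # False
```

The last call shows the dependency described above: with no interest data for a visitor, the data-based rule has nothing to match on, and only the page context remains.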
Data-based advertising is highly effective, and advertisers are eager to utilize data to target their ads to the right audience. However, when advertisers handle and utilize data, they bear the responsibility of providing consumers with three key elements: value, transparency, and control.
First and foremost, it’s crucial for consumers to perceive the value in allowing their personal data to be used for advertising purposes. This could manifest through access to inexpensive or free content, personalized experiences, or innovative marketplace offerings, all of which enhance the consumer’s overall experience.
Secondly, advertisers must prioritize transparency regarding the origin and usage of data. For instance, if you’ve recently encountered a personalized ad, you might have noticed additional information indicating why you’re seeing that specific ad. An example of this is the ad choice flag, provided by the Digital Advertising Alliance (DAA), a consortium of advertisers striving to enhance transparency in data-based advertising. When users click on this icon, they gain insight into how and why they’re being targeted with that particular ad. This initiative by the DAA represents a genuine effort to make advertising practices more transparent and consumer-friendly.
Finally, an advertiser has a responsibility to highlight consumers’ control over their data. Consumers might be fine with their personal data being used in one situation, but as we covered in an earlier lesson, they worry about losing control over who else might have access to that data. Consumers own their personal data, and reminding them of that control can be a win-win: research shows that when consumers are reminded about the controls they have over their personal data, they’re more likely to engage with an ad.
To summarize, advertisers are expected to provide value, transparency, and control to consumers when working with the data users provide them.
In this lesson, we’re going to explore the four major components of the advertising ecosystem and how personal data and consumer privacy weave through it.
The advertising ecosystem can be broken down into four groups:
- consumers
- advertisers
- publishers and partners
- regulators and gatekeepers
These four groups all interact with each other in different ways and they create a complex web that’s at the core of data-based advertising.
Let’s look at these four groups individually and how they influence the ads you see as a consumer.
- consumers: First, the consumer is the person who is most important to a business. We’ve already covered in previous lessons what a consumer is comfortable sharing and how an advertiser can responsibly use that shared information.
- advertisers: Next, on the other side of the advertising ecosystem web are the advertisers. We’ve broken down the many ways that data can be useful to advertisers to make ads more relevant.
- publishers and their partners: After an advertiser has crafted the ad and identified its target audience, the ad is sent to publishers and their partners. Publishers are websites and apps that produce and distribute digital content, often partially funded by advertisements. These ads are optimized using data provided by consumers to these platforms.
When publishers effectively connect brands with individuals likely to be interested in them, advertisers are more inclined to invest in their platforms or sites. This connection is facilitated by data. Publishers serve as the most direct link to consumers and are well-attuned to the needs and feedback of their audience.
Publishers must carefully balance the interests of their consumers, their own brand, financial considerations, and technical requirements with each ad displayed alongside their content. Thus, making informed decisions is crucial. Publishers frequently collaborate with partner companies such as ad exchanges and measurement providers.
Ad exchanges act as systems and intermediaries that facilitate automated buying and selling of advertising inventory between advertisers and publishers. These systems rely on data gathered from publishers, advertisers, and various ad inventory platforms.
Measurement providers encompass third-party entities and ad providers offering solutions for evaluating the effectiveness of advertising. This enables publishers and advertisers to assess the performance of ads and refine their strategies accordingly.
- gatekeepers and regulators: The final stakeholders in the advertising ecosystem are the gatekeepers and regulators. This encompasses a broad array of entities, including companies, industry groups, and government agencies, all of which play a role in determining how private data can be utilized in advertising. We’ll specifically examine three types of gatekeepers: browsers and device platforms, governments, and industry organizations.
In the digital realm, any advertisements encountered by consumers ultimately appear on a device or browser. Therefore, these browsers and devices hold significant importance within the advertising ecosystem. Rules or restrictions implemented by browsers and devices can profoundly impact their user base as well as advertisers aiming to engage with them.
Governments increasingly play a regulatory role in overseeing the collection and usage of personal data in advertising. Legislation such as GDPR, CCPA, and COPPA places limitations on the extent of personal data that can be collected and dictates how such data may be utilized. We’ll delve deeper into these regulations later on.
Additionally, industry organizations contribute to shaping data and privacy regulations. These groups comprise members from various sectors of the advertising ecosystem and work towards fostering a better understanding of privacy issues and encouraging their adoption within the advertising industry.
The digital advertising ecosystem can be intricate, so let’s illustrate it with an example. Previously in this course, we discussed DDC Cleaning and their launch of SnackWall, a subscription snack service for businesses. James, representing DDC Cleaning, aims to advertise SnackWall to reach his target audience.
James takes on the role of the advertiser, while the target audience represents the consumers. Recognizing Facebook as an ideal platform for his ads, James selects it as the publisher he plans to collaborate with. Facebook will leverage its user data and insights to ensure James’s ads reach the intended audience.
However, James acknowledges that not all potential consumers may be active on Facebook. To broaden his reach, James seeks advice from an advertising agency, which recommends utilizing an advertising exchange to place ads across a wide range of websites. This allows James to tap into the browsing behavior data of numerous consumers to effectively target his ads.
James’s agency partners with OpenX, an example of an advertising exchange, which assists publishers in monetizing their content by placing ads alongside it. While James’s ads won’t reach every individual in his target audience, they will be delivered through browsers and devices, subject to certain data collection limitations imposed by publishers and their partners. Compliance with regional laws and regulations regarding data usage is also essential for the publishers and partners James collaborates with.
1.3.6.3 Who Owns Data
Let’s take another look at data ownership in light of data privacy. Different restrictions and regulations apply depending on who collects and manages the data. As a consumer, you are always the owner of your data, but you can engage in a relationship with a publisher or an advertiser in which you allow them to use your data.
As we saw earlier in this course, the parties that you allow to collect and store your data fall into three different groups depending on how direct your relationship is with them: first party, second party, or third party.
First party data is the data a company receives from the people it’s interacting with directly. These people could be customers, visitors to the website, or followers on social media. For example, if you’re using a social media platform and you click on an ad for pizza, the platform may infer that you are interested in pizza. That information is considered first party data that the social media platform is now managing about you. With first party data, there is an implicit or explicit agreement between the consumer and the data-receiving company that it can use the consumer’s data, with certain restrictions, as we saw earlier.
Second party data consists of the same type of data as first party data, but in this case the data has been passed on to a second party, often a trusted partner of the first party. Continuing the example, if the social media platform gives that information to a partner, the partner company also knows that you like pizza, and both now manage that data about you. In this scenario, ideally the partner is trustworthy and responsible with the data that was passed on from the first party, and it’s the responsibility of the first party to ensure that the second party won’t misuse your data. If the data is misused, the repercussions would involve both the first and the second party.
The final bucket that personal data can fall into is third party data. Third party data is collected by a company or entity that doesn’t have a clear relationship with the company a person is interacting with (the first party). Third parties may track a person’s behavior across sites, for instance, using browser cookies, which we call third party cookies. Information gathered this way can be bundled together to create a profile about a person that can then be sold to advertisers. Using our example one more time, if a third party tracks your behavior across the web, they might learn about your interest in pizza, even if you didn’t specifically give that information to them. This type of data collection is the kind that makes consumers most uncomfortable, since people don’t have an explicit agreement with these third parties that allows them to collect and store this information. Third party data tracking often raises privacy concerns. As a result, several browsers block third party cookies to help limit this kind of data collection.
It’s important that all parts of the advertising ecosystem, whether first, second, or third parties, set and maintain high privacy standards. That’s the only way that consumers can trust that their information is being handled appropriately.
1.3.6.4 Regulations
Around the globe, governments are taking on a role in regulating how personal data is gathered and used.
It’s important for marketing analysts to know about these laws because they regulate how user data can be used, and as a consumer, it’s also good to know what your rights are. The three regulations covered here, GDPR, CCPA, and COPPA, all affect how much and what type of data an advertiser can use for data-based advertising.
GDPR: GDPR stands for the General Data Protection Regulation. It’s a law that protects data and online privacy in the European Union. GDPR is a very detailed and fairly complex law. At a high level, we can group some of its key requirements in two buckets: data rights for people in the EU and data protection obligations for companies that collect the data.
Right to access: people have the right to access the personal information collected about them.
Right to correct: people have the right to correct that information.
Right to erase: people have the right to have all the information that was collected and saved about them erased.
Right to data portability: people have the right to get a file of their personal information and pass it on to another party.
As for the data protection obligations, the law specifies how companies must protect the information they receive. It also specifies that companies must report a personal data breach to the supervisory authority within 72 hours of becoming aware of it, and must inform the affected people without undue delay when the breach poses a high risk to them. Companies must also designate people whose job it is to protect the data they receive.
The law also says that companies should limit their data collection and that processing certain sensitive data categories, for instance ethnicity and sexual orientation, is prohibited except under specific conditions.
Before receiving any data, companies should ask for permission. This is the right to prior consent. GDPR was developed and is managed by the European Union. Any company that works with the personal data of EU residents must comply no matter where the business is based.
In practice, this means that most digital businesses have to comply with this law. The different EU countries each have their own supervisory authorities that monitor compliance. Fines related to GDPR can be substantial, up to four percent of a company’s annual revenue. In October 2020, clothing retailer H&M was issued a $41 million GDPR fine after several hundred employees were found to have been illegally surveilled. The company kept extensive profiles covering employees’ families, illnesses, and religious beliefs. The Data Protection Authority of Hamburg, Germany found that this case showed a disregard for GDPR data protection rules. Laws like this and fines of this magnitude have reinforced how seriously the European Union is taking personal data privacy.
CCPA: In 2018, the Governor of California signed into law the California Consumer Privacy Act. The strictest consumer privacy law in the US, it aims to give residents of California more privacy and protection. This law is not dissimilar from the GDPR: it aims to give consumers more insight into what data is collected about them and a say in whether or not their data is collected.
The CCPA gives consumers five distinct rights:
- The right to know what information is collected about them.
- The right to know whether their data is sold and to whom, and the right to opt out of that sale.
- The right to access the personal information that was collected about them.
- The right to require the business to delete their personal information.
- The right to not be discriminated against for exercising their rights under the act.
The CCPA falls under the responsibility of the California State Attorney General’s office. Any business that collects and controls the personal information of California residents should comply with the CCPA. The California Attorney General and residents of California can initiate lawsuits. Fines under the CCPA can be up to $7,500 for intentional violations and $2,500 for unintentional violations. The law went into effect in January 2020. It’s still a bit too early for any high-profile lawsuits under this law.
COPPA: COPPA is the Children’s Online Privacy Protection Act. This US law took effect in 2000 and limits the collection and use of personal information of children under the age of 13. COPPA was specifically designed to protect children. It requires that notice be given and parental consent be obtained before any personal information is collected from children.
It also requires that companies have a clear and comprehensive privacy policy, and companies that collect data from minors need to keep that data confidential and secure. COPPA is managed by the US Federal Trade Commission or the FTC. All companies that interact with children under the age of 13 in the US must comply with COPPA. The FTC relies on people to alert them to violations of COPPA and those complaints can prompt an investigation. Fines related to COPPA can be fairly substantial, up to $40,000 per violation.
Here’s one example of a high-profile case that violated COPPA. In 2019, Google was fined $170 million for collecting and saving personal information from children and using it for advertising on YouTube. Needless to say, violations can be costly, not only in terms of the fines, but also in terms of the consumer trust that is lost. This was just a high-level overview of the most prominent laws governments have established to regulate the collection and use of personal data. There may be other local regulations, so depending on the region you work in, it’s worth checking which laws are in place.
1.4 Marketing Analytics
Here are three types of marketing analytics:
Predictive Analytics: Predictive analytics enables you to forecast the potential outcomes of marketing campaigns and estimate the likelihood of future events. This type of analysis provides insights into future trends and behaviors, allowing you to anticipate how the market is likely to respond to your advertising efforts. For example, it can predict customer responses, sales increases, or changes in market share based on past data and statistical models.
Descriptive Analytics: Descriptive analytics focuses on understanding past and present data to identify patterns and trends. It provides a comprehensive view of what has happened in your business, such as customer behaviors, sales performance, and market conditions. By analyzing historical survey data from customers, you can gain valuable insights into their preferences and past behaviors, although this analysis doesn’t guide you on future actions.
Prescriptive Analytics: Prescriptive analytics goes beyond predicting future outcomes by recommending specific actions to achieve desired results. It analyzes data to suggest the best courses of action to take. For instance, if your company needs to address customer concerns about high salt content in products, prescriptive analytics can provide actionable strategies to mitigate these concerns and improve customer satisfaction. It helps you understand the optimal decisions to make in order to achieve your business objectives.
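To make the three types concrete, here is a minimal Python sketch that applies each one to the same small data set. Everything in it is an assumption made for illustration: the campaign history, the $20 margin per unit, and the candidate budgets are invented numbers, and the straight-line model is just one simple choice of predictive technique.

```python
from statistics import mean

# Hypothetical past campaigns: (ad spend in $1000s, units sold).
history = [(1, 120), (2, 190), (3, 270), (4, 330), (5, 410)]
spend = [s for s, _ in history]
sales = [u for _, u in history]

# Descriptive: summarize what has already happened.
average_sales = mean(sales)

# Predictive: fit a least-squares line, sales = intercept + slope * spend,
# and use it to estimate the outcome of a campaign that has not run yet.
mean_x, mean_y = mean(spend), mean(sales)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in history)
         / sum((x - mean_x) ** 2 for x in spend))
intercept = mean_y - slope * mean_x


def predict_sales(budget):
    return intercept + slope * budget


# Prescriptive: recommend the action with the best predicted result.
MARGIN_PER_UNIT = 20  # assumed profit per unit sold, in dollars


def predicted_profit(budget):
    return MARGIN_PER_UNIT * predict_sales(budget) - budget * 1000


candidate_budgets = [4, 6, 8]  # in $1000s
best_budget = max(candidate_budgets, key=predicted_profit)

print("Average past sales:", average_sales)
print("Predicted sales at $6k spend:", predict_sales(6))
print("Recommended budget ($1000s):", best_budget)
```

Note that budgets above 5 lie outside the observed data, so the recommendation relies on the linear trend continuing, which a real analysis would need to verify.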
1.4.0.1 AIRBNB EXAMPLE
Strategic challenge: how do we improve rental prospects for our hosts and identify better rental options for our guests?
Mental Model:
Profit per Property, Gross Margin (%)
Profit per Property <- Price, # of rentals, minimum stay
What is the value of brands?
Brand is not just a name, color, shape or logo. It is a complex entity.
Brand personality is part of the larger concept of brand architecture. Analytics helps us identify how marketing affects brand architecture.
Marketers use data to tweak features and benefits to create a stronger connection between customers and the brand.
Marketers analyze data around marketing campaigns and their impact on all the components of brand architecture.
Product attributes: features of the product.
- What a brand means to a consumer.
Customers buy benefits, not features. When building up brand architecture, knowing the benefits helps with consistent messaging to consumers.
Brand personality contributes to the strength of a brand, but isn’t used to determine brand strength.
The architecture of a brand is not part of calculating brand value, but it is important to understand what the brand means to consumers.
1.5 Data Analytics
We can define data analytics as the process of collecting, cleaning, organizing, analyzing, and interpreting data to uncover insights and make informed decisions.
1.5.1 Data Analytics vs. Data Science
As you continue to learn about data, you will come across tons of data-related terminology. Among these are two often-confused terms: data analysts and data scientists. These two roles are similar in the sense that they both work with data to gather insights, but how they work with data is what sets them apart. In this reading, you will learn the differences between these two disciplines by reviewing their roles, responsibilities, skills, and backgrounds.
Data Analysts
Data analysts work with structured data to identify patterns, build visualizations, and extract meaningful insights that help organizations make informed decisions.
Responsibilities
Data analysts are typically responsible for maintaining databases, interpreting data sets, and creating reports that effectively present data trends, patterns, and predictions. Some common tasks include gathering data from various sources, cleaning and organizing data, and presenting findings in easy-to-understand visualizations.
Skills and Tools
Foundational mathematics and statistics
Analytical thinking and data visualization
Basic fluency in R, Python, and SQL
SAS, Excel, and business intelligence software
Background
Data analysts are commonly experienced in mathematics and statistics. They might also have a degree in mathematics, statistics, computer science, or finance.
Data Scientists
Data scientists work with various data types, including structured and unstructured data. They use advanced data techniques, including machine learning and predictive modeling, to design processes, develop models, and extract insights from data.
Responsibilities
Data scientists are typically responsible for arranging undefined datasets, writing algorithms, and building automation systems and statistical models. Some of their common tasks include gathering and cleaning raw data, creating data visualization tools, dashboards, and reports, and developing code to automate data collection and processing.
Skills and Tools
Advanced statistics and predictive analytics
Machine learning and data modeling
High-level, object-oriented programming
Hadoop, MySQL, TensorFlow, and Spark
Background
Data scientists are commonly experienced in computer science and are generally required to have a master’s or doctoral degree in data science, information technology, mathematics, or statistics.
Although their titles are similar, data analysts and data scientists have distinct roles, requirements, and career paths. Now that you know the difference between the two, consider this as you continue your journey in data analysis.
Conclusion
Although these two disciplines are very similar and often go hand-in-hand in terms of skills and how they work with data, there are some subtle differences to keep in mind as you explore which focus you’d like to pursue in how you work with data.
1.5.2 OSEMN Framework Overview
Obtain: Gather the data
• Determine what data would be useful
• Evaluate what data are available
• Decide on how the data can be gathered
Scrub: Clean the data to prepare it for analysis
• Correct inconsistent formatting
• Remove duplicate records
• Handle missing values
• Remove inaccurate information
Explore: Search for interesting patterns and statistics that stand out
• Examine variable distributions
• Examine variable relationships
• Perform statistical tests
Model: Generate predictions and insights
• Select a model type for your goals (often in cooperation with a partner)
• Categories of models include:
o Classification - Is this “A” or “B”?
o Regression - How much or how many?
o Clustering - What natural segments can we find in our data?
iNterpret: Help others to understand the results of your analysis
• Build visualizations
• Construct stories
• Create presentations of your findings
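The five OSEMN steps above can be sketched as a small Python pipeline. This is a toy illustration under stated assumptions, not a prescribed implementation: the customer records are made up, the function names simply mirror the framework, and the "model" is a deliberately simple threshold rule standing in for a real classification model.

```python
import csv
import io
from statistics import mean

# Obtain: in practice the data would come from a file, database, or API.
# These records are invented for illustration.
RAW_CSV = """customer_id,region,monthly_spend
1,north,120
2,North ,95
2,North ,95
3,south,
4,south,40
5,SOUTH,55
6,north,-10
"""


def obtain(text):
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    cleaned, seen = [], set()
    for row in rows:
        cid = row["customer_id"]
        if cid in seen:                          # remove duplicate records
            continue
        seen.add(cid)
        region = row["region"].strip().lower()   # correct inconsistent formatting
        raw_spend = row["monthly_spend"].strip()
        if not raw_spend:                        # handle missing values (here: drop the row)
            continue
        monthly_spend = float(raw_spend)
        if monthly_spend < 0:                    # remove inaccurate information
            continue
        cleaned.append({"customer_id": cid, "region": region,
                        "monthly_spend": monthly_spend})
    return cleaned


def explore(rows):
    # Examine how spend relates to region: average spend per region.
    by_region = {}
    for row in rows:
        by_region.setdefault(row["region"], []).append(row["monthly_spend"])
    return {region: mean(values) for region, values in by_region.items()}


def model(rows, threshold=100.0):
    # Classification ("A" or "B"?): label each customer a high or low spender.
    return {row["customer_id"]: "high" if row["monthly_spend"] >= threshold else "low"
            for row in rows}


def interpret(region_means, labels):
    lines = [f"{region}: average spend {avg:.2f}"
             for region, avg in sorted(region_means.items())]
    high = sum(1 for label in labels.values() if label == "high")
    lines.append(f"{high} of {len(labels)} customers are high spenders")
    return "\n".join(lines)


rows = scrub(obtain(RAW_CSV))
region_means = explore(rows)
labels = model(rows)
print(interpret(region_means, labels))
```

Dropping rows with missing values is only one way to handle them; depending on the analysis, filling in an estimate may be a better choice.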
1.5.3 Obtaining Data
1.5.3.1 Sampled Data
Sampled data is data from a subset of a larger population or a larger data set that’s used to represent the entire population or data set. In other words, you work with a smaller amount of data that is a good representation of the total data set you would like to study.
It’s a common practice in data analytics to sample data, because analyzing the entire population can be costly, time consuming, or even impossible. Sampling allows us to draw conclusions about the population while only analyzing a fraction of it.
Let’s run through some situations where sampled data is necessary. First, the population might be too large. When dealing with large data sets or populations, analyzing the entire data set can be impractical or impossible.
Imagine you are a market researcher working for a large auto manufacturer. The company wants to get a better understanding of its customer satisfaction: how satisfied car buyers are with the performance of the cars, as well as with the service they’re getting from the dealerships.
You know you can get all this information using a survey in which you ask customers questions about their experience and have them rate how satisfied they are. But because your company has hundreds of thousands of customers, it’s not practical to send a survey to each individual who owns one of your cars. Instead, you decide to send a survey to a sample of your customer base, a smaller group of customers that you will select as a representation of the larger customer population.
Second, and related to the first point, you might have cost constraints. Collecting data from an entire population can be expensive. Sampling can be more cost effective, allowing researchers to allocate resources to other important tasks. In our car company example, surveying everyone would cost a lot of money just in getting the surveys out. It would cost even more to get responses from all these customers: you might need to use incentives to get customers to answer, which adds up quickly when you’re talking to hundreds of thousands of people.
You could also face time constraints. Collecting data from an entire population can be time consuming. Sampling can save time and allow researchers to collect and analyze data quickly. For our car company, collecting responses from a very large group of people would take quite a bit more time than focusing on a smaller number of customers.
In some cases, it might not be possible to analyze the entire population because it would be destroyed in the process. Data analysts refer to this as destructive sampling. For example, testing every product manufactured in a candy factory would result in the destruction of the entire inventory and the destruction of your teeth, but that’s not really what’s referred to here.
So for all of these reasons, you might be dealing with a subset of the data instead of the whole data set, or in other words, a sample. But when you work with sample data, there are a few things you should consider to help you evaluate the quality of your data.
First, the sample size. Sample size is the number of observations in the sample. A larger sample size generally provides more accurate estimates of the population. If the sample size is small, it might not give you enough to work with, and the data might not accurately represent the population.
A smaller sample will also limit the types of analyses you can perform on your data, because certain analyses require many observations to provide an accurate result.
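A short simulation can illustrate why larger samples tend to give more accurate estimates. The sketch below assumes a made-up population of satisfaction scores and a hypothetical helper, `spread_of_sample_means`; it draws many random samples of a given size and measures how much their averages vary from one sample to the next.

```python
import random
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: satisfaction scores (1-10) for 100,000 customers.
population = [random.randint(1, 10) for _ in range(100_000)]
true_mean = mean(population)


def spread_of_sample_means(sample_size, repetitions=200):
    """Draw many random samples of one size and return the standard
    deviation of their means (smaller = more stable estimates)."""
    sample_means = [mean(random.sample(population, sample_size))
                    for _ in range(repetitions)]
    return pstdev(sample_means)


print("True population mean:", round(true_mean, 2))
for n in (10, 100, 1000):
    print(f"sample size {n}: spread of sample means = "
          f"{spread_of_sample_means(n):.3f}")
```

The spread shrinks as the sample size grows, which means that any single large sample is likely to land closer to the true population average than any single small one.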
In other words, if you want a reliable analysis, larger samples are usually a must. The sample should also be representative of the population being studied. If the sample is biased, the conclusions drawn from the sample might not be valid for the entire population.
Let’s say you are conducting a study on the average salary of employees in a certain company. The company has a total of 1000 employees, and you decide to survey 50 of them to collect data on their salaries. However, you only choose to survey employees who work in the finance department because you assume that their salaries will be representative of the entire company. In this scenario, the sample is not representative of the entire population of the company because it only includes employees from one department.
The sample is biased towards finance employees and might not be accurate for the entire company. This can lead to incorrect conclusions being drawn from the data, such as assuming that the entire company has a higher average salary than it actually does.
To ensure representativeness, it would be better to use a more random sampling method that includes employees from different departments and positions. Such a sample would better represent the entire company, and it will give you a more accurate read on the average salary.
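The salary scenario above can be simulated to show the effect of a biased sample. All figures in this sketch are invented: the department sizes, the typical salary per department, and the spread around it are assumptions chosen only so that finance pays more than the company average, as in the example.

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical company of 1,000 employees:
# department -> (headcount, typical salary). All numbers are made up.
departments = {
    "finance": (100, 90_000),
    "engineering": (300, 80_000),
    "sales": (300, 60_000),
    "support": (300, 45_000),
}

employees = []
for dept, (headcount, typical_salary) in departments.items():
    for _ in range(headcount):
        employees.append((dept, random.gauss(typical_salary, 5_000)))

company_mean = mean(salary for _, salary in employees)

# Biased sample: 50 employees drawn from the finance department only.
finance_only = [e for e in employees if e[0] == "finance"]
biased_sample = random.sample(finance_only, 50)

# Random sample: 50 employees drawn from the whole company.
random_sample = random.sample(employees, 50)

biased_estimate = mean(salary for _, salary in biased_sample)
random_estimate = mean(salary for _, salary in random_sample)

print(f"Company average:       {company_mean:,.0f}")
print(f"Finance-only estimate: {biased_estimate:,.0f}")
print(f"Random-sample estimate: {random_estimate:,.0f}")
```

Both samples contain 50 people, so the difference in accuracy comes from how the sample was selected rather than from its size.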
Another important thing to think about when using samples is the generalizability of your sample. The conclusions drawn from the sample might not be applicable to other populations. It’s important to be aware of the limitations of the sample and not overgeneralize the findings.
Let’s say you are conducting a study on the eating habits of college students at a particular university. You decide to collect data by surveying 100 randomly selected students who attend the university. However, your study only includes students who attend that specific university, which means that the conclusions drawn from the study might not be generalizable to all college students or to the population at large.
This is because the students attending that university might have different demographics, cultural backgrounds, and socioeconomic statuses compared to students at other universities. For example, the university in question might be located in an urban area and attract more students from low-income households. This means that the study’s findings might not be applicable to students attending universities in suburban or rural areas or students from higher-income households. Therefore, it’s important to consider the generalizability of the sample when conducting research to avoid drawing incorrect conclusions or making incorrect generalizations about a larger population.
Conclusion: Sampled data is a useful tool in data analytics when dealing with large populations or data sets. However, it’s important to consider the sample size, representativeness, and generalizability when working with sampled data to ensure that the conclusions drawn are valid.
1.5.3.2 First-Party Data
Definition: First-party data refers to information collected directly by a company or organization from its own audience, such as customers, website visitors, or app users. This data is considered highly valuable because it is unique to the company and typically involves direct interactions with the brand.
How It Is Collected:
- Website and App Analytics: Data collected from user interactions on a company’s website or mobile app, such as page views, time spent on pages, click-through rates, and purchase history.
- Customer Relationship Management (CRM) Systems: Information gathered from customer interactions, including contact details, purchase history, and customer service interactions.
- Surveys and Feedback Forms: Direct input from customers through surveys, feedback forms, and user reviews.
- Email Marketing Campaigns: Data collected from email open rates, click-through rates, and responses to email campaigns.
- Loyalty Programs: Information from customer participation in loyalty and rewards programs.
Who Collects It:
- Businesses: Retailers, e-commerce platforms, service providers, and any organization with a direct relationship with customers.
- Organizations: Non-profits, educational institutions, and government agencies collecting data from their members or service users.
1.5.3.3 Third-Party Data
Definition: Third-party data refers to information collected by entities that do not have a direct relationship with the users from whom the data is derived. This data is aggregated from various sources and sold to other companies for marketing and analytical purposes.
How It Is Collected:
- Data Aggregators: Companies that specialize in collecting data from multiple sources, such as websites, apps, public records, and social media platforms.
- Purchase and Usage Data: Information gathered from third-party apps and services that users interact with.
- Cookies and Tracking Pixels: Data collected through cookies and tracking technologies placed on various websites to monitor user behavior across the web.
- Public Records and Social Media: Information pulled from public databases and social media platforms where users share data publicly.
- Surveys and Panels: Data collected from third-party surveys and consumer panels, where participants opt in to share their information.
Who Collects It:
- Data Brokers: Companies like Acxiom, Experian, and Nielsen that gather, compile, and sell data to other businesses.
- Advertising Networks: Ad networks like Google AdSense and Facebook Audience Network collect data to target advertisements more effectively.
- Market Research Firms: Organizations that conduct market research and compile data from various sources to provide insights to businesses.
1.5.3.4 Key Differences
- Source of Data:
- First-Party: Directly from the company’s own customers or users.
- Third-Party: Indirectly from external sources without a direct relationship with the data subjects.
- Data Quality:
- First-Party: Typically more accurate and relevant, as it comes from direct interactions.
- Third-Party: Can be broad and extensive but might lack accuracy and specificity due to aggregation from various sources.
- Control and Privacy:
- First-Party: Companies have full control over data collection and usage, often leading to better compliance with privacy regulations.
- Third-Party: Companies must rely on the data provider’s compliance with privacy laws, which can introduce risks.
- Usage:
- First-Party: Primarily used for enhancing customer experience, personalization, and direct marketing.
- Third-Party: Used for broadening customer insights, expanding reach, and enhancing advertising targeting.
1.5.3.5 An Overview of Helpful Free Datasources
Accessing data is simpler than ever, and there is a wide range of helpful data sources at your disposal. Here’s a list of free data sources to help you gather information and insights more effectively, along with links to those resources.
Like Google Scholar, Google Dataset Search provides access to millions of datasets hosted on public websites, such as Kaggle and OGD Platform India, in thousands of locations on the internet.
The United States Census Bureau provides access to quality and essential data about the United States’ population, economy, and geography.
The Pew Research Center provides insights and analysis on a wide range of social, political, and technological issues through surveys and research.
As the European Union’s statistical office, Eurostat provides comprehensive economic, social, and environmental data.
The Organization for Economic Co-Operation and Development (OECD)
The OECD is a reliable source for comparative data and analysis on global economic and social matters.
Kaggle hosts hundreds of thousands of high-quality public datasets from several industries to explore, analyze, and share.
National Centers for Environmental Information (NCEI)
NCEI is part of NOAA’s National Environmental Satellite, Data, and Information Service (NESDIS) and provides environmental data regarding climate change and global chemical measurements.
This comprehensive dataset includes indicators such as population size and unemployment rates collected from hundreds of countries worldwide, offering insights into global economic, social, and environmental trends.
1.5.3.6 Summary: Validity of Data
When obtaining data, it is important to check the validity of your dataset. In other words, make sure your data are of high quality before you move on to the explore and analyze phase.
Here is a checklist you can use to ensure the validity of your data:
Source credibility:
⏹ Authorship: Is the data provided by a reputable author or organization? What are the credentials of the author or organization?
⏹ Publication date: Is the data current and up-to-date?
Methodology:
⏹ Sample size: Was the data collected from a large enough sample?
⏹ Sampling method: Was the sampling method unbiased and representative?
⏹ Data collection: Were the data collection methods clearly described and appropriate?
Objectivity:
⏹ Bias: Are there any apparent biases in the data or its presentation?
⏹ Conflicts of interest: Are there any potential conflicts of interest that could influence the data?
Accuracy:
⏹ Consistency: Are the data consistent with other reputable sources?
⏹ Error rate: Are there any obvious errors or inconsistencies in the data?
Relevance:
⏹ Scope: Is the data relevant to the research question or topic?
⏹ Context: Is the data presented within a meaningful context?
1.5.4 Scrub the Data
Clean your data to ensure that it is usable for the next phases where you will explore, model, and interpret your data.
What does it mean to scrub your data and why is it so important?
Scrubbing data is sometimes also referred to as cleaning data. The scrubbing phase transforms raw, dirty data into clean data.
The scrubbing process can be divided into four main tasks:
- Removing duplicates,
- Formatting records,
- Solving for missing values,
- Checking records for mistakes or wrong values.
Dirty data is any data that contains duplicate records, is inconsistently formatted, has missing values, or contains inaccurate information. In contrast, clean data contains only unique records, has a consistent structure, has no missing values and contains reliable and accurate information. Ensuring your data is clean is an essential step before analyzing the data for further insights.
Without the scrubbing stage, the errors in your data might affect your exploration and analysis and lead you to draw the wrong conclusions. In some cases, they can break any analysis you want to run on the data or cause any model you apply to get stuck, making it impossible to draw any conclusions at all.
1.5.4.1 Scrubbing Checklist
The scrubbing stage is all about cleaning your data and getting your dataset ready for analysis. You can use this checklist to help you in the process.
- Removing Duplicates
⏹ Identify duplicate records: inspect records for duplicates and verify that they are actually duplicates.
⏹ Remove duplicate records: remove the duplicate records from your dataset
- Formatting records
⏹ Ensure consistency: check that all data follow a consistent format and adjust the format if necessary
- Solving for missing values
⏹ Identify the missing values
⏹ Solve for the missing values: Replace the missing values with text (e.g. NA) or delete the entire record with the missing value
- Checking for wrong values
⏹ Identify wrong values
⏹ Solve for the wrong values: Replace the wrong values with the correct ones if you can or delete the entire record with the wrong values
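The four checklist tasks can be sketched as a single pass over a small dataset. This is a minimal sketch, not a complete cleaning pipeline: the records, the field names (`order_id`, `zip`, `phone`, `price`), and the `scrub` helper are all hypothetical. It assumes that a "consistent format" for zip codes means five digits and that a negative price counts as a wrong value to delete.

```python
# Hypothetical raw order records; every field name and value is invented.
raw_records = [
    {"order_id": "A1", "zip": "2138", "phone": "555-0100", "price": 10.0},
    {"order_id": "A1", "zip": "2138", "phone": "555-0100", "price": 10.0},  # duplicate
    {"order_id": "A2", "zip": "90210-1234", "phone": None, "price": 5.0},   # missing phone
    {"order_id": "A3", "zip": "10001", "phone": "555-0199", "price": -4.0},  # wrong price
]


def scrub(records):
    """Return cleaned copies of the records, following the four checklist tasks."""
    seen = set()
    clean = []
    for record in records:
        # 1. Removing duplicates: skip a record identical to one already seen.
        key = tuple(sorted(record.items()))
        if key in seen:
            continue
        seen.add(key)

        row = dict(record)  # work on a copy so the raw data stays untouched

        # 2. Formatting records: keep the first part of the zip, padded to five digits.
        row["zip"] = row["zip"].split("-")[0].zfill(5)

        # 3. Solving for missing values: replace a missing phone number with "NA".
        if row["phone"] is None:
            row["phone"] = "NA"

        # 4. Checking for wrong values: a negative price cannot be corrected here,
        #    so the entire record is deleted.
        if row["price"] < 0:
            continue

        clean.append(row)
    return clean


clean_records = scrub(raw_records)
for row in clean_records:
    print(row)
```

In this sketch the duplicate of order A1 and the negative-price order A3 are dropped, the short zip code is padded, and the missing phone number becomes "NA".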
1.5.5 Exploring Data
1.5.5.1 Visualize data
Anscombe’s quartet:
Four datasets with nearly identical means and standard deviations but very different distributions. It is important to visualize data to really understand the relationships.
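To see why the summary statistics alone are not enough, the sketch below computes them for the four datasets published by Anscombe in 1973. The values are reproduced here from that paper; the variable names are only illustrative. Plotting each dataset as a scatter plot (not done here) is what reveals the different shapes.

```python
import statistics

# Anscombe's quartet: datasets I-III share the same x values, dataset IV has its own.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (x, y) in quartet.items():
    summary[name] = {
        "mean_x": statistics.mean(x),
        "stdev_x": statistics.stdev(x),
        "mean_y": statistics.mean(y),
        "stdev_y": statistics.stdev(y),
        "min_y": min(y),
        "max_y": max(y),
    }
    s = summary[name]
    print(
        f"{name:>3}: mean_x={s['mean_x']:.2f} stdev_x={s['stdev_x']:.2f} "
        f"mean_y={s['mean_y']:.2f} stdev_y={s['stdev_y']:.2f} "
        f"y range={s['min_y']}..{s['max_y']}"
    )
```

The means and standard deviations agree to about two decimal places across all four datasets, while the range of y already hints that the underlying shapes differ.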
1.5.5.4 Summary: Exploring Data
Explore Checklist
What is your data telling you?
⏹ Inspect your data: If your dataset isn’t too large, read through your data to assess whether interesting information jumps out
⏹ Use summary statistics: Evaluate your data by summarizing it (categorize, use statistics like average, standard deviation, etc.)
⏹ Inspect a random sample of your data: if your dataset is too large, a random sample may give you some initial information
Visualizing data
⏹ Visualize your data using bar charts, line charts or scatter plots to examine information hidden in your dataset.
Bar charts, line charts, scatter plots
Examine variable distributions
⏹ Inspect the distribution of your data
Categorize the data
Plot the categorized data
Common data distributions:
Normal, bimodal, log-normal, exponential, uniform
Learn more about your data:
⏹ Evaluate the minimum
⏹ Evaluate the maximum
⏹ Evaluate the mode
⏹ Evaluate the standard deviation
Examine variable relationships
⏹ Visualize variables to understand their correlation
Common visualizations:
Scatter plot, line chart
⏹ Calculate the correlation coefficient to understand the strength of the correlation
0 = no correlation
1 = perfect positive correlation
-1 = perfect negative correlation
Feature engineering
⏹ Evaluate whether we can create new features or modify existing ones to better understand our data
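Several items in the checklist above can be tried with Python's standard library. The sketch below is illustrative only: the `ad_spend` and `orders` figures are invented, `pearson_r` is a hypothetical helper that implements the Pearson correlation coefficient, and `spend_per_order` is just one example of an engineered feature.

```python
import statistics

# Hypothetical daily figures, invented for illustration.
ad_spend = [100, 120, 150, 200, 250, 300]
orders = [12, 14, 14, 19, 21, 26]

# Learn more about your data: minimum, maximum, mode, standard deviation.
print("min:", min(orders))
print("max:", max(orders))
print("mode:", statistics.mode(orders))
print("stdev:", round(statistics.stdev(orders), 2))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return co_variation / (spread_x * spread_y)


# Examine variable relationships: values near 1 or -1 indicate a strong correlation.
r_spend_orders = pearson_r(ad_spend, orders)
r_perfect_positive = pearson_r(ad_spend, [2 * x for x in ad_spend])
r_perfect_negative = pearson_r(ad_spend, [-x for x in ad_spend])
print("ad spend vs orders:", round(r_spend_orders, 3))
print("perfect positive:", round(r_perfect_positive, 3))
print("perfect negative:", round(r_perfect_negative, 3))

# Feature engineering: derive a new variable from two existing ones.
spend_per_order = [spend / count for spend, count in zip(ad_spend, orders)]
print("spend per order:", [round(value, 2) for value in spend_per_order])
```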
1.5.6 Modeling
We’ve obtained the data we needed, scrubbed it, and explored it thoroughly. Now, we’re ready for stage 4, “Modeling”. This phase is about using our data to make predictions with mathematical models. These models can be anything from simple linear regressions to advanced machine learning algorithms, depending on the project. Although there are many different models, they all work by discovering hidden patterns in data and using those patterns to make predictions on any new data we give the model. For example, you might build a model to predict how many conversions you expect a campaign to deliver. You would do that by using data from the past to predict the future. Our discussion of modeling is going to be broken down into the following 3 sections: what models are, how models work, and the types of models. All of the steps of the OSEMN process are important, but modeling is a central piece of data analysis.
“All models are wrong, but some are useful.” (George Box, statistician)
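As a rough illustration of the conversion example, the sketch below fits a simple linear regression by ordinary least squares and uses it to predict a new campaign. The budgets, conversion counts, and the `fit_line` helper are all hypothetical, and a straight line is only one of many possible model choices.

```python
# Hypothetical past campaigns: budget spent and conversions delivered.
budgets = [1000, 2000, 3000, 4000, 5000]
conversions = [52, 98, 151, 199, 250]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn the pattern hidden in the past data.
slope, intercept = fit_line(budgets, conversions)

# "Prediction": apply the learned pattern to a campaign the model has not seen.
new_budget = 6000
predicted_conversions = slope * new_budget + intercept
print(f"conversions per unit of budget: {slope:.4f}")
print(f"predicted conversions for a budget of {new_budget}: {predicted_conversions:.1f}")
```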
1.5.6.1 Real world Example
In previous videos, we followed Keira, a data analyst working with Carlos, the owner of Inu and Neko, a dog and cat care company. Carlos had approached Keira to help launch a new subscription meal service for cats and dogs and wanted to select the 10 best products to offer as part of the subscription. Keira started by obtaining the necessary data from their e-commerce software. She downloaded last year’s sales data into Google Sheets to analyze which products were most popular and purchased repeatedly. She obtained the data properly by making sure it was credible, collected accurately, objective, and relevant to Carlos’ questions.
Keira proceeded to scrub the dataset, checking for duplicates, inconsistent formatting in the zip codes, and missing values like phone numbers and sales totals. She standardized zip codes for consistency and removed the phone number column because it was not crucial for analysis. Keira filled in missing sales totals by calculating product price multiplied by quantity. Some records lacked customer IDs, preventing the assessment of repeat buyers, so Keira deemed them unhelpful and removed them from the dataset. Lastly, she identified inaccurate information such as negative sales amounts and unusually high prices, recognized them as glitches in Inu+Neko’s system, and excluded them from consideration to maintain data accuracy for future analysis.
Now that Keira has finished scrubbing the data, addressing duplicates, inconsistencies, missing values, and inaccuracies, she’s ready to dive into our next topic: explore and model. In the explore and model stages of the OSEMN framework, Keira is tasked with uncovering patterns and trends within the data and creating a model to predict subscription bundle preferences for Inu and Neko’s customers. During the explore stage, Keira will perform various exploratory data analysis techniques to gain insights. This might involve using charts and visualizations to identify correlations between different product categories or demographic segments. She’ll also dive deeper into customer behavior by analyzing purchase frequency, average basket sizes, and seasonal trends.
Once Keira has explored the data thoroughly, she can move on to modeling. Modeling involves using algorithms to create a predictive model based on historical data. Keira will develop models to help Carlos meet his goal of hitting 500 subscriptions. In the upcoming videos, we’ll follow Keira through the explore and model stages of the OSEMN framework. Let’s get started!
1.5.6.2 Exploring Data
Keira now has a clean set of data that she considers relevant for the question she got from Carlos at Inu+Neko: Which products should he include in the new dog and cat food subscription product to help him reach 500 subscribers by the end of the year? Keira gets started with Step 3 of the OSEMN cycle: exploring the data. She thinks that a good place to start is to see what types of data are in each column. She sees that she has both categorical and numerical data. There’s a date column and text columns like order numbers and customer IDs.
She notices that the order number and customer ID columns are very long, take up a lot of screen space, and are hard to type. So, she decides to map each unique value to a number in a process called encoding. Encoding is the process of turning string data into numerical data by mapping each unique string to a unique number. Encoding can be used to reduce the amount of memory needed to work with data, or to make large, complex strings easier to work with. It can also be used to turn text data into numbers when a model needs numerical inputs instead of text inputs. In this example, Keira is using encoding to make the data easier to view and understand.
She also notices that quite a few columns contain redundant information. A column is redundant if you could look at one column and always guess what’s in the other column. In this case, the customer ID and name are redundant. They might not always be, but for this project she knows it’s a safe assumption, so she hides the customer name. She also notices that SKU and product name are redundant, so she hides the SKU. She also decides that it is unlikely that this project will look at data at the street address level, so she only keeps the state and zip code. Great, now she can see all of the useful columns at a glance. And not having large text columns even helps her computer run more smoothly.
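The encoding step described above can be sketched as follows. The customer IDs are made up, and `encode` is a hypothetical helper that assigns integer codes in order of first appearance; a spreadsheet or analytics library would typically offer its own way of doing this.

```python
def encode(values):
    """Map each unique string to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # next unused number
        encoded.append(codes[value])
    return encoded, codes


# Hypothetical long customer IDs, invented for illustration.
customer_ids = ["CUST-9f3a7c21", "CUST-41bd0e88", "CUST-9f3a7c21", "CUST-c07e5512"]

encoded_ids, id_codes = encode(customer_ids)
print(encoded_ids)
print(id_codes)
```

The same customer always receives the same number, so repeat buyers remain identifiable after encoding, and keeping the `id_codes` mapping allows the original IDs to be recovered later.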
Next, she decides to run some summary statistics for the remaining columns. She looks at things like the most common value, how many times it appears, and what percentage of the rows it shows up in. She also looks at the minimum, maximum, range, and mean of the numerical values. From this, she sees that the data spans 899 days from the spring of 2019 through the summer of 2021. She sees that Texas is the most popular state, which makes sense because it is quite populous. There are also some numerical columns that she thinks could come in handy, like price and quantity.
Keira decides to add up the total sales for each product and use a bar chart to look at the distribution of these total sales. This gives her exactly what she was looking for: the top-selling products. She can also see from this chart that cat products tend to sell more than dog products. She realizes that it’s great to know what the top-selling products are, but it might also be important to know what causes those products to sell more. To answer that question, she creates a few scatter plots to observe the relationships between the quantities sold and different variables.
One relationship that stands out to her is quantity sold and price. She notices that if she looks at just cat or dog products in isolation, the quantity sold is low for low-priced and high-priced items, but high for items in the middle. This means that the two variables have a positive correlation for low prices and a negative correlation for high prices. This is great information to pass on to Carlos. Maybe adding more medium-priced items will help him find additional products for his subscription service. Keira now knows the top-selling products and a possible reason why those products are the top sellers. Keira is ready to move on to the next stage in her analysis, the model stage.
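Adding up total sales per product and ranking the results might look like the sketch below. The order rows and product names are invented, and the text-based bars are only a stand-in for the bar chart mentioned above.

```python
from collections import defaultdict

# Hypothetical order rows; product names, prices, and quantities are invented.
order_rows = [
    {"product": "Cat food A", "price": 20.0, "quantity": 3},
    {"product": "Dog food B", "price": 35.0, "quantity": 1},
    {"product": "Cat food A", "price": 20.0, "quantity": 2},
    {"product": "Cat toy C", "price": 8.0, "quantity": 5},
]

# Add up price x quantity for every product.
total_sales = defaultdict(float)
for row in order_rows:
    total_sales[row["product"]] += row["price"] * row["quantity"]

# Rank products from highest to lowest total sales.
ranked = sorted(total_sales.items(), key=lambda item: item[1], reverse=True)

# Rough text "bar chart": one '#' per 10 units of sales.
for product, total in ranked:
    print(f"{product:<12} {'#' * int(total // 10)} {total:.2f}")
```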
1.5.6.3 Type of data
- Ordinal Data
Ordinal data refers to a type of data that involves order or rank. Unlike nominal data, which is purely categorical and lacks any inherent order, ordinal data has a meaningful order among its categories, but the intervals between the categories are not necessarily consistent or known.
An example of ordinal data is a survey response scale for customer satisfaction:
- Very Unsatisfied
- Unsatisfied
- Neutral
- Satisfied
- Very Satisfied
In this example, the responses have a clear order (from very unsatisfied to very satisfied), but the difference between “Very Unsatisfied” and “Unsatisfied” is not necessarily the same as the difference between “Neutral” and “Satisfied.” This makes it ordinal because while the ranking is meaningful, the exact differences between ranks are not quantified.
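One way to work with such a scale in code is to store the order explicitly and rely only on rank-based operations. In the sketch below, the list of responses is invented and the names `SCALE` and `RANK` are illustrative. It uses the median because, for ordinal data, comparing ranks is meaningful while averaging them assumes equal spacing between categories.

```python
import statistics

# The satisfaction scale from the example, listed from lowest to highest.
SCALE = ["Very Unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very Satisfied"]

# Rank of each label: 1 = lowest, 5 = highest. Only the order carries meaning.
RANK = {label: position for position, label in enumerate(SCALE, start=1)}

# Hypothetical survey responses, invented for illustration.
responses = ["Satisfied", "Neutral", "Very Satisfied", "Unsatisfied", "Satisfied"]

# Comparisons are valid for ordinal data.
print(RANK["Neutral"] < RANK["Satisfied"])

# The median depends only on order, so it is a safe summary here.
ranks = [RANK[response] for response in responses]
median_label = SCALE[statistics.median_low(ranks) - 1]
print("Median response:", median_label)
```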
1.5.7 Interpreting the Results
The interpret stage is where you interpret your analysis. It’s arguably the most important. Without it, all we would have is data and statistics.
The interpret stage translates your analytical findings back to a business context. After a successful modeling stage, you’ll have a new tool, like a regression model, that can be used to generate predictions.
The answers generated from these sorts of models are very specific and usually aren’t immediately interpretable or understandable by non-technical team members.
During the interpret stage, our goal is to close the loop of the OSEMN cycle by using the models and insights we generated during the exploration and modeling phases to try and answer the business question driving the entire project.
In other words, here you look back at the objective of your analysis. Your goal here is twofold.
First and foremost, you need to understand the results of your model and all the insights it can provide. These might be the actual predictions the model makes, like forecasting the sales from a campaign, or they might be information contained within your model, like an insight showing that mailing list sign-ups are strong predictors of people spending more money with your company.
Second, you need to be able to explain your findings to a non-technical audience in a clear concise way. Simply understanding the implications of your model isn’t enough. You need to be able to make others understand it and trust your results.
Remember, analytics projects are about generating actionable insights or information that can be used to make better decisions that help the company.
- What was the objective of this analysis?
It’s important to go back to your starting point because that will remind you of the questions you set out to answer. It’s quite easy to get lost in the data during the model and explore stages and lose sight of your initial question.
- How does the data answer my questions?
Maybe the data shows you that the business goal you set is currently unattainable, or maybe it gives you a plan that you could use to move forward.
- What other learnings do I have?
In the process of answering one business question, you will often find new pieces of potentially useful information that help solve the problem at hand in a different way. Or maybe they open up new potential business objectives you can address in later analysis.
- How can I apply this to a business context?
Gaining new knowledge is great, but it is important to focus on information that’s actionable and moves your business forward in meaningful ways. It will often be someone else that takes action based on your information, so think about how that will happen.
- How confident should I be in my results?
If you see an improvement in a business metric, was it due to the changes you made or was it due to random chance? Many data analysts are overconfident in their results, and when they implement what they have learned, they quickly discover that something was wrong with their analysis. That brings us to the topic of, how do you know if you should be confident in the results of your model?
Earlier, during the modeling stage, we briefly discussed using a separate set of test data to check your trained model against. The testing process is all about ensuring you have the right amount of confidence in your model. By running your test data through your model, you can answer questions like, how wrong is the model on average? If the model predicts something, how likely is it to be correct or incorrect? Are there particular scenarios that cause the model to be incorrect?
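The three questions above can each be turned into a small calculation on held-out test data. The sketch below assumes the model's predictions are already available; the campaign types, the numbers, and the 10% tolerance used to call a prediction "correct" are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical test set: actual outcomes next to what the model predicted.
test_cases = [
    {"campaign": "email", "actual": 120, "predicted": 110},
    {"campaign": "email", "actual": 95, "predicted": 100},
    {"campaign": "social", "actual": 200, "predicted": 150},
    {"campaign": "social", "actual": 180, "predicted": 140},
]

# How wrong is the model on average? (mean absolute error)
errors = [abs(case["predicted"] - case["actual"]) for case in test_cases]
mean_absolute_error = sum(errors) / len(errors)

# How likely is a prediction to be "correct"? Here: within 10% of the actual value.
hits = [
    abs(case["predicted"] - case["actual"]) <= 0.10 * case["actual"]
    for case in test_cases
]
share_within_tolerance = sum(hits) / len(hits)

# Are there particular scenarios that cause the model to be incorrect?
errors_by_campaign = defaultdict(list)
for case, error in zip(test_cases, errors):
    errors_by_campaign[case["campaign"]].append(error)
mean_error_by_campaign = {
    campaign: sum(values) / len(values)
    for campaign, values in errors_by_campaign.items()
}
worst_campaign = max(mean_error_by_campaign, key=mean_error_by_campaign.get)

print("mean absolute error:", mean_absolute_error)
print("share within 10%:", share_within_tolerance)
print("mean error by campaign:", mean_error_by_campaign)
print("weakest scenario:", worst_campaign)
```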
Even the best models have limitations, so it’s important to know what they are. On top of those basic questions, you can also use a tool called statistical testing to quantify how confident you should be.
Statistical tests are mathematical methods for assessing whether observed differences could plausibly be caused by random chance. This is sometimes called testing the significance of the results.
For example, you’re trying to improve an email campaign. Your model recommends a change that you implement, and you see a 5% increase in sales. Great, you just made the company 5% more money, right? Well, not so fast, it’s possible that the increase in sales was random or because of some other factor. How can you know? You might run a statistical test and see that you should be 80% confident that the change in revenue was due to your new improved emails. It’s then up to you or your organization to decide how confident you need to be to take action.
Sometimes organizations want to be between 90 and 95% confident to take action, other times, they’re happy with greater than 50%. It often depends on how risky it is for your business to be wrong.
This is why statistical tests are so useful to businesses. So how do statistical tests work? To be honest, there’s a lot of complicated math involved that we won’t get into here. But there are some things they generally measure:
- The differences in the averages of the datasets. If the averages are very different, the difference is less likely to be caused by randomness.
- The size of the dataset. The more data you have, the more confident you should be that the difference in averages isn’t random, even if it’s small.
- The distributions of the datasets. This is often measured using standard deviation.
A high standard deviation indicates high variability or that data values on average fall far from the mean. And a low standard deviation would mean that data in general is closer to the mean. If your data sets have high standard deviations, even large differences in their averages can simply be due to randomness. It’s important to note here that none of these metrics in isolation provide a quantifiable measure of confidence. Only by combining them using statistical tests can you measure confidence.
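To make the combination concrete, the sketch below computes Welch's t-statistic, one common test statistic chosen here purely for illustration (the text does not name a specific test). It divides the difference in averages by a standard error built from each group's standard deviation and size. The sales figures and the `welch_t` helper are hypothetical, and turning the statistic into a confidence level requires a t-distribution lookup that is not shown.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t-statistic: difference in means divided by its standard error."""
    mean_difference = statistics.mean(sample_b) - statistics.mean(sample_a)
    standard_error = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return mean_difference / standard_error


# Hypothetical daily sales before and after an email change (invented numbers).
old_emails = [100, 102, 98, 101, 99]
new_emails = [105, 107, 103, 106, 104]

# Same averages as above, but with a much higher standard deviation.
old_noisy = [100, 120, 80, 110, 90]
new_noisy = [105, 125, 85, 115, 95]

t_baseline = welch_t(old_emails, new_emails)
t_noisy = welch_t(old_noisy, new_noisy)
t_more_data = welch_t(old_emails * 4, new_emails * 4)  # same pattern, four times the data

print("baseline:", round(t_baseline, 2))
print("high standard deviation:", round(t_noisy, 2))
print("larger dataset:", round(t_more_data, 2))
```

A larger absolute value of the statistic means the difference is harder to explain by chance. In this sketch the same difference in averages gives a smaller statistic when the data are noisier and a larger one when there is more data, which matches the three factors listed above.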
The interpret phase of the OSEMN framework is crucial because it’s where data-driven insights are evaluated and communicated. You must revisit your initial analysis objectives, understand how the data answers your questions, and uncover any additional findings. Moreover, it’s vital to ensure these insights are actionable within your business context.
Tools like statistical testing play a key role, helping quantify your confidence in the results. And ultimately, the goal of the interpret phase and data analysis overall is not just to gain insights, but to make informed decisions that propel your business forward.
The interpret stage of the OSEMN process
This is where all the data exploration and number crunching finally pay off. This is also where we explain our findings and generate concrete recommendations for our organizations. Now that we’ve explored the entire OSEMN process to understand our model, we want to make the hard numbers tell a story. It needs to be a story that anyone on our team, not just the data experts, can understand and act upon. This is a crucial step because it’s not just about having insights, it’s about communicating them effectively.
An essential aspect of effective data storytelling is choosing the right medium to communicate your findings, and there are many different mediums you could use. Some examples could be a slide presentation or an interactive notebook, or an in-depth report for an executive review. Each of these comes with its own strengths and challenges, and being able to adapt your story-telling approach to fit each medium is an important skill in data analytics.
Slide presentations are universally relevant across industries and job roles and provide a structured, visual and engaging way to take your audience through your findings and recommendations. Your presentation should recap the original goal, review how you went through the steps of the OSEMN cycle, visualize critical data points, and importantly, explain your findings and recommendations.
Key components of a slide presentation
- Original problem
Start your presentation by taking your audience back to the original problem that initiated your analysis. Re-introduce the issue at hand. What were we trying to solve? Why did we consider it necessary to undertake this analysis? Highlight the potential implications of not addressing this issue, illustrating why it was significant enough to warrant such an in-depth investigation. The purpose of this segment is to establish the context, helping your audience comprehend the relevance of the forthcoming findings and recommendations.
- The Method
Now that you’ve established the context, take the audience through the method you used. In this example, you’re going to take the audience through the steps of the OSEMN process that was followed. You want to maintain a high-level overview and provide enough context for each step so the audience can understand the methods that led to the insights. Remember, the goal here isn’t to dive into highly technical details. You simply want to build a clear picture of the process leading up to your findings. If you’re presenting to a more technical audience, though, you might want to include more of the technical details.
Now, guide your audience through a visual tour of the data, using aesthetically appealing and easy-to-understand visuals such as graphs, charts, and tables. You should try to encapsulate the core data points that led to your findings. Make sure these visuals are accessible to a broad audience and designed in a way that even non-data analysts can comprehend. The role of visualizations is not just to display data, but to make the data speak for itself, highlighting trends, anomalies, patterns, and correlations that underscore your findings.
With the visuals presented, it’s time to describe them for your audience. This is where you act as the translator, decoding the visuals and turning data into a narrative. Elaborate on the key observations drawn from the data, explaining what they signify in the context of the original problem. Try to link the patterns and trends in the visuals to the story you’re trying to tell. Make sure your explanations are straightforward and relatable so that the audience can easily understand the story coming from the data.
As you approach the end of your presentation, present your recommendations based on the findings: what actions should be taken to address the problem identified at the beginning of the presentation? If your data reveals a potential for improvement or modification in certain areas, outline those recommendations clearly. Discuss why you believe these steps would make a difference, drawing connections between your findings and the recommended actions.
Let’s imagine we’ve been analyzing a recent decline in web traffic for an e-commerce store. We start our presentation with a recap.
We remind the audience about the initial problem; a significant drop in website traffic over the past three months, which we noticed during our routine metrics review.
Next, we reshare the OSEMN process. We obtained website analytics data, scrubbed it to ensure accuracy, then explored it to identify trends and anomalies.
Our exploration led us to the modeling phase, where we created a model that identifies potential causes.
We then present a visualization, a graph illustrating the downward trend in website traffic alongside the increase in page load time, which our model identified as the primary cause. The clear inverse correlation visually substantiates our key finding. We can use the graph to explain what happened and when the problem started. As the page load time increased, the web traffic decreased significantly. We’re able to tell the story of how our users started leaving our website because of longer load times, causing the website traffic to decline.
And finally, we conclude with our recommendation. Given the correlation between page load time and traffic, we suggest optimizing the website’s performance in an effort to reduce load time, which should help recover and possibly increase our website traffic. Thus, our data journey comes full circle, offering actionable insights that can be used to improve our business. Interpreting your findings and explaining them effectively to your audience is what turns data into action. It’s the crucial final step in the OSEMN cycle, one that bridges the gap between raw data and real-world decisions, and that’s the power of data analytics.
Explain, Enlighten and Engage
After you’ve analyzed your data and drawn your conclusions, you will often need to present your findings. You’ll want to make sure that you can convey your findings well and persuade people to understand and believe them. In order to tell a persuasive story, we need to focus on the three Es: explain, enlighten, and engage.
This can be achieved by a combination of data, narratives, and visuals. Both the data and the narrative serve to explain the situation. The narrative provides context for the data: specifically, where it comes from, why stakeholders should care about it, and what was done with it. Data and visualizations are part of enlightening your audience. As we’ve mentioned before, raw data are simply a bunch of values. It’s hard for most people to appreciate what the data have to say in their raw form. By combining the data with good visuals, one can show clearly what the data are and what they mean to the overall big picture. You want to lead your audience to that aha moment, the point where everything clicks. Engagement comes from the narrative and visualizations. If the visualizations are good and the narrative is clear and concise, people can internalize what’s being said. This internalization gets them invested in the story. As the Venn diagram shows, when all three parts (the data, visualizations, and narrative) are brought together using the three Es of explain, enlighten, and engage, they can lead to audience understanding and persuasion. That’s frequently the goal of the story.
Suppose you’re presenting data on decreasing honeybee populations. You have data on the global decline in honeybee populations over the last decade, gathered from various environmental research agencies. This data includes annual honeybee colony counts across several countries. You prepare a story around the importance of honeybees in the ecosystem, highlighting their role in pollinating a majority of the food crops we consume. You discuss the potential consequences of their declining numbers and why it matters to everyone, not just environmentalists. You create a line graph that vividly illustrates the decrease in honeybee populations over time across different countries. This visual representation allows for a clearer understanding of the magnitude of the problem than the numbers alone would provide.
Now, how do these elements combine to explain, enlighten, and engage? You use the narrative to clarify the data’s origin and its significance, stating, “These figures, derived from multiple environmental agencies, show the alarming rate of decline in honeybee populations over the past 10 years.” The visual of the line graph is used to illustrate the data. You’d say, “This graph powerfully depicts the downward trend, allowing us to visualize the severity of the issue at hand.” Finally, you combine the narrative and visuals, saying, “Imagine a future where many fruits and vegetables become scarce due to the lack of pollination. This graph isn’t just lines and dips. It represents potential problems to our food supplies that might need to be addressed.” This makes the story memorable and drives home the urgency of the issue.
In future videos, we’ll go deeper into data storytelling and how to tell an effective and compelling story. For now, remember that a good data story persuades your audience by transforming the data into stories that explain, enlighten, and engage.
So far, we’ve focused quite a bit on data, and we also saw how to create compelling visuals. But how about a narrative? How can we build a good narrative to help people understand and be persuaded by the data and the visuals? Every compelling story, from novels to movies to data analysis, typically has four key parts: setup, buildup, climax, and conclusion. To create an impact, your data story should incorporate these elements. Let’s look at them one by one.
As with any good story, we should start our story with a hook, something to get the audience interested in following along. Frequently, these hooks are questions derived from curiosity. Is there a sudden change? Are we missing an opportunity? What should we expect moving forward? What we want to convey in the setup is the theme of the story. It could be that there is an issue or concern that needs to be addressed, or an opportunity to be seized. For our example hook, suppose that we notice a sudden dip in Inu and Neku’s sales for the last couple of months. An obvious and compelling hook is: why? What could be causing this downturn?
After we’ve set up the story, we want to create a buildup. The buildup is where the story unfolds. It’s here that we describe the steps taken in investigating the hook from the setup. We also want to communicate the findings from our investigations. The actions taken that led us to the key findings are particularly important. The key finding is the insight from our analysis that has the greatest explanatory power. In our example, after some data exploration, we realized that the sales figures are from multiple channels. They can be broken up into Internet, wholesale, and retail. It looks like the change in sales is stemming from our Internet sales. We then investigated the web data and found that while Internet sales numbers went down, the number of customers visiting the online store did not. Digging further, we discovered that for Internet customers, there was a high abandonment rate of shopping carts, meaning many customers never checked out even though they had items in their cart. This appears to have started at the same time as the downturn. So that would be our key finding: online shopping cart abandonment rates increased during the downturn.
If this were a mystery story, the climax would happen when the villain is unmasked. For us, it’s when we explain the hook’s root cause with our key finding. Ideally, this is where the audience’s light bulb goes off. If you’ve engaged with the audience adequately, they should now understand the dynamic between the hook and the cause and want to act on your insight. In our story, it’s strange that 92% of customers abandoned their carts while in the process of making a purchase. This didn’t happen in prior months or years. It’s at this point that we uncover the fact that customers abandoned their carts due to our lack of inventory for the products that they wanted to buy. If we don’t have the product in stock, of course our customers can’t buy it. Consequently, we received no sales for those products.
Now we finish the story. If there is action that needs to be taken to remedy the issue, it should be revealed at this point. We should also discuss the cause if we have an idea of what it is. Going back to our example with Inu and Neku, now that we’ve uncovered the cause for the decline in sales, how do we fix it? Well, if we work to increase our stock of in-demand products, we should be able to reverse the downward sales trend. If the products had been in stock, our sales would likely have looked much better and the decline wouldn’t have happened.
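The key finding in this story rests on one simple metric, the shopping cart abandonment rate: the share of carts that were created but never checked out. Here is a minimal sketch in Python of how it could be computed. The function name and the monthly counts are hypothetical; the downturn month was chosen so that it reproduces the 92% figure from the story, and the earlier month is an invented baseline for comparison.

```python
def cart_abandonment_rate(carts_created, carts_checked_out):
    """Share of created carts that never reached checkout (0.0 to 1.0)."""
    if carts_created <= 0:
        raise ValueError("carts_created must be positive")
    if not 0 <= carts_checked_out <= carts_created:
        raise ValueError("carts_checked_out must be between 0 and carts_created")
    return (carts_created - carts_checked_out) / carts_created


# Hypothetical monthly counts, invented for illustration only:
# (label, carts created, carts checked out)
monthly_carts = [
    ("earlier month", 1000, 320),
    ("downturn month", 1000, 80),
]

for label, created, checked_out in monthly_carts:
    rate = cart_abandonment_rate(created, checked_out)
    print(f"{label}: {rate:.0%} of carts abandoned")
```

Comparing the rate across months, rather than quoting a single number, is what lets us say that the jump in abandonment started at the same time as the downturn.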
It should be mentioned that you don’t always need to tell a nice and neat story when interpreting data, but the most impactful and meaningful discoveries tend to lend themselves to one. When the insights are hard to understand or when the impact on your business is large, storytelling with data is essential. Expect an underlying story when data shows something unexpected, unpleasant, complex, costly, or especially surprising. These situations tend to point to a good data story waiting to be told.
Summary Reading: iNterpreting Data & Storytelling
iNterpret Checklist
Step 1: Understand the results of your analysis
Ask the following questions:
⏹ What was the objective for this analysis?
⏹ How does the data answer my questions?
⏹ What other learnings do I have?
⏹ How can I apply this to a business context?
⏹ How confident should I be?
    How wrong is the model?
    How likely is the model to be correct?
    What scenarios cause the model to be incorrect?
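One way to put rough numbers on the confidence questions above is to compare the model’s predictions with values that were actually observed. The sketch below is in Python and is only an illustration: the function names, the tolerance, and the visit counts are all hypothetical. The mean absolute error speaks to “how wrong is the model?”, and the share of predictions that fall within a chosen tolerance is a crude stand-in for “how likely is the model to be correct?”.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in the same units as the data."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def share_within_tolerance(actual, predicted, tolerance):
    """Fraction of predictions that land within +/- tolerance of the actual value."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Hypothetical weekly visit counts, invented for illustration only.
actual_visits = [7000, 7200, 6900, 7100]
predicted_visits = [7100, 7000, 6950, 7400]

mae = mean_absolute_error(actual_visits, predicted_visits)
within = share_within_tolerance(actual_visits, predicted_visits, tolerance=200)
print(f"Mean absolute error: {mae} visits")
print(f"Predictions within 200 visits: {within:.0%}")
```

Looking at which individual predictions miss by the most is a natural starting point for the third question, the scenarios in which the model tends to be incorrect.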
Step 2: Explain your findings
Build a presentation with these key components:
⏹ Recap
⏹ Method
⏹ Visualization
⏹ Explanation
⏹ Recommendation