Tuesday, October 5, 2021

Are We There Yet? The State of the Web & Core Web Vitals [Part 1]

No, please, do read on. This is a post about what has gone wrong with Core Web Vitals and where we stand now, but also why you still need to care. I also have some data along the way, showing how many sites are hitting the minimum level, both now and back at the original intended launch date.

At the time of writing, it’s nearly a year and a half since Google told us that they were once again going to pull their usual trick: tell us something is a ranking factor in advance, so that we improve the web. To be fair, it’s quite a noble goal all told (albeit one they have a significant stake in). It’s a well-trodden playbook at this point, too, most notably with “mobilegeddon” and HTTPS in recent years.

Both of those recent examples felt a little underwhelming when we hit zero-day, but the “Page Experience Update”, as Core Web Vitals’ rollout has been named, has felt not just underwhelming, but more than a little fumbled. This post is part of a 3-part series, where we’ll cover where we stand now, how to understand it, and what to do next.

Fumbled, you say?

Google was initially vague, telling us back in May 2020 that the update would be “in 2021”. Then, in November 2020, they told us it’d be in May 2021 — an unusually long total lead time, but so far, so good.

The surprise came in April, when we were told the update was delayed to June. And then in June, when it started rolling out “very slowly”. Finally, at the start of September, after some 16 months, we were told it was done.

So, why do I care? I think the delays (and the repeated clarifications and contradictions along the way) suggest that Google’s play didn’t quite work out this time. They told us that we should improve our websites’ performance because it was going to be a ranking factor. But for whatever reason, perhaps we didn’t improve them, and their data was a mess anyhow, so Google was left to downplay their own update as a “tiebreaker”. This is confusing and disorienting for businesses and brands, and detracts from the overall message that yes, come what may, they should work on their site performance.

As John Mueller said, “we really want to make sure that search remains useful after all”. This is the underlying bluff in Google’s pre-announced updates: they can’t make changes that cause the websites people expect to see to stop ranking.

Y’all got any data?

Yes, of course. What do you think we do here?

You may be familiar with our lord and savior, Mozcast, Moz’s Google algorithm monitoring report. Mozcast is based on a corpus of 10,000 competitive keywords, and back in May I decided to look at every URL ranking top 20 for all of these keywords, on desktop or on mobile, as tracked from a random location in the suburban USA.

This was some 400,000 results, and (surprisingly, I felt) ~210,000 unique URLs. 

At the time, only 29% of these URLs had any CrUX data — this is data collected from real users in Google Chrome, and the basis of Core Web Vitals as a ranking factor. It’s possible for a URL to not have CrUX data because a certain sample size is needed before Google can work with the data, and for many lower-traffic URLs, there is not enough Chrome traffic to fill out this sample size. That 29% is a depressingly low number, especially when you consider that these are, by definition, higher-traffic pages than most: they rank top 20 for competitive terms, after all.
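If you want to replicate this kind of check for your own pages, the public Chrome UX Report API is the easiest route. Here’s a minimal Python sketch; the API key is a placeholder, and this isn’t necessarily how the Mozcast data above was gathered:

```python
# A minimal sketch of checking whether a URL has CrUX field data, using the
# public Chrome UX Report API. The API key below is a placeholder.
import requests

API_KEY = "YOUR_API_KEY"  # hypothetical; create one in Google Cloud Console
ENDPOINT = f"https://chromeuxreport.googleapis.com/v1/records:queryRecord?key={API_KEY}"

def has_crux_data(url: str) -> bool:
    """True if Google holds enough real-user Chrome data for this URL."""
    response = requests.post(ENDPOINT, json={"url": url})
    # The API answers 404 when the sample size for the URL is too small.
    return response.status_code == 200

print(has_crux_data("https://moz.com/blog"))
```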

Google has made various equivocations around generalizing/guesstimating results based on page similarity for pages that don’t have CrUX data, and I can imagine this working for large, templated sites with long tails, but less so for smaller sites. In any case, in my experience working on large, templated sites, two pages on the same template often had vastly different performance, particularly if one was more heavily trafficked, and therefore more thoroughly cached.

Anyhow, leaving that rabbit hole to one side for a moment, you might be wondering what the Core Web Vitals outlook actually was for this 29% of URLs.

Some of these stats are quite impressive, but the real issue here is that “all 3” category. Again, Google has gone and contradicted itself back and forth on whether you need to pass a threshold for all three metrics to get a performance boost, or indeed whether you need to pass any threshold at all. Still, what they have told us concretely is that we should try to meet these thresholds, and what we haven’t done is hit that bar.

Of the 29% of URLs that even had data in the first place, 30.75% passed all three thresholds. 30.75% of 29% works out to roughly 9%, so only about 9% of URLs can concretely be said to be doing alright. Applying any significant ranking boost to 9% of URLs probably isn’t good news for the quality of Google’s results, especially as household name brands are very, very likely to be rife among the 91% left out.

So this was the situation in May, which (I hypothesize) led Google to postpone the update. What about August, when they finally rolled it out?

CrUX data availability increased from 29% to 38% between May and August 2021.

The rate of URLs with CrUX data passing all three CWV thresholds increased from 30.75% to 36.3% between May and August 2021.

So, the new multiplication (36.3% of 38%) leaves us at 14%, a marked increase over the previous 9%, driven partly by Google collecting more data and partly by websites getting their stuff together. Presumably this trend will only continue, and Google will be able to turn up the dial on Core Web Vitals as a ranking factor, right?
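If you want to double-check the arithmetic behind those two headline figures, it’s a two-liner:

```python
# Reproducing the two multiplications: May vs. August 2021
may = 0.3075 * 0.29    # ~0.089, i.e. ~9% of ranking URLs passing all CWV thresholds
august = 0.363 * 0.38  # ~0.138, i.e. ~14% of ranking URLs passing all CWV thresholds
print(round(may, 3), round(august, 3))  # 0.089 0.138
```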

More on that in parts 2 and 3 :)

In the meantime, if you're curious about how your site measures up against the CWV thresholds, Moz has a tool for it, currently in beta, with the official launch coming in mid-to-late October.

Sign up for Moz Pro to access the beta!

Already a Moz Pro customer? Log in to access the beta!

Appendix

And if you really want to nerd out, see how you score against the industry at large on these distribution charts from the August data.

Monday, October 4, 2021

What if the Competition Is Wrong? How to Avoid the Pitfalls of Competitive Content Research

Competitive research is a common and necessary task in any marketing landscape. This practice is particularly crucial in digital marketing because the ecosystem rapidly changes and brands constantly battle against each other for users across multiple platforms.

In the ideal scenario, performing competitive content research illuminates where your brand’s online content falters compared to competitors. With this information, you can solder the frail links in your marketing strategy and try to usurp the competition with superior content. The results should improve your brand’s content authority, keyword rankings, and organic share of voice.

However, competitive research rarely offers cut-and-dried wins. Your best practice acumen must be strong enough to scrounge for insights among multiple sources with varying content quality. You need to understand what content matters and what’s fluff. And ultimately, you have to know why some choices are more valuable and useful than others.

All of these factors make competitive research tricky. Because if you don’t discern the best content decisions, then you’re going to step into a pitfall trap and end up in a worse spot than before the research — especially if you emulate competitors whose approach to content is wrong, inadequate, or a bad fit for your ideal users.

To prevent turning your brand into a cautionary tale, you need to carefully choose what competitors you research, locate relevant pain points, and determine how effective their marketing strategy is.

Identifying competitors: Avoid the narrow path

When we think of competitors, we often think about direct competitors — the brands that offer similar products or solutions and vie for the same users online and in brick-and-mortar stores, such as Patagonia versus Prana.

Evaluating direct competitors’ content is a great place to start competitive research, but this narrow view is only half of the digital marketing equation. You need to widen your path and analyze how your content stacks up against SERP competitors, too. This panoramic view is even more important for small businesses that compete with national chains, like a local independent bookstore versus Barnes & Noble.

Unfortunately, many companies overlook the value of analyzing SERP rankings and organic share of voice for their vertical. Sometimes, this choice is because a brand doesn't directly compete with the websites that rank in the top positions. In other scenarios, a company won’t have the resources to tackle both segments at once and must focus on either direct competition or SERP rankings.

Regardless of the situation, excluding SERP competitors from your research in favor of your direct competition is an enormous mistake.

For example, let’s say you’re shopping for rock climbing pants and are indifferent to the brand you buy. Patagonia and Prana both sell climbing pants that you can purchase directly from their website and both brands rank for “rock climbing pants” on the first page. However, neither brand breaks above the fold with its rankings. Patagonia ranks in position seven and Prana is in position eight.

The top organic position is owned by a niche climbing website with a review of different climbing pants. This website has a domain authority of 50, while Prana and Patagonia have domain authorities of 73 and 85, respectively.

The user’s search intent is the same for every result on the first page: buying climbing pants. However, in this example, apparently neither Prana nor Patagonia focused on indirect SERP competitors. If they had, they’d recognize that brand-agnostic users, such as people who use generic search terms, often buy products based on reviews and recommendations.

Google recognizes this user desire, which is why, for this term, it increasingly ranks best-of lists higher than product pages.

Given the domain rating of both companies and their expansive resources compared to a small, niche website, if either brand used their influencers to create unbiased review-focused content for the “rock climbing pants” keyword, they’d likely clinch the number one ranking with relative ease.

Instead, these companies are relegated down the page and must use paid advertising to compete for users’ attention.

Ultimately, accurate content analysis comes from gleaning insights from both SERP and direct competitors.

For example, let’s say you operate a B2B contact center software company for small businesses and want to rank for the ambitious term, “contact center software.” You have three direct competitors with a similar domain ranking, and each of them ranks somewhere on the first page. The other rankings are dominated by “best software” listicles.

This split search intent creates a delicate ranking environment and fierce competition. To have any chance of ranking on the first page, you’ll need to carefully pick-and-pull the best content aspects of both the listicles and direct competitors. And that requires knowing how to identify the right competitor to review.

How to choose competitors to review

Instead of getting sucked into the trap of trying to balance the analysis of SERP and direct competitors, focus on competitors who are trying to achieve the same goal and whom you have an honest chance of dethroning.

If you want to improve your website’s content, any competitor you research should meet the following criteria:

  • The brand’s services and content are relevant and target your ideal user group

  • The brand follows content strategy and SEO best practices or is innovating effective alternatives

  • The brand ranks well on SERPs for your target keywords
    • The content this brand has ranking is relevant toward your brand’s users and business goals

    • Your brand’s domain rating and page authority are reasonably competitive, so changes have the potential to spur keyword growth

  • You have the resources to directly compete with the brand’s online authority and presence

There are always exceptions to these rules, such as brands that don’t need a robust online presence because they rely on third-party contracts and word-of-mouth to survive, like government contractors. However, for the average B2B and B2C company, choosing competitors with these guidelines in mind will keep your attention focused on worthy competition and not riff-raff.

Identifying pain points

Once you know who your competitors are, you need to know what content to analyze and how to determine why their version is superior to yours. These choices come down to knowing your brand’s pain points.

Not understanding or researching your own pain points before delving into competitive research is a huge mistake. Pain points allow you to focus your competitive analysis. Without knowing what you want to fix, you’re aiming in the dark when you research the competitor’s content. Without light to guide you, it’s extremely easy to emulate ideas you shouldn’t or try to compete with a website that’s incompatible with your goals or organic authority.

What pain points should you focus on?

Ultimately, your business goals and content KPIs should determine what pain points you focus on. Let the slipping conversions, plummeting newsletter sign-ups, or poor website performance metrics guide your path.

Let’s say you run a documentary streaming service and are struggling to get users to sign up for a trial after reading relevant blog posts or research papers. You know one of your competitors doesn't have this churn, so you plan to read their related content and see how the experience is superior.

Before you can dive into the competitor’s service and learn why they earn trialists, you need to know why your users refuse to join.

In this scenario, your best option will be user research, such as:

  • User interviews

  • A/B tests

  • Surveys

  • Usability tests

  • Heatmap tracking

  • Net promoter score analysis

Once you determine why your brand is failing, then you can critically consider how your opponent solves the issues users have with your brand’s service.

The trick to knowing if a competitor’s pain point solution will work for your brand is understanding why it works for the competitor. There are plenty of ways to gain this knowledge, including best practice awareness, running the competition’s idea through a user research gauntlet, and comparing the options side-by-side.

These insights all rely on one common theme: the competition is following best practices and doing everything correctly. However, competitors are fallible and often don’t offer users an ideal experience or perfect content. So what happens when the competition is wrong?

What if the competition is wrong?

Even if a competitor passes your initial screening and seems like a great brand from which to discover your weaknesses, first impressions can be deceiving.

There are plenty of mischievous marketing practices businesses can participate in that you wouldn’t notice at first glance, such as black-hat link building or paying users for positive reviews. And there are many innocent mistakes that your competitors may make that will harm your website if you implement them, like lackluster accessibility standards.

The amount of due diligence you perform should correlate with the amount of risk you undertake to emulate an idea or strategy.

For low-risk ideas, like rewriting a competitor’s blog post, the due diligence can be extremely simple, such as checking the post’s sources, keyword targets, and backlinks.

High-risk ideas, like overhauling your product pages or customer journey, need a more robust background check.

Here are a handful of red flags that should encourage you to avoid a competitor or at least do a deeper dive into their website:

  • Content automation (like scraper blogs) or similar signs of low-quality content

  • Link cloaking

  • Guest posting networks or other content sharing ecosystems

  • Link farms, private blog networks, or similar manipulation

  • Multiple domains with duplicate content

  • Paid user reviews or similar manipulation

  • Social media manipulation

  • Comment spam

  • Fraudulent cookies

  • Hidden text

How to spot when the competition is wrong

To prevent adopting erroneous high-risk ideas, you should always ask yourself the following four questions:

  1. Does the brand’s content adhere to content strategy, SEO, and UX best practices?

  2. Is the content meaningful, and how is its value communicated to users?

  3. Why do you think the brand created this content?

  4. If you implemented a similar (or the same) idea, how would your updated website and its content improve user experience?

These four questions act as a check-and-balance system for new ideas. They force you to consider the justifications of why a competitor made its decisions, how users may respond, and the consequences of copying those choices. Although this process isn’t necessary for every improvement you may glean from a competitor, it’s worthwhile when you’re considering significant changes that can swing KPIs toward success or failure.

Now, go avoid competitive research pitfalls

Competitive research is a necessary marketing strategy, and it’s immensely valuable if you take the time to ensure you’re evaluating a worthy competitor. While it’s easy to skimp on the background research and assume your competitors know what they’re doing, based on search rankings or public opinion, they may not be the skilled marketers you presume, and you’ll end up wasting time, resources, and users on a faulty idea.

Here’s a quick reminder of what you should do to prepare yourself for competitive research and avoid implementing bad ideas:

  • Identify a mix of direct and SERP competitors that have relevant content, are trying to accomplish the same goal, and target the same users.

  • Determine your brand’s pain points and analyze how the competitors solve similar problems.

  • Do background research on your competitors and their content choices to ensure they follow content strategy, SEO, and UX best practices.

Friday, October 1, 2021

Intro to Python [Part 2]

Want to use Python, but don't know where to begin? Britney and Pumpkin are here in their second episode as co-hosts with more great tips on how to get started! 

Photo of the whiteboard outlining the basics of Python.
Click on the whiteboard image above to open a larger version in a new tab!

Video Transcription

Hi, Moz fans. Welcome to another edition of Whiteboard Friday. I'm your host Britney Muller. I was previously Moz's Senior SEO Scientist, and now I am freelance consulting and building some data science programs on the side.

This is my very special co-host, Pumpkin. You might remember her from the first Python episode. She's gotten quite a bit bigger. Quarantine was really good to her. She's very healthy and very sweet. I love her so, so much. This is my best buddy right here. 

So we have been hard at work preparing Python 2.0 for you all, and we're so excited to show you what we put together. So let's just get started. 

Why Python?

All right. So we kind of went over this in the first Python video, but just to recap.

On the first video I got to hold her in one hand. It's a bit harder now. That's actually why I'm wearing this. I thought I could maybe BabyBjörn you. Oh, she's fine. 

Okay, so just to recap, why Python? It's talked about so much in the SEO community. Why is this sort of the program that most people prefer?

Simple syntax

So there's very simple syntax. It's sort of more common sense than other programming languages. It also uses a ton of white space. So you're going to see tabs and sort of white spaces instead of curly brackets like some other common programming languages.

It's concise

Did you have something to say? It's very concise. Often there are fewer lines of code to do one thing than there might be in another language, which is very, very nice. 

It's versatile

It's also very versatile. It works on many different platforms, and it can be used in a variety of different ways: procedural, object-oriented, and functional programming, which you've probably heard of.

It kind of covers the gamut in that way, which is really great. You think so too? Pumpkin says she thinks so too, and it's awesome. 

Getting started

So let's get started. So whether you're on a Mac or a Windows, you can open up a terminal and Python should come with your Mac OS system.

There's 2.7 kind of natively installed, and we can just use that. Or go ahead and just open a Colab notebook. So this is a Google property we'll link to down below. You can create a new code cell. All I want you to do is simply type in print parentheses. 

Sorry, what are these? Help me. Parentheses and then quotes, sorry. We're quarantined. You know? It's just been us two. 

So print ("Hello World") and then Shift + Enter. Congratulations, you've just run Python.

So we're off to the races. You're all Python experts basically. 

Python fundamentals

Now, let's kind of cover some of the fundamentals. These are really important especially just to be aware of as you kind of continue exploring — oh, is she on my mic, sorry — as you continue exploring Python.

Basic syntax

So we're first just going to go over some of the basic syntax, and there's obviously a lot more than just this, but some of the common things. 

1. Variables

Variables are super, super important in Python. So this is where you just assign values to words or whatever variables you're working with. This is sort of a silly tax price example here, where we assign a numerical value to tax and we do the same for price.

Pumpkin is showing you. She's very excited about this example. You simply run this within Python, and you will get your price plus the tax that we have stated here. So it's kind of a cool application just to quickly get a feel of how variables work and how when you're dealing with numerical variables, you can do a variety of calculations.

So that's a super powerful thing within Python and really fun to play around with.
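If you want to follow along at home, here's a minimal version of that tax/price idea (the numbers are illustrative, since the whiteboard's exact values aren't shown):

```python
# Assign numerical values to the variables "tax" and "price"
tax = 12.5 / 100   # a 12.5% tax rate, written as a fraction
price = 100.50     # the price before tax
total = price + price * tax
print(total)       # 113.0625, the price plus the tax
```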

2. Comments

Second big, big important syntax is comments. So if you have something to say, like Pumpkin here, you have to put kind of a hash and then write your comment after that.

Commonly people will use these to explain the code that's after the comment. So you can kind of explain what you were trying to do there. It's also very useful if you want to comment out code. So I use this all the time when I'm kind of fumbling and trying to do different things within a Colab notebook and it's not working.

I will just comment out different things and try different ways, and oftentimes that helps me kind of find solutions quickly. 
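A quick, hypothetical illustration of both uses:

```python
# Comments start with a hash and typically explain the code that follows
conversion_rate = 120 / 3000  # conversions divided by sessions

# Commenting out code keeps it handy while you try another approach:
# conversion_rate = 3000 / 120  # wrong way round, disabled for now
print(conversion_rate)  # 0.04
```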

3. Data types

The next and perhaps most powerful thing, especially if you want to start using Python for data analysis (say, pulling in Google Search Console or Google Analytics data), is to be aware of the different data types.

So if you're pulling in text, like keywords from Search Console, it should be picked up in Python as string (str). Sometimes this gets screwed up when you import data. So it's really important to have the proper data types assigned to your different types of data so that you can perform correct calculations.

So for numeric values, you have integer or just int, float, and complex. If your numbers aren't in these data types, you won't be able to run different calculations on them. So again, just to be aware that these exist and really the gist of it is you basically just want your data to be reflective of the proper Python data types.

So sequence is listed as those three — list, tuple, and range. Mapping is really common if you're using dictionary type things within different programs. Then, of course, our most common, Boolean, which is true or false, is just bool. 
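Here's a small, illustrative snippet touching each of the types mentioned, with type() to confirm what Python sees (all the values are made up):

```python
keyword = "rock climbing pants"  # str, e.g. a Search Console query
clicks = 1200                    # int
ctr = 0.034                      # float
z = 2 + 3j                       # complex
positions = [7, 8, 1]            # list
thresholds = (2.5, 0.1, 100)     # tuple
pages = range(10)                # range
metrics = {"clicks": 1200}       # dict, the common mapping type
is_ranking = True                # bool

for value in (keyword, clicks, ctr, z, positions,
              thresholds, pages, metrics, is_ranking):
    print(type(value))
```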

So is Pumpkin a big, happy girl? True. She's actually a boy. That's a long story. But you can call her whatever. She's having so much fun, and she's so happy to be here. 

4. If...else

Lastly, the if...else statement. So there's a number of different statements that you can use.

But arguably one of the more common is if else. So just a really silly example, let's say you have Website A ranking 13 for a keyword and Website B ranking 28. You can say print ("A") if A < B else print ("B"). So this is just a really silly, quick example to kind of show you how you can use this.

But once you get into that, you get into elif and loops, and it gets really, really fun and exciting. 
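Here's that ranking example as runnable code, along with the same logic written out as a full if/elif/else (the variable names are mine):

```python
a = 13  # Website A ranks in position 13 for the keyword
b = 28  # Website B ranks in position 28
print("A") if a < b else print("B")  # prints: A

# The same comparison as a full statement, with elif covering the other cases
if a < b:
    print("A")
elif a > b:
    print("B")
else:
    print("They tie")
```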

Conclusion

So hopefully, you start to play around with some of these and stay tuned for when we apply this to Google Search Console data. So thank you so much for kind of checking out this Python 2.0 basics.

My co-host is hiding behind my back right now, but she is really grateful that you all came to check out the second video of Python. So thank you, guys, so much and Pump and I will see you guys soon. Bye.

Video transcription by Speechpad.com

Wednesday, September 29, 2021

Reveal Your Rivals with True Competitor (Beta)

One of the biggest challenges in SEO is trying to convince your client or boss that the competition they face online may not match their legacy competitors and personal grudges. Big Earl across the street at Big Earl’s Widgets may be irritating and, sure, maybe he does have a “stupid, smug face,” but that doesn’t change the fact that WidgetShack.com is eating your lunch (and let’s not even talk about Amazon).

To make matters worse, competitive analysis is time-consuming and tedious work, even if you do have access to the data. Today, after years of rethinking how competitive analysis should work (and, honestly, re-rethinking it on many occasions), I’m proud to announce the first step in expanding Moz’s competitive analysis toolkit — True Competitor.

Try True Competitor

What is True Competitor?

Before I dive into the details, let’s take it out for a spin. Just enter your domain or subdomain and your locale (the beta supports English-language markets in the United States, Great Britain, Australia, and Canada):

Then let the tool do its work. You’ll get back something like this:

True Competitor pulls ranking keywords (by highest-volume) for any domain in our Keyword Explorer database — even your competitors’ and prospects’ domains — and analyzes recent Google SERPs to find out who you’re truly competing against.

What are Overlap and Rivalry?

Hopefully, you’re already familiar with our proprietary Domain Authority (DA) metric, but Overlap and Rivalry are new to True Competitor. Overlap is simple — it’s the percentage of shared keywords where the target site and the competitor both ranked in the top 10 traditional organic results. This is essentially a Share of Voice (SoV) metric. It’s a good first stop, and you can sort by DA or Overlap for multiple views of the data — but what if the keywords you overlap on aren’t particularly relevant, or a competitor is just too far out of reach?

That’s where Rivalry comes in. Rivalry factors in the Click-Thru Rate (CTR) and volume of overlapping keywords, the target site’s ranking (keywords where the target ranks higher are more likely to be relevant), and the proximity of the two sites’ DA scores to help you sort which competitors are the most relevant and realistic.
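To make the Overlap idea concrete, here’s a toy sketch of the calculation as described above. This is an illustration of the concept, not Moz’s actual implementation, and since the exact Rivalry weighting isn’t public, it’s not modeled here:

```python
# Overlap: the share of keywords where both sites rank in the top 10
# traditional organic results. These keyword sets are made up.
target_top10 = {"seo tools", "keyword research", "domain authority"}
competitor_top10 = {"seo tools", "keyword research", "backlink checker"}

overlap = len(target_top10 & competitor_top10) / len(target_top10)
print(f"Overlap: {overlap:.0%}")  # Overlap: 67%
```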

What can you do with this data?

Hopefully, you can use True Competitor to validate your own assumptions, challenge bad assumptions, and learn about competitors you might not have considered. That’s not all, though — select up to two competitors for in-depth information:

Just click on [ + Analyze Competitors ] and your selections will be auto-filled in our Keyword Overlap tool in Keyword Explorer. Here, you can dive deeper into your keyword overlap and find specific keywords to target with your SEO efforts:

We’re currently working on new ways to analyze this data and help you surface the most relevant keyword and content overlaps. We hope to have more to announce in Q4.

This list doesn’t match my list!

GOOD. Sorry, that’s a little flippant. Ultimately, we hope there’s something new and unexpected in this data. Otherwise, what’s the point? The goal of True Competitor is to help you see who you’re really up against in Google rankings. How you use that information is up to you.

I’d like to challenge you, dear reader, on one point. We have a bad habit of thinking of the “competition” as a single, small set of sites or companies. In the example above, I chose to explore SEMrush and Ahrefs, because they’re our most relevant product competitors. Consider if I had taken a different route:

Looking at our SEO news competitors paints a different but also very useful picture, especially for our content team and writers. We also have multiple Google subdomains showing up in our Top 25 — some Google products (like Google Search Console) are competitors, and some (like Google Analytics) are simply of interest to our readership and topics that we cover.

My challenge to you is to really think about these different spheres of competition and move beyond a singular window of what “competitor” means. You may not target all of these competitors or even care about them all on any given day, and that’s fine, but each window is an area that might uniquely inform your SEO and content strategies.

As a Subject Matter Expert at Moz, I have the privilege of working on multiple parts of our product, but this project is something I’ve been thinking about for a long time and is near and dear to me. I’d like to personally thank our Product team — Igor, Hayley, and Darian — for all of their hard work, leadership, and pushback to make this product better. Many thanks also to our App Front-end Engineering team, and a special shout-out to Maura and Grant for helping port the original prototype into an actual product.

Get started with True Competitor

True Competitor is currently available in beta for all Moz Pro customers and community accounts.

Try True Competitor

We welcome your feedback — please click on the [Make a Suggestion] button in the upper-right of the True Competitor homepage if you have any specific comments or concerns.

Monday, September 27, 2021

Google Local Filler Content Isn't Good UX, and Needs Revisions

Did you ever turn in a school paper full of vague ramblings, hoping your teacher wouldn’t notice that you’d failed to read the assigned book?

I admit, I once helped my little sister fulfill a required word count with analogies about “waves crashing against the rocks of adversity” when she, for some reason, overlooked reading The Communist Manifesto in high school. She got an A on her paper, but that isn’t the mark I’d give Google when there isn’t enough content to legitimately fill local packs, Local Finders, and Maps.

The presence of irrelevant listings in response to important local queries:

  • Makes it unnecessarily difficult for searchers to find what they need

  • Makes it harder for relevant businesses to compete

  • Creates a false impression of bountiful local choice of resources, resulting in disappointing UX

Today, we’ll look at some original data in an attempt to quantify the extent of this problem, and explore what Google and local businesses can do about it.

What’s meant by “local filler” content and why is it such a problem?

The above screenshot captures the local pack results for a very specific search for a gastroenterologist in Angels Camp, California. In its effort to show me a pack, Google has scrambled together results that are two-thirds irrelevant to the full intent of my query, since I am not looking for either an eye care center or a pediatrician. The third result is better, even though Google had to travel about 15 miles from my specified search city to get it, because Dr. Eddi is, at least, a gastroenterologist.

It’s rather frustrating to see Google allowing the one accurate specialist to be outranked by two random local medical entities, perhaps simply because they are closer to home. It obviously won’t do to have an optometrist or children’s doctor consult with me on digestive health, and unfortunately, the situation becomes even odder when we click through to the local finder:

Of the twenty results Google has pulled together to make up the first page of the local finder, only two are actually gastroenterologists, lost in the weeds of podiatrists, orthopedic surgeons, general MDs, and a few clinics with no clarity as to whether their presence in the results relates to having a digestive health specialist on staff. Zero of the listed gastroenterologists are in the town I’ve specified. The relevance ratio is quite poor for the user and shapes a daunting environment for appropriate practitioners who need to be found in all this mess.

You may have read me writing before about local SEO seeking to build the online mirror of real-world communities. That’s the ideal: ensuring that towns and cities have an excellent digital reference guide to the local resources available to them. Yet when I fact-checked with the real world (calling medical practices around this particular town), I found that there actually are no gastroenterologists in Angels Camp, even though Google’s results might make it look like there must be. What I heard from locals is that you must either take a 25 minute drive to Sonora to see a GI doctor, or head west for an hour and fifteen minutes to Modesto for appropriate care.

Google has yoked itself to AI, but the present state of search leaves it up to my human intelligence to realize that the SERPs are making empty promises, and that there are, in fact, no GI docs in Angels Camp. This is what a neighbor, primary care doctor, or local business association would tell me if I was considering moving to this community and needed to be close to specialists. But Google tells me that there are more than 23 million organic choices relevant to my requirements, and scores of local business listings that so closely match my intent, they deserve pride of place in 3-packs, Finders and Maps.

The most material end result for the Google user is that they will likely experience unnecessary fatigue wasting time on the phone calling irrelevant doctors at a moment when they are in serious need of help from an appropriate professional. As a local SEO, I’m conditioned to look at local business categories and can weed out useless content almost automatically because of this, but is the average searcher noticing the truncated “eye care cent…” on the above listing? They’re almost certainly not using a Chrome extension like GMB Spy to see all the possible listing categories since Google decided to hide them years ago.

On a more philosophical note, my concern with local SERPs made up of irrelevant filler content is that they create a false picture of local bounty. As I recently mentioned to Marie Haynes:

The work of local businesses (and local SEOs!) derives its deepest meaning from providing and promoting essential local resources. Google’s inaccurate depiction of abundance could, even if in a small way, contribute to public apathy. The truth is that the US is facing a severe shortage of doctors, and anything that doesn’t reflect this reality could, potentially, undermine public action on issues like why our country, unlike the majority of nations, doesn’t make higher education free or affordable so that young people can become the medical professionals and other essential services providers we unquestionably need to be a functional society. Public well-being depends on complete accuracy in such matters.

As a local SEO, I want a truthful depiction of how well-resourced each community really is on the map, as a component of societal thought and decision-making. We’re all coping with public health and environmental emergencies now and know in our bones how vital essential local services have become.

Just how big is the problem of local filler content?

If the SERPs were more like humans, my query for “gastroenterologist Angels Camp” would return something like a featured snippet stating, “Sorry, our index indicates there are no GI Docs in Angels Camp. You’ll need to look in Sonora or Modesto for nearest options.” It definitely wouldn’t create the present scenario of, “Bad digestive system? See an eye doctor!” that’s being implied by the current results. I wanted to learn just how big this problem has become for Google.

I looked at the local packs in 25 towns and cities across California of widely varying populations using the search phrase “gastroenterologist” and each of the localities. I noted how many of the results returned were within the city specified in my search and how many used “gastroenterologist” as their primary category. I even gave Google an advantage in this test by allowing entries that didn’t use gastroenterologist as their primary category but that did have some version of that word in their business title (making the specialty clearer to the user) to be included in Google’s wins column. Of the 150 total data points I checked, here is what I found:

42% of the content Google presented in local packs had no obvious connection to gastroenterology. It’s a shocking number, honestly. Imagine the number of wearying, irrelevant calls patients may be making seeking digestive health consultation if nearly half of the practices listed are not in this field of medicine.

A pattern I noticed in my small sample set is that larger cities had the most relevant results. Smaller towns and rural areas had much poorer relevance ratios. Meanwhile, Google is more accurate as to returning results within the query’s city, as shown by these numbers:

The trouble is, what looks like more of a win for Google here doesn’t actually chalk up as a win for searchers. In my data set, where Google was accurate in showing results from my specified city, the entities were often simply not GI doctors. There were instances in which all 3 results got the city right, but zero of the results got the specialty right. In fact, in one very bizarre case, Google showed me this:

Welders aside, it’s important to remember that our initial Angels Camp example demonstrated how the searcher, encountering a pack with filler listings in it and drilling down further into the Local Finder results for help, may actually end up with even less relevance. Instead of two-out-of-three local pack entries being useless to them, they may end up with eighteen-out-of-twenty unhelpful listings, with relevance consigned to obscurity.

And, of course, filler listings aren’t confined to medical categories. I engaged in this little survey because I’d noticed how often, in category after category, the user experience is less-than-ideal.

What should Google do to lessen the poor UX of irrelevant listings?

Remember that we’re not talking about spam here. That’s a completely different headache in Googleland. I saw no instances of spam in my data. The welder was not trying to pass himself off as a doctor. Rather, what we have here appears to be a case of Google weighting location keywords over goods/services keywords, even when it makes no sense to do so.

Google needs to develop logic that excludes extremely irrelevant listings for specific head terms to improve UX. How might this logic work?

1. Google could rely more on their own categories. Going back to our original example in which an eye care center is the #1 ranked result for “gastroenterologist angels camp”, we can use GMB Spy to check if any of the categories chosen by the business is “gastroenterologist”:

Google can, of course, see all the categories, and this lack of “gastroenterologist” among them should be a big “no” vote on showing the listing for our query.

2. Google could cross check the categories with the oft-disregarded business description:

Again, no mention of gastroenterological services there. Another “no” vote.

3. Google could run sentiment analysis on the reviews for an entity, checking to see if they contain the search phrase:

Lots of mentions of eye care here, but the body of reviews contains zero mentions of intestinal health. Another “no” vote.

4. Google could cross check the specified search phrases against all the knowledge they have from their crawls of the entity’s website:

This activity should confirm that there is no on-site reference to Dr. Haymond being anything other than an ophthalmologist. Then Google would need to make a calculation to downgrade the significance of the location (Angels Camp) based on internal logic that specifies that a user looking for a gastroenterologist in a city would prefer to see gastroenterologists a bit farther away than seeing eye doctors (or welders) nearby. So, this would be another “no” vote for inclusion as a result for our query.

5. Finally, Google could cross reference this crawl of the website against their wider crawl of the web:

This should act as a good, final confirmation that Dr. Haymond is an eye doctor rather than a gastroenterologist, even if he is in our desired city, and give us a fifth “no” vote for bringing his listing up in response to our query.
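To be clear, Google hasn’t published any such system, and this is pure speculation on my part. But as a toy sketch, those five “no” votes could combine something like this:

```python
# A toy illustration of the five "no" votes described above. Not Google's
# actual logic; the listing data is invented for the example.

def is_relevant(listing: dict, query_term: str) -> bool:
    """Keep a listing only if at least one signal votes "yes" for the term."""
    term = query_term.lower()
    votes = [
        any(term in c.lower() for c in listing["categories"]),  # 1. categories
        term in listing["description"].lower(),                 # 2. description
        any(term in r.lower() for r in listing["reviews"]),     # 3. reviews
        term in listing["website_text"].lower(),                # 4. own website
        term in listing["web_mentions"].lower(),                # 5. wider web
    ]
    return any(votes)

eye_care_center = {
    "categories": ["Eye care center", "Optometrist"],
    "description": "Comprehensive vision exams and eyewear.",
    "reviews": ["Great glasses and friendly staff."],
    "website_text": "Our doctors provide full-service eye care.",
    "web_mentions": "Dr. Haymond, ophthalmologist, Angels Camp.",
}
print(is_relevant(eye_care_center, "gastroenterologist"))  # False: exclude it
```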

The web is vast, and so is Google’s job, but I believe the key to resolving this particular type of filler content is for Google to rely more on the knowledge they have of an entity’s vertical and less on their knowledge of its location. A diner may be willing to settle for tacos instead of pizza if there’s a Mexican restaurant a block away but no pizzerias in town, but in these YMYL categories, the same logic should not apply.

It’s not uncommon for Google to exclude local results from appearing at all when their existing logic tells them there isn’t a good answer. It’s tempting to say that solving the filler content problem depends on Google expanding the number of results for which they don’t show local listings. But, I don’t think this is a good solution, because the user then commonly sees irrelevant organic entries, instead of local ones. It seems to me that a better path is for Google to expand the radius of local SERPs for a greater number of queries so that a search like ours receives a map of the nearest gastroenterologists, with closer, superfluous businesses filtered out.

What should you do if a local business you’re promoting is getting lost amid filler listings?

SEO is going to be the short answer to this problem. It’s true that you can click the “send feedback” link at the bottom of the local finder, Google Maps, or an organic SERP, and fill out a form like this, with a screenshot:

However, my lone report of dissatisfaction with SERP quality is unlikely to get Google to change the results. Perhaps if they received multiple reports…

More practically-speaking, if a business you’re promoting is getting lost amid irrelevant listings, search engine optimization will be your strongest tool for convincing Google that you are, in fact, the better answer. In our study, we realized that there are, in fact, no GI docs in Angels Camp, and that the nearest one is about fifteen miles away. If you were in charge of marketing this particular specialist, you could consider:

1. Gaining a foothold in nearby towns and cities

Recommend that the doctor develop real-world relationships with neighboring towns from which he would like to receive more clients. Perhaps, for example, he has hospital privileges, or participates in clinics or seminars in these other locales.

2. Writing about locality relationships

Publish content on the website highlighting these relationships and activities to begin associating the client’s name with a wider radius of localities.

3. Expanding the linktation radius

Seek relevant links and unstructured citations from the neighboring cities and towns, on the basis of these relationships and participation in a variety of community activities.

4. Customizing review requests based on customers’ addresses

If you know your customers well, consider wording review requests to prompt them to mention why it’s worth it to them to travel from X location for goods/services (nota bene: medical professionals, of course, need to be highly conversant with HIPAA compliance when it comes to online reputation management).

5. Filling out your listings to the max

Definitely do give Google and other local business listing platforms the maximum amount of information about the business you’re marketing (Moz Local can help!). Fill out all the fields and give a try to functions like Google Posts, product listings, and Q&A.

6. Sowing your seeds beyond the walled garden

Pursue an active social media, video, industry, local news, print, radio, and television presence to the extent that your time and budget allows. Google’s walled garden, as defined by my friend, Dr. Pete, is not the only place to build your brand. And, if my other pal, Cyrus Shepard, is right, anti-trust litigation could even bring us to a day when Google’s own ramparts become less impermeable. In the meantime, work at being found beyond Google while you continue to grapple with visibility within their environment.

Study habits

It’s one thing for a student to fudge a book report, but squeaking by can become a negative lifelong habit if it isn’t caught early. I’m sure any Google staffer taking the time to actually read through the local packs in my survey would agree that they don’t rate an A+.

I’ve been in local SEO long enough to remember when Google first created their local index with filler content pulled together from other sources, without business owners having any idea they were even being represented online, and these early study habits seem to have stuck with the company when it comes to internal decision making that ends up having huge real-world impacts. The recent title tag tweak that is rewriting erroneous titles for vaccine landing pages is a concerning example of this lack of foresight and meticulousness.

If I could create a syllabus for Google’s local department, it would begin with separating out categories of the greatest significance to human health and safety and putting them through a rigorous, permanent manual review process to ensure that results are as accurate as possible, and as free from spam, scams, and useless filler content as the reviewers can make them. Google has basically got all of the money and talent in the world to put towards quality, and ethics would suggest they are obliged to make the investment.

Society deserves accurate search results delivered by studious providers, and rural and urban areas are worthy of equal quality commitments and a more nuanced approach than one-size-fits all. Too often, in Local, Google is flunking for want of respecting real-world realities. Let’s hope they start applying themselves to the fullest of their potential.

Friday, September 24, 2021

Crawl Budget

In today’s episode of Whiteboard Friday, Tom covers a more advanced SEO concept: crawl budget. Google has a finite amount of time it's willing to spend crawling your site, so if you’re having issues with indexation, this is a topic you should care about.

Photo of the whiteboard describing crawl budget.
Click on the whiteboard image above to open a larger version in a new tab!

Video Transcription

Happy Friday, Moz fans, and today's topic is crawl budget. I think it's worth saying right off the bat that this is somewhat of a more advanced topic, or one that applies primarily to larger websites. Even if that's not you, there is still a lot you can learn here in terms of the SEO theory that comes up when you're looking at some of the tactics and diagnostics you might employ for crawl budget.

But in Google's own documentation they suggest that you should care about crawl budget if you have more than a million pages or more than 10,000 pages that are updated on a daily basis. I think those are obviously kind of hard or arbitrary thresholds. I would say that if you have issues with your site getting indexed and you have pages deep on your site that are just not getting into the index that you want to, or if you have issues with pages not getting indexed quickly enough, then in either of those cases crawl budget is an issue that you should care about.

What is crawl budget? 

Drawing of a spider holding a dollar bill.

So what actually is crawl budget? Crawl budget refers to the amount of time that Google is willing to spend crawling a given site. Although it seems like Google is sort of all-powerful, they have finite resources and the web is vast. So they have to prioritize somehow and allocate a certain amount of time or resource to crawl a given website.

Now they prioritize based on — or so they say they prioritize based on the popularity of sites with their users and based on the freshness of content, because Googlebot sort of has a thirst for new, never-before-seen URLs. 

We're not really going to talk in this video about how to increase your crawl budget. We're going to focus on how to make the best use of the crawl budget you have, which is generally an easier lever to pull in any case. 

Causes of crawl budget issues

So how do issues with crawl budget actually come about? 

Facets

Now I think the main sort of issues on sites that can lead to crawl budget problems are firstly facets.

So you can imagine on an e-comm site, imagine we've got a laptops page. We might be able to filter that by size. You have a 15-inch screen and 16 gigabytes of RAM. There might be a lot of different permutations there that could lead to a very large number of URLs when actually we've only got one page or one category as we think about it — the laptops page.

Similarly, those could then be reordered to create other URLs that do the exact same thing but have to be separately crawled. Similarly they might be sorted differently. There might be pagination and so on and so forth. So you could have one category page generating a vast number of URLs. 
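As a toy illustration of how quickly this multiplies (the filters and values here are hypothetical):

```python
# One "laptops" category page with three filters and three sort orders
# already yields 27 distinct, crawlable URLs.
from itertools import product

screens = ["13-inch", "15-inch", "17-inch"]
ram = ["8gb", "16gb", "32gb"]
sorts = ["price-asc", "price-desc", "newest"]

urls = [f"/laptops?screen={s}&ram={r}&sort={o}"
        for s, r, o in product(screens, ram, sorts)]
print(len(urls))  # 27
print(urls[0])    # /laptops?screen=13-inch&ram=8gb&sort=price-asc
```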

Search results pages

Another thing that often comes about is search results pages from an internal site search: especially if they're paginated, they can generate a lot of different URLs.

Listings pages

Listings pages. If you allow users to upload their own listings or content, then that can over time build up to be an enormous number of URLs. Think about a job board or something like eBay, which probably has a huge number of pages.

Fixing crawl budget issues

Chart of crawl budget issue solutions and whether they allow crawling, indexing, and PageRank.

So what are some of the tools that you can use to address these issues and to get the most out of your crawl budget?

So as a baseline, if we think about how a normal URL behaves with Googlebot, we say, yes, it can be crawled, yes, it can be indexed, and yes, it passes PageRank. So a URL like these, if I link to these somewhere on my site and then Google follows that link and indexes these pages, these probably still have the top nav and the site-wide navigation on them. So the link actually that's passed through to these pages will be sort of recycled round. There will be some losses due to dilution when we're linking through so many different pages and so many different filters. But ultimately, we are recycling this. There's no sort of black hole loss of leaky PageRank. 

Robots.txt

Now at the opposite extreme, the most extreme sort of solution to crawl budget you can employ is the robots.txt file.

So if you block a page in robots.txt, then it can't be crawled. So great, problem solved. Well, no, because there are some compromises here. Technically, sites and pages blocked in robots.txt can be indexed. You sometimes see sites or pages showing up in the SERPs with a message saying that no description is available because the page is blocked in robots.txt.

So technically, they can be indexed, but functionally they're not going to rank for anything or at least anything effective. So yeah, well, sort of technically. They do not pass PageRank. We're still passing PageRank through when we link into a page like this. But if it's then blocked in robots.txt, the PageRank goes no further.

So we've sort of created a leak and a black hole. So this is quite a heavy-handed solution, although it is easy to implement. 

Link-level nofollow

Link-level nofollow, so by this I mean if we took our links on the main laptops category page, that were pointing to these facets, and we put a nofollow attribute internally on those links, that would have some advantages and disadvantages.

I think a better use case for this would actually be more in the listings case. So imagine if we run a used car website, where we have millions of different used car individual sort of product listings. Now we don't really want Google to be wasting its time on these individual listings, depending on the scale of our site perhaps.

But occasionally a celebrity might upload their car or something like that, or a very rare car might be uploaded and that will start to get media links. So we don't want to block that page in robots.txt because that's external links that we would be squandering in that case. So what we might do is on our internal links to that page we might internally nofollow the link. So that would mean that it can be crawled, but only if it's found, only if Google finds it in some other way, so through an external link or something like that.

So we sort of have a halfway house here. Now technically nofollow these days is a hint. In my experience, Google will not crawl pages that are only linked to through an internal nofollow. If it finds the page in some other way, obviously it will still crawl it. But generally speaking, this can be effective as a way of restricting crawl budget or I should say more efficiently using crawl budget. The page can still be indexed.

That's what we were trying to achieve in that example. It can still pass PageRank. That's the other thing we were trying to achieve. Although you are still losing some PageRank through this nofollow link. That still counts as a link, and so you're losing some PageRank that would otherwise have been piped into that follow link. 

Noindex, nofollow

Noindex and nofollow, so this is obviously a very common solution for pages like these on ecomm sites.

Now, in this case, the page can be crawled. But once Google gets to that page, it will discover it's noindex, and it will crawl it much less over time because there is sort of less point in crawling a noindex page. So again, we have sort of a halfway house here.

Obviously, it can't be indexed. It's noindex. It doesn't pass PageRank outwards. PageRank is still passed into this page, but because it's got a nofollow in the head section, it doesn't pass PageRank outwards. This isn't a great solution. We've got some compromises that we've had to achieve here to economize on crawl budget.

Noindex, follow

So a lot of people used to think, oh, well, the solution to that would be to use a noindex follow as a sort of best of both. So you put a noindex follow tag in the head section of one of these pages, and oh, yeah, everyone is a winner because we still get the same sort of crawling benefit. We're still not indexing this sort of new duplicate page, which we don't want to index, but the PageRank solution is fixed.

Well, a few years ago, Google came out and said, "Oh, we didn't realize this ourselves, but actually as we crawl this page less and less over time, we will stop seeing the link and then it kind of won't count." So they sort of implied that this no longer worked as a way of still passing PageRank, and eventually it would come to be treated as noindex and nofollow. So again, we have a sort of slightly compromised solution there. 

Canonical

Now the true best of all worlds might then be canonical. With the canonical tag, it's still going to get crawled a bit less over time, the canonicalized version, great. It's still not going to be indexed, the canonicalized version, great, and it still passes PageRank.

So that seems great. That seems perfect in a lot of cases. But this only works if the pages are near enough duplicates that Google is willing to consider them a duplicate and respect the canonical. If they're not willing to consider them a duplicate, then you might have to go back to using the noindex. Or if you think actually there's no reason for this URL to even exist, I don't know how this wrong order combination came about, but it seems pretty pointless.

301

I'm not going to link to it anymore. But in case some people still find the URL somehow, we could use a 301 as a sort of economy that is going to perform pretty well eventually for... I'd say even better than canonical and noindex for saving crawl budget because Google doesn't even have to look at the page on the rare occasion it does check it because it just follows the 301.

It's going to solve our indexing issue, and it's going to pass PageRank. But obviously, the tradeoff here is users also can't access this URL, so we have to be okay with that. 
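At the protocol level, a 301 is just a status line plus a Location header, which is why it's so cheap for Googlebot: there's essentially no page to fetch and parse, and crawlers and users alike are sent straight to the target (a hypothetical example):

    HTTP/1.1 301 Moved Permanently
    Location: https://example.com/dresses/red-dress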

Implementing crawl budget tactics

So sort of rounding all this up, how would we actually employ these tactics? So what are the activities that I would recommend if you want to have a crawl budget project?

One of the less intuitive ones is speed. Like I said earlier, Google is sort of allocating an amount of time or amount of resource to crawl a given site. So if your site is very fast, if you have low server response times, if you have lightweight HTML, they will simply get through more pages in the same amount of time.

So, counterintuitively, this is a great way to approach the problem. Log analysis is the more traditional approach. Often it's quite unintuitive which pages on your site, or which parameters, are actually sapping all of your crawl budget, and log analysis on large sites often yields surprising results, so that's something you might consider. Then there's actually deploying some of these fixes.
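To illustrate, most of that log analysis boils down to filtering your access logs for Googlebot and aggregating by URL or parameter pattern, to see how much of its activity goes to pages you actually care about. A single (hypothetical) line in standard combined log format looks like this; genuine Googlebot requests carry this user agent string and can be verified with a reverse DNS lookup to googlebot.com:

    66.249.66.1 - - [05/Oct/2021:09:14:02 +0000] "GET /dresses?colour=red&size=m HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"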

So, redundant URLs that we don't think users even need to look at, we can 301. Variants that users do need to look at, we could handle with a canonical or a noindex tag. But we also might want to avoid linking to them in the first place, so that we're not losing some degree of PageRank into those canonicalized or noindexed variants, whether through dilution or through a dead end.

Robots.txt and nofollow, as I sort of implied as I was going through it, are tactics that you would want to use very sparingly, because they create these PageRank dead ends. Then, lastly, a more recent and more interesting tip, which I got a while back from an Ollie H.G. Mason blog post that I'll probably link to below: it turns out that if you have a sitemap on your site that you use only for fresh or recently changed URLs, then, because Googlebot has such a thirst for fresh content, like I said, they will start crawling that sitemap very often. So you can use this tactic to direct crawl budget towards the new URLs, and everyone wins. A sketch of what that sitemap might look like is below.
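A minimal sketch of such a fresh-URLs sitemap (with hypothetical URLs); the point is simply that everything listed in this one file is recent, so Googlebot learns the file is always worth re-fetching:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/cars/listing-98231</loc>
        <lastmod>2021-10-04</lastmod>
      </url>
      <url>
        <loc>https://example.com/cars/listing-98232</loc>
        <lastmod>2021-10-05</lastmod>
      </url>
    </urlset>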

Googlebot only wants to see the fresh URLs. You perhaps only want Googlebot to see the fresh URLs. So if you have a sitemap that only serves that purpose, then everyone wins, and that can be quite a nice and sort of easy tip to implement. So that's all. I hope you found that useful. If not, feel free to let me know your tips or challenges on Twitter. I'm curious to see how other people approach this topic.

Video transcription by Speechpad.com

Thursday, September 23, 2021

A Statement of Land Acknowledgement, Published Today With Gratitude

From today, if you visit the Contact page of Moz.com to look up our office locations, you will see that we have included the following Statement of Land Acknowledgement with the permission of the Tribes, Nations, and Bands in whose homelands our teams live and work:

We at Moz acknowledge that our offices in Seattle and Vancouver exist in the traditional, ancestral, current, and unceded lands of Tribes, Nations, and Bands including the dxʷdəwʔabš (Duwamish), suq̀wabš/dxʷəq̓ʷabš (Suquamish), bəqəlšuł (Muckleshoot), sdukʷalbixʷ (Snoqualmie), dxʷlilap (Tulalip), Sḵwx̱wú7mesh (Squamish), Səl̓ílwətaʔ/Selilwitulh (Tsleil-Waututh), xʷməθkʷəy̓əm (Musqueam), and Stz’uminus Peoples. We respect their sovereignty, their right to self-determination, and their sacred connection to the land and water. We offer our thanks to the Peoples, the land, and the water.

We are deeply grateful to the many members of the Tribes, Nations, and Bands for the time they generously gave over the past year in consulting with us on the accuracy of this statement and in granting permission to publish it. Thank you.

What is a Statement of Land Acknowledgement?

A Statement of Land Acknowledgement is an oral or written act of honoring the Indigenous Peoples in whose homelands something is taking place.

In the words of the Duwamish Tribe:

“It is important to note that this kind of acknowledgement is not a new practice developed by colonial institutions. Land acknowledgement is a traditional custom dating back centuries for many Native communities and nations. For non-Indigenous communities, land acknowledgement is a powerful way of showing respect and honoring the Indigenous Peoples of the land on which we work and live. Acknowledgement is a simple way of resisting the erasure of Indigenous histories and working towards honoring and inviting the truth.”

Why is Moz publishing this statement?

“There have always been Indigenous peoples in the spaces we call home, and there always will be. The acknowledgement process is about asking, What does it mean to live in a post-colonial world? What did it take for us to get here? And how can we be accountable to our part in history?” Kanyon Sayers-Roods (Mutsun Ohlone)

At Moz, our longtime TAGFEE code calls on our company to be transparent, and we consider it essential to speak openly about the factual history of the places we live and work. Colonization, genocide, broken treaties, theft of lands, federal failure to recognize legal status, erasure, and racism are all part of the past and present of the Pacific Northwest. We believe it’s the bare minimum requirement of all local people to speak about this candidly and with a determination to act from a place of truth.

At the same time, we hold in the highest possible regard the Tribes, Nations, and Bands who continuously set examples of caring for the human community and for the lands and waters they have protected since time immemorial. We are grateful for this vital leadership on human rights, Climate Change, ethics, sustainability, and so many other foundational matters. In accordance with the Universal Declaration of the Rights of Mother Earth, we at Moz are also thanking the beautiful lands and waters themselves.

Is your company also thinking of publishing a Statement of Land Acknowledgement?

We’d like to take this opportunity to invite all of our good peers and colleagues in the SEO and SaaS spaces to start team discussions about the importance of acknowledging and honoring the local Tribes, Nations, and Bands whose members are included on your staff or who are your nearest neighbors.

These were the main steps in our own journey:

  • We consulted the map at Native-Land.ca to form a first idea of the traditional homelands in which our offices are located. This map is a work in progress and is not a substitute for direct dialogue with Indigenous Peoples.

  • We visited the websites of each of the Tribes, Nations, and Bands we had seen on the maps, and read any statements they had published regarding Land Acknowledgement protocols. For example, this guide from the Duwamish Tribe was extremely helpful.

  • We searched for Indigenous-authored commentary on the process of Land Acknowledgement to help us become better-informed. Articles like this one taught us a great deal.

  • We looked at statements that had been published by other local businesses, organizations, and educational institutions. For example, the City of Vancouver made this motion, and a nearby YWCA had posted this page.

  • We spent some time learning the decolonized spellings of the names of each of the Tribes, Nations, and Bands, where available, and watched their videos on YouTube to hear these names pronounced, in hopes of making our address respectful.

  • We phoned or emailed the offices of each Tribe, Nation, or Band to inquire if, given our office locations, it would be appropriate to include them in an acknowledgement and if they had any specific requests as to how they would prefer inclusion. We were so appreciative of the kind responses we received, particularly given how difficult things have been during the pandemic.

It has been such an honor to spend time learning about this process, and we close with our grateful acknowledgement to each of the Tribes, Nations, and Bands for their permission, guidance, and presence.