Plumbing Contractor: September 2021

Wednesday, September 29, 2021

Reveal Your Rivals with True Competitor (Beta)

One of the biggest challenges in SEO is trying to convince your client or boss that the competition they face online may not match their legacy competitors and personal grudges. Big Earl across the street at Big Earl’s Widgets may be irritating and, sure, maybe he does have a “stupid, smug face,” but that doesn’t change the fact that WidgetShack.com is eating your lunch (and let’s not even talk about Amazon).

To make matters worse, competitive analysis is time-consuming and tedious work, even if you do have access to the data. Today, after years of rethinking how competitive analysis should work (and, honestly, re-rethinking it on many occasions), I’m proud to announce the first step in expanding Moz’s competitive analysis toolkit — True Competitor.

Try True Competitor

What is True Competitor?

Before I dive into the details, let’s take it out for a spin. Just enter your domain or subdomain and your locale (the beta supports English-language markets in the United States, Great Britain, Australia, and Canada):

Then let the tool do its work. You’ll get back something like this:

True Competitor pulls ranking keywords (by highest-volume) for any domain in our Keyword Explorer database — even your competitors’ and prospects’ domains — and analyzes recent Google SERPs to find out who you’re truly competing against.

What are Overlap and Rivalry?

Hopefully, you’re already familiar with our proprietary Domain Authority (DA) metric, but Overlap and Rivalry are new to True Competitor. Overlap is simple — it’s the percentage of shared keywords where the target site and the competitor both ranked in the top 10 traditional organic results. This is essentially a Share of Voice (SoV) metric. It’s a good first stop, and you can sort by DA or Overlap for multiple views of the data — but what if the keywords you overlap on aren’t particularly relevant, or a competitor is just too far out of reach?
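As a rough illustration of the Overlap idea, here is a minimal Python sketch. It assumes the denominator is the set of keywords where the target site ranks in the top 10, which the post does not spell out. The function name and the keyword sets are hypothetical and are not Moz's implementation or data.

```python
def overlap_percentage(target_top10, competitor_top10):
    """Share of the target's top-10 keywords where the competitor also ranks top 10.

    Assumption: the denominator is the target's own top-10 keyword set.
    """
    target = set(target_top10)
    if not target:
        return 0.0
    shared = target & set(competitor_top10)
    return 100.0 * len(shared) / len(target)


# Hypothetical keyword sets, for illustration only.
target = {"seo tools", "keyword research", "domain authority", "link building"}
competitor = {"seo tools", "keyword research", "backlink checker"}
print(overlap_percentage(target, competitor))  # 50.0
```

Rivalry is not sketched here, because the post lists its inputs (CTR, volume, the target's ranking, and DA proximity) but not how they are combined.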

That’s where Rivalry comes in. Rivalry factors in the Click-Thru Rate (CTR) and volume of overlapping keywords, the target site’s ranking (keywords where the target ranks higher are more likely to be relevant), and the proximity of the two sites’ DA scores to help you sort which competitors are the most relevant and realistic.

What can you do with this data?

Hopefully, you can use True Competitor to validate your own assumptions, challenge bad assumptions, and learn about competitors you might not have considered. That’s not all, though — select up to two competitors for in-depth information:

Just click on [ + Analyze Competitors ] and your selections will be auto-filled in our Keyword Overlap tool in Keyword Explorer. Here, you can dive deeper into your keyword overlap and find specific keywords to target with your SEO efforts:

We’re currently working on new ways to analyze this data and help you surface the most relevant keyword and content overlaps. We hope to have more to announce in Q4.

This list doesn’t match my list!

GOOD. Sorry, that’s a little flippant. Ultimately, we hope there’s something new and unexpected in this data. Otherwise, what’s the point? The goal of True Competitor is to help you see who you’re really up against in Google rankings. How you use that information is up to you.

I’d like to challenge you, dear reader, on one point. We have a bad habit of thinking of the “competition” as a single, small set of sites or companies. In the example above, I chose to explore SEMrush and Ahrefs, because they’re our most relevant product competitors. Consider if I had taken a different route:

Looking at our SEO news competitors paints a different but also very useful picture, especially for our content team and writers. We also have multiple Google subdomains showing up in our Top 25 — some Google products (like Google Search Console) are competitors, and some (like Google Analytics) are simply of interest to our readership and topics that we cover.

My challenge to you is to really think about these different spheres of competition and move beyond a singular window of what “competitor” means. You may not target all of these competitors or even care about them all on any given day, and that’s fine, but each window is an area that might uniquely inform your SEO and content strategies.

As a Subject Matter Expert at Moz, I have the privilege of working on multiple parts of our product, but this project is something I’ve been thinking about for a long time and is near and dear to me. I’d like to personally thank our Product team — Igor, Hayley, and Darian — for all of their hard work, leadership, and pushback to make this product better. Many thanks also to our App Front-end Engineering team, and a special shout-out to Maura and Grant for helping port the original prototype into an actual product.

Get started with True Competitor

True Competitor is currently available in beta for all Moz Pro customers and community accounts.

Try True Competitor

We welcome your feedback — please click on the [Make a Suggestion] button in the upper-right of the True Competitor home-page if you have any specific comments or concerns.

Monday, September 27, 2021

Google Local Filler Content Isn't Good UX, and Needs Revisions

Did you ever turn in a school paper full of vague ramblings, hoping your teacher wouldn’t notice that you’d failed to read the assigned book?

I admit, I once helped my little sister fulfill a required word count with analogies about “waves crashing against the rocks of adversity” when she, for some reason, overlooked reading The Communist Manifesto in high school. She got an A on her paper, but that isn’t the mark I’d give Google when there isn’t enough content to legitimately fill their local packs, Local Finders, and Maps.

The presence of irrelevant listings in response to important local queries:

Makes it unnecessarily difficult for searchers to find what they need
Makes it harder for relevant businesses to compete
Creates a false impression of bountiful local choice of resources, resulting in disappointing UX

Today, we’ll look at some original data in an attempt to quantify the extent of this problem, and explore what Google and local businesses can do about it.

What’s meant by “local filler” content and why is it such a problem?

The above screenshot captures the local pack results for a very specific search for a gastroenterologist in Angels Camp, California. In its effort to show me a pack, Google has scrambled together results that are two-thirds irrelevant to the full intent of my query, since I am not looking for either an eye care center or a pediatrician. The third result is better, even though Google had to travel about 15 miles from my specified search city to get it, because Dr. Eddi is, at least, a gastroenterologist.

It’s rather frustrating to see Google allowing the one accurate specialist to be outranked by two random local medical entities, perhaps simply because they are closer to home. It obviously won’t do to have an optometrist or children’s doctor consult with me on digestive health, and unfortunately, the situation becomes even odder when we click through to the local finder:

Of the twenty results Google has pulled together to make up the first page of the local finder, only two are actually gastroenterologists, lost in the weeds of podiatrists, orthopedic surgeons, general MDs, and a few clinics with no clarity as to whether their presence in the results relates to having a digestive health specialist on staff. Zero of the listed gastroenterologists are in the town I’ve specified. The relevance ratio is quite poor for the user and shapes a daunting environment for appropriate practitioners who need to be found in all this mess.

You may have read me writing before about local SEO seeking to build the online mirror of real-world communities. That’s the ideal: ensuring that towns and cities have an excellent digital reference guide to the local resources available to them. Yet when I fact-checked with the real world (calling medical practices around this particular town), I found that there actually are no gastroenterologists in Angels Camp, even though Google’s results might make it look like there must be. What I heard from locals is that you must either take a 25-minute drive to Sonora to see a GI doctor, or head west for an hour and fifteen minutes to Modesto for appropriate care.

Google has yoked itself to AI, but the present state of search leaves it up to my human intelligence to realize that the SERPs are making empty promises, and that there are, in fact, no GI docs in Angels Camp. This is what a neighbor, primary care doctor, or local business association would tell me if I was considering moving to this community and needed to be close to specialists. But Google tells me that there are more than 23 million organic choices relevant to my requirements, and scores of local business listings that so closely match my intent, they deserve pride of place in 3-packs, Finders and Maps.

The most material end result for the Google user is that they will likely experience unnecessary fatigue wasting time on the phone calling irrelevant doctors at a moment when they are in serious need of help from an appropriate professional. As a local SEO, I’m conditioned to look at local business categories and can weed out useless content almost automatically because of this, but is the average searcher noticing the truncated “eye care cent…” on the above listing? They’re almost certainly not using a Chrome extension like GMB Spy to see all the possible listing categories since Google decided to hide them years ago.

On a more philosophical note, my concern with local SERPs made up of irrelevant filler content is that they create a false picture of local bounty. As I recently mentioned to Marie Haynes:

The work of local businesses (and local SEOs!) derives its deepest meaning from providing and promoting essential local resources. Google’s inaccurate depiction of abundance could, even if in a small way, contribute to public apathy. The truth is that the US is facing a severe shortage of doctors, and anything that doesn’t reflect this reality could, potentially, undermine public action on issues like why our country, unlike the majority of nations, doesn’t make higher education free or affordable so that young people can become the medical professionals and other essential services providers we unquestionably need to be a functional society. Public well-being depends on complete accuracy in such matters.

As a local SEO, I want a truthful depiction of how well-resourced each community really is on the map, as a component of societal thought and decision-making. We’re all coping with public health and environmental emergencies now and know in our bones how vital essential local services have become.

Just how big is the problem of local filler content?

If the SERPs were more like humans, my query for “gastroenterologist Angels Camp” would return something like a featured snippet stating, “Sorry, our index indicates there are no GI Docs in Angels Camp. You’ll need to look in Sonora or Modesto for nearest options.” It definitely wouldn’t create the present scenario of, “Bad digestive system? See an eye doctor!” that’s being implied by the current results. I wanted to learn just how big this problem has become for Google.

I looked at the local packs in 25 towns and cities across California of widely varying populations using the search phrase “gastroenterologist” and each of the localities. I noted how many of the results returned were within the city specified in my search and how many used “gastroenterologist” as their primary category. I even gave Google an advantage in this test by allowing entries that didn’t use gastroenterologist as their primary category but that did have some version of that word in their business title (making the specialty clearer to the user) to be included in Google’s wins column. Of the 150 total data points I checked, here is what I found:

42% of the content Google presented in local packs had no obvious connection to gastroenterology. It’s a shocking number, honestly. Imagine the number of wearying, irrelevant calls patients may be making seeking digestive health consultation if nearly half of the practices listed are not in this field of medicine.
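A tally like this can be kept in a small script. The sketch below assumes one record per local pack listing, with one flag for the city check and one for the specialty check. The rows are placeholders only and are not the survey's data; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PackResult:
    locality: str   # city used in the query
    in_city: bool   # listing is located in the queried city
    on_topic: bool  # primary category or business title matches the specialty


def share(results, attribute):
    """Percentage of results for which the named boolean attribute is True."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if getattr(r, attribute))
    return 100.0 * hits / len(results)


# Placeholder rows only; these are NOT the survey's data.
sample = [
    PackResult("Town A", in_city=True, on_topic=False),
    PackResult("Town A", in_city=True, on_topic=False),
    PackResult("Town A", in_city=False, on_topic=True),
    PackResult("City B", in_city=True, on_topic=True),
]
print(share(sample, "on_topic"))  # 50.0
print(share(sample, "in_city"))   # 75.0
```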

A pattern I noticed in my small sample set is that larger cities had the most relevant results. Smaller towns and rural areas had much poorer relevance ratios. Meanwhile, Google is more accurate as to returning results within the query’s city, as shown by these numbers:

The trouble is, what looks like more of a win for Google here doesn’t actually chalk up as a win for searchers. In my data set, where Google was accurate in showing results from my specified city, the entities were often simply not GI doctors. There were instances in which all 3 results got the city right, but zero of the results got the specialty right. In fact, in one very bizarre case, Google showed me this:

Welders aside, it’s important to remember that our initial Angels Camp example demonstrated how the searcher, encountering a pack with filler listings in it and drilling down further into the Local Finder results for help, may actually end up with even less relevance. Instead of two-out-of-three local pack entries being useless to them, they may end up with only two-out-of-twenty helpful listings, with relevance consigned to obscurity.

And, of course, filler listings aren’t confined to medical categories. I engaged in this little survey because I’d noticed how often, in category after category, the user experience is less-than-ideal.

What should Google do to lessen the poor UX of irrelevant listings?

Remember that we’re not talking about spam here. That’s a completely different headache in Googleland. I saw no instances of spam in my data. The welder was not trying to pass himself off as a doctor. Rather, what we have here appears to be a case of Google weighting location keywords over goods/services keywords, even when it makes no sense to do so.

Google needs to develop logic that excludes extremely irrelevant listings for specific head terms to improve UX. How might this logic work?

1. Google could rely more on their own categories. Going back to our original example in which an eye care center is the #1 ranked result for “gastroenterologist angels camp”, we can use GMB Spy to check if any of the categories chosen by the business is “gastroenterologist”:

Google can, of course, see all the categories, and this lack of “gastroenterologist” among them should be a big “no” vote on showing the listing for our query.

2. Google could cross check the categories with the oft-disregarded business description:

Again, no mention of gastroenterological services there. Another “no” vote.

3. Google could run sentiment analysis on the reviews for an entity, checking to see if they contain the search phrase:

Lots of mentions of eye care here, but the body of reviews contains zero mentions of intestinal health. Another “no” vote.

4. Google could cross check the specified search phrases against all the knowledge they have from their crawls of the entity’s website:

This activity should confirm that there is no on-site reference to Dr. Haymond being anything other than an ophthalmologist. Then Google would need to make a calculation to downgrade the significance of the location (Angels Camp) based on internal logic that specifies that a user looking for a gastroenterologist in a city would prefer to see gastroenterologists a bit farther away than seeing eye doctors (or welders) nearby. So, this would be another “no” vote for inclusion as a result for our query.

5. Finally, Google could cross reference this crawl of the website against their wider crawl of the web:

This should act as a good, final confirmation that Dr. Haymond is an eye doctor rather than a gastroenterologist, even if he is in our desired city, and give us a fifth “no” vote for bringing his listing up in response to our query.
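The five checks above can be sketched as a simple vote count. This is only an illustration of the idea, not a description of how Google works: the `Listing` fields, the stem-matching approach, and the threshold of five "no" votes are all assumptions, and the two listings are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    name: str
    categories: list = field(default_factory=list)
    description: str = ""
    reviews: list = field(default_factory=list)
    website_text: str = ""
    web_mentions: list = field(default_factory=list)


def no_votes(listing, term):
    """Count how many of the five sources never mention the search term."""
    term = term.lower()
    sources = [
        " ".join(listing.categories),
        listing.description,
        " ".join(listing.reviews),
        listing.website_text,
        " ".join(listing.web_mentions),
    ]
    return sum(1 for text in sources if term not in text.lower())


def is_filler(listing, term, threshold=5):
    """Treat a listing as filler when at least `threshold` sources vote no."""
    return no_votes(listing, term) >= threshold


# Invented listings, for illustration only.
eye = Listing(
    "Example Eye Care",
    categories=["Eye care center", "Optometrist"],
    description="Comprehensive eye exams and eyewear.",
    reviews=["Great eye exam", "Friendly optometrist"],
    website_text="Ophthalmology and optometry services.",
    web_mentions=["Directory profile: local eye doctor"],
)
gi = Listing(
    "Example GI Clinic",
    categories=["Gastroenterologist"],
    description="Gastroenterology and endoscopy.",
)

# A word stem catches both "gastroenterologist" and "gastroenterology".
print(is_filler(eye, "gastroenterolog"))  # True
print(is_filler(gi, "gastroenterolog"))   # False
```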

The web is vast, and so is Google’s job, but I believe the key to resolving this particular type of filler content is for Google to rely more on the knowledge they have of an entity’s vertical and less on their knowledge of its location. A diner may be willing to swap out tacos for pizza if there’s a Mexican restaurant a block away but no pizzerias in town, but in these YMYL categories, the same logic should not apply.

It’s not uncommon for Google to exclude local results from appearing at all when their existing logic tells them there isn’t a good answer. It’s tempting to say that solving the filler content problem depends on Google expanding the number of results for which they don’t show local listings. But, I don’t think this is a good solution, because the user then commonly sees irrelevant organic entries, instead of local ones. It seems to me that a better path is for Google to expand the radius of local SERPs for a greater number of queries so that a search like ours receives a map of the nearest gastroenterologists, with closer, superfluous businesses filtered out.
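The radius idea can be sketched as a loop that widens the search area until it finds at least one on-topic listing, ignoring closer businesses in other categories. The radius steps, the tuple layout, and the sample listings are hypothetical.

```python
def nearest_relevant(listings, specialty, radii_miles=(5, 15, 30, 60)):
    """Widen the search radius until at least one on-topic listing is found.

    Each listing is a (name, category, distance_miles) tuple.
    Returns (radius_used, matches); matches is empty if nothing is in range.
    """
    for radius in radii_miles:
        matches = [
            item for item in listings
            if item[1] == specialty and item[2] <= radius
        ]
        if matches:
            return radius, sorted(matches, key=lambda item: item[2])
    return radii_miles[-1], []


# Invented listings, for illustration only.
listings = [
    ("Eye Care Center", "eye care center", 0.5),
    ("Pediatric Clinic", "pediatrician", 1.0),
    ("GI Practice", "gastroenterologist", 15.0),
    ("GI Group", "gastroenterologist", 50.0),
]
radius, matches = nearest_relevant(listings, "gastroenterologist")
print(radius)                   # 15
print([m[0] for m in matches])  # ['GI Practice']
```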

What should you do if a local business you’re promoting is getting lost amid filler listings?

SEO is going to be the short answer to this problem. It’s true that you can click the “send feedback” link at the bottom of the local finder, Google Maps or an organic SERP, and fill out a form like this, with a screenshot:

However, my lone report of dissatisfaction with SERP quality is unlikely to get Google to change the results. Perhaps if they received multiple reports…

More practically speaking, if a business you’re promoting is getting lost amid irrelevant listings, search engine optimization will be your strongest tool for convincing Google that you are, in fact, the better answer. In our study, we realized that there are, in fact, no GI docs in Angels Camp, and that the nearest one is about fifteen miles away. If you were in charge of marketing this particular specialist, you could consider:

1. Gaining a foothold in nearby towns and cities

Recommend that the doctor develop real-world relationships with neighboring towns from which he would like to receive more clients. Perhaps, for example, he has hospital privileges, or participates in clinics or seminars in these other locales.

2. Writing about locality relationships

Publish content on the website highlighting these relationships and activities to begin associating the client’s name with a wider radius of localities.

3. Expanding the linktation radius

Seek relevant links and unstructured citations from the neighboring cities and towns, on the basis of these relationships and participation in a variety of community activities.

4. Customizing review requests based on customers’ addresses

If you know your customers well, consider wording review requests to prompt them to mention why it’s worth it to them to travel from X location for goods/services (nota bene: medical professionals, of course, need to be highly conversant with HIPAA compliance when it comes to online reputation management).

5. Filling out your listings to the max

Definitely do give Google and other local business listing platforms the maximum amount of information about the business you’re marketing (Moz Local can help!). Fill out all the fields and give a try to functions like Google Posts, product listings, and Q&A.

6. Sowing your seeds beyond the walled garden

Pursue an active social media, video, industry, local news, print, radio, and television presence to the extent that your time and budget allow. Google’s walled garden, as defined by my friend, Dr. Pete, is not the only place to build your brand. And, if my other pal, Cyrus Shepard, is right, anti-trust litigation could even bring us to a day when Google’s own ramparts become less impermeable. In the meantime, work at being found beyond Google while you continue to grapple with visibility within their environment.

Study habits

It’s one thing for a student to fudge a book report, but squeaking by can become a negative lifelong habit if it isn’t caught early. I’m sure any Google staffer taking the time to actually read through the local packs in my survey would agree that they don’t rate an A+.

I’ve been in local SEO long enough to remember when Google first created their local index with filler content pulled together from other sources, without business owners having any idea they were even being represented online, and these early study habits seem to have stuck with the company when it comes to internal decision making that ends up having huge real-world impacts. The recent title tag tweak that is rewriting erroneous titles for vaccine landing pages is a concerning example of this lack of foresight and meticulousness.

If I could create a syllabus for Google’s local department, it would begin with separating out categories of the greatest significance to human health and safety and putting them through a rigorous, permanent manual review process to ensure that results are as accurate as possible, and as free from spam, scams, and useless filler content as the reviewers can make them. Google has basically got all of the money and talent in the world to put towards quality, and ethics would suggest they are obliged to make the investment.

Society deserves accurate search results delivered by studious providers, and rural and urban areas are worthy of equal quality commitments and a more nuanced approach than one-size-fits-all. Too often, in Local, Google is flunking for want of respecting real-world realities. Let’s hope they start applying themselves to the fullest of their potential.

Friday, September 24, 2021

Crawl Budget

In today’s episode of Whiteboard Friday, Tom covers a more advanced SEO concept: crawl budget. Google has a finite amount of time it's willing to spend crawling your site, so if you’re having issues with indexation, this is a topic you should care about.

Video Transcription

Happy Friday, Moz fans, and today's topic is crawl budget. I think it's worth saying right off the bat that this is somewhat of a more advanced topic or one that applies primarily to larger websites. I think even if that's not you, there is still a lot you can learn from this in terms of SEO theory that comes about when you're looking at some of the tactics you might employ or some of the diagnostics you might employ for a crawl budget.

But in Google's own documentation they suggest that you should care about crawl budget if you have more than a million pages or more than 10,000 pages that are updated on a daily basis. I think those are obviously kind of hard or arbitrary thresholds. I would say that if you have issues with your site getting indexed and you have pages deep on your site that are just not getting into the index that you want to, or if you have issues with pages not getting indexed quickly enough, then in either of those cases crawl budget is an issue that you should care about.

What is crawl budget?

Drawing of a spider holding a dollar bill.

So what actually is crawl budget? Crawl budget refers to the amount of time that Google is willing to spend crawling a given site. Although it seems like Google is sort of all-powerful, they have finite resources and the web is vast. So they have to prioritize somehow and allocate a certain amount of time or resource to crawl a given website.

Now they prioritize based on — or so they say they prioritize based on the popularity of sites with their users and based on the freshness of content, because Googlebot sort of has a thirst for new, never-before-seen URLs.

We're not really going to talk in this video about how to increase your crawl budget. We're going to focus on how to make the best use of the crawl budget you have, which is generally an easier lever to pull in any case.

Causes of crawl budget issues

So how do issues with crawl budget actually come about?

Facets

Now I think the main sort of issues on sites that can lead to crawl budget problems are firstly facets.

So you can imagine on an e-comm site, imagine we've got a laptops page. We might be able to filter that by size. You have a 15-inch screen and 16 gigabytes of RAM. There might be a lot of different permutations there that could lead to a very large number of URLs when actually we've only got one page or one category as we think about it — the laptops page.

Similarly, those could then be reordered to create other URLs that do the exact same thing but have to be separately crawled. Similarly they might be sorted differently. There might be pagination and so on and so forth. So you could have one category page generating a vast number of URLs.
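To get a feel for how quickly one category page multiplies, here is a small Python sketch that enumerates URLs for a made-up laptops page. The facet names, values, and page count are assumptions, and for simplicity every URL includes every facet.

```python
from itertools import permutations, product
from urllib.parse import urlencode

# Hypothetical facets for a single laptops category page.
facets = {
    "screen": ["13", "15", "17"],
    "ram": ["8gb", "16gb", "32gb"],
    "sort": ["price", "rating"],
}
pages = range(1, 6)  # five pages of pagination


def facet_urls(base, facets, pages):
    """Yield a URL for each value combination, parameter order, and page."""
    names = list(facets)
    for values in product(*(facets[n] for n in names)):
        pairs = list(zip(names, values))
        # The same filters in a different order give a different URL.
        for ordering in permutations(pairs):
            for page in pages:
                yield base + "?" + urlencode(list(ordering) + [("page", page)])


urls = list(facet_urls("/laptops", facets, pages))
print(len(urls))  # 540
```

That is 18 value combinations, times 6 parameter orderings, times 5 pages, all for what a person would think of as one page.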

Search results pages

A few other things that often come about: search results pages from an internal site search can often, especially if they're paginated, have a lot of different URLs generated.

Listings pages

Listings pages. If you allow users to upload their own listings or content, then that can over time build up to be an enormous number of URLs. Think about a job board or something like eBay, which probably has a huge number of pages.

Fixing crawl budget issues

Chart of crawl budget issue solutions and whether they allow crawling, indexing, and PageRank.

So what are some of the tools that you can use to address these issues and to get the most out of your crawl budget?

So as a baseline, if we think about how a normal URL behaves with Googlebot, we say, yes, it can be crawled, yes, it can be indexed, and yes, it passes PageRank. So a URL like these, if I link to these somewhere on my site and then Google follows that link and indexes these pages, these probably still have the top nav and the site-wide navigation on them. So the link actually that's passed through to these pages will be sort of recycled round. There will be some losses due to dilution when we're linking through so many different pages and so many different filters. But ultimately, we are recycling this. There's no sort of black hole loss of leaky PageRank.

Robots.txt

Now at the opposite extreme, the most extreme sort of solution to crawl budget you can employ is the robots.txt file.

So if you block a page in robots.txt, then it can't be crawled. So great, problem solved. Well, no, because there are some compromises here. Technically, sites and pages blocked in robots.txt can be indexed. You sometimes see sites showing up or pages showing up in the SERPs with this meta description cannot be shown because the page is blocked in robots.txt or this kind of message.

So technically, they can be indexed, but functionally they're not going to rank for anything or at least anything effective. So yeah, well, sort of technically. They do not pass PageRank. We're still passing PageRank through when we link into a page like this. But if it's then blocked in robots.txt, the PageRank goes no further.

So we've sort of created a leak and a black hole. So this is quite a heavy-handed solution, although it is easy to implement.
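A quick way to sanity-check a set of robots.txt rules is Python's standard-library parser. The rules and URLs below are hypothetical. Note that this parser does plain prefix matching, so it may not agree with Googlebot on rules that use wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block internal search results and one facet path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /laptops/filter/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(url, agent="Googlebot"):
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(crawlable("https://example.com/laptops"))                 # True
print(crawlable("https://example.com/search?q=laptop"))         # False
print(crawlable("https://example.com/laptops/filter/15-inch"))  # False
```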

Link-level nofollow

Link-level nofollow, so by this I mean if we took our links on the main laptops category page, that were pointing to these facets, and we put a nofollow attribute internally on those links, that would have some advantages and disadvantages.

I think a better use case for this would actually be more in the listings case. So imagine if we run a used car website, where we have millions of different used car individual sort of product listings. Now we don't really want Google to be wasting its time on these individual listings, depending on the scale of our site perhaps.

But occasionally a celebrity might upload their car or something like that, or a very rare car might be uploaded and that will start to get media links. So we don't want to block that page in robots.txt because that's external links that we would be squandering in that case. So what we might do is on our internal links to that page we might internally nofollow the link. So that would mean that it can be crawled, but only if it's found, only if Google finds it in some other way, so through an external link or something like that.

So we sort of have a halfway house here. Now technically nofollow these days is a hint. In my experience, Google will not crawl pages that are only linked to through an internal nofollow. If it finds the page in some other way, obviously it will still crawl it. But generally speaking, this can be effective as a way of restricting crawl budget or I should say more efficiently using crawl budget. The page can still be indexed.

That's what we were trying to achieve in that example. It can still pass PageRank. That's the other thing we were trying to achieve. Although you are still losing some PageRank through this nofollow link. That still counts as a link, and so you're losing some PageRank that would otherwise have been piped into that follow link.
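To check which internal links on a page actually carry a nofollow, a small audit script can help. This sketch uses Python's built-in HTML parser, and the markup is a made-up used car category page.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect hrefs, split by whether the anchor carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        bucket = self.nofollowed if "nofollow" in rel_tokens else self.followed
        bucket.append(href)


# Hypothetical markup for a used car category page.
html = """
<a href="/cars/">All cars</a>
<a href="/cars/listing-123" rel="nofollow">Listing 123</a>
<a href="/cars/listing-456" rel="nofollow noopener">Listing 456</a>
"""
audit = LinkAudit()
audit.feed(html)
print(audit.followed)    # ['/cars/']
print(audit.nofollowed)  # ['/cars/listing-123', '/cars/listing-456']
```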

Noindex, nofollow

Noindex and nofollow, so this is obviously a very common solution for pages like these on ecomm sites.

Now, in this case, the page can be crawled. But once Google gets to that page, it will discover it's noindex, and it will crawl it much less over time because there is sort of less point in crawling a noindex page. So again, we have sort of a halfway house here.

Obviously, it can't be indexed. It's noindex. It doesn't pass PageRank outwards. PageRank is still passed into this page, but because it's got a nofollow in the head section, it doesn't pass PageRank outwards. This isn't a great solution. We've got some compromises that we've had to achieve here to economize on crawl budget.

Noindex, follow

So a lot of people used to think, oh, well, the solution to that would be to use a noindex follow as a sort of best of both. So you put a noindex follow tag in the head section of one of these pages, and oh, yeah, everyone is a winner because we still get the same sort of crawling benefit. We're still not indexing this sort of new duplicate page, which we don't want to index, but the PageRank solution is fixed.

Well, a few years ago, Google came out and said, "Oh, we didn't realize this ourselves, but actually as we crawl this page less and less over time, we will stop seeing the link and then it kind of won't count." So they sort of implied that this no longer worked as a way of still passing PageRank, and eventually it would come to be treated as noindex and nofollow. So again, we have a sort of slightly compromised solution there.

Canonical

Now the true best of all worlds might then be canonical. With the canonical tag, it's still going to get crawled a bit less over time, the canonicalized version, great. It's still not going to be indexed, the canonicalized version, great, and it still passes PageRank.

So that seems great. That seems perfect in a lot of cases. But this only works if the pages are near enough duplicates that Google is willing to consider them a duplicate and respect the canonical. If they're not willing to consider them a duplicate, then you might have to go back to using the noindex. Or if you think actually there's no reason for this URL to even exist, I don't know how this wrong order combination came about, but it seems pretty pointless.
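The head-level options discussed so far (noindex, nofollow, and canonical) can all be read from a page's markup. The sketch below pulls them out with Python's built-in HTML parser. The class name, the function name, and the sample page are hypothetical, and the script only reports what the tags say, not how Google will treat them.

```python
from html.parser import HTMLParser


class HeadDirectives(HTMLParser):
    """Collect meta robots tokens and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.robots.update(
                token.strip().lower() for token in content.split(",") if token.strip()
            )
        elif tag == "link" and "canonical" in (attributes.get("rel") or "").lower().split():
            self.canonical = attributes.get("href")


def directives(html):
    """Summarize what the page's head tags ask crawlers to do."""
    parser = HeadDirectives()
    parser.feed(html)
    return {
        "indexable": "noindex" not in parser.robots,
        "follow": "nofollow" not in parser.robots,
        "canonical": parser.canonical,
    }


# Hypothetical facet page that is noindex, follow and points at the category.
page = (
    '<head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/laptops"></head>'
)
print(directives(page))
# {'indexable': False, 'follow': True, 'canonical': 'https://example.com/laptops'}
```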

301

I'm not going to link to it anymore. But in case some people still find the URL somehow, we could use a 301 as a sort of economy that is going to perform pretty well eventually for... I'd say even better than canonical and noindex for saving crawl budget because Google doesn't even have to look at the page on the rare occasion it does check it because it just follows the 301.

It's going to solve our indexing issue, and it's going to pass PageRank. But obviously, the tradeoff here is users also can't access this URL, so we have to be okay with that.
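For the wrong-order facet URLs mentioned earlier, one way to decide when to 301 is to rebuild the URL with its parameters in a preferred order and redirect whenever the result differs. This is a sketch under that assumption; `FACET_ORDER` and the URLs are hypothetical, and the actual redirect would be issued by your server or framework.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

FACET_ORDER = ["screen", "ram", "sort", "page"]  # hypothetical preferred order


def redirect_target(url):
    """Return the preferred URL if `url` should be 301-redirected, else None."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    rank = {name: i for i, name in enumerate(FACET_ORDER)}
    # Known facets first, in the preferred order; unknown parameters after, by name.
    ordered = sorted(params, key=lambda kv: (rank.get(kv[0], len(rank)), kv[0]))
    preferred = urlunsplit(parts._replace(query=urlencode(ordered)))
    return preferred if preferred != url else None


print(redirect_target("https://example.com/laptops?ram=16gb&screen=15"))
# https://example.com/laptops?screen=15&ram=16gb
print(redirect_target("https://example.com/laptops?screen=15&ram=16gb"))
# None
```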

Implementing crawl budget tactics

So sort of rounding all this up, how would we actually employ these tactics? So what are the activities that I would recommend if you want to have a crawl budget project?

One of the less intuitive ones is speed. Like I said earlier, Google is sort of allocating an amount of time or amount of resource to crawl a given site. So if your site is very fast, if you have low server response times, if you have lightweight HTML, they will simply get through more pages in the same amount of time.

So this counterintuitively is a great way to approach this. Log analysis, this is sort of more traditional. Often it's quite unintuitive which pages on your site or which parameters are actually sapping all of your crawl budget. Log analysis on large sites often yields surprising results, so that's something you might consider. Then actually employing some of these tools.

So redundant URLs that we don't think users even need to look at, we can 301. Variants that users do need to look at, we could look at a canonical or a noindex tag. But we also might want to avoid linking to them in the first place so that we're not sort of losing some degree of PageRank into those canonicalized or noindex variants through dilution or through a dead end.
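For the log analysis step mentioned above, a minimal sketch might look like the following. It assumes access logs in the common "combined" format and simply counts requests whose user agent contains "Googlebot", grouped by path and by query parameter name. The function name and the sample log lines are made up for illustration. User-agent strings can be spoofed, so a real project would also want to verify that the requests really come from Google.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# Matches the request, status, size, referrer, and user-agent parts of a
# combined-format access log line.
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<target>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count Googlebot requests per path and per query parameter name."""
    paths = Counter()
    params = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        parts = urlsplit(match.group("target"))
        paths[parts.path] += 1
        for name, _value in parse_qsl(parts.query, keep_blank_values=True):
            params[name] += 1
    return paths, params


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    sample = [
        f'66.249.66.1 - - [29/Sep/2021:10:00:00 +0000] "GET /shop?color=red&sort=price HTTP/1.1" 200 5120 "-" "{bot}"',
        f'66.249.66.1 - - [29/Sep/2021:10:00:05 +0000] "GET /shop?color=blue HTTP/1.1" 200 5090 "-" "{bot}"',
        '203.0.113.9 - - [29/Sep/2021:10:00:07 +0000] "GET /blog HTTP/1.1" 200 7000 "-" "Mozilla/5.0"',
    ]
    paths, params = googlebot_hits(sample)
    print(paths.most_common(5))
    print(params.most_common(5))
```

Sorting the parameter counts is often the quickest way to spot which filters or sort orders are absorbing crawl activity.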

Robots.txt and nofollow, as I sort of implied as I was going through it, these are tactics that you would want to use very sparingly because they do create these PageRank dead ends. Then lastly, a sort of recent or more interesting tip that I got a while back from an Ollie H.G. Mason blog post, which I'll probably link to below, it turns out that if you have a sitemap on your site that you only use for fresh or recent URLs, your recently changed URLs, then because Googlebot has such a thirst, like I said, for fresh content, they will start crawling this sitemap very often. So you can sort of use this tactic to direct crawl budget towards the new URLs, which sort of everyone wins.
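A sitemap limited to recently changed URLs could be generated along these lines. This is only a sketch: the seven-day window, the function name, and the example.com pages are assumptions, and the output follows the standard sitemaps.org XML format.

```python
from datetime import date, timedelta
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_recent_sitemap(pages, today, max_age_days=7):
    """Build sitemap XML containing only pages modified within the window.

    pages: iterable of (url, last_modified_date) tuples.
    """
    cutoff = today - timedelta(days=max_age_days)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Newest pages first, older pages dropped entirely.
    for url, modified in sorted(pages, key=lambda page: page[1], reverse=True):
        if modified < cutoff:
            continue
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = [
        ("https://example.com/new-post", date(2021, 9, 28)),
        ("https://example.com/old-post", date(2020, 1, 15)),
    ]
    print(build_recent_sitemap(pages, today=date(2021, 9, 29)))
```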

Googlebot only wants to see the fresh URLs. You perhaps only want Googlebot to see the fresh URLs. So if you have a sitemap that only serves that purpose, then everyone wins, and that can be quite a nice and sort of easy tip to implement. So that's all. I hope you found that useful. If not, feel free to let me know your tips or challenges on Twitter. I'm curious to see how other people approach this topic.

Video transcription by Speechpad.com.

Thursday, September 23, 2021

A Statement of Land Acknowledgement, Published Today With Gratitude

From today, if you visit the Contact page of Moz.com to look up our office locations, you will see that we have included the following Statement of Land Acknowledgement with the permission of the Tribes, Nations, and Bands in whose homelands our teams live and work:

We at Moz acknowledge that our offices in Seattle and Vancouver exist in the traditional, ancestral, current, and unceded lands of Tribes, Nations, and Bands including the dxʷdəwʔabš (Duwamish), suq̀wabš/dxʷəq̓ʷabš (Suquamish), bəqəlšuł (Muckleshoot), sdukʷalbixʷ (Snoqualmie), dxʷlilap (Tulalip), Sḵwx̱wú7mesh (Squamish), Səl̓ílwətaʔ/Selilwitulh (Tsleil-Waututh), xʷməθkʷəy̓əm (Musqueam), and Stz’uminus Peoples. We respect their sovereignty, their right to self-determination, and their sacred connection to the land and water. We offer our thanks to the Peoples, the land, and the water.

We are deeply grateful to the many members of the Tribes, Nations, and Bands for the time they generously gave over the past year in consulting with us on the accuracy of this statement and in granting permission to publish it. Thank you.

What is a Statement of Land Acknowledgement?

A Statement of Land Acknowledgement is an oral or written act of honoring the Indigenous Peoples in whose homelands something is taking place.

In the words of the Duwamish Tribe:

“It is important to note that this kind of acknowledgement is not a new practice developed by colonial institutions. Land acknowledgement is a traditional custom dating back centuries for many Native communities and nations. For non-Indigenous communities, land acknowledgement is a powerful way of showing respect and honoring the Indigenous Peoples of the land on which we work and live. Acknowledgement is a simple way of resisting the erasure of Indigenous histories and working towards honoring and inviting the truth.”

Why is Moz publishing this statement?

“There have always been Indigenous peoples in the spaces we call home, and there always will be. The acknowledgement process is about asking, What does it mean to live in a post-colonial world? What did it take for us to get here? And how can we be accountable to our part in history?” Kanyon Sayers-Roods (Mutsun Ohlone)

At Moz, our longtime TAGFEE code calls on our company to be transparent, and we consider it essential to speak openly about the factual history of the places we live and work. Colonization, genocide, broken treaties, theft of lands, federal failure to recognize legal status, erasure, and racism are all part of the past and present of the Pacific Northwest. We believe it’s the bare minimum requirement of all local people to speak about this candidly and with a determination to act from a place of truth.

At the same time, we hold in the highest possible regard the Tribes, Nations and Bands who continuously set examples of caring for the human community and for the lands and waters they have protected since time immemorial. We are grateful for this vital leadership on human rights, Climate Change, ethics, sustainability, and so many other foundational matters. In accordance with the Universal Declaration of the Rights of Mother Earth, we at Moz are also thanking the beautiful lands and waters, themselves.

Is your company also thinking of publishing a Statement of Land Acknowledgement?

We’d like to take this opportunity to invite all of our good peers and colleagues in the SEO and SaaS spaces to start team discussions about the importance of acknowledging and honoring the local Tribes, Nations, and Bands whose members are included on your staff or who are your nearest neighbors.

These were the main steps in our own journey:

We consulted the map at Native-Land.ca to form a first idea of the traditional homelands in which our offices are located. This map is a work in progress and is not a substitute for direct dialogue with Indigenous Peoples.
We visited the websites of each of the Tribes, Nations, and Bands we had seen on the maps, and read any statements they had published regarding Land Acknowledgement protocols. For example, this guide from the Duwamish Tribe was extremely helpful.
We searched for Indigenous-authored commentary on the process of Land Acknowledgement to help us become better-informed. Articles like this one taught us a great deal.
We looked at statements that had been published by other local businesses, organizations, and educational institutions. For example, the City of Vancouver made this motion, and a nearby YWCA had posted this page.
We spent some time learning the decolonized spellings of the names of each of the Tribes, Nations and Bands, where available, and watched their videos on YouTube to hear these names pronounced in hopes of making our address respectful.
We phoned or emailed the offices of each Tribe, Nation, or Band to inquire if, given our office locations, it would be appropriate to include them in an acknowledgement and if they had any specific requests as to how they would prefer inclusion. We were so appreciative of the kind responses we received, particularly given how difficult things have been during the pandemic.

It has been such an honor to spend time learning about this process, and we close with our grateful acknowledgement to each of the Tribes, Nations, and Bands for their permission, guidance, and presence.

Tuesday, September 21, 2021

The Ultimate Guide to Digital PR

Note: This article was written with the help of Tina Irizarry & RJ Wilson.

A fundamental truth of working in SEO is that link building has become more difficult. As more and more people have devised questionable methods of link building, Google has become much more strict about what constitutes a quality link.

Widget links? Try again.

Scholarship links? Considered “a tricky situation.”

Even the sacred guest post is now frowned upon.

So, how is a website supposed to build links?

While there are still many completely valid ways to build high-quality links, digital PR is beginning to stand out from the rest of the pack. In early 2021, John Mueller famously praised the efforts of digital PR, which helped further propel its status as a viable form of link building.

Fortunately, our team at Go Fish Digital has a long history of running digital PR campaigns for our clients. Since we’ve developed some great internal processes over time, we wanted to share those with you today.

What is digital PR?

Digital PR is a marketing strategy that combines traditional PR media relations with digital channels such as SEO, social media, and influencer marketing. Digital PR allows brands to develop relationships with influential media outlets in order to earn editorial coverage, thus improving their website backlinks, brand exposure, SEO, and more.

As the world continues to move away from traditional print journalism, brands will need to adapt to develop relationships with influential entities that have large online followings. This is where digital PR comes into play.

The goals of digital PR can undoubtedly differ depending on your brand. Some companies might want to partner with Instagram accounts with large followings containing their target demographics to drive sales. Other companies might like to partner with an influential entity to generate more brand awareness. However, one of the most common uses of digital PR is to build backlinks.

To obtain editorial backlinks, a lot needs to happen. You need to brainstorm different ideas that a journalist would be interested in covering. You then need to create content that's newsworthy and warrants the journalist's time and attention. Next, you need to research all of the potential journalists interested in covering your content and pitch them. Often, hundreds of pitches are required to place your content successfully.

While this seems like a lot, this is the reality of building backlinks in the present day. For this reason, we wanted to aggregate the various steps in this comprehensive guide to show you how to generate backlinks using digital PR tactics.

Types of digital PR campaigns

In our experience, campaigns that use data points generally work the best in terms of getting coverage. By being data-driven, we’re helping journalists create a story by uncovering fascinating, new data that their readers likely don’t know about.

In addition, taking a data-based approach is a repeatable model. While an organization could partake in some newsworthy event (charitable event, merger, company announcement), these stories are difficult to repeat month after month. However, there is a lot of untapped data out there from which to create stories. This approach ensures that you can consistently generate newsworthy campaigns without relying on external, third-party events.

Of course, data campaigns aren’t the only way to generate digital PR and there are many other completely valid methods. However, we find using data tends to be the most consistent and reliable.

Existing data campaign

In an existing data campaign, you’ll want to identify some type of data source you can use to find interesting insights as the premise of your content you’ll be pitching to journalists.

For example, here’s a campaign we created for “The Best US Cities for Baby Boomers”. We gathered data on median home prices, jobs per 100,000 people, and the percentage of population that’s part of the Baby Boomer generation. This campaign ended up getting coverage from The New York Times, Yahoo!, and Reader’s Digest:

There are many instances of public data sources out there that are at your disposal. Want to do a piece of content on the best city to start a career in finance? Use data from LinkedIn Salary to find average salaries within each city. Creating a piece of content around Harry Potter houses? Use Google Trends to find what the most popular house is in every state. Need data on labor cost trends in the US? The Bureau of Labor Statistics and Census.gov have gold mines with potential data to use.

Once you find your data source, you’ll want to collect the raw data and then begin to analyze it for interesting insights that you can use in your campaign.

Survey campaign

Creating surveys is a fantastic way to get data that you can use in your content if you don’t have data readily available. You can do this by using tools such as Google Surveys and getting around 2,500 - 3,500 responses, creating a unique dataset.

For instance, let’s say you’ve decided that you want to create a piece of content around “Cord Cutting”. However, you don’t have any specific data sources that you can find interesting insights from. You could create a survey and ask people about their cord cutting tendencies.

In this survey, you could ask questions such as “How likely are you to cancel your cable subscription in the next two years?” and “At what cable price point would you consider cutting the cord?”.

After the survey is complete, you can analyze the data to find interesting trends. For the question “How likely are you to cancel your cable subscription in the next two years?”, you could analyze the age ranges that are most likely to cut the cord. Below you can see an example of the types of insights you might find from a survey:

From here you can start to draw interesting insights. Using the dummy data, we can see that the groups surveyed in the 65+ age range were the most likely to cut the cord. This would definitely be an interesting (although pretty unlikely) data point that we could use in our content and then pitch to journalists. Of course, you could also look at other demographic data such as gender and location.
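To make the analysis step concrete, here is a minimal sketch of breaking one survey question down by age range. The response rows are invented purely for illustration, and the answer labels and function name are assumptions, not the format of any particular survey tool's export.

```python
from collections import defaultdict


def likely_share_by_age(responses, likely_answers=("Very likely", "Somewhat likely")):
    """Return the share of respondents in each age group who gave a 'likely' answer.

    responses: iterable of (age_group, answer) tuples.
    """
    totals = defaultdict(int)
    likely = defaultdict(int)
    for age_group, answer in responses:
        totals[age_group] += 1
        if answer in likely_answers:
            likely[age_group] += 1
    return {group: likely[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Made-up rows, standing in for a real survey export.
    responses = [
        ("18-24", "Very likely"),
        ("18-24", "Not likely"),
        ("25-34", "Somewhat likely"),
        ("25-34", "Very likely"),
        ("65+", "Not likely"),
        ("65+", "Not likely"),
    ]
    shares = likely_share_by_age(responses)
    for group, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
        print(f"{group}: {share:.0%}")
```

The same function works for other demographic cuts, such as gender or location, by swapping what goes in the first element of each tuple.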

When you’re choosing your questions to ask respondents, try to think ahead and ask questions that might yield interesting results. Try to avoid asking questions where the results will be too predictable, and thus not newsworthy. The goal of the survey should be to yield interesting points of data that you didn’t previously have access to.

Map campaign

As the title suggests, a map campaign is a type of digital PR campaign where you overlay your data insights on a map. For example, in this campaign for “The Most Googled Pie In Every State” from Prevention.com, you can see how they overlay a pie icon over every state in the United States.

The data for map campaigns can really be generated from either existing data or surveys. However, the reason we’re giving them their own special category is that they tend to perform really well. With map campaigns, the results are inherently localized to every state. For example, here’s data from a map campaign that we executed that resulted in backlinks from 112 referring domains (81 followed links):

We’ve found that there are many journalists that absolutely love to cover articles that apply to their specific area. This means that map campaigns give you data points where you can pivot the pitch angle for all 50 states (“Texas’s favorite type of pie is pecan”, “Tennessee's favorite type is chess pie”).

The digital PR process

1. Ideation phase

Now that you know the general types of campaigns, it’s time to start thinking about which one you’ll want to create. This starts with the ideation process.

Ideally, you’ll want to ideate anywhere from 3-5 different campaign options. By ideating multiple campaigns, you can then compare them against each other to determine which one will most likely generate the most coverage and backlinks for your brand. To give you more insights on how to ideate multiple topics, you can use the following rules:

Rule #1: Choose a topic tangentially related to your business

It’s important that the topic of your campaign is somehow related to your core business. For instance, it wouldn’t make sense for a gardening retailer to create a piece of content that talks about the most fashionable cities in America and pitch the story to Vogue. A digital PR campaign that covers the cities with the most urban gardeners and is pitched to Apartment Therapy will result in much more targeted coverage.

Please note how we used the phrase “tangentially related”. The topic you choose doesn’t have to exactly match your core business. In the gardening retailer example above, you might also consider other digital PR campaigns around other outdoor topics, as gardening generally ties into this concept. By not limiting yourself to only campaigns about your products, you’ll open up a large number of campaign possibilities and increase your chances of getting coverage.

Rule #2: Choose a newsworthy topic

Another key point is that in order for your digital PR campaign to be successful, you’ll need to ensure that whatever you create is going to be newsworthy in some way. If it’s not, journalists will have no incentive to cover it, as it won’t help their articles earn clicks and shares.

Choosing a topic that’s newsworthy can be difficult. However, there are strategies that you can employ to come up with a campaign that will stand out in the news cycle:

Identify your “dream publications”. What types of topics do they tend to cover? For instance, if you identify Cosmopolitan as a dream publication, regularly check in with the site and make a note of the topics and types of content they publish.
Choose a campaign that will be topical by the time you’re pitching to journalists. For instance, if you’ll be pitching in March, doing a campaign around March Madness would make sense.
Use tools like BuzzSumo and search for keywords related to your industry. Review what types of content tend to get a lot of social shares and interaction.
Browse Reddit to find relevant subreddits related to your industry and see what types of content get the most upvotes.
If you have a team, brainstorm about potential topic ideas.

Rule #3: Check to see if similar campaigns have been done recently

Before moving ahead with a campaign, you’ll want to make sure that it hasn’t been done too recently. Nothing is worse than going through all of the steps to create and pitch a campaign just to find that journalists have already covered it.

When you come up with your idea, quickly perform a search to see if other similar content like yours exists. If it exists, research how recently it was done. If it was just in the past year and you plan on pitching the same journalists, you might want to choose another idea. It’s unlikely that the journalist will want to cover it again.

Rule #4: Rank your campaigns

Once you’ve brainstormed multiple ideas for digital PR campaigns that you want to move forward with, it can also be helpful to rank them. Not all campaigns will be created equal, and some might be naturally stronger than others. We find it helpful to give each campaign a 1-5 rating across five different criteria:

Backlink potential: How likely is this to produce backlinks?
Outreach diversity: How many different publications would be interested in covering this?
Outreach angles: How likely is it that we can find multiple different angles to pitch this from?
Subject topicality: How relevant is this campaign in today’s news cycle?
Audience size: How large is the audience for our target publications?
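As a rough sketch, the five ratings can be summed to compare campaign ideas side by side. The equal weighting, the campaign names, and the scores below are assumptions for illustration only, not our actual rating sheet.

```python
CRITERIA = (
    "backlink_potential",
    "outreach_diversity",
    "outreach_angles",
    "subject_topicality",
    "audience_size",
)


def rank_campaigns(campaigns):
    """Return (name, total) pairs sorted from strongest to weakest.

    campaigns: dict mapping campaign name to a dict of 1-5 scores per criterion.
    """
    totals = {}
    for name, scores in campaigns.items():
        for criterion in CRITERIA:
            if not 1 <= scores[criterion] <= 5:
                raise ValueError(f"{name}: {criterion} must be between 1 and 5")
        # Equal weights are an assumption; adjust if some criteria matter more.
        totals[name] = sum(scores[criterion] for criterion in CRITERIA)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    ideas = {
        "Cities with the most urban gardeners": dict(zip(CRITERIA, (4, 3, 4, 3, 4))),
        "Most popular houseplant by state": dict(zip(CRITERIA, (5, 4, 5, 3, 4))),
    }
    for name, total in rank_campaigns(ideas):
        print(f"{total:>2}  {name}")
```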

You can see an example of how we rate each content campaign below:

2. Publish the campaign: design & blog phase

Now comes the exciting part! You get to transform your campaign from a bunch of raw data in a spreadsheet to beautifully designed graphics that live on a page of your site.

The campaign should be added as a blog post to your site and consist of two components:

Custom graphics that highlight your data insights
Copy that provides more detail about the research and findings

We generally recommend creating 4-6 unique graphics for your most interesting data points. As an example, we did a digital PR campaign for “How Much For A Case Of Beer By State?”. You can see the graphic that we created for it here:

This graphic clearly illustrates the data in a way that’s easy for users to understand, and gives journalists an asset they can very easily use in their own coverage of the article.

For this part, you’ll likely need to work with a graphic designer. When working with graphic designers, we find it’s best to be as specific as possible when it comes to what the graphic should look like. That way, they don’t have to do any of the analysis themselves and can more quickly create what you’re looking for. If you don’t have a graphic designer on staff, you might consider trying to find a reliable freelancer on sites such as Upwork.

Next, you’ll need to create copy. This copy should explain the research method, how the data was collected, and provide further explanations of each graphic included on the page. This doesn’t need to be a huge long-form blog post; an introduction, 1-2 paragraphs of copy for each graphic, and a conclusion should suffice.

After the graphics are created and the copy is written, you’ll need to find a place for your campaign to live. We generally recommend adding this to your site’s blog as it’s the most natural place for informational content.

3. Build an outreach list

After publishing your campaign, you’re then ready to start the pitching process by finding relevant journalists to pitch to.

The easiest way to do this is to use some type of media database. For journalist research, we rely heavily on Cision. Their powerful search functionality allows you to search for journalists and outlets across a lot of different categories such as name, subject, location, keyword and many more.

For instance, let’s say I’ve determined that I want something to be covered on Forbes.com. Using Cision, I can select “Outlet Name” and then search for Forbes.com. From there, I’ll get a big list of journalists that write for Forbes.

When clicking on each one, you can find contact information for each journalist along with a biography that can be useful in determining if they would be a good fit to cover your campaign:

You’ll want to send your campaign to as many qualified journalists as possible, so build an outreach list of hundreds of contacts that you can reach out to during the next phase of the campaign.

If you don’t have the budget for a media database, there are other (but slightly more difficult) options that can help you find journalist contact information. For example, if we wanted to find authors who write for Forbes, we could take to Twitter and perform a search for “contributor @Forbes”:

You can then use tools such as Hunter.io’s Bulk Email Finder to find the email information for some of the authors.

With Hunter.io, you’ll only pay based on the number of entries you run through the tool. This can be significantly cheaper than paying for a subscription to a media database. Of course, this process will be much more manual and time-consuming.

4. Pitching phase

The final part of any good digital PR campaign is the pitching process, where we take all of the journalists’ contact information we just collected and begin to reach out to them.

The golden rule of the pitch

One key thing to remember when pitching is that the average journalist receives many, many different pitches every day. Part of their job is to wade through all of them and make decisions about which ones will be the most successful. Because journalists are constantly bombarded with potential stories, you’ll want to follow the golden rule of pitching:

Do as much upfront work for the journalist as possible.

This means that your outreach needs to be short, direct, and easy to read. We’ve found that it’s really helpful to use bullet points to keep things succinct.

For instance, here’s what a pitch email could look like for a data study about every state’s favorite dog breed:

Email Subject Line: The Most Popular Dog Breeds In Every State

Hello [Contact Name],

My name is Chris Long and I’m with Go Fish Digital. Recently we performed a study on every state’s favorite breed of dog. Our study found some really interesting insights including:

The Golden Retriever was the most popular dog breed and was chosen as the favorite by 42% of respondents.
People in Southern states were 38% more likely to choose larger dog breeds than those in Northern states.
Pugs were one of the least popular breeds and were only chosen as the favorite by 6% of respondents.

You can find our full study at the link here: [insert link].

We would love to see this covered on [outlet name]. Please let us know if you think this is something you would be interested in writing about.

This email is short, to the point, and quickly demonstrates our key findings.

For more great tips, check out Amanda Milligan’s Whiteboard Friday on the topic:

Using outreach software

While it’s completely fine to work out of a Gmail inbox for your pitching, outreach software can help you take things to the next level. Our team uses Yesware. Pitchbox is another widely-used option. Both help you organize your outreach, perform A/B tests, and get analytics on your outreach efforts.

For example, Yesware allows us to compare open rates of one email compared to another when pitching the same campaign. This way, we gain greater insights as to what subject lines are more likely to get traction with journalists.
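If you export send and open counts from your outreach tool, a comparison like this can be sketched in a few lines. The numbers and function names are made up, and the two-proportion z-score is an addition of ours for judging whether a gap is likely more than noise. It is not a feature of Yesware or Pitchbox.

```python
from math import sqrt


def open_rate(opens, sent):
    """Share of sent emails that were opened."""
    return opens / sent


def two_proportion_z(opens_a, sent_a, opens_b, sent_b):
    """z-score for the difference in open rates between subject lines A and B."""
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if standard_error == 0:
        # Every email (or no email) was opened in both groups: no measurable difference.
        return 0.0
    return (open_rate(opens_a, sent_a) - open_rate(opens_b, sent_b)) / standard_error


if __name__ == "__main__":
    # Hypothetical counts for two subject lines on the same campaign.
    z = two_proportion_z(opens_a=45, sent_a=150, opens_b=30, sent_b=150)
    print(f"A: {open_rate(45, 150):.0%}  B: {open_rate(30, 150):.0%}  z = {z:.2f}")
```

As a rule of thumb, an absolute z-score around 2 or higher suggests the difference is unlikely to be chance alone. With small outreach lists, treat the result as a hint rather than proof.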

Building relationships with journalists

This is where digital marketers need to truly think like someone in PR. One of the benefits of a traditional PR firm is the media relationships they’ve built over time. These relationships make it much easier for them to get media attention and coverage for their clients. Thus, you need to be making long-term efforts to build relationships with journalists.

As mentioned above, journalists are constantly bombarded with pitches and have many different options when choosing what stories they want to cover. Therefore, if they see an inbox full of pitches, they’re much more likely to cover one from a person that they’ve worked with before — and trust.

Relationships aren’t built overnight and you’ll need to build trust with the journalists by ensuring that your content is accurate, high-quality, and likely to be successful for them. While there is no single “hack” for developing genuine human relationships, here are some things that can help you along the way:

Try pitching the same journalists for multiple campaigns. This continued contact can help you establish a relationship with them. Just be sure to space them out.
Ensure you’re quick to respond. Journalists are often highly dependent on deadlines, and you need to be sure you’re helping them meet theirs.
Always be pleasant and cordial, even if they turn down your campaign. You never know if they’ll be interested in the next one.
Ensure that your data is accurate. If a journalist discovers that you provided inaccurate information, this will certainly hurt your chances of an ongoing partnership.
Research who you’re pitching to first. You should know what types of content they’re likely to cover.

The results

We’ve been performing digital PR for the better part of a decade now, and we’ve found that it’s one of the most consistent ways to build links in today’s digital ecosystem. While there’s definitely a lot that goes into creating a campaign, they often result in high-quality content that’s actually newsworthy and deserving of coverage. Using these techniques, we consistently get coverage from some of the most trusted publications on the Web:

Digital PR campaigns can help drive immense results over time. For example, below, you can find a client that has been implementing digital PR initiatives since 2015. They have received links from 3,400+ referring domains, including The Washington Post, Inc.com, Fast Company, Entrepreneur.com, and more:

Conclusion

The world of link building is getting more and more challenging. In order to continue to build links and authority, brands may want to consider pivoting to more traditional PR strategies. While digital PR isn’t the only way to build links, we find that it’s one of the most effective and scalable ways to do so.

Friday, September 17, 2021

Cannibalization

In today's episode of Whiteboard Friday, Tom Capper walks you through a problem many SEOs have faced: cannibalization. What is it, how do you identify it, and how can you fix it? Watch to find out!

Video Transcription

Happy Friday, Moz fans, and today we're going to be talking about cannibalization, which here in the UK we spell like this: cannibalisation. With that out of the way, what do we mean by cannibalization?

What is cannibalization?

So this is basically where one site has two competing URLs and performs, we suspect, less well because of it. So maybe we think the site is splitting its equity between its two different URLs, or maybe Google is getting confused about which one to show. Or maybe Google considers it a duplicate content problem or something like that. One way or another, the site does less well as a result of having two URLs.

So I've got this imaginary SERP here as an example. So imagine that Moz is trying to rank for the keyword "burgers." Just imagine that Moz has decided to take a wild tangent in its business model and we're going to try and rank for "burgers" now.

So in position one here, we've got Inferior Bergz, and we would hope to outrank these people really, but for some reason we're not doing. Then in position two, we've got Moz's Buy Burgers page on the moz.com/shop subdirectory, which obviously doesn't exist, but this is a hypothetical. This is a commercial landing page where you can go and purchase a burger.

Then in position three, we've got this Best Burgers page on the Moz blog. It's more informational. It's telling you what are the attributes to a good burger, how can you identify a good burger, where should you go to acquire a good burger, all this kind of more neutral editorial information.

So we hypothesize in this situation that maybe if Moz only had one page going for this keyword, maybe it could actually supplant the top spot. If we think that's the case, then we would probably talk about this as cannibalization.

However, the alternative hypothesis is, well, actually there could be two intents here. It might be that Google wishes to show a commercial page and an informational page on this SERP, and it so happens that the second best commercial page is Moz's and the best informational page is also Moz's. We've heard Google talk in recent years or representatives of Google talk in recent years about having positions on search results that are sort of reserved for certain kinds of results, that might be reserved for an informational result or something like that. So this doesn't necessarily mean there's cannibalization. So we're going to talk a little bit later on about how we might sort of disambiguate a situation like this.

Classic cannibalization

First, though, let's talk about the classic case. So the classic, really clear-cut, really obvious case of cannibalization is where you see a graph like this one.

Hand drawn graph showing ranking consequences of cannibalization.

So this is the kind of graph you would see in a lot of rank tracking software. You can see time and the days of the week going along the bottom axis. Then we've got rank, and we obviously want to be as high as possible and close to position one.

Then we see the two URLs, which are color-coded, and are green and red here. When one of them ranks, the other just falls away to oblivion, isn't even in the top 100. There's only ever one appearing at the same time, and they sort of supplant each other in the SERP. When we see this kind of behavior, we can be pretty confident that what we're seeing is some kind of cannibalization.
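If you have daily rank exports for the two URLs, the pattern described above can be checked with a short script. This is only a sketch: it assumes one rank per day per URL, with None meaning the URL was outside the top 100, and the function names and the "at least one swap" threshold are our own assumptions.

```python
def count_overlap_and_swaps(ranks_a, ranks_b):
    """Return (days both URLs ranked, number of times the ranking URL switched)."""
    both = 0
    swaps = 0
    last = None
    for a, b in zip(ranks_a, ranks_b):
        if a is not None and b is not None:
            both += 1
            continue
        if a is None and b is None:
            continue
        current = "a" if a is not None else "b"
        if last is not None and current != last:
            swaps += 1
        last = current
    return both, swaps


def looks_like_classic_cannibalization(ranks_a, ranks_b, min_swaps=1):
    """True when the URLs never rank together and trade places at least min_swaps times."""
    both, swaps = count_overlap_and_swaps(ranks_a, ranks_b)
    return both == 0 and swaps >= min_swaps


if __name__ == "__main__":
    # Hypothetical week of ranks for the green and red URLs.
    green = [3, 4, None, None, 5, None]
    red = [None, None, 6, 7, None, 4]
    print(count_overlap_and_swaps(green, red))
    print(looks_like_classic_cannibalization(green, red))
```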

Less-obvious cases

Sometimes it's less obvious though. So a good example that I found recently is if, or at least in my case, if I Google search Naples, as in the place name, I see Wikipedia ranking first and second. The Wikipedia page ranking first was about Naples, Italy, and the Wikipedia page at second was about Naples, Florida.

Now I do not think that Wikipedia is cannibalizing itself in that situation. I think that Google has decided that this SERP is ambiguous and that the keyword "Naples" requires multiple intents to be served, and Wikipedia happens to have the best page for two of those intents.

So I wouldn't go to Wikipedia and say, "Oh, you need to combine these two pages into a Naples, Florida and Italy page" or something like that. That's clearly not necessary.

Questions to ask

So if you want to figure out in that kind of more ambiguous case whether there's cannibalization going on, then there are some questions we might ask ourselves.

1. Do we think we're underperforming?

So one of the best questions we might ask, which is a difficult one in SEO, is: Do we think we're underperforming? So I know every SEO in the world feels like their site deserves to rank higher, well, maybe most. But do we have other examples of very similar keywords where we only have one page, where we're doing significantly better? Or was it the case that when we introduced the second page, we suddenly collapsed? Because if we see behavior like that, then that might, you know, it's not clear-cut, but it might give us some suspicions.

2. Do competing pages both appear?

Similarly, we can look at examples of similar keywords that are less ambiguous in intent. So perhaps in the burgers case, if the SERP for "best burgers" and the SERP for "buy burgers" had completely different results in general, then we might think, oh, okay, we should have two separate pages here, and we just need to make sure that they're clearly differentiated.

But if actually it's the same pages appearing on all of those keywords, we might want to consider having one page as well because that seems to be what Google is preferring. It's not really separating out these intents. So that's the kind of thing we can look for. It's, like I say, not clear-cut but a bit of a hint.
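One rough way to put a number on "the same pages appearing" is to compare the result sets of the two SERPs. The sketch below uses Jaccard similarity as one possible overlap measure. The URLs are made up, and where you draw the line between "same intent" and "separate intents" is a judgment call.

```python
from typing import Iterable


def serp_overlap(results_a: Iterable[str], results_b: Iterable[str]) -> float:
    """Jaccard similarity between the sets of URLs on two SERPs:
    shared URLs divided by all distinct URLs across both."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Made-up top results for two related keywords.
best_burgers = ["a.example/best", "b.example/guide", "c.example/shop"]
buy_burgers = ["c.example/shop", "d.example/order", "a.example/best"]

print(serp_overlap(best_burgers, buy_burgers))
```

A score near 1 suggests Google is treating the keywords as one intent, which hints at consolidating. A score near 0 suggests the intents are separate and two clearly differentiated pages make sense.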

3. Consolidate or differentiate?

Once we've figured out whether the best solution in this case is to have two pages or one, we're going to want to either consolidate or differentiate.

So if we think there should only be one page, we might want to take our two pages, combine the best of the content, pick the strongest URL in terms of backlinks and history and so on, and redirect the other URL to this combined page that has the best content, that serves the slight variants of what we now know is one intent and so on and so forth.

If we want two pages, then obviously we don't want them to cannibalize. So we need to make sure that they're clearly differentiated. Now what often happens here is that a commercial page, like this Buy Burgers page, ironically for SEO reasons, has a block of text at the bottom with a bunch of editorial or SEO text about burgers, and that can make it quite confusing what intent this page is serving.

Similarly, on this page, we might at some stage have decided that we want to feature some products on there or something. It might have started looking quite commercial. So we need to make sure that if we're going to have both of these, that they are very clearly speaking to separate intents and not containing the same information and the same keywords for the most part and that kind of thing.

Quick tip

Lastly, it would be better if we didn't get into the situation in the first place. So a quick tip that I would recommend, just as a last takeaway, is to check before you produce a piece of content. For example, before I produced this Whiteboard Friday, I did a site:moz.com cannibalization search so I could see what content had previously existed on Moz.com that was about cannibalization.

I can see, oh, this piece is very old, so we might — it's a very old Whiteboard Friday, so we might consider redirecting it. This piece mentions cannibalization, so it's not really about that. It's maybe about something else. So as long as it's not targeting that keyword we should be fine and so on and so forth. Just think about what other pieces exist, because if there is something that's basically targeting the same keyword, then obviously you might want to consider consolidating or redirecting or maybe just updating the old piece.

That's all for today. Thank you very much.

Video transcription by Speechpad.com.

Thursday, September 16, 2021

Tackling 8,000 Title Tag Rewrites: A Case Study

I recently dug into over 50,000 title tags to understand the impact of Google’s rewrite update. As an SEO, this naturally got me wondering how the update impacted Moz, specifically. So, this post will be a more focused examination of a site I have deep familiarity with, including three case studies where we managed to fix bad rewrites.

As an author, I take titles pretty personally. Imagine if you wrote this masterpiece:

… and then you ended up with a Google result that looked like this:

Sure, Google didn’t do anything wrong here, and it’s not their fault that there’s an upper limit on what they can display, but it still feels like something was lost. It’s one thing to do a study across a neutral data set, but it’s quite another when you’re trying to understand the impact on your own site, including articles you spent hours, days, or weeks writing.

Moz rewrites by the numbers

I’m not going to dig deep into the methodology, but I collected the full set of ranking keywords from Moz’s Keyword Explorer (data is from late August) and scraped the relevant URLs to pull the current <title> tags. Here are a few of the numbers:

74,810 ranking keywords
10,370 unique URLs
8,646 rewrites

Note that just under 2,000 of these “rewrites” were really pre-update (...) truncation. The majority of the rest were brand rewrites or removals, which I’ll cover a bit in the examples. The number of significant, impactful rewrites is hard to measure, but was much smaller.
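To illustrate the distinction between truncation, brand changes, and more substantial rewrites, here is a minimal sketch of how a <title> tag and its displayed title could be bucketed. This is not the methodology used for the numbers above. The category names, the matching rules, and the example titles are all assumptions.

```python
import re


def _without_brand(text: str, brand: str) -> str:
    """Drop the brand word and common delimiters, then normalize spacing and case."""
    no_brand = re.sub(rf"\b{re.escape(brand)}\b", " ", text, flags=re.IGNORECASE)
    no_delims = re.sub(r"[|\-:]", " ", no_brand)
    return " ".join(no_delims.lower().split())


def classify_display_title(title_tag: str, display_title: str, brand: str = "Moz") -> str:
    """Bucket a displayed title as unchanged, truncation, brand_change, or rewrite."""
    title = " ".join(title_tag.split())
    shown = " ".join(display_title.split())

    if shown == title:
        return "unchanged"

    # Classic truncation: the displayed text ends in an ellipsis and the
    # part before it is simply the start of the original <title>.
    for ellipsis in ("...", "\u2026"):
        if shown.endswith(ellipsis):
            stem = shown[: -len(ellipsis)].rstrip()
            if stem and title.startswith(stem):
                return "truncation"

    # Brand-only change: identical once the brand and delimiters are ignored.
    if _without_brand(title, brand) == _without_brand(shown, brand):
        return "brand_change"

    return "rewrite"


# Made-up examples, one per bucket.
print(classify_display_title("Widget Guide - Moz", "Widget Guide - Moz"))
print(classify_display_title("A Very Long Guide to Widgets and More - Moz", "A Very Long Guide to ..."))
print(classify_display_title("Moz - Widget Guide", "Widget Guide - Moz"))
print(classify_display_title("How to Remove a Widget - Moz", "How to Delete a Widget - Moz"))
```

Counting each bucket across a site is a quick way to see how many rewrites are worth a closer look.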

Where did Google get it right?

While I have reservations about Google rewriting title tags (more on that at the end of this post), I tried to go into this analysis with an open mind. So, let’s look at what Google got right, at least in the context of Moz.com.

(1) Removing double-ups

Our CMS automatically appends our brand (“ - Moz”) to most of our pages, a situation that’s hardly unique to our site. In some cases, this leads to an odd doubling-up of the brand, and Google seems to be removing these fairly effectively. For example:

While the CMS is doing its job, “Moz - Moz” is repetitive, and I think Google got this one right. Note that this is not simple truncation — the additional text would have easily fit.
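If you would rather fix the doubling at the source than rely on Google, a small guard in the title-building step can help. The sketch below is hypothetical and is not how Moz's CMS works. It assumes the brand is appended with a " - " delimiter and only collapses segments that are exactly the brand.

```python
def collapse_doubled_brand(title: str, brand: str = "Moz", delimiter: str = " - ") -> str:
    """Remove a brand segment that immediately repeats the previous segment."""
    cleaned = []
    for part in title.split(delimiter):
        is_brand = part.strip().lower() == brand.lower()
        prev_is_brand = bool(cleaned) and cleaned[-1].strip().lower() == brand.lower()
        if is_brand and prev_is_brand:
            continue  # skip the repeated brand
        cleaned.append(part)
    return delimiter.join(cleaned)


# Made-up titles.
print(collapse_doubled_brand("Pricing - Moz - Moz"))
print(collapse_doubled_brand("Pricing - Moz"))
```

Titles where the page name itself ends with the brand (a hypothetical "About Moz - Moz") are left alone here, since deciding whether that is redundant is an editorial call.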

(2) Those darned SEOs!

Okay, I’m not sure I want to admit this one, but occasionally we test title variations, and we still live with some of the legacy of rebranding from “SEOmoz” to “Moz” in 2013. So, some areas of our site have variations of “ | SEO | Moz”. Here’s how Google handled one variety:

While it’s a bit longer, I suspect this is a better extension for our Q&A pages, both for us and for our visitors from search. I’m going to call this a win for Google.

(3) Whatever this is…

I have no idea what the original intent of this <title> tag was (possibly an experiment):

While there’s nothing terribly wrong with the original <title> tag, it’s probably trying too hard to front-load specific keywords and it’s not very readable. In this case, Google opted to use the blog post title (from the <H1>), and it’s probably a good choice.

Where did Google get it so-so?

It may seem strange to cover examples where Google did an okay job, but in some ways these bother me the most, if simply because they seem unnecessary. I feel like the bar for a rewrite should be higher, and that makes the gray areas worth studying.

(4) Shuffling the brand

For some of our more evergreen pieces, we put the Moz brand front-and-center. In a number of cases, Google shuffled that to the back of the title. Here’s just one example:

There’s nothing inherently wrong with this rewrite, but why do it? We made a conscious choice here and — while the rewrite might be more consistent with our other content — I’m not sure this is Google’s decision to make.

(5) Double-brand trouble

This is a variation on #4, conceptually. Some of our Whiteboard Friday video titles end in “- Whiteboard Friday - Moz”, and in this example Google has split that and relocated half of it to the front of the display title:

Whiteboard Friday is a brand in and of itself, but I have a feeling that #4 and #5 are really more about delimiters in the title than the brand text. Again, why did this trigger a rewrite?

You might be thinking something along the lines of “Google has all the data, and maybe they know more than we do.” Put that thought on hold until the end of the post.

(6) The old switcheroo

Here’s an example where Google opted for the post title (in the <H1>) instead of the <title> tag, with the end result being that they swapped “remove” for “delete”:

This isn’t really a single-word substitution (so much as a total swap), and I don’t know why we ended up with two different words here, but what about the original title — which is extremely similar to the post title — triggered the need for a rewrite?

One quick side note — remember that Featured Snippets are organic results, too, and so rewrites will also impact your Featured Snippets. Here’s that same post/rewrite for another query, appearing as a Featured Snippet:

Again, there’s nothing really wrong or inaccurate about the rewrite, other than a lack of clarity about why it happened. In the context of a Featured Snippet, though, rewrites have a greater possibility of impacting the intent of the original author(s).

Where did Google get it wrong?

It’s the moment you’ve been waiting for — the examples where Google made a mess of things. I want to be clear that these, at least in our data set, are few and far between. It’s easy to cherry-pick the worst of the worst, but the three examples I’ve chosen here have a common theme, and I think they represent a broader problem.

(7) Last things first

Here’s an example of rewrite truncation, where Google seems to have selected the parenthetical over the main portion of the title:

Many of the bad examples (or good examples of badness) seem to be where Google split a title based on delimiters and then reconstructed what was left in a way that makes no sense. It seems especially odd in the case of a parenthetical statement, which is supposed to be an aside and less important than what precedes it.

(8) Half the conversation

In other cases, Google uses delimiters as a cutting-off point, displaying what’s before or after them. Here’s a case where the “after” approach didn’t work so well:

This is user-generated content and, granted, it’s a long title, but the resulting cutoff makes no sense out of context. Standard (...) truncation would’ve been a better route here.

(9) And another thing...

Here’s a similar example, but where the cutoff happened at a hyphen (-). The title style is a bit unusual (especially starting the sub-title with “And”), but the cutoff turns it from unusual to outright ridiculous:

Again, simple truncation would’ve been a better bet here.

I get what Google’s trying to do — they’re trying to use delimiters (including pipes, hyphens, colons, parentheses, and brackets) to find natural-language breaks, and split titles at those breaks. Unfortunately, the examples demonstrate how precarious this approach can be. Even the classic “Title: Sub-title” format is often reversed by writers, with the (arguably) less-important portion sometimes being used first.
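To see which breaks your own titles expose, you can split them on the same kinds of delimiters. This is a rough sketch and not Google's algorithm, which isn't public. The regular expression and the example title are assumptions.

```python
import re
from typing import List

# Pipes and hyphens only count when surrounded by whitespace (so "Sub-title"
# stays whole), colons only when followed by whitespace, and parentheses or
# brackets always count.
DELIMITERS = re.compile(r"\s[|\-]\s|:\s|[()\[\]]")


def split_on_delimiters(title: str) -> List[str]:
    """Return the non-empty segments of a title between delimiters."""
    return [seg.strip() for seg in DELIMITERS.split(title) if seg.strip()]


# Made-up title with several delimiter types.
title = "Title Tags: A Guide (Updated for 2021) - Moz"
print(split_on_delimiters(title))
```

Reading each segment on its own is a useful check. If a segment like the parenthetical would be confusing or misleading as a standalone title, that title is a candidate for simplifying.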

Three case studies (& three wins)

Ultimately, some rewrites will be good-to-okay and most of these rewrites aren’t worth the time and effort to fix. Over half of the Moz <title> rewrites were minor brand modifications or brand removal (with the latter usually being due to length limits).

What about the objectively bad rewrites, though? I decided to pick three case studies and see if I could get Google to take my suggestions. The process was relatively simple:

Update the <title> tag, trying to keep it under the length limit
Submit the page for reindexing in Google Search Console
If the rewrite didn’t take, update the <H1> or relevant on-page text

Here are the results of the three case studies (with before and after screenshots):

(1) A shady character

This one was really our fault and was an easy choice to fix. Long story short, a data migration led to a special character being corrupted, which resulted in this:

I’m not blaming Google for this one, but the end result was a strange form of truncation that made “Google Won’t” look like “Google Won”, and made it appear that this was the end of the title. I fixed and shortened the <title> tag, and here’s what happened:

Interestingly, Google opted to use the <H1> here instead of the shortened <title> version, but since it fixed the main issue, I’m going to call this a win and move on.

(2) Change isn’t easy

Here’s another one where Google got it wrong, breaking the <title> tag at a parenthetical that didn’t really make any sense (similarly to the examples above):

Since this was a recent and still-relevant post, we were eager to fix it. Interestingly, the first fix didn’t take. I had to resort to changing the post title (<H1>) as well, and removed the parentheses from that title. After that, Google opted for the <title> tag:

This process may require some trial-and-error and patience, especially since the GSC reindexing timeline can vary quite a bit. Most of these updates took about a day to kick in, but I’ve recently heard anywhere from an hour to never.

(3) Don’t ditch Moz!

Our final case study is a complex, multi-delimiter title where Google decided to split the title based on a phrase in quotation marks and then truncate it (without the “...”):

Although the main portion of the rewrite is okay, unfortunately the cutoff makes it look like the author is telling readers to ditch Moz. (Marketing wasn’t thrilled about that.) I opted to simplify the <title> tag, removing the quote and the parentheses. Here’s the end result:

I managed to sneak in all of the relevant portion of the title by switching “And” out with an ampersand (&), and now it’s clear what we should be ditching. Cue the sigh of relief.

While there’s potentially a lot more to be done, there are two takeaways here:

You need to prioritize — don’t sweat the small rewrites, especially when Google might change/adjust them at any time.
The bad rewrites can be fixed with a little time and patience, if you understand why Google is doing what they’re doing.

I don’t think this update is cause for panic, but it’s definitely worth getting a sense of your own rewrites — and especially patterns of rewrites — to make sure they reflect the intent of your content. What I found, even across 8,000 rewrites, is that there were only a handful of patterns with maybe a few dozen examples that didn’t fit any one pattern. Separating the signal from the noise takes work, but it’s definitely achievable.

Are rewrites good or bad?

This is an incredibly subjective question. I purposely structured this post into right/so-so/wrong to keep myself from cherry-picking bad examples, and my observations are that most rewrites (even on a site that I take pretty personally) are minor and harmless. That said, I have some misgivings. If you’re happy with the analysis and don’t need the editorializing, you’re welcome to go make a sandwich or take a nap.

It’s important to note that this is a dynamic situation. Some of the rewrites my research flagged had changed when I went back to check them by hand, including quite a few that had reverted to simple truncation. It appears that Google is adjusting to feedback.

This research and post left me the most uncomfortable with the “so-so” examples. Many of the bad examples can be fixed with better algorithms, but ultimately I believe that the bar for rewriting titles should be relatively high. There’s nothing wrong with most of the original <title> tags in the so-so examples, and it appears Google has set the rewrite threshold pretty low.

You might argue that Google has all of the data (and that I don’t), so maybe they know what they’re doing. Maybe so, but I have two problems with this argument.

First, as a data scientist, I worry about the scale of Google’s data. Let’s assume that Google A/B tests rewrites against some kind of engagement metric or metrics. At Google scale (i.e. massive data), it’s possible to reach statistical significance with very small differences. The problem is that statistics don’t tell us anything about whether that change is meaningful enough to offset the consequences of making it. Is a 1% lift in some engagement metric worth it when a rewrite might alter the author’s original intent or even pose branding or legal problems for companies in limited cases?
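To show how scale drives significance, here is a small sketch using a standard two-proportion z-test. The click-through rates and sample sizes are invented for illustration. Nothing here reflects how Google actually tests rewrites.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    clicks_a: int, n_a: int, clicks_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical: original titles get a 10.0% CTR, rewrites get 10.1%
# (a 1% relative lift), measured at two very different scales.
z_small, p_small = two_proportion_z_test(1_000, 10_000, 1_010, 10_000)
z_large, p_large = two_proportion_z_test(10_000_000, 100_000_000, 10_100_000, 100_000_000)

print(f"10 thousand impressions per arm: z={z_small:.2f}, p={p_small:.3g}")
print(f"100 million impressions per arm: z={z_large:.2f}, p={p_large:.3g}")
```

The same 1% relative lift is nowhere near significant at the small scale and overwhelmingly significant at the large one. The test only says the difference is unlikely to be chance, not that it is worth the cost of changing what the author wrote.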

If you’re comparing two machine learning models to each other, then it makes sense to go with the one that performs better on average, even if the difference is small. Presumably, in that case, both models have access to the same data. With title rewrites, though, we’re comparing the performance of a model to millions of conscious, human decisions that may have a great deal of context Google has no access to. The risk of rewriting is reasonably high, IMO, and that means that small differences in performance may not be enough.

Second — and this is a more philosophical point — if Google has found that certain patterns or title styles result in better performance, then why not be transparent and publish that data? I understand why Google wants to veil the algorithm in secrecy, but they’ve already told us that title rewrites don’t impact rankings. If the goal is to create better titles across the web, then empower writers and content creators to do that. Don’t make those decisions for us.

Ultimately, I think Google moved too far, too fast with this update. I believe they could have communicated (and still could communicate) the reasons more openly without risk to any major secrets and be more conservative about when and if to make changes, at least until these systems have been improved.