The excitement of finishing a competitive keyword research project often gives way to the panic of fleeing from an avalanche of opportunities. Without an organizing principle, a spreadsheet full of keywords is a bottomless to-do list. It’s not enough to know what your competitors are ranking for — you need to know what content is powering those rankings and how you’re currently competing with that content. You need a blueprint to craft those keywords into a compelling structure.
Recently, I wrote a post about the current state of long-tail SEO. While I had an angle for the piece in mind, I also knew it was a topic Moz and others had covered many times. I needed to understand the competitive landscape and make sure I wasn’t cannibalizing our own content.
This post covers one method to perform that competitive content research, using Google’s advanced search operators. For simplicity’s sake, we’ll pare down the keyword research and start our journey with just one phrase: “long tail seo.”
Find your best content (site:)
long tail seo site:moz.com
"long tail seo" site:moz.com
First, what has Moz already published on the subject? By pairing your target keywords with the [site:] operator, you can search for matching content only on your own site. I usually start with a broad-match search, but if your target phrases are made up of common words, you could also use quotation marks and exact-match search. Here’s the first piece of content I see:
Our best match on the subject is a Whiteboard Friday from five years ago. If I had nothing new to add to the subject and/or I was considering doing a video, this might end my journey. I don’t really want to compete with my own content that’s already performing well. In this case, I decide that I’ve got a fresh take, and I move forward.
Target a specific folder (inurl:)
long tail seo site:moz.com inurl:learn
long tail seo site:moz.com/learn
For larger sites, you might want to focus on a specific section, like the blog, or in Moz’s case, our Learning Center. You have a couple of options here. You could use the [inurl:] operator with the folder name, but that may result in false alarms, like:
This may be useful, in some cases, but when you need to specifically focus on a sub-folder, just add that sub-folder to the [site:] operator. The handy thing about the [site:] operator is that anything left off is essentially a wild card, so [site:moz.com/learn] will return anything in the /learn folder.
Find all competing pages (-site:)
long tail seo -site:moz.com
Now that you have a sense of your own, currently-ranking content, you can start to dig into the competition. I like to start broad, simply using negative match [-site:] to remove my own site from the list. I get back something like this:
This is great for a big-picture view, but you’re probably going to want to focus in on just a couple or a handful of known competitors. So, let's narrow down the results ...
Explore key competitors (site: OR site:)
long tail seo (site:ahrefs.com OR site:semrush.com)
By using the [OR] operator with [site:] and putting the result in parentheses, you can target a specific group of competitors. Now, I get back something like this:
Is this really different than targeting one competitor at a time? Yes, in one important way: now I can see how these competitors rank against each other.
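If you run these checks for many phrases, it can help to generate the query strings instead of typing them. Here is a minimal Python sketch of the [site:], [-site:], and [OR] patterns described above. The function name and parameters are my own illustration, not part of any Google tool, and the output is simply text to paste into the search box.

```python
def site_query(phrase, include_sites=(), exclude_sites=()):
    """Build a Google query limited to, or excluding, specific sites."""
    parts = [phrase]
    if len(include_sites) == 1:
        parts.append(f"site:{include_sites[0]}")
    elif len(include_sites) > 1:
        # Several sites are OR'd together and wrapped in parentheses.
        group = " OR ".join(f"site:{site}" for site in include_sites)
        parts.append(f"({group})")
    # Each excluded site gets its own negative-match operator.
    parts.extend(f"-site:{site}" for site in exclude_sites)
    return " ".join(parts)


own = site_query("long tail seo", include_sites=["moz.com"])
folder = site_query("long tail seo", include_sites=["moz.com/learn"])
others = site_query("long tail seo", exclude_sites=["moz.com"])
rivals = site_query("long tail seo", include_sites=["ahrefs.com", "semrush.com"])

for query in (own, folder, others, rivals):
    print(query)
```

Because a sub-folder is just part of the [site:] value, the same helper covers the /learn example without any special handling.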
Explore related content #1 (-“phrase”)
long tail seo -"long tail seo"
As you get into longer, more targeted phrases, it’s possible to miss relevant or related content. Hopefully, you’ve done a thorough job of your initial keyword research, but it’s still worth checking for gaps. One approach I use is to search for your main phrase with broad match, but exclude the exact match phrase. This leaves results like:
Just glancing at page one of results, I can see multiple mentions of “long tail keywords” (as well as “long-tail” with a hyphen), and other variants like “long tail keyword research” and “long tail organic traffic.” Even if you’ve turned these up in your initial keyword research, this combination of Google search operators gives you a quick way to cover a lot of variants and potentially relevant content.
Explore related content #2 (intext: -intitle:)
intext:"long tail seo" -intitle:"long tail seo"
Another handy trick is to use the [intext:] operator to target your phrase in the body of the content, but then use [-intitle:] to exclude results with the exact-match phrase in the title. While the results will overlap with the previous trick, you can sometimes turn up some interesting side discussions and related topics. Of course, you can also use [intitle:] to laser-target your search on content titles.
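The two related-content tricks follow a fixed pattern, so they are easy to template. The sketch below builds both queries (plus the [intitle:] variant) for any phrase and turns each into a Google search URL. The helper names are hypothetical; the only external detail assumed is the standard https://www.google.com/search?q= URL format.

```python
from urllib.parse import quote_plus


def related_content_queries(phrase):
    """Return the related-content query variants for one target phrase."""
    exact = f'"{phrase}"'
    return {
        # Broad match, minus results containing the exact phrase.
        "broad_minus_exact": f"{phrase} -{exact}",
        # Phrase in the body text, but not in the title.
        "body_not_title": f"intext:{exact} -intitle:{exact}",
        # Phrase in the title only.
        "title_only": f"intitle:{exact}",
    }


def search_url(query):
    """URL-encode a query so it can be opened directly in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


queries = related_content_queries("long tail seo")
for name, query in queries.items():
    print(name, "->", search_url(query))
```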
Find pages by dates (####..####)
long tail seo 2010..2015
In some cases, you might want to target your search on a date range. You can combine the four-digit years with the range operator [..] to target a time period. Note that this will search for the years as numbers anywhere in the content. While the [daterange:] operator is theoretically your most precise option, it relies on Google being able to correctly identify the publication date of a piece, and I’ve found it difficult to use and a bit unpredictable. The range operator usually does the job.
Find top X lists (intitle:"#..#")
intitle:"top 11..15" long tail seo
This can get a little silly, but I just want to illustrate the power of combining operators. Let’s say you’re working on a top X list about long-tail SEO, but want to make sure there isn’t too much competition for the 11-15 item range you’re landing in. Using a combo of [intitle:] plus the range operator [..], you might get something like this:
Note that operator combos can get weird, and results may vary depending on the order of the operators. Some operators can’t be used in combination (or at least the results are highly suspicious), so always gut-check what you see.
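The range operator examples above can be templated the same way. This is a small sketch with hypothetical function names. It only assembles the query text, so you still need to gut-check the results Google returns.

```python
def year_range_query(phrase, start_year, end_year):
    """Pair a phrase with a numeric year range, e.g. 2010..2015."""
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    return f"{phrase} {start_year}..{end_year}"


def top_list_query(phrase, low, high):
    """Look for 'top N' titles where N falls between low and high."""
    if low > high:
        raise ValueError("low must not be greater than high")
    return f'intitle:"top {low}..{high}" {phrase}'


dated = year_range_query("long tail seo", 2010, 2015)
top_lists = top_list_query("long tail seo", 11, 15)
print(dated)
print(top_lists)
```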
Putting all of the data to work
If you approach this process in an organized way (if I can do it, you can do it, because, frankly, I’m not that organized), what you should end up with is a list of relevant topics you might have missed, a list of your currently top-performing pages, a list of your relevant competitors, and a list of your competitors’ top-performing pages. With this bundle of related data, you can answer questions like the following:
Are you at risk of competing with your own relevant content?
Should you create new content or improve on existing content?
Is there outdated content you should remove or 301-redirect?
What competitors are most relevant in this content space?
What effort/cost will it take to clear the competitive bar?
What niches haven’t been covered by your competitors?
No tool will magically answer these questions, but by using your existing keyword research tools and Google’s advanced search operators methodically, you should be able to put your human intelligence to work and create a specific and actionable content strategy around your chosen topic.
If you’d like to learn more about Google’s advanced search operators, check out our comprehensive Learning Center page or my post with 67 search operator tricks. I’d love to hear more about how you put these tools to work in your own competitive research.
Life rushed back into Jayda’s lungs, sharp and unforgiving. To her left, shards of a thousand synonyms. To her right, the crumbling remains of a mountain of long-tail keywords. As the air filled her lungs, the memories came rushing back, and with them the crushing realization that her team was buried beneath the debris. After months of effort, they had finally finished their competitive keyword research, but at what cost?
It's time to get down to business and convince your boss that you HAVE to go to MozCon Virtual 2021.
You're already well acquainted with the benefits of MozCon. Maybe you're a MozCon alumnus, or you may have lurked the hashtag once or twice for inside tips. You’ve likely followed the work of some of the speakers for a while. But how are you going to relay that to your boss in a way that sells? Don’t worry, we’ve got a plan.
(And if you want to skip ahead to the letter template, here it is!)
Alright, so just going in and saying “Have you seen any of Britney Muller’s Whiteboard Fridays lately?!” probably won’t do the trick — we need some cold hard facts that you can present.
MozCon delivers actionable insights
It’s easy to say that MozCon provides actionable insights, but how do you prove it? A quick scroll through our Facebook Group can prove to anyone that not only is MozCon a gathering of the greatest minds in search, but it also acts as an incubator and facilitator for SEO strategies.
If you can’t get your boss on Facebook, just direct them to the blog post written by Croud: Four things I changed immediately after attending MozCon. Talk about actionable! A quick Google (or LinkedIn) search will return dozens of similar recaps. Gather a few of these to have in your tool belt just in case.
Or, if you have the time, pick out some of the event tweets from previous years that relate most to your company. The MozCon hashtag (#MozCon) has plenty of tweets to choose from — things like research findings, workflows, and useful tools are all covered.
The networking is unbeatable
The potential knowledge gain doesn’t end with keynote speeches. Many of our speakers stick around for the entire conference and host niche- and vertical-specific Birds of a Feather sessions. If you find yourself with questions about their strategies, you'll often have the ability to ask them directly.
Lastly, your peers! There's no better way to learn than from those who have overcome the same obstacles as you. Opportunities for collaboration and peer-to-peer learning are often invaluable, and can lead to better workflows, new business, and even exciting partnerships.
Step #2 - Break down the costs
This is where the majority of the conversation will be focused, but fear not, Roger has already done most of the heavy lifting. So let’s cut to the chase. The goal of MozCon isn’t to make money; the goal is to break even and lift up our friends in search. Plus, since it’s a virtual conference, the price is unbeatable! If you purchase a ticket before May 31, 2021, you'll get access to Early Bird pricing, and if you're a Moz subscriber, you'll get a $20 discount off General Admission!
You'll also have the option to save 15% if you bundle the ticket with either of Moz Academy's SEO certifications: Technical SEO or SEO Essentials.
Top-of-the-line speakers
Every year we work with our speakers to bring cutting-edge content to the stage. You can be sure that the content you’ll be exposed to will set you up for a year of success.
Videos for everyone
While your coworkers won’t be able to enjoy the live sessions, they will be able to see all of the talks via professional video and audio. Your ticket to MozCon includes a professional video package which allows you (and your whole team) to watch every single talk post-conference, for free.
Step #3 - Be prepared to prove value
It’s important to go into the conference with a plan to bring back value. It’s easy to come to any conference and just enjoy the presentations and events, but it’s harder to take the information gained and implement change.
Make a plan
Before approaching your boss, make sure you have a plan on how you're going to show off all of the insights you gather at MozCon! Obviously, you'll be taking notes — whether it’s to the tune of live tweets, bullet journals, or doodles, those notes are most valuable when they're backed up by action.
Putting it into action
Set expectations with your boss. "After each day, I'll select three takeaways and create a plan on how to execute them." Who could turn down nine potential business-changing strategies?!
And it really isn’t that hard! Especially not with the content that you'll have access to. At the close of each day, we recommend you look back over your notes and do a brain-dump.
How did today's content relate to your business?
Which sessions resonated and would bring the most value to your team?
Which strategies can easily be executed?
Which would make the biggest impact?
After you identify those strategies, create a plan of action that will get you on track for implementing change.
Client briefs
If you have clients on retainer, ongoing training for employees is something those clients should appreciate — it ensures you’re staying ahead of the game. Offer to not only debrief your in-house SEO team, but to also present to your clients. This sort of presentation is a value add that many clients don’t get and can set your business apart.
These presentations can be short blurbs at the beginning of a regular meeting or a chance to gather up all of your clients and enjoy a bit of networking and education.
Still not enough?
Give the boss a taste of MozCon by having them check out some videos from years past to see the caliber of our speakers.
Lastly, the reviews speak for themselves. MozCon is perfect for SEOs of any level, no matter where they're located!
Our fingers are crossed!
Alright, friend, now is your time to shine. We've equipped you with some super-persuasive tools and we'll be crossing our fingers that the boss gives you the "okay!" Be sure to grab the letter template and make your case the easy way:
Anyone who does SEO as part of their job knows that there’s a lot of value in analyzing which queries are and are not sending traffic to specific pages on a site.
The most common uses for these datasets are to align on-page optimizations with existing rankings and traffic, and to identify gaps in ranking keywords.
However, working with this data is extremely tedious because it’s only available in the Google Search Console interface, and you have to look at only one page at a time.
On top of that, to get information on the text included in the ranking page, you either need to manually review it or extract it with a tool like Screaming Frog.
You need this kind of view:
…but even the above view would only be viable one page at a time, and as mentioned, the actual text extraction would have had to be separate as well.
Given these apparent issues with the readily available data at the SEO community’s disposal, the data engineering team at Inseev Interactive has been spending a lot of time thinking about how we can improve these processes at scale.
One specific example that we’ll be reviewing in this post is a simple script that allows you to get the above data in a flexible format for many great analytical views.
Better yet, this will all be available with only a few input variables.
A quick rundown of tool functionality
The tool automatically compares the text on-page to the Google Search Console top queries at the page-level to let you know which queries are on-page as well as how many times they appear on the page. An optional XPath variable also allows you to specify the part of the page you want to analyze text on.
This means you’ll know exactly what queries are driving clicks/impressions that are not in your <title>, <h1>, or even something as specific as the first paragraph within the main content (MC). The sky's the limit.
For those of you not familiar, we’ve also provided some quick XPath expressions you can use, as well as how to create site-specific XPath expressions within the "Input Variables" section of the post.
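To make the core idea concrete, here is a minimal sketch of the kind of check described above: counting how many times a query appears in a piece of extracted text. The post does not show the tool's actual matching logic, so the function below is an assumption for illustration only. It matches whole phrases, ignores case, and treats any run of whitespace between words as equivalent.

```python
import re


def count_query_on_page(query, page_text):
    """Count case-insensitive, whole-phrase occurrences of a query."""
    words = query.lower().split()
    if not words:
        return 0
    # Words must appear in order, separated only by whitespace.
    pattern = r"\b" + r"\s+".join(re.escape(word) for word in words) + r"\b"
    return len(re.findall(pattern, page_text.lower()))


title = "Long Tail SEO: A Guide to Long-Tail Keywords"
in_title = count_query_on_page("long tail seo", title)
variant = count_query_on_page("long tail keywords", title)
print(in_title, variant)
```

Note that the hyphenated "Long-Tail Keywords" does not count as a match for "long tail keywords" under this strict rule. Whether close variants like that should count is a judgment call for your own analysis.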
Post setup usage & datasets
Once the process is set up, all that’s required is filling out a short list of variables and the rest is automated for you.
The output dataset includes multiple automated CSV datasets, as well as a structured file format to keep things organized. A simple pivot of the core analysis automated CSV can provide you with the below dataset and many other useful layouts.
… Even some "new metrics"?
Okay, not technically "new," but if you exclusively use the Google Search Console user interface, then you haven’t likely had access to metrics like these before: "Max Position," "Min Position," and "Count Position" for the specified date range – all of which are explained in the "Running your first analysis" section of the post.
To really demonstrate the impact and usefulness of this dataset, in the video below we use the Colab tool to:
[3 Minutes] — Find non-brand <title> optimization opportunities for https://www.inseev.com/ (around 30 pages in video, but you could do any number of pages)
[3 Minutes] — Convert the CSV to a more useable format
[1 Minute] – Optimize the first title with the resulting dataset
Okay, you’re all set for the initial rundown. Hopefully we were able to get you excited before moving into the somewhat dull setup process.
Keep in mind that at the end of the post, there is also a section including a few helpful use cases and an example template!
[Quick Consideration #1] —The web scraper built into the tool DOES NOT support JavaScript rendering. If your website uses client-side rendering, the full functionality of the tool unfortunately will not work.
[Quick Consideration #2] —This tool has been heavily tested by the members of the Inseev team. Most bugs [specifically with the web scraper] have been found and fixed, but like any other program, it is possible that other issues may come up.
If you encounter any errors, feel free to reach out to us directly at jmelman@inseev.com or info@inseev.com, and either myself or one of the other members of the data engineering team at Inseev would be happy to help you out.
If new errors are encountered and fixed, we will always upload the updated script to the code repository linked in the sections below so the most up-to-date code can be utilized by all!
One-time setup of the script in Google Colab (in less than 20 minutes)
Things you’ll need:
Google Drive
Google Cloud Platform account
Google Search Console access
Video walkthrough: tool setup process
Below you’ll find step-by-step editorial instructions in order to set up the entire process. However, if following editorial instructions isn’t your preferred method, we recorded a video of the setup process as well.
As you’ll see, we start with a brand new Gmail and set up the entire process in approximately 12 minutes, and the output is completely worth the time.
Keep in mind that the setup is one-off, and once set up, the tool should work on command from there on!
Editorial walkthrough: tool setup process
Four-part process:
Download the files from Github and set up in Google Drive
Set up a Google Cloud Platform (GCP) Project (skip if you already have an account)
Create the OAuth 2.0 client ID for the Google Search Console (GSC) API (skip if you already have an OAuth client ID with the Search Console API enabled)
Add the OAuth 2.0 credentials to the Config.py file
Part one: Download the files from Github and set up in Google Drive
2. Click on the "Get started for free" CTA (CTA text may change over time).
3. Sign in with the OAuth credentials of your choice. Any Gmail email will work.
4. Follow the prompts to sign up for your GCP account.
You’ll be asked to supply a credit card to sign up, but there is currently a $300 free trial and Google notes that they won’t charge you until you upgrade your account.
Part three: Create an OAuth 2.0 client ID for the Google Search Console (GSC) API
2. After you log in to your desired Google Cloud account, click "ENABLE".
3. Configure the consent screen.
In the consent screen creation process, select "External," then continue onto the "App Information."
Example below of minimum requirements:
Skip "Scopes"
Add the email(s) you’ll use for the Search Console API authentication into the "Test Users". These can include emails other than the one that owns the Google Drive, for example a client’s email that you use to access the Google Search Console UI to view their KPIs.
4. In the left-rail navigation, click into "Credentials" > "CREATE CREDENTIALS" > "OAuth Client ID" (Not in image).
5. Within the "Create OAuth client ID" form, fill in:
Application Type = Desktop app
Name = Google Colab
Click "CREATE"
6. Save the "Client ID" and "Client Secret", as these will be added to the config.py file in the "api" folder from the Github files we downloaded.
These should have appeared in a popup after hitting "CREATE"
The "Client Secret" is functionally the password to your Google Cloud (DO NOT post this to the public/share it online)
Part four: Add the OAuth 2.0 credentials to the Config.py file
1. Return to Google Drive and navigate into the "api" folder.
2. Click into config.py.
3. Choose to open with "Text Editor" (or another app of your choice) to modify the config.py file.
4. Update the three areas highlighted below with your:
CLIENT_ID: From the OAuth 2.0 client ID setup process
CLIENT_SECRET: From the OAuth 2.0 client ID setup process
GOOGLE_CREDENTIALS: Email that corresponds with your CLIENT_ID & CLIENT_SECRET
5. Save the file once updated!
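For reference, the three values might look something like the sketch below once filled in. The variable names come from the steps above, but the exact layout of config.py in the repository may differ, and every value here is a made-up placeholder. Only edit the lines that already exist in your copy of the file.

```python
# config.py (illustrative sketch only; all values are placeholders)

# OAuth 2.0 client ID from the "Create OAuth client ID" popup.
CLIENT_ID = "1234567890-example.apps.googleusercontent.com"

# OAuth 2.0 client secret from the same popup. Keep this private.
CLIENT_SECRET = "replace-with-your-client-secret"

# Email that corresponds with the CLIENT_ID and CLIENT_SECRET above.
GOOGLE_CREDENTIALS = "you@example.com"
```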
Congratulations, the boring stuff is over. You are now ready to start using the Google Colab file!
Running your first analysis
Running your first analysis may be a little intimidating, but stick with it and it will get easy fast.
Below, we’ve provided details regarding the input variables required, as well as notes on things to keep in mind when running the script and analyzing the resulting dataset.
After we walk through these items, there are also a few example projects and video walkthroughs showcasing ways to utilize these datasets for client deliverables.
Setting up the input variables
XPath extraction with the "xpath_selector" variable
Have you ever wanted to know every query driving clicks and impressions to a webpage that aren’t in your <title> or <h1> tag? Well, this parameter will allow you to do just that.
While optional, using this is highly encouraged and we feel it "supercharges" the analysis. Simply define site sections with Xpaths and the script will do the rest.
In the above video, you’ll find examples on how to create site specific extractions. In addition, below are some universal extractions that should work on almost any site on the web:
'//title' # Identifies a <title> tag
'//h1' # Identifies a <h1> tag
'//h2' # Identifies a <h2> tag
Site Specific: How to scrape only the main content (MC)?
Chaining Xpaths – Add a "|" Between Xpaths
'//title | //h1' # Gets you both the <title> and <h1> tag in 1 run
'//h1 | //h2 | //h3' # Gets you the <h1>, <h2>, and <h3> tags in 1 run
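If you'd like to see what a chained expression returns before running the full script, the sketch below approximates it with Python's standard library. This is not the tool's scraper, which the post does not show. ElementTree supports only a subset of XPath and has no "|" union operator, so the helper splits the selector on "|" and runs each part separately. It also needs well-formed markup, so real-world HTML would usually call for a more forgiving parser.

```python
import xml.etree.ElementTree as ET


def extract_text(markup, xpath_selector):
    """Return the text of every element matched by a chained selector."""
    root = ET.fromstring(markup)
    texts = []
    # Naive split: fine for simple selectors like '//title | //h1'.
    for expression in xpath_selector.split("|"):
        expression = expression.strip()
        # ElementTree wants a relative path, so '//h1' becomes './/h1'.
        if expression.startswith("//"):
            expression = "." + expression
        for element in root.findall(expression):
            texts.append("".join(element.itertext()).strip())
    return texts


page = (
    "<html><head><title>Long Tail SEO Guide</title></head>"
    "<body><h1>What Is Long Tail SEO?</h1><h2>Why it matters</h2></body></html>"
)
print(extract_text(page, "//title | //h1"))
print(extract_text(page, "//h1 | //h2 | //h3"))
```

One difference from a true XPath union: results here are grouped by expression rather than returned in document order.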
Other variables
Here’s a video overview of the other variables with a short description of each.
'colab_path' [Required] – The path in which the Colab file lives. This should be "/content/drive/My Drive/Colab Notebooks/".
'domain_lookup' [Required] – Homepage of the website utilized for analysis.
'startdate' & 'enddate'[Required] – Date range for the analysis period.
'gsc_sorting_field' [Required] – The tool pulls the top N pages as defined by the user. The "top" is defined by either "clicks_sum" or "impressions_sum." Please review the video for a more detailed description.
'gsc_limit_pages_number' [Required] – Numeric value that represents the number of resulting pages you’d like within the dataset.
'brand_exclusions' [Optional] – The string sequence(s) that commonly result in branded queries (e.g., anything containing "inseev" will be branded queries for "Inseev Interactive").
'impressions_exclusion' [Optional] – Numeric value used to exclude queries that are potentially irrelevant due to the lack of pre-existing impressions. This is primarily relevant for domains with strong pre-existing rankings across a large number of pages.
'page_inclusions' [Optional] – The string sequence(s) that are found within the desired analysis page type. If you’d like to analyze the entire domain, leave this section blank.
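Pulled together, a filled-in set of variables might look like the sketch below. The variable names come from the list above, but the value formats (date strings, lists versus single strings) and the example values are assumptions, so follow the format shown in the Colab file itself. The small helper illustrates how the two exclusion settings could be applied to a query row. It assumes queries with fewer impressions than the threshold are dropped, which the post does not spell out.

```python
# Hypothetical example values; check the Colab file for the real formats.
input_variables = {
    "colab_path": "/content/drive/My Drive/Colab Notebooks/",
    "domain_lookup": "https://www.inseev.com/",
    "startdate": "2021-01-01",
    "enddate": "2021-01-31",
    "gsc_sorting_field": "clicks_sum",  # or "impressions_sum"
    "gsc_limit_pages_number": 30,
    "brand_exclusions": ["inseev"],
    "impressions_exclusion": 10,
    "page_inclusions": [],  # empty means analyze the entire domain
    "xpath_selector": "//title | //h1",
}


def keep_query(row, variables):
    """Drop branded queries and queries below the impressions threshold."""
    query = row["query"].lower()
    if any(brand.lower() in query for brand in variables["brand_exclusions"]):
        return False
    return row["impressions"] >= variables["impressions_exclusion"]


rows = [
    {"query": "inseev interactive", "impressions": 500},
    {"query": "seo agency san diego", "impressions": 120},
    {"query": "rare long query", "impressions": 3},
]
kept = [row["query"] for row in rows if keep_query(row, input_variables)]
print(kept)
```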
Running the script
Keep in mind that once the script finishes running, you’re generally going to use the "step3_query-optimizer_domain-YYYY-MM-DD.csv" file for analysis, but there are others with the raw datasets to browse as well.
Practical use cases for the "step3_query-optimizer_domain-YYYY-MM-DD.csv" file can be found in the "Practical use cases and templates" section.
That said, there are a few important things to note while testing things out:
1. No JavaScript Crawling: As mentioned at the start of the post, this script is NOT set up for JavaScript crawling, so if your target website uses a JS frontend with client-side rendering to populate the main content (MC), the scrape will not be useful. However, the basic functionality of quickly getting the top XX (user-defined) queries and pages can still be useful by itself.
2. Google Drive / GSC API Auth: The first time you run the script in each new session it will prompt you to authenticate both the Google Drive and the Google Search Console credentials.
Google Drive authentication: Authenticate with whichever email is associated with the Google Drive that contains the script.
GSC authentication: Authenticate whichever email has permission to use the desired Google Search Console account.
If you attempt to authenticate and you get an error that looks like the one below, please revisit Part three, step 3 in the process above (configuring the consent screen) and make sure the email you are authenticating with has been added to the "Test Users".
Quick tip: The Google Drive account and the GSC Authentication DO NOT have to be the same email, but they do require separate authentications with OAuth.
3. Running the script: Either navigate to "Runtime" > "Restart and Run All" or use the keyboard shortcut CTRL + F9 to start running the script.
4. Populated datasets/folder structure: There are three CSVs populated by the script – all nested within a folder structure based on the "domain_lookup" input variable.
Automated Organization [Folders]: Each time you rerun the script on a new domain, it will create a new folder structure in order to keep things organized.
Automated Organization [File Naming]: The CSVs include the date of the export appended to the end, so you’ll always know when the process ran as well as the date range for the dataset.
5. Date range for dataset: Inside of the dataset there is a "gsc_datasetID" column generated, which includes the date range of the extraction.
6. Unfamiliar metrics: The resulting dataset has all the KPIs we know and love – e.g. clicks, impressions, average (mean) position — but there are also a few you cannot get directly from the GSC UI:
'count_instances_gsc' — the number of instances the query got at least 1 impression during the specified date range. Scenario example: GSC tells you that you were in an average position 6 for a large keyword like "flower delivery" and you only received 20 impressions in a 30-day date range. Doesn’t seem possible that you were really in position 6, right? Well, now you can see that was potentially because you only actually showed up on one day in that 30-day date range (e.g. count_instances_gsc = 1)
'max_position' & 'min_position' — the MAXIMUM and MINIMUM ranking position the identified page showed up for in Google Search within the specified date range.
Quick tip #1: Large variance in max/min may tell you that your keyword has been fluctuating heavily.
Quick tip #2: These KPIs, in conjunction with the "count_instances_gsc", can greatly improve your understanding of query performance and opportunity.
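To show how these three metrics relate to the underlying data, here is a minimal sketch that derives them from day-level rows. It assumes one row per page, query, and day, which matches the "showed up on one day" scenario above but is not confirmed as the script's exact method. The sample rows and the function name are invented for illustration.

```python
from collections import defaultdict

# Invented day-level rows for illustration.
daily_rows = [
    {"page": "/flowers", "query": "flower delivery", "date": "2021-01-05", "impressions": 20, "position": 6.0},
    {"page": "/flowers", "query": "send flowers online", "date": "2021-01-03", "impressions": 40, "position": 9.0},
    {"page": "/flowers", "query": "send flowers online", "date": "2021-01-04", "impressions": 35, "position": 14.0},
    {"page": "/flowers", "query": "send flowers online", "date": "2021-01-05", "impressions": 50, "position": 8.0},
]


def summarize_positions(rows):
    """Compute count/min/max position per (page, query) pair."""
    positions = defaultdict(list)
    for row in rows:
        # Only days with at least one impression count as an instance.
        if row["impressions"] >= 1:
            positions[(row["page"], row["query"])].append(row["position"])
    return {
        key: {
            "count_instances_gsc": len(values),
            "min_position": min(values),
            "max_position": max(values),
        }
        for key, values in positions.items()
    }


summary = summarize_positions(daily_rows)
for key, metrics in summary.items():
    print(key, metrics)
```

In this sample, "flower delivery" has a count of 1, which is the kind of signal that tells you an average position of 6 rests on a single day of visibility.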
Recommended use: Download file and use with Excel. Subjectively speaking, I believe Excel has a much more user friendly pivot table functionality in comparison to Google Sheets — which is critical for using this template.
Alternative use: If you do not have Microsoft Excel or you prefer a different tool, you can use most spreadsheet apps that contain pivot functionality.
For those who opt for an alternative spreadsheet software/app:
Below are the pivot fields to mimic upon setup.
You may have to adjust the Vlookup functions found on the "Step 3 _ Analysis Final Doc" tab, depending on whether your updated pivot columns align with the current pivot I’ve supplied.
Project example: Title & H1 re-optimizations (video walkthrough)
Project description: Locate keywords that are driving clicks and impressions to high value pages and that do not exist within the <title> and <h1> tags by reviewing GSC query KPIs vs. current page elements. Use the resulting findings to re-optimize both the <title> and <h1> tags for pre-existing pages.
Project assumptions: This process assumes that inserting keywords into both the <title> and <h1> tags is a strong SEO practice for relevancy optimization, and that it’s important to include related keyword variants into these areas (e.g. non-exact match keywords with matching SERP intent).
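If you'd rather shortlist candidates in code before opening a spreadsheet, the project logic can be sketched in a few lines. The column names below (including "count_in_title_h1") and the sample rows are hypothetical stand-ins, so map them to the actual columns in your exported CSV.

```python
def title_h1_opportunities(rows, min_clicks=1):
    """Return queries that earn clicks but never appear in <title>/<h1>."""
    missing = [
        row for row in rows
        if row["count_in_title_h1"] == 0 and row["clicks"] >= min_clicks
    ]
    # Highest-traffic gaps first: sort by clicks, then impressions.
    return sorted(missing, key=lambda row: (-row["clicks"], -row["impressions"]))


# Invented rows for a single page.
rows = [
    {"query": "long tail keywords", "clicks": 40, "impressions": 900, "count_in_title_h1": 0},
    {"query": "long tail seo", "clicks": 120, "impressions": 2000, "count_in_title_h1": 2},
    {"query": "long tail keyword research", "clicks": 15, "impressions": 700, "count_in_title_h1": 0},
    {"query": "what is long tail", "clicks": 0, "impressions": 50, "count_in_title_h1": 0},
]
opportunities = [row["query"] for row in title_h1_opportunities(rows)]
print(opportunities)
```

The click threshold is a design choice: raising it keeps the list focused on queries that already send meaningful traffic.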
Project example: On-page text refresh/re-optimization
Project description: Locate keywords that are driving clicks and impressions to editorial pieces of content that DO NOT exist within the first paragraph within the body of the main content (MC). Perform an on-page refresh of introductory content within editorial pages to include high value keyword opportunities.
Project assumptions: This process assumes that inserting keywords into the first several sentences of a piece of content is a strong SEO practice for relevancy optimization, and that it’s important to include related keyword variants into these areas (e.g. non-exact match keywords with matching SERP intent).
Final thoughts
We hope this post has been helpful and opened you up to the idea of using Python and Google Colab to supercharge your relevancy optimization strategy.
As mentioned throughout the post, keep the following in mind:
Github repository will be updated with any changes we make in the future.
There is the possibility of undiscovered errors. If these occur, Inseev is happy to help! In fact, we would actually appreciate you reaching out to investigate and fix errors (if any do appear). This way others don’t run into the same problems.
Other than the above, if you have any ideas on ways to Colab (pun intended) on data analytics projects, feel free to reach out with ideas.