We at FirmCatalyst check this using three methods:
Check your WordPress Settings
First of all, we check whether our WordPress installation is accessible to crawlers at all. To do this, log in to the admin interface of your WordPress site and go to “Settings > Reading”. Make sure that under “Search engine visibility” the option “Discourage search engines from indexing this site” is not selected.
Meta Robots Checker from SEO Review Tools
Enter a URL of your choice in the input field. The tool shows you whether the page in question carries a noindex or nofollow directive. A noindex directive keeps the URL out of the index, while nofollow tells crawlers not to follow the links on the page.
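If you want to see what such a checker looks for, the following minimal Python sketch reads the robots meta tag from a piece of HTML. It does not show how the SEO Review Tools checker works internally; the names RobotsMetaParser and robots_directives as well as the sample HTML are illustrative assumptions.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list, e.g. "noindex, follow".
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in the given HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Sample markup, assumed for illustration only.
sample_html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
directives = robots_directives(sample_html)
print("noindex" in directives)  # True: this page asks not to be indexed
```

The sketch only inspects the HTML. A noindex instruction can also be sent as an HTTP header, which this example does not cover.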
Use the site:domain.de search query
To do this, open the Google search at google.de and enter the query “site:your-website.com” in the search field. You should now see a list of the URLs of your website that are indexed in the Google search.
Check error messages in the Search Console
The Search Console is Google’s hub for informing webmasters of any penalties, errors or other notices that may affect their site. The new version of the Search Console (end of 2019) also shows you how Google indexes your website and how fast its pages load, provided that you have linked the Search Console to your website.
This is possible if your domain has been verified using Google Analytics or the required Meta Tag.
Make sure that the registered property corresponds exactly to the version of your website that visitors actually reach. For example, if the primary version of your website is available at https://your-website.com, the property you enter in the Search Console should not be https://www.your-website.com. In such a case, the results would be distorted and you would not be able to access all data.
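The comparison can be sketched in a few lines of Python. This is only an illustration under the assumption that a property is identified by its protocol and host name; the function name same_property is made up for this example.

```python
from urllib.parse import urlsplit


def same_property(site_url, property_url):
    """Compare protocol and host name of two URLs (hypothetical helper)."""
    site, prop = urlsplit(site_url), urlsplit(property_url)
    return (site.scheme, site.hostname) == (prop.scheme, prop.hostname)


# The www and non-www versions count as different properties here.
print(same_property("https://your-website.com", "https://www.your-website.com"))  # False
print(same_property("https://your-website.com", "https://your-website.com/"))     # True
```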
Hint: If your website is not yet verified for the Google Search Console, follow the corresponding tutorial on growthwizard.de/yoast-seo-instellungen/.
Remove 404 Search Errors
404 errors are among the most harmful problems your website can have. The task of every search engine is to offer users the best possible answer to their search query. Search engines therefore keep adapting their algorithms to find the best possible result for the user.
If a user clicks on a search result and the page with the respective information can no longer be found, this is not only bad for the user, but also for you as a website operator and also puts the search engine in a bad light. In such a situation there would only be losers.
From the point of view of search engine optimization (SEO), there is another problem. Every website builds up backlinks over time. These backlinks are an indicator of quality and trust for search engines. You can imagine it like this: every URL of your website carries a score that reflects its quality. If the requested URL is no longer available and is not redirected properly, the trust it has built up is lost.
Therefore, it should be your job to always redirect URLs that return a 404 error to the correct target. For this purpose, there are various status codes that tell search engines what happened to the respective content.
- 301: Content was permanently redirected: This indicates that the content is now permanently located at a different URL.
- 307: Content was temporarily redirected: This indicates that the content is temporarily located at another URL.
- 410: Content permanently deleted: This indicates that the content has been permanently removed from the website.
- There are many more status codes: A list of these can be found in the Ryte Wiki: https://de.ryte.com/wiki/HTTP_Status_Code
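The decision behind these status codes can be sketched as follows. This is a simplified illustration and not the way a particular redirect plugin works; the function plan_response and the example paths are assumptions made up for this sketch.

```python
from http import HTTPStatus


def plan_response(path, moved, removed):
    """Pick a status code (and a redirect target, if any) for a requested path."""
    if path in moved:
        # Content lives permanently at a new URL: 301 plus the new location.
        return HTTPStatus.MOVED_PERMANENTLY, moved[path]
    if path in removed:
        # Content was deleted on purpose: 410 tells crawlers it will not return.
        return HTTPStatus.GONE, None
    # Nothing is known about the path: plain 404.
    return HTTPStatus.NOT_FOUND, None


# Hypothetical example data.
moved = {"/old-seo-guide/": "/seo-guide/"}
removed = {"/summer-sale-2018/"}

status, target = plan_response("/old-seo-guide/", moved, removed)
print(int(status), target)  # 301 /seo-guide/
```

A temporary move would be handled the same way as the permanent one, with 307 instead of 301.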
Matt Cutts (former Google employee) has described this problem in a YouTube video. There he explains why it is so important to pay attention to correct redirections and how to handle them.
Hint: You can find a list of the 404 errors of your website in the Search Console under “Index > Coverage > Excluded > Not Found (404)“.
To forward URLs correctly in WordPress, you can use plugins:
- Redirection: wordpress.org/plugins/redirection/
- Yoast SEO Premium ($79 / year): yoast.com/wordpress/plugins/seo/
Check URLs for the Noindex tag
Also in the Search Console (Index > Coverage > Excluded > Excluded by “noindex” tag), you will find a list of all URLs that contain a so-called NoIndex tag. This meta tag tells search engines that the corresponding URL should not be included in the search results.
It does no harm to check all of these URLs at regular intervals to see whether the pages in question should really be excluded from the index. Especially when several people are working on a website or plugins are used, the NoIndex tag may be set by mistake.
Check the location of your Sitemap.xml
A sitemap.xml is a list of all your URLs, images and content, including the time of the last modification. Large websites in particular benefit from a sitemap because it makes it easier for search engines to understand the structure of your website. For search engines, a sitemap is a guide to all the content of your website.
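To make the structure more tangible, here is a minimal Python sketch that reads the URLs and modification dates from a sitemap. It assumes a plain sitemap with url entries, not a sitemap index file that points to further sitemaps; the function name sitemap_entries and the sample content are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_entries(xml_text):
    """Return (url, last_modified) pairs; last_modified is None if missing."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


# Sample sitemap content, assumed for illustration only.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://your-website.com/</loc>
    <lastmod>2020-01-15</lastmod>
  </url>
  <url>
    <loc>https://your-website.com/blog/</loc>
  </url>
</urlset>
""".strip()

for loc, lastmod in sitemap_entries(SAMPLE):
    print(loc, lastmod)
```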
You can reference the sitemap of your website in the robots.txt as well as in the Search Console. This way, search engines know exactly where to find the sitemap.
With the help of the plugin “Yoast SEO” you can easily edit the robots.txt:
- Open the WordPress backend at “yourdomain.com/wp-admin/”.
- Navigate to “SEO > Tools > File Editor”.
- Create a “robots.txt”.
- Add the following entry: Sitemap: https://your-website.com/sitemap_index.xml
- Save robots.txt.
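Whether the entry has arrived in the file can be checked with a short script. The following Python sketch pulls all Sitemap entries out of the text of a robots.txt; the function name sitemap_urls and the sample content are assumptions made for this illustration.

```python
def sitemap_urls(robots_txt):
    """Return the URLs of all 'Sitemap:' lines in a robots.txt text."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Sample robots.txt content, assumed for illustration only.
sample_robots = """User-agent: *
Disallow: /wp-admin/
Sitemap: https://your-website.com/sitemap_index.xml
"""

print(sitemap_urls(sample_robots))  # ['https://your-website.com/sitemap_index.xml']
```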
If you want to add the Sitemap.xml in the Search Console, follow the steps below:
- Access the Search Console at search.google.com/search-console/about?hl=de.
- Navigate to: “Sitemaps > Add new sitemap“.
- Enter the URL of your sitemap (https://yourdomain.com/sitemap_index.xml) in the input field and confirm your entry.
Note: If you are not using Yoast SEO to create your sitemap, you may find your sitemap at “yourdomain.com/sitemap.xml“. This path is the most commonly used for sitemaps.
Check the status of your robots.txt
The robots.txt is an optional text file in the root directory of your website, which is usually accessible at “yourdomain.com/robots.txt”. This file is only relevant for crawlers and contains instructions on which URL paths of your domain may be crawled and which paths are excluded from crawling.
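Note for experienced webmasters: Google has no longer supported the “noindex” directive within robots.txt since 2019. Google advises blocking crawling with alternative instructions such as “Disallow: /path/” (the value is a path on your domain, not a full URL). To keep pages out of the index, it is recommended to use robots meta tags such as “noindex” or “nofollow”. A “dofollow” value does not exist, because following links is the default behavior.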
Any professional SEO audit tool should be able to check the robots.txt. If you do not have access to professional SEO tools, you can also use the free robots.txt checker from Ryte.
Ideally, your robots.txt for WordPress should be as minimal as possible. Below is a recommended structure of a robots.txt for WordPress:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yourdomain.com/sitemap_index.xml
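How a crawler reads the Allow and Disallow lines above can be sketched in Python. The sketch assumes the precedence that Google documents for robots.txt: the longest matching path wins, and Allow wins a tie. It deliberately ignores wildcards and separate user-agent groups, and the function names parse_rules and is_allowed are made up for this example.

```python
def parse_rules(robots_txt):
    """Return (directive, path) pairs for all Allow/Disallow lines."""
    rules = []
    for line in robots_txt.splitlines():
        key, sep, value = line.split("#", 1)[0].partition(":")
        key, value = key.strip().lower(), value.strip()
        if sep and key in ("allow", "disallow") and value:
            rules.append((key, value))
    return rules


def is_allowed(path, rules):
    """Apply the longest matching rule; without a match, crawling is allowed."""
    best = ("allow", "")
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > len(best[1])
        tie_for_allow = len(prefix) == len(best[1]) and directive == "allow"
        if longer or tie_for_allow:
            best = (directive, prefix)
    return best[0] == "allow"


robots_txt = """User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""
rules = parse_rules(robots_txt)

print(is_allowed("/wp-admin/", rules))                # False
print(is_allowed("/wp-admin/admin-ajax.php", rules))  # True
print(is_allowed("/blog/my-post/", rules))            # True
```

This is why the Allow line is useful: the admin area stays blocked, while the one file that themes and plugins may need for front-end requests remains reachable.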
Conclusion: Make it easy for crawlers!
Findability is one of the fundamentals of successful search engine optimization. There are many tags and problems that can hinder the crawlability of a website. We regularly work on websites of new clients who were not even aware that certain pages were excluded from indexing.