WordPress Pages Returning 404 Page Not Found Headers
Have you ever wondered why some WordPress-controlled pages do not get indexed by Google? Assuming your .htaccess file and robots.txt are not to blame, the page may look normal on the surface while Google still cannot see it under the hood.
Recently, we had the pleasure of migrating a static website to WordPress. The objective was to migrate each page in small sections so that there was minimal impact on the original layout and design of the web page. This meant that some static pages had to use the WordPress header include file in order to inherit the chosen template files. After deployment, Google Webmaster Tools reported that some pages could not be found. But when we checked those pages in a browser, they rendered properly.
What we discovered was that WordPress still looks for a post when the header include is loaded. If it fails to find one, it returns a 404 status to the agent but continues to render the rest of the web page. To a human the site appears normal, but Googlebot sees a 404 Page Not Found, which prevents the page from being indexed.
You can find out what header information your web pages are sending back to the browser by using an HTTP sniffer tool from http://www.web-sniffer.com. Enter the URL of the web page and hit submit. Scroll down to the HTTP Response Header section and look for the “Status”. If the Status is 404, this tells Googlebot that the page does not exist and it will not index that page.
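If you would rather check from a script than from a web page, the same status can be read with PHP's built-in get_headers() function. The following is only a minimal sketch: final_status_code() is a hypothetical helper name of our own, not part of WordPress, and the sample header lines are canned so the snippet runs without network access.

```php
<?php
// Hypothetical helper (not part of WordPress): pull the final status code
// out of header lines shaped like the ones PHP's get_headers() returns.
function final_status_code(array $headerLines): ?int
{
    $code = null;
    foreach ($headerLines as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found". After redirects
        // there can be several of them, so the last one wins.
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code; // null if no status line was found
}

// Canned example input, standing in for a real get_headers() result.
$sample = [
    'HTTP/1.1 404 Not Found',
    'Content-Type: text/html; charset=UTF-8',
];
echo final_status_code($sample), "\n"; // prints 404
```

Against a live page you would pass in the result of get_headers() for your own URL. Note that get_headers() returns false when the request fails, so check for that before calling the helper.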
To fix this, send a 200 status back to the agent immediately after loading the WordPress header, as follows:
/** Loads the WordPress Environment and Template */
require($_SERVER['DOCUMENT_ROOT'].'/wp-blog-header.php');
header("HTTP/1.1 200 OK");
header("Status: 200 All Good");
Then test it with the web sniffer tool to validate that the Status now shows 200. You can also use Google Webmaster Tools “Fetch as Googlebot” to validate the page. Go through all of the pages that returned a 404 and apply the fix using the method above.
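One caveat: PHP can only change the status while no output has been sent, so if a page still reports a 404 after the fix, check whether something was echoed before the header() calls. The sketch below shows one way to guard against that. force_ok_status() is a hypothetical helper name of our own, and it uses PHP's built-in http_response_code() in place of the raw header() call.

```php
<?php
// Hypothetical helper: override the status only while headers can still change.
function force_ok_status(): bool
{
    if (headers_sent($file, $line)) {
        // Too late: output has begun, so the status line has already gone out.
        error_log("Cannot reset status; output started at $file:$line");
        return false;
    }
    http_response_code(200); // sets the response status to 200
    return true;
}
```

You would call force_ok_status() on the line right after the require of wp-blog-header.php. A false return value means the page has to be rearranged so that nothing is printed before that point.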