No, I am not talking about the infamous Fail Whale. The big news is that the biggest fail of the last 5 years looks like it has just been fixed, but not thanks to Twitter’s efforts.
Twitter is huge; this is not news to anyone. A site:twitter.com search in Google returns 1,750,000,000 results. Yup, that’s close to 2 BILLION. Yet, most of those results are actually non-existent pages.
Yes, you heard me right. Google keeps in its index close to 2 billion non-existent pages from one domain alone. How come? Let’s look at the typical URL structure of a Twitter user profile:
http://twitter.com/#!/username
Now, what kind of URLs do we see in the aforementioned SERPs for the site:twitter.com query in Google? Something like:
http://twitter.com/username
Notice the difference? The “/#!” part is missing. In fact, it is not even possible to figure out whether Google has indexed a single URL containing the “/#!” bit: Google ignores these symbols in queries, so searching for site:twitter.com inurl:/#!/ just won’t produce any results different from site:twitter.com.
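To make the difference between the two URL forms concrete, here is a minimal Python sketch that splits each one into its path and its fragment (the part after the “#”). The helper name split_profile_url is made up for illustration, and “username” is just the placeholder used above. It only shows how the URLs are structured, not how Google actually processes them.

```python
from urllib.parse import urlsplit


def split_profile_url(url):
    """Return the (path, fragment) pair of a URL."""
    parts = urlsplit(url)
    return parts.path, parts.fragment


# New-style profile URL: everything after "#" lands in the fragment.
print(split_profile_url("http://twitter.com/#!/username"))
# Old-style profile URL: the username is part of the path.
print(split_profile_url("http://twitter.com/username"))
```

For the new-style URL the path is just “/” and the username lives entirely in the fragment, which a client does not send to the server as part of an HTTP request. For the old-style URL the username is in the path itself.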
Where did the whole issue come from? Some of you may remember that the new URL structure for user profiles came to exist over a year ago – for some time afterwards, it was still possible to switch back to the old (less-Ajaxy) interface preserved under the old URL. Then, the old interface was killed and all old URLs were redirected to the new ones.
Only, Twitter has never got the redirects right. They use 302 (a temporary redirect) instead of 301 (a permanent one)! Here is a 2007 blog post by Google’s John Mueller detailing what Twitter’s got wrong and how it should be fixed. Did they ever fix their redirects? No! Do they think they are too good for SEO? Heck, even CNN has an SEO, and did you ever think CNN should care much about search engines?
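If you want to check for yourself what kind of redirect a URL returns, a rough Python sketch along these lines should do. The function names classify_redirect and fetch_redirect are my own, not part of any library, and the sketch assumes a plain HEAD request is enough to see the redirect. It relies on the fact that http.client does not follow redirects, so the first status code comes back untouched.

```python
import http.client
from urllib.parse import urlsplit

PERMANENT = {301, 308}        # "moved for good"
TEMPORARY = {302, 303, 307}   # "moved for now"


def classify_redirect(status):
    """Label an HTTP status code as a permanent or temporary redirect."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return "not a redirect"


def fetch_redirect(url, timeout=10):
    """Send one HEAD request and return (status, Location header).

    Redirects are not followed, so the status is the one the
    server sent for this exact URL.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling fetch_redirect("http://twitter.com/username") and passing the returned status to classify_redirect would tell you which kind of redirect the server answers with at the time you run it. What it returns today may well differ from what is described here.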
Until recently, this profile URL redirect issue used to cause some serious trouble with the cached versions of the corresponding pages in Google: all of them appeared as “that page does not exist”.
Lately, however, Google got better at indexing and caching their 302 redirects so the cache screenshots look better. But it was due to Google’s own action only, not Twitter’s. Are the folks at Twitter THAT blind and deaf?