About this Paper
The following paper was made available to Techotopia by the kind permission of Jason Purdy, IT Manager at Journalistic. The paper was developed as the companion to a series of conference talks he gave on the subject of Search Engine Optimization (SEO).
Welcome to "Chopping Trees Loudly", an introduction to Search Engine Optimization (or SEO)
My name is Jason Purdy and I'm the IT Manager for Journalistic. We are a publisher in Durham; we publish two magazines and are involved in a current book project. We also operate an e-commerce store for one of the magazines, as well as websites that act as the online representation of our efforts. Our main focus is on our flagship magazine, QSR magazine, which stands for Quick Service Restaurants. Those are the fast food operations like McDonald's and Starbucks that I've seen here.
Session Introduction: "Chopping Trees Loudly"
There's a saying or riddle in the US: If a tree falls in a forest and no one's around, does it make a sound? That riddle comes to mind for me when I worry about search engine optimization because at Journalistic, we put a lot of effort into putting up webpages. Are they being heard?
One of the things I want to stress at the very beginning of this session is that there are two main approaches to search engine optimization: Good and Bad. Or White Hat and Black Hat, if you prefer. Your approach may reflect your own internal moral compass: are you looking to "game the system" and cut corners, or do you simply want to make sure your site appears properly? For example, if you operated a site about Linux servers, would you try to also attract searches for Windows? Would you add "Windows" in your keywords tag? Matt Cutts, a prominent Google employee, recently blogged that people who try to cut corners probably do this in other areas of their life and that it does catch up with them (he had found several examples of people landing in jail for fraud).
I strongly urge you to pursue better rankings in an ethical manner, with the altruistic goal of creating a good website experience for your users. You will be rewarded over time with more trust and thus, more traffic.
Things not to do
There are several things you should avoid if you don't want to fall on the wrong side of Google. With Google's dominance in Germany, you do not want to end up blacklisted. Google actually rewards doing the right thing by increasing your PageRank and bringing you more traffic. Let's quickly go over the serious no-no's, all of which could land you in hot water and possibly blacklisted:
Cloaking
Cloaking covers a wide range of bad practices. Your server could look at the visitor's IP or User Agent string to determine if it is a search engine crawler and then provide different content. You could hide content on a page in a hidden
Link Farms and Paid Links
We get unsolicited emails all the time following a similar format. The email subject usually says something like "I like your site" or "Fwd: Your site." In the email, they usually pick a particular page on our site and ask us to add some HTML: a link and the actual link text. Sometimes it's more generic, where they want a link to their site anywhere on our site. All of this is in exchange for a link on their site. I would look at these very skeptically, if not toss them altogether. If you want to consider the link, take a look at where the link goes. Do you want your users going there? What is their PageRank? Where are they linking to you, and is that a page of 100+ links, where you'll hardly be noticed?
More egregious (to Google) is when you're offered money to link to a site and you do this as part of a scheme to pass PageRank. Google isn't saying you can't take money to post a link; they're saying you need to designate it as such. In the anchor tag, add a rel="nofollow" attribute. You can also redirect the link to an intermediate page that is blocked by the robots.txt file.
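As a minimal sketch of the rel="nofollow" approach just described, the Python helper below renders a paid link with the attribute set. The function name paid_link and the example URL are hypothetical and chosen only for illustration; they are not part of any Google tool.

```python
from html import escape


def paid_link(url: str, text: str) -> str:
    """Render an anchor tag for a paid link, marked rel="nofollow".

    Both the URL and the link text are HTML-escaped so the result is
    safe to drop into a template.
    """
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), escape(text)
    )


# Hypothetical sponsor link, for illustration only.
print(paid_link("http://www.example.com/?a=1&b=2", "Example Sponsor"))
```

Generating paid links through one helper like this makes it harder to forget the attribute on an individual page.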
If you have a quality website, you really don't need to pursue links, at least in the way I just described. They'll just come naturally. Let's say that for your Linux server website, you post a new section on Xen virtualization. Google will pick that up pretty quickly (more on that later) and people who follow either Linux servers or Xen virtualization will be alerted to your content. Bloggers usually subscribe to topic-appropriate Google alerts, so they'll know about your new content and come check it out. If it's worthy, they'll post about it and create the links for you.
There are appropriate links you should pursue: Wikipedia (free), Yahoo! Directory ($300 USD) and the Open Directory Project (free).
Wikipedia is easy to get into and you should consider relevant areas to add links to your site. Consider adding content at the same time. For example, following the Xen example, you might find a way to add a one or two sentence line to the copy and attribute or reference it to a page in your new Xen section. Wikipedia doesn't help with PageRank, as all of their links are NOFOLLOW's, but we find that Wikipedia drives a lot of referred traffic to our site and along with that, awareness of our offerings.
The Yahoo! Directory is a paid-inclusion directory, so it's not free, but that keeps the directory limited to legitimate sites and that brings trust & legitimacy to your site.
The Open Directory Project is free, but not easy to get into. It's broken into topical sections and each section is tightly controlled by editors. In our experience with the ODP, it took a long time and a lot of effort to get included. Be patient and persevere.
Before September 2007, Google had a "Supplemental Index", which made it clear that some of your content was regarded as redundant or less relevant. They have since improved their indexing speed and gotten rid of the label as a result. With our QSR magazine site, back in the days of the Supplemental Index, we saw a lot of our content in there that was the printer-friendly version of an article or the article under our SSL (the https:// protocol prefix) server. We reacted by either adding those URLs to our robots.txt file to block them or adding the NOINDEX meta tag to the appropriate templates.
Those types of examples are benign, and if they are left unaddressed, Google claims to do a good job of determining the proper version of the content to index.
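To make the robots.txt approach concrete, here is a small Python sketch that uses the standard library's urllib.robotparser to check whether a rule blocks the URLs you intend it to. The /print/ path, the example.com URLs and the helper name is_crawlable are assumptions made for illustration; they are not our actual site layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks printer-friendly copies of articles.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

# The alternative mentioned above: a NOINDEX meta tag placed in the
# <head> of the template that renders the duplicate version.
NOINDEX_META = '<meta name="robots" content="noindex">'


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "http://www.example.com/articles/xen.html",
    "http://www.example.com/print/xen.html",
):
    print(url, "crawlable" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

Keep in mind that a robots.txt file applies to the host and protocol it is served from, so an SSL version of the site would need its own file.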
The bad type of example is creating multiple versions of the content with different SEO approaches in the hopes of manipulating the rankings. Say you take your Xen virtualization article and create two different files: xen_virtualization.html and virtualization_with_xen.html. If they're the same content, Google will pick one, but more importantly, if Google determines that this approach is a deliberate attempt to game the system, they may delist your site.
With the Supplemental Index gone, we're left to use the Webmaster Tools and Google Analytics to determine what content is less relevant. For example, if you submit a Sitemap through the Webmaster Tools, you'll see a Sitemap Summary where you have a number of Total URLs in the Sitemap and then a number of Indexed URLs. You want equality, or as close to it, between those two numbers. Look at the 'Content Analysis' reports under the 'Diagnostics' menu to help identify what's not being indexed.
This probably goes without saying, but having a webpage with more than 100 links isn't valuable to a user and raises a red flag to Google that you're a link farm. If you have a lot of links to give (such as a site map), consider categorizing and paginating them.
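One rough way to audit this is to count the links on a page and split an oversized list into pages. The Python sketch below uses only the standard library; the names LinkCounter, count_links and paginate are hypothetical, and the limit of 100 simply mirrors the rule of thumb above rather than any published threshold.

```python
from html.parser import HTMLParser

MAX_LINKS = 100  # rule of thumb from the text, not an official limit


class LinkCounter(HTMLParser):
    """Collect the href of every anchor tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def count_links(html: str) -> int:
    """Return the number of linked anchors found in an HTML string."""
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return len(counter.links)


def paginate(links, per_page=MAX_LINKS):
    """Split a long list of links into pages of at most per_page items."""
    return [links[i:i + per_page] for i in range(0, len(links), per_page)]


sample = '<a href="/a">A</a> <a name="top">anchor only</a> <a href="/b">B</a>'
print(count_links(sample), "links found")
```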
Things to do
QA your site
This also goes for pop-up windows or Flash-based sites, which crawlers will have problems spidering and may miss your content.
A sitemap isn't really intended for web browsers or real people, but it's a way to help guide a crawler through your site to make sure it doesn't miss anything, as well as give it a sense of how you prioritize your content. You could easily spend a couple of days getting it set up and going, but Google offers a generator script (written in Python) that you could install on your server to help construct and reconstruct your sitemap as you make changes. If you're starting from scratch, look to see if your site software already has support for maintaining a sitemap automatically. At QSR magazine, we had our content all over the place: static files, PHP files, db-driven templates and Perl web applications. So we used the Google generator tool to bind it all together.
There's a place in the sitemap specification to specify a priority for a webpage. When I first read that, I was excited and thought this was a way to outrank our competition. It's not. It's a way to prioritize results specifically and only for your site. It also helps tell the crawlers how frequently to index your page. Basically, it's a way to lay out your site and how important its various aspects are.
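For readers who have not seen one, the following Python sketch builds a tiny sitemap with a priority and a change frequency for each page. It is an illustration only, not the Google generator script: the build_sitemap function and the example.com pages are hypothetical, while the element names and the namespace come from the sitemaps.org protocol (priority ranges from 0.0 to 1.0; changefreq takes values such as daily, weekly or monthly).

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of dicts with loc, changefreq, priority."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        ET.SubElement(url, "changefreq").text = page["changefreq"]
        # Priority is relative to the other pages on the same site only.
        ET.SubElement(url, "priority").text = "{:.1f}".format(page["priority"])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
pages = [
    {"loc": "http://www.example.com/", "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/xen/", "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://www.example.com/about.html", "changefreq": "yearly", "priority": 0.3},
]
print(build_sitemap(pages))
```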
RSS feeds are important, especially if you update your site's content frequently. MRSS is Media RSS, useful for videos or podcasts. You can have as many RSS feeds as you want. At QSR magazine, we have 25, which allows our users to subscribe to a specific topic-related section, a feature-related section (a favorite columnist, for example) or the site as a whole. Once you get them going, you can add direct links to the .rss file and modern browsers will take care of the rest. Depending on your audience, you may want to add an explanation page as well as helper links to add the RSS feed to a web-based reader, like Google Reader, My Yahoo!, Newsgator, Rojo or others. You can also add a link to your RSS file in the head of your HTML, as a <LINK> tag. This adds the commonly seen RSS icon in the address bar of the browser.
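As a rough sketch of what a feed and its <LINK> tag look like, the Python below builds a minimal RSS 2.0 document and the matching tag for the head of a page. The helper names build_rss and feed_link_tag, the feed title and the example.com URLs are all hypothetical; a real feed would normally carry more fields, such as publication dates and descriptions for each item.

```python
import xml.etree.ElementTree as ET
from html import escape


def build_rss(title, link, description, items):
    """Return a minimal RSS 2.0 feed as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")


def feed_link_tag(feed_url, title):
    """Return the <link> tag that advertises a feed in a page's head."""
    return '<link rel="alternate" type="application/rss+xml" title="{}" href="{}">'.format(
        escape(title, quote=True), escape(feed_url, quote=True)
    )


# Hypothetical feed, for illustration only.
feed = build_rss(
    "Example Site: Xen",
    "http://www.example.com/xen/",
    "New articles about Xen virtualization",
    [{"title": "Getting started with Xen", "link": "http://www.example.com/xen/start.html"}],
)
print(feed)
print(feed_link_tag("http://www.example.com/xen/feed.rss", "Example Site: Xen"))
```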
Submit your site
This is the easiest step. Once you're ready for the world to know about you, if they don't already, you can submit your site to Google, Yahoo and Microsoft's Live Search. These are all free and there's no need to pay for submitting your site. From time to time, we'll get unsolicited emails or actual physical mail for site submission services and it's just not necessary. Those are scam artists trying to take advantage of people who don't know better. Also, there are plenty of other search engines out there besides Google, Yahoo and Microsoft, but you shouldn't worry about them. They make up less than 10% of the market and it's likely that if you get into the top 3, you'll automatically be found by the rest.
Google Analytics, Google Webmaster Tools, Yahoo Site Explorer
With Google Analytics, pay attention to the obvious metrics (visitors, page views, time on site), but also look at the entrance pages other than the home page and how people are finding them. Again, I could spend an hour just on Analytics, but instead, I'll just refer you to the resources.
It's also important to set a goal or integrate any e-commerce activities for your site. With QSR magazine, we have a goal of subscribing to the magazine. You can also associate a value with that goal. Again, for QSR magazine, subscribing online means we don't have to spend money on telemarketing, which costs us about $5/person. So $5 is our goal value. For Fine Books & Collections magazine, we integrate our e-commerce activities. Once these types of settings are in place, you can work backwards and see what drives the most goals or money to your site and then work to optimize those channels or improve the others.
Google Webmaster Tools is important because that's where you can submit your Sitemap and see the numbers I referred to earlier: the number of URLs in your Sitemap and the number of URLs indexed. You can also see the top search queries that you showed up for as well as the top queries that led to a click. The tools also give you content analysis that helps identify duplicate content (duplicated titles, meta description tags, etc.). The tools also show you any errors that Googlebot experienced, a way to remove content from the Google index and a testbed to help refine your robots.txt file. Lots of great stuff there.
Yahoo Site Explorer doesn't offer quite the same set of tools, but the one neat report it does offer is the set of inlinks, which helps you identify how popular your site is out there, and that is one of the metrics that determines your PageRank. Click on the 'Explore' button next to your site. Then click 'Inlinks'. Change the first dropdown to 'Except from this domain' and change the second dropdown to 'Entire Site'. Now you can see who's setting up links to anywhere on your site (not just your homepage). Track the number over time, too.
You could spend time emailing people/site-owners you don't know (cold-calling), but that doesn't seem the best way to spend your time and energy. Instead, approach the problem as how to get your site in front of the site-owners in their natural habitat of searching for stories. Issue a press release. Post your story to del.icio.us and associate it with relevant keywords. Find out what social networks, if any, your audience uses and post your story there, too. Use stock tickers in your story so that financial wires pick it up and show it on financial sites' pages for related companies.
One of the benefits of having an RSS feed is that people may pick it up and put it on their own site as a source of stories. This is not a bad thing because the clicks still come to your site. One of the initiatives we're considering next at QSR magazine is using our contacts within specific restaurant chains to see if they would be interested in a specific RSS feed that they could incorporate into their portal.
When your site expands beyond 10 pages or so, it's time to develop a site search. It's amazing how the search box has become the user interface, thanks to Google. We put a lot of effort into our site navigation only to see people come to the home page and if they don't see what they're looking for, they use the search. We keep a log of what is searched and this gives us valuable insight into what our audience is looking for. That gives us ideas for how to organize our content or what new content we should focus on.
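Mining that search log can be as simple as counting terms. The Python sketch below assumes a hypothetical log with one raw query per line (your site search may store queries differently) and normalizes case and whitespace before counting; the function name top_searches is likewise made up for this example.

```python
from collections import Counter


def top_searches(log_lines, n=10):
    """Return the n most common search terms as (term, count) pairs.

    Queries are lowercased and runs of whitespace are collapsed so that
    "Xen" and " xen " count as the same term. Blank lines are ignored.
    """
    terms = (" ".join(line.lower().split()) for line in log_lines)
    counts = Counter(term for term in terms if term)
    return counts.most_common(n)


# Hypothetical log entries, for illustration only.
log = ["franchise fees", "Drive-thru", "drive-thru ", "menu  labeling", "drive-thru"]
for term, count in top_searches(log, 3):
    print(count, term)
```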
This is probably the most important aspect of SEO and it was something we didn't really get until we made some initial SEO efforts only to see them not reap the rewards we expected. For example, we had a new section for our site that was similar to a phonebook, but with listings of suppliers and vendors for the restaurant industry. So we initially set up SEO for the keyword 'restaurant supplier'. At one point, the section was the #8 result in Google, #5 in Yahoo and #4 in MSN. Good, right? We weren't seeing the traffic we were expecting, though. We later discovered the Google Keyword Tool: plug in a keyword and it gives you a rough estimate of search volume as well as comparable keywords. That's when we found out that more people were searching for 'restaurant supply' or actually, 'restaurant equipment' than 'restaurant supplier'.
So don't chase or put a lot of effort into low-value keywords. This might also give you insight into how best to frame or evolve your content.
Be the first or be more thorough/comprehensive
Frequently, the question is how can I outrank our competition for <INSERT KEYWORD HERE>? The answer is pretty simple: be first or be more thorough/comprehensive. Provide the best user experience you can in regards to that keyword and it will take care of itself.
You should seriously consider offering video content on your site. It's easy to do, between modern digital cameras taking great video and free video editing software to put it all together. There are plenty of free hosting solutions so your video audience won't affect your server bandwidth. For consumer audiences, look at YouTube or Google Video. For business audiences, look at Google Video or Brightcove. When the video is hosted on those sites, you can keep it private, so it only shows on your site, but I encourage you to let it be shown on those sites as well as embedded elsewhere, just as long as your video is watermarked or branded, has URL displays and has the appropriate copyright protection.
When you publish webpages that contain a video, you need to pay more attention to the meta information than you would with a regular webpage: the page's title, the meta keywords/description tags and the filename of the video. Video files can have meta information themselves, too. Much like the ID3 information for MP3 files, you can input the metadata into the video file when you encode it. While crawlers don't seem to be using that metadata yet, they could in the future. If you publish an MRSS file, that can also contain important metadata about your video.
Look into publishing a transcription of the video file and linking to it from the video webpage. A transcription would give the crawlers more information about the video.
I included a copy of the checklist we use at QSR. It's what you might call a pre-flight checklist. Pre-flight because you want to make the best first impression of your webpage before the crawlers show up. It might be a while before they come back to pick up any changes or fixes.
Know your competition
Competitive analysis is also useful. With QSR, our potential advertisers often bring up other sites they're interested in advertising on. So we use competitive analysis to have convenient counterpoints to whatever site they mention. Staying on top of your competition can also extend to SEO if you know your top priority keywords and who else is in the top 20 search results. Take a monthly snapshot of their rankings on Quantcast, Ranking and Alexa, their PageRank and their number of inlinks. Take a look at the sample Keyword Difficulty report from SEOMoz to see what kind of metrics they use in their report.
Evolve and Experiment
Google may seem to be very rigid with their rules and guidelines, but in our experience with them, we have yet to be blacklisted and we've done a few no-no's, albeit accidentally and ignorantly. Once we were informed, we made the corrections to our site and if we had been blacklisted, we could have submitted a request for reentry and explained our situation. Looking at others who were blacklisted, they were able to make the fixes and get reincluded (at the previous PageRank, too). If you find yourself in a vicious loop of Google auto-responding with an ambiguous message to your reinclusion request (something to the extent of "You're still in violation of the Guidelines.") and you're clueless as to the specifics, get more eyes on your site. Post a friendly request on the Google Webmaster Group for a peer review.
Proper SEO techniques are not hard to understand, but they are hard to achieve consistently. This is why having a checklist helps: you won't have to re-research all of the aspects of SEO and possibly miss something. SEO is also not something that can be achieved overnight. It can consume a lot of time, so if SEO is a really high priority, consider a dedicated position on your staff or outsourcing it.
So that's a very fast overview of SEO from a very high level. There's a whole lot more to the world of SEO and I've included some further information in my paper's resources section. That said, I feel confident that I've given you enough of an overview to determine whether or not you need to jump further into it. Some companies have staff dedicated just to SEO. Others may outsource it to SEO firms/consultants. You could spend a lot of time just on SEO.
So how much time can you afford to spend on SEO, looking at the big picture?
Another question you should ask yourself is how dependent on search engines you want to be. Once people find you, you'd like them to bookmark your site and start coming to you directly. Or you'd like to spread out the network of sites that link to yours. Look at the Traffic Sources Overview report in Google Analytics. If search engines are generating 70% or more of your traffic, perhaps that's a sign that you need to improve the quality and relevance of your site such that people bookmark it or post a link to it.
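The 70% check is simple arithmetic on the visit counts from that report. In the Python sketch below, the visit numbers are invented for illustration, the "search" key and the function name search_share are assumptions about how you might record the figures, and the 0.70 threshold is just the rule of thumb above.

```python
SEARCH_DEPENDENCE_THRESHOLD = 0.70  # rule of thumb from the text


def search_share(visits_by_source):
    """Return the fraction of all visits that came from search engines."""
    total = sum(visits_by_source.values())
    if total == 0:
        return 0.0
    return visits_by_source.get("search", 0) / total


# Invented numbers, for illustration only.
visits = {"search": 7200, "direct": 1500, "referral": 1300}
share = search_share(visits)
print("Search engines drive {:.0%} of traffic".format(share))
if share >= SEARCH_DEPENDENCE_THRESHOLD:
    print("Consider working on bookmarks and inbound links.")
```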
- Google's Webmaster Guidelines: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769
- Yahoo! Directory: http://dir.yahoo.com/
- Open Directory Project: http://www.dmoz.org/
- Matt Cutts' Google/SEO Blog: http://www.mattcutts.com/blog/type/googleseo/
- Matt Cutts' post on cutting corners: http://www.mattcutts.com/blog/traffic-power-ceo-in-jail/
- Google Webmaster Blog: http://googlewebmastercentral.blogspot.com/
- Google Webmaster Group: http://groups.google.com/group/Google_Webmaster_Help
- Google Sitemap Generator: http://code.google.com/p/sitemap-generators/downloads/list
- Google Keyword Tool: https://adwords.google.com/select/KeywordToolExternal
- MRSS Information: http://search.yahoo.com/mrss
- SEOMoz Keyword Difficulty Tool: http://www.seomoz.org/keyword-difficulty
- Beginner's SEO Checklist: http://www.seomoz.org/blog/the-beginners-checklist-for-small-business-seo
- gautility script: http://sourceforge.net/projects/gautility/
- Google Analytics Guru Blog: http://www.kaushik.net/avinash/