Search Engine SEO Theory
SEARCH ENGINE OPERATION
Search engines operate in the following order:
1. Web crawling: performed by a “robot,” basically a mathematical algorithm (a set of conditions for solving a problem) built on if-then-else statements.
2. Indexing: viewing web pages in text format and caching them (saving them) to the search engine’s server.
3. Searching: when a search engine searches the “web,” it is actually only searching the pages stored on its own server, cross-referencing them against its algorithm using the keywords entered into the search box.
Google adds one more step to its algorithm; it basically adds or subtracts points based on the following arbitrary if-then-else criteria:
a. Meta Tags (see the first example after this list)
b. XHTML Tags (Alt, Title Tags, Long Title Tags)
c. Whether meta tag keywords appear more than 2x in content and whether hyperlinks incorporate keywords (anchored/linked text)
d. Where keywords appear in content
e. Whether short tail keywords appear within long tail keywords
Short Tail Keywords: between 2-5 words in a phrase, e.g., “Permissive Marketing”
Long Tail Keywords: between 5-10 words in a phrase, e.g., “Permissive Marketing and Advertising Companies”
Note: Adding words in between keywords within content does NOT affect those keywords; however, adding characters to the words themselves does affect them and will actually count as a separate keyword.
e.g., SEO Savvy Web Design Firm = SEO Web Design
SEO Firms does NOT = SEO Firm
f. Subpage names containing keywords from the index, e.g., phorsite.com/new-media.html
g. Dynamic Content and Syndication Services within Java, PHP or .NET, and RSS or Atom feeds. This may include something as simple as a Java news ticker.
h. Of course, PageRank and inbound links combined with d and e. For more on this, see our section on Inbound Links.
i. XHTML Content
j. robots.txt file with XML sitemap (see the second example after this list)
k. favicon.ico (included in the first example after this list)
l. Google Analytics Urchin (urchin.js) tracker (see the third example after this list)
m. Google’s new Universal Search: a matrix sorted by keyword criteria per IP address and cross-referenced with human-edited reviews through DMOZ, DIGG, and Yahoo!. This also includes Gmail subject matter, Google Desktop user history, and Google’s search within a cached search. Combating this other animal can be done through what we call SEO 2.0.
n. Duplicate content, code, or footprints. For these, they will deduct all of the points above. An SEO footprint is the imprint you leave on the internet that can be used to trace your activity through various sites. It can be used to locate multiple accounts and multiple sites you own. If you’re an SEO tech, this is important. A footprint is an obvious sign of search engine manipulation and can be used by Google or competitors to rip apart your network. If your site was originally designed by a company or person you contracted, it is more often than not a template, and in the case of templates the source code will be duplicate code. If the content was written by an SEO firm or web design firm, it will most definitely be duplicate content, as most of these companies have no desire to become copywriters.
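To make items a, b, c, and k concrete, here is a minimal sketch of a page head and a keyword-anchored link. The keywords, file names, and paths are hypothetical placeholders, not values from any particular site:
<head>
<!-- long title tag carrying the short tail keyword -->
<title>Permissive Marketing | SEO Savvy Web Design Firm</title>
<!-- meta tags read during indexing (hypothetical keywords) -->
<meta name="keywords" content="permissive marketing, seo web design firm" />
<meta name="description" content="An SEO savvy web design firm." />
<!-- the favicon from item k -->
<link rel="shortcut icon" href="/favicon.ico" />
</head>
<body>
<!-- anchored/linked text incorporating the keyword (item c) -->
<a href="/new-media.html" title="Permissive Marketing">Permissive Marketing</a>
</body>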
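For item j, the robots.txt file sits at the root of the domain and can point crawlers at an XML sitemap. A minimal sketch, assuming a hypothetical domain and directory:
User-agent: *
Disallow: /admin/
Sitemap: http://www.example.com/sitemap.xml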
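For item l, the Urchin tracker is installed by pasting Google’s script just before the closing </body> tag. A sketch of the classic snippet; the account number below is a hypothetical placeholder:
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-000000-0";
urchinTracker();
</script>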
XHTML
- XHTML elements must be properly nested
- XHTML elements must always be closed
- XHTML elements must be in lowercase
- XHTML documents must have one root element
Empty Elements Must Also Be Closed
Empty elements must either have an end tag or the start tag must end with />.
This is wrong:
A break: <br>
A horizontal rule: <hr>
An image: <img src="happy.gif" alt="Happy face">
This is correct:
A break: <br />
A horizontal rule: <hr />
An image: <img src="happy.gif" alt="Happy face" />
XHTML Elements Must Be In Lower Case
The XHTML specification defines that tag names and attribute names must be in lowercase.
This is wrong:
<BODY>
<P>This is a paragraph</P>
</BODY>
This is correct:
<body>
<p>This is a paragraph</p>
</body>
XHTML Elements Must Be Properly Nested
In HTML, some elements can be improperly nested within each other, like this:
<b><i>This text is bold and italic</b></i>
In XHTML, all elements must be properly nested within each other, like this:
<b><i>This text is bold and italic</i></b>
Note: A common mistake with nested lists is to forget that the inside list must be within <li> and </li> tags.
This is wrong:
<ul>
<li>Coffee</li>
<li>Tea
<ul>
<li>Black tea</li>
<li>Green tea</li>
</ul>
<li>Milk</li>
</ul>
This is correct:
<ul>
<li>Coffee</li>
<li>Tea
<ul>
<li>Black tea</li>
<li>Green tea</li>
</ul>
</li>
<li>Milk</li>
</ul>
Not sure how to add an ALT or TITLE to your HTML tags? Try these examples:
<img src="cafeteria.jpg" height="200" width="200" alt="UAHC campers enjoy a meal in the camp cafeteria" />
<table width="100" border="2" title="Henry Jacobs Camp summer 2003 schedule">
<a href="page1.html" title="HS Jacobs – a UAHC camp in Utica, MS">Henry S. Jacobs Camp</a>
<form name="application" title="Henry Jacobs camper application" method="" action="">
Web search engines work by storing information about many web pages, which they retrieve from the WWW itself. These pages are retrieved by a web crawler (also known as a spider), an automated web browser which follows every link it sees. Exclusions are made by the use of robots.txt, although Google certainly isn’t controlled by a robots.txt file or any coding in a website. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages are stored in an index database for use in later queries.
Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as AltaVista, store every word of every page they find. The cached page always holds the actual search text, since it is the version that was actually indexed, so it can be very useful when the content of the current page has been updated and the search terms are no longer in it. This problem might be considered a mild form of link rot, and Google’s handling of it increases usability by satisfying the principle of least astonishment: users normally expect the search terms to be on the returned pages. Increased search relevance makes these cached pages very useful, even beyond the fact that they may contain data that may no longer be available elsewhere.
When a user enters a query into a search engine (typically by using keywords), the engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document’s title and sometimes parts of the text.
Link rot is the process by which links on a website gradually become irrelevant or broken as time goes on, because websites that they link to disappear, change their content or redirect to new locations.
The phrase also describes the effects of failing to update web pages, so that they become out-of-date, containing information that is old and useless and that clutters up search engine results. This process most frequently occurs on personal web pages and is prevalent on free web hosts, because there is no financial incentive to fix link rot.
A recent enhancement to search engine technology is the addition of geocoding and geoparsing to the processing of the ingested documents being indexed, to enable searching within a specified locality (or region). Geoparsing attempts to match any found references to locations and places to a geospatial frame of reference, such as a street address, gazetteer locations, or to an area (such as a polygonal boundary for a municipality). Through this geoparsing process, latitudes and longitudes are assigned to the found places, and these latitudes and longitudes are indexed for later spatial query and retrieval. This can enhance the search process tremendously by allowing a user to search for documents within a given map extent, or conversely, plot the location of documents matching a given keyword to analyze incidence and clustering, or any combination of the two.
Below are examples of what we see through a browser like Firefox or Internet Explorer and what a search engine actually sees:
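As a minimal sketch of that contrast, consider this hypothetical page built from the camp examples above. The browser renders the image and visual layout; the search engine reads only the text, title, alt, and meta values:
<html>
<head>
<title>Henry S. Jacobs Camp</title>
<meta name="description" content="A UAHC camp in Utica, MS" />
</head>
<body>
<h1>Welcome to Henry S. Jacobs Camp</h1>
<img src="cafeteria.jpg" alt="UAHC campers enjoy a meal in the camp cafeteria" />
</body>
</html>
In text format, the search engine sees roughly this:
Henry S. Jacobs Camp
A UAHC camp in Utica, MS
Welcome to Henry S. Jacobs Camp
UAHC campers enjoy a meal in the camp cafeteria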