Some time ago I explained on my blog what SEO is, for anyone who is starting to ask questions about how to position their website in search engines. Well, today I want to go a step further and explain, as thoroughly as possible, the process a web page goes through to appear in a search engine like Google.
For many people Google is synonymous with the Internet, something due to how users' browsing habits have evolved. Oddly enough, before search engines such as Yahoo and Google became widespread, people memorized web addresses in order to visit sites. Nowadays users go straight to the search engine and type a query to find what they are looking for.
But before a web page can appear in the results for that query, it must first pass through the three phases that search engines carry out: crawling, indexing and ranking. Let's look at what each of these phases consists of.
To discover content, search engines use so-called "bots" (robots): computer programs that comb the World Wide Web looking for new content to offer users. How do they do this?
By following links: these robots discover all the content of each linked website. For every link discovered (note: anything that looks like a link is treated as one), the following process is performed:
- Read all the content of the page in question
- Collect all the links on that page
- Schedule new crawl operations for the pages those links point to
This process is recursive: the crawler retrieves the information of each site and, in turn, that of every site it links to (as long as the "nofollow" attribute or some other impediment does not prevent it from examining the page), and this goes on indefinitely. The ultimate goal is to locate as much content as possible and store it as efficiently as possible in the search engine's own systems, to show it later to the user.
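The three steps above can be sketched as a small breadth-first crawler. This is a minimal illustration, not how Googlebot actually works: the `fetch` callback and the simple nofollow check are assumptions made to keep the example self-contained.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href targets from <a> tags, skipping rel="nofollow" links."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if "nofollow" in (attrs.get("rel") or ""):
            return  # the "nofollow" impediment: do not schedule this link
        if attrs.get("href"):
            self.links.append(attrs["href"])

def crawl(start, fetch):
    """Breadth-first crawl: read each page's content, collect its links,
    and schedule crawl operations for pages not yet seen."""
    seen = {start}
    queue = [start]
    pages = {}
    while queue:
        url = queue.pop(0)
        html = fetch(url)          # step 1: read the page's content
        pages[url] = html
        parser = LinkCollector()
        parser.feed(html)          # step 2: collect all the links
        for link in parser.links:  # step 3: plan new crawl operations
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Feeding it a tiny fake web (a dict standing in for HTTP) shows that linked pages are discovered recursively while nofollow targets are never scheduled.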
Best of all, when the crawler finishes examining all the pages, it goes back to the beginning to examine them again in search of fresh content. A few years ago Googlebot examined our pages every few weeks, but given the freshness that characterizes today's Internet, search engines have improved their resources so that this process runs continuously. Today your website can be re-examined within a matter of hours thanks to the hardware they have at their disposal. In addition, since Google Caffeine, the indexes are updated as soon as a page is crawled.
The next phase is indexing, which consists of building a content index that allows the expected content to be retrieved efficiently. As the crawling robots discover new content, it is copied (only from pages allowed by the robots.txt file) and organized into these indexes. Since Google Caffeine, this happens simultaneously with crawling rather than as two separate phases.
The textual resources of web pages are fundamental in this process, since they will be used to find matches with the words entered into the search engine when the ranking process takes place.
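The classic structure behind this is an inverted index: a map from each word to the pages that contain it, so a query can be answered without re-reading every page. A minimal sketch, assuming plain whitespace-separated text (real indexes also handle stemming, positions, and much more):

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page URLs containing it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, query):
    """Return the pages that contain every word of the query."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()
```

A lookup intersects one set per query word, which is why matching happens in milliseconds even over enormous collections.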
Now we come to the most controversial phase every search engine performs: ordering the results for the user (ranking). I say controversial because it is always possible to find, among the first results, pages that are hardly the "best" answers to a search query. These websites exploit the possible weaknesses of a search engine that has to process more than 20 billion websites, each with its own interior pages, across endless variations of the terms users search for.
Let's get to the point: in this phase the goal is to order the indexed web pages according to how relevant they may be to the user, right? So how does a search engine like Google do this?
The first thing to say is that this ranking is based on the PageRank algorithm, which has obviously evolved a great deal since its birth, but it explains what Google has in mind when it decides whether to show my web page.
What is PageRank?
PageRank was devised by Larry Page and Sergey Brin, taking as a reference the ranking systems of scientific publications, in which the most important authors are those cited by other colleagues in their field.
With this starting point, a web page is more relevant if it receives a large volume of links from other pages (it has been heavily referenced). Well, with nuances. Nowadays what matters most is not the number of links but their quality. What does quality mean? Basically that the links are natural (not obtained through spam or other fraudulent practices), that they come mostly from pages in the same sector, that those pages in turn have a certain reputation, etc. Logically, the more good-quality links a page has, the better its chances of winning the top positions, but quantity is no longer a key factor.
How does Google order the results?
To sort the results of a search, in addition to incoming links, Google takes into account many other factors that do not derive directly from the PageRank we have just seen. Factors related to how the keywords match the search query are decisive.
For example, if we search for "buy clothes online", the search engine will filter its indexes down to the pages that contain those three words in important elements of the page, such as the title, the web address, the description, the headers, etc. It will then give priority to the pages with the best profile of incoming links. And of course this evaluation also depends on other factors that Google has been adding over time (there are more than 200).
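The idea of weighting where a keyword appears, then blending in the link profile, can be sketched as a toy scoring function. The field weights below are invented for illustration; Google's 200+ real factors and their weights are not public.

```python
# Hypothetical weights: a match in the title counts more than one in the
# URL, which counts more than one in the description.
FIELD_WEIGHTS = {"title": 3.0, "url": 2.0, "description": 1.0}

def score_page(fields, query, link_score):
    """Toy relevance score: weighted keyword matches plus a link-profile score."""
    words = query.lower().split()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = fields.get(field, "").lower()
        score += weight * sum(text.count(word) for word in words)
    return score + link_score
```

With equal link scores, a page carrying the query words in its title and URL outranks one that only mentions them in its description, matching the intuition in the example above.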
So from now on, every time you ask yourself "how do I position my website?", think back to the process your page will go through and help optimize it.