Search engine optimization (SEO) is a term that gets thrown around a lot as something very important. But why? It helps a person looking for a resource find it more quickly and easily. It also matters to businesses, small and large alike: appearing on the first page of Google gives them a better chance that someone will click through to their website.

When you search for something on a search engine like Google, three main processes deliver the results to you: crawling, indexing and ranking.

  • Crawling is the process by which the search engine discovers new and updated web pages using an automated program called a ‘crawler’, ‘bot’ or ‘spider’.
  • Indexing comes next. Once crawling is complete, the search engine compiles a massive index of the words found on each page, where on the page they appear, and the address of the page itself. It can be thought of as a database of millions of web pages. This content is further processed using algorithms to determine its importance relative to similar pages on the internet.
  • Ranking gives a webpage a score for a given keyword, based on a number of ranking signals that are derived from algorithms.
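
To make crawling and ranking a little more concrete, here is a minimal sketch in Python. It is only an illustration: the `WEB` dictionary, the `.example` addresses and the count-the-keyword score are all made up for this article, and a real search engine uses many more signals than this.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("an index maps words to pages", ["c.example"]),
    "c.example": ("ranking orders pages for a search query", ["a.example"]),
    "d.example": ("this page has no inbound links", []),
}

def crawl(seed):
    """Discover pages breadth-first by following links from a seed URL."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        _, links = WEB[url]
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def rank(urls, keyword):
    """Score each page by how often the keyword occurs; highest score first."""
    scores = {url: WEB[url][0].split().count(keyword) for url in urls}
    hits = [(url, score) for url, score in scores.items() if score > 0]
    # Sort by descending score, then by URL so ties come out in a stable order.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

discovered = crawl("a.example")
results = rank(discovered, "search")
```

Notice that `d.example` is never discovered, because no other page links to it. That is one reason links pointing to your site matter for being found at all.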

For the rest of this article, we will look at indexing in a little more detail.

A search engine index has many parts, including design factors and data structures. The design factors outline the architecture and layout of the index. The index itself must be built on a data structure, and each data structure has its own advantages. Common choices include the tree, inverted index, citation index, term-document matrix, suffix tree and n-gram index.
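
Of these, the inverted index is probably the easiest to picture: it maps each word to the pages that contain it. The sketch below is a minimal version that also records word positions. The page names and the `build_inverted_index` and `lookup` helpers are assumptions made up for illustration, not how any particular search engine does it.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index

def lookup(index, word):
    """Return the ids of documents containing the word, in sorted order."""
    return sorted(index.get(word.lower(), {}))

# Hypothetical pages, used only for this example.
docs = {
    "page1": "the spider crawls the web",
    "page2": "the index stores words",
}
index = build_inverted_index(docs)
pages_with_the = lookup(index, "the")
```

Finding every page that mentions a word is now a single dictionary lookup rather than a scan through every page, which is why this structure suits keyword search.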

Other important parts of a search engine index include:

  • Merge factors: These decide whether incoming data is new or an update to existing data, and how that data enters the index.
  • Index size: This refers to how much computer storage the index requires on the server.
  • Storage techniques: These decide how the information in the index is stored, such as compressing large files and filtering smaller ones.
  • Fault tolerance: This refers to how reliable the search engine index is.
  • Lookup speed: This refers to how quickly a word can be found in the index when it is searched for.
  • Maintenance: Properly maintained search engine indexes work better than ones that are not.
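
The merge factor is easier to see in code. Below is a rough sketch, assuming a hypothetical `SimpleIndex` class invented for this article, of how an index might tell a new page from an updated one and clear out stale words when a page changes.

```python
class SimpleIndex:
    """Toy index that distinguishes new documents from updated ones."""

    def __init__(self):
        self.postings = {}   # word -> set of doc ids containing it
        self.doc_words = {}  # doc id -> set of words currently indexed

    def merge(self, doc_id, text):
        """Add a new document or replace an existing one; report which happened."""
        new_words = set(text.lower().split())
        old_words = self.doc_words.get(doc_id)
        action = "added" if old_words is None else "updated"
        # Remove postings for words that no longer appear in the document.
        for word in (old_words or set()) - new_words:
            self.postings[word].discard(doc_id)
            if not self.postings[word]:
                del self.postings[word]
        # Record the document under every word it now contains.
        for word in new_words:
            self.postings.setdefault(word, set()).add(doc_id)
        self.doc_words[doc_id] = new_words
        return action

    def lookup(self, word):
        """Return the ids of documents containing the word, in sorted order."""
        return sorted(self.postings.get(word.lower(), set()))

idx = SimpleIndex()
first = idx.merge("page1", "fresh bread recipes")   # page seen for the first time
second = idx.merge("page1", "fresh pasta recipes")  # same page, changed content
```

After the second call, searching for "bread" no longer returns `page1`, while "pasta" does. Lookups stay fast because each one is a single dictionary access.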

As mentioned earlier, the search engine compiles a massive database of web pages. This database is stored on servers across the world and takes up a lot of space, both for sorting and for storing. For this reason, companies like Google and Microsoft are each reported to run around one million servers. If you run your business on the internet, good SEO is imperative, because it directly affects how many people visit your website.