
Googlebot Starting to Crawl Sites Over HTTP/2


Now that major browsers support the next major version of HTTP, HTTP/2 (or h2 for short), web professionals have been asking whether Googlebot can crawl over this upgraded, more modern version of the protocol.

What Is HTTP/2?

As mentioned, it's the next major version of the HyperText Transfer Protocol, the protocol the web primarily uses to transfer data. HTTP/2 is more efficient, robust, and faster than its predecessor, thanks to its architecture and the features it implements for clients (for example, your browser) and servers.

Why We’re Making This Change

In general, we expect this change to make crawling faster and more efficient in terms of server resource usage. With h2, Googlebot can open a single TCP connection to the server and efficiently transfer multiple files over it in parallel, instead of requiring multiple connections. The fewer connections established, the fewer resources the server and Googlebot have to spend on crawling.

How It Works

In the first phase, we will crawl a small number of sites over h2, and we'll ramp up gradually to more sites that are likely to benefit from the initially supported features, such as request multiplexing.

Googlebot decides which sites to crawl over h2 based on whether the site supports h2, and whether the site and Googlebot would benefit from crawling over HTTP/2. If your server supports h2 and Googlebot already crawls your site heavily, you may already be eligible for the connection upgrade, and you don't have to do anything.
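One quick way to see whether a server can negotiate h2 is to inspect the ALPN result of the TLS handshake. Below is a minimal sketch using Python's standard library; the hostname `example.com` is just a placeholder for your own site:

```python
import socket
import ssl

def negotiated_protocol(host, port=443):
    """Offer both h2 and HTTP/1.1 via ALPN during the TLS handshake
    and return whichever protocol the server selects."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()

if __name__ == "__main__":
    # Placeholder hostname; substitute your own domain.
    print(negotiated_protocol("example.com"))
```

If this returns `"h2"`, the server can negotiate HTTP/2; `"http/1.1"` or `None` means it cannot.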

If your server still speaks only HTTP/1.1, that's also fine. There is no drawback to being crawled over that protocol; crawling will remain the same, quality- and quantity-wise.

How To Opt Out

Our preliminary tests showed no negative impact on indexing, but we understand that, for various reasons, you may want to opt your site out of crawling over HTTP/2. You can do that by configuring your server to respond with a 421 HTTP status code (Misdirected Request) when Googlebot attempts to crawl your site over h2. If that's not feasible at the moment, you can send a message to the Googlebot team (however, this solution is temporary).
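As an illustration of the opt-out logic only (not Google's implementation, and with hypothetical function names), a server-side handler would key on the negotiated protocol and answer Googlebot's h2 requests with 421 Misdirected Request:

```python
def respond(protocol, user_agent, body):
    """Hypothetical request handler: decline HTTP/2 crawling by
    returning 421 (Misdirected Request) to Googlebot over h2."""
    is_googlebot = "Googlebot" in user_agent
    if is_googlebot and protocol == "HTTP/2":
        # 421 signals that this server won't serve the request over h2;
        # Googlebot then falls back to crawling over HTTP/1.1.
        return 421, "Misdirected Request"
    return 200, body

# An h2 request from Googlebot is declined, an h1 request is served:
print(respond("HTTP/2", "Googlebot/2.1", "<html>...</html>"))    # (421, 'Misdirected Request')
print(respond("HTTP/1.1", "Googlebot/2.1", "<html>...</html>"))  # (200, '<html>...</html>')
```

In practice you would express the same rule in your web server's configuration rather than in application code.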

Earlier versions of the HTTP protocol were intentionally designed for simplicity of implementation: HTTP/0.9 was a one-line protocol to bootstrap the World Wide Web; HTTP/1.0 documented the popular extensions to HTTP/0.9 in an informational standard; HTTP/1.1 became an official IETF standard. HTTP/0.9 through 1.x delivered exactly what they set out to do, and HTTP is one of the most widely used application protocols on the Internet.

Unfortunately, implementation simplicity also came at the cost of application performance: HTTP/1.x clients need to use multiple connections to achieve concurrency and reduce latency; HTTP/1.x does not compress request and response headers, causing unnecessary network traffic; HTTP/1.x does not allow effective resource prioritization, resulting in poor use of the underlying TCP connection; and so on.

These limitations were not fatal, but as web applications continued to grow in scope, complexity, and importance in our everyday lives, they imposed a growing burden on both the users and the developers of the web. That is exactly the gap HTTP/2 was designed to address.

It is important to know that HTTP/2 extends, rather than replaces, the previous HTTP standards. The application semantics of HTTP remain the same, and no changes were made to the offered functionality or core concepts such as HTTP methods, status codes, URIs, and header fields; those changes were out of scope for the HTTP/2 effort. Although the high-level API remains the same, it is important to understand how the low-level changes address the performance limitations of the previous protocols.

Binary Framing Layer

At the core of all of HTTP/2's performance enhancements is the new binary framing layer, which dictates how HTTP messages are encapsulated and transferred between the client and server.

The ‘layer’ refers to a design choice to introduce a new, optimized encoding mechanism between the socket interface and the higher-level HTTP API exposed to our applications: the HTTP semantics, such as verbs, methods, and headers, are unaffected, but the way they are encoded while in transit is different.
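To make the framing layer concrete, here is a small sketch of the fixed 9-byte HTTP/2 frame header defined in RFC 7540 §4.1 (a 24-bit payload length, 8-bit type, 8-bit flags, and a 31-bit stream identifier); the function names are illustrative, not from any particular library:

```python
import struct

def encode_frame_header(length, frame_type, flags, stream_id):
    """Pack the fixed 9-byte HTTP/2 frame header (RFC 7540, section 4.1)."""
    return struct.pack(
        ">BHBBI",
        (length >> 16) & 0xFF,   # high 8 bits of the 24-bit payload length
        length & 0xFFFF,         # low 16 bits of the payload length
        frame_type,              # e.g. 0x0 DATA, 0x1 HEADERS
        flags,                   # e.g. 0x4 END_HEADERS
        stream_id & 0x7FFFFFFF,  # reserved bit cleared
    )

def decode_frame_header(header):
    """Unpack a 9-byte frame header back into (length, type, flags, stream_id)."""
    hi, lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", header)
    return (hi << 16) | lo, frame_type, flags, stream_id & 0x7FFFFFFF

# A HEADERS frame (type 0x1) with END_HEADERS (0x4) on stream 1:
header = encode_frame_header(16, 0x1, 0x4, 1)
assert len(header) == 9
assert decode_frame_header(header) == (16, 0x1, 0x4, 1)
```

Every HTTP/2 message is carried in frames with this header, which is what lets a single connection interleave many requests and responses as independent streams.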

Conclusion

Googlebot is starting to crawl sites over HTTP/2. In this post, we covered the most important information about what that means for your site and how the change works.
