It is useful to know the order in which Googlebot crawls your website. If you know the order, you can get Google to crawl your most important pages first. This matters most on new, large content websites, where you want the important sections of your site crawled first.
To test this, we took a domain with no history (never registered, no backlinks) and made a page with 250 links on it. Each of those links points to a page that also has 250 links on it (and so on…). The link texts and URLs were numbered from 1 to 250, in the same order as they appeared in the source code. We submitted the URL via “addurl” and waited.
On its first visit, Googlebot only fetched the root page (http://example.com/). After a few hours it returned and visited all 250 pages it found on the root page. At first it seemed that Google was dividing the links on each page into three blocks:
- Block 1: links 1 to 9
- Block 2: links 10 to 99
- Block 3: links 100 to 250
When Googlebot visits a page, there is a good chance that it will follow links from one or more of the blocks. The links are crawled in a batch per block, in random order within the block. The chance that block 1 is crawled is three times greater than for block 2 and six times greater than for block 3. In block 3, two links have a slightly higher chance of being crawled: /100/ and /200/.
With only the results of this test, it is too early to conclude that Google always divides a page into these blocks. Google might base its blocks on the length of the link text, on the length of the URL, on the position of the link, and so on. To rule out the other possibilities, we set up some further tests.
The test that gave us the decisive answer used a page whose links had URLs of different lengths, placed in random order. Those links pointed to similar pages, again with links of various URL lengths in random order (and so on). Googlebot seems to crawl links in order of URL length. That is good to know if you want to do some advanced site structure sculpting.
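The three blocks from the first test fit this explanation, because the numbered URLs come in exactly three lengths. Here is a minimal Python sketch of that observation. It assumes the test URLs had the form /1/ through /250/, which is inferred from the /100/ and /200/ links mentioned above, and the helper name is hypothetical:

```python
from collections import defaultdict


def group_by_url_length(urls):
    """Group URLs by their character length (hypothetical helper)."""
    groups = defaultdict(list)
    for url in urls:
        groups[len(url)].append(url)
    # Return the groups ordered from shortest to longest URL.
    return dict(sorted(groups.items()))


# Assumed URL format for the test pages: /1/ ... /250/
urls = [f"/{n}/" for n in range(1, 251)]
blocks = group_by_url_length(urls)

for length, members in blocks.items():
    # length, number of links, first and last link in the group
    print(length, len(members), members[0], members[-1])
# 3 9 /1/ /9/
# 4 90 /10/ /99/
# 5 151 /100/ /250/
```

The groups coincide with links 1 to 9, 10 to 99 and 100 to 250. The link texts were numbered the same way, so the first test alone could not tell URL length apart from link text length.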
Takeaways
With these insights, choosing the length of your URLs becomes more important. It is a good way to influence Googlebot, so choose the length of each URL wisely. Google crawls short URLs before it crawls longer ones. It doesn’t help to make all your URLs short: if they all have the same length, they will be crawled in random order. When you choose a URL, always weigh how much the page needs to be crawled and indexed.
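To get a feel for what this means for a page of your own, here is a rough Python sketch. It only models the behaviour observed in our test (shorter URLs first, equal lengths in random order) and is not a description of how Google actually schedules its crawl. The function name and the example URLs are hypothetical.

```python
import random


def predicted_crawl_order(urls, seed=None):
    """Order URLs shortest first, with equal lengths in random order.

    A simplified model of the test result, not Google's real scheduler.
    """
    rng = random.Random(seed)
    shuffled = list(urls)
    rng.shuffle(shuffled)  # randomise first, so ties end up in random order
    # sorted() is stable, so URLs of equal length keep their shuffled order.
    return sorted(shuffled, key=len)


# Hypothetical links found on one page.
page_links = [
    "/blog/2009/a-very-long-article-title/",
    "/shop/",
    "/about-us/",
    "/faq/",
    "/news/",
]

for url in predicted_crawl_order(page_links, seed=1):
    print(len(url), url)
```

In this model /faq/ always comes first and the long blog URL always comes last, while /shop/ and /news/ (both six characters) can swap places from run to run.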
So it’s good to take URL length into account when you are designing your site structure. It is probably even more important for link building. We have not tested it yet, but there is a good chance Google crawls external links the same way. This would mean that if your URL is on a page with hundreds of other links, you may increase your chance of being crawled by having the shortest URL.
Note: This insight is part of a thesis; more to come soon. (link will be placed here.)











