Technical SEO Guide: From Crawling to Indexing – Making Search Engines Understand Your Website (Part 1)
Master technical SEO with this complete guide covering crawlability, indexability, Core Web Vitals, structured data, XML sitemaps, and mobile-first indexing. Learn how to build a search-engine-friendly website infrastructure.
Simply put, technical SEO is the practice of enabling search engines to find your pages, understand your content, and correctly index your website.
Many people think technical SEO is mysterious. It involves code, configurations, and a host of confusing terms.
In reality, it is not. The core logic of technical SEO is simple: Google is a robot. It needs to crawl your website, understand your content, and store your pages in its database. Your job is to make this process as smooth as possible.
If Google cannot crawl your pages, even the best content is useless. If Google cannot understand your page structure, it will not know where to rank you. If your website is slow or offers a poor mobile experience, Google will directly lower your rankings.
According to research by SEMrush, over 80% of websites have technical SEO issues. Many of these are fundamental – incorrect robots.txt configurations, missing canonical tags, and excessively slow page speeds.
This article will start from the most basic crawling mechanisms and go all the way to the cutting edge of AI search optimization. Whether you are a beginner or an experienced SEO professional, you will find useful information here.
Let us begin.
Before diving into specific operations, it is important to understand how Google works.
Google has a crawler program called Googlebot. Its job is to constantly visit web pages and fetch their content.
How does Googlebot discover new pages?
- Through links on known pages
- Through XML Sitemaps
- Through manual submission in Google Search Console
- Through links from third-party websites pointing to your site
Once Googlebot discovers a URL, it adds it to the crawl queue. However, not all URLs are crawled immediately. Google decides the crawl priority based on the page's importance, update frequency, and the website's crawl budget.
After crawling the HTML, Google needs to render the page – execute JavaScript, load CSS, and generate the final Document Object Model (DOM).
This step is crucial. If your website heavily relies on JavaScript to generate content (for example, React, Vue, or Angular single-page applications), Google needs additional time and resources to render. According to official Google documentation, rendering can be delayed from a few seconds to several days.
After rendering, Google analyzes the page content, extracts key information (titles, body text, links, structured data, etc.), and then decides whether to include the page in its index.
The index is Google's database. Only pages that are indexed can appear in search results.
The entire process:
```
URL Discovery → Added to Crawl Queue → HTML Crawled → Page Rendered → Content Analyzed → Added to Index → Included in Rankings
```

The goal of technical SEO is to ensure every step in this process runs smoothly.
Robots.txt is a text file placed in your website's root directory (`example.com/robots.txt`). It tells search engine crawlers which pages can be crawled and which cannot.
```
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Allow: /

Sitemap: https://example.com/sitemap.xml
```

Explanation:
- User-agent: * — applies to all crawlers
- Disallow: /admin/ — blocks crawling of all pages under the /admin/ directory
- Allow: / — allows crawling of all other pages
- Sitemap: — tells crawlers the location of the Sitemap
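Before deploying a robots.txt change, it is worth checking programmatically which URLs it actually blocks. Here is a minimal sketch using Python's standard-library `urllib.robotparser`, applied to the example rules above (`example.com` is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the example above (example.com is a placeholder).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Public pages are crawlable; the blocked directories are not.
print(parser.can_fetch("*", "https://example.com/products/ball-valve/"))  # True
print(parser.can_fetch("*", "https://example.com/admin/settings/"))       # False
```

Running checks like this against your most important URLs before launch catches the "entire site blocked" mistake from the table below before Google does.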
| Error | Consequence | Correct Approach |
|---|---|---|
| Disallow: / (blocking all crawling) | Entire website disappears from search results | Only block directories that do not need indexing |
| Blocking CSS/JS files | Google cannot render the page, affecting rankings | Allow crawling of CSS and JS |
| Blocking image directories | Images do not appear in Google Images | Allow crawling of images |
| Forgetting to modify development environment | Entire site blocked after going live | Check robots.txt before launch |
| Using robots.txt to prevent indexing | Page may still be indexed (just not crawled) | Use noindex tag to prevent indexing |
The last point is particularly important: robots.txt can only prevent crawling, not indexing. If another website links to one of your pages, Google might index that URL without crawling it (showing only the URL without a content summary). To truly prevent indexing, you must use the noindex tag.
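To verify that a page actually carries the noindex directive, you can scan its HTML for the robots meta tag. A minimal sketch using Python's standard-library `HTMLParser` (the sample HTML string is hypothetical):

```python
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    """Scans HTML for <meta name="robots" content="...noindex...">."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if attrs.get("name", "").lower() == "robots" and \
                "noindex" in attrs.get("content", "").lower():
            self.noindex = True

html = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
detector = NoindexDetector()
detector.feed(html)
print(detector.noindex)  # True
```

Remember: for this tag to take effect, the page must be crawlable, which is exactly why robots.txt blocking and noindex should not be combined.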
- WordPress: robots.txt is generated automatically by default and can be customized using plugins like Rank Math or Yoast SEO.
- Shopify: robots.txt is generated automatically and cannot be edited directly. However, starting in 2021, limited customization is possible through the robots.txt.liquid template.
- Custom Websites: Create the robots.txt file manually and place it in the website's root directory.
Between 2024 and 2025, AI crawlers have become a new issue. OpenAI's GPTBot, Anthropic's ClaudeBot, and Google's Google-Extended – these AI crawlers scrape your content to train their models.
If you do not want AI crawlers to scrape your content, you can add content to robots.txt as follows:
```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

However, note that blocking AI crawlers may affect your visibility in AI search results. This is a trade-off.
An XML Sitemap is a file that lists all the important pages on your website. It helps Google discover and understand your website's structure.
A Sitemap is not a ranking factor. Having a Sitemap will not make you rank higher. However, it helps ensure Google is aware of all your important pages.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-1/</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Where:
- loc: Page URL (required)
- lastmod: Last modification date (recommended; Google references this)
- changefreq: Update frequency (Google largely ignores this field)
- priority: Priority (Google also largely ignores this)
In practice, you only need to focus on loc and lastmod.
| Rule | Explanation |
|---|---|
| Include only pages that need indexing | Do not put `noindex` pages, redirected pages (301/302), or 404 error pages in the Sitemap. These waste crawl budget and send mixed signals to search engines. |
| URLs in the Sitemap must be canonical URLs | If a page has a canonical tag pointing to another URL, the Sitemap should include the canonical URL—not the duplicate or variant URL. |
| Maximum 50,000 URLs per Sitemap | Each Sitemap file cannot exceed 50,000 URLs. If your site exceeds this limit, split URLs across multiple Sitemap files and use a Sitemap Index file (`sitemap_index.xml`) to aggregate them. |
| Sitemap file size not exceeding 50MB | Uncompressed file size must stay under 50MB. For large Sitemaps, submit compressed files (`.xml.gz`) to reduce bandwidth and improve processing speed. |
| `lastmod` should be accurate | Update the `lastmod` tag only when the page content actually changes. Do not automatically update all pages daily—this creates unnecessary crawl demand and reduces trust signals with search engines. |
| Declare Sitemap location in `robots.txt` | Add a `Sitemap` directive to your `robots.txt` file to help search engines discover your Sitemap location. Format: `Sitemap: https://example.com/sitemap.xml` |
| Submit in Google Search Console | After publishing your Sitemap, submit it via Google Search Console (or Bing Webmaster Tools). Monitor the Index status report to verify that pages are being discovered and indexed correctly. |
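Generating a Sitemap that follows these rules is straightforward to automate. A minimal sketch using Python's standard-library `xml.etree.ElementTree` (the page list is hypothetical; a real generator would pull canonical URLs and modification dates from your CMS or database):

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """pages: list of (url, lastmod) tuples. Returns sitemap XML as a string."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url       # required
        ET.SubElement(entry, "lastmod").text = lastmod  # only when content changed
    return ET.tostring(urlset, encoding="unicode")

xml = build_sitemap([("https://example.com/page-1/", "2025-01-15")])
print(xml)
```

Note that only `loc` and `lastmod` are emitted, matching the earlier point that Google largely ignores `changefreq` and `priority`. Prepend the `<?xml version="1.0" encoding="UTF-8"?>` declaration when writing the file to disk.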
If your website has tens of thousands or even hundreds of thousands of pages, you need to organize them using a Sitemap Index file:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-categories.xml</loc>
    <lastmod>2025-01-10</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2025-01-14</lastmod>
  </sitemap>
</sitemapindex>
```

Split the Sitemap by page type (products, categories, blog, static pages) to facilitate management and monitoring.
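Respecting the 50,000-URL-per-file limit is just a chunking problem. A minimal sketch (the 120,000 product URLs are a hypothetical example):

```python
def split_into_sitemaps(urls, max_per_file=50_000):
    """Chunk a URL list into groups respecting the 50,000-URL-per-sitemap limit."""
    return [urls[i:i + max_per_file] for i in range(0, len(urls), max_per_file)]

# Hypothetical example: 120,000 product URLs become three sitemap files.
urls = [f"https://example.com/product-{i}/" for i in range(120_000)]
chunks = split_into_sitemaps(urls)
print(len(chunks))               # 3
print([len(c) for c in chunks])  # [50000, 50000, 20000]
```

Each chunk would then be written out as its own Sitemap file and listed in the Sitemap Index shown above.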
- WordPress + Rank Math: Sitemap is generated automatically. You can control which content types are included in the Sitemap within Rank Math settings. The path is usually /sitemap_index.xml.
- Shopify: Sitemap is generated automatically at the path /sitemap.xml. It cannot be customized, but Shopify's default Sitemap is sufficient.
- Custom Websites: Generate using tools like Screaming Frog or Sitebulb, or generate dynamically with code.
Duplicate content is one of the most common technical SEO issues. When the same content appears across multiple URLs, Google does not know which one to index.
The canonical tag tells Google which version among these duplicate pages is the "master" copy.
Common duplicate content scenarios:
- URL parameters: example.com/product and example.com/product?ref=email are the same page
- HTTP/HTTPS: http://example.com and https://example.com
- www/non-www: www.example.com and example.com
- Trailing slashes: example.com/page and example.com/page/
- Uppercase/lowercase: example.com/Page and example.com/page
- Pagination: example.com/blog and example.com/blog?page=1
- Sorting/filtering: example.com/products?sort=price and example.com/products?sort=name
- Cross-domain content: Your article republished on other websites
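Several of these variants can be collapsed mechanically before a canonical URL is ever emitted. A minimal normalization sketch using Python's standard-library `urllib.parse` (the tracking-parameter list is an assumption; adjust it to the parameters your site actually uses):

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of tracking parameters that never change page content.
TRACKING_PARAMS = {"ref", "utm_source", "utm_medium", "utm_campaign"}

def normalize(url):
    """Collapse common duplicate-URL variants onto one canonical form."""
    parts = urlsplit(url)
    scheme = "https"                                    # force HTTPS
    host = parts.netloc.lower().removeprefix("www.")    # lowercase, non-www host
    path = parts.path.lower().rstrip("/") + "/"         # lowercase, one trailing slash
    query = "&".join(
        p for p in parts.query.split("&")
        if p and p.split("=")[0] not in TRACKING_PARAMS
    )
    return urlunsplit((scheme, host, path, query, ""))

print(normalize("http://www.example.com/Product?ref=email"))
# → https://example.com/product/
```

Whether you prefer www or non-www, trailing slash or not, matters less than picking one convention and applying it everywhere: internal links, Sitemap, and canonical tags.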
Add the following in the page's `<head>` section:
```html
<link rel="canonical" href="https://example.com/preferred-url/" />
```

Every page should have a canonical tag, including self-referencing canonicals (pointing to itself).
| Error | Consequence | Correct Approach |
|---|---|---|
| All pages canonical pointing to homepage | All pages except the homepage disappear from the index | Each page points to its own canonical URL (self-referential canonical) |
| Canonical URL is a 404 page | Google ignores the canonical tag | Ensure the canonical URL returns a 200 (OK) status code |
| Canonical URL blocked by robots.txt | Google cannot verify the canonical relationship | Ensure the canonical URL can be crawled (not blocked by robots.txt) |
| Canonical chain (A → B → C) | Google may ignore the chain or follow inconsistently | Point directly to the final (master) URL—avoid chains |
| Canonical and noindex used together | Conflicting signals confuse Google's indexing decision | Choose one strategy: either canonical to the master version OR noindex—never both |
| HTTP canonical on an HTTPS page | Protocol mismatch creates confusion and may be ignored | Always use HTTPS for canonical URLs when the site uses HTTPS |
Important Note: The canonical tag is a "suggestion," not a "directive." Google may ignore your canonical tag and choose a URL it considers more appropriate as the canonical. If you find Google choosing the wrong canonical, you need to check whether internal links, the Sitemap, and external links all point to the correct URL.
Website architecture determines how Google understands your website. A good architecture allows Google to crawl all pages easily; a poor one leaves Google lost.
The ideal website architecture is flat – any page should be reachable from the homepage within three clicks.
```
Homepage
├── Category A
│   ├── Product A1
│   ├── Product A2
│   └── Product A3
├── Category B
│   ├── Product B1
│   └── Product B2
└── Blog
    ├── Article 1
    └── Article 2
```

Problems with an overly deep architecture:
- Google crawlers may not reach deep pages
- Deep pages receive less internal link equity (PageRank)
- Users have difficulty finding deep content
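Click depth is easy to audit: model your internal links as a graph and run a breadth-first search from the homepage. A minimal sketch (the link graph below is a hypothetical mirror of the tree above; a real audit would build it from a crawl):

```python
from collections import deque

def click_depths(links, start="home"):
    """BFS over internal links; returns minimum clicks from the homepage to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical internal-link graph.
site = {
    "home": ["category-a", "category-b", "blog"],
    "category-a": ["product-a1", "product-a2"],
    "blog": ["article-1"],
}
print(click_depths(site))
```

Any page with a depth greater than 3, or missing from the result entirely (an orphan page), is a candidate for more internal links.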
URLs are the foundation of technical SEO. Good URL structure:
| Principle | Good URL | Poor URL |
|---|---|---|
| Short | /ball-valves/ | /products/category/industrial/ball-valves/stainless-steel/ |
| Descriptive | /stainless-steel-ball-valve/ | /product-12345/ |
| Hyphen-separated | /ball-valve/ | /ball_valve/ or /ballvalve/ |
| Lowercase | /ball-valve/ | /Ball-Valve/ |
| No parameters | /ball-valves/ | /products?cat=5&sort=price |
| Contains keywords | /link-building-guide/ | /post-2025-01-15/ |
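Generating slugs that satisfy these principles (lowercase, hyphen-separated, no stray characters) is a one-function job. A minimal sketch in Python:

```python
import re

def slugify(title):
    """Turn a page title into a short, lowercase, hyphen-separated URL slug."""
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # non-alphanumeric runs become one hyphen
    return slug.strip("-")

print(slugify("Stainless Steel Ball Valve"))  # stainless-steel-ball-valve
```

Most CMSs apply similar logic automatically, but it is worth verifying, since a bad slug is painful to change later.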
Once a URL is established, try not to change it. Each time you change a URL, you need to set up a 301 redirect and may experience short-term ranking fluctuations.
If you must change a URL:
- Set up a 301 redirect (permanent redirect) from the old URL to the new URL
- Update all internal links to point to the new URL
- Update the Sitemap
- Monitor in Google Search Console
- Keep the 301 redirect in place for at least one year
301 vs. 302 Redirects:
- 301: Permanent redirect. Tells Google the old URL has been permanently moved to the new URL; link equity is transferred.
- 302: Temporary redirect. Tells Google the old URL is only temporarily redirected; link equity is not transferred (or very little is transferred).
In most cases, you should use 301. Only use 302 when the page is genuinely temporary (such as for A/B testing or temporary maintenance).
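After several rounds of URL changes, redirect chains (A → B → C) creep in, and those should be flattened so every old URL points directly at its final destination. A minimal sketch that resolves a redirect map (the map is a hypothetical stand-in for your server configuration):

```python
def resolve_redirects(redirects, url, max_hops=10):
    """Follow a 301 redirect map to the final URL, flagging chains and loops.

    redirects: dict mapping old URL -> new URL (stand-in for server config).
    Returns (final_url, hops). hops > 1 means a chain worth flattening.
    """
    seen = set()
    hops = 0
    while url in redirects:
        if url in seen or hops >= max_hops:
            raise ValueError(f"redirect loop or excessive chain at {url}")
        seen.add(url)
        url = redirects[url]
        hops += 1
    return url, hops

# Hypothetical map: /old-page moved to /new-page, which later moved again.
redirects = {"/old-page": "/new-page", "/new-page": "/final-page"}
print(resolve_redirects(redirects, "/old-page"))  # ('/final-page', 2)
```

Here the fix would be updating the first rule to send /old-page straight to /final-page, saving crawlers and users one hop.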
In 2021, Google officially incorporated Core Web Vitals into its ranking factors. Page speed is no longer just "nice to have"; it is "must-have."
| Metric | What It Measures | Good | Needs Improvement | Poor |
|---|---|---|---|---|
| LCP (Largest Contentful Paint) | Loading time of the largest content element (e.g., hero image, main heading) | ≤ 2.5 seconds | 2.5–4 seconds | > 4 seconds |
| INP (Interaction to Next Paint) | Delay from user interaction (click, tap, keypress) to visual page response | ≤ 200 ms | 200–500 ms | > 500 ms |
| CLS (Cumulative Layout Shift) | Visual stability—unexpected layout shifts during page load | ≤ 0.1 | 0.1–0.25 | > 0.25 |
Note: In March 2024, Google replaced FID (First Input Delay) with INP (Interaction to Next Paint). INP measures the responsiveness to all interactions throughout the page's lifecycle, making it more comprehensive than FID.
LCP is typically the largest image or text block on the page. Methods to optimize LCP:
- Optimize server response time (TTFB): Use good hosting, enable caching, use a CDN
- Optimize the largest content element: If the LCP element is an image, compress it, use WebP format, and set appropriate dimensions
- Preload LCP resources: for example, `<link rel="preload" as="image" href="hero.webp">` in the `<head>`
- Reduce render-blocking resources: Inline critical CSS, defer non-critical JavaScript
- Avoid client-side rendering: If LCP content requires JavaScript to display, consider server-side rendering
INP measures how quickly a page responds to user interactions. Clicking buttons, typing into fields, selecting dropdown menus – how fast does the page provide visual feedback after these interactions?
Optimizing INP:
- Reduce main thread blocking: Split long tasks using requestIdleCallback or scheduler.yield()
- Reduce JavaScript execution time: Remove unnecessary JavaScript, defer loading third-party scripts
- Optimize event handlers: Avoid complex calculations in event handlers
- Reduce DOM size: The more DOM nodes, the slower the interaction response
CLS measures unexpected movement of elements during page load. You are reading a paragraph when suddenly an ad loads and pushes the text down – that is a layout shift.
| Issue | Cause | Solution |
|---|---|---|
| Image loading causes shift | Image has no width/height attributes set | Set `width` and `height` attributes on `img` tags |
| Ad loading causes shift | Ad space has no reserved area | Set fixed dimensions for ad containers |
| Font loading causes shift | Web font replaces system font with different size | Use `font-display: swap` with matching fallback fonts |
| Dynamic content insertion | Content inserted after JavaScript loads | Reserve space or use CSS `contain` property |
| Iframe loading | Iframe has no dimensions set | Set fixed `width` and `height` attributes for iframes |
Tools for measuring Core Web Vitals:
- Google PageSpeed Insights: Most commonly used; displays both lab data and field data
- Google Search Console: Core Web Vitals report showing the site-wide CWV status
- Chrome DevTools: Performance panel for detailed analysis
- Web Vitals Chrome Extension: Real-time display of CWV data for the current page
- Lighthouse: Chrome's built-in auditing tool
Important Distinction: Lab Data vs. Field Data.
- Lab Data: Measured in a simulated environment; results may vary each time; used for debugging
- Field Data (CrUX): Data from real Chrome users; this is what Google uses for ranking
If your lab data is good but field data is poor, it means your real users have devices or network conditions worse than your simulated environment. Optimization must target low-end devices and slow networks.
Starting in 2023, Google fully transitioned to Mobile-First Indexing. This means Google primarily uses the mobile version of your website to determine rankings.
If your mobile version lacks content, loads slowly, or offers a poor experience, your rankings will suffer – even if the desktop version is perfect.
- Content consistency: Content on mobile and desktop must be identical. Do not hide content on mobile
- Structured data consistency: Schema markup on mobile and desktop must be identical
- Meta tag consistency: Titles, descriptions, and robots tags must be identical on both versions
- Image consistency: Mobile images must have alt text; format and quality should not be inferior to desktop images
Google recommends responsive design – a single URL that automatically adjusts its layout based on screen size.
Separate mobile sites (m.example.com) are not recommended, because:
- Two sets of content need maintenance
- Correct canonical and alternate tags need configuration
- Content inconsistencies are common
- Technical complexity increases
If you are still using a separate mobile site, it is strongly recommended to migrate to a responsive design.
- Viewport meta tag: `<meta name="viewport" content="width=device-width, initial-scale=1">`
- Font size: At least 16px (to avoid requiring users to zoom)
- Touch targets: At least 48x48 pixels, with at least 8px spacing
- No Flash (largely obsolete, but some older sites still use it)
- No horizontal scrolling layout
- Form inputs: Use appropriate input types (email, tel, number, etc.) to trigger the correct keyboard
For more, continue with Part 2 of this Technical SEO guide.