
Conversation

itsNintu (Collaborator) commented Jul 31, 2025

Replace static sitemap and llms.txt generation with a dynamic Next.js App Router implementation for both the web client and docs applications. The implementation automatically discovers pages and generates sitemap.xml, llms.txt, llms-full.txt, and robots.txt using Next.js built-in metadata routes.

Key Changes:

• Implement dynamic sitemap.ts and robots.ts files using Next.js App Router conventions (a minimal sketch follows this list)
• Remove next-sitemap dependency from docs application
• Add automatic page discovery with configurable exclusion patterns
• Ensure proper SEO optimization with appropriate priorities and change frequencies
• Maintain consistency between robots.txt disallow rules and sitemap exclusions
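
As a rough illustration of the first bullet, here is a minimal sketch of the App Router metadata-route convention for the web client sitemap. The actual sitemap.ts in this PR delegates to getWebRoutes() in sitemap-utils.ts; the inline entries below (including the /pricing route) are purely illustrative.

import type { MetadataRoute } from 'next';

// apps/web/client/src/app/sitemap.ts: minimal sketch, not the PR's exact code.
const BASE_URL = process.env.APP_URL ?? 'https://onlook.com';

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
    // In the PR, entries come from scanning src/app for page.tsx files and
    // filtering out excluded patterns (auth, API, user-specific routes).
    return [
        {
            url: BASE_URL,
            lastModified: new Date(),
            changeFrequency: 'weekly',
            priority: 1.0,
        },
        {
            url: `${BASE_URL}/pricing`, // hypothetical marketing page, for illustration only
            lastModified: new Date(),
            changeFrequency: 'monthly',
            priority: 0.9,
        },
    ];
}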

Related Issues

Type of Change

[ ] Bug fix
[✓] New feature
[ ] Documentation update
[ ] Release
[✓] Refactor
[ ] Other (please describe):

Testing

Manual Testing Steps:

  1. Web Client (onlook.com):
    • Visit /robots.txt - should show proper disallow rules and sitemap reference
    • Visit /sitemap.xml - should show all public pages with correct priorities
    • Verify excluded routes (auth, API, user-specific) are not in sitemap
  2. Docs (docs.onlook.com):
    • Visit /robots.txt - should reference sitemap correctly
    • Visit /sitemap.xml - should show docs homepage
    • Verify old next-sitemap functionality is replaced
  3. Build Testing:
    • Run bun install and bun build for both applications
    • Confirm no next-sitemap related errors in docs build

Screenshots (if applicable)

Additional Notes

• Breaking Change: Removes the next-sitemap dependency; the docs application no longer needs a postbuild script
• SEO Optimized: Homepage gets priority 1.0, marketing pages 0.9, auth pages 0.6 (a rough sketch of this mapping follows this list)
• Automatic: New pages are automatically included in sitemap without manual configuration
• Secure: Private routes (user dashboards, API endpoints) are automatically excluded
• Standards Compliant: Uses official Next.js metadata route conventions for better caching and performance
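
For illustration, one way the priority scheme above could be expressed in the sitemap utility; the helper name getRouteMetadata and the specific route prefixes (/pricing, /about, /login) are assumptions rather than the PR's actual code.

import type { MetadataRoute } from 'next';

type RouteMeta = Pick<MetadataRoute.Sitemap[number], 'priority' | 'changeFrequency'>;

// Hypothetical helper: maps a discovered route to sitemap metadata per the scheme above.
function getRouteMetadata(route: string): RouteMeta {
    if (route === '/') {
        return { priority: 1.0, changeFrequency: 'weekly' }; // homepage
    }
    if (route.startsWith('/pricing') || route.startsWith('/about')) {
        return { priority: 0.9, changeFrequency: 'monthly' }; // marketing pages (assumed paths)
    }
    if (route.startsWith('/login')) {
        return { priority: 0.6, changeFrequency: 'yearly' }; // public auth pages (assumed path)
    }
    return { priority: 0.7, changeFrequency: 'monthly' }; // default for everything else
}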


Important

Replaces static sitemap and robots.txt generation with a dynamic Next.js App Router implementation, removing the next-sitemap dependency and adding automatic page discovery with exclusion patterns.

  • Behavior:
    • Replace static sitemap and robots.txt generation with dynamic implementation using Next.js App Router.
    • Automatic page discovery with exclusion patterns in sitemap-utils.ts.
    • Ensure SEO optimization with priorities and change frequencies.
    • Consistency between robots.ts disallow rules and sitemap exclusions.
  • Files:
    • Add sitemap.ts, robots.ts, llms.txt/route.ts, and llms-full.txt/route.ts in both web/client and docs applications.
    • Remove next-sitemap.config.js and related postbuild script from package.json in docs.
  • Misc:
    • Remove next-sitemap dependency from docs/package.json.
    • Update constants/index.ts for route management.

This description was created by Ellipsis for caf080d. You can customize this summary. It will automatically update as commits are pushed.

Summary by CodeRabbit

  • New Features

    • Added public LLMS documentation endpoints (llms.txt and llms-full.txt) for both the app and docs sites.
    • Implemented robots.txt via metadata routes with sensible crawl rules and sitemap references.
    • Introduced dynamic sitemap generation, including automatic route discovery for the app and a daily-updated sitemap for docs.
  • Chores

    • Removed legacy sitemap tooling and configuration.
    • Cleaned up build scripts and dependencies related to sitemap generation.
    • Minor file formatting cleanup.

itsNintu and others added 4 commits July 31, 2025 07:26
- Replace next-sitemap with native Next.js sitemap.ts and robots.ts files
- Add automatic page discovery for web client sitemap generation
- Create comprehensive sitemap utilities with SEO optimization
- Remove deprecated robots.txt route handler in docs
- Update docs package.json to remove next-sitemap dependency
- Add detailed implementation documentation in DYNAMIC_SITEMAP_SETUP.md
- Fix constants.ts formatting

🤖 Generated with [opencode](https://opencode.ai)

Co-Authored-By: opencode <[email protected]>
- Remove redundant /docs path from sitemap (docs.onlook.com/docs -> docs.onlook.com)
- Keep correct docs.onlook.com domain for both sitemap and robots

🤖 Generated with [opencode](https://opencode.ai)

Co-Authored-By: opencode <[email protected]>
Keep implementation documentation local only

🤖 Generated with [opencode](https://opencode.ai)

Co-Authored-By: opencode <[email protected]>

vercel bot commented Jul 31, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
docs | Ready | Preview | Comment | Aug 18, 2025 6:10am
web | Ready | Preview | Comment | Aug 18, 2025 6:10am


supabase bot commented Jul 31, 2025

This pull request has been ignored for the connected project wowaemfasoptxrdjhilu because there are no changes detected in apps/backend/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.


coderabbitai bot commented Aug 18, 2025

Walkthrough

New text routes generate llms.txt and llms-full.txt for both web and docs apps. Web adds robots and sitemap metadata plus a filesystem-based sitemap utility. Docs migrates robots/sitemap to MetadataRoute, removes next-sitemap config and related script/dependency, and deletes the old robots.txt route. A minor formatting change adds a newline.

Changes

  • Web LLMS text routes (apps/web/client/src/app/llms.txt/route.ts, apps/web/client/src/app/llms-full.txt/route.ts): Add GET handlers serving plaintext llms*.txt. Use DOCS_URL env fallback, set Content-Type text/plain and X-Robots-Tag: llms-txt. llms-full builds comprehensive documentation text; llms renders structured sections. (A rough handler sketch follows this list.)
  • Web SEO routes and sitemap utility (apps/web/client/src/app/robots.ts, apps/web/client/src/app/sitemap.ts, apps/web/client/src/lib/sitemap-utils.ts): Add a robots metadata route using APP_URL and disallow lists. Add a sitemap metadata route delegating to getWebRoutes(). Implement getWebRoutes() to scan app routes, filter excluded patterns, and return MetadataRoute.Sitemap entries with priorities and frequencies.
  • Docs LLMS text routes (docs/src/app/llms.txt/route.ts, docs/src/app/llms-full.txt/route.ts): Add GET handlers generating plaintext llms*.txt. llms-full scans docs content, extracts titles, cleans Markdown, builds a TOC and sections; sets revalidate=3600. llms outputs static sections. Both set X-Robots-Tag: llms-txt.
  • Docs SEO config migration (docs/src/app/robots.ts, docs/src/app/sitemap.ts, docs/src/app/robots.txt/route.ts removed, docs/next-sitemap.config.js deleted, docs/package.json): Migrate to MetadataRoute-based robots and sitemap. Remove the dynamic robots.txt route. Delete the next-sitemap config and drop the next-sitemap script/dependency from package.json.
  • Misc formatting (apps/web/client/src/utils/constants/index.ts): Add a trailing newline; no functional change.
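
For orientation, a rough sketch of the route-handler shape described in the first item above, assuming the DOCS_URL fallback and headers listed there; the content builder in the PR is more elaborate than this.

// Sketch of a GET handler like apps/web/client/src/app/llms.txt/route.ts; the body text is illustrative.
export async function GET(): Promise<Response> {
    const docsUrl = process.env.DOCS_URL ?? 'https://docs.onlook.com';

    const content = ['# Onlook', '', '## Documentation', `- Docs: ${docsUrl}`].join('\n');

    return new Response(content, {
        headers: {
            'Content-Type': 'text/plain; charset=utf-8',
            'X-Robots-Tag': 'llms-txt',
        },
    });
}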

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant NextApp as Next.js Route (web/docs)
  participant Generator as LLMS Generator

  Client->>NextApp: GET /llms.txt or /llms-full.txt
  NextApp->>Generator: Build documentation text (env DOCS_URL)
  Generator-->>NextApp: Plaintext content
  NextApp-->>Client: 200 text/plain (X-Robots-Tag: llms-txt)
sequenceDiagram
  participant Client
  participant WebApp as Next.js sitemap (web)
  participant Utils as getWebRoutes()
  participant FS as File System

  Client->>WebApp: GET /sitemap.xml
  WebApp->>Utils: getWebRoutes()
  Utils->>FS: readdir(app/src/app recursively)
  FS-->>Utils: Directory entries
  Utils-->>WebApp: MetadataRoute.Sitemap entries
  WebApp-->>Client: Sitemap response
sequenceDiagram
  participant Client
  participant DocsApp as Next.js llms-full (docs)
  participant Scanner as Docs Scanner
  participant FS as File System

  Client->>DocsApp: GET /llms-full.txt
  DocsApp->>Scanner: scanDocsDirectory()
  Scanner->>FS: Read *.mdx/*.md
  FS-->>Scanner: File contents
  Scanner-->>DocsApp: Titles + cleaned content
  DocsApp-->>Client: 200 text/plain (assembled document)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

In burrows of bytes I twitch my nose,
New maps and manuals neatly compose.
Robots now know where not to tread,
Sitemaps bloom where routes are read.
LLMS scrolls, a carrot-long list—
I thump approval: nothing’s missed! 🥕🐇



coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (15)
docs/src/app/sitemap.ts (2)

5-5: Prefer DOCS_URL for docs app + normalize trailing slashes

Use DOCS_URL (consistent with other docs routes) and strip trailing slashes to avoid double slashes when composing URLs elsewhere.

-    const BASE_URL = process.env.APP_URL ?? 'https://docs.onlook.com';
+    const BASE_URL = (process.env.DOCS_URL ?? 'https://docs.onlook.com').replace(/\/+$/, '');

4-4: Confirm whether lastModified should be build-time or request-time

Using new Date() makes lastModified change per request. If you want stable values between builds, consider computing once at module load or using ISR semantics.

Would you like me to prepare a variant that hoists timestamps to module scope or sets revalidate for predictable freshness?
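
A sketch of both options, assuming a docs sitemap shaped like the one in this PR: hoist the timestamp to module scope so it is fixed per build/server start, and optionally export revalidate so the generated sitemap is cached and refreshed on a schedule. Exact values are placeholders.

import type { MetadataRoute } from 'next';

export const revalidate = 3600; // assumption: ISR-style caching, regenerate at most hourly

const BUILD_TIME = new Date(); // evaluated once at module load, not per request
const BASE_URL = (process.env.DOCS_URL ?? 'https://docs.onlook.com').replace(/\/+$/, '');

export default function sitemap(): MetadataRoute.Sitemap {
    return [
        {
            url: BASE_URL,
            lastModified: BUILD_TIME,
            changeFrequency: 'daily',
            priority: 1.0,
        },
    ];
}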

apps/web/client/src/app/robots.ts (1)

3-3: Normalize BASE_URL to avoid double slashes in sitemap/host

If APP_URL ends with '/', ${BASE_URL}/sitemap.xml will produce a double slash. Normalize once.

-const BASE_URL = process.env.APP_URL ?? 'https://onlook.com';
+const BASE_URL = (process.env.APP_URL ?? 'https://onlook.com').replace(/\/+$/, '');
docs/src/app/robots.ts (1)

3-3: Use DOCS_URL for docs app + normalize trailing slashes

Align with other docs routes and prevent accidental double slashes.

-const BASE_URL = process.env.APP_URL ?? 'https://docs.onlook.com';
+const BASE_URL = (process.env.DOCS_URL ?? 'https://docs.onlook.com').replace(/\/+$/, '');
docs/src/app/llms.txt/route.ts (3)

27-27: Normalize docsUrl to avoid double slashes in links

Prevent accidental // in constructed URLs.

-    const docsUrl = process.env.DOCS_URL ?? 'https://docs.onlook.com';
+    const docsUrl = (process.env.DOCS_URL ?? 'https://docs.onlook.com').replace(/\/+$/, '');

82-87: Use a standard X-Robots-Tag value or remove it

'X-Robots-Tag: llms-txt' isn’t a standard directive. If you intend to keep this page out of search results, use 'noindex'; if indexing is fine, drop the header.

-        headers: {
-            'Content-Type': 'text/plain; charset=utf-8',
-            'X-Robots-Tag': 'llms-txt',
-        },
+        headers: {
+            'Content-Type': 'text/plain; charset=utf-8',
+            // Use 'noindex' to prevent indexing, or remove this header entirely if indexing is desired.
+            'X-Robots-Tag': 'noindex',
+        },

If you need a custom marker for observability, prefer a custom header name (e.g., X-LLMS-Doc: true) instead of overloading X-Robots-Tag.


1-24: Deduplicate LLMS types/renderer across apps

The same LLMSSection/LLMSData and renderMarkdown exist in apps/web and docs. Consider a tiny shared module to avoid divergence.

Would you like me to extract a shared llms-utils.ts and update both routes?
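
If that is useful, the shared module could look roughly like the sketch below. The names (LLMSSection, LLMSData, renderMarkdown) mirror what this review describes, but the exact shapes are assumptions about the PR's code.

export interface LLMSSection {
    title: string;
    items: string[];
}

export interface LLMSData {
    title: string;
    description: string;
    sections: LLMSSection[];
}

// Renders the structured data as the plaintext/markdown body served by the llms.txt routes.
export function renderMarkdown(data: LLMSData): string {
    const lines: string[] = [`# ${data.title}`, '', data.description, ''];
    for (const section of data.sections) {
        lines.push(`## ${section.title}`, '');
        for (const item of section.items) {
            lines.push(`- ${item}`);
        }
        lines.push('');
    }
    return lines.join('\n');
}

Both llms.txt routes would then import these instead of declaring their own copies.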

apps/web/client/src/app/llms.txt/route.ts (3)

26-27: Normalize docsUrl to avoid double slashes in links

Small hygiene improvement matching the docs route.

-    const docsUrl = process.env.DOCS_URL ?? 'https://docs.onlook.com';
+    const docsUrl = (process.env.DOCS_URL ?? 'https://docs.onlook.com').replace(/\/+$/, '');

71-75: Use a standard X-Robots-Tag or remove

Same reasoning as the docs route; 'llms-txt' isn’t a recognized directive.

-        headers: {
-            'Content-Type': 'text/plain; charset=utf-8',
-            'X-Robots-Tag': 'llms-txt',
-        },
+        headers: {
+            'Content-Type': 'text/plain; charset=utf-8',
+            'X-Robots-Tag': 'noindex',
+        },

1-24: DRY up LLMS data model and renderer

Same structures and renderer exist in docs. Extracting a shared utility prevents drift.

I can propose a shared file (e.g., packages/shared/llms-utils.ts or apps/web/client/src/lib/llms-utils.ts consumed by both) if you’re open to it.

apps/web/client/src/app/llms-full.txt/route.ts (1)

1-162: Unnecessarily async function

The getFullDocumentation function is declared as async but doesn't perform any asynchronous operations, so the async keyword and Promise return type can be dropped. The docsUrl parameter is used only once, to append the docs link to the generated content.

-async function getFullDocumentation(docsUrl: string): Promise<string> {
+function getFullDocumentation(docsUrl: string): string {

Also update the call site:

-        const content = await getFullDocumentation(docsUrl);
+        const content = getFullDocumentation(docsUrl);
docs/src/app/llms-full.txt/route.ts (2)

48-59: Consider more robust title extraction

The regex pattern for extracting titles from frontmatter doesn't handle multi-line YAML values or trailing quotes robustly. Additionally, the filename fallback regex contains a redundant alternative (mdx? already matches both .md and .mdx).

 function extractTitle(content: string, filename: string): string {
     // Try to extract title from frontmatter or first heading
     const titleMatch =
-        content.match(/^title:\s*["']?([^"'\n]+)["']?/m) || content.match(/^#\s+(.+)$/m);
+        content.match(/^title:\s*["']?([^"'\n]+?)["']?\s*$/m) || content.match(/^#\s+(.+)$/m);
 
     if (titleMatch) {
         return titleMatch[1].trim();
     }
 
     // Fallback to filename without extension
-    return filename.replace(/\.(mdx?|md)$/, '').replace(/-/g, ' ');
+    return filename.replace(/\.(mdx?)$/, '').replace(/-/g, ' ');
 }

89-90: Potential anchor collision in table of contents

The anchor generation could produce duplicate IDs if multiple documents have the same title after normalization.

Consider adding the file path or index to ensure uniqueness:

-        const anchor = file.title.toLowerCase().replace(/[^a-z0-9]+/g, '-');
+        const anchor = `${file.title.toLowerCase().replace(/[^a-z0-9]+/g, '-')}-${docFiles.indexOf(file)}`;

Apply the same change at line 97.

apps/web/client/src/lib/sitemap-utils.ts (2)

17-61: Consider handling symbolic links and improving error messages

The directory scanning doesn't handle symbolic links, which could cause infinite loops. Also, the error message could be more specific about the failure type.

 async function scanAppDirectory(
     dir: string,
     basePath = '',
     excludedPatterns: string[],
 ): Promise<string[]> {
     const routes: string[] = [];
 
     try {
         const entries = await readdir(dir, { withFileTypes: true });
 
         for (const entry of entries) {
             const fullPath = join(dir, entry.name);
             const routePath = join(basePath, entry.name);
 
-            if (entry.isDirectory()) {
+            if (entry.isDirectory() && !entry.isSymbolicLink()) {
                 if (
                     entry.name.startsWith('_') ||
                     entry.name.startsWith('(') ||
                     entry.name.startsWith('[')
                 ) {
                     continue;
                 }
 
                 const subRoutes = await scanAppDirectory(fullPath, routePath, excludedPatterns);
                 routes.push(...subRoutes);
             } else if (entry.name === 'page.tsx' || entry.name === 'page.ts') {
                 let route = basePath === '' ? '/' : basePath.replace(/\\/g, '/');
 
                 if (!route.startsWith('/')) {
                     route = '/' + route;
                 }
 
                 const shouldExclude = excludedPatterns.some((pattern) => route.startsWith(pattern));
 
                 if (!shouldExclude) {
                     routes.push(route);
                 }
             }
         }
     } catch (error) {
-        console.warn(`Failed to scan directory ${dir}:`, error);
+        console.warn(`Failed to scan directory ${dir}:`, error instanceof Error ? error.message : String(error));
     }
 
     return routes;
 }

82-100: Consider caching the route discovery for production

The filesystem scanning operation could be expensive in production. Consider implementing a caching mechanism.

For production environments, you might want to cache the discovered routes to avoid filesystem operations on every sitemap request. You could either:

  1. Use Next.js ISR by exporting a revalidate constant (similar to the llms-full.txt route)
  2. Implement a simple in-memory cache with TTL (sketched after the ISR example below)
  3. Generate the sitemap at build time if routes don't change dynamically

Example with ISR:

// Add at the top of the file
export const revalidate = 3600; // Revalidate every hour
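
And a sketch of option 2: a module-level cache with a TTL wrapped around the existing getWebRoutes(). The wrapper name and the one-hour TTL are assumptions.

import type { MetadataRoute } from 'next';

const CACHE_TTL_MS = 60 * 60 * 1000; // one hour; arbitrary choice

let cached: { routes: MetadataRoute.Sitemap; expiresAt: number } | null = null;

// Returns cached routes when fresh; otherwise re-runs the (expensive) filesystem scan.
export async function getWebRoutesCached(
    loadRoutes: () => Promise<MetadataRoute.Sitemap>, // e.g. the existing getWebRoutes
): Promise<MetadataRoute.Sitemap> {
    const now = Date.now();
    if (cached && cached.expiresAt > now) {
        return cached.routes;
    }
    const routes = await loadRoutes();
    cached = { routes, expiresAt: now + CACHE_TTL_MS };
    return routes;
}
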
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 46ce8ce and caf080d.

📒 Files selected for processing (13)
  • apps/web/client/src/app/llms-full.txt/route.ts (1 hunks)
  • apps/web/client/src/app/llms.txt/route.ts (1 hunks)
  • apps/web/client/src/app/robots.ts (1 hunks)
  • apps/web/client/src/app/sitemap.ts (1 hunks)
  • apps/web/client/src/lib/sitemap-utils.ts (1 hunks)
  • apps/web/client/src/utils/constants/index.ts (1 hunks)
  • docs/next-sitemap.config.js (0 hunks)
  • docs/package.json (0 hunks)
  • docs/src/app/llms-full.txt/route.ts (1 hunks)
  • docs/src/app/llms.txt/route.ts (1 hunks)
  • docs/src/app/robots.ts (1 hunks)
  • docs/src/app/robots.txt/route.ts (0 hunks)
  • docs/src/app/sitemap.ts (1 hunks)
💤 Files with no reviewable changes (3)
  • docs/package.json
  • docs/src/app/robots.txt/route.ts
  • docs/next-sitemap.config.js
🧰 Additional context used
🧬 Code Graph Analysis (9)
docs/src/app/robots.ts (1)
apps/web/client/src/app/robots.ts (1)
  • robots (5-27)
apps/web/client/src/lib/sitemap-utils.ts (1)
apps/web/client/src/components/store/editor/pages/helper.ts (1)
  • scanAppDirectory (198-307)
apps/web/client/src/app/robots.ts (1)
docs/src/app/robots.ts (1)
  • robots (5-16)
docs/src/app/llms.txt/route.ts (3)
apps/web/client/src/app/llms-full.txt/route.ts (1)
  • GET (164-179)
apps/web/client/src/app/llms.txt/route.ts (1)
  • GET (26-77)
docs/src/app/llms-full.txt/route.ts (1)
  • GET (152-201)
apps/web/client/src/app/sitemap.ts (2)
docs/src/app/sitemap.ts (1)
  • sitemap (3-15)
apps/web/client/src/lib/sitemap-utils.ts (1)
  • getWebRoutes (82-100)
apps/web/client/src/app/llms.txt/route.ts (3)
apps/web/client/src/app/llms-full.txt/route.ts (1)
  • GET (164-179)
docs/src/app/llms.txt/route.ts (1)
  • GET (26-88)
docs/src/app/llms-full.txt/route.ts (1)
  • GET (152-201)
apps/web/client/src/app/llms-full.txt/route.ts (1)
docs/src/app/llms-full.txt/route.ts (1)
  • GET (152-201)
docs/src/app/sitemap.ts (1)
apps/web/client/src/app/sitemap.ts (1)
  • sitemap (4-7)
docs/src/app/llms-full.txt/route.ts (3)
apps/web/client/src/app/llms-full.txt/route.ts (1)
  • GET (164-179)
apps/web/client/src/app/llms.txt/route.ts (1)
  • GET (26-77)
docs/src/app/llms.txt/route.ts (1)
  • GET (26-88)
🔇 Additional comments (10)
apps/web/client/src/utils/constants/index.ts (1)

37-37: LGTM — formatting-only change adds trailing newline

No functional impact; improves POSIX compliance and avoids diff churn.

docs/src/app/sitemap.ts (1)

7-14: Homepage-only sitemap entry looks good

Returns a valid MetadataRoute.Sitemap entry with priority and changeFrequency aligned with PR goals.

apps/web/client/src/app/robots.ts (1)

24-26: Sitemap/host composition looks correct

Once BASE_URL is normalized, these fields render clean URLs.

docs/src/app/robots.ts (2)

13-15: LGTM

Sitemap and host fields are correct and consistent with BASE_URL.


7-12: CrawlDelay is supported in Next.js ≥ 13.3.0
The crawlDelay?: number property on rules was introduced in Next.js v13.3.0. No changes are required here—your use of crawlDelay: 1 will be honored.

Official docs: https://nextjs.org/docs/app/api-reference/file-conventions/metadata/robots

docs/src/app/llms.txt/route.ts (1)

12-24: Renderer is straightforward and safe

Simple markdown generation with no dynamic input; low risk of injection or formatting issues.

apps/web/client/src/app/llms-full.txt/route.ts (1)

164-179: Good error handling and header configuration

The implementation correctly handles errors with appropriate logging and status codes, and sets the proper headers for LLM consumption.

docs/src/app/llms-full.txt/route.ts (1)

152-201: Well-structured error recovery with fallback content

The error handling with fallback content ensures the endpoint always returns valid documentation, even when file reading fails. Good defensive programming.

apps/web/client/src/lib/sitemap-utils.ts (1)

63-80: Consistent metadata configuration

The route metadata configuration is well-structured with appropriate priorities and change frequencies for different page types.

apps/web/client/src/app/sitemap.ts (1)

1-7: Clean and minimal sitemap implementation

The implementation correctly delegates to the utility function and maintains a clean separation of concerns.

Comment on lines +7 to +23
rules: {
userAgent: '*',
allow: '/',
disallow: [
'/api/',
'/auth/',
'/callback/',
'/webhook/',
'/projects/',
'/project/',
'/invitation/',
'/_next/',
'/_vercel/',
'/private/',
],
crawlDelay: 1,
},

💡 Verification agent

🧩 Analysis chain

Validate crawlDelay support and sync disallow list with sitemap exclusions

  • Confirm Next.js MetadataRoute.Robots supports crawlDelay; some versions don’t emit it. If unsupported, it will be ignored.
  • Ensure this disallow list exactly matches paths excluded from sitemap generation (apps/web/client/src/lib/sitemap-utils.ts) to keep robots and sitemap consistent per PR objectives.

[suggest_optional_refactor]

Run this to inspect sitemap-utils for exclusion patterns and compare:

If desired, extract disallow into a constant to make intent explicit:

+const DISALLOW = [
+  '/api/',
+  '/auth/',
+  '/callback/',
+  '/webhook/',
+  '/projects/',
+  '/project/',
+  '/invitation/',
+  '/_next/',
+  '/_vercel/',
+  '/private/',
+];
 ...
-            disallow: [
-                '/api/',
-                '/auth/',
-                '/callback/',
-                '/webhook/',
-                '/projects/',
-                '/project/',
-                '/invitation/',
-                '/_next/',
-                '/_vercel/',
-                '/private/',
-            ],
+            disallow: DISALLOW,

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- Inspecting sitemap exclusions ----"
fd -a "sitemap-utils.ts" | while read -r f; do
  echo "File: $f"
  rg -n -C2 -e 'exclude|excluded|EXCLUDE|DISALLOW|private|auth|api|webhook|project' "$f" || true
done

echo
echo "---- Current robots disallow list ----"
rg -n -A20 -e 'disallow:\s*\[' apps/web/client/src/app/robots.ts

Length of output: 2137


Validate crawlDelay support and align robots/disallow with sitemap exclusions

  • Confirm your Next.js version’s MetadataRoute.Robots supports crawlDelay (older versions will ignore it). If unsupported, consider upgrading or removing the field.
  • The disallow list in apps/web/client/src/app/robots.ts and the EXCLUDED_PATTERNS in sitemap-utils.ts must match exactly. Right now /private/ is present in robots but missing from sitemap exclusions. Add it:
--- a/apps/web/client/src/lib/sitemap-utils.ts
+++ b/apps/web/client/src/lib/sitemap-utils.ts
@@ -6,7 +6,8 @@ const BASE_URL = process.env.APP_URL ?? 'https://onlook.com';
 const EXCLUDED_PATTERNS = [
   '/api/',
   '/auth/',
  '/callback/',
   '/webhook/',
   '/projects/',
   '/project/',
   '/invitation/',
+  '/private/',       // ← add this to mirror robots.ts
   '/_',              // covers both /_next/ and /_vercel/
 ];
  • Optional refactor: extract the shared exclusion array into a single constant (e.g. EXCLUDED_PATHS) and import it in both robots.ts and sitemap-utils.ts to avoid future drift (sketched below).
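
A sketch of that shared constant; the module path is an assumption:

// apps/web/client/src/lib/excluded-paths.ts (hypothetical location)
export const EXCLUDED_PATHS = [
    '/api/',
    '/auth/',
    '/callback/',
    '/webhook/',
    '/projects/',
    '/project/',
    '/invitation/',
    '/private/',
    '/_next/',
    '/_vercel/',
] as const;

// robots.ts could then use `disallow: [...EXCLUDED_PATHS]`, and sitemap-utils.ts
// could filter discovered routes against the same array.
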
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rules: {
userAgent: '*',
allow: '/',
disallow: [
'/api/',
'/auth/',
'/callback/',
'/webhook/',
'/projects/',
'/project/',
'/invitation/',
'/_next/',
'/_vercel/',
'/private/',
],
crawlDelay: 1,
},
// File: apps/web/client/src/lib/sitemap-utils.ts
const BASE_URL = process.env.APP_URL ?? 'https://onlook.com';
const EXCLUDED_PATTERNS = [
'/api/',
'/auth/',
'/callback/',
'/webhook/',
'/projects/',
'/project/',
'/invitation/',
'/private/', // ← add this to mirror robots.ts
'/_', // covers both /_next/ and /_vercel/
];
🤖 Prompt for AI Agents
In apps/web/client/src/app/robots.ts around lines 7 to 23, validate whether your
Next.js MetadataRoute.Robots implementation supports the crawlDelay field and if
not either remove the crawlDelay entry or upgrade Next.js to a version that
supports it; also ensure the robots disallow list exactly matches the
EXCLUDED_PATTERNS in sitemap-utils.ts by adding '/private/' to the sitemap
exclusions (or better, extract a shared EXCLUDED_PATHS constant and import it
into both robots.ts and sitemap-utils.ts so both lists remain identical going
forward).

Comment on lines +61 to +73
function cleanMarkdownContent(content: string): string {
// Remove frontmatter
content = content.replace(/^---[\s\S]*?---\n/, '');

// Remove JSX components and imports
content = content.replace(/^import\s+.*$/gm, '');
content = content.replace(/<[^>]+>/g, '');

// Clean up extra whitespace
content = content.replace(/\n{3,}/g, '\n\n');

return content.trim();
}

🛠️ Refactor suggestion

Consider preserving code blocks in markdown content

The current implementation removes all JSX/HTML tags indiscriminately, which would also remove legitimate code blocks containing JSX examples from the documentation.

 function cleanMarkdownContent(content: string): string {
     // Remove frontmatter
     content = content.replace(/^---[\s\S]*?---\n/, '');
 
     // Remove JSX components and imports
     content = content.replace(/^import\s+.*$/gm, '');
-    content = content.replace(/<[^>]+>/g, '');
+    // Only remove JSX components that are not within code blocks
+    content = content.replace(/^(?!```).*<[A-Z][^>]*>.*$/gm, '');
 
     // Clean up extra whitespace
     content = content.replace(/\n{3,}/g, '\n\n');
 
     return content.trim();
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In docs/src/app/llms-full.txt/route.ts around lines 61 to 73, the regex that
removes all angle-bracket tags (content.replace(/<[^>]+>/g, '')) strips JSX/HTML
inside fenced code blocks too; update the function to preserve fenced code
blocks by extracting or tokenizing triple-backtick sections (or using a simple
stateful parser), perform import and tag removals only on the non-code segments,
then restore the fenced code blocks intact; ensure the import-stripping regex
remains line-anchored and apply the whitespace collapse after restoring code
blocks.
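
A sketch of the tokenize-then-clean approach described above: pull fenced code blocks out, strip imports and tags only from the prose segments, then splice the blocks back in verbatim. The placeholder scheme is an implementation choice, not the PR's code.

function cleanMarkdownContentPreservingCode(content: string): string {
    // Remove frontmatter first.
    content = content.replace(/^---[\s\S]*?---\n/, '');

    // Swap each fenced code block for a placeholder and remember it.
    const codeBlocks: string[] = [];
    content = content.replace(/```[\s\S]*?```/g, (block) => {
        codeBlocks.push(block);
        return `@@CODE_BLOCK_${codeBlocks.length - 1}@@`;
    });

    // Clean only the non-code text.
    content = content.replace(/^import\s+.*$/gm, '');
    content = content.replace(/<[^>]+>/g, '');
    content = content.replace(/\n{3,}/g, '\n\n');

    // Restore the fenced blocks verbatim.
    content = content.replace(/@@CODE_BLOCK_(\d+)@@/g, (_match, index) => codeBlocks[Number(index)] ?? '');

    return content.trim();
}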
