Skip to content

Conversation

KTS-o7
Copy link

@KTS-o7 KTS-o7 commented Sep 2, 2025

This pull request introduces LLM-powered content summarization to the web crawler, allowing users to generate high-quality summaries of search results using GPT-4o-mini and other supported models. The changes update the documentation, add new dependencies, enhance the CLI workflow, and introduce new modules for summarization and improved search functionality.

LLM Summarization Feature

  • Updated README.md to announce LLM-powered summarization, document new features, configuration steps, example output, architecture, supported LLM providers, and production considerations. [1] [2]
  • Added litellm as a dependency in pyproject.toml to enable LLM integration.

CLI and Workflow Enhancements

  • Modified src/main.py to prompt users for AI summaries, allow configuration of the number of summaries, and integrate the new SummarizationService for summarizing search results. [1] [2] [3]

Search and Summarization Modules

  • Added a new src/search/searcher.py implementing WebSearcher for robust, multi-source web search with improved RSS parsing, Google News URL resolution, and result cleaning/sorting.
  • Created src/summarizer/__init__.py to expose summarization engine components for use in the crawler.

…n. Update search functionality to include optional summaries, enhance README with new features, and add tests for summarization service.
@KTS-o7
Copy link
Author

KTS-o7 commented Sep 2, 2025

#3 @virattt Please review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant