Author: Tiger, member of HKUST Dial
Last update: September 09, 2025
This workflow tracks daily updates on arXiv.org. A series of modules preprocesses and summarizes the paper information, then posts the result to a Feishu group chat for reading. It is intended for the education and research community.
💰 Cost: less than 0.05 CNY per workflow execution.
- 📚 Automatically fetch latest arXiv papers
- 🤖 AI-powered paper summarization and filtering
- 📱 Auto-send to Feishu group chat
- ⏰ GitHub Actions automated scheduling
- 🛠️ Local debugging script support
Before getting started, please ensure you have prepared the following accounts and services:
- Dify account - Free registration for building AI workflows
- LLM Provider API - DeepSeek API recommended (cost-effective)
- Jina API key - For web content extraction, new users get 1M free credits
- Feishu Group Bot Webhook - For message pushing
- Open Dify Console
- Import Workflow
  - Create a new workflow by importing this DSL file
  - This DSL file contains the complete logic for paper fetching, processing, and pushing
- Configure Environment Variables
- Get API Token
  - Get your workflow API token from workflow settings
  - This token will be used for automated scheduling
The project provides an integrated scheduler that triggers Dify-side workflows on a schedule.
- Configure GitHub Secrets:
  - Go to repository Settings > Secrets and variables > Actions > New repository secret
  - Add secret DIFY_TOKENS: your Dify workflow API token (separate multiple tokens with ;)
- Enable GitHub Actions: Go to the repository Actions tab and enable workflows
- Automatic Execution: The scheduler runs automatically according to the timing rules defined in dify-scheduler.yml. For syntax details, see cron.help.
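GitHub Actions evaluates schedule cron expressions in UTC, so local push times have to be shifted before they are written into dify-scheduler.yml. The conversion can be sketched as follows, assuming pushes happen on the hour and the local timezone is a fixed offset from UTC (UTC+8 is used purely as an illustration). The helper name local_hours_to_cron is hypothetical and not part of this project.

```python
def local_hours_to_cron(local_hours, utc_offset_hours=8):
    """Build a five-field cron expression in UTC from local wall-clock hours.

    Hypothetical helper for illustration; the offset default is an assumption.
    """
    # Shift each local hour to UTC, wrapping around midnight.
    utc_hours = sorted({(hour - utc_offset_hours) % 24 for hour in local_hours})
    # minute=0, hour=<list>, every day of month, every month, every weekday
    return "0 {} * * *".format(",".join(str(hour) for hour in utc_hours))


print(local_hours_to_cron([9, 12, 15, 18]))  # 0 1,4,7,10 * * *
```

Four local push times produce four UTC hours in a single cron line, which keeps the number of time points easy to count.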
- GitHub Actions: Go to Actions tab > "Dify ArxivFlow Scheduler" > "Run workflow"
- Local Testing:
  npm install
  # Set environment variables
  export DIFY_TOKENS="your_workflow_token_here"
  npm start
The scheduler will automatically:
- ✅ Execute your Dify workflow daily
- 📊 Log execution results and status
- ❌ Report any errors to GitHub Actions logs
- 🔄 Support multiple workflows if needed
- DIFY_TOKENS: Your Dify workflow API token; separate tokens for multiple workflows with ;
- DIFY_BASE_URL: Dify API base URL (default: https://api.dify.ai/v1)
- DIFY_INPUTS: Workflow input variables in JSON format (default: {})
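As a rough illustration of how these three variables fit together, the sketch below parses them and prepares one request per token. It assumes Dify's public workflow endpoint (POST /workflows/run with a Bearer token and a blocking response mode). The project's own scheduler runs on Node.js and may be implemented differently; the function names and the user label arxivflow-scheduler are made up for this example.

```python
import json
import os
import urllib.request


def load_scheduler_config(env=os.environ):
    """Read the scheduler variables, applying the documented defaults."""
    raw_tokens = env.get("DIFY_TOKENS", "")
    tokens = [t.strip() for t in raw_tokens.split(";") if t.strip()]
    base_url = env.get("DIFY_BASE_URL", "https://api.dify.ai/v1").rstrip("/")
    inputs = json.loads(env.get("DIFY_INPUTS", "{}"))
    return tokens, base_url, inputs


def build_run_request(token, base_url, inputs, user="arxivflow-scheduler"):
    """Prepare (but do not send) a POST to the workflow run endpoint.

    The "user" value is an arbitrary label chosen for this sketch.
    """
    body = json.dumps({"inputs": inputs, "response_mode": "blocking", "user": user})
    return urllib.request.Request(
        url=f"{base_url}/workflows/run",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    tokens, base_url, inputs = load_scheduler_config()
    for token in tokens:
        request = build_run_request(token, base_url, inputs)
        print(request.get_method(), request.full_url)
        # To actually trigger the workflow: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it possible to check the configuration locally without spending a workflow run.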
- FEISHU_DEV / FEISHU_PROD: Feishu Group Bot Webhook for testing/production environments
- JINA: API key for crawling arXiv search results
- KEYWORDS: Keywords for arXiv paper search, comma-separated
  - The number of KEYWORDS and the sending frequency need to match the timing rules in GitHub Actions
  - Example: If sending 4 pushes daily, KEYWORDS needs 4 keywords, and timing rules need 4 time points
- PAPER_NUM_MAX: Maximum number of papers per message (limited by Feishu message length)
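The relationship between KEYWORDS, the number of daily runs, and PAPER_NUM_MAX can be sketched as below. This is an illustration under assumptions, not the workflow's actual logic: it assumes each scheduled run has a zero-based slot index that selects one keyword, and that each paper is a dict with a title field. The payload shape follows the plain-text message format accepted by Feishu custom bot webhooks. All function names are hypothetical.

```python
def parse_keywords(raw):
    """Split the comma-separated KEYWORDS value into a clean list."""
    return [keyword.strip() for keyword in raw.split(",") if keyword.strip()]


def keyword_for_slot(keywords, slot_index, slots_per_day):
    """Return the keyword for one scheduled run, checking that counts agree.

    Assumption: one keyword per run, selected by the run's position in the day.
    """
    if len(keywords) != slots_per_day:
        raise ValueError(
            f"{len(keywords)} keywords configured but "
            f"{slots_per_day} runs scheduled per day"
        )
    return keywords[slot_index]


def build_feishu_text(papers, paper_num_max):
    """Build a text payload containing at most paper_num_max papers."""
    kept = papers[:paper_num_max]
    lines = [f"{i}. {paper['title']}" for i, paper in enumerate(kept, start=1)]
    return {"msg_type": "text", "content": {"text": "\n".join(lines)}}


keywords = parse_keywords("LLM, diffusion, robotics, RAG")
print(keyword_for_slot(keywords, 2, slots_per_day=4))  # robotics
```

Failing loudly on a count mismatch surfaces a misconfigured schedule immediately instead of silently skipping or repeating a keyword.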
The /scripts folder contains scripts for local debugging and testing, simulating the processes used in the Dify workflow:
- jina_extract.py: Simulates Jina API calls and paper information extraction logic
- sample.text: Sample data returned by Jina API for local testing
- extracted_papers.json: Example of structured paper data after extraction; serves as input for downstream LLM analysis in the workflow
These scripts help you test and debug paper extraction logic without consuming API credits.
cd scripts
python jina_extract.py
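For readers who want a feel for this kind of offline extraction before opening the script, here is a minimal sketch. It does not reproduce jina_extract.py: it only assumes that the crawled text contains new-style arXiv identifiers prefixed with arXiv:, and the sample string is invented for the example.

```python
import json
import re

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optional version suffix.
ARXIV_ID = re.compile(r"arXiv:(\d{4}\.\d{4,5})(?:v\d+)?")


def extract_arxiv_ids(text):
    """Return unique arXiv IDs in order of first appearance, versions stripped."""
    seen = []
    for match in ARXIV_ID.finditer(text):
        paper_id = match.group(1)
        if paper_id not in seen:
            seen.append(paper_id)
    return seen


# Invented sample text, standing in for crawled search results.
sample = "arXiv:2509.01234v2 Title A ... arXiv:2509.05678 Title B ... arXiv:2509.01234"
records = [
    {"id": paper_id, "url": f"https://arxiv.org/abs/{paper_id}"}
    for paper_id in extract_arxiv_ids(sample)
]
print(json.dumps(records, indent=2))
```

Working from a saved text sample like this keeps iteration free of API calls, which is the same idea behind the scripts in this folder.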
- Dify Official Guidance: Link
- Feishu - How to use Bot in Group Chat: Link (Chinese)
- AWS Workshop: Lab3-使用Dify构建AI Workflow: Link (Chinese)
- arXiv Category: Link
- Dify Schedule Project: Link - Inspiration for the automated scheduler implementation
MIT License - See LICENSE file