ArxivFlow - Periodic Track on arXiv Paper

Author: Tiger, member of HKUST Dial

Last update: September 09, 2025

🎯 Objectives

This workflow tracks daily updates on arXiv.org. Paper information is preprocessed and summarized by a series of modules, then posted to a Feishu group chat for reading. The target audience is the education and research community.

💰 Cost: less than 0.05 CNY per workflow execution.

✨ Key Features

  • 📚 Automatically fetch latest arXiv papers
  • 🤖 AI-powered paper summarization and filtering
  • 📱 Auto-send to Feishu group chat
  • ⏰ GitHub Actions automated scheduling
  • 🛠️ Local debugging script support

📋 Prerequisites

Before getting started, please ensure you have prepared the following accounts and services:

  1. Dify account - Free registration for building AI workflows
  2. LLM Provider API - DeepSeek API recommended (cost-effective)
  3. Jina API key - For web content extraction, new users get 1M free credits
  4. Feishu Group Bot Webhook - For message pushing

🚀 Quick Start

Step 1: Setup Dify Workflow

  1. Open Dify Console
    • Log in to Dify and find the "Studio" tab
  2. Import Workflow
    • Create a new workflow by importing this DSL file
    • This DSL file contains the complete logic for paper fetching, processing, and pushing
  3. Configure Environment Variables
    • Configure the necessary environment variables in the workflow settings
    • See the Environment Variables Configuration section below for details
  4. Get API Token
    • Get your workflow API token from the workflow settings
    • This token will be used for automated scheduling

Step 2: Setup Automated Scheduler (Recommended)

The project provides an integrated scheduler that can trigger Dify-side workflows on schedule.

Quick Setup:

  1. Configure GitHub Secrets:
    • Go to repository Settings > Secrets and variables > Actions > New repository secret
    • Add secret DIFY_TOKENS: Your Dify workflow API token (separate multiple tokens with ;)
  2. Enable GitHub Actions: Go to repository Actions tab and enable workflows
  3. Automatic Execution: The scheduler will automatically run according to timing rules defined in dify-scheduler.yml. For syntax details, see cron.help.

Manual Execution:

  • GitHub Actions: Go to Actions tab > "Dify ArxivFlow Scheduler" > "Run workflow"
  • Local Testing:
    npm install
    # Set environment variables
    export DIFY_TOKENS="your_workflow_token_here"
    npm start
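
To make the trigger step concrete, here is a minimal Python sketch of the kind of HTTP request a scheduler could send to start one workflow run. It is not the project's Node scheduler. It assumes the Dify workflow endpoint `POST {base_url}/workflows/run` with a Bearer token, and the function name `build_run_request` and the `user` label are hypothetical.

```python
import json
import urllib.request


def build_run_request(token, base_url="https://api.dify.ai/v1", inputs=None,
                      user="arxivflow-scheduler"):
    """Build (but do not send) a POST request that starts one Dify workflow run.

    Assumption: the workflow is exposed at {base_url}/workflows/run and
    authenticated with the workflow API token as a Bearer token.
    """
    payload = {
        "inputs": inputs or {},        # workflow input variables
        "response_mode": "blocking",   # wait for the run to finish
        "user": user,                  # hypothetical caller label
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/workflows/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_run_request("your_workflow_token_here")
    print(req.get_method(), req.full_url)
    # To actually trigger the run (consumes credits):
    # urllib.request.urlopen(req, timeout=120)
```

Building the request separately from sending it keeps the sketch safe to run locally without spending API credits.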

📱 Final Result

The scheduler will automatically:

  • ✅ Execute your Dify workflow daily
  • 📊 Log execution results and status
  • ❌ Report any errors to GitHub Actions logs
  • 🔄 Support multiple workflows if needed

🔧 Environment Variables Configuration

GitHub Actions Secrets (Required):

  • DIFY_TOKENS: Your Dify workflow API token; to run multiple workflows, separate their tokens with ;

Optional Configuration:

  • DIFY_BASE_URL: Dify API base URL (default: https://api.dify.ai/v1)
  • DIFY_INPUTS: Workflow input variables in JSON format (default: {})
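
The variables above could be read along the following lines. This is a sketch only: the helper name `load_scheduler_config` is hypothetical, and the project's own scheduler may validate these values differently.

```python
import json
import os


def load_scheduler_config(env=None):
    """Read DIFY_TOKENS, DIFY_BASE_URL and DIFY_INPUTS from a mapping.

    Defaults follow the README: base URL https://api.dify.ai/v1, inputs {}.
    """
    env = os.environ if env is None else env

    # Multiple tokens are separated with ';'; ignore empty fragments.
    raw_tokens = env.get("DIFY_TOKENS", "")
    tokens = [t.strip() for t in raw_tokens.split(";") if t.strip()]
    if not tokens:
        raise ValueError("DIFY_TOKENS is required and must contain at least one token")

    base_url = env.get("DIFY_BASE_URL") or "https://api.dify.ai/v1"

    # DIFY_INPUTS is expected to be a JSON object.
    inputs = json.loads(env.get("DIFY_INPUTS") or "{}")
    if not isinstance(inputs, dict):
        raise ValueError("DIFY_INPUTS must be a JSON object")

    return {"tokens": tokens, "base_url": base_url, "inputs": inputs}


if __name__ == "__main__":
    print(load_scheduler_config({"DIFY_TOKENS": "token_a;token_b"}))
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check locally before storing the values as GitHub secrets.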

Dify Workflow Internal Environment Variables:

  • FEISHU_DEV / FEISHU_PROD: Feishu Group Bot Webhook for testing/production environments
  • JINA: API key for crawling arXiv search results
  • KEYWORDS: Keywords for arXiv paper search, comma-separated
    • The number of keywords in KEYWORDS must match the number of scheduled runs defined by the timing rules in GitHub Actions
    • Example: To send 4 pushes per day, KEYWORDS needs 4 keywords and the timing rules need 4 time points
  • PAPER_NUM_MAX: Maximum number of papers per message (limited by Feishu message length)
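
The pairing between keywords and scheduled time points can be illustrated with a short sketch. How the workflow actually maps a run to a keyword is not described here, so the index-by-slot pairing below is an assumption, and the names `parse_keywords`, `keyword_for_slot`, and `cap_papers` are hypothetical.

```python
def parse_keywords(raw):
    """Split the comma-separated KEYWORDS value into a clean list."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def keyword_for_slot(keywords, schedule_hours, current_hour):
    """Pick the keyword paired with the current scheduled hour.

    Assumption: the i-th keyword belongs to the i-th scheduled time point.
    """
    if len(keywords) != len(schedule_hours):
        raise ValueError(
            f"{len(keywords)} keywords but {len(schedule_hours)} scheduled time points"
        )
    if current_hour not in schedule_hours:
        raise ValueError(f"hour {current_hour} is not a scheduled time point")
    return keywords[schedule_hours.index(current_hour)]


def cap_papers(papers, paper_num_max):
    """Keep at most PAPER_NUM_MAX papers so the message stays within length limits."""
    return papers[:max(0, paper_num_max)]


if __name__ == "__main__":
    keywords = parse_keywords("LLM, data cleaning, RAG, agents")
    # Four pushes per day need four keywords and four time points.
    print(keyword_for_slot(keywords, [1, 5, 9, 13], current_hour=9))
    print(cap_papers(["p1", "p2", "p3"], paper_num_max=2))
```

A check like the length comparison in `keyword_for_slot` surfaces a mismatch between KEYWORDS and the cron rules early, instead of silently skipping or repeating a keyword.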

🛠️ Debugging Scripts

The /scripts folder contains scripts for local debugging and testing, simulating the processes used in Dify Workflow:

  • jina_extract.py: Simulates Jina API calls and paper information extraction logic
  • sample.text: Sample data returned by Jina API for local testing
  • extracted_papers.json: Example of structured paper data after extraction, which serves as input for the downstream LLM analysis in the workflow

These scripts help you test and debug paper extraction logic without consuming API credits.

Usage for Local Development:

cd scripts
python jina_extract.py
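
As a rough idea of what offline extraction can look like, the sketch below pulls arXiv identifiers out of a block of text and turns them into structured records. It is not the logic of jina_extract.py, which is not reproduced here. It assumes the text contains identifiers written as `arXiv:YYMM.NNNNN`, and the function name and output fields are hypothetical.

```python
import json
import re

# Matches identifiers such as "arXiv:2509.01234" or "arXiv:2509.01234v2".
ARXIV_ID = re.compile(r"arXiv:(\d{4}\.\d{4,5})(v\d+)?")


def extract_paper_ids(text):
    """Return one record per distinct arXiv ID, in order of first appearance."""
    seen = []
    for match in ARXIV_ID.finditer(text):
        paper_id = match.group(1)  # drop the optional version suffix
        if paper_id not in seen:
            seen.append(paper_id)
    return [{"id": pid, "url": f"https://arxiv.org/abs/{pid}"} for pid in seen]


if __name__ == "__main__":
    # Inline stand-in for a saved API response; the real sample file may differ.
    sample = "1. arXiv:2509.01234 [pdf]\n2. arXiv:2509.05678v2 [pdf]\n"
    print(json.dumps(extract_paper_ids(sample), indent=2))
```

Working from saved text rather than a live request keeps this kind of experiment free of API usage.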

🤝 Acknowledgement

  • Dify Official Guidance: Link
  • Feishu - How to use Bot in Group Chat: Link (Chinese)
  • AWS Workshop: Lab3 - Building an AI Workflow with Dify (使用Dify构建AI Workflow): Link (Chinese)
  • arXiv Category: Link
  • Dify Schedule Project: Link - Inspiration for the automated scheduler implementation

📄 License

MIT License - See LICENSE file
