s4ff0x/speech-to-docs

speech-to-docs

A Node.js application that transcribes audio recordings into text and automatically saves them to Google Docs. Perfect for maintaining voice notes, meeting minutes, or any spoken content in a written format.

Setup Example (Article)

Environment Variables

Create a .env file with the following variables:

OPENAI_SPEECH_API_KEY=
DOC_ID=The ID of the Google document where the text will be saved
TIMEZONE=your_timezone (example: Asia/Jerusalem)
PERSONAL_AUTH_TOKEN=A secret token of your choice, sent in the Authorization header when calling the API

# Notion
NOTION_API_KEY=Your Notion integration secret
NOTION_DATABASE_ID=The Notion database ID or full database URL

# Google Service Account Credentials
TYPE=
PROJECT_ID=
PRIVATE_KEY_ID=
PRIVATE_KEY=
CLIENT_EMAIL=
CLIENT_ID=
AUTH_URI=
TOKEN_URI=
AUTH_PROVIDER_X509_CERT_URL=
CLIENT_X509_CERT_URL=
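
The Google Service Account variables mirror the fields of a service-account JSON key file. The following is a minimal sketch of how they might be reassembled into a credentials object. It assumes the lowercase field names of a standard key file and that PRIVATE_KEY is stored on one line with escaped \n sequences; buildServiceAccount is a hypothetical helper, not necessarily how the app does it.

```javascript
// Hypothetical helper: rebuilds a service-account credentials object from env vars.
const SERVICE_ACCOUNT_VARS = [
  'TYPE', 'PROJECT_ID', 'PRIVATE_KEY_ID', 'PRIVATE_KEY', 'CLIENT_EMAIL',
  'CLIENT_ID', 'AUTH_URI', 'TOKEN_URI',
  'AUTH_PROVIDER_X509_CERT_URL', 'CLIENT_X509_CERT_URL',
];

function buildServiceAccount(env) {
  // Fail early with a readable list of whatever is missing or empty.
  const missing = SERVICE_ACCOUNT_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const credentials = {};
  for (const name of SERVICE_ACCOUNT_VARS) {
    // PROJECT_ID -> project_id, matching the key-file field names.
    credentials[name.toLowerCase()] = env[name];
  }
  // Assumption: the key is stored with literal "\n" sequences instead of newlines.
  credentials.private_key = credentials.private_key.replace(/\\n/g, '\n');
  return credentials;
}
```

Typical usage would be `const credentials = buildServiceAccount(process.env);` once the .env file has been loaded.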

Features

  • Audio file transcription using OpenAI's Whisper model
  • Automatic saving of transcriptions to Google Docs
  • Timestamp recording for each transcription
  • Support for M4A audio format
  • RESTful API interface
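
The timestamp feature pairs naturally with the TIMEZONE variable. As a rough sketch, a timestamp in the configured zone could be produced with the built-in Intl API. The helper name formatTimestamp and the "YYYY-MM-DD HH:mm" output format are assumptions for illustration; the app's actual format may differ.

```javascript
// Hypothetical helper: formats a Date as "YYYY-MM-DD HH:mm" in an IANA time zone.
function formatTimestamp(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23', // 00-23, avoids "24:00" at midnight
  }).formatToParts(date);
  // Pick individual fields so the result does not depend on locale punctuation.
  const get = (type) => parts.find((part) => part.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')} ${get('hour')}:${get('minute')}`;
}
```

For example, `formatTimestamp(new Date(), process.env.TIMEZONE)` would yield the current time in the zone set in .env.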

Technical Stack

  • Node.js with Express.js
  • OpenAI API (Whisper model for transcription)
  • Google Docs API
  • Multer for file upload handling

Prerequisites

  • Node.js installed
  • OpenAI API key
  • Google Cloud project with the Google Docs API enabled
  • Google Service Account credentials
  • Notion integration (optional): Notion API key and database

How to test the API with curl

curl -F "audio=@[path to file].m4a;type=audio/m4a" \
-H "Authorization:[your personal auth token]" \
-X POST https://[your api url]/transcribe
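
The same request can be made from Node.js. The sketch below builds an equivalent multipart request using the FormData and Blob globals available in Node.js 18 and later. buildTranscribeRequest is a hypothetical helper; the field name audio, the /transcribe path, and the raw token in the Authorization header are taken from the curl command above.

```javascript
// Hypothetical helper: builds the same multipart request as the curl command.
// Requires Node.js 18+ for the global FormData and Blob.
function buildTranscribeRequest(apiUrl, authToken, audioBytes, fileName) {
  const form = new FormData();
  // Field name "audio" and MIME type "audio/m4a" match the curl example.
  form.append('audio', new Blob([audioBytes], { type: 'audio/m4a' }), fileName);
  return {
    url: `${apiUrl}/transcribe`,
    init: {
      method: 'POST',
      headers: { Authorization: authToken },
      body: form,
    },
  };
}
```

Read the recording with `fs.promises.readFile`, pass the bytes to the helper, and send the result with `fetch(url, init)`. Do not set Content-Type yourself; fetch adds the multipart boundary when the body is a FormData instance.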

Notion Integration

The app can create a Notion page for each transcription when both NOTION_API_KEY and NOTION_DATABASE_ID are configured. The Notion write runs in parallel with the Google Docs update to keep response time low.
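
One way to picture this behavior is the sketch below. saveEverywhere, saveToGoogleDocs, and saveToNotion are hypothetical names, and the use of Promise.allSettled is an assumption chosen so that a failure in one destination does not hide the result of the other.

```javascript
// Hypothetical sketch: runs each configured destination in parallel.
// saveToGoogleDocs and saveToNotion are placeholder async functions.
async function saveEverywhere(text, env, { saveToGoogleDocs, saveToNotion }) {
  const tasks = [saveToGoogleDocs(text)];
  // Notion is optional: only used when both variables are set.
  const notionEnabled = Boolean(env.NOTION_API_KEY && env.NOTION_DATABASE_ID);
  if (notionEnabled) {
    tasks.push(saveToNotion(text));
  }
  // Both promises are already running; wait for each to settle.
  const results = await Promise.allSettled(tasks);
  return { notionEnabled, results };
}
```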

1) Create a Notion integration

  • Go to Notion Developers → Create a new internal integration.
  • Copy the integration secret into NOTION_API_KEY.

2) Prepare your Notion database

You can use an existing database or create a new one. Share the database with your integration (Share → Invite → select your integration) so it has access.

Required/Supported properties (column names and types):

  • Title property (type: Title)
    • Name can be anything. The app auto-detects the Title property.
  • Content property (type: Rich text)
    • Recommended names: content, text, body, description, or notes. The app picks the first matching Rich text property.
  • tags (type: Multi-select)
    • General tags. The app prioritizes existing options but may add new options here if needed.
    • Property must be named exactly tags (case-insensitive). This prevents conflicts with similarly named columns like project-tags.
  • category-tags (type: Multi-select)
    • High-level categories matched conservatively via AI from your transcription.
    • The app will ONLY use options that already exist in this property and will NOT create new options.
    • Create the options you want to be eligible, for example: dev, health.
  • project-tags (type: Multi-select)
    • Project-specific tags matched conservatively via AI.
    • The app will ONLY use options that already exist in this property and will NOT create new options.
    • Create the options you want to be eligible, for example: smart-journal, p1v3, p1v4.
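
The detection rules above can be sketched against a database schema shaped like the `properties` map of a Notion database object (keyed by property name, each value carrying a `type`). detectProperties is a hypothetical helper. Treating category-tags and project-tags as case-insensitive, like tags, is an assumption.

```javascript
const CONTENT_NAMES = ['content', 'text', 'body', 'description', 'notes'];

// Hypothetical helper: maps a Notion database schema to the properties the app uses.
// `properties` is keyed by property name; each value has a `type` field.
function detectProperties(properties) {
  const entries = Object.entries(properties);
  // Returns the original property name of the first entry that satisfies the predicate.
  const findName = (predicate) => {
    const match = entries.find(([name, prop]) => predicate(name.toLowerCase(), prop.type));
    return match ? match[0] : null;
  };
  const multiSelectNamed = (wanted) =>
    findName((name, type) => type === 'multi_select' && name === wanted);

  return {
    title: findName((_name, type) => type === 'title'),
    content: findName((name, type) => type === 'rich_text' && CONTENT_NAMES.includes(name)),
    tags: multiSelectNamed('tags'),
    categoryTags: multiSelectNamed('category-tags'),
    projectTags: multiSelectNamed('project-tags'),
  };
}
```

A `null` entry in the result means the property was not found, which is how optional columns such as category-tags could be skipped.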

Notes:

  • NOTION_DATABASE_ID may be the raw 32-character ID or the full database URL. The app extracts the ID automatically (hyphens and query params are handled).
  • If category-tags or project-tags properties are missing, they are simply skipped.
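
As an illustration of the ID extraction described in the first note, here is a minimal sketch. extractNotionDatabaseId is a hypothetical helper and the app's own parsing may differ. It assumes the ID is the last 32 hexadecimal characters of the final URL path segment once hyphens are removed.

```javascript
// Hypothetical helper: accepts a raw ID, a hyphenated UUID, or a full Notion URL.
function extractNotionDatabaseId(input) {
  // Drop the query string and fragment, then keep the last path segment.
  const withoutQuery = input.trim().split(/[?#]/)[0];
  const lastSegment = withoutQuery.split('/').filter(Boolean).pop() || '';
  // Remove hyphens so hyphenated UUIDs and slug-prefixed IDs both collapse.
  const compact = lastSegment.replace(/-/g, '');
  // The ID is expected at the very end of the segment.
  const match = compact.match(/[0-9a-f]{32}$/i);
  if (!match) {
    throw new Error(`Could not find a 32-character Notion ID in: ${input}`);
  }
  return match[0].toLowerCase();
}
```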

3) What the app writes to Notion

  • Title: generated by AI to summarize the transcription.
  • Content: the full transcription text (stored in the first suitable Rich text property).
  • tags: generated by AI; reuses existing options when possible and may create new options if needed.
  • category-tags: matched strictly against existing options; never creates new options.
  • project-tags: matched strictly against existing options; never creates new options.
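
The difference between tags (may grow) and category-tags or project-tags (existing options only) can be sketched as a single function with a flag. resolveTags is a hypothetical helper, and case-insensitive matching of option names is an assumption.

```javascript
// Hypothetical helper: resolves AI-suggested tag names against a property's options.
function resolveTags(suggested, existingOptions, { allowNew }) {
  // Index existing options by lowercase name to reuse their original spelling.
  const byLowerName = new Map(existingOptions.map((name) => [name.toLowerCase(), name]));
  const resolved = [];
  for (const raw of suggested) {
    const trimmed = raw.trim();
    if (!trimmed) continue;
    const existing = byLowerName.get(trimmed.toLowerCase());
    if (existing) {
      resolved.push(existing); // reuse the existing option
    } else if (allowNew) {
      resolved.push(trimmed); // only permitted for the general tags property
    }
    // Otherwise the suggestion is dropped.
  }
  // Remove duplicates while keeping first-seen order.
  return [...new Set(resolved)];
}
```

Under this sketch, tags would be resolved with `allowNew: true`, while category-tags and project-tags would use `allowNew: false`.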

4) Performance

The app generates the title and extracts tags, category-tags, and project-tags in parallel to reduce latency.
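
A minimal sketch of this pattern with Promise.all follows. enrichTranscription and the four extractor functions are hypothetical placeholders for the AI calls, which are not shown here.

```javascript
// Hypothetical sketch: the four extractor functions are placeholders for AI calls.
async function enrichTranscription(text, extractors) {
  // All four calls start immediately; total time is roughly that of the slowest one.
  const [title, tags, categoryTags, projectTags] = await Promise.all([
    extractors.generateTitle(text),
    extractors.extractTags(text),
    extractors.matchCategoryTags(text),
    extractors.matchProjectTags(text),
  ]);
  return { title, tags, categoryTags, projectTags };
}
```

Note that Promise.all rejects as soon as any one call fails, so a real implementation would need to decide whether a failed tag extraction should abort the whole page.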
