Ava is an AI-powered mock technical interviewer that helps job seekers practice coding and behavioral interviews by simulating real-world interview conditions and offering real-time feedback.
- Day 1: User Flow & Research
- Day 2: Database Design & Pseudocode
- Day 3: Feature: Authentication
- Day 4: Feature: Audio Transcription & PDF Parsing
- Day 5: Feature: AI Conversation
- Day 6: CI/CD & Deployment
- Day 7: Documentation
Interview prep is one of the most important parts of the job application process. Preparing alone can feel isolating, and finding the right accountability partner is often challenging. Ava is a mock AI interviewer designed to simulate real interviews and provide users with a structured environment to think out loud and receive instant feedback.
In Coding Mode, users follow the UMPIRE framework:
- Understand – Clarify the problem using examples and questions.
- Match – Identify the problem category and known strategies.
- Plan – Use visualizations and write pseudocode.
- Implement – Write the code in the sandbox.
- Review – Walk through the code with test cases.
- Evaluate – Analyze time/space complexity and tradeoffs.
- The user selects a topic (e.g., Trees, DP)
- A random question is displayed
- The user starts the timer
- The user records an answer for each UMPIRE step
- The system transcribes the answer and gives AI feedback
- Final feedback is provided based on overall performance
- The user uploads a resume (PDF)
- The system parses it to create context
- The AI generates personalized questions
- The user records responses
- The AI transcribes them and asks up to 2 follow-up questions
- Final AI feedback is provided
Feature | Priority | Reason |
---|---|---|
Audio Recording & Transcription | 🔥🔥🔥 | Core to simulating interviews; the key focus is letting users practice thinking out loud in a structured way |
PDF Parsing | 🔥🔥🔥 | Enables resume-based behavioral questions; the system needs context to generate a list of appropriate questions |
AI Conversation | 🔥🔥🔥 | Enables a dynamic mock interview with two-way interaction |
Code Editor | 🔥🔥🔥 | Provides a code-editor-like space to type out the implementation |
Follow-up Questions | 🔥🔥 | Adds realism and depth |
Analytics | 🔥🔥 | Tracks user progress |
Text-to-Speech | 🔥 | The AI interviewer speaks like a human instead of just returning text |
Timer | 🔥 | Simulates real interview pressure |
Authentication | 🔥 | Required for account setup |
Feature | Technology | Notes |
---|---|---|
STT - Audio Recording & Transcription | MediaStream Recording API, OpenAI Whisper API | The MediaStream Recording API records audio in the browser; the Whisper API transcribes the audio into text. |
PDF Parsing | PDF.js, Gemini API (with built-in PDF parsing), PyMuPDF | All of these can read and parse a PDF into text ready for the AI's context, but using the Gemini API counts toward the number of tokens used |
AI Conversation | Gemini API, Claude API | Enables a dynamic mock interview |
Code Editor | CodeMirror | Provides a code-editor-like space to type out the implementation |
Follow-up Questions | Gemini API, Claude API | Adds realism and depth |
Analytics | Rechart.js | Tracks user progress by providing some data visualizations |
TTS - Text-to-Speech | ElevenLabs API | The AI interviewer speaks like a human instead of just returning text |
Authentication | Supabase | Required for account setup |
For the first version of this app, I had to make a decision between complexity and storage. The question that got me thinking: do I really need to store the users' audio files? My reasoning is that:
- Saved audio files are not of significant importance; what matters is that the user receives an assessment of their performance after a mock interview session, along with suggestions on how to improve next time. One might argue that a user may want to review their own audio later or re-run the analysis, but for someone whose main focus is practicing thinking out loud in a structured way, reviewing audio is far less important than having a functional space to get in as much practice as possible.
- Saved audio files cost more storage, while the value per unit of storage is low, unless the goal is to use the data to train models, which is not the case here.
As a result, my final database schema design is shown below, which I believe strikes a balance between simplicity and functionality with the following tables:
Table | Description |
---|---|
User | Stores basic user personal information |
Question | A question bank that stores all questions (both coding and behavioral). The type field indicates whether the question is behavioral, tree, dynamic programming, etc. |
Resume | A simple table to store a user's resume data |
Session | Stores a complete mock interview session for a user that includes the initial context (coding question description or parsed resume data) |
Message | Stores the conversation between the user and the AI system |
Feedback | Stores the feedback for the session as a whole |
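
To make the schema more concrete, here is a minimal sketch of the Session and Message tables written as SQLModel classes. This is only an assumption on my part about how the models could look; the field names are illustrative, not the final schema.

```python
# Illustrative sketch (assumption): Session and Message as SQLModel tables.
from datetime import datetime
from typing import Optional

from sqlmodel import Field, SQLModel


class Session(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    user_id: int                     # references the User table
    mode: str                        # "coding" or "behavioral"
    context: str                     # question description or parsed resume text
    created_at: datetime = Field(default_factory=datetime.utcnow)


class Message(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    session_id: int = Field(foreign_key="session.id")
    role: str                        # "user" or "assistant"
    content: str                     # transcribed answer or AI response
    created_at: datetime = Field(default_factory=datetime.utcnow)
```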

Layer | Tech | Reason |
---|---|---|
Backend | FastAPI | Lightweight, async-friendly, production-ready. I had extensive experience with Flask before, so I wanted to try something similar. |
Frontend | Next.js | Hybrid SSR/SPA; pairs well with Vercel and Supabase Auth out of the box. |
Auth | Supabase Auth | Simple to set up, good support for email/password and OAuth; tightly integrates with Next.js. |
Database | Supabase DB (PostgreSQL) | Simplifies the process of setting up the database |
Deployment | Railway (backend), Vercel (frontend) | Simple, scalable, CI/CD-friendly. |
Action | Method | Route | Description |
---|---|---|---|
Create a new interview session | POST | `/sessions` | Starts a new session (e.g. after selecting a question or uploading a resume) |
Get session details | GET | `/sessions/{session_id}` | Fetch a specific session’s metadata, messages, etc. |
Submit a user answer (audio) | POST | `/sessions/{session_id}/messages` | Adds a new message from the user (with audio blob) to the session |
Respond with system feedback | POST | `/sessions/{session_id}/messages` | Same route: system's follow-up is added as another message |
Get full conversation history | GET | `/sessions/{session_id}/messages` | Returns the list of user + system messages |
Generate final feedback for session | POST | `/sessions/{session_id}/feedback` | Generates and stores tone summary, speech rate, overall evaluation |
Get feedback for session | GET | `/sessions/{session_id}/feedback` | Retrieve saved feedback (for review or display) |
Upload & parse resume | POST | `/resumes` | Upload resume (PDF), parse it on server, store parsed data |
Get parsed resume for user | GET | `/resumes/user/{user_id}` | Retrieve a user’s uploaded resumes |
Generate questions based on resume | POST | `/questions/generated` | Dynamically generate questions from a parsed resume |
Create a new custom question | POST | `/questions` | Add a manual question (coding or behavioral) to the question bank |
Get all questions | GET | `/questions` | Retrieve question bank (optionally filtered by type or user) |
Get a specific question | GET | `/questions/{question_id}` | Fetch one question by ID |
Delete a question | DELETE | `/questions/{question_id}` | Remove a question (admin or owner only) |
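
To make this concrete, below is a rough sketch of how the two `/sessions` routes could look in FastAPI. The `SessionCreate`/`SessionRead` models and the in-memory store are placeholders for illustration, not the actual Supabase-backed implementation.

```python
# Hypothetical sketch of the /sessions endpoints from the table above.
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

router = APIRouter(prefix="/sessions", tags=["sessions"])


class SessionCreate(BaseModel):
    user_id: int
    mode: str      # "coding" or "behavioral"
    context: str   # question description or parsed resume text


class SessionRead(SessionCreate):
    id: int


_sessions: dict[int, SessionRead] = {}  # stand-in for the real database


@router.post("", response_model=SessionRead)
async def create_session(payload: SessionCreate) -> SessionRead:
    """Start a new session after a question is selected or a resume is uploaded."""
    session = SessionRead(id=len(_sessions) + 1, **payload.model_dump())
    _sessions[session.id] = session
    return session


@router.get("/{session_id}", response_model=SessionRead)
async def get_session(session_id: int) -> SessionRead:
    """Fetch a specific session's metadata."""
    if session_id not in _sessions:
        raise HTTPException(status_code=404, detail="Session not found")
    return _sessions[session_id]
```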
```
backend/
│
├── app/ # Main application package
│ ├── __init__.py
│ ├── main.py # Entry point for FastAPI app
│ │
│ ├── api/ # API route definitions
│ │ ├── __init__.py
│ │ ├── auth.py # Routes for login, signup
│ │ ├── session.py # Start session, submit answer, get response
│ │ ├── resume.py # Upload + parse resume
│ │ ├── feedback.py # Final feedback generation
│ │ └── question.py # Create, list questions
│ │
│ ├── models/ # Pydantic models for request/response validation
│ │ ├── __init__.py
│ │ ├── user.py
│ │ ├── session.py
│ │ ├── resume.py
│ │ ├── feedback.py
│ │ └── question.py
│ │
│ ├── services/ # Business logic, Gemini/Whisper wrappers
│ │ ├── __init__.py
│ │ ├── whisper.py # Audio transcription (OpenAI Whisper)
│ │ ├── gemini.py # Gemini response generation
│ │ ├── resume_parser.py # PDF parsing using PyMuPDF or LlamaIndex
│ │ └── feedback_generator.py # Feedback creation logic
│ │
│ ├── db/ # DB access and Supabase client
│ │ ├── __init__.py
│ │ ├── supabase.py # Supabase connection instance
│ │ └── crud.py # Abstractions for DB operations
│ │
│ ├── core/ # Settings, dependencies, utils
│ │ ├── config.py # Environment & app configs
│ │ ├── security.py # JWT, password hashing (bcrypt)
│ │ └── dependencies.py # Shared dependencies
│
├── requirements.txt # Python dependencies
├── .env # Environment variables
```
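
For reference, here is a minimal sketch of how `app/main.py` might wire these routers together, assuming each module under `app/api/` exposes an `APIRouter` named `router` (the CORS origin is a local-development assumption):

```python
# Hypothetical app/main.py wiring, following the folder layout above.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.api import auth, feedback, question, resume, session

app = FastAPI(title="Ava - AI Mock Interviewer")

# Allow the Next.js dev server to call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

app.include_router(auth.router)
app.include_router(session.router)
app.include_router(resume.router)
app.include_router(feedback.router)
app.include_router(question.router)
```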
The app has two primary interview modes: Coding and Behavioral.
```
/frontend
│
├── app/ # App router structure (Next.js 13+)
│ ├── page.jsx # Landing or redirect to dashboard
│ │
│ ├── auth/ # Supabase auth pages
│ │ ├── login/page.jsx
│ │ └── register/page.jsx
│ │
│ ├── dashboard/ # Entry point after login (choose mode)
│ │ ├── layout.jsx
│ │ └── page.jsx
│ │
│ ├── session/ # Dynamic session pages
│ │ └── [id]/page.jsx # Live session: chat, code, question flow, feedback
│
├── components/ # Reusable UI and logic components
│ ├── session/ # Interview-specific components
│ │ ├── ChatBox.jsx
│ │ ├── CodeEditor.jsx
│ │ ├── QuestionDisplay.jsx
│ │ ├── Feedback.jsx
│ │ └── CreateQuestionModal.jsx
│ │
│ ├── shared/ # Generic UI components
│ │ ├── AudioRecorder.jsx
│ │ ├── LoadingSpinner.jsx
│ │ └── ErrorMessage.jsx
│
├── hooks/ # Custom React hooks
│ ├── useSession.js # Handle session ID, mode, etc.
│ ├── useChat.js # Multi-turn conversation logic
│ └── useAudioRecorder.js # Audio recording logic
│
├── lib/ # Utility and service functions
│ ├── api.js # Calls FastAPI backend
│ └── supabaseClient.js # Supabase instance config
│
├── constants/ # Static data (question types, enums)
│ └── questionTypes.js
│
├── public/ # Static assets (logo, icons)
├── styles/ # Global CSS & Tailwind config
│ └── globals.css
│
├── .env.local # Supabase, backend URLs
├── tailwind.config.js
├── postcss.config.js
├── package.json
└── README.md
```
The structure below shows my progress so far. There are some modifications compared to what I planned yesterday. Because I'm new to FastAPI (I have more experience with Flask), I relied on the full-stack template provided by FastAPI to structure my folders, then made some modifications to match my needs and skill level.
Currently, when tested with Swagger UI, the registration, login, and logout endpoints work. This is also the first time I have used Pydantic for data validation. I also successfully connected the backend to the Postgres database using Supabase. For the next steps, I want to modify the config file to set up 3 different environments: `testing`, `development`, and `production`. I'm thinking of using a local database for the testing environment, one Supabase database for development, and one Supabase database for production.
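Here is a minimal sketch of how that environment switch could look with pydantic-settings; the variable names and the in-memory SQLite URL for testing are my assumptions:

```python
# Hypothetical app/core/config.py: pick the database based on ENVIRONMENT.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    ENVIRONMENT: str = "development"  # "testing" | "development" | "production"
    DATABASE_URL: str = ""            # Supabase Postgres URL for dev/prod
    SECRET_KEY: str = "change-me"

    @property
    def effective_database_url(self) -> str:
        # Tests run against a throwaway in-memory SQLite database
        if self.ENVIRONMENT == "testing":
            return "sqlite://"
        return self.DATABASE_URL


settings = Settings()
```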
I think I underestimated how long it takes to write code that I actually understand. It took me around 3 hours to write the code, read the code template to understand what's going on, and think about how best to structure the code so that it stays functional yet focused and simple. Here's what I've learned today:
- When defining Pydantic models, I need to define the input that the server expects to receive from the client and the output that the server sends back to the client (see the sketch after this list)
- Before running the app, we need a file to initialize/populate the database (create all tables), and this file should be separate instead of being put inside the main file to avoid the database being recreated every time at app startup.
- One more thing to take into consideration: if you use Supabase and have already defined the tables before coding, make sure not to recreate them when starting the app.
- I'm done with the part that sets up different environments for testing, dev, and prod:
  - For testing, I just use an in-memory database to run the tests using `pytest`
  - For development and production, I currently use one Supabase database, but I plan to create a separate one for production
  - To make things easier to run, I created a `scripts` folder to store different shell scripts for test, dev, and prod, and learned to make the scripts executable by using `chmod +x scripts/*.sh`
- I encountered a compatibility issue between `passlib` and `bcrypt`. To resolve the problem, I asked Claude and found out that the newer version of `bcrypt` is not compatible with `passlib`. One solution was to use `argon2`, which is more secure and compatible:

  ```python
  from passlib.context import CryptContext

  pwd_context = CryptContext(schemes=["argon2"], deprecated="auto")
  ```
- I installed `jwt` instead of `PyJWT` for JWT-based auth; the latter is the correct one:

  ```text
  return jwt.encode(payload, settings.SECRET_KEY, algorithm=ALGORITHM)
         ^^^^^^^^^^
  AttributeError: module 'jwt' has no attribute 'encode'
  ```
- I put the `tests` folder in the wrong location - the correct one should be at the root level of the `backend` folder, not in the `app` folder.
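
To illustrate the first point about separating what the server receives from what it sends back, here is a minimal Pydantic sketch; the user fields are illustrative assumptions, not the actual schemas:

```python
# Hypothetical request/response schemas: UserCreate is what the client sends,
# UserPublic is what the server returns (the password never goes back out).
from pydantic import BaseModel, EmailStr


class UserCreate(BaseModel):
    email: EmailStr
    password: str
    full_name: str | None = None


class UserPublic(BaseModel):
    id: int
    email: EmailStr
    full_name: str | None = None
```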
🎉 Authentication works!
AI.Technical.Interviewer.-.Auth.Feature.mp4
- Start by creating the backend logic for creating manual questions
- Create the frontend component for the create-question modal
- Then, continue to create the backend logic for parsing the resume and generating questions from the parsed data
- Create the frontend component for uploading a resume
- Create the frontend component for displaying generated questions/manual questions
- Then, create the backend logic for audio transcription
- Create the audio recorder component on the frontend
I'm done with the logic to parse the resume data. We can actually use PyMuPDF4LLM to parse the PDF text into Markdown; this is a library built specifically for parsing PDFs into input for different LLMs. Basically, the function takes the uploaded resume as raw bytes:
- `tempfile.NamedTemporaryFile(...)` creates a temporary file on disk that will be deleted after the `with` block; it has a `.pdf` suffix so pymupdf4llm recognizes it as a PDF.
- `tmp.write(file_bytes)` writes the raw bytes into the temporary PDF file.
- `tmp.flush()` ensures that all data is written from the buffer to disk, so it's ready for reading by other processes.
- `strip()` returns the cleaned-up (stripped of leading/trailing whitespace) Markdown string.

```python
import tempfile

import pymupdf4llm
from fastapi import HTTPException, status


def parse_resume_to_markdown(file_bytes: bytes) -> str:
    """Write PDF to a temp file and extract markdown using pymupdf4llm."""
    try:
        with tempfile.NamedTemporaryFile(delete=True, suffix=".pdf") as tmp:
            tmp.write(file_bytes)
            tmp.flush()
            # Parse the file into markdown!
            markdown = pymupdf4llm.to_markdown(tmp.name)
            if not markdown or not markdown.strip():
                raise HTTPException(
                    status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
                    detail="Could not extract text from PDF. The file may be corrupted or contain only images.",
                )
            return markdown.strip()
    except HTTPException:
        # Don't mask the 422 above as a generic 500
        raise
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Failed to parse PDF: {str(e)}",
        )
```
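
For context, a resume-upload route could call this function roughly as sketched below; the route body and response shape are assumptions for illustration, not the final endpoint:

```python
# Hypothetical /resumes endpoint showing how parse_resume_to_markdown is used.
from fastapi import APIRouter, UploadFile

router = APIRouter(prefix="/resumes", tags=["resumes"])


@router.post("")
async def upload_resume(file: UploadFile):
    file_bytes = await file.read()                   # raw PDF bytes from the client
    markdown = parse_resume_to_markdown(file_bytes)  # helper defined above
    # The parsed markdown would then be stored and used as interview context
    return {"filename": file.filename, "parsed_markdown": markdown}
```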
Wow, this one is much harder and more confusing than I thought. I used the open-source Whisper model for audio transcription and the Gemini API to respond to the user's answers.
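Roughly, the two pieces fit together as in the sketch below; the `base` Whisper model size, the `gemini-1.5-flash` model name, and the prompt wording are illustrative choices on my part, not the final implementation:

```python
# Hypothetical transcription + interviewer reply, using the open-source
# whisper package and the google-generativeai client.
import google.generativeai as genai
import whisper

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # normally loaded from settings

_whisper_model = whisper.load_model("base")     # small open-source model


def transcribe_audio(path: str) -> str:
    """Transcribe a recorded answer into plain text."""
    result = _whisper_model.transcribe(path)
    return result["text"].strip()


def interviewer_reply(question: str, answer_text: str) -> str:
    """Ask Gemini to react to the candidate's answer like an interviewer."""
    model = genai.GenerativeModel("gemini-1.5-flash")
    prompt = (
        "You are a mock interviewer. The question was:\n"
        f"{question}\n\nThe candidate answered:\n{answer_text}\n\n"
        "Give brief, constructive feedback and at most one follow-up question."
    )
    response = model.generate_content(prompt)
    return response.text
```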
- README with:
- Overview
- Feature List
- Installation & Setup
- API Reference
- Record a Loom demo