[FirebaseAI] Add Multimodal Analysis demos #1750
base: peterfriese/firebase-ai-quickstart-refresh
Conversation
Summary of Changes
Hello @YoungHypo, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the Firebase AI example application by integrating robust multimodal analysis features. It introduces the ability to handle various media types like images, videos, audio, and PDFs, alongside flexible file upload options from local sources and URLs. A new attachment preview card improves user interaction, and a dedicated grounding example module demonstrates how AI responses can be anchored to external information. The changes also involve substantial refactoring and cleanup of the codebase, including the removal of obsolete components and the addition of a comprehensive 'all' entry on the home screen for better sample discoverability. These updates aim to provide a more versatile and user-friendly platform for showcasing Firebase AI's capabilities.
Highlights
- Enhanced Multimodal Analysis: This PR introduces comprehensive multimodal analysis capabilities, allowing the application to process and interact with various media types including images, videos, audio, and PDF documents. This significantly expands the AI model's understanding and response generation based on diverse inputs (see the sketch after this list).
- Flexible File Uploads: Users can now upload files from multiple sources: their local photo gallery, the device's file system, and directly from internet URLs. This provides great flexibility in how users provide input to the AI model.
- New Attachment Preview Card: A new attachment preview card has been implemented, offering a visual representation of the files attached to a message before they are sent. This improves the user experience by providing immediate feedback on selected media.
- Dedicated Grounding Example Module: An independent example module for 'grounding' has been added. Grounding allows the AI model to base its responses on specific, up-to-date information, often from external sources like Google Search, enhancing the accuracy and relevance of generated content.
- Codebase Cleanup and Refactoring: The project structure has been streamlined by removing obsolete files such as InputField.swift and MultimodalInput.swift, which were replaced by more generic and flexible input handling within the ConversationKit framework. This reduces technical debt and simplifies the codebase.
- Consolidated Home Screen Experience: A new 'all' pillar entry has been added to the home screen, providing a consolidated view of all available AI samples and demos. This improves navigation and discoverability for users exploring the application's capabilities.
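As a rough illustration of what these demos exercise, here is a minimal sketch of sending an image alongside a text prompt with the FirebaseAI Swift SDK. The backend selection mirrors the pattern used elsewhere in this PR; the model name, prompt, and helper function are illustrative assumptions, not taken from this PR:

```swift
import Foundation
import FirebaseAI

// Backend selection as used in this PR's ViewModels.
let service = FirebaseAI.firebaseAI(backend: .googleAI())
// Model name is an assumption for illustration only.
let model = service.generativeModel(modelName: "gemini-2.0-flash")

// Hypothetical helper: an InlineDataPart carries raw bytes plus a MIME
// type, and can be passed together with text in one generateContent call.
func describeImage(_ imageData: Data) async throws -> String? {
  let response = try await model.generateContent(
    InlineDataPart(data: imageData, mimeType: "image/jpeg"),
    "Describe this image."
  )
  return response.text
}
```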
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
Feature | Command | Description
---|---|---
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | /gemini help | Displays a list of available commands.
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution. ↩
Code Review
This is a substantial and well-executed pull request that significantly enhances the example app with multimodal capabilities. The introduction of MultimodalAttachment and the associated views and view models is a great addition. The code is generally clean and follows good practices. I have two pieces of feedback: one critical issue regarding Hashable conformance that could lead to bugs, and a minor typo in a filename. Additionally, to improve long-term maintainability, you might consider refactoring the various ViewModel classes (like ChatViewModel, MultimodalViewModel, GroundingViewModel) to inherit from a common base class, as they share a lot of boilerplate code.
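A minimal sketch of the suggested refactor, assuming the shared boilerplate is the backend selection and service handle (the class name and everything besides BackendType and FirebaseAI are illustrative):

```swift
import SwiftUI
import FirebaseAI

// Hypothetical common base class: each concrete ViewModel
// (ChatViewModel, MultimodalViewModel, GroundingViewModel)
// would inherit the backend selection instead of repeating it.
@MainActor
class BaseSampleViewModel: ObservableObject {
  let firebaseService: FirebaseAI

  init(backendType: BackendType) {
    // The same ternary suggested below, hoisted into one place.
    firebaseService = backendType == .googleAI
      ? FirebaseAI.firebaseAI(backend: .googleAI())
      : FirebaseAI.firebaseAI(backend: .vertexAI())
  }
}
```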
Code Review
This pull request significantly enhances the FirebaseAI example app by adding comprehensive multimodal analysis demos, including support for images, videos, audio, and PDFs. The changes are well-structured, introducing new screens, view models, and data models to support the new functionality, while also refactoring existing code. My review focuses on a few key areas: a critical correctness issue in Hashable conformance, and several opportunities to improve maintainability by reducing code duplication, enhancing error handling, and increasing code readability. Overall, this is a great addition to the project.
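For context on the Hashable concern, a hedged sketch (the fields are illustrative, not the PR's actual MultimodalAttachment): == and hash(into:) must be based on the same properties, otherwise Set and Dictionary lookups can behave inconsistently.

```swift
import Foundation

struct MultimodalAttachment: Hashable {
  let id: UUID          // illustrative fields
  let mimeType: String
  let data: Data

  // Equality and hashing must agree: compare and hash the same key.
  static func == (lhs: Self, rhs: Self) -> Bool {
    lhs.id == rhs.id
  }

  func hash(into hasher: inout Hasher) {
    hasher.combine(id)
  }
}
```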
```swift
if let inlineDataPart = chunk.inlineDataParts.first {
  if let uiImage = UIImage(data: inlineDataPart.data) {
    messages[messages.count - 1].image = uiImage
  } else {
    print("Failed to convert inline data to UIImage")
  }
}
```
New Updates: Move the initialization of FirebaseService into the ViewModels.
A new data member
Great work!
I left a few comments, nothing major.
```swift
}
.disableAttachments()
.onSendMessage { message in
  Task {
```
onSendMessage is now async. You can remove the Task { } here.
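A minimal before/after sketch of the suggested change, assuming ConversationKit's onSendMessage now accepts an async closure (viewModel.sendMessage and message.content are illustrative):

```swift
// Before: the closure wraps the async call in an explicit Task.
.onSendMessage { message in
  Task {
    await viewModel.sendMessage(message.content ?? "")
  }
}

// After: the closure itself is async, so the Task wrapper can go.
.onSendMessage { message in
  await viewModel.sendMessage(message.content ?? "")
}
```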
```swift
}
.attachmentPreview { attachmentPreviewScrollView }
.onSendMessage { message in
  Task {
```
onSendMessage is now async. You can remove the Task { } here.
BTW, this applies to all other ConversationView instances as well, including the ones not part of this PR.
```swift
public static func fromPhotosPickerItem(_ item: PhotosPickerItem) async -> MultimodalAttachment? {
  do {
    guard let data = try await item.loadTransferable(type: Data.self) else {
```
When trying to attach photos, I often get Failed to create attachment from PhotosPickerItem: [CoreTransferble] Given Transferable item does not support import - not sure if this is caused by live photos. Can you try to find a solution for this?
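One possible direction, offered as a hedged sketch rather than a confirmed fix: some assets fail to import as plain Data, and a custom Transferable scoped to the image content type can be more permissive. ImportedImage is a hypothetical helper:

```swift
import SwiftUI
import PhotosUI
import UniformTypeIdentifiers

// Hypothetical wrapper: importing via a DataRepresentation tied to
// UTType.image sometimes succeeds where Data.self does not.
struct ImportedImage: Transferable {
  let data: Data

  static var transferRepresentation: some TransferRepresentation {
    DataRepresentation(importedContentType: .image) { data in
      ImportedImage(data: data)
    }
  }
}

// Usage sketch inside fromPhotosPickerItem:
// if let imported = try? await item.loadTransferable(type: ImportedImage.self) {
//   // build the MultimodalAttachment from imported.data
// }
```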
```swift
self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
```
I think a ternary might be more compact:

```swift
let firebaseService = backendType == .googleAI
  ? FirebaseAI.firebaseAI(backend: .googleAI())
  : FirebaseAI.firebaseAI(backend: .vertexAI())
```
```swift
self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
```
Same here:

```swift
let firebaseService = backendType == .googleAI
  ? FirebaseAI.firebaseAI(backend: .googleAI())
  : FirebaseAI.firebaseAI(backend: .vertexAI())
```
```swift
self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
```
```swift
let firebaseService = backendType == .googleAI
  ? FirebaseAI.firebaseAI(backend: .googleAI())
  : FirebaseAI.firebaseAI(backend: .vertexAI())
```
```swift
self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
```
```swift
let firebaseService = backendType == .googleAI
  ? FirebaseAI.firebaseAI(backend: .googleAI())
  : FirebaseAI.firebaseAI(backend: .vertexAI())
```
PR description
- Adds an all pillar entry to the home screen.

Related issue
#1729

Demos

Notes: The app functionality is now aligned with quickstart-android.