
Conversation

YoungHypo (Contributor)

PR description

  • Added a demo for multimodal analysis, including image, video, audio, and PDF support.
  • Enabled file uploads from local gallery, file system, and internet URLs.
  • Introduced an attachment preview card.
  • Added an independent example module for grounding.
  • Removed obsolete files: InputField.swift and MultimodalInput.swift.
  • Added an "all" pillar entry to the home screen.

Related issue

#1729.

Demos

Note: the app's functionality is now aligned with quickstart-android.

[Four simulator screenshots]

@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @YoungHypo, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Firebase AI example application by integrating robust multimodal analysis features. It introduces the ability to handle various media types like images, videos, audio, and PDFs, alongside flexible file upload options from local sources and URLs. A new attachment preview card improves user interaction, and a dedicated grounding example module demonstrates how AI responses can be anchored to external information. The changes also involve substantial refactoring and cleanup of the codebase, including the removal of obsolete components and the addition of a comprehensive 'all' entry on the home screen for better sample discoverability. These updates aim to provide a more versatile and user-friendly platform for showcasing Firebase AI's capabilities.

Highlights

  • Enhanced Multimodal Analysis: This PR introduces comprehensive multimodal analysis capabilities, allowing the application to process and interact with various media types including images, videos, audio, and PDF documents. This significantly expands the AI model's understanding and response generation based on diverse inputs.
  • Flexible File Uploads: Users can now upload files from multiple sources: their local photo gallery, the device's file system, and directly from internet URLs. This provides great flexibility in how users provide input to the AI model.
  • New Attachment Preview Card: A new attachment preview card has been implemented, offering a visual representation of the files attached to a message before they are sent. This improves the user experience by providing immediate feedback on selected media.
  • Dedicated Grounding Example Module: An independent example module for 'grounding' has been added. Grounding allows the AI model to base its responses on specific, up-to-date information, often from external sources like Google Search, enhancing the accuracy and relevance of generated content.
  • Codebase Cleanup and Refactoring: The project structure has been streamlined by removing obsolete files such as InputField.swift and MultimodalInput.swift, which were replaced by more generic and flexible input handling within the ConversationKit framework. This reduces technical debt and simplifies the codebase.
  • Consolidated Home Screen Experience: A new 'all' pillar entry has been added to the home screen, providing a consolidated view of all available AI samples and demos. This improves navigation and discoverability for users exploring the application's capabilities.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist (bot) left a comment

Code Review

This is a substantial and well-executed pull request that significantly enhances the example app with multimodal capabilities. The introduction of MultimodalAttachment and the associated views and view models is a great addition. The code is generally clean and follows good practices. I have two pieces of feedback: one critical issue regarding Hashable conformance that could lead to bugs, and a minor typo in a filename. Additionally, to improve long-term maintainability, you might consider refactoring the various ViewModel classes (like ChatViewModel, MultimodalViewModel, GroundingViewModel) to inherit from a common base class, as they share a lot of boilerplate code.
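As a sketch of that suggestion, the shared setup could live in a small base class. This is a minimal sketch only, assuming a `ChatMessage` model type and a `generativeModel(modelName:)` call like the ones used elsewhere in the sample app; the names are illustrative, not the PR's actual code:

```swift
import Combine
import FirebaseAI

// Hypothetical base class for the suggested refactor; `ChatMessage` and the
// published properties are assumptions modeled on typical chat view models.
@MainActor
class BaseConversationViewModel: ObservableObject {
  @Published var messages = [ChatMessage]()
  @Published var inProgress = false

  let model: GenerativeModel

  init(firebaseService: FirebaseAI, modelName: String = "gemini-2.0-flash") {
    // Each concrete view model previously duplicated this setup.
    model = firebaseService.generativeModel(modelName: modelName)
  }
}
```

ChatViewModel, MultimodalViewModel, and GroundingViewModel would then keep only their sample-specific sending logic.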

@gemini-code-assist (bot) left a comment

Code Review

This pull request significantly enhances the FirebaseAI example app by adding comprehensive multimodal analysis demos, including support for images, videos, audio, and PDFs. The changes are well-structured, introducing new screens, view models, and data models to support the new functionality, while also refactoring existing code. My review focuses on a few key areas: a critical correctness issue in Hashable conformance, and several opportunities to improve maintainability by reducing code duplication, enhancing error handling, and increasing code readability. Overall, this is a great addition to the project.

Comment on lines +112 to 118
if let inlineDataPart = chunk.inlineDataParts.first {
  if let uiImage = UIImage(data: inlineDataPart.data) {
    messages[messages.count - 1].image = uiImage
  } else {
    print("Failed to convert inline data to UIImage")
  }
}


Severity: medium

This block of code for handling an inlineDataPart is duplicated in internalSendMessage on lines 159-165. To improve maintainability and reduce redundancy, consider extracting this logic into a private helper method. This would make the code cleaner and easier to manage.
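A minimal sketch of that extraction, assuming the view model receives streamed GenerateContentResponse chunks and owns the messages array (the helper name is hypothetical):

```swift
// Hypothetical helper; both sendMessage and internalSendMessage would call it.
private func attachInlineImage(from chunk: GenerateContentResponse) {
  guard let inlineDataPart = chunk.inlineDataParts.first else { return }
  if let uiImage = UIImage(data: inlineDataPart.data) {
    messages[messages.count - 1].image = uiImage
  } else {
    print("Failed to convert inline data to UIImage")
  }
}
```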

@YoungHypo (Contributor, Author) commented Aug 18, 2025

New Updates

Moved the initialization of FirebaseService into the ViewModels

FirebaseAIExample/ContentView only needs to pass the enum values for Vertex AI and the Gemini API, while each ViewModel file contains the full initialization flow. This eliminates the need to jump across multiple files to understand how the service is created, making each functional module easier to grasp and simpler to reuse.

A new fileDataParts property was added for Cloud Storage file URLs

In sample.swift, strictly speaking, this is intended only for Vertex AI. However, since the URLs are public and include file-type suffixes, the Gemini API can also handle them by downloading the files with Data(contentsOf:). Both AI backends can therefore handle the Cloud Storage URLs, just in different ways, as sketched below. Ideally, an alert should prevent users from navigating directly and require them to switch backends manually. For now, this change needs further discussion, as it would mean refactoring the ContentView UI from NavigationLink to navigationDestination, and the PR already involves too many changes.
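A sketch of the two code paths described above; the (uri, mimeType) shape of fileDataParts and the BackendType enum are assumptions based on this PR's description, not its exact types:

```swift
import FirebaseAI
import Foundation

enum BackendType { case googleAI, vertexAI } // assumed shape of the enum

// Illustrative only: turns public Cloud Storage URLs into request parts.
func parts(for fileDataParts: [(uri: String, mimeType: String)],
           backendType: BackendType) throws -> [any Part] {
  switch backendType {
  case .vertexAI:
    // Vertex AI accepts Cloud Storage file references directly.
    return fileDataParts.map { FileDataPart(uri: $0.uri, mimeType: $0.mimeType) }
  case .googleAI:
    // The Gemini API can't reference Cloud Storage here, but since the URLs
    // are public, the bytes can be downloaded and sent inline instead.
    return try fileDataParts.map { part in
      guard let url = URL(string: part.uri) else { throw URLError(.badURL) }
      return try InlineDataPart(data: Data(contentsOf: url), mimeType: part.mimeType)
    }
  }
}
```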

Kept only the Assets in the FirebaseAIExample folder and removed all other Assets

Although it looks like many files were changed, about 20 files were just deleted or relocated.

@peterfriese (Contributor) left a comment

Great work!

I left a few comments, nothing major.

}
.disableAttachments()
.onSendMessage { message in
  Task {

onSendMessage is now async. You can remove the Task { } here.
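For illustration, a before/after sketch, with a hypothetical sendMessage signature standing in for the real view model call:

```swift
// Before: the closure was synchronous, so async work needed a Task.
.onSendMessage { message in
  Task {
    await viewModel.sendMessage(message.content, streaming: true)
  }
}

// After: with an async closure, the call can be awaited directly.
.onSendMessage { message in
  await viewModel.sendMessage(message.content, streaming: true)
}
```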

}
.attachmentPreview { attachmentPreviewScrollView }
.onSendMessage { message in
  Task {

onSendMessage is now async. You can remove the Task { } here.


BTW, this applies to all other ConversationView instances as well, including the ones that aren't part of this PR.


public static func fromPhotosPickerItem(_ item: PhotosPickerItem) async -> MultimodalAttachment? {
  do {
    guard let data = try await item.loadTransferable(type: Data.self) else {

When trying to attach photos, I often get `Failed to create attachment from PhotosPickerItem: [CoreTransferble] Given Transferable item does not support import` - not sure if this is caused by live photos.

Can you try to find a solution for this?
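One possible direction (an assumption, not a verified fix): fall back to a custom Transferable that imports any image-typed content as raw data, which is often more permissive than loading Data directly:

```swift
import PhotosUI
import SwiftUI
import UniformTypeIdentifiers

// Hypothetical fallback: accepts anything the system can export as an image.
struct ImageDataTransferable: Transferable {
  let data: Data

  static var transferRepresentation: some TransferRepresentation {
    DataRepresentation(importedContentType: .image) { data in
      ImageDataTransferable(data: data)
    }
  }
}

// Sketch of the fallback chain inside fromPhotosPickerItem.
func loadAttachmentData(from item: PhotosPickerItem) async throws -> Data? {
  if let data = try await item.loadTransferable(type: Data.self) {
    return data
  }
  return try await item.loadTransferable(type: ImageDataTransferable.self)?.data
}
```

Live photos may still need dedicated handling; logging the failing item's supportedContentTypes could help confirm that suspicion.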

self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI

I think a ternary might be more compact:

    let firebaseService = backendType == .googleAI
      ? FirebaseAI.firebaseAI(backend: .googleAI())
      : FirebaseAI.firebaseAI(backend: .vertexAI())

self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI

Same here:

    let firebaseService = backendType == .googleAI
      ? FirebaseAI.firebaseAI(backend: .googleAI())
      : FirebaseAI.firebaseAI(backend: .vertexAI())

self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI

    let firebaseService = backendType == .googleAI
      ? FirebaseAI.firebaseAI(backend: .googleAI())
      : FirebaseAI.firebaseAI(backend: .vertexAI())

self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI

    let firebaseService = backendType == .googleAI
      ? FirebaseAI.firebaseAI(backend: .googleAI())
      : FirebaseAI.firebaseAI(backend: .vertexAI())
