[FirebaseAI] Add Multimodal Analysis demos #1750
base: peterfriese/firebase-ai-quickstart-refresh
Conversation
Summary of Changes
Hello @YoungHypo, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the Firebase AI example application by integrating robust multimodal analysis features. It introduces the ability to handle various media types like images, videos, audio, and PDFs, alongside flexible file upload options from local sources and URLs. A new attachment preview card improves user interaction, and a dedicated grounding example module demonstrates how AI responses can be anchored to external information. The changes also involve substantial refactoring and cleanup of the codebase, including the removal of obsolete components and the addition of a comprehensive 'all' entry on the home screen for better sample discoverability. These updates aim to provide a more versatile and user-friendly platform for showcasing Firebase AI's capabilities.
Highlights
- Enhanced Multimodal Analysis: This PR introduces comprehensive multimodal analysis capabilities, allowing the application to process and interact with various media types including images, videos, audio, and PDF documents. This significantly expands the AI model's understanding and response generation based on diverse inputs (see the sketch after this list).
- Flexible File Uploads: Users can now upload files from multiple sources: their local photo gallery, the device's file system, and directly from internet URLs. This provides great flexibility in how users provide input to the AI model.
- New Attachment Preview Card: A new attachment preview card has been implemented, offering a visual representation of the files attached to a message before they are sent. This improves the user experience by providing immediate feedback on selected media.
- Dedicated Grounding Example Module: An independent example module for 'grounding' has been added. Grounding allows the AI model to base its responses on specific, up-to-date information, often from external sources like Google Search, enhancing the accuracy and relevance of generated content.
- Codebase Cleanup and Refactoring: The project structure has been streamlined by removing obsolete files such as InputField.swift and MultimodalInput.swift, which were replaced by more generic and flexible input handling within the ConversationKit framework. This reduces technical debt and simplifies the codebase.
- Consolidated Home Screen Experience: A new 'all' pillar entry has been added to the home screen, providing a consolidated view of all available AI samples and demos. This improves navigation and discoverability for users exploring the application's capabilities.
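As a rough illustration of what these demos exercise, here is a minimal sketch of sending an image alongside a text prompt with the FirebaseAI Swift SDK. The backend selection mirrors the pattern used elsewhere in this PR; the model name, prompt, and helper function are illustrative assumptions, not taken from this PR:

```swift
import Foundation
import FirebaseAI

// Backend selection as used in this PR's ViewModels.
let service = FirebaseAI.firebaseAI(backend: .googleAI())
// Model name is an assumption for illustration only.
let model = service.generativeModel(modelName: "gemini-2.0-flash")

// Hypothetical helper: an InlineDataPart carries raw bytes plus a MIME
// type, and can be passed together with text in one generateContent call.
func describeImage(_ imageData: Data) async throws -> String? {
  let response = try await model.generateContent(
    InlineDataPart(data: imageData, mimeType: "image/jpeg"),
    "Describe this image."
  )
  return response.text
}
```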
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
Feature | Command | Description
---|---|---
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | /gemini help | Displays a list of available commands.
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution. ↩
Code Review
This is a substantial and well-executed pull request that significantly enhances the example app with multimodal capabilities. The introduction of MultimodalAttachment and the associated views and view models is a great addition. The code is generally clean and follows good practices. I have two pieces of feedback: one critical issue regarding Hashable conformance that could lead to bugs, and a minor typo in a filename. Additionally, to improve long-term maintainability, you might consider refactoring the various ViewModel classes (like ChatViewModel, MultimodalViewModel, GroundingViewModel) to inherit from a common base class, as they share a lot of boilerplate code.
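A minimal sketch of the suggested refactor, assuming the shared boilerplate is the backend selection and service handle (the class name and everything besides BackendType and FirebaseAI are illustrative):

```swift
import SwiftUI
import FirebaseAI

// Hypothetical common base class: each concrete ViewModel
// (ChatViewModel, MultimodalViewModel, GroundingViewModel)
// would inherit the backend selection instead of repeating it.
@MainActor
class BaseSampleViewModel: ObservableObject {
  let firebaseService: FirebaseAI

  init(backendType: BackendType) {
    // The same ternary suggested below, hoisted into one place.
    firebaseService = backendType == .googleAI
      ? FirebaseAI.firebaseAI(backend: .googleAI())
      : FirebaseAI.firebaseAI(backend: .vertexAI())
  }
}
```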
Code Review
This pull request significantly enhances the FirebaseAI example app by adding comprehensive multimodal analysis demos, including support for images, videos, audio, and PDFs. The changes are well-structured, introducing new screens, view models, and data models to support the new functionality, while also refactoring existing code. My review focuses on a few key areas: a critical correctness issue in Hashable conformance, and several opportunities to improve maintainability by reducing code duplication, enhancing error handling, and increasing code readability. Overall, this is a great addition to the project.
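For context on the Hashable concern, a hedged sketch (the fields are illustrative, not the PR's actual MultimodalAttachment): == and hash(into:) must be based on the same properties, otherwise Set and Dictionary lookups can behave inconsistently.

```swift
import Foundation

struct MultimodalAttachment: Hashable {
  let id: UUID          // illustrative fields
  let mimeType: String
  let data: Data

  // Equality and hashing must agree: compare and hash the same key.
  static func == (lhs: Self, rhs: Self) -> Bool {
    lhs.id == rhs.id
  }

  func hash(into hasher: inout Hasher) {
    hasher.combine(id)
  }
}
```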
```swift
if let inlineDataPart = chunk.inlineDataParts.first {
  if let uiImage = UIImage(data: inlineDataPart.data) {
    messages[messages.count - 1].image = uiImage
  } else {
    print("Failed to convert inline data to UIImage")
  }
}
```
New Updates: Move the initialization of FirebaseService into the ViewModels.
A new data member
Great work!
I left a few comments, nothing major.
```swift
}
.disableAttachments()
.onSendMessage { message in
  Task {
```
onSendMessage is now async. You can remove the Task { } here.
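A minimal before/after sketch of the suggested change, assuming ConversationKit's onSendMessage now accepts an async closure (viewModel.sendMessage and message.content are illustrative):

```swift
// Before: the closure wraps the async call in an explicit Task.
.onSendMessage { message in
  Task {
    await viewModel.sendMessage(message.content ?? "")
  }
}

// After: the closure itself is async, so the Task wrapper can go.
.onSendMessage { message in
  await viewModel.sendMessage(message.content ?? "")
}
```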
```swift
}
.attachmentPreview { attachmentPreviewScrollView }
.onSendMessage { message in
  Task {
```
onSendMessage is now async. You can remove the Task { } here.
BTW, this applies to all other ConversationView instances as well, including the ones not part of this PR.
```swift
public static func fromPhotosPickerItem(_ item: PhotosPickerItem) async -> MultimodalAttachment? {
  do {
    guard let data = try await item.loadTransferable(type: Data.self) else {
```
When trying to attach photos, I often get Failed to create attachment from PhotosPickerItem: [CoreTransferble] Given Transferable item does not support import - not sure if this is caused by live photos. Can you try to find a solution for this?
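One possible direction, offered as a hedged sketch rather than a confirmed fix: some assets fail to import as plain Data, and a custom Transferable scoped to the image content type can be more permissive. ImportedImage is a hypothetical helper:

```swift
import SwiftUI
import PhotosUI
import UniformTypeIdentifiers

// Hypothetical wrapper: importing via a DataRepresentation tied to
// UTType.image sometimes succeeds where Data.self does not.
struct ImportedImage: Transferable {
  let data: Data

  static var transferRepresentation: some TransferRepresentation {
    DataRepresentation(importedContentType: .image) { data in
      ImportedImage(data: data)
    }
  }
}

// Usage sketch inside fromPhotosPickerItem:
// if let imported = try? await item.loadTransferable(type: ImportedImage.self) {
//   // build the MultimodalAttachment from imported.data
// }
```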
```swift
self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
```
I think a ternary might be more compact:

```swift
let firebaseService = backendType == .googleAI
  ? FirebaseAI.firebaseAI(backend: .googleAI())
  : FirebaseAI.firebaseAI(backend: .vertexAI())
```
```swift
self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
```
Same here:

```swift
let firebaseService = backendType == .googleAI
  ? FirebaseAI.firebaseAI(backend: .googleAI())
  : FirebaseAI.firebaseAI(backend: .vertexAI())
```
```swift
self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
```
```swift
let firebaseService = backendType == .googleAI
  ? FirebaseAI.firebaseAI(backend: .googleAI())
  : FirebaseAI.firebaseAI(backend: .vertexAI())
```
```swift
self.sample = sample
self.backendType = backendType

let firebaseService: FirebaseAI
```
```swift
let firebaseService = backendType == .googleAI
  ? FirebaseAI.firebaseAI(backend: .googleAI())
  : FirebaseAI.firebaseAI(backend: .vertexAI())
```
PR description
- Adds an all pillar entry to the home screen.

Related issue
#1729

Demos

Notes: The app functionality is now aligned with quickstart-android.