[CLIP] Captioning Pipeline #1145

dsikka · 2023-07-25T20:49:46Z

Note: there are currently no captioning models on SparseZoo.
We have an open issue with open_clip that tracks several of the problems raised with CoCa models.

Summary

Implement the CLIPCaptioning and CLIPDecoder pipelines, which produce a caption for a given image. They build on the CLIPVisual and CLIPText pipelines previously implemented for zero-shot classification, with some modifications to make those pipelines more generic.
The captioning pipeline adds a _generate function, adapted from open_clip, that applies beam search to build the caption: https://github.com/neuralmagic/open_clip/blob/onnx-edit/src/open_clip/coca_model.py
One caveat: in open_clip's implementation the input sequence length is dynamic, whereas this pipeline uses padded (fixed-length) sequences.
All of the exported ONNX models originally come from open_clip.
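The `_generate` function itself is adapted from open_clip and is not reproduced here. As a rough illustration of how beam search interacts with padded, fixed-length inputs, here is a minimal pure-Python sketch under stated assumptions: the token ids, `SEQ_LEN`, the `Decoder` callable type, and the toy bigram decoder are all hypothetical, and the real CoCa decoder is additionally conditioned on the image embeddings from the visual model, which this sketch omits. The point it shows is that with right-padded inputs, the next-token scores have to be read at the position of the last real token rather than at the end of the fixed-length output.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical token ids and sizes, for illustration only.
PAD_ID, SOT_ID, EOT_ID = 0, 1, 2
VOCAB_SIZE = 5
SEQ_LEN = 8  # fixed (padded) decoder input length

# Hypothetical decoder interface: one padded sequence of SEQ_LEN ids in,
# SEQ_LEN rows of next-token log-probabilities out (row i scores the token
# that follows position i).
Decoder = Callable[[List[int]], List[List[float]]]


def pad(tokens: Sequence[int], seq_len: int = SEQ_LEN) -> List[int]:
    """Right-pad a token sequence with PAD_ID up to a fixed length."""
    if len(tokens) > seq_len:
        raise ValueError("sequence is longer than the padded length")
    return list(tokens) + [PAD_ID] * (seq_len - len(tokens))


def beam_search(decoder: Decoder, beam_width: int = 2, max_len: int = SEQ_LEN) -> List[int]:
    """Return the highest-scoring token sequence found by beam search."""
    if max_len > SEQ_LEN:
        raise ValueError("max_len cannot exceed the padded length")
    beams: List[Tuple[float, List[int]]] = [(0.0, [SOT_ID])]
    finished: List[Tuple[float, List[int]]] = []
    while beams:
        candidates = []
        for score, tokens in beams:
            rows = decoder(pad(tokens))
            # The input is padded, so take the scores at the last real token,
            # not at the last index of the fixed-length output.
            for tok, logp in enumerate(rows[len(tokens) - 1]):
                candidates.append((score + logp, tokens + [tok]))
        # Keep only the beam_width best extensions across all beams.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == EOT_ID or len(tokens) >= max_len:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
    return max(finished, key=lambda c: c[0])[1]


# Toy bigram "decoder": the next-token distribution depends only on the token
# at each position. Ids 3 and 4 stand in for the words "a" and "cup".
BIGRAM: Dict[int, Dict[int, float]] = {
    SOT_ID: {3: 0.6, 4: 0.4},
    3: {4: 0.9, EOT_ID: 0.1},
    4: {EOT_ID: 0.8, 3: 0.2},
}


def toy_decoder(padded: List[int]) -> List[List[float]]:
    rows = []
    for tok in padded:
        probs = BIGRAM.get(tok, {EOT_ID: 1.0})  # pad/EOT rows: always end
        rows.append([math.log(probs[v]) if v in probs else -math.inf
                     for v in range(VOCAB_SIZE)])
    return rows


print(beam_search(toy_decoder))  # [1, 3, 4, 2]
```

This sketch ranks hypotheses by summed log-probability with no length normalization, so it leans toward shorter captions; open_clip's implementation may handle such details differently.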

Testing

Added tests to the existing CLIP tests.
Also ran the following script to generate captions for various images:

```python
from deepsparse import BasePipeline, Pipeline
from deepsparse.clip import CLIPCaptionInput, CLIPCaptionPipeline, CLIPVisualInput

# Directory holding the three exported ONNX models
root = "caption_models"
model_path_visual = f"{root}/clip_visual.onnx"
model_path_text = f"{root}/clip_text.onnx"
model_path_decoder = f"{root}/clip_text_decoder.onnx"

# Visual encoder, text encoder, and text decoder used by the captioning pipeline
kwargs = {
    "visual_model_path": model_path_visual,
    "text_model_path": model_path_text,
    "decoder_model_path": model_path_decoder,
}
pipeline = BasePipeline.create(task="clip_caption", **kwargs)

# Caption a single image
pipeline_input = CLIPCaptionInput(image=CLIPVisualInput(images="mountain.jpg"))
output = pipeline(pipeline_input)
print(output)
```

Example captions generated for three test images:

Caption: a view of mountains in the background .

Caption: an adult elephant and a baby elephant .

Caption: a cup of coffee .

The base branch was changed.

dsikka changed the base branch from main to clip_zshot July 25, 2023 20:49

dsikka marked this pull request as ready for review July 27, 2023 20:59

dsikka force-pushed the clip_zshot branch from f3508da to 06e958e July 31, 2023 20:18

dsikka force-pushed the captioning branch 4 times, most recently from 743bd95 to 402bfc6 July 31, 2023 23:54

dsikka requested review from bfineran and dbogunowicz August 1, 2023 00:53

dsikka assigned rahul-tuli Aug 1, 2023

dsikka requested a review from Satrat August 1, 2023 00:53

dsikka assigned dsikka and unassigned rahul-tuli Aug 1, 2023

dsikka requested a review from rahul-tuli August 1, 2023 00:54

dsikka force-pushed the captioning branch from da61183 to dba5711 August 1, 2023 00:56

dbogunowicz previously approved these changes Aug 1, 2023

dsikka force-pushed the clip_zshot branch from 06e958e to ca54b1d August 1, 2023 14:10

dsikka force-pushed the captioning branch from dba5711 to b0695f6 August 1, 2023 14:53

bfineran previously approved these changes Aug 1, 2023

Base automatically changed from clip_zshot to main August 2, 2023 18:03

dsikka added 8 commits August 2, 2023 22:53

initial refactor (9fa4197)
move BasePipeline to a new file (073cf38)
test fix (128f7eb)
anothe test fix (f5826a4)
fix import (da81c7d)
revert (a370e02)
initial refactor (04e01f5)
add tests for BasePipeline (1c0a086)

dsikka added 20 commits August 2, 2023 23:34

initial refactor (cd35c2b)
move BasePipeline to a new file (2624c41)
initial refactor (bebe206)
rebase fix (9a81e32)
move paths to fixtures (921818e)
initial refactor (836b157)
initial caption functionality (a7f1e30)
debugging (11e4c0e)
more debugging (10c7835)
post debugging code (7d1c5ca)
fix imports (0f9ebcc)
cleanup post model fix (9b147fc)
fix variable names, some clean-up (699dafc)
remove image embs loading (55cf8c6)
update dimensions (83c9570)
rebase (ce670a9)
remove extra param (6c8cd4d)
remove typo (dd1d6b2)
update README instructions; fix linalg import (04be990)
clean-up pipelines, updatetyping and descriptions (74a6e5b)

dsikka force-pushed the captioning branch from 734c609 to 5114009 August 2, 2023 23:49

rebase fix (d583fde)

dsikka force-pushed the captioning branch from 5114009 to d583fde August 2, 2023 23:52

dsikka requested review from bfineran and dbogunowicz August 2, 2023 23:52

expose pipeline engine args (8fb97ad)

bfineran approved these changes Aug 3, 2023

dbogunowicz approved these changes Aug 7, 2023

dsikka merged commit ffeb98f into main Aug 7, 2023

dsikka deleted the captioning branch August 7, 2023 15:13