Fix: StringLookup returns torch native types for torch backend #21614

Ma-gi-cian · 2025-08-25T17:07:10Z

Implementations:
- Refactored the `call` method in `StringLookup` to provide consistent, PyTorch-native outputs when using the torch backend.
- Forward lookups (string-to-int) now always return a `torch.Tensor`.
- Inverse lookups (int-to-string) now always return a Python list, aligning behavior with `torchtext.vocab.lookup_tokens`.
- Added `test_torch_backend_compatibility` to validate the fix and prevent future regressions.

Here is a test file, alongside its output, to confirm that it works:

```python
import os
os.environ['KERAS_BACKEND'] = 'torch'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import numpy as np
import torch

import tensorflow as tf
from keras.src import backend
from keras.src import layers

def run_stringlookup_checks():
    print(f"Current backend is: {backend.backend()})\n")

    vocab = ["a", "b", "c"]
    oov_token = "[OOV]"

    print("1. Forward Lookup (strings to numbers)")
    forward_lookup_layer = layers.StringLookup(
        vocabulary=vocab, oov_token=oov_token
    )

    input_strings = ["a", "c", "d"]
    print(f"Input (list): {input_strings}")

    output_numeric = forward_lookup_layer(input_strings)

    print(f"Output: {output_numeric}")
    print(f"Output Type: {type(output_numeric)}")
    print("Expected: A torch.Tensor with values [1, 3, 0]\n")

    print("2. Inverse Lookup (numbers to strings)")
    inverse_lookup_layer = layers.StringLookup(
        vocabulary=vocab, oov_token=oov_token, invert=True
    )

    input_integers_torch = torch.tensor([1, 3, 0], dtype=torch.int64)
    print(f"Input (torch.Tensor): {input_integers_torch}")

    output_strings_torch = inverse_lookup_layer(input_integers_torch)

    print(f"Output: {output_strings_torch}")
    print(f"Output Type: {type(output_strings_torch)}")
    print("Expected: A Python list ['a', 'c', '[OOV]']\n")

    print("3. Passing Numpy array for torch backend")
    input_integers_numpy = np.array([1, 3, 0], dtype=np.int64)
    print(f" Input (numpy.ndarray): {input_integers_numpy}")

    output_strings_numpy = inverse_lookup_layer(input_integers_numpy)

    print(f"Output: {output_strings_numpy}")
    print(f"Output Type: {type(output_strings_numpy)}")
    print("Expected: A Python list ['a', 'c', '[OOV]']")

if __name__ == "__main__":
    run_stringlookup_checks()
```

Output:

```
jha@MAGICIAN:~/testing$ python3 main.py
Current backend is: torch)

1. Forward Lookup (strings to numbers)
Input (list): ['a', 'c', 'd']
Output: tensor([1, 3, 0])
Output Type: <class 'torch.Tensor'>
Expected: A torch.Tensor with values [1, 3, 0]

2. Inverse Lookup (numbers to strings)
Input (torch.Tensor): tensor([1, 3, 0])
Output: ['a', 'c', '[OOV]']
Output Type: <class 'list'>
Expected: A Python list ['a', 'c', '[OOV]']

3. Passing Numpy array for torch backend
 Input (numpy.ndarray): [1 3 0]
Output: ['a', 'c', '[OOV]']
Output Type: <class 'list'>
Expected: A Python list ['a', 'c', '[OOV]']
```
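
The expected values in the run above follow from how the vocabulary is indexed: with a single OOV slot and no mask token, index 0 is reserved for the OOV token and the vocabulary terms start at 1. The following is a minimal pure-Python sketch of that mapping, for illustration only. It is not the Keras implementation, and `build_tables`, `forward_lookup`, and `inverse_lookup` are hypothetical names introduced here.

```python
def build_tables(vocab, oov_token):
    """Return (token -> index, index -> token) tables with OOV at index 0."""
    index_to_token = [oov_token] + list(vocab)
    token_to_index = {tok: i for i, tok in enumerate(index_to_token)}
    return token_to_index, index_to_token


def forward_lookup(tokens, token_to_index):
    """Map strings to integer indices; unknown strings map to 0 (OOV)."""
    return [token_to_index.get(tok, 0) for tok in tokens]


def inverse_lookup(indices, index_to_token):
    """Map integer indices back to strings; index 0 yields the OOV token."""
    return [index_to_token[i] for i in indices]


token_to_index, index_to_token = build_tables(["a", "b", "c"], "[OOV]")
print(forward_lookup(["a", "c", "d"], token_to_index))  # [1, 3, 0]
print(inverse_lookup([1, 3, 0], index_to_token))        # ['a', 'c', '[OOV]']
```

Here "d" is not in the vocabulary, so it maps to 0, and index 0 maps back to "[OOV]" rather than to "d".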

gemini-code-assist

Summary of Changes

Hello @Ma-gi-cian, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue in the Keras StringLookup layer to ensure consistent output types when using the PyTorch backend. The changes refactor the layer's call method to return native PyTorch tensors for forward lookups (string to integer) and standard Python lists for inverse lookups (integer to string). This aligns the layer's behavior with PyTorch's conventions and resolves a reported bug, improving compatibility and predictability. A new test case has also been added to validate these changes.

Highlights

- Consistent PyTorch Output: The StringLookup layer's call method has been refactored to ensure that when the torch backend is active, forward lookups (string-to-integer) consistently return torch.Tensor objects.
- Python List for Inverse Lookup: For inverse lookups (integer-to-string), the StringLookup layer now returns a standard Python list of strings, mirroring the behavior of torchtext.vocab.lookup_tokens.
- Improved Input Handling: The call method now explicitly handles torch.Tensor inputs, converting them to NumPy arrays and then to TensorFlow tensors for internal processing, before converting the final output back to the appropriate PyTorch-native type or Python list.
- New Compatibility Test: A new test case, test_torch_backend_compatibility, has been added to validate the corrected behavior of StringLookup with the PyTorch backend and prevent future regressions.
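
The last step of that round trip, turning a string result into a plain Python list, can be sketched without any framework. TensorFlow string tensors yield `bytes` values once converted to NumPy, so they need decoding before they look like ordinary Python strings. The helper below is a hypothetical illustration, not code from this PR: the name `decode_nested` is made up here, and it assumes UTF-8 data that has already been converted to nested lists (for example with `ndarray.tolist()`).

```python
def decode_nested(values, encoding="utf-8"):
    """Recursively decode bytes into str inside nested lists or tuples."""
    if isinstance(values, bytes):
        return values.decode(encoding)
    if isinstance(values, (list, tuple)):
        # Always return a list so the result is a plain Python list.
        return [decode_nested(v, encoding) for v in values]
    # Leave anything else (already-decoded str, ints, ...) untouched.
    return values


raw = [b"a", b"c", b"[OOV]"]  # stand-in for a converted string tensor
print(decode_nested(raw))
```

Recursing over nested lists keeps the input's shape, so a batched (2-D) lookup result would come back as a list of lists.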

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request correctly adjusts the StringLookup layer to return PyTorch-native types when using the torch backend, improving consistency. The changes ensure forward lookups return torch.Tensor and inverse lookups return a Python list, aligning with torchtext behavior. The addition of test_torch_backend_compatibility is great for ensuring this behavior is maintained.

My main feedback is to refactor the call method in string_lookup.py for better readability and to address a minor issue with an unreachable comment. The proposed refactoring simplifies the conditional logic without changing the functionality.

keras/src/layers/preprocessing/string_lookup.py

codecov-commenter · 2025-08-25T17:13:20Z

Codecov Report

❌ Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.45%. Comparing base (b9ff57a) to head (3891fb3).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| keras/src/layers/preprocessing/string_lookup.py | 90.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #21614   +/-   ##
=======================================
  Coverage   82.45%   82.45%           
=======================================
  Files         572      572           
  Lines       57337    57348   +11     
  Branches     8970     8974    +4     
=======================================
+ Hits        47277    47288   +11     
  Misses       7761     7761           
  Partials     2299     2299

| Flag | Coverage Δ | |
| --- | --- | --- |
| keras | `82.26% <85.00%> (+<0.01%)` | ⬆️ |
| keras-jax | `63.57% <45.00%> (-0.01%)` | ⬇️ |
| keras-numpy | `57.85% <45.00%> (-0.02%)` | ⬇️ |
| keras-openvino | `34.34% <0.00%> (-0.01%)` | ⬇️ |
| keras-tensorflow | `64.21% <45.00%> (-0.01%)` | ⬇️ |
| keras-torch | `63.79% <75.00%> (+<0.01%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

Fix: StringLookup returns torch native types for torch backend

347ed09

google-ml-butler bot added the size:M label Aug 25, 2025

google-ml-butler bot assigned gbaned Aug 25, 2025

gemini-code-assist bot reviewed Aug 25, 2025

keras/src/layers/preprocessing/string_lookup.py (outdated, resolved)

Ma-gi-cian added 3 commits August 25, 2025 23:07

Formatting and making logic clean

777d036

Backend other than tensorflow and pytorch

3a189d0

fixed backend other than torch and tensorflow

3891fb3
