Ma-gi-cian

Closes #21255

Implementation:
- Refactored the call method in StringLookup to provide consistent, PyTorch-native outputs when using the torch backend.
- Forward lookups (string-to-int) now always return a torch.Tensor.
- Inverse lookups (int-to-string) now always return a Python list, aligning behavior with torchtext.vocab.lookup_tokens.
- Added test_torch_backend_compatibility to validate the fix and prevent future regressions.

Here is a test file, along with its output, to confirm that the fix works:

import os
os.environ['KERAS_BACKEND'] = 'torch'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import numpy as np
import torch

import tensorflow as tf
from keras.src import backend
from keras.src import layers

def run_stringlookup_checks():
    print(f"Current backend is: {backend.backend()})\n")

    vocab = ["a", "b", "c"]
    oov_token = "[OOV]"

    print("1. Forward Lookup (strings to numbers)")
    forward_lookup_layer = layers.StringLookup(
        vocabulary=vocab, oov_token=oov_token
    )

    input_strings = ["a", "c", "d"]
    print(f"Input (list): {input_strings}")

    output_numeric = forward_lookup_layer(input_strings)

    print(f"Output: {output_numeric}")
    print(f"Output Type: {type(output_numeric)}")
    print("Expected: A torch.Tensor with values [1, 3, 0]\n")

    print("2. Inverse Lookup (numbers to strings)")
    inverse_lookup_layer = layers.StringLookup(
        vocabulary=vocab, oov_token=oov_token, invert=True
    )

    input_integers_torch = torch.tensor([1, 3, 0], dtype=torch.int64)
    print(f"Input (torch.Tensor): {input_integers_torch}")

    output_strings_torch = inverse_lookup_layer(input_integers_torch)

    print(f"Output: {output_strings_torch}")
    print(f"Output Type: {type(output_strings_torch)}")
    print("Expected: A Python list ['a', 'c', '[OOV]']\n")

    print("3. Passing Numpy array for torch backend")
    input_integers_numpy = np.array([1, 3, 0], dtype=np.int64)
    print(f" Input (numpy.ndarray): {input_integers_numpy}")

    output_strings_numpy = inverse_lookup_layer(input_integers_numpy)

    print(f"Output: {output_strings_numpy}")
    print(f"Output Type: {type(output_strings_numpy)}")
    print("Expected: A Python list ['a', 'c', '[OOV]']")

if __name__ == "__main__":
    run_stringlookup_checks()

Output:

jha@MAGICIAN:~/testing$ python3 main.py
Current backend is: torch)

1. Forward Lookup (strings to numbers)
Input (list): ['a', 'c', 'd']
Output: tensor([1, 3, 0])
Output Type: <class 'torch.Tensor'>
Expected: A torch.Tensor with values [1, 3, 0]

2. Inverse Lookup (numbers to strings)
Input (torch.Tensor): tensor([1, 3, 0])
Output: ['a', 'c', '[OOV]']
Output Type: <class 'list'>
Expected: A Python list ['a', 'c', '[OOV]']

3. Passing Numpy array for torch backend
 Input (numpy.ndarray): [1 3 0]
Output: ['a', 'c', '[OOV]']
Output Type: <class 'list'>
Expected: A Python list ['a', 'c', '[OOV]']
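
For readers without the torch backend installed, the index mapping in the output above can be reproduced with a small pure-Python emulation. This is only a sketch of the lookup semantics, not the Keras implementation: the helper names are hypothetical, and it assumes a single OOV index at position 0 with vocabulary tokens starting at index 1, which is what the output above shows.

```python
# Pure-Python emulation of the index mapping shown in the output above.
# NOT the Keras implementation; all helper names here are hypothetical.

def build_forward_table(vocab, num_oov_indices=1):
    """Map each token to an index, reserving the lowest indices for OOV."""
    return {token: i + num_oov_indices for i, token in enumerate(vocab)}


def forward_lookup(tokens, table, oov_index=0):
    """String-to-int: unknown tokens fall back to the OOV index."""
    return [table.get(token, oov_index) for token in tokens]


def inverse_lookup(indices, vocab, oov_token="[OOV]", num_oov_indices=1):
    """Int-to-string: indices outside the vocabulary map to the OOV token.

    Mapping out-of-range indices to the OOV token is an assumption of
    this sketch.
    """
    result = []
    for index in indices:
        position = index - num_oov_indices
        if 0 <= position < len(vocab):
            result.append(vocab[position])
        else:
            result.append(oov_token)
    return result


vocab = ["a", "b", "c"]
table = build_forward_table(vocab)

forward_result = forward_lookup(["a", "c", "d"], table)
inverse_result = inverse_lookup([1, 3, 0], vocab)

print(forward_result)  # [1, 3, 0]
print(inverse_result)  # ['a', 'c', '[OOV]']
```

The emulation returns plain lists in both directions; in the PR itself the forward result is wrapped in a torch.Tensor, while the inverse result stays a Python list.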

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @Ma-gi-cian, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue in the Keras StringLookup layer to ensure consistent output types when using the PyTorch backend. The changes refactor the layer's call method to return native PyTorch tensors for forward lookups (string to integer) and standard Python lists for inverse lookups (integer to string). This aligns the layer's behavior with PyTorch's conventions and resolves a reported bug, improving compatibility and predictability. A new test case has also been added to validate these changes.

Highlights

  • Consistent PyTorch Output: The StringLookup layer's call method has been refactored to ensure that when the torch backend is active, forward lookups (string-to-integer) consistently return torch.Tensor objects.
  • Python List for Inverse Lookup: For inverse lookups (integer-to-string), the StringLookup layer now returns a standard Python list of strings, mirroring the behavior of torchtext.vocab.lookup_tokens.
  • Improved Input Handling: The call method now explicitly handles torch.Tensor inputs, converting them to NumPy arrays and then to TensorFlow tensors for internal processing, before converting the final output back to the appropriate PyTorch-native type or Python list.
  • New Compatibility Test: A new test case, test_torch_backend_compatibility, has been added to validate the corrected behavior of StringLookup with the PyTorch backend and prevent future regressions.
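
One step in the conversion chain described above is turning the internal result into a Python list of strings. The diff is not shown in this conversation, so the following is only a sketch of that final step: it assumes the internal lookup yields byte strings (TensorFlow string tensors convert to bytes objects in Python), and the helper name to_python_strings is hypothetical.

```python
# Sketch of the "convert the final output to a Python list" step.
# Assumption: the internal lookup produces bytes, possibly nested in lists.
# The helper name is hypothetical and does not come from the PR.

def to_python_strings(values, encoding="utf-8"):
    """Recursively decode bytes (or nested lists/tuples of bytes) to str."""
    if isinstance(values, bytes):
        return values.decode(encoding)
    if isinstance(values, (list, tuple)):
        return [to_python_strings(value, encoding) for value in values]
    # Anything else (already a str, for example) is passed through unchanged.
    return values


raw = [b"a", b"c", b"[OOV]"]
decoded = to_python_strings(raw)
nested = to_python_strings([[b"a", b"b"], [b"c", b"[OOV]"]])

print(decoded)  # ['a', 'c', '[OOV]']
print(nested)   # [['a', 'b'], ['c', '[OOV]']]
```

Handling nesting keeps batched (2-D) inputs working the same way as the 1-D case in the PR's test script.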

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly adjusts the StringLookup layer to return PyTorch-native types when using the torch backend, improving consistency. The changes ensure forward lookups return torch.Tensor and inverse lookups return a Python list, aligning with torchtext behavior. The addition of test_torch_backend_compatibility is great for ensuring this behavior is maintained.

My main feedback is to refactor the call method in string_lookup.py for better readability and to address a minor issue with an unreachable comment. The proposed refactoring simplifies the conditional logic without changing the functionality.

codecov-commenter commented Aug 25, 2025

Codecov Report

❌ Patch coverage is 90.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.45%. Comparing base (b9ff57a) to head (3891fb3).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| keras/src/layers/preprocessing/string_lookup.py | 90.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #21614   +/-   ##
=======================================
  Coverage   82.45%   82.45%           
=======================================
  Files         572      572           
  Lines       57337    57348   +11     
  Branches     8970     8974    +4     
=======================================
+ Hits        47277    47288   +11     
  Misses       7761     7761           
  Partials     2299     2299           

| Flag | Coverage Δ |
| --- | --- |
| keras | 82.26% <85.00%> (+<0.01%) ⬆️ |
| keras-jax | 63.57% <45.00%> (-0.01%) ⬇️ |
| keras-numpy | 57.85% <45.00%> (-0.02%) ⬇️ |
| keras-openvino | 34.34% <0.00%> (-0.01%) ⬇️ |
| keras-tensorflow | 64.21% <45.00%> (-0.01%) ⬇️ |
| keras-torch | 63.79% <75.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.



Successfully merging this pull request may close these issues.

StringLookup does not work on torch.Tensor
3 participants