Skip to content

Conversation

@k-raina
Copy link
Member

@k-raina k-raina commented Nov 6, 2025

Overview

This PR fixes segmentation fault (SEGV) issues that occur when Kafka client methods are called on uninitialized instances. The fix addresses issue #1590

Problem

When a class is derived from Producer, Consumer, or AdminClient and doesn't call super().__init__(), the internal self->rk pointer in the C code remains NULL. Calling methods on such instances causes segmentation faults because these methods attempt to use self->rk without checking if it's NULL.

Example Crash Scenario

class MyProducer(Producer):
    def __init__(self, config):
        # Missing: super().__init__(config)
        # This leaves self->rk as NULL in the C code
        pass

producer = MyProducer({'bootstrap.servers': 'localhost:9092'})
producer.flush()  # CRASH: rd_kafka_flush(NULL, ...) → SEGV

Solution

Null Checks Added

Added null checks before operations in all affected methods using a consistent pattern:

if (!self->rk) {
    PyErr_SetString(PyExc_RuntimeError, ERR_MSG_*_CLOSED);
    return NULL;  // or goto err; for methods with cleanup
}

The checks are placed:

  • After argument parsing (to avoid unnecessary work)
  • Before any use of self->rk (to prevent NULL dereference)

Changes

Producer (src/confluent_kafka/src/Producer.c)

Added null checks to 7 methods that were missing them:

  • Producer_produce_batch() - Added check before rd_kafka_topic_new()
  • Producer_init_transactions() - Added check before rd_kafka_init_transactions()
  • Producer_begin_transaction() - Added check before rd_kafka_begin_transaction()
  • Producer_send_offsets_to_transaction() - Added check with proper cleanup of cgmd and c_offsets
  • Producer_commit_transaction() - Added check before rd_kafka_commit_transaction()
  • Producer_abort_transaction() - Added check before rd_kafka_abort_transaction()
  • Producer_purge() - Added check before rd_kafka_purge()

Consumer (src/confluent_kafka/src/Consumer.c)

Added null checks to all Consumer methods that were missing them:

  • Consumer_commit()
  • Consumer_committed()
  • Consumer_position()
  • Consumer_seek()
  • Consumer_get_watermark_offsets()
  • Consumer_store_offsets()
  • Consumer_offsets_for_times()
  • Consumer_consumer_group_metadata()
  • And other methods

AdminClient (src/confluent_kafka/src/Admin.c)

Added null checks to all AdminClient methods:

  • Admin_create_topics()
  • Admin_delete_topics()
  • Admin_create_partitions()
  • Admin_describe_configs()
  • Admin_alter_configs()
  • Admin_incremental_alter_configs()
  • Admin_create_acls()
  • Admin_delete_acls()
  • Admin_describe_acls()
  • Admin_list_consumer_groups()
  • Admin_describe_consumer_groups()
  • And all other Admin API methods

Test Coverage

Producer Tests (tests/test_Producer.py)

Consumer Tests (tests/test_Consumer.py)

AdminClient Tests (tests/test_Admin.py)

All tests verify that:

  1. Methods raise RuntimeError instead of crashing
  2. Error messages are consistent and descriptive
  3. No segmentation faults occur

Copilot AI review requested due to automatic review settings November 6, 2025 05:20
@k-raina k-raina requested review from a team and MSeal as code owners November 6, 2025 05:20
@confluent-cla-assistant
Copy link

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds null checks to prevent segmentation faults when Kafka client methods are called on uninitialized instances (where super().__init__() was not called). The fix addresses issue #1590 by checking if the internal self->rk handle is NULL before operations and raising appropriate RuntimeError exceptions.

Key changes:

  • Added standardized error message constants for closed/uninitialized client states
  • Added NULL checks before operations in Producer, Consumer, and AdminClient methods
  • Added comprehensive test coverage for all affected methods

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/confluent_kafka/src/confluent_kafka.h Defines error message constants for uninitialized/closed client states
src/confluent_kafka/src/Producer.c Adds NULL checks to Producer methods and uses standardized error messages
src/confluent_kafka/src/Consumer.c Adds NULL checks to Consumer methods and uses standardized error messages
src/confluent_kafka/src/Admin.c Adds NULL checks to AdminClient methods and uses standardized error messages
src/confluent_kafka/src/Metadata.c Updates metadata methods to use standardized error message constants
tests/test_Producer.py Adds test coverage for uninitialized Producer instance behavior
tests/test_Consumer.py Adds test coverage for uninitialized Consumer instance behavior
tests/test_Admin.py Adds test coverage for uninitialized AdminClient instance behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 768 to 773
if (!self->rk) {
rd_kafka_consumer_group_metadata_destroy(cgmd);
rd_kafka_topic_partition_list_destroy(c_offsets);
PyErr_SetString(PyExc_RuntimeError, ERR_MSG_PRODUCER_CLOSED);
return NULL;
}
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cleanup of cgmd and c_offsets before the NULL check creates resource management complexity. Consider moving the NULL check before argument parsing to avoid unnecessary resource allocation and cleanup when the producer is uninitialized.

Copilot uses AI. Check for mistakes.
Comment on lines 1535 to 1536


Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra blank line should be removed to maintain consistent spacing with other test functions.

Suggested change

Copilot uses AI. Check for mistakes.
@sonarqube-confluent
Copy link

Quality Gate failed Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@k-raina k-raina merged commit 36a9e6c into master Nov 6, 2025
2 of 3 checks passed
@k-raina k-raina deleted the fix-issue-807 branch November 6, 2025 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants