Conversation

@galkleinman
Contributor

@galkleinman galkleinman commented Aug 13, 2025

Important

Update llama-index dependencies and adjust tests for API changes in opentelemetry-instrumentation-llamaindex.

  • Dependencies:
    • Update llama-index to ^0.12.52 in pyproject.toml for both opentelemetry-instrumentation-llamaindex and sample-app.
    • Update llama-index-agent-openai to ^0.4.12 and llama-index-llms-openai to ^0.4.0.
  • Tests:
    • Add assert_message_in_logs() function in test_agents.py for log validation.
    • Rename span names in test_agents.py and test_chroma_vector_store.py to reflect API changes (e.g., AgentRunner.workflow to ReActAgent.workflow).
    • Skip tests in test_agents.py due to llama-index API changes in version 0.13.1.
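The skipped tests use the standard pytest marker; a minimal sketch of the pattern (the test name and reason text here are illustrative, not copied from the diff):

```python
import pytest

@pytest.mark.skip(
    reason="Agent event API changed in llama-index 0.13.x; "
    "re-enable after migrating to workflow-based agents"
)
def test_events_with_content():
    ...
```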

This description was created by Ellipsis for 42eb5bb.

Summary by CodeRabbit

  • New Compatibility

    • Updated to latest LlamaIndex and related OpenAI packages, ensuring compatibility with recent agent workflow changes.
  • Chores

    • Bumped dependencies across instrumentation and sample app; added OpenAI LLMs and embeddings packages.
  • Tests

    • Adjusted expectations to new span names and events.
    • Added a helper for validating GenAI log entries.
    • Skipped several agent-event tests pending migration to the new workflow.
    • Removed an outdated query pipeline test.
    • Expanded coverage to include additional Chroma query spans.

@coderabbitai

coderabbitai bot commented Aug 13, 2025

Walkthrough

Dependency versions for llama-index and related packages were updated. Tests were revised to reflect new span names and control flow (ReActAgent, CompactAndRefine, openai.assistant.run), with several agent-related tests now skipped. Chroma vector store test expectations were adjusted. A query pipeline integration test was removed. Sample app dependencies were aligned.

Changes

Cohort / File(s) Summary
Dependency updates
packages/opentelemetry-instrumentation-llamaindex/pyproject.toml, packages/sample-app/pyproject.toml
Bump llama-index and related packages; add llama-index-llms-openai and embeddings deps; align versions; no code changes.
Agent tests migration
packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py
Update assertions to ReActAgent.* spans, CompactAndRefine, openai.assistant.run; add helper assert_message_in_logs; add skip markers for multiple event-content tests pending workflow updates.
Chroma vector store spans
packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py
Rename expected spans to RetrieverQueryEngine.workflow, CompactAndRefine.task, DefaultRefineProgram.task; include chroma.query and chroma.query.segment._query.
Remove query pipeline test
packages/opentelemetry-instrumentation-llamaindex/tests/test_query_pipeline.py
Delete integration test validating QueryPipeline spans and OpenAI chat attributes.

Sequence Diagram(s)

sequenceDiagram
  participant Test as Tests
  participant Agent as ReActAgent
  participant OAAsst as openai.assistant.run
  participant QE as QueryEngine/Retriever
  participant CR as CompactAndRefine
  participant OAChat as openai.chat
  participant VS as VectorStore (chroma)

  Test->>Agent: invoke()
  Agent->>OAAsst: run
  Agent->>QE: retrieve
  QE->>VS: query
  VS-->>QE: results
  Agent->>CR: refine/summarize
  CR->>OAChat: chat completions
  OAChat-->>CR: response
  CR-->>Agent: refined output
  Agent-->>Test: final answer
sequenceDiagram
  participant Test as Tests
  participant RQE as RetrieverQueryEngine
  participant Chroma as chroma
  participant CR as CompactAndRefine
  participant DRP as DefaultRefineProgram
  participant OAChat as openai.chat

  Test->>RQE: query()
  RQE->>Chroma: chroma.query
  Chroma-->>RQE: segments
  RQE->>CR: synthesize
  CR->>DRP: refine
  DRP->>OAChat: completion
  OAChat-->>DRP: text
  DRP-->>CR: refined
  CR-->>Test: answer

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • nirga

Poem

A twitch of whiskers, spans in flight,
ReAct hops where runners might,
Chroma burrows, querying deep,
Compact, Refine—no time to sleep!
Pip bumps tall as clover grows—
Trace carrots lined in tidy rows.
Thump! The pipeline’s gone—onward it goes. 🥕🐇


@galkleinman galkleinman changed the title chore(deps): llama-index 0.13.1 chore(deps): llama-index 0.12.52 Aug 13, 2025
@galkleinman galkleinman marked this pull request as ready for review August 13, 2025 13:28
Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 42eb5bb in 1 minute and 28 seconds.
  • Reviewed 982 lines of code in 5 files
  • Skipped 2 files when reviewing.
  • Skipped posting 7 draft comments. View those below.
1. packages/opentelemetry-instrumentation-llamaindex/pyproject.toml:44
  • Draft comment:
    Bump llama-index dependency from ^0.12.6 to ^0.12.52, update llama-index-agent-openai from ^0.4.1 to ^0.4.12, and add llama-index-llms-openai (^0.4.0). Ensure these upgrades are fully compatible with the instrumentation code and do not introduce breaking API changes.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is about dependency changes, which are not supposed to be commented on according to the rules. It also asks the author to ensure compatibility, which is not allowed.
2. packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py:71
  • Draft comment:
    Updated expected span names in test_agents_and_tools to use 'ReActAgent.workflow' and 'ReActAgent.task' per the new API. Verify that these names consistently match the output from the updated llama-index version.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to verify that the updated span names match the output from the updated llama-index version. This falls under asking the author to ensure behavior is intended or to double-check things, which is against the rules.
3. packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py:148
  • Draft comment:
    Tests for events with content/no content are now skipped due to changes in the Agent API (llama-index 0.13.1). Remember to update these tests when the new workflow-based agents are fully supported.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, reminding the author to update tests in the future. It doesn't provide a specific suggestion or ask for a specific action related to the current PR.
4. packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py:196
  • Draft comment:
    In test_agent_with_query_tool, expected span names have been updated (e.g. from 'BaseSynthesizer.task' to 'CompactAndRefine.task'). Ensure that these new names correctly reflect the updated synthesizer tasks in the latest llama-index.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to ensure that the new span names correctly reflect the updated synthesizer tasks. This falls under asking the author to ensure the behavior is intended, which is against the rules. Therefore, this comment should be removed.
5. packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py:38
  • Draft comment:
    Expected span names in the Chroma vector store test have been revised (e.g. 'BaseQueryEngine.workflow' to 'RetrieverQueryEngine.workflow' and 'BaseSynthesizer.task' to 'CompactAndRefine.task'). Confirm these match the behavior of the updated llama-index.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to confirm that the changes match the behavior of an updated library, which is not allowed. It doesn't provide a specific suggestion or point out a clear issue with the code.
6. packages/sample-app/pyproject.toml:30
  • Draft comment:
    Updated llama-index and related dependencies in sample-app (e.g. llama-index to ^0.12.52, llama-index-embeddings-openai and llama-index-agent-openai to newer versions, and llama-index-llms-openai bumped to ^0.4.0). Verify compatibility with the rest of the app and dependency graph.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is asking the PR author to verify compatibility with the rest of the app and dependency graph, which is against the rules. It is not making a specific suggestion or pointing out a specific issue with the code.
7. packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py:60
  • Draft comment:
    Typo/grammar suggestion: The docstring 'Multiply two integers and returns the result integer' might read better as 'Multiply two integers and return the resulting integer'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_WJ5WzAOrwi3fayT3


Contributor

@doronkopit5 doronkopit5 left a comment


Hope you didn't break anything in the test removal, although I get the deprecation and breaking changes.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (1)

326-329: Don’t rely on the order of ReActAgentWorker.task spans; select by attributes.

Destructuring assumes a fixed order, which can change. Select the tool spans by tool.name instead.

Apply this diff:

-    _, sql_tool_span, calc_tool_span, _, _ = [
-        span for span in spans if span.name == "ReActAgentWorker.task"
-    ]
+    react_worker_spans = [s for s in spans if s.name == "ReActAgentWorker.task"]
+    sql_tool_span = next(s for s in react_worker_spans if s.attributes.get("tool.name") == "sql_tool")
+    calc_tool_span = next(s for s in react_worker_spans if s.attributes.get("tool.name") == "calc_tool")
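The suggested attribute-based selection can be illustrated with stand-in span objects (a dependency-free sketch; real spans come from the OpenTelemetry SDK, and the tool.name attribute key is the reviewer's assumption):

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)

spans = [
    Span("ReActAgent.workflow"),
    Span("ReActAgentWorker.task", {"tool.name": "sql_tool"}),
    Span("ReActAgentWorker.task", {"tool.name": "calc_tool"}),
]

# Select by attribute rather than by position in the exported list.
worker_spans = [s for s in spans if s.name == "ReActAgentWorker.task"]
sql_tool_span = next(
    s for s in worker_spans if s.attributes.get("tool.name") == "sql_tool"
)
calc_tool_span = next(
    s for s in worker_spans if s.attributes.get("tool.name") == "calc_tool"
)
```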
🧹 Nitpick comments (7)
packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py (1)

55-55: Make selection of the target openai.chat span deterministic.

There can be multiple openai.chat spans in this flow. Picking the “first” via next(...) can be order-dependent. Sort by start_time to stabilize the test.

Apply this diff:

-    llm_span = next(span for span in spans if span.name == "openai.chat")
+    llm_span = sorted(
+        [s for s in spans if s.name == "openai.chat"],
+        key=lambda s: s.start_time,
+    )[0]
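To see why sorting stabilizes the pick, here is the same pattern with a stand-in Span type (a sketch; the real objects carry start_time in nanoseconds on the SDK's ReadableSpan):

```python
from dataclasses import dataclass

@dataclass
class Span:
    name: str
    start_time: int

# Export order is not guaranteed to match execution order.
spans = [
    Span("openai.chat", start_time=300),
    Span("RetrieverQueryEngine.workflow", start_time=100),
    Span("openai.chat", start_time=200),
]

# Earliest openai.chat span, independent of export order.
llm_span = sorted(
    (s for s in spans if s.name == "openai.chat"),
    key=lambda s: s.start_time,
)[0]
```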
packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (6)

46-55: Avoid duplicating assert_message_in_logs across repos; centralize a shared test util.

Similar helpers exist in other packages (ollama/langchain). Consider moving this to a shared test utility (e.g., tests/utils.py) and reusing it to reduce duplication.

For example, create tests/utils.py with a generalized helper and import here:

# tests/utils.py
from opentelemetry.sdk._logs import LogData

def assert_message_in_logs(log: LogData, event_name: str, system_name: str, expected_content: dict):
    from opentelemetry.semconv._incubating.attributes import event_attributes as EventAttributes
    from opentelemetry.semconv._incubating.attributes import gen_ai_attributes as GenAIAttributes

    assert log.log_record.attributes.get(EventAttributes.EVENT_NAME) == event_name
    assert log.log_record.attributes.get(GenAIAttributes.GEN_AI_SYSTEM) == system_name
    if not expected_content:
        assert not log.log_record.body
    else:
        assert log.log_record.body
        assert dict(log.log_record.body) == expected_content

Then here:

from tests.utils import assert_message_in_logs  # system_name="llamaindex" at call sites

71-78: Reduce brittleness: allow extra spans while asserting required ones.

As instrumentation evolves, additional spans may be added. Using equality will make the test fragile. Prefer subset checks.

Apply this diff:

-    assert {
-        "ReActAgent.workflow",
-        "ReActAgent.task",
-        "FunctionTool.task",
-        "openai.chat",
-        "ReActOutputParser.task",
-        "ReActAgentWorker.task",
-    } == {span.name for span in spans}
+    expected_names = {
+        "ReActAgent.workflow",
+        "ReActAgent.task",
+        "FunctionTool.task",
+        "openai.chat",
+        "ReActOutputParser.task",
+        "ReActAgentWorker.task",
+    }
+    assert expected_names.issubset({span.name for span in spans})

86-87: Stabilize ordering of openai.chat spans.

Order of finished spans is not guaranteed. Sort by start_time to consistently identify the first and second LLM calls.

Apply this diff:

-    llm_span_1, llm_span_2 = [span for span in spans if span.name == "openai.chat"]
+    llm_spans = sorted(
+        [span for span in spans if span.name == "openai.chat"],
+        key=lambda s: s.start_time,
+    )
+    llm_span_1, llm_span_2 = llm_spans

208-211: Stabilize ordering of openai.chat spans (assistant test).

Same reasoning as above — ensure deterministic selection across environments.

Apply this diff:

-    llm_span_1, llm_span_2 = [span for span in spans if span.name == "openai.chat"]
+    llm_spans = sorted(
+        [span for span in spans if span.name == "openai.chat"],
+        key=lambda s: s.start_time,
+    )
+    llm_span_1, llm_span_2 = llm_spans

310-323: Consider subset instead of strict equality to future-proof trace assertions.

Same rationale as earlier equality check; allow additional spans in future versions while still asserting the critical ones.

Apply this diff:

-    assert {
-        "ReActAgent.workflow",
-        "ReActAgent.task",
-        "NLSQLTableQueryEngine.task",
-        "Cohere.task",
-        "CompactAndRefine.task",
-        "DefaultRefineProgram.task",
-        "DefaultSQLParser.task",
-        "FunctionTool.task",
-        "QueryEngineTool.task",
-        "ReActAgentWorker.task",
-        "ReActOutputParser.task",
-        "TokenTextSplitter.task",
-    } == {span.name for span in spans}
+    expected_names = {
+        "ReActAgent.workflow",
+        "ReActAgent.task",
+        "NLSQLTableQueryEngine.task",
+        "Cohere.task",
+        "CompactAndRefine.task",
+        "DefaultRefineProgram.task",
+        "DefaultSQLParser.task",
+        "FunctionTool.task",
+        "QueryEngineTool.task",
+        "ReActAgentWorker.task",
+        "ReActOutputParser.task",
+        "TokenTextSplitter.task",
+    }
+    assert expected_names.issubset({span.name for span in spans})

148-155: Skips are fine as a temporary measure; add TODOs with issue links for migration.

If these are intended placeholders for 0.13.x workflow migrations, add TODOs and reference a tracking issue to avoid bit-rot.

Also applies to: 157-164, 249-256, 258-265, 349-356, 358-365
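One way to implement this suggestion is a shared, TODO-annotated marker (a hedged sketch; the issue reference is a placeholder, and the test name is illustrative):

```python
import pytest

# TODO(tracking issue): remove once the workflow-based agents are instrumented.
pending_workflow_migration = pytest.mark.skip(
    reason="pending migration to llama-index 0.13.x workflow-based agents"
)

@pending_workflow_migration
def test_agent_events_no_content():
    ...
```

A single named marker keeps the skip reason consistent across all six test sites and gives one place to delete when the migration lands.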

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f0cfab and 42eb5bb.

⛔ Files ignored due to path filters (2)
  • packages/opentelemetry-instrumentation-llamaindex/poetry.lock is excluded by !**/*.lock
  • packages/sample-app/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • packages/opentelemetry-instrumentation-llamaindex/pyproject.toml (1 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (7 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py (2 hunks)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_query_pipeline.py (0 hunks)
  • packages/sample-app/pyproject.toml (2 hunks)
💤 Files with no reviewable changes (1)
  • packages/opentelemetry-instrumentation-llamaindex/tests/test_query_pipeline.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (4)
packages/opentelemetry-instrumentation-langchain/tests/test_lcel.py (1)
  • assert_message_in_logs (1005-1013)
packages/opentelemetry-instrumentation-ollama/tests/test_embeddings.py (1)
  • assert_message_in_logs (112-120)
packages/opentelemetry-instrumentation-ollama/tests/test_chat.py (1)
  • assert_message_in_logs (823-831)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • EventAttributes (267-286)
🔇 Additional comments (5)
packages/opentelemetry-instrumentation-llamaindex/pyproject.toml (1)

44-54: Dependency bumps look consistent; verify cross-package compatibility matrix.

  • llama-index ^0.12.52 together with:
    • llama-index-agent-openai ^0.4.12
    • llama-index-llms-openai ^0.4.0
    • openai ^1.52.2

  This is a sensible combination for the updated tests (including assistant.run). The caret range on 0.12 prevents accidental 0.13 upgrades.

Action: please double-check locally that:

  • openai.assistant.run spans are emitted by opentelemetry-instrumentation-openai at the specified OpenAI SDK version.
  • The agent API usage in tests doesn’t implicitly rely on 0.13.x-only behaviors.

If these pass in CI, we’re good.
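A quick local sanity check along these lines (a sketch; it simply prints whatever is resolved in the current environment, with package names taken from the pyproject bumps above):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(pkg: str) -> str:
    """Return the installed version of pkg, or 'not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

for pkg in (
    "llama-index",
    "llama-index-agent-openai",
    "llama-index-llms-openai",
    "openai",
):
    print(pkg, installed_version(pkg))
```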

packages/sample-app/pyproject.toml (1)

30-30: LGTM; ensure app imports align with the modularized packages.

With the move to:

  • llama-index ^0.12.52
  • llama-index-embeddings-openai ^0.3.1
  • llama-index-agent-openai ^0.4.12
  • llama-index-llms-openai ^0.4.0

Confirm the sample app code uses the new module paths:

  • from llama_index.embeddings.openai import OpenAIEmbedding
  • from llama_index.llms.openai import OpenAI

This avoids import errors post-modularization.

Also applies to: 46-47, 55-55

packages/opentelemetry-instrumentation-llamaindex/tests/test_chroma_vector_store.py (2)

39-41: Span name updates match LlamaIndex 0.12.x refactors.

Replacing BaseQueryEngine.workflow/BaseSynthesizer.task/LLM.task with RetrieverQueryEngine.workflow/CompactAndRefine.task/DefaultRefineProgram.task is correct for the newer workflow trace structure.


50-54: Span lookups updated appropriately.

Using RetrieverQueryEngine.workflow for the root and CompactAndRefine.task for synthesis aligns with the new trace graph.

packages/opentelemetry-instrumentation-llamaindex/tests/test_agents.py (1)

196-203: Good: span expectations reflect assistant + workflow integration.

Checking for CompactAndRefine.task and openai.assistant.run alongside openai.chat captures the updated control flow for the Assistant agent.

@galkleinman
Contributor Author

Hope you didn't break anything in the test removal, although I get the deprecation and breaking changes.

I started by updating to llama-index 0.13.1, where QueryPipeline has been removed (relevant release note attached), but ended up on 0.12.52 due to a dependency conflict within llama-index itself. QueryPipeline still exists in 0.12.52, but I think I'll remove the test anyway.

breaking: removed deprecated QueryPipeline class and all associated code (run-llama/llama_index#19554)

@galkleinman galkleinman merged commit 1c455b6 into main Aug 13, 2025
9 checks passed
