
Conversation

@chalmerlowe
Collaborator

@chalmerlowe chalmerlowe commented Sep 15, 2025

Adds code generation logic

Implements functions for generating Centralized Client code, including:

  • Generating import statements.
  • Rendering code from Jinja2 templates using data from the source file we previously analyzed.
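For illustration only, here is a minimal sketch of what an import-statement generator could look like. It follows the general shape of the `from {path} import (...)` return line quoted further down in the review; the function name `format_import`, the sorting, and the four-space indent are assumptions, not the code in this PR.

```python
from typing import Iterable


def format_import(path: str, names: Iterable[str]) -> str:
    """Return a parenthesized `from ... import (...)` statement.

    Hypothetical helper: the name and exact layout are assumptions.
    """
    # Sort and de-duplicate so the generated output is stable across runs.
    names_str = ",\n    ".join(sorted(set(names)))
    return f"from {path} import (\n    {names_str},\n)"


# "my_package.types" is a placeholder module path.
print(format_import("my_package.types", ["Table", "Dataset", "Table"]))
```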

Migrates the empty __init__.py file to the microgenerator package.
Introduces the CodeAnalyzer class and helper functions for parsing Python code using the ast module. This provides the foundation for understanding service client structures.
    Implements functions to analyze Python source files, including:
    - Filtering classes and methods based on configuration.
    - Building a schema of request classes and their arguments.
    - Processing service client files to extract relevant information.
@chalmerlowe chalmerlowe requested review from a team as code owners September 15, 2025 18:00
@chalmerlowe chalmerlowe requested review from tswast and removed request for a team September 15, 2025 18:00
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Sep 15, 2025
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Sep 15, 2025
@chalmerlowe chalmerlowe added this to the µgen PoC milestone Sep 16, 2025
@chalmerlowe chalmerlowe assigned chalmerlowe and unassigned suzmue Sep 16, 2025
@chalmerlowe
Collaborator Author

For clarity:

  1. The GitHub Actions are being used to help ensure that unit tests pass.
  2. The KOKORO tests are failing. This is a known problem and will be dealt with in a separate PR. It should not affect merging into the autogen dev branch.

Base automatically changed from feat/adds-source-file-gathering-functions to autogen September 23, 2025 21:04
@chalmerlowe chalmerlowe added the automerge Merge the pull request once unit tests and other checks pass. label Sep 24, 2025
Comment on lines 528 to 529
template_path = os.path.join(config_dir, item["template"])
output_path = os.path.join(project_root, item["output"])
Contributor

[nit, optional] The pathlib.Path / operator is a little less verbose and seems to be preferred for new code.
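A small sketch of the two spellings side by side, for comparison. The values of `config_dir` and `item` are hypothetical stand-ins for the real config entries.

```python
import os
from pathlib import Path

# Hypothetical values standing in for the real config entries.
config_dir = "scripts/microgenerator"
item = {"template": "templates/client.py.j2", "output": "out/client.py"}

# Current style: os.path.join returns a str.
joined_with_os = os.path.join(config_dir, item["template"])

# Suggested style: the / operator returns a Path object.
joined_with_path = Path(config_dir) / item["template"]

# Both spellings name the same location.
assert Path(joined_with_os) == joined_with_path
```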

Collaborator Author

We added Path to this section, but:

  • Right now the function in utils.py that ends up using this expects a string, so we convert the whole joined Path object with str().
  • Why not just update the utils.py file to take a Path, or either a string or a Path?
  • We can't run tests without multiple files and edits that are in two or three PRs that have not been merged yet, so I have no confidence that all the stars will align, AND I did not want to try temporary workarounds just to test this update. PR #2307 includes some, but not all, of the necessary changes, including tests that are specific to utils.py.

I will add an item to the TODO list hosted internally at b/445158219 to ensure that we circle back and clean up the os vs Path situation. I feel like there are probably a couple of other nooks and crannies where Path would be a better long-term solution.
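One possible shape for the eventual cleanup, sketched under assumptions: `write_code_to_file` is a hypothetical name and signature (the real helper in utils.py is not shown in this thread). Normalizing once with `Path(...)` would let callers pass either a string or a Path.

```python
import os
import tempfile
from pathlib import Path
from typing import Union


def write_code_to_file(output_path: Union[str, os.PathLike], content: str) -> Path:
    """Write `content` to `output_path`, creating parent directories.

    Hypothetical helper; the name and behavior are assumptions.
    """
    path = Path(output_path)  # Accepts both str and Path inputs.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path


# Usage: both call styles work without a str() conversion at the call site.
with tempfile.TemporaryDirectory() as tmp:
    from_str = write_code_to_file(os.path.join(tmp, "a", "client.py"), "x = 1\n")
    from_path = write_code_to_file(Path(tmp) / "b" / "client.py", "x = 1\n")
    same_content = from_str.read_text(encoding="utf-8") == from_path.read_text(
        encoding="utf-8"
    )
```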

return f"from {path} import (\n {names_str}\n)"


def generate_code(config: Dict[str, Any], analysis_results: tuple) -> None:
Contributor

A tuple input is a bit difficult to review to determine if the order of the fields is correct. Have you considered using a frozen data class? Or if positional access is required a named tuple?
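To make the suggestion concrete, here is a rough sketch of a frozen data class replacing the tuple. `data` and `request_arg_schema` appear in the code under review; the other two field names and all of the type annotations are assumptions, since the thread only says the tuple currently holds four items.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict, FrozenSet, List


@dataclass(frozen=True)
class AnalysisResults:
    # Names taken from the loop under review; types are assumed.
    data: Dict[str, Dict[str, Any]]
    request_arg_schema: Dict[str, List[str]]
    # Hypothetical placeholders for the remaining two tuple items.
    imports: FrozenSet[str]
    types: FrozenSet[str]


results = AnalysisResults(
    data={"DatasetServiceClient": {"get_dataset": {"return_type": "Dataset"}}},
    request_arg_schema={},
    imports=frozenset(),
    types=frozenset(),
)

# Fields are read by name, so a reviewer no longer has to verify tuple order.
method_info = results.data["DatasetServiceClient"]["get_dataset"]
```

With `frozen=True`, rebinding a field after construction raises `FrozenInstanceError`, although the dictionaries stored in the fields remain mutable.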

Collaborator Author

@chalmerlowe chalmerlowe Sep 25, 2025

When the code first started, we were only passing one item, which became two in a tuple and then three and is now four items.

I agree, it is time to move it to a more robust solution. Not all the parts that will end up being affected by this move are in this PR, so I would much prefer to merge all the outstanding PRs before doing too many changes to logic, etc.

This is all microgenerator code so no customers are gonna see this OR interact with it, just us devs, but there are better approaches that will make our lives easier in the long run.

I will defer this to the TODO list hosted internally at b/445158219 for now.

Comment on lines 533 to 566
for class_name, methods in data.items():
    for method_name, method_info in methods.items():
        context = {
            "name": method_name,
            "class_name": class_name,
            "return_type": method_info["return_type"],
        }

        # Infer the request class and find its schema.
        inferred_request_name = name_utils.method_to_request_class_name(
            method_name
        )

        # Check for a request class name override in the config.
        method_overrides = (
            config.get("filter", {}).get("methods", {}).get("overrides", {})
        )
        if method_name in method_overrides:
            inferred_request_name = method_overrides[method_name].get(
                "request_class_name", inferred_request_name
            )

        fq_request_name = ""
        for key in request_arg_schema.keys():
            if key.endswith(f".{inferred_request_name}"):
                fq_request_name = key
                break

        # If found, augment the method context.
        if fq_request_name:
            context["request_class_full_name"] = fq_request_name
            context["request_id_args"] = request_arg_schema[fq_request_name]

        methods_context.append(context)
Contributor

With several nested loops and if statements, I'm having some trouble following along today. Maybe worth adding some private helper methods.

Collaborator Author

We pulled out two chunks of processing and created two helper functions, which definitely makes the code a bit easier to parse.

I think we might be able to do a bit more, but gonna hold off until all the things are merged and working before pushing my luck.

Comment on lines +535 to +539
context = {
    "name": method_name,
    "class_name": class_name,
    "return_type": method_info["return_type"],
}
Contributor

Thoughts on using a data class for this instead of a dictionary?

Collaborator Author

I will look this over and consider whether it should be modified in a future PR. Right now, for an alpha release to see what works and what doesn't, a very small dict is probably a reasonable conveyance in a microgenerator. Also added this to the TODO list for tracking.
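If the data class route is taken later, a sketch along these lines might work. `MethodContext` is a hypothetical name, the key names come from the dictionary under review, and the type of `request_id_args` is an assumption.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MethodContext:
    name: str
    class_name: str
    return_type: str
    # Only filled in when a matching request class is found in the schema.
    request_class_full_name: Optional[str] = None
    request_id_args: List[str] = field(default_factory=list)  # assumed type


ctx = MethodContext(
    name="get_dataset",
    class_name="DatasetServiceClient",
    return_type="Dataset",
)

# asdict() yields a plain dict, which can still be handed to a template.
template_vars = asdict(ctx)
```

One difference from the current dictionary: the optional keys are always present here (as `None` or an empty list), so a template that checks whether a key exists would need to check its value instead.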

@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Sep 25, 2025
Comment on lines +535 to +536
for key in request_arg_schema.keys():
    if key.endswith(f".{request_name}"):
Contributor

[optional] This looks like it'd be a good fit for a trie data structure. https://en.wikipedia.org/wiki/Trie That said, the current dictionary is probably small enough and this is part of code generation, not the user-visible path, so maybe not worth it.

Alternatively, it may be worth it to create a separate dictionary from request_name to fully-qualified name, since this method will be called more than once. That would take us from O(n^2) to O(n) overall, since Python dictionaries are hash tables with O(1) average-case lookups.
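A minimal sketch of the separate-dictionary idea, assuming request names are simple class names with no dots in them. `build_short_name_index` is a hypothetical name, and the schema contents are made up.

```python
from typing import Dict, List


def build_short_name_index(request_arg_schema: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each short class name to the first fully-qualified key that ends with it."""
    index: Dict[str, str] = {}
    for fq_name in request_arg_schema:
        # The original check requires a "." before the name, so skip bare keys.
        if "." not in fq_name:
            continue
        short_name = fq_name.rsplit(".", 1)[-1]
        # setdefault keeps the first match, mirroring the `break` in the loop.
        index.setdefault(short_name, fq_name)
    return index


schema = {
    "pkg.a.GetThingRequest": ["thing_id"],
    "pkg.b.GetThingRequest": ["thing_id"],
    "pkg.a.ListThingsRequest": [],
}

# Build once, then each lookup is a single dictionary access.
index = build_short_name_index(schema)
fq_request_name = index.get("GetThingRequest", "")
```

The index is built in one pass over the schema, so the per-method suffix scan is replaced by a single `get` call.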

@chalmerlowe chalmerlowe merged commit 337342b into autogen Sep 26, 2025
24 checks passed
@chalmerlowe chalmerlowe deleted the feat/adds-code-generation-logic branch September 26, 2025 17:55
