feat: microgen - adds code generation logic #2294

Merged

chalmerlowe merged 19 commits into autogen from feat/adds-code-generation-logic

Sep 26, 2025

Collaborator

chalmerlowe commented Sep 15, 2025

Follows PR #2293 (feat: microgen - adds source file gathering functions). This PR should be merged only after that one is merged.

Adds code generation logic

Implements functions for generating Centralized Client code, including:

- Generating import statements.
- Rendering code from Jinja2 templates using data from the source file we previously analyzed.
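
To make the first bullet concrete, here is a minimal sketch of an import-statement builder. It follows the `from {path} import (...)` string format that appears in the review diff further down, but the function name, its signature, the sorting step, and the module path in the usage lines are assumptions for illustration, not the PR's actual code.

```python
from typing import Iterable


def build_import_statement(path: str, names: Iterable[str]) -> str:
    """Return a parenthesized `from ... import (...)` statement.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Sort and de-duplicate so regenerated files are stable between runs.
    names_str = ",\n    ".join(sorted(set(names)))
    return f"from {path} import (\n    {names_str}\n)"


# Illustrative module path and class names (assumptions, not from the PR).
statement = build_import_statement(
    "google.cloud.bigquery_v2.types",
    ["GetDatasetRequest", "Dataset", "Dataset"],
)
print(statement)
```

Sorting the names is a design choice of this sketch: it keeps the generated output deterministic, which makes diffs of regenerated files easier to review.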

chalmerlowe added 16 commits

September 11, 2025 12:03

b9d4a04  chore: removes old proof of concept
5b4d538  removes old __init__.py
132c571  Adds two utility files to handle basic tasks
90b224e  Adds a configuration file for the microgenerator
e071eab  Removes unused comment
dc72a98  chore: adds noxfile.py for the microgenerator
7318f0b  feat: microgen - adds two init file templates
07910c5  feat: adds _helpers.py.js template
dc54c99  Updates with two usage examples
28de5f8  feat: adds two partial templates for creating method signatures
c457754  feat: Add microgenerator __init__.py
    Migrates the empty __init__.py file to the microgenerator package.
595e59f  feat: Add AST analysis utilities
    Introduces the CodeAnalyzer class and helper functions for parsing Python code using the ast module. This provides the foundation for understanding service client structures.
44a0777  feat: Add source file analysis capabilities
    Implements functions to analyze Python source files, including:
    - Filtering classes and methods based on configuration.
    - Building a schema of request classes and their arguments.
    - Processing service client files to extract relevant information.
3e9ade6  feat: adds code generation logic
485b9d4  removes extraneous content
a4276fe  feat: microgen - adds code generation logic

chalmerlowe requested review from a team as code owners

September 15, 2025 18:00

chalmerlowe requested review from tswast and removed request for a team

September 15, 2025 18:00

product-auto-label bot added the size: m label

blunderbuss-gcf bot assigned suzmue

product-auto-label bot added the api: bigquery label

chalmerlowe mentioned this pull request

feat: microgen - adds main execution and post processing #2295

Merged

1 task

chalmerlowe added this to the µgen PoC milestone

chalmerlowe assigned chalmerlowe and unassigned suzmue

Collaborator Author

chalmerlowe commented Sep 18, 2025

For clarity:

- The GitHub Actions are being used to help ensure that unit tests pass.
- The KOKORO tests are failing. This is a known problem and will be dealt with in a separate PR. It should not affect merging into the autogen dev branch.

Base automatically changed from feat/adds-source-file-gathering-functions to autogen

September 23, 2025 21:04


889870b  Merge branch 'autogen' into feat/adds-code-generation-logic

chalmerlowe added the automerge label

tswast reviewed

scripts/microgenerator/generate.py Outdated

Comment on lines 528 to 529

    
    template_path = os.path.join(config_dir, item["template"])
    output_path = os.path.join(project_root, item["output"])

Contributor

tswast Sep 24, 2025

[nit, optional] pathlib.Path's / operator is a little less verbose and seems to be preferred for new code.
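
For readers comparing the two spellings, a small sketch follows. The directory and file names are hypothetical stand-ins for the PR's `config_dir` and `item["template"]` values.

```python
import os
from pathlib import Path

# Hypothetical values standing in for the PR's config_dir and config item.
config_dir = "scripts/microgenerator"
item = {"template": "templates/client.py.j2"}

# The os.path spelling from the diff under review.
joined_with_os = os.path.join(config_dir, item["template"])

# The pathlib spelling: the / operator joins path segments.
joined_with_path = Path(config_dir) / item["template"]

# Both spellings name the same location.
same_location = Path(joined_with_os) == joined_with_path

# str() recovers a plain string for callers that still expect one.
as_string = str(joined_with_path)
```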

Collaborator Author

chalmerlowe Sep 25, 2025

We added Path to this section, but:

- Right now the function in utils.py that ends up using this wants a string, so we convert the whole concatenated Path object with str().
- Why not just update utils.py to take a Path, or either a string or a Path? We can't run tests without multiple files and edits that are in two or three PRs that have not been merged yet, so I have no confidence that all the stars will align, and I did not want to add temporary workarounds just to test this update. PR #2307 includes some, but not all, of the necessary changes, including tests that are specific to utils.py.

I will add an item to the TODO list hosted internally at b/445158219 to ensure that we circle back and clean up the os vs Path situation. I feel like there are probably a couple of other nooks and crannies where Path would be a better long-term solution.
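
One way the deferred cleanup could look is sketched below: a writer that accepts either a string or a Path and normalizes once at the boundary. The function name and behavior are assumptions for illustration; the real utils.py function is not shown in this PR.

```python
import os
import tempfile
from pathlib import Path
from typing import Union

# Accept plain strings as well as any os.PathLike object such as Path.
StrOrPath = Union[str, os.PathLike]


def write_code_to_file(output_path: StrOrPath, content: str) -> Path:
    """Hypothetical stand-in for a utils.py writer that takes str or Path."""
    # Path() accepts both str and os.PathLike, so callers need no str() cast.
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    from_str = write_code_to_file(os.path.join(tmp, "a", "x.py"), "x = 1\n")
    from_path = write_code_to_file(Path(tmp) / "b" / "y.py", "y = 2\n")
    contents = (from_str.read_text(), from_path.read_text())
```

Normalizing inside the utility keeps every call site free to pass whichever type it already has.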

scripts/microgenerator/generate.py

    
        return f"from {path} import (\n    {names_str}\n)"

    def generate_code(config: Dict[str, Any], analysis_results: tuple) -> None:

Contributor

tswast Sep 24, 2025

A tuple input is a bit difficult to review to determine if the order of the fields is correct. Have you considered using a frozen data class? Or if positional access is required a named tuple?

Collaborator Author

chalmerlowe Sep 25, 2025

When the code first started, we were only passing one item, which became two in a tuple and then three and is now four items.

I agree, it is time to move it to a more robust solution. Not all the parts that will end up being affected by this move are in this PR, so I would much prefer to merge all the outstanding PRs before making too many changes to logic, etc.

This is all microgenerator code so no customers are gonna see this OR interact with it, just us devs, but there are better approaches that will make our lives easier in the long run.

I will defer this to the TODO list hosted internally at b/445158219 for now.
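
As a rough picture of the frozen data class suggested above, here is a sketch. The thread only says the tuple now carries four items, so the four field names and the toy consumer function are guesses made for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class AnalysisResults:
    """Hypothetical container; field names are illustrative, not the PR's."""

    parsed_data: Dict[str, Dict[str, Any]]
    all_imports: List[str]
    all_types: List[str]
    request_arg_schema: Dict[str, List[str]]


def count_methods(results: AnalysisResults) -> int:
    """Toy consumer showing named access instead of positional unpacking."""
    return sum(len(methods) for methods in results.parsed_data.values())


results = AnalysisResults(
    parsed_data={"DatasetServiceClient": {"get_dataset": {}, "list_datasets": {}}},
    all_imports=[],
    all_types=[],
    request_arg_schema={},
)

try:
    # frozen=True blocks rebinding a field on an existing instance.
    results.all_imports = ["os"]
    rebinding_blocked = False
except FrozenInstanceError:
    rebinding_blocked = True
```

Note that `frozen=True` only prevents rebinding attributes. The dicts and lists held by the fields remain mutable.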

scripts/microgenerator/generate.py

Comment on lines 533 to 566

    
    for class_name, methods in data.items():
        for method_name, method_info in methods.items():
            context = {
                "name": method_name,
                "class_name": class_name,
                "return_type": method_info["return_type"],
            }
            # Infer the request class and find its schema.
            inferred_request_name = name_utils.method_to_request_class_name(
                method_name
            )
            # Check for a request class name override in the config.
            method_overrides = (
                config.get("filter", {}).get("methods", {}).get("overrides", {})
            )
            if method_name in method_overrides:
                inferred_request_name = method_overrides[method_name].get(
                    "request_class_name", inferred_request_name
                )
            fq_request_name = ""
            for key in request_arg_schema.keys():
                if key.endswith(f".{inferred_request_name}"):
                    fq_request_name = key
                    break
            # If found, augment the method context.
            if fq_request_name:
                context["request_class_full_name"] = fq_request_name
                context["request_id_args"] = request_arg_schema[fq_request_name]
            methods_context.append(context)

Contributor

tswast Sep 24, 2025

With several nested loops and if statements, I'm having some trouble following along today. Maybe worth adding some private helper methods.

Collaborator Author

chalmerlowe Sep 26, 2025

We pulled out two chunks of processing and created two helper functions, which definitely makes the code a bit easier to parse.

I think we might be able to do a bit more, but I'm gonna hold off until all the things are merged and working before pushing my luck.

scripts/microgenerator/generate.py

Comment on lines +535 to +539

    
    context = {
        "name": method_name,
        "class_name": class_name,
        "return_type": method_info["return_type"],
    }

Contributor

tswast Sep 24, 2025

Thoughts on using a data class for this instead of a dictionary?

Collaborator Author

chalmerlowe Sep 26, 2025

I will look this over and consider whether it should be modified in a future PR. Right now, for an alpha release to see what works and what doesn't, a very small dict is probably a reasonable conveyance in a microgenerator. Also added this to the TODO list for tracking.
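
For the future PR, a data class version of the context might look like the sketch below. The class name, the `to_template_vars` method, and the element type of `request_id_args` are assumptions. The five keys come from the dictionary code shown in this review.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MethodContext:
    """Hypothetical typed replacement for the per-method context dict."""

    name: str
    class_name: str
    return_type: str
    request_class_full_name: Optional[str] = None
    request_id_args: List[str] = field(default_factory=list)

    def to_template_vars(self) -> Dict[str, Any]:
        """Convert to a plain dict for passing to a template."""
        values = asdict(self)
        if self.request_class_full_name is None:
            # Mirror the dict version, which only adds the request keys
            # when a matching request class was found.
            del values["request_class_full_name"]
            del values["request_id_args"]
        return values


# Illustrative values (hypothetical method and class names).
bare = MethodContext(
    name="get_dataset", class_name="DatasetServiceClient", return_type="Dataset"
)
template_vars = bare.to_template_vars()
```

A misspelled field name now fails at construction time instead of silently producing a missing key in the rendered template.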

gcf-merge-on-green bot removed the automerge label

chalmerlowe added 2 commits

September 25, 2025 19:37

f113bde  Adds Path class and changes path to package
bb6ba4a  refactors a complicated loop into two separate helpers

tswast approved these changes

scripts/microgenerator/generate.py

Comment on lines +535 to +536

    
    for key in request_arg_schema.keys():
        if key.endswith(f".{request_name}"):

Contributor

tswast Sep 26, 2025

[optional] This looks like it'd be a good fit for a trie data structure. https://en.wikipedia.org/wiki/Trie That said, the current dictionary is probably small enough and this is part of code generation, not the user-visible path, so maybe not worth it.

Alternatively, it may be worth it to create a separate dictionary from request_name to fully-qualified name, since this method will be called more than once. That would take us from O(n^2) to O(n), since Python dictionaries are hash tables with O(1) average-case lookups.
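
A sketch of the separate-dictionary idea follows. The function name is hypothetical, and the sketch assumes request names never contain dots; a dotted request name would still need the original suffix scan. Because CPython dictionaries are hash tables, each lookup in the index is constant time on average once the index has been built in a single pass.

```python
from typing import Dict, List


def build_request_name_index(
    request_arg_schema: Dict[str, List[str]]
) -> Dict[str, str]:
    """Map each short class name to its first fully qualified schema key."""
    index: Dict[str, str] = {}
    for fq_name in request_arg_schema:
        _, dot, short_name = fq_name.rpartition(".")
        # Keys without a dot never match the endswith(f".{name}") check,
        # so they are skipped to keep the same behavior.
        if dot:
            # setdefault keeps the first match, like the loop's `break`.
            index.setdefault(short_name, fq_name)
    return index


# Illustrative schema (hypothetical package and class names).
schema = {
    "pkg.a.GetDatasetRequest": ["project_id", "dataset_id"],
    "pkg.b.GetDatasetRequest": ["project_id"],
    "ListJobsRequest": [],
}
index = build_request_name_index(schema)

# Same empty-string fallback as the original loop when nothing matches.
fq_request_name = index.get("GetDatasetRequest", "")
missing = index.get("DeleteTableRequest", "")
```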

chalmerlowe merged commit 337342b into autogen

24 checks passed

chalmerlowe deleted the feat/adds-code-generation-logic branch

September 26, 2025 17:55

Labels

api: bigquery size: m