-
Notifications
You must be signed in to change notification settings - Fork 3
Autodect parameters from dataset -- add dataset accessor #53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@DocOtak What do you think of the API (I haven't written the doc yet)? To use the option one can simply give a dataset as keyword argument, i.e. |
Many arguments don't have a cf standard name. It is possible to add custom criteria in cf-xarray, so we could write our own standard names (and open a request to add them in the cf-convention) https://cf-xarray.readthedocs.io/en/latest/custom-criteria.html. I would be in favor of the 1st option (our own criteria), this will make things way easier for #30 (regarding the solving of the graph) |
@rcaneill apologies for delayed responses, I have been on a long holiday and have been attempting to actually "relax"... I'll be back "full time" next week. I'm not sure about the Elegant data selection is a hard problem faced by my seagoing group at Scripps... the functions that I have seen which take entire pandas dataframes are usually very messy. |
I see your point. Would you then not be in favor of having this kind of detection, or is it more an API problem (that we could replace by something like |
Hello @DocOtak , I'm pinging you about my last question :) |
Hi @rcaneill, greetings from pre cruise quarantine in Tahiti. I think that we should avoid changing the function signature of the wrapped upstream functions and instead rely on the "accessor" pattern as you suggest: I like the ds.gsw.sigma0(SP="psal1") Where psal1 is one of many practical salinity variables we have in hypothetical dataset ds. The func then calculates SA using this psal variable as an input for sigma0. |
I like your idea of using function call to use variable selection. That would easily tackle the issue of setting which variable to use when multiple fields are present with the same standard name. Enjoy your future field work :) |
I had some (hopefully good) idea while cycling to university today:
ds.gsw.defaults['sea_water_practical_salinity'] = 'psal1'
|
At commit e81063c we have a working version of the accessor! (we can't use getitem with lists, and there is no default option yet) |
(We can now use lists in getitem) |
(cf #57 (comment)) Roadmap of this PR:
Optional:
|
This comment was marked as resolved.
This comment was marked as resolved.
About this, it is issue #30 that I decided to split into 2 different PR. The 1st one (this PR) to create the accessor and possibility to use the standard names, and a 2nd one to have the autodiscovery (to be done, but I had a good draft of the algo here https://gist.github.com/rcaneill/0aa8b9e72112d079c4919e462a4bb378) |
@DocOtak I use an internal function from cf_xarray for performance, I opened an issue there: xarray-contrib/cf-xarray#465 |
Had an answer and problem solved! |
Oh awesome, in that gist. I'm not familiar with that graph library. Is it able to tell you if cycles or loops exist in the graph? |
Hm I don't remember but I guess that yes. As far as I remember (I did this notebook 1 year ago), in the end the easiest was not to find a path within the graph with one classical algorithm but simply brute force to find all possible variables that one can compute with a certain set of inputs. |
I played with the latest commits, I really like it and think this is the right track. There are two things I noticed:
|
I think that the API is stable if using the accessor + providing the name of each variable inside the dataset. If only relying on standard names (but practical salinity, as 2 standard names are used so it may raise issues) I think it is also robust. And the mixture of standard and non standard names using the options manager may need more focus. I tried to write good tests, but maybe we could write some more. If you have time to document our intent, I'm pleased if you do so. I can try to write some more tests + resolve your comment about modules |
This comment is to list all tests with should have for the accessor. Feel free to add more configurations if needed. I'll try to implement them tomorrow: Basic tests
More advanced
With Pint
These tests should be done with the method |
|
Any reason to not merge (so we can not let this prevent forward movement)? |
I think it was ready, I was waiting for your review but it seems that I forgot to ask you... |
We can do any cleanups in future PRs I think, tests are passing and I don't think we have broken any currently used public interfaces. Maybe we could mark the "auto detector" as experimental. I also suspect a refactor of things might be worth it eventually, there are some deep indentations that are indicative of a "code smell" but I am not worried about that for now. |
This PR makes possible to give a dataset as argument to gsw functions, and based on cf standard name the cf_xarray wrapper will pick the proper arguments. Example: