-
Notifications
You must be signed in to change notification settings - Fork 37
Attempt at implementation of VarNamedVector (Metadata alternative)
#555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Pull Request Test Coverage Report for Build 11218828080Details
💛 - Coveralls |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #555 +/- ##
==========================================
+ Coverage 77.69% 79.02% +1.33%
==========================================
Files 29 30 +1
Lines 3591 4201 +610
==========================================
+ Hits 2790 3320 +530
- Misses 801 881 +80 ☔ View full report in Codecov by Sentry. |
|
Replied to the comments @mhauru , only very minor changes suggested, aaaand then I think we're good to maybe hit that big green button 👀 |
|
@torfjelde, I think I addressed all of them. Note that the procedure for merging should be
|
|
I unfortuantely can't approve this because I'm still technically the "owner" of the PR, but this is looking good to me 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the default metadata backend to Metadata again. VarNamedVector will be in master, but not, by default, used by anything.
I think this is all ready to go. If anyone wants to put in any last minute comments, now is the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad to see this finally gone through!
|
Thanks @torfjelde for doing the bulk of the work for this, and everyone, but especially @willtebbutt and @yebai, for great feedback! |
This is an attempt at a implementation similar to what was discussed in #528 (comment).
The idea is to introduce a simpler alternative to
Metadata, which contains a fair bit of not-so-useful-anymore functionality and isn't particularly accessible.With the current state of this PR, it's possible to evaluate a model with a
VarNameVector:But it's not possible to sample.
What are we trying to achieve here?
The idea behind
VarNameVectoris to enable both an efficient and convenient representation of avarinfo, i.e. an efficient representation of a realization from a@modelwith additional information necessary for both a sampler developer and end-user to extract what they need.In that vein, it seems like we need something that sort of acts like a
Vector, and sort of acts like anOrderedDict, but is neither (at least, I don't think it fits in either).Vectorbecause we need:Vectorrepresentation).forloop over variablesx[i]in a model, we want the values forxto be stored in a contiguous chunk of memory).OrderedDictbecause we need:VarName, e.g.trace[vn]should result in the realization in it's "constrained" space (i.e. in the space that the distribution of which it came from commonly works in).Moreover, the above should be achieved while ensuring that we allow both mutable and immutable implementations, in addition to maintaining type-stability whenever possible and falling back to type unstable when not.
Current implementation:
VarInfowithMetadataThe current implementation of
VarInfowithMetadataas it's underlying storage achieves some of these properties through a few different means:getindex(varinfo, varname), is achieved by effectively grouping thevarnamescontained in aVarInfoby their symbol (which is part of the type), and putting each group in a separateMetadatain aNamedTuple. Basically, instead of representing pairs like(@varname(x) => 1.0, @varname(y) => 2.0)in a flattened way, we represent them as(x = (@varname(x) => 1.0,), y = (@varname(y) => 2.0,)), such that in the scenario where they different in types, e.g.y isa Int, we can dispatch on thesyminVarName{sym}to determine which entry of theNamedTupleto extract.xis continuous whileyis discrete. It also provides very straight-forward guidelines on how to speed up models, i.e. "group variables into a single higher-variate random variable, e.g.x[1] ~ Normal(); x[2] ~ Normal()intox ~ fill(Normal(), 2)".varinfoto get type-information, and then use this for the subsequent runs to obtain type-stability (and thus much improved performance).VarInfois somewhat lacking, e.g. it does not support changing support even if the types stay the same,varinfo[vn], but is currently a) limited, and b) very confusing.VarInfoimplements a somewhat confusing subset of bothAbstractDictandAbstractVectorinterfaces, but neither of which are "complete" in any sense.VarInfoandMetadata, even thoughVarInfois effectively just a wrapper aroundMetadata/NamedTuplecontainingMetadata, which adds further confusion.Metadatacomes with a lot of additional fields and information that we have slowly been moving away from using, as the resulting codebase ends up being overly complicated and difficult to work with. This also significantly reduces the utility ofVarInfofor the end-user.Replacing
MetadatawithVarNameVectorVarNameVectoris an attempt at replacingMetadatawhile preserving some of the good ideas fromVarInfowithMetadata(and in the process, establish a bare-minimum of functionality required to implement a underlying storage forvarinfo):VarNameVector(should) implement (almost) all operations forAbstractDict+ all non-linear-algebra operations for aAbstractVector{<:Real}, to the point where a user will find it easy to useVarNameVector(and thus aVarInfowrapping aVarNameVector).AbstractDictinterface is implemented as if keys are of typeVarNameand values are of type corresponding tovarinfo[varname].AbstractVectorinterface is implemented wrt. underlying "raw" storage, e.g. if we're working with a unconstrained representation, then the vector will be in unconstrained space (unlikevarinfo[varname]which will be in constrained space).VarNameVectoruses a contiguous chunk of memory to store the values in a flattened manner, both in the type stable and unstable scenarios, which overall should lead to better performance in both scenarios.VarNameVectorreduces the number of fields + replaces some now-less-useful fields such asdistwith fields requiring less information such astransformss(holding the transformations used to map from "unconstrained" to "constrained" representation).Updates
2023-11-13
Okay, so I've made some new additions:
VarNameVector.push!andupdate!aVarNameVectorwhere:push!means what it usually means, but the varname / key must be unique.update!works with either existing keys or new keys.push!&update!The idea behind the implementation is as follows:
varname,push!(andupdate!, which is equivalent in this scenario), is straight-forward: we just callpush!andsetindex!on all the necessary underlying containers, e.g.ranges.varname, things become a bit more complicated for a few fields, namelyrangesandvalues(everthing else is just call tosetindex!):valuesalready allocated tovarname.valuestovarnameand then shift all thevaluesoccuring after this part; doing this on everyupdate!will be expensive! Instead, we just move the new value to the end ofvaluesand mark the old location as "inactive". This leads to much more efficientupdate!, and then we can just perform a "sweep" to re-contiguify the underlying storage every now and then.To make things a bit more concrete, consider the following example:
Then we update the entry for
@varname(x)to a differently sized value:Notice that the order is still preserved, even though the underlying
rangesis no longer ordered.But, if we inspect the underlying values, this contains now-inactive entries:
But in the scenario where we care about performance, we can easily fix this:
Type-stable sampling for dynamic suppport
The result of this is that we can even perform type-stable sampling for models with changing support:
The really nice thing here is that, unlike with
TypedVarInfo, we don't need to mess around with boolean flags to indicate whether something should be resampled, etc. Instead we just callsimilaron theVarNameVectorandpush!onto this.We can also make this work nicely with
TypedVarInfo:)2024-01-26T15:00:38
Okay, so now all tests should be passing and we should have, effectively, feature parity with
VarInfousingMetadata.But this required quite a lot of code, and there are a few annoyances that are worth pointing out / discussing:
Transformations are (still) a pain
My original idea was that we really did not want to attach the entire
Distributionfrom which the random variable came from to the metadata for a couple of reasons:VarInfofromModelAinModelB, even though they only differ by the RHS of a single~statement).Distributionto determine the transformation from the vectorized / flattened representation to the original one we want and linking.In fact, the above is only partially true:
getindexinside amodelevaluation actually usesgetindex(vi, vn, dist)and uses the "tilde-local"distfor the reshaping and linking, not thedistpresent invi.Hence it seemed to me that one immediate improvement of
Metadatais to remove thedistcompletely, and instead just store the transformation from the vectorized / flattened representation to whatever desired form we want (be that linked or invlinked).And this is what we do currently with
VarNameVector.This then "resolves" (1) since now all we need is a map
fthat takes a vector and outputs something we can work with.However, (2) is still an issue: we still need the "tilde-local"
distto determine the necessary transformation for the particular realization we're interested in, while simultaenously wantinggetindex(vi, vn)to also function as intended outside of amodel, i.e. I should still be able to doSooo I don't think we can get around having to keep transformations in the metadata object that might not actually get used within a
modelevaluation if we want allow the indexing behavior above 😕2024-01-31: Update on transformations
See #575