EVault Data Store: Envelope Schemes and More #76
-
I was conflicted about adding JSON as a data type to the doc, as it might encourage bad practice among devs to store everything as JSON, for better or for worse. But if it is added, I think json-schema should be enforced at the envelope level and/or the schema engine level.
-
I have a few comments, but they surely exceed the amount of text I am willing to type here :) But some generic notes:
For example, a tweet starts in a platform's memory as https://github.com/ccrsxx/twitter-clone/blob/main/src/lib/types/tweet.ts, then it is mapped to something that adheres to (tentative) https://schema.org/SocialMediaPosting and as such is given to the vault, which stores it in persistent storage somehow. Similarly for types: we use an integer in the platform (in TypeScript, a number), then serialize it to a string when we pass it through the w3adapter. The schema will use one of the types in https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes to describe it, perhaps xsd:nonNegativeInteger. And then the vault will store it as whatever primitive type the DB supports.
-
title: EVault Data Store: Envelope Schemes and More
about: A descriptive document about the evault document store and some problems in it as of date.
category: Metastate
tags:
dir: ltr
labels: enhancement
assignees: "@sosweetham"
author: "@sosweetham"
revision: 0.0.1
description: This document aims to list some methodologies/approaches to consider while designing the eVault data stores, such that interoperability can be performed smoothly amongst different post platforms.
Current Model
The current model of the e-vault proposal for the document store aims to implement a completely flat system for storing information of any kind in containers called "envelopes", which would hold the data itself and some metadata around it. They would be addressable via an id field attached to them. The initial document refers to them as elementary units, resembling atoms.
Envelope model:
The document does not elaborate much on the data model of the envelope further but as the fundamental unit of everything within the EVault, it deserves a little more attention 😄.
There are some upfront problems with the present data model of the envelope within our current model itself which must be addressed.
Problems
Data type primitives have not been defined
The current document leaves the data types that the envelope can support open to interpretation. The logical assumption would be that post platforms can insert any data type they like, but this is unsafe! They would then be able to put just about anything into the envelope, and thus into the user's EVault, with no size constraints. Imagine a full-length, 4K-quality Bee Movie sitting in an envelope in an individual's EVault, which may or may not have been approved by that person on the post platform.
It might be a good idea thus, to decide on a few conventional primitive data types that would be supported inside envelopes within an EVault.
We need to take inspiration from nature here. Man can discover different types of atoms, but they belong to and are made by nature. Similarly, we need to define the atoms that the developers/users of the post platforms will be able to interact with.
Structured Data Representations
Normally data is stored in relation to other data. This organizes and segregates it into what is related and what is not, what can exist alone and still make sense and what cannot. Envelopes would exist in a flat organization, i.e. there would be no hierarchy in the user's EVault. That makes sense, as we are just putting arbitrary data in a single table in a database on an EVault. However, the document also proposes the usage of GraphQL for the pulling and placement of data by post platforms through the web3 adapter. How would our own adapter be able to tell which data is related to which when all it knows is key -> data, and anything extra is just metadata about that data itself?
It thus becomes important to discuss how envelopes would be able to maintain an internal structure to map to each other as well, while remaining in an optimal structure that can be communicated with in a performant manner, considering the w3 adapter of the post platform might be communicating with 100s if not 1000s of EVaults at once.
As defined in the section above, going with the nature analogy, if envelopes are the atoms of the data within the EVault, then it must be possible to combine them into compounds for them to be usable by post platforms.
Data observability/browsability within the EVault by the owning individual
Metastate empowers users by letting them own their data through an EVault. All interactions by individuals on the post platform lead back to their own personal EVault, so it is important that users are able to browse the data they have stored on the EVault using any given post platform. The data should be browsable in conventional, user-friendly computing terms, as a "file/document/envelope browser" of sorts.
Going with the established analogy of nature, we must provide users with a microscope to be able to analyze their compounds/creations within the Metastate ecosystem.
Type/Schema system
Given the decentralized/federated nature of the Metastate ecosystem, it's important to have an internal type/schema system within the EVaults to keep track of the common data/envelope structures employed within an individual's EVault. These structures should all interface with the ontology system to ensure correct interoperability with all post platforms and within the user's own EVault browser as well.
Going Further
Luckily, the problems with the model defined above are not novel and have been solved in a conventional setting. This document aims to address them while taking the context of the Metastate ecosystem and its architecture into account, and thus might deviate from conventional implementation approaches at some points. Going further in the document, we are entering niche territory, so the document might need fixes/amendments. It is recommended to treat this document as an RFC, and contributions are welcome.
Defining Primitive Envelopes
This section aims to define some base level envelope types, which would dictate what sort of data can reside in an envelope.
The envelopes can be divided into 2 inter-dependent types: data-type envelopes and meta-data-type envelopes. While data-type envelopes would store the data they declare they hold, meta-data-type envelopes would store information describing other envelopes.
An additional envelope type is discussed further below, but it is way too ambitious, so it has only been touched upon to get the ball rolling.
Data Type Envelopes
Some considerations we should keep in mind when designing these are constraints such as maximum size, as we don't want people to be able to store base64 encoded Bee Movies in their EVaults.
Primitive
They simply hold the data they say they hold
Number (Signed/Unsigned Ints/Floats)
We expect post platforms to be developed in a range of different languages and tooling. It might thus be good to think about numbers in an explicitly lower level context, so that services written in lower level languages that seem to be gaining popularity (Rust) can be developed performantly without the EVault architecture being a bottleneck. The following is what we are looking for, for the most part:
Boolean
As described, can be a true/false value, 0/1.
Character/Character Arrays/Strings
Too common not to be listed separately, and we are not going to make a different envelope for each character anyway. So, as the name implies, I think UTF-8 is a good enough encoding to support for the beginning. Since UTF-8 can encode every Unicode character, Chinese/Japanese characters are already covered; other encodings can be looked at later if a need arises.
Dates/Time
Dates/Time are a common data type that is often stored in a variety of different formats: locale strings, epoch timestamps etc. It thus might be a good idea to enforce a single recommended standard for storing them.
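To illustrate what enforcing a single storage format could look like, here is a minimal sketch. It assumes ISO 8601 in UTC as the chosen standard, which the proposal does not name, and the function name toStoredDateTime is hypothetical.

```typescript
// Sketch only: ISO 8601 (UTC) is an ASSUMED storage standard, not one the proposal picks.
// toStoredDateTime is a hypothetical helper name.
function toStoredDateTime(input: number | string): string {
  // Numbers are treated as epoch milliseconds, strings are handed to Date.parse.
  // Note: Date.parse is only well defined for ISO-style strings; arbitrary
  // locale strings are implementation dependent and should be avoided.
  const ms = typeof input === "number" ? input : Date.parse(input);
  if (!isFinite(ms)) {
    throw new Error("unparseable date/time: " + String(input));
  }
  return new Date(ms).toISOString();
}

console.log(toStoredDateTime(0)); // "1970-01-01T00:00:00.000Z"
console.log(toStoredDateTime("2024-01-15T10:30:00+02:00")); // "2024-01-15T08:30:00.000Z"
```

Normalizing at write time would mean every post platform reads the same representation, whatever format it originally held the value in.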
Blobs (URL)
I propose that the EVault shouldn't store blobs itself but rather links to CDNs and other storage buckets that can deliver the data, as we expect every individual to have their own EVault, and CDNs already employ data hashing to ensure that a file is unique and not a replica of an already existing one. Also, given that it's a blob, storing it is kind of unsafe, as the post platforms would in essence be allowed to store anything.
Reference
This is a novel data type in the system. It's a simple reference to another document that may or may not be held in this EVault.
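The primitive envelopes described above could be modeled as a tagged union with per-type constraints. The following is a rough sketch, not a proposed schema: every field name, the number widths, the length limit and the https-only rule for blobs are assumptions made for illustration.

```typescript
// Sketch only: all names, limits and rules here are hypothetical.
type W3Id = string;

type PrimitiveEnvelope =
  | { id: W3Id; kind: "number"; numberType: "i32" | "u32" | "f64"; value: number }
  | { id: W3Id; kind: "boolean"; value: boolean }
  | { id: W3Id; kind: "string"; value: string }
  | { id: W3Id; kind: "datetime"; value: string }
  | { id: W3Id; kind: "blob"; url: string } // a link, never the bytes themselves
  | { id: W3Id; kind: "reference"; target: W3Id };

// Assumed cap, counted in UTF-16 code units for simplicity.
const MAX_STRING_LENGTH = 64 * 1024;

// Returns a list of constraint violations; an empty list means the envelope is acceptable.
function validatePrimitive(env: PrimitiveEnvelope): string[] {
  const errors: string[] = [];
  switch (env.kind) {
    case "number": {
      const v = env.value;
      if (!isFinite(v)) errors.push("number must be finite");
      if (env.numberType !== "f64" && Math.floor(v) !== v) errors.push("integer type holds a non-integer");
      if (env.numberType === "u32" && (v < 0 || v > 0xffffffff)) errors.push("out of u32 range");
      if (env.numberType === "i32" && (v < -0x80000000 || v > 0x7fffffff)) errors.push("out of i32 range");
      break;
    }
    case "string":
      if (env.value.length > MAX_STRING_LENGTH) errors.push("string exceeds maximum length");
      break;
    case "blob":
      if (env.url.indexOf("https://") !== 0) errors.push("blob must be an https URL");
      break;
    default:
      break; // boolean, datetime and reference carry no extra checks in this sketch
  }
  return errors;
}

console.log(validatePrimitive({ id: "@a", kind: "number", numberType: "u32", value: -1 }));
// [ "out of u32 range" ]
```

Rejecting writes that return a non-empty list is one way the size and type concerns raised earlier could be enforced at the envelope level.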
Compound
Compound data types don't hold data directly, but rather references to other primitive and compound data types.
An important thing to note here is that a compound data type envelope and its children can only reference other compound data type envelopes and cannot reference each other, as that would create circular dependencies when we try to construct an object from the said envelopes. Review the following figures:
✅ Correct
❌ Incorrect
However, this can be achieved with a previously defined primitive data type, reference. Serializers and deserializers can consider this node as the end of the tree, but at runtime it can be computed as a link back to the envelope it references.
✅ Correct Circular Reference Example
One consideration to make when designing this is how deeply we allow an object to be nested under these structures.
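The rules above (compounds hold ids, a reference is the end of the tree, nesting is bounded) can be sketched as a small resolver. This is only an illustration under assumptions: the envelope shapes, the `$ref` marker, the in-memory store and the MAX_DEPTH value are all hypothetical.

```typescript
// Sketch only: shapes, names and the depth limit are assumptions.
type W3Id = string;

type Envelope =
  | { kind: "primitive"; value: string | number | boolean }
  | { kind: "reference"; target: W3Id }
  | { kind: "object"; fields: Record<string, W3Id> }
  | { kind: "array"; items: W3Id[] };

type Resolved = string | number | boolean | Resolved[] | { [key: string]: Resolved };

const MAX_DEPTH = 8; // assumed nesting limit

// Builds a plain object from flat envelopes. Reference envelopes are emitted
// as leaves and never followed, so a circular link cannot cause infinite recursion.
function resolve(store: Record<W3Id, Envelope>, id: W3Id, depth: number = 0): Resolved {
  if (depth > MAX_DEPTH) throw new Error("nesting deeper than " + MAX_DEPTH + " at " + id);
  const env = store[id];
  if (env === undefined) throw new Error("unknown envelope " + id);
  switch (env.kind) {
    case "primitive":
      return env.value;
    case "reference":
      return { $ref: env.target }; // end of the tree
    case "array":
      return env.items.map((child) => resolve(store, child, depth + 1));
    case "object": {
      const fields = env.fields;
      const out: { [key: string]: Resolved } = {};
      Object.keys(fields).forEach((key) => {
        out[key] = resolve(store, fields[key], depth + 1);
      });
      return out;
    }
  }
}

const demoStore: Record<W3Id, Envelope> = {
  "@post": { kind: "object", fields: { text: "@t", self: "@r" } },
  "@t": { kind: "primitive", value: "hi" },
  "@r": { kind: "reference", target: "@post" }, // circular, but only via a reference
};
console.log(JSON.stringify(resolve(demoStore, "@post")));
// {"text":"hi","self":{"$ref":"@post"}}
```

A compound that points directly back at one of its ancestors, without going through a reference, would hit the depth limit and be rejected, which matches the "Incorrect" case.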
Objects/Dictionaries
Basically a key-value store, I imagine in an EVault context, the key could be anything but the value would always be the id of an envelope as a best practice.
Records
Basically a k-v where k and v types are pre-defined.
Arrays/Tuples/Vectors
They are different, but for simplification and to apply some common rules to them over this document, I have grouped them together. Basically an object with an auto increment key. I imagine it would just be [@<w3id>,@<w3id>,@<w3id>,...]
MetaData Type Envelopes
Holds data that describes data.
Type/Interface Definition Envelopes
As the name implies, these envelopes would hold serialized data about types or interfaces that the EVault may be using. Later on, data types like arrays and/or objects may reference these envelopes to declare their own types.
Enumeration Envelopes
It has become something of a modern convention to store common data as something that can be understood at a glance in a declarative fashion rather than as a magic number. This special envelope type would basically hold a pre-declared array of strings that other envelopes would be able to reference.
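As a rough sketch of the idea, an enumeration envelope could hold the declared strings while other envelopes point at one member. The shapes below, and the choice to reference a member by index, are assumptions for illustration only.

```typescript
// Sketch only: envelope shapes and the index-based reference are hypothetical.
type W3Id = string;

interface EnumEnvelope {
  id: W3Id;
  kind: "enum";
  members: string[]; // the pre-declared array of strings
}

interface EnumValueEnvelope {
  id: W3Id;
  kind: "enumValue";
  enumRef: W3Id; // id of the enumeration envelope being referenced
  index: number; // position of the chosen member
}

// Looks up the human-readable member that an enum value envelope points to.
function readEnumValue(enums: Record<W3Id, EnumEnvelope>, value: EnumValueEnvelope): string {
  const def = enums[value.enumRef];
  if (def === undefined) throw new Error("unknown enumeration " + value.enumRef);
  const member = def.members[value.index];
  if (member === undefined) throw new Error("index " + value.index + " is not a member of " + def.id);
  return member;
}

const visibility: EnumEnvelope = { id: "@vis", kind: "enum", members: ["public", "followers", "private"] };
const postVisibility: EnumValueEnvelope = { id: "@pv", kind: "enumValue", enumRef: "@vis", index: 1 };
console.log(readEnumValue({ "@vis": visibility }, postVisibility)); // "followers"
```

Storing the member string directly instead of an index would be just as valid; the point is only that the allowed values live in one declared place.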
Functional/Method Envelopes (FUTURE SCOPE 🚀)
This is way more ambitious and probably very future scope, but basically this envelope type would act upon certain data types that are supplied to it, it would work using a pre-decided programming language and deserialize the envelope(s) provided to it and return a new envelope and/or modify an existing envelope after serializing it back.
This would help increase data security within the EVault for data that needs to be transformed in a secure context and cannot be delegated to a post platform.
Post Platforms might be able to verify the hash of a function envelope to ensure that it does what it advertises it does, and/or employ external APIs to validate the data returned.
Internal Schema Engine
Just as Post Platforms would have their W3Adapters and Ontology, EVaults would have their own internal schema engine powered by the ontology, that the user of the EVault would be able to tweak if they need to, although it is expected that only the most technically sound would want to do it. This should not be very different from the w3adapter itself, so I think if designed correctly, we'd be able to port most stuff over from the post platform side w3adapter itself.
EVault Browser/Internal Registry
Right now, the user has an external dependency on post platforms to be able to view their own content on their EVault. If the said post platform goes down, they would have no user friendly way to browse the content that they have posted using that post platform.
It is thus necessary that a generic methodology be developed and implemented so that the user is able to create, manage and delete the content in their envelopes without reliance on the post platform.
Envelopes
Primitive data types can be viewed as plain data files. For example, an envelope containing content for a post at post platform A could just be visible as a simple text file. A blob envelope pointing to an image file could just be visible as the image itself. The name of the file can be its w3id itself unless further metadata is declared.
Folder Envelopes
Can be applied to object types, where the key is the name of either a folder or a file: a folder if it points to an array or another object, a file if it maps straight to a primitive.
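The folder mapping described above can be sketched as a listing function over an object envelope. The envelope shapes, the Entry type and the in-memory store are assumptions made for illustration.

```typescript
// Sketch only: all shapes and names are hypothetical.
type W3Id = string;

type Envelope =
  | { kind: "primitive"; value: string | number | boolean }
  | { kind: "object"; fields: Record<string, W3Id> }
  | { kind: "array"; items: W3Id[] };

interface Entry {
  name: string; // the key inside the object envelope
  type: "folder" | "file";
  target: W3Id; // envelope the entry opens
}

// Lists an object envelope the way a file browser would list a directory:
// keys pointing at objects/arrays become folders, keys pointing at primitives become files.
function listFolder(store: Record<W3Id, Envelope>, id: W3Id): Entry[] {
  const env = store[id];
  if (env === undefined || env.kind !== "object") {
    throw new Error(id + " is not an object envelope");
  }
  const fields = env.fields;
  return Object.keys(fields).map((name) => {
    const target = fields[name];
    const child = store[target];
    if (child === undefined) throw new Error("dangling id " + target);
    const type: "folder" | "file" = child.kind === "primitive" ? "file" : "folder";
    return { name, type, target };
  });
}

const browserStore: Record<W3Id, Envelope> = {
  "@profile": { kind: "object", fields: { name: "@n", posts: "@p" } },
  "@n": { kind: "primitive", value: "Alice" },
  "@p": { kind: "array", items: [] },
};
console.log(listFolder(browserStore, "@profile"));
// name -> file (@n), posts -> folder (@p)
```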
Symlink Concept
For content that references external envelopes that may or may not be accessible, a sort of shortcut/symlink as in Windows/*nix systems may be utilized. It would show an error if the envelope is inaccessible, or take the user to the exact envelope. It would be preferable to use the reference data type for this.
Impact on Current System
The proposal introduces concepts like hierarchical data, which lets users browse their data in a familiar folder structure format and allows post platforms to store related data together. This is efficient; as has been said in data management and architecture, "Data that stays together, works together."
Through the Internal Schema Engine we also build on the previously proposed scheme of using GraphQL: the internal schema engine component would actually be able to piece the data together before sending it to the platform. This also shifts some processing from the Post Platform to the EVault of the person in question. If the data is accessed a lot, for example because the person in question is a journalist, providers would be able to set custom bandwidths per EVault, reducing the overall load on all the EVaults through this distribution of load. It also reduces the W3Adapter load on the Post Platform's end, as the platform would be sent a pre-modeled structure rather than something flat that it would have to structure itself. All this effort would be reduced to the ontology renaming the common fields. It also makes the whole communication between the services type-safe to some extent: earlier only the post platform was aware of the structure of the data, but with the ideas in this document, the EVault is as well.