pgxpool and multi tenancy #2384
kevin-matthew
started this conversation in
Ideas
Replies: 1 comment
-
I'm open to improvements in this area. Though we'd probably want a fair amount of discussion before actually doing any coding. A few quick thoughts:
|
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
There has been some issues raised in the past with multi tenancy (where each tenant has their own schema) and its function with pgx pool. But none of them hit the core problems in the head, so I'm starting this discussion to index it.
Justification for multi tenancy
To put this into perspective, I like to avoid schema-based multi tenancy. So any problems that come with it I brush off as inconsequential. However in my latest project, I am faced with the requirement that the tenants should have unrestricted access to their own data (eg, they'll be writting their own sql statements) as well as frequent exports of entire tables. So I cannot used my preferred shared-table pattern (where each row has a tenant ID). This is also going to be a SaaS product so I have to be cost-aware when it comes to hardware. After much research, I reluctantly decided that schema-per-tenant was the best choice.
Problems with pgxpool and tenancy
When you execute a Postgres statement with a connection client, if its a novel sql statement that hasn't been previously been executed (eg
select * from tenant1.mytable where abc=$1
). Postgres has to compile that statement which takes time (See JIT section on postgres documentation). After compilation, subsequent calls to that sql is dramatically faster, even with differing arguments (eg$1
).However, JIT compilations are stored in the connection process, and JIT compilations treat tables of different schemas independently (even if they are the same structure). Thus executing
select * from tenant1.mytable...
requires a separate JIT compilation thanselect * from tenant2.mytable...
. This is where pgxpool falters: if you attempt to use a pool of connections equally across a pool of (thousands) of schemas, each connection is forced to recompile the same sql. This leads me to ask: What if there was a way that the pool will first acquire a connection best suited for the tenant before it tries to acquire a connection that hasn't been used with that tenant?Side note: JIT complications is what I'd consider the most important problem with multi-tenancy and connection pooling. I'm sure there's other things such as caching that I haven't done my research on.
Hash Pooling
There's already lots of rules when it comes to what connection is best to acquired. Most of those rules are dictated by
puddle
. Right now they try their best to make sure all connections are used evenly (eg using thegenstack
approach). My idea is that we extend, so that the following is considered:genstack
). This is assumed to be an expensive operation, so this is why we do it last.Configuration/Reconfiguration will be callbacks. This will mean the caller can dictate what user/schema/search_path whatever they want.
This effectively means that a connection pool has a genstack control for our traditional acquires. But each idle connection has a hash (eg, schema name) that can easily be picked up before an arbitrary connection is used.
I'm working on a solution now. But was wanting to hear some other's thoughts.
Beta Was this translation helpful? Give feedback.
All reactions