adespawn commented Oct 14, 2025

The goal of this PR is to change the current encoding approach to Approach 1 (JS-side encoding), as described in #257.

Currently, UDTs are not supported because conversion of the type information has not been implemented yet. This PR implements it and adds UDT support.

This PR also changes the prepared-statement execution API of the Rust wrapper: when prepared statements are executed through the query endpoints, strings are passed instead of PreparedStatementWrapper objects. With this change, Rust returns only the expected types of a prepared statement, instead of the whole PreparedStatementWrapper object.
This change was made for performance reasons: it speeds up query execution through executeConcurrent and does not measurably slow down the other endpoints.
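A minimal sketch of the string-based flow, assuming (hypothetically) a synchronous native binding with `prepare(query)`, which returns the expected parameter types, and `executePrepared(query, values)`, and assuming the JS side keeps those types in a map keyed by query string. These names are illustrative only and are not the Rust wrapper's actual API:

```javascript
// Sketch only: `native` stands in for a hypothetical Rust binding.
// Neither `prepare` nor `executePrepared` is the driver's real API.
class PreparedStatementCache {
  constructor(native) {
    this.native = native;
    // Query string -> expected parameter types returned by the native layer.
    this.expectedTypes = new Map();
  }

  getExpectedTypes(query) {
    let types = this.expectedTypes.get(query);
    if (types === undefined) {
      // Only the expected types cross the boundary, not a wrapper object.
      types = this.native.prepare(query);
      this.expectedTypes.set(query, types);
    }
    return types;
  }

  execute(query, params, encode) {
    const types = this.getExpectedTypes(query);
    // Encode each parameter in JS using its expected type.
    const values = params.map((value, i) => encode(value, types[i]));
    // The prepared statement is identified by its query string.
    return this.native.executePrepared(query, values);
  }
}
```

The design point is that a plain string and an array of encoded values are cheap to pass across the JS/Rust boundary on every call, whereas a wrapper object has to be unwrapped each time.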

Testing

This PR was tested locally with the Cassandra integration tests, in addition to the CI running the Scylla integration tests (see #244).

Encoding issues

#257

Changing the encoder fixes the following issues:

fixes #89
fixes #167
fixes #200
fixes #233
fixes #215 (type guessing is now done inside the encoder)

With proper type conversion, this PR also fixes #245.
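Issue #215 is addressed because type guessing now happens inside the encoder. As a rough illustration only, guessing a CQL type from a JS value when no type hint is given could look like the sketch below; the `guessCqlType` name and the mapping are assumptions for illustration and are not the driver's actual guessing table:

```javascript
// Illustrative sketch only: a simplified mapping from JS values to CQL type
// names. It is not the driver's actual guessing table; a real encoder also
// handles types such as uuid, inet, Long, maps, sets, tuples and UDTs.
function guessCqlType(value) {
  if (value === null || value === undefined) {
    // Nothing to guess from; the caller needs an explicit type hint.
    return null;
  }
  switch (typeof value) {
    case "string":
      return "text";
    case "number":
      return "double";
    case "boolean":
      return "boolean";
    default:
      break;
  }
  if (Buffer.isBuffer(value)) {
    return "blob";
  }
  if (value instanceof Date) {
    return "timestamp";
  }
  if (Array.isArray(value)) {
    // Simplification: guess the element type from the first element only.
    const element = value.length > 0 ? guessCqlType(value[0]) : null;
    return element === null ? null : `list<${element}>`;
  }
  return null;
}
```

Doing this inside the encoder means the guessed type and the serialization logic live in one place, instead of a separate function whose result has to be passed along.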

Performance

While the core part of this PR (replacing the encoders) yields a decent speedup, only together with the second optimisation (strings instead of PreparedStatementWrapper) do we reach the speed of the test implementation of the other approach (Approach 2).

Performance of the baseline (main):

➜ sudo perf stat node ./benchmark/logic/concurrent_insert.js scylladb-javascript-driver 2000000

 Performance counter stats for 'node ./benchmark/logic/concurrent_insert.js scylladb-javascript-driver 2000000':

         97,246.47 msec task-clock                       #    2.827 CPUs utilized             
         3,235,539      context-switches                 #   33.272 K/sec                     
           399,293      cpu-migrations                   #    4.106 K/sec                     
           444,992      page-faults                      #    4.576 K/sec                     
   158,132,408,757      cpu_atom/instructions/           #    0.70  insn per cycle              (45.09%)
   441,422,538,978      cpu_core/instructions/           #    1.13  insn per cycle              (54.91%)
   225,794,559,768      cpu_atom/cycles/                 #    2.322 GHz                         (45.09%)
   391,037,324,170      cpu_core/cycles/                 #    4.021 GHz                         (54.91%)
    30,086,164,897      cpu_atom/branches/               #  309.381 M/sec                       (45.09%)
    90,474,152,804      cpu_core/branches/               #  930.359 M/sec                       (54.91%)
       486,036,509      cpu_atom/branch-misses/          #    1.62% of all branches             (45.09%)
       658,875,141      cpu_core/branch-misses/          #    0.73% of all branches             (54.91%)
             TopdownL1 (cpu_atom)                 #     30.0 %  tma_backend_bound        (45.09%)
                                                  #     41.7 %  tma_backend_bound      
                                                  #      6.1 %  tma_bad_speculation    
                                                  #     32.2 %  tma_frontend_bound     
                                                  #     20.1 %  tma_retiring             (54.91%)
             TopdownL1 (cpu_atom)                 #     14.9 %  tma_retiring             (45.09%)
             TopdownL1 (cpu_atom)                 #      7.4 %  tma_bad_speculation    
                                                  #     47.7 %  tma_frontend_bound       (45.09%)

      34.395850085 seconds time elapsed

      57.638440000 seconds user
      33.994388000 seconds sys

Performance of this implementation:

➜ sudo perf stat node ./benchmark/logic/concurrent_insert.js scylladb-javascript-driver 2000000

 Performance counter stats for 'node ./benchmark/logic/concurrent_insert.js scylladb-javascript-driver 2000000':

         89,723.35 msec task-clock                       #    3.182 CPUs utilized             
         3,258,107      context-switches                 #   36.313 K/sec                     
           349,166      cpu-migrations                   #    3.892 K/sec                     
           418,220      page-faults                      #    4.661 K/sec                     
   144,066,525,376      cpu_atom/instructions/           #    0.69  insn per cycle              (48.61%)
   354,393,050,737      cpu_core/instructions/           #    1.00  insn per cycle              (51.39%)
   207,875,282,834      cpu_atom/cycles/                 #    2.317 GHz                         (48.61%)
   352,819,743,579      cpu_core/cycles/                 #    3.932 GHz                         (51.39%)
    27,387,146,881      cpu_atom/branches/               #  305.240 M/sec                       (48.61%)
    73,594,614,458      cpu_core/branches/               #  820.239 M/sec                       (51.39%)
       403,763,266      cpu_atom/branch-misses/          #    1.47% of all branches             (48.61%)
       613,901,763      cpu_core/branch-misses/          #    0.83% of all branches             (51.39%)
             TopdownL1 (cpu_atom)                 #     31.9 %  tma_backend_bound        (48.61%)
                                                  #     40.3 %  tma_backend_bound      
                                                  #      5.9 %  tma_bad_speculation    
                                                  #     35.7 %  tma_frontend_bound     
                                                  #     18.1 %  tma_retiring             (51.39%)
             TopdownL1 (cpu_atom)                 #     14.9 %  tma_retiring             (48.61%)
             TopdownL1 (cpu_atom)                 #      7.1 %  tma_bad_speculation    
                                                  #     46.1 %  tma_frontend_bound       (48.61%)

      28.197915381 seconds time elapsed

      50.147893000 seconds user
      34.252507000 seconds sys

Performance of this implementation with strings instead of PreparedStatementWrapper:

➜ sudo perf stat node ./benchmark/logic/concurrent_insert.js scylladb-javascript-driver 2000000

 Performance counter stats for 'node ./benchmark/logic/concurrent_insert.js scylladb-javascript-driver 2000000':

         81,705.69 msec task-clock                       #    3.333 CPUs utilized             
         2,540,136      context-switches                 #   31.089 K/sec                     
           274,768      cpu-migrations                   #    3.363 K/sec                     
           418,423      page-faults                      #    5.121 K/sec                     
   138,566,100,653      cpu_atom/instructions/           #    0.71  insn per cycle              (46.94%)
   344,678,249,478      cpu_core/instructions/           #    1.04  insn per cycle              (53.06%)
   195,016,125,911      cpu_atom/cycles/                 #    2.387 GHz                         (46.94%)
   330,478,854,641      cpu_core/cycles/                 #    4.045 GHz                         (53.06%)
    26,103,495,597      cpu_atom/branches/               #  319.482 M/sec                       (46.94%)
    71,403,457,264      cpu_core/branches/               #  873.910 M/sec                       (53.06%)
       357,153,938      cpu_atom/branch-misses/          #    1.37% of all branches             (46.94%)
       522,466,210      cpu_core/branch-misses/          #    0.73% of all branches             (53.06%)
             TopdownL1 (cpu_atom)                 #     33.6 %  tma_backend_bound        (46.94%)
                                                  #     43.3 %  tma_backend_bound      
                                                  #      5.6 %  tma_bad_speculation    
                                                  #     30.5 %  tma_frontend_bound     
                                                  #     20.6 %  tma_retiring             (53.06%)
             TopdownL1 (cpu_atom)                 #     15.2 %  tma_retiring             (46.94%)
             TopdownL1 (cpu_atom)                 #      6.7 %  tma_bad_speculation    
                                                  #     44.5 %  tma_frontend_bound       (46.94%)

      24.514119238 seconds time elapsed

      48.662264000 seconds user
      28.094413000 seconds sys

Performance of the alternative implementation (see Approach 2 in #257):

➜ sudo perf stat node ./benchmark/logic/concurrent_insert.js scylladb-javascript-driver 2000000

 Performance counter stats for 'node ./benchmark/logic/concurrent_insert.js scylladb-javascript-driver 2000000':

         78,012.38 msec task-clock                       #    3.161 CPUs utilized             
         2,434,806      context-switches                 #   31.211 K/sec                     
           266,106      cpu-migrations                   #    3.411 K/sec                     
           419,842      page-faults                      #    5.382 K/sec                     
   134,718,558,269      cpu_atom/instructions/           #    0.73  insn per cycle              (46.18%)
   336,355,949,157      cpu_core/instructions/           #    1.08  insn per cycle              (53.82%)
   185,637,195,079      cpu_atom/cycles/                 #    2.380 GHz                         (46.18%)
   312,091,083,187      cpu_core/cycles/                 #    4.001 GHz                         (53.82%)
    25,320,442,441      cpu_atom/branches/               #  324.570 M/sec                       (46.18%)
    69,254,709,870      cpu_core/branches/               #  887.740 M/sec                       (53.82%)
       343,029,928      cpu_atom/branch-misses/          #    1.35% of all branches             (46.18%)
       519,561,406      cpu_core/branch-misses/          #    0.75% of all branches             (53.82%)
             TopdownL1 (cpu_atom)                 #     32.7 %  tma_backend_bound        (46.18%)
                                                  #     38.5 %  tma_backend_bound      
                                                  #      6.5 %  tma_bad_speculation    
                                                  #     35.5 %  tma_frontend_bound     
                                                  #     19.4 %  tma_retiring             (53.82%)
             TopdownL1 (cpu_atom)                 #     15.4 %  tma_retiring             (46.18%)
             TopdownL1 (cpu_atom)                 #      6.8 %  tma_bad_speculation    
                                                  #     45.1 %  tma_frontend_bound       (46.18%)

      24.679384241 seconds time elapsed

      45.424897000 seconds user
      27.732913000 seconds sys

Performance of the DataStax driver:

➜ sudo perf stat node ./benchmark/logic/concurrent_insert.js cassandra-driver 2000000

 Performance counter stats for 'node ./benchmark/logic/concurrent_insert.js cassandra-driver 2000000':

         26,551.61 msec task-clock                       #    0.959 CPUs utilized             
            27,071      context-switches                 #    1.020 K/sec                     
             3,349      cpu-migrations                   #  126.132 /sec                      
           449,716      page-faults                      #   16.937 K/sec                     
    88,366,241,047      cpu_atom/instructions/           #    1.20  insn per cycle              (9.14%)
   189,619,403,347      cpu_core/instructions/           #    1.62  insn per cycle              (90.86%)
    73,571,810,673      cpu_atom/cycles/                 #    2.771 GHz                         (9.14%)
   116,797,644,248      cpu_core/cycles/                 #    4.399 GHz                         (90.86%)
    18,830,161,128      cpu_atom/branches/               #  709.191 M/sec                       (9.14%)
    40,092,228,477      cpu_core/branches/               #    1.510 G/sec                       (90.86%)
       158,303,368      cpu_atom/branch-misses/          #    0.84% of all branches             (9.14%)
       147,831,358      cpu_core/branch-misses/          #    0.37% of all branches             (90.86%)
             TopdownL1 (cpu_atom)                 #     46.9 %  tma_backend_bound        (9.14%)
                                                  #     42.6 %  tma_backend_bound      
                                                  #      6.4 %  tma_bad_speculation    
                                                  #     23.7 %  tma_frontend_bound     
                                                  #     27.3 %  tma_retiring             (90.86%)
             TopdownL1 (cpu_atom)                 #     21.1 %  tma_retiring             (9.14%)
             TopdownL1 (cpu_atom)                 #      7.4 %  tma_bad_speculation    
                                                  #     24.6 %  tma_frontend_bound       (9.14%)

      27.680081974 seconds time elapsed

      23.598783000 seconds user
       2.296997000 seconds sys

Regular (non-concurrent) insert, this implementation with PreparedStatementWrapper:

➜ sudo perf stat node ./benchmark/logic/insert.js scylladb-javascript-driver 200000

 Performance counter stats for 'node ./benchmark/logic/insert.js scylladb-javascript-driver 200000':

         17,006.43 msec task-clock                       #    0.871 CPUs utilized             
         1,424,966      context-switches                 #   83.790 K/sec                     
            10,045      cpu-migrations                   #  590.659 /sec                      
            10,575      page-faults                      #  621.824 /sec                      
    32,552,691,527      cpu_atom/instructions/           #    0.80  insn per cycle              (11.45%)
    67,934,051,811      cpu_core/instructions/           #    1.10  insn per cycle              (88.55%)
    40,840,442,735      cpu_atom/cycles/                 #    2.401 GHz                         (11.45%)
    61,553,896,677      cpu_core/cycles/                 #    3.619 GHz                         (88.55%)
     6,556,868,857      cpu_atom/branches/               #  385.552 M/sec                       (11.45%)
    13,958,890,195      cpu_core/branches/               #  820.801 M/sec                       (88.55%)
       109,830,162      cpu_atom/branch-misses/          #    1.68% of all branches             (11.45%)
       154,957,546      cpu_core/branch-misses/          #    1.11% of all branches             (88.55%)
             TopdownL1 (cpu_atom)                 #     25.3 %  tma_backend_bound        (11.45%)
                                                  #     31.7 %  tma_backend_bound      
                                                  #     11.0 %  tma_bad_speculation    
                                                  #     30.1 %  tma_frontend_bound     
                                                  #     27.3 %  tma_retiring             (88.55%)
             TopdownL1 (cpu_atom)                 #     15.9 %  tma_retiring             (11.45%)
             TopdownL1 (cpu_atom)                 #      8.6 %  tma_bad_speculation    
                                                  #     50.2 %  tma_frontend_bound       (11.45%)

      19.519492205 seconds time elapsed

       9.844989000 seconds user
       6.927643000 seconds sys

Regular (non-concurrent) insert, this implementation with strings:

➜ sudo perf stat node ./benchmark/logic/insert.js scylladb-javascript-driver 200000 

 Performance counter stats for 'node ./benchmark/logic/insert.js scylladb-javascript-driver 200000':

         17,233.65 msec task-clock                       #    0.879 CPUs utilized             
         1,438,062      context-switches                 #   83.445 K/sec                     
             9,721      cpu-migrations                   #  564.071 /sec                      
            10,617      page-faults                      #  616.062 /sec                      
    32,687,188,606      cpu_atom/instructions/           #    0.81  insn per cycle              (9.57%)
    67,994,125,430      cpu_core/instructions/           #    1.09  insn per cycle              (90.43%)
    40,553,070,823      cpu_atom/cycles/                 #    2.353 GHz                         (9.57%)
    62,539,738,970      cpu_core/cycles/                 #    3.629 GHz                         (90.43%)
     6,551,270,505      cpu_atom/branches/               #  380.144 M/sec                       (9.57%)
    13,945,795,735      cpu_core/branches/               #  809.219 M/sec                       (90.43%)
        94,236,411      cpu_atom/branch-misses/          #    1.44% of all branches             (9.57%)
       160,288,721      cpu_core/branch-misses/          #    1.15% of all branches             (90.43%)
             TopdownL1 (cpu_atom)                 #     30.2 %  tma_backend_bound        (9.57%)
                                                  #     27.0 %  tma_backend_bound      
                                                  #     12.1 %  tma_bad_speculation    
                                                  #     36.3 %  tma_frontend_bound     
                                                  #     24.6 %  tma_retiring             (90.43%)
             TopdownL1 (cpu_atom)                 #     16.4 %  tma_retiring             (9.57%)
             TopdownL1 (cpu_atom)                 #      7.5 %  tma_bad_speculation    
                                                  #     45.9 %  tma_frontend_bound       (9.57%)

      19.611854331 seconds time elapsed

       9.939355000 seconds user
       7.052177000 seconds sys
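To put the elapsed times above side by side, the short script below computes each concurrent-insert run's speedup relative to main, plus the relative change between the two regular-insert variants, using the `seconds time elapsed` values copied from the outputs. It is single-run arithmetic only and says nothing about variance:

```javascript
// Wall-clock "seconds time elapsed" from the perf stat runs above.
const concurrentInsert = {
  main: 34.395850085,
  jsEncoder: 28.197915381,
  jsEncoderWithStrings: 24.514119238,
  approach2: 24.679384241,
  datastax: 27.680081974,
};

const regularInsert = {
  withWrapper: 19.519492205,
  withStrings: 19.611854331,
};

// Speedup of `t` relative to `baseline` (values > 1 mean faster).
const speedup = (baseline, t) => baseline / t;

// Relative change of `t` versus `reference`, in percent.
const relativeChangePct = (reference, t) => ((t - reference) / reference) * 100;

for (const [name, t] of Object.entries(concurrentInsert)) {
  console.log(`${name}: ${speedup(concurrentInsert.main, t).toFixed(2)}x vs main`);
}
console.log(
  `regular insert, strings vs wrapper: ${relativeChangePct(
    regularInsert.withWrapper,
    regularInsert.withStrings,
  ).toFixed(2)}%`,
);
```

For these runs it reports about 1.22x for the JS encoder alone, 1.40x with strings, 1.39x for Approach 2 and 1.24x for the DataStax driver relative to main, and a 0.47% difference between the two regular-insert variants.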

adespawn requested a review from wprzytula October 14, 2025 15:44
adespawn self-assigned this Oct 14, 2025
adespawn added the P1 (Highest normal priority), enhancement (New feature or request), and Performance (Driver goes brrrr) labels Oct 14, 2025
wprzytula commented Oct 14, 2025

Impressive performance gains! Please confirm that I understand you correctly:

  1. With just the JS-side encoding, we get performance comparable to the DataStax driver.
  2. With this + PreparedStatementWrapper -> String optimisation, we outperform the DataStax driver.
  3. With this + PreparedStatementWrapper -> String optimisation, we get performance comparable to Approach 2 (serialization fully in Rust).
  4. The PreparedStatementWrapper -> String optimisation provides meaningful gains for concurrent execution API, while slowing down the non-concurrent execution API only slightly.

If I got it right, then the results confirm that JS-side encoding is indeed the way to go. After all, all the encoding code is already there (so the wrapper part is thinner), and we lose no performance because of it.
Perfect! 🚀

wprzytula left a comment

Woah, what a giant piece of useful work! Keep going, let's make our wrapper even thinner and, at the same time, faster! ✨ 🚀

adespawn commented Oct 15, 2025

With just the JS-side encoding, we get performance comparable to the DataStax driver.

With this + PreparedStatementWrapper -> String optimisation, we outperform the DataStax driver.

Looking at the elapsed time: yes, for this specific n. I would need to run more benchmarks to determine whether this holds for a higher number of executed queries (remember that before, we had a lower starting point but a slightly steeper increase).

With this + PreparedStatementWrapper -> String optimisation, we get performance comparable to Approach 2 (serialization fully in Rust).

Yes

The PreparedStatementWrapper -> String optimisation provides meaningful gains for concurrent execution API, while slowing down the non-concurrent execution API only slightly.

The difference in the non-concurrent benchmark is within run-to-run variance, so I cannot determine whether there is any meaningful difference there.

The following changes were made:
 - Represent CQL unset as the JS undefined value, to make transferring
 data to the Rust layer easier.
 - Extend the shape of the type information to match the type object
 returned from the Rust layer.
adespawn force-pushed the js-encoder branch 4 times, most recently from aaeb4be to f003c7f October 17, 2025 10:35
adespawn added a commit that referenced this pull request Oct 17, 2025
Convert the Rust Complex type to an object of the type expected
by the JS encoder.

This commit is copied from PR #275.
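A rough sketch of such a conversion, assuming the `{ code, info }` type-info shape used by the DataStax Node.js driver (list/set: `info` is the element type; map: `[keyType, valueType]`; tuple: an array of element types), the CQL protocol option ids for those collections, and a hypothetical `getBaseCode()` accessor alongside the `getFirstSupportType()`, `getSecondSupportType()` and `getInnerTypes()` methods that appear in the review snippet. UDTs are left out because they would need field names as well:

```javascript
// CQL protocol [option] ids for the collection types handled below.
const CqlCode = Object.freeze({ list: 0x20, map: 0x21, set: 0x22, tuple: 0x31 });

// Sketch: `rustType` is assumed to expose getBaseCode() (hypothetical),
// getFirstSupportType(), getSecondSupportType() and getInnerTypes().
function toEncoderType(rustType) {
  const code = rustType.getBaseCode();
  switch (code) {
    case CqlCode.list:
    case CqlCode.set:
      // Single-element-type collections: info is the element type.
      return { code, info: toEncoderType(rustType.getFirstSupportType()) };
    case CqlCode.map:
      // Maps: info is [keyType, valueType].
      return {
        code,
        info: [
          toEncoderType(rustType.getFirstSupportType()),
          toEncoderType(rustType.getSecondSupportType()),
        ],
      };
    case CqlCode.tuple:
      // Tuples: info is the list of element types, in order.
      return { code, info: rustType.getInnerTypes().map(toEncoderType) };
    default:
      // Simple types carry no extra information.
      return { code, info: null };
  }
}
```

The recursion mirrors the nesting of the CQL type, so a `map<text, list<int>>` becomes a map object whose value entry is itself a list object.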
let fistSupport = type.getFirstSupportType();
let secondSupport = type.getSecondSupportType();
let otherTypes = type.getInnerTypes();
if (fistSupport != null) {

ditto: not really resolved

Convert the Rust Complex type to an object of the type expected
by the JS encoder.
Add a Rust type that stores already-serialized, null, or unset values.
This struct implements the trait that converts:
 - JS `undefined` to CQL `unset`
 - JS `null` to CQL `null`
 - JS Buffer to serialized CQL value
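The mapping in the commit above can be pictured as the JS side deciding, for each bound value, which of the three cases applies before handing data to Rust. A minimal sketch with a hypothetical `classifyBoundValue` helper and hypothetical tag names (the actual Rust type and its JS-facing representation are not shown here); `encode` stands in for the JS encoder:

```javascript
// Hypothetical tags for the three cases the Rust-side type distinguishes.
const Kind = Object.freeze({ UNSET: "unset", NULL: "null", VALUE: "value" });

// Sketch: classify one bound value before passing it to the native layer.
// `encode` stands in for the JS encoder and must return a Buffer.
function classifyBoundValue(value, encode) {
  if (value === undefined) {
    // JS undefined -> CQL unset (the bound column is left unchanged).
    return { kind: Kind.UNSET };
  }
  if (value === null) {
    // JS null -> CQL null.
    return { kind: Kind.NULL };
  }
  const bytes = encode(value);
  if (!Buffer.isBuffer(bytes)) {
    throw new TypeError("encoder must return a Buffer");
  }
  // Already-serialized CQL value, passed through to Rust as-is.
  return { kind: Kind.VALUE, bytes };
}
```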
Change the client to use JS serialization
instead of the current hybrid approach.
adespawn and others added 4 commits October 22, 2025 13:12
This mostly changes checks for error types and messages.
This removes most of the old encoding logic.
Replace PreparedStatementWrapper with strings when executing
prepared statements through the query execution endpoints.
With this change, only the expected types of prepared statements are returned,
instead of the whole PreparedStatementWrapper object from Rust.

This change speeds up query execution
through executeConcurrent and does not slow down other endpoints.
adespawn merged commit 3ddf5e9 into main Oct 22, 2025
8 checks passed
adespawn deleted the js-encoder branch October 22, 2025 12:18
adespawn added a commit that referenced this pull request Oct 23, 2025
And remove the Unprovided variant from the CqlType enum.
This is additional cleanup made possible by #275.
adespawn added a commit that referenced this pull request Oct 23, 2025
The old one has not been used since #275 was merged.
The new type guessing is the same as the one used in the DSx (DataStax) driver.
adespawn added a commit that referenced this pull request Oct 29, 2025
With the implementation of #275, type guessing is done in the encoder
instead of in a separate extracted function.
This commit removes that unused code.


3 participants