#8029 introduced ArrowWriter.get_column_writers to expose the Vec<ArrowColumnWriter> of the "in progress" ArrowRowGroupWriter. This was meant to let downstream libraries write columns and row groups concurrently. However, only one ArrowRowGroupWriter exists at a time, and all of its ArrowColumnWriters must complete before the next row group can start being serialized. Downstream locking can work around this, but it is not ideal. See apache/datafusion#16738 (comment).
We could:
Have downstream users take a lock and serialize only one row group at a time.
Have ArrowWriter keep a Vec<ArrowRowGroupWriter> covering all row groups currently being serialized.
Expose the ArrowRowGroupWriterFactory of the active ArrowWriter.
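To make the third option more concrete, here is a minimal sketch of the shape it could take. It uses only the standard library, and every type in it (`ColumnChunk`, `ColumnWriter`, `RowGroupWriterFactory`, `encode_row_groups`) is a hypothetical stand-in, not the actual parquet crate API. The idea is that a factory hands out an independent set of column writers per row group, so several row groups can be encoded at once and then appended in order.

```rust
use std::thread;

// Hypothetical stand-in for an encoded column chunk (not a parquet type).
struct ColumnChunk(Vec<u8>);

// Hypothetical stand-in for a writer that buffers one column of one row group.
struct ColumnWriter {
    buf: Vec<u8>,
}

impl ColumnWriter {
    fn write(&mut self, values: &[u8]) {
        self.buf.extend_from_slice(values);
    }

    fn close(self) -> ColumnChunk {
        ColumnChunk(self.buf)
    }
}

// Hypothetical factory: each call returns a fresh, independent set of
// column writers, so no two row groups share writer state.
struct RowGroupWriterFactory {
    num_columns: usize,
}

impl RowGroupWriterFactory {
    fn create_column_writers(&self) -> Vec<ColumnWriter> {
        (0..self.num_columns)
            .map(|_| ColumnWriter { buf: Vec::new() })
            .collect()
    }
}

// Encode each row group on its own thread, then collect the results in
// the original row-group order by joining the handles in sequence.
fn encode_row_groups(
    factory: &RowGroupWriterFactory,
    row_groups: Vec<Vec<Vec<u8>>>,
) -> Vec<Vec<ColumnChunk>> {
    let handles: Vec<_> = row_groups
        .into_iter()
        .map(|columns| {
            let writers = factory.create_column_writers();
            thread::spawn(move || {
                writers
                    .into_iter()
                    .zip(columns)
                    .map(|(mut writer, data)| {
                        writer.write(&data);
                        writer.close()
                    })
                    .collect::<Vec<_>>()
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("encoder thread panicked"))
        .collect()
}
```

With this shape no lock is needed around the writers, because each row group owns its own set. Only the final append of encoded row groups to the file has to happen in order.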