Refactor & delete unused code #16
Conversation
Walkthrough

Removes the custom filesql driver package and its tests, simplifies the filesql API by deleting driver registration and specializing the dump logic, and refactors the builder to a direct in-memory SQLite streaming workflow with auto-save validation and support. Updates the README and CONTRIBUTING docs (multi-language) to reflect streaming, flexible inputs, and removal of the driver. Adjusts the linter config.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant U as User
    participant B as Builder
    participant FS as FileSystem/FS/Readers
    participant D as Deduper
    participant DB as SQLite (in-memory)
    participant AS as Auto-save
    U->>B: Build(ctx)
    B->>B: validateAutoSaveConfig()
    B->>FS: collect inputs (paths, dirs, readers, FS)
    B->>D: deduplicate (abs paths, prefer uncompressed)
    B-->>U: build result (collected paths/readers)
    U->>B: Open(ctx)
    B->>DB: open in-memory connection
    loop for each input
        B->>DB: streamReaderToSQLite (create tables, insert rows)
        alt header-only/empty
            B->>DB: createEmptyTable/createTableFromHeaders
        end
    end
    alt auto-save enabled
        B->>AS: re-open with auto-save wrapper
        B->>DB: re-stream into new DB
    end
    B-->>U: *sql.DB
```

```mermaid
sequenceDiagram
    autonumber
    participant U as User
    participant DB as *sql.DB (auto-save wrapped)
    participant TX as Transaction (wrapped)
    participant DMP as DumpDatabase
    participant OUT as Output (dir or overwrite originals)
    U->>DB: Close() / TX Commit()
    alt auto-save on Close/Commit
        DB->>DMP: performAutoSave()
        DMP->>OUT: DumpDatabase(export)
        DMP-->>DB: result
    else no auto-save
        DB-->>U: close/commit only
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Code Metrics Report

|                     | main (bf81dc8) | #16 (edecb13) | +/-   |
|---------------------|----------------|---------------|-------|
| Coverage            | 80.8%          | 81.1%         | +0.3% |
| Files               | 9              | 8             | -1    |
| Lines               | 1474           | 1130          | -344  |
| Covered             | 1191           | 917           | -274  |
| Test Execution Time | 3s             | 3s            | 0s    |

Code coverage of files in pull request scope (77.4% → 78.3%)

Reported by octocov
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
filesql.go (1)

**308-310**: Harden identifier quoting to prevent SQL injection via table names.

Table names are formatted into SQL with backticks; if a file name contains a backtick, injection is possible. Prefer double-quoted identifiers with escaping.

Apply this diff:

```diff
- query := fmt.Sprintf("SELECT * FROM `%s`", tableName) //nolint:gosec // Table name is validated and comes from database metadata
+ query := fmt.Sprintf(`SELECT * FROM "%s"`, strings.ReplaceAll(tableName, `"`, `""`)) // escape " per SQLite
```

And in getSQLiteTableColumns:

```diff
- query := fmt.Sprintf("PRAGMA table_info(`%s`)", tableName)
+ query := fmt.Sprintf(`PRAGMA table_info("%s")`, strings.ReplaceAll(tableName, `"`, `""`))
```

Also applies to: 323-327
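As a self-contained illustration of the escaping rule above (the helper name `quoteIdent` is hypothetical, not part of the package), doubling embedded double quotes is all SQLite's identifier-quoting rules require:

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps an identifier in double quotes and doubles any
// embedded double quotes, per SQLite's identifier-quoting rules.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

func main() {
	// A hostile "table name" derived from an attacker-controlled file name.
	evil := `users"; DROP TABLE users; --`
	// The quoted form stays a single identifier; the payload is inert.
	fmt.Println("SELECT * FROM " + quoteIdent(evil))
}
```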
builder.go (2)

**775-808**: Close compression readers (gzip/zstd) to release resources.

gzip.Reader and zstd.Decoder need Close(); currently they're leaked.

Change createDecompressedReader to return a cleanup function and use it in streamFileToSQLite:

```diff
-func (b *DBBuilder) createDecompressedReader(file *os.File, filePath string) (io.Reader, error) {
+func (b *DBBuilder) createDecompressedReader(file *os.File, filePath string) (io.Reader, func() error, error) {
 	// gzip
 	gzReader, err := gzip.NewReader(file)
 	if err != nil {
-		return nil, fmt.Errorf("failed to create gzip reader: %w", err)
+		return nil, nil, fmt.Errorf("failed to create gzip reader: %w", err)
 	}
-	return gzReader, nil
+	return gzReader, gzReader.Close, nil
 	// bz2
-	return bzip2.NewReader(file), nil
+	return bzip2.NewReader(file), func() error { return nil }, nil
 	// xz
 	xzReader, err := xz.NewReader(file)
 	if err != nil {
-		return nil, fmt.Errorf("failed to create xz reader: %w", err)
+		return nil, nil, fmt.Errorf("failed to create xz reader: %w", err)
 	}
-	return xzReader, nil
+	return xzReader, func() error { return nil }, nil
 	// zstd
 	zstdReader, err := zstd.NewReader(file)
 	if err != nil {
-		return nil, fmt.Errorf("failed to create zstd reader: %w", err)
+		return nil, nil, fmt.Errorf("failed to create zstd reader: %w", err)
 	}
-	return zstdReader.IOReadCloser(), nil
+	closer := zstdReader.Close
+	return zstdReader.IOReadCloser(), closer, nil
 	// none
-	return file, nil
+	return file, func() error { return nil }, nil
 }
```

And in streamFileToSQLite:

```diff
-	reader, err := b.createDecompressedReader(file, filePath)
+	reader, cleanup, err := b.createDecompressedReader(file, filePath)
 	if err != nil {
 		return fmt.Errorf("failed to create decompressed reader for %s: %w", filePath, err)
 	}
+	defer func() { _ = cleanup() }()
```

Also applies to: 618-637
**724-757**: Quote and escape identifiers to avoid injection when creating tables/inserts.

If headers or table names contain quotes, current formatting allows injection.

Apply this diff:

```diff
+	quote := func(s string) string { return `"` + strings.ReplaceAll(s, `"`, `""`) + `"` }
 	columns := make([]string, 0, len(columnInfo))
 	for _, col := range columnInfo {
-		columns = append(columns, fmt.Sprintf(`"%s" %s`, col.Name, col.Type.String()))
+		columns = append(columns, fmt.Sprintf(`%s %s`, quote(col.Name), col.Type.String()))
 	}
-	query := fmt.Sprintf(
-		`CREATE TABLE IF NOT EXISTS "%s" (%s)`,
-		chunk.TableName(),
-		strings.Join(columns, ", "),
-	)
+	query := fmt.Sprintf(`CREATE TABLE IF NOT EXISTS %s (%s)`,
+		quote(chunk.TableName()),
+		strings.Join(columns, ", "),
+	)
```

And in prepareInsertStatement:

```diff
-	query := fmt.Sprintf(
-		`INSERT INTO "%s" VALUES (%s)`,
-		chunk.TableName(),
-		strings.Join(placeholders, ", "),
-	)
+	quote := func(s string) string { return `"` + strings.ReplaceAll(s, `"`, `""`) + `"` }
+	query := fmt.Sprintf(`INSERT INTO %s VALUES (%s)`,
+		quote(chunk.TableName()),
+		strings.Join(placeholders, ", "),
+	)
```
🧹 Nitpick comments (10)

doc/es/README.md (1)

**21-22**: Use "bloques" and add a reference to SetDefaultChunkSize

A small terminology adjustment for more natural reading, plus a mention of the public API that configures the chunk size.

Points of attention:
- builder.go (lines 172-177) has a public method `SetDefaultChunkSize(size int) *DBBuilder` that configures the default chunk size.

```diff
- 🌊 **Procesamiento de flujos** - Maneja eficientemente archivos grandes mediante streaming con tamaños de bloques configurables
+ 🌊 **Procesamiento de flujos** - Maneja eficientemente archivos grandes mediante streaming con tamaños de bloques configurables (use DBBuilder.SetDefaultChunkSize para ajustar)
```

doc/zh-cn/README.md (1)

**21-22**: Minor Chinese grammar/readability fixes

Add "进行" for smoother flow and insert spaces around English tokens.

```diff
-- 🌊 **流式处理** - 通过可配置的块大小流式处理高效处理大文件
-- 📖 **灵活的输入源** - 支持文件路径、目录、io.Reader和embed.FS
+- 🌊 **流式处理** - 通过可配置的块大小进行流式处理,高效处理大文件
+- 📖 **灵活的输入源** - 支持文件路径、目录、io.Reader 和 embed.FS
```

If the project standardizes spacing around Latin identifiers in Chinese docs, apply consistently across files. Would you like me to submit a follow-up patch to unify this?
doc/fr/README.md (1)

**21-22**: French style/terminology: replace "chunk" and prefer "Prise en charge de(s)"

Use "blocs" for "chunks" and a more idiomatic construction for support.

```diff
-- 🌊 **Traitement en flux** - Gère efficacement les gros fichiers grâce au streaming avec des tailles de chunk configurables
-- 📖 **Sources d'entrée flexibles** - Support pour les chemins de fichiers, répertoires, io.Reader et embed.FS
+- 🌊 **Traitement en flux** - Traite efficacement les gros fichiers grâce au streaming avec des tailles de blocs configurables
+- 📖 **Sources d'entrée flexibles** - Prise en charge des chemins de fichiers, des répertoires, io.Reader et embed.FS
```

If you prefer "Gère" over "Traite" for tone consistency, keep it; both are fine. Also confirm we have an API knob for streaming chunk size to reference later in this section.
filesql.go (1)

**241-243**: Avoid opening an extra sql.Conn you don't use.

DumpDatabase no longer branches on driver type; acquiring a dedicated connection is unnecessary overhead.

Apply this diff:

```diff
-	// Get the underlying connection
-	conn, err := db.Conn(context.Background())
-	if err != nil {
-		return fmt.Errorf("failed to get connection: %w", err)
-	}
-	defer conn.Close()
-
-	// Use generic dump functionality for all connections
-	return dumpSQLiteDatabase(db, outputDir, options)
+	// Use generic dump functionality for all connections
+	return dumpSQLiteDatabase(db, outputDir, options)
```

builder.go (3)
**596-617**: Empty-file handling: prefer consistent behavior with header-only flow.

Returning "file is empty" here differs from header-only handling below. Consider skipping empty files or creating an empty table to keep behavior predictable across inputs.

If you choose to skip, return nil with a debug log; if you choose to create an empty table, reuse the same path as header-only (table with headers if any, else skip).
**823-892**: Fallback "column1 TEXT" table is lossy for header-only sources.

Creates a misleading schema. Consider parsing just the header line (peek/tee) to preserve real column names, or require caller-provided headers for Reader inputs.

If deferring a bigger change, fail fast with a clear error for header-less files instead of creating a fake schema.
**1065-1096**: Case-insensitive extension checks.

Handle .GZ/.BZ2/.XZ/.ZST variants. Implement isCompressedFile with lower-casing:

```diff
-func (b *DBBuilder) isCompressedFile(filePath string) bool {
-	return strings.HasSuffix(filePath, ".gz") ||
-		strings.HasSuffix(filePath, ".bz2") ||
-		strings.HasSuffix(filePath, ".xz") ||
-		strings.HasSuffix(filePath, ".zst")
+func (b *DBBuilder) isCompressedFile(filePath string) bool {
+	lp := strings.ToLower(filePath)
+	return strings.HasSuffix(lp, ".gz") ||
+		strings.HasSuffix(lp, ".bz2") ||
+		strings.HasSuffix(lp, ".xz") ||
+		strings.HasSuffix(lp, ".zst")
 }
```

Also applies to: 1098-1105

README.md (3)
README.md (3)
23-24
: Tighten wording; prefer fs.FS over embed.FS in feature list.Minor wording polish and use the general interface
fs.FS
(which also coversembed.FS
).- - 🌊 **Stream Processing** - Efficiently handles large files through streaming with configurable chunk sizes - - 📖 **Flexible Input Sources** - Support for file paths, directories, io.Reader, and embed.FS + - 🌊 **Stream Processing** - Efficiently handles large files via streaming with configurable chunk sizes + - 📖 **Flexible Input Sources** - Supports file paths, directories, io.Reader, and fs.FS
**401-409**: Be precise about the type and variable names for filesystem inputs.

Use `fs.FS` terminology and a neutral variable (e.g., `fsys`) to avoid implying a concrete type.

```diff
-- **Filesystems** (`AddFS`): **Only supports output directory mode**
+- **Filesystems (`fs.FS`)** (`AddFS`): **Only supports output directory mode**
@@
- builder.AddFS(filesystem).EnableAutoSave("")
+ builder.AddFS(fsys).EnableAutoSave("")
@@
- builder.AddFS(filesystem).EnableAutoSave("./output")
+ builder.AddFS(fsys).EnableAutoSave("./output")
```
**410-411**: Clarify restriction rationale and explicitly scope overwrite to path-based inputs; add caution.

Small wording tweak to reduce ambiguity and highlight risk.

```diff
-This restriction exists because io.Reader and filesystem inputs don't have original file paths that can be overwritten. The builder will return an error at build time if you try to use overwrite mode with these input types.
+This restriction exists because `io.Reader` and `fs.FS` inputs do not have stable original file paths to overwrite. Overwrite mode (`""`) is only valid with path-based inputs added via `AddPath`/`AddPaths`. The builder returns an error during `Build` if overwrite mode is used with `AddReader` or `AddFS`. Use an output directory for these inputs to avoid data loss.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (24)

- .golangci.yml (0 hunks)
- CONTRIBUTING.md (0 hunks)
- README.md (2 hunks)
- builder.go (14 hunks)
- builder_test.go (1 hunks)
- doc/es/CONTRIBUTING.md (0 hunks)
- doc/es/README.md (1 hunks)
- doc/fr/CONTRIBUTING.md (0 hunks)
- doc/fr/README.md (1 hunks)
- doc/ja/CONTRIBUTING.md (0 hunks)
- doc/ja/README.md (1 hunks)
- doc/ko/CONTRIBUTING.md (0 hunks)
- doc/ko/README.md (1 hunks)
- doc/ru/CONTRIBUTING.md (0 hunks)
- doc/ru/README.md (1 hunks)
- doc/zh-cn/CONTRIBUTING.md (0 hunks)
- doc/zh-cn/README.md (1 hunks)
- driver/doc.go (0 hunks)
- driver/driver.go (0 hunks)
- driver/driver_test.go (0 hunks)
- driver/errors.go (0 hunks)
- example_test.go (1 hunks)
- filesql.go (1 hunks)
- filesql_test.go (0 hunks)
💤 Files with no reviewable changes (13)
- .golangci.yml
- filesql_test.go
- doc/zh-cn/CONTRIBUTING.md
- doc/ru/CONTRIBUTING.md
- CONTRIBUTING.md
- doc/es/CONTRIBUTING.md
- doc/ja/CONTRIBUTING.md
- doc/ko/CONTRIBUTING.md
- driver/driver.go
- doc/fr/CONTRIBUTING.md
- driver/doc.go
- driver/driver_test.go
- driver/errors.go
🧰 Additional context used

🧬 Code graph analysis (1)

builder.go (5)
- domain/model/file.go (2): IsSupportedFile (141-156), FileType (18-18)
- filesql.go (1): DumpDatabase (227-243)
- domain/model/stream.go (1): NewStreamingParser (17-23)
- domain/model/types.go (3): Header (5-5), ColumnInfo (86-89), ColumnTypeText (51-51)
- domain/model/table.go (1): TableFromFilePath (77-88)
🪛 LanguageTool

doc/fr/README.md
- [style] ~21: This verb can be considered informal in a formal context. Context: "...z et .zst - 🌊 Traitement en flux - Gère efficacement les gros fichiers grâce au..." (VERBES_FAMILIERS_PREMIUM)
- [typographical] ~22: A space is missing after the period. Context: "...ichiers, répertoires, io.Reader et embed.FS - 🚀 Configuration zéro - Aucun ser..." (ESPACE_APRES_POINT)

doc/zh-cn/README.md
- [uncategorized] ~22: A verb is normally modified as "adjective (adverb) + 地 + verb"; did you mean 灵活"地"输入? Context: "...流式处理* - 通过可配置的块大小流式处理高效处理大文件 - 📖 灵活的输入源 - 支持文件路径、目录、io.Reader和embed.FS - ?..." (wb4)

doc/ko/README.md
- [grammar] ~21: There might be a mistake here. Context: "... 가능한 청크 크기로 스트리밍을 통해 대용량 파일을 효율적으로 처리합니다 - 📖 유연한 입력 소스 - 파일 경로, 디렉터리, io.Reade..." (QB_NEW_EN)

README.md
- [grammar] ~23: There might be a mistake here. Context: "... streaming with configurable chunk sizes - 📖 Flexible Input Sources - Support ..." (QB_NEW_EN)
- [grammar] ~24: There might be a mistake here. Context: "...** - Support for file paths, directories, io.Reader, and embed.FS - 🚀 **Zero Set..." (QB_NEW_EN)
- [grammar] ~24: There might be a mistake here. Context: "...hs, directories, io.Reader, and embed.FS - 🚀 Zero Setup - No database server r..." (QB_NEW_EN)
🔇 Additional comments (8)

doc/ja/README.md (1)

**21-22**: Improve the wording of the Japanese README

Revising lines 21-22 of doc/ja/README.md as follows gives a more natural, clearer phrasing. The API side provides methods such as SetDefaultChunkSize and AddFS, so the documentation matches the code.

```diff
- 🌊 **ストリーム処理** - 設定可能なチャンクサイズでストリーミングにより大きなファイルを効率的に処理します
- 📖 **柔軟な入力ソース** - ファイルパス、ディレクトリ、io.Reader、embed.FSをサポートします
+ 🌊 **ストリーム処理** - 設定可能なチャンクサイズのストリーミングで大容量ファイルを効率的に処理します
+ 📖 **柔軟な入力ソース** - ファイルパス、ディレクトリ、io.Reader、および embed.FS をサポートします
```

- The "configurable chunk size" mentioned in the docs is provided by `DBBuilder.SetDefaultChunkSize`.
- The "flexible input sources" are realized via `AddPath`/`AddPaths` (files and directories), `AddReader` (io.Reader), and `AddFS` (embed.FS).

These APIs were confirmed to exist, so please apply this wording fix.

doc/ko/README.md (1)
**21-22**: Apply terminology and list-connector fixes

Please change the following two lines in doc/ko/README.md as shown below.

Before (lines 21-22):

🌊 **스트림 처리** - 구성 가능한 청크 크기로 스트리밍을 통해 대용량 파일을 효율적으로 처리합니다
📖 **유연한 입력 소스** - 파일 경로, 디렉터리, io.Reader, embed.FS를 지원합니다

Proposed change:

🌊 **스트림 처리** - 구성 가능한 블록 크기의 스트리밍으로 대용량 파일을 효율적으로 처리합니다
📖 **유연한 입력 소스** - 파일 경로, 디렉터리, io.Reader 및 embed.FS를 지원합니다

Rationale:
- "블록" (block) is more familiar to Korean readers than "청크" (chunk) and improves readability.
- Adding "및" (and) before the last list item completes a natural Korean enumeration.

doc/ru/README.md (1)
**21-22**: Docs: new capabilities bullets look good and consistent with repo-wide changes.

Accurately reflects streaming and flexible inputs; no action needed.

example_test.go (1)
**485-491**: Stabilize timeout assertion — good change.

String-contains check avoids driver-specific wrapping noise and keeps the example deterministic.

builder_test.go (1)
**385-391**: Good: ensure directory-based Build has a valid CSV.

Precreating test.csv removes flakiness and aligns with new Build semantics.

builder.go (2)
**1042-1056**: Overwrite mode writes all tables to the first path's directory.

This violates "overwrite original files" when inputs originate from multiple directories.

Would you like a targeted refactor to track table→source path and write each table back to its original location? I can draft it with a minimal map carried from Build/Open into autoSaveConnection.
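A rough sketch of the proposed table→source-path bookkeeping (the names `tableSources` and `outputPathFor` are hypothetical, not the package's actual fields): record each table's originating file while streaming inputs, then resolve the dump destination per table instead of using the first path's directory for everything.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// tableSources maps a table name to the file it was loaded from,
// recorded while streaming inputs during Build/Open.
type tableSources map[string]string

// outputPathFor resolves where a table should be dumped: back to its
// own source directory in overwrite mode, or into outputDir otherwise.
func (ts tableSources) outputPathFor(table, outputDir string) string {
	if outputDir == "" { // overwrite mode: write next to the original file
		return filepath.Join(filepath.Dir(ts[table]), table+".csv")
	}
	return filepath.Join(outputDir, table+".csv")
}

func main() {
	ts := tableSources{
		"users":  "/data/a/users.csv",
		"orders": "/data/b/orders.csv",
	}
	// Overwrite mode keeps each table in its own source directory.
	fmt.Println(ts.outputPathFor("users", ""))
	fmt.Println(ts.outputPathFor("orders", ""))
	// Output-directory mode collects everything in one place.
	fmt.Println(ts.outputPathFor("users", "/out"))
}
```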
**1107-1135**: Auto-save overwrite guard — nice.

Clear error message and prevents ambiguous writes when using readers/FS.

README.md (1)
**383-391**: Overwrite and output-directory modes for AddPath/AddPaths are correctly implemented and tested

- Both `EnableAutoSave("")` (overwrite mode) and `EnableAutoSave("<dir>")` (output-directory mode) are supported for file-path inputs added via both `AddPath` and `AddPaths`.
- Validation in `validateAutoSaveConfig()` allows empty-output (overwrite) only when no reader/FS inputs are present, and errors appropriately otherwise.
- Builder tests cover:
  - `EnableAutoSave("")` and `EnableAutoSaveOnCommit("")` with path inputs succeeding (no error)【builder_test.go 1203-1211】【builder_test.go 1222-1230】
  - Overwrite on multiple commits and auto-save on close scenarios with path inputs【builder_test.go 960-959】【builder_test.go 981-988】
- Doc snippets for both modes accurately reflect code behavior for both single and multiple paths; no inconsistencies found.

No further changes required.
```go
// If it's a directory, collect all supported files from it
if info.IsDir() {
	// Recursively collect all supported files from the directory
	err := filepath.WalkDir(path, func(filePath string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}

		// Skip directories and non-supported files
		if d.IsDir() || !model.IsSupportedFile(filePath) {
			return nil
		}

		// Skip files with known issues (like duplicate columns for testing)
		if strings.Contains(filepath.Base(filePath), "duplicate_columns") {
			return nil
		}

		// Get absolute path to avoid duplicates
		absPath, err := filepath.Abs(filePath)
		if err != nil {
			return fmt.Errorf("failed to get absolute path for %s: %w", filePath, err)
		}

		// Only add if not already processed
		if !processedFiles[absPath] {
			processedFiles[absPath] = true
			b.collectedPaths = append(b.collectedPaths, filePath)
		}
		return nil
	})
	if err != nil {
		return nil, fmt.Errorf("failed to walk directory %s: %w", path, err)
	}
} else {
	// It's a file, check if it has a supported extension
	if !model.IsSupportedFile(path) {
		return nil, fmt.Errorf("unsupported file type: %s", path)
	}

	// Get absolute path to avoid duplicates
	absPath, err := filepath.Abs(path)
	if err != nil {
		return nil, fmt.Errorf("failed to get absolute path for %s: %w", path, err)
	}

	// Only add if not already processed
	if !processedFiles[absPath] {
		processedFiles[absPath] = true
		b.collectedPaths = append(b.collectedPaths, path)
	}
}
```
🛠️ Refactor suggestion
Don’t bake test-specific filters into production (skipping “duplicate_columns”).
This silently ignores legitimate files whose names accidentally match, surprising users.
Apply this diff:
```diff
-		// Skip files with known issues (like duplicate columns for testing)
-		if strings.Contains(filepath.Base(filePath), "duplicate_columns") {
-			return nil
-		}
```
🤖 Prompt for AI Agents
In builder.go around lines 309-360, remove the hard-coded test-only filter that
skips files with "duplicate_columns" in their name and the related comment;
instead add a configurable ignore pattern (or boolean) to the Builder struct
(default off/empty) and change the WalkDir logic to consult that configuration
when deciding to skip a file. Keep all other logic (supported-file check, abs
path dedupe, processedFiles map) unchanged; update tests to set the Builder's
ignore-pattern to "duplicate_columns" (or enable the test-only flag) so tests
retain the current behavior without baking test filters into production code.
```go
// Remove compressed duplicates when uncompressed versions exist
b.collectedPaths = b.deduplicateCompressedFiles(b.collectedPaths)

// Case 1: Only file paths, use filesql driver for autosave support
if len(b.collectedPaths) > 0 && len(b.readers) == 0 {
	// Create DSN with all collected paths and auto-save config
	dsn := strings.Join(b.collectedPaths, ";")
	// Unified approach: always use direct SQLite with streaming
	// Create in-memory SQLite database using the same method as filesql driver
	sqliteDriver := &sqlite.Driver{}
	conn, err := sqliteDriver.Open(":memory:")
	if err != nil {
		return nil, fmt.Errorf("failed to create in-memory database: %w", err)
	}

	// Append auto-save configuration to DSN if enabled
	if b.autoSaveConfig != nil && b.autoSaveConfig.Enabled {
		configJSON, err := json.Marshal(b.autoSaveConfig)
		if err != nil {
			return nil, fmt.Errorf("failed to serialize auto-save config: %w", err)
		}
		configEncoded := base64.StdEncoding.EncodeToString(configJSON)
		dsn += "?autosave=" + configEncoded
	}
	// Wrap the driver connection in sql.DB
	db := sql.OpenDB(&directConnector{conn: conn})
```
Single underlying driver.Conn without pool limits can break with database/sql.
directConnector returns the same Conn; without limiting the pool, database/sql may create multiple logical connections referencing the same Conn, risking concurrent use of a non-thread-safe Conn.
Apply this minimal safety net (keeps one physical connection):
```diff
-	db := sql.OpenDB(&directConnector{conn: conn})
+	db := sql.OpenDB(&directConnector{conn: conn})
+	// Ensure a single physical connection to avoid concurrent use of the same driver.Conn
+	db.SetMaxOpenConns(1)
+	db.SetMaxIdleConns(1)
+	db.SetConnMaxLifetime(0)
```
Note: If you later increase pool size, switch to a DSN like file:memdb?mode=memory&cache=shared and return a fresh driver.Conn per Connect.
🤖 Prompt for AI Agents
In builder.go around lines 430 to 443, the current sql.OpenDB wraps a single
physical driver.Conn but doesn't restrict database/sql's pool, risking
concurrent use of that one Conn; after creating db := sql.OpenDB(...), set the
connection pool limits to a single logical connection (e.g.,
db.SetMaxOpenConns(1) and db.SetMaxIdleConns(1), optionally
db.SetConnMaxLifetime(0)) so database/sql never opens multiple concurrent
connections against the single underlying driver.Conn; if you later need more
concurrency switch to a true in-memory DSN and return fresh driver.Conn per
Connect.
```go
// Process reader inputs using streaming
for _, readerInput := range b.readers {
	if err := b.streamReaderToSQLite(ctx, db, readerInput); err != nil {
		_ = db.Close() // Ignore close error during error handling
		return nil, fmt.Errorf("failed to stream reader input for table '%s': %w", readerInput.TableName, err)
	}

	// Note: Auto-save for readers-only case is not fully implemented yet
	// This requires extending the driver to support auto-save without original file paths
}
```
🛠️ Refactor suggestion
Close fs.FS readers after streaming to avoid descriptor leaks.
ReaderInput may wrap fs.File or compressed readers that implement io.Closer; they’re never closed here.
Apply this diff:
```diff
 for _, readerInput := range b.readers {
-	if err := b.streamReaderToSQLite(ctx, db, readerInput); err != nil {
+	if err := b.streamReaderToSQLite(ctx, db, readerInput); err != nil {
 		_ = db.Close() // Ignore close error during error handling
 		return nil, fmt.Errorf("failed to stream reader input for table '%s': %w", readerInput.TableName, err)
 	}
+	// Best-effort close if the reader is closable (fs.File, gzip, zstd, etc.)
+	if c, ok := readerInput.Reader.(io.Closer); ok {
+		_ = c.Close()
+	}
 }
```
🤖 Prompt for AI Agents
In builder.go around lines 452 to 458, the loop that streams b.readers does not
close reader resources and can leak file descriptors; after calling
b.streamReaderToSQLite(ctx, db, readerInput) you must detect if
readerInput.Reader (or the underlying reader) implements io.Closer and invoke
Close() when done. Ensure the Close() is called regardless of whether
streamReaderToSQLite succeeded or returned an error (close before returning),
and surface or log any Close() error (do not ignore it silently). Implement the
Close() call directly after streaming each reader, handling the closer assertion
and propagating/annotating errors appropriately.
```go
// Create auto-save enabled database using our connector
connector := &autoSaveConnector{
	sqliteConn:     freshConn,
	autoSaveConfig: b.autoSaveConfig,
	originalPaths:  b.collectOriginalPaths(),
}
db = sql.OpenDB(connector)

// Stream all data to new connection
for _, path := range b.collectedPaths {
	if err := b.streamFileToSQLite(ctx, db, path); err != nil {
		_ = db.Close() // Ignore close error during error handling
		return nil, fmt.Errorf("failed to stream file %s: %w", path, err)
	}
}

for _, readerInput := range b.readers {
	if err := b.streamReaderToSQLite(ctx, db, readerInput); err != nil {
		_ = db.Close() // Ignore close error during error handling
		return nil, fmt.Errorf("failed to stream reader input for table '%s': %w", readerInput.TableName, err)
	}
}
```
🛠️ Refactor suggestion
Mirror pool limits for the auto-save connection.
Same rationale as above; keep the connection count at 1.
Apply this diff:
```diff
- db = sql.OpenDB(connector)
+ db = sql.OpenDB(connector)
+ db.SetMaxOpenConns(1)
+ db.SetMaxIdleConns(1)
+ db.SetConnMaxLifetime(0)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Create auto-save enabled database using our connector
connector := &autoSaveConnector{
	sqliteConn:     freshConn,
	autoSaveConfig: b.autoSaveConfig,
	originalPaths:  b.collectOriginalPaths(),
}
db = sql.OpenDB(connector)
db.SetMaxOpenConns(1)
db.SetMaxIdleConns(1)
db.SetConnMaxLifetime(0)

// Stream all data to new connection
for _, path := range b.collectedPaths {
	if err := b.streamFileToSQLite(ctx, db, path); err != nil {
		_ = db.Close() // Ignore close error during error handling
		return nil, fmt.Errorf("failed to stream file %s: %w", path, err)
	}
}

for _, readerInput := range b.readers {
	if err := b.streamReaderToSQLite(ctx, db, readerInput); err != nil {
		_ = db.Close() // Ignore close error during error handling
		return nil, fmt.Errorf("failed to stream reader input for table '%s': %w", readerInput.TableName, err)
	}
}
```
🤖 Prompt for AI Agents
In builder.go around lines 488 to 510, the auto-save connection created with
sql.OpenDB(connector) does not mirror the single-connection pool limits used
elsewhere; after creating db = sql.OpenDB(connector) set the connection pool
limits to a single connection by calling db.SetMaxOpenConns(1) and
db.SetMaxIdleConns(1) (and optionally SetConnMaxIdleTime/SetConnMaxLifetime to
match other usage if present) before streaming files/readers so the auto-save
connection remains limited to one connection.
- **io.Reader** (`AddReader`): **Only supports output directory mode**
  ```go
  // ❌ Build error - overwrite mode not supported
  builder.AddReader(reader, "table", model.FileTypeCSV).EnableAutoSave("")

  // ✅ Must specify output directory
  builder.AddReader(reader, "table", model.FileTypeCSV).EnableAutoSave("./output")
  ```
Docs reference the internal symbol `model.FileTypeCSV`; use the public API (likely `filesql.FileTypeCSV`).
End users shouldn't import internal packages. The snippet should reference an exported constant from the public package.
```diff
- builder.AddReader(reader, "table", model.FileTypeCSV).EnableAutoSave("")
+ builder.AddReader(reader, "table", filesql.FileTypeCSV).EnableAutoSave("")
@@
- builder.AddReader(reader, "table", model.FileTypeCSV).EnableAutoSave("./output")
+ builder.AddReader(reader, "table", filesql.FileTypeCSV).EnableAutoSave("./output")
```
If the exported identifier differs (e.g., `filesql.CSV`), adjust accordingly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- **io.Reader** (`AddReader`): **Only supports output directory mode**
  ```go
  // ❌ Build error - overwrite mode not supported
  builder.AddReader(reader, "table", filesql.FileTypeCSV).EnableAutoSave("")
  // ✅ Must specify output directory
  builder.AddReader(reader, "table", filesql.FileTypeCSV).EnableAutoSave("./output")
  ```
🤖 Prompt for AI Agents
In README.md around lines 392 to 400 the example references the internal symbol
model.FileTypeCSV; replace it with the exported public constant from the public
package (e.g., filesql.FileTypeCSV or filesql.CSV) and update the code snippet
import and usage accordingly — change the snippet to import the public package
(filesql) instead of model, use the correct exported identifier, and verify the
README compiles against the public API shown to avoid suggesting internal
package usage.
Summary by CodeRabbit