
Conversation

**nao1215** (Owner) commented Aug 27, 2025

Summary by CodeRabbit

  • New Features
    • Streaming processing for large files with configurable chunk sizes.
    • Flexible input sources: file paths, directories, io.Reader, and embed.FS.
  • Refactor
    • Removed the legacy driver and auto-registration; unified data export through a generic path.
  • Documentation
    • Updated READMEs (multi-language) to include streaming and input sources.
    • Documented auto-save restrictions by input type and added a manual export workflow.
    • Removed driver directory references from contribution guides.
  • Tests
    • Stabilized example timeout output; ensured directory-build tests include a valid CSV.
  • Chores
    • Lint configuration update.

**coderabbitai** (bot) commented Aug 27, 2025

Walkthrough

Removes the custom filesql driver package and its tests, simplifies the filesql API by deleting driver registration and specializing the dump logic, and refactors the builder into a direct in-memory SQLite streaming workflow with auto-save validation. Updates the README and CONTRIBUTING docs (multi-language) to reflect streaming, flexible inputs, and removal of the driver. Adjusts the linter config.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| **Builder streaming + auto-save refactor**<br>`builder.go`, `builder_test.go` | Reworked to stream directly into in-memory SQLite; added input dedup and compressed-variant handling, header-only table creation, and explicit auto-save validation/flow. Tests create a valid CSV in temp dirs for Build. |
| **Driver package removal**<br>`driver/doc.go`, `driver/driver.go`, `driver/driver_test.go`, `driver/errors.go` | Deleted the filesql driver implementation, its package docs, exported errors, and its entire test suite. |
| **Filesql API simplification**<br>`filesql.go`, `filesql_test.go` | Removed DriverName and Register(); unified DumpDatabase to generic SQLite dumping; deleted TestRegisterDriver. |
| **README feature updates**<br>`README.md`, `doc/es/README.md`, `doc/fr/README.md`, `doc/ja/README.md`, `doc/ko/README.md`, `doc/ru/README.md`, `doc/zh-cn/README.md` | Added streaming and flexible input-source features; the main README documents auto-save restrictions and manual export; the ko README replaces a compression bullet. |
| **CONTRIBUTING: remove driver dir**<br>`CONTRIBUTING.md`, `doc/es/CONTRIBUTING.md`, `doc/fr/CONTRIBUTING.md`, `doc/ja/CONTRIBUTING.md`, `doc/ko/CONTRIBUTING.md`, `doc/ru/CONTRIBUTING.md`, `doc/zh-cn/CONTRIBUTING.md` | Removed references to the driver/ directory from project structure and directory roles across languages. |
| **Linter config**<br>`.golangci.yml` | Dropped the SA1019 exclusion for `driver/.*\.go`; SA1019 is now reported for that path (though those files are removed). |
| **Examples**<br>`example_test.go` | Normalized timeout error output by matching on "context deadline exceeded". |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant U as User
  participant B as Builder
  participant FS as FileSystem/FS/Readers
  participant D as Deduper
  participant DB as SQLite (in-memory)
  participant AS as Auto-save

  U->>B: Build(ctx)
  B->>B: validateAutoSaveConfig()
  B->>FS: collect inputs (paths, dirs, readers, FS)
  B->>D: deduplicate (abs paths, prefer uncompressed)
  B-->>U: build result (collected paths/readers)

  U->>B: Open(ctx)
  B->>DB: open in-memory connection
  loop for each input
    B->>DB: streamReaderToSQLite (create tables, insert rows)
    alt header-only/empty
      B->>DB: createEmptyTable/createTableFromHeaders
    end
  end
  alt auto-save enabled
    B->>AS: re-open with auto-save wrapper
    B->>DB: re-stream into new DB
  end
  B-->>U: *sql.DB
```
```mermaid
sequenceDiagram
  autonumber
  participant U as User
  participant DB as *sql.DB (auto-save wrapped)
  participant TX as Transaction (wrapped)
  participant DMP as DumpDatabase
  participant OUT as Output (dir or overwrite originals)

  U->>DB: Close() / TX Commit()
  alt auto-save on Close/Commit
    DB->>DMP: performAutoSave()
    DMP->>OUT: DumpDatabase(export)
    DMP-->>DB: result
  else no auto-save
    DB-->>U: close/commit only
  end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

A bunny streams with tidy grace,
From files to tables, row by row we trace.
The driver hopped away—goodbye, old lane—
Now auto-save burrows paths again.
With readers, dirs, and headers neat,
The carrots of data are crisp to eat. 🥕



Code Metrics Report

|                     | main (bf81dc8) | #16 (edecb13) |  +/-  |
|---------------------|----------------|---------------|-------|
| Coverage            |          80.8% |         81.1% | +0.3% |
| Files               |              9 |             8 |    -1 |
| Lines               |           1474 |          1130 |  -344 |
| Covered             |           1191 |           917 |  -274 |
| Test Execution Time |             3s |            3s |    0s |

Code coverage of files in pull request scope (77.4% → 78.3%)

| File | Coverage | +/- | Status |
|------|----------|-----|--------|
| builder.go | 76.6% | -4.0% | modified |
| driver/driver.go | 0.0% | -82.6% | removed |
| filesql.go | 82.3% | +24.7% | modified |

Reported by octocov

**nao1215** merged commit 756ba3e into main Aug 27, 2025 (7 of 8 checks passed)

**coderabbitai** (bot) left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
filesql.go (1)

308-310: Harden identifier quoting to prevent SQL injection via table names.
Table names are formatted into SQL with backticks; if a file name contains a backtick, injection is possible. Prefer double-quoted identifiers with escaping.

Apply this diff:

```diff
-	query := fmt.Sprintf("SELECT * FROM `%s`", tableName) //nolint:gosec // Table name is validated and comes from database metadata
+	query := fmt.Sprintf(`SELECT * FROM "%s"`, strings.ReplaceAll(tableName, `"`, `""`)) // escape " per SQLite
```

And in getSQLiteTableColumns:

```diff
-	query := fmt.Sprintf("PRAGMA table_info(`%s`)", tableName)
+	query := fmt.Sprintf(`PRAGMA table_info("%s")`, strings.ReplaceAll(tableName, `"`, `""`))
```

Also applies to: 323-327
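For reference, the escaping rule proposed above can be exercised in isolation. This is a minimal sketch; `quoteIdent` is a hypothetical helper name, not a function in the library:

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps an identifier in double quotes and doubles any embedded
// double quote, which is how SQLite expects quoted identifiers to be escaped.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

func main() {
	fmt.Println(quoteIdent("users"))    // "users"
	fmt.Println(quoteIdent(`odd"name`)) // "odd""name"
}
```

A file named `` users`; DROP TABLE x; -- `` would previously have escaped the backtick quoting; with double-quote escaping it is treated as a single opaque identifier.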

builder.go (2)

775-808: Close compression readers (gzip/zstd) to release resources.
gzip.Reader and zstd.Decoder need Close(); currently they’re leaked.

Change createDecompressedReader to return a cleanup function and use it in streamFileToSQLite:

```diff
-func (b *DBBuilder) createDecompressedReader(file *os.File, filePath string) (io.Reader, error) {
+func (b *DBBuilder) createDecompressedReader(file *os.File, filePath string) (io.Reader, func() error, error) {
 	// gzip
 	gzReader, err := gzip.NewReader(file)
 	if err != nil {
-		return nil, fmt.Errorf("failed to create gzip reader: %w", err)
+		return nil, nil, fmt.Errorf("failed to create gzip reader: %w", err)
 	}
-	return gzReader, nil
+	return gzReader, gzReader.Close, nil
 	// bz2
-	return bzip2.NewReader(file), nil
+	return bzip2.NewReader(file), func() error { return nil }, nil
 	// xz
 	xzReader, err := xz.NewReader(file)
 	if err != nil {
-		return nil, fmt.Errorf("failed to create xz reader: %w", err)
+		return nil, nil, fmt.Errorf("failed to create xz reader: %w", err)
 	}
-	return xzReader, nil
+	return xzReader, func() error { return nil }, nil
 	// zstd
 	zstdReader, err := zstd.NewReader(file)
 	if err != nil {
-		return nil, fmt.Errorf("failed to create zstd reader: %w", err)
+		return nil, nil, fmt.Errorf("failed to create zstd reader: %w", err)
 	}
-	return zstdReader.IOReadCloser(), nil
+	// zstd.Decoder.Close has no error return, so wrap it
+	return zstdReader.IOReadCloser(), func() error { zstdReader.Close(); return nil }, nil
 	// none
-	return file, nil
+	return file, func() error { return nil }, nil
 }
```

And in streamFileToSQLite:

```diff
-	reader, err := b.createDecompressedReader(file, filePath)
+	reader, cleanup, err := b.createDecompressedReader(file, filePath)
 	if err != nil {
 		return fmt.Errorf("failed to create decompressed reader for %s: %w", filePath, err)
 	}
+	defer func() { _ = cleanup() }()
```

Also applies to: 618-637


724-757: Quote and escape identifiers to avoid injection when creating tables/inserts.
If headers or table names contain quotes, current formatting allows injection.

Apply this diff:

```diff
+	quote := func(s string) string { return `"` + strings.ReplaceAll(s, `"`, `""`) + `"` }
 	columns := make([]string, 0, len(columnInfo))
 	for _, col := range columnInfo {
-		columns = append(columns, fmt.Sprintf(`"%s" %s`, col.Name, col.Type.String()))
+		columns = append(columns, fmt.Sprintf(`%s %s`, quote(col.Name), col.Type.String()))
 	}
-	query := fmt.Sprintf(
-		`CREATE TABLE IF NOT EXISTS "%s" (%s)`,
-		chunk.TableName(),
-		strings.Join(columns, ", "),
-	)
+	query := fmt.Sprintf(`CREATE TABLE IF NOT EXISTS %s (%s)`,
+		quote(chunk.TableName()),
+		strings.Join(columns, ", "),
+	)
```

And in prepareInsertStatement:

```diff
-	query := fmt.Sprintf(
-		`INSERT INTO "%s" VALUES (%s)`,
-		chunk.TableName(),
-		strings.Join(placeholders, ", "),
-	)
+	quote := func(s string) string { return `"` + strings.ReplaceAll(s, `"`, `""`) + `"` }
+	query := fmt.Sprintf(`INSERT INTO %s VALUES (%s)`,
+		quote(chunk.TableName()),
+		strings.Join(placeholders, ", "),
+	)
```
🧹 Nitpick comments (10)
doc/es/README.md (1)

21-22: Use "bloques" and add a reference to SetDefaultChunkSize

A small terminology adjustment for native readability, plus a mention of the public API that configures the chunk size.

Points of note:

  • builder.go (lines 172–177) has a public method SetDefaultChunkSize(size int) *DBBuilder that configures the default chunk size.

```diff
- 🌊 **Procesamiento de flujos** - Maneja eficientemente archivos grandes mediante streaming con tamaños de bloques configurables
+ 🌊 **Procesamiento de flujos** - Maneja eficientemente archivos grandes mediante streaming con tamaños de bloques configurables (use DBBuilder.SetDefaultChunkSize para ajustar)
```
doc/zh-cn/README.md (1)

21-22: Minor Chinese grammar/readability fixes

Add “进行” for smoother flow and insert spaces around English tokens.

```diff
-- 🌊 **流式处理** - 通过可配置的块大小流式处理高效处理大文件
-- 📖 **灵活的输入源** - 支持文件路径、目录、io.Reader和embed.FS
+- 🌊 **流式处理** - 通过可配置的块大小进行流式处理,高效处理大文件
+- 📖 **灵活的输入源** - 支持文件路径、目录、io.Reader 和 embed.FS
```

If the project standardizes spacing around Latin identifiers in Chinese docs, apply consistently across files. Would you like me to submit a follow-up patch to unify this?

doc/fr/README.md (1)

21-22: French style/terminology: replace “chunk” and prefer “Prise en charge de(s)”

Use “blocs” for “chunks” and a more idiomatic construction for support.

```diff
-- 🌊 **Traitement en flux** - Gère efficacement les gros fichiers grâce au streaming avec des tailles de chunk configurables
-- 📖 **Sources d'entrée flexibles** - Support pour les chemins de fichiers, répertoires, io.Reader et embed.FS
+- 🌊 **Traitement en flux** - Traite efficacement les gros fichiers grâce au streaming avec des tailles de blocs configurables
+- 📖 **Sources d'entrée flexibles** - Prise en charge des chemins de fichiers, des répertoires, io.Reader et embed.FS
```

If you prefer “Gère” over “Traite” for tone consistency, keep it—both are fine. Also confirm we have an API knob for streaming chunk size to reference later in this section.

filesql.go (1)

241-243: Avoid opening an extra sql.Conn you don’t use.
DumpDatabase no longer branches on driver type; acquiring a dedicated connection is unnecessary overhead.

Apply this diff:

```diff
-	// Get the underlying connection
-	conn, err := db.Conn(context.Background())
-	if err != nil {
-		return fmt.Errorf("failed to get connection: %w", err)
-	}
-	defer conn.Close()
-
-	// Use generic dump functionality for all connections
-	return dumpSQLiteDatabase(db, outputDir, options)
+	// Use generic dump functionality for all connections
+	return dumpSQLiteDatabase(db, outputDir, options)
```
builder.go (3)

596-617: Empty-file handling: prefer consistent behavior with header-only flow.
Returning “file is empty” here differs from header-only handling below. Consider skipping empty files or creating an empty table to keep behavior predictable across inputs.

If you choose to skip, return nil with a debug log; if you choose to create an empty table, reuse the same path as header-only (table with headers if any, else skip).


823-892: Fallback “column1 TEXT” table is lossy for header-only sources.
Creates a misleading schema. Consider parsing just the header line (peek/tee) to preserve real column names, or require caller-provided headers for Reader inputs.

If deferring a bigger change, fail fast with a clear error for header-less files instead of creating a fake schema.


1065-1096: Case-insensitive extension checks.
Handle .GZ/.BZ2/.XZ/.ZST variants.

Implement isCompressedFile with lower-casing:

```diff
-func (b *DBBuilder) isCompressedFile(filePath string) bool {
-	return strings.HasSuffix(filePath, ".gz") ||
-		strings.HasSuffix(filePath, ".bz2") ||
-		strings.HasSuffix(filePath, ".xz") ||
-		strings.HasSuffix(filePath, ".zst")
+func (b *DBBuilder) isCompressedFile(filePath string) bool {
+	lp := strings.ToLower(filePath)
+	return strings.HasSuffix(lp, ".gz") ||
+		strings.HasSuffix(lp, ".bz2") ||
+		strings.HasSuffix(lp, ".xz") ||
+		strings.HasSuffix(lp, ".zst")
 }
```

Also applies to: 1098-1105

README.md (3)

23-24: Tighten wording; prefer fs.FS over embed.FS in feature list.

Minor wording polish and use the general interface fs.FS (which also covers embed.FS).

```diff
- - 🌊 **Stream Processing** - Efficiently handles large files through streaming with configurable chunk sizes
- - 📖 **Flexible Input Sources** - Support for file paths, directories, io.Reader, and embed.FS
+ - 🌊 **Stream Processing** - Efficiently handles large files via streaming with configurable chunk sizes
+ - 📖 **Flexible Input Sources** - Supports file paths, directories, io.Reader, and fs.FS
```

401-409: Be precise about the type and variable names for filesystem inputs.

Use fs.FS terminology and a neutral variable (e.g., fsys) to avoid implying a concrete type.

```diff
-- **Filesystems** (`AddFS`): **Only supports output directory mode**
+- **Filesystems (`fs.FS`)** (`AddFS`): **Only supports output directory mode**
@@
-  builder.AddFS(filesystem).EnableAutoSave("")
+  builder.AddFS(fsys).EnableAutoSave("")
@@
-  builder.AddFS(filesystem).EnableAutoSave("./output")
+  builder.AddFS(fsys).EnableAutoSave("./output")
```

410-411: Clarify restriction rationale and explicitly scope overwrite to path-based inputs; add caution.

Small wording tweak to reduce ambiguity and highlight risk.

```diff
-This restriction exists because io.Reader and filesystem inputs don't have original file paths that can be overwritten. The builder will return an error at build time if you try to use overwrite mode with these input types.
+This restriction exists because `io.Reader` and `fs.FS` inputs do not have stable original file paths to overwrite. Overwrite mode (`""`) is only valid with path-based inputs added via `AddPath`/`AddPaths`. The builder returns an error during `Build` if overwrite mode is used with `AddReader` or `AddFS`. Use an output directory for these inputs to avoid data loss.
```
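The validation rule described in this comment can be sketched as a standalone check. This is a hypothetical illustration — `buildConfig` and `validateAutoSave` are invented names, not the library's actual types:

```go
package main

import (
	"errors"
	"fmt"
)

// buildConfig mirrors the inputs relevant to auto-save validation (hypothetical).
type buildConfig struct {
	hasReaderOrFSInputs bool   // AddReader / AddFS were used
	autoSaveEnabled     bool   // EnableAutoSave was called
	outputDir           string // "" means overwrite-originals mode
}

// validateAutoSave rejects overwrite mode when any input lacks an original
// file path that could be overwritten.
func validateAutoSave(c buildConfig) error {
	if c.autoSaveEnabled && c.outputDir == "" && c.hasReaderOrFSInputs {
		return errors.New("overwrite auto-save requires path-based inputs; specify an output directory")
	}
	return nil
}

func main() {
	// Reader/FS input with overwrite mode: rejected at build time.
	fmt.Println(validateAutoSave(buildConfig{hasReaderOrFSInputs: true, autoSaveEnabled: true, outputDir: ""}))
	// Same input with an explicit output directory: allowed.
	fmt.Println(validateAutoSave(buildConfig{hasReaderOrFSInputs: true, autoSaveEnabled: true, outputDir: "./out"}))
}
```

Failing at `Build` rather than at `Close` surfaces the misconfiguration before any data is loaded, which matches the behavior the README describes.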
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bf81dc8 and 51771de.

📒 Files selected for processing (24)
  • .golangci.yml (0 hunks)
  • CONTRIBUTING.md (0 hunks)
  • README.md (2 hunks)
  • builder.go (14 hunks)
  • builder_test.go (1 hunks)
  • doc/es/CONTRIBUTING.md (0 hunks)
  • doc/es/README.md (1 hunks)
  • doc/fr/CONTRIBUTING.md (0 hunks)
  • doc/fr/README.md (1 hunks)
  • doc/ja/CONTRIBUTING.md (0 hunks)
  • doc/ja/README.md (1 hunks)
  • doc/ko/CONTRIBUTING.md (0 hunks)
  • doc/ko/README.md (1 hunks)
  • doc/ru/CONTRIBUTING.md (0 hunks)
  • doc/ru/README.md (1 hunks)
  • doc/zh-cn/CONTRIBUTING.md (0 hunks)
  • doc/zh-cn/README.md (1 hunks)
  • driver/doc.go (0 hunks)
  • driver/driver.go (0 hunks)
  • driver/driver_test.go (0 hunks)
  • driver/errors.go (0 hunks)
  • example_test.go (1 hunks)
  • filesql.go (1 hunks)
  • filesql_test.go (0 hunks)
💤 Files with no reviewable changes (13)
  • .golangci.yml
  • filesql_test.go
  • doc/zh-cn/CONTRIBUTING.md
  • doc/ru/CONTRIBUTING.md
  • CONTRIBUTING.md
  • doc/es/CONTRIBUTING.md
  • doc/ja/CONTRIBUTING.md
  • doc/ko/CONTRIBUTING.md
  • driver/driver.go
  • doc/fr/CONTRIBUTING.md
  • driver/doc.go
  • driver/driver_test.go
  • driver/errors.go
🧰 Additional context used
🧬 Code graph analysis (1)
builder.go (5)
domain/model/file.go (2)
  • IsSupportedFile (141-156)
  • FileType (18-18)
filesql.go (1)
  • DumpDatabase (227-243)
domain/model/stream.go (1)
  • NewStreamingParser (17-23)
domain/model/types.go (3)
  • Header (5-5)
  • ColumnInfo (86-89)
  • ColumnTypeText (51-51)
domain/model/table.go (1)
  • TableFromFilePath (77-88)
🪛 LanguageTool
doc/fr/README.md

[style] ~21-~21: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...z et .zst - 🌊 Traitement en flux - Gère efficacement les gros fichiers grâce au...

(VERBES_FAMILIERS_PREMIUM)


[typographical] ~22-~22: Il manque une espace après le point.
Context: ...ichiers, répertoires, io.Reader et embed.FS - 🚀 Configuration zéro - Aucun ser...

(ESPACE_APRES_POINT)

doc/zh-cn/README.md

[uncategorized] ~22-~22: 动词的修饰一般为‘形容词(副词)+地+动词’。您的意思是否是:灵活"地"输入
Context: ...流式处理* - 通过可配置的块大小流式处理高效处理大文件 - 📖 灵活的输入源 - 支持文件路径、目录、io.Reader和embed.FS - ?...

(wb4)

doc/ko/README.md

[grammar] ~21-~21: There might be a mistake here.
Context: ... 가능한 청크 크기로 스트리밍을 통해 대용량 파일을 효율적으로 처리합니다 - 📖 유연한 입력 소스 - 파일 경로, 디렉터리, io.Reade...

(QB_NEW_EN)

README.md

[grammar] ~23-~23: There might be a mistake here.
Context: ... streaming with configurable chunk sizes - 📖 Flexible Input Sources - Support ...

(QB_NEW_EN)


[grammar] ~24-~24: There might be a mistake here.
Context: ...** - Support for file paths, directories, io.Reader, and embed.FS - 🚀 **Zero Set...

(QB_NEW_EN)


[grammar] ~24-~24: There might be a mistake here.
Context: ...hs, directories, io.Reader, and embed.FS - 🚀 Zero Setup - No database server r...

(QB_NEW_EN)

🔇 Additional comments (8)
doc/ja/README.md (1)

21-22: Wording improvements for the Japanese README
Revising lines 21–22 of doc/ja/README.md as follows gives more natural, clearer phrasing. The API side provides SetDefaultChunkSize and AddFS, so the documentation matches the code.

```diff
- 🌊 **ストリーム処理** - 設定可能なチャンクサイズでストリーミングにより大きなファイルを効率的に処理します
- 📖 **柔軟な入力ソース** - ファイルパス、ディレクトリ、io.Reader、embed.FSをサポートします
+ 🌊 **ストリーム処理** - 設定可能なチャンクサイズのストリーミングで大容量ファイルを効率的に処理します
+ 📖 **柔軟な入力ソース** - ファイルパス、ディレクトリ、io.Reader、および embed.FS をサポートします
```

  • The "configurable chunk size" mentioned in the docs is provided by DBBuilder.SetDefaultChunkSize.
  • The "flexible input sources" are realized via AddPath/AddPaths (files and directories), AddReader (io.Reader), and AddFS (embed.FS).
    These APIs were confirmed to exist, so please apply the wording fix.
doc/ko/README.md (1)

21-22: Request to apply terminology and list-connector fixes

Please change the following two lines in doc/ko/README.md as shown below.

  • Before (lines 21–22)

    🌊 **스트림 처리** - 구성 가능한 청크 크기로 스트리밍을 통해 대용량 파일을 효율적으로 처리합니다
    📖 **유연한 입력 소스** - 파일 경로, 디렉터리, io.Reader, embed.FS를 지원합니다

  • Proposed after

    🌊 **스트림 처리** - 구성 가능한 블록 크기의 스트리밍으로 대용량 파일을 효율적으로 처리합니다
    📖 **유연한 입력 소스** - 파일 경로, 디렉터리, io.Reader 및 embed.FS를 지원합니다

Rationale:
• "블록" is more familiar to Korean readers than "청크", which improves readability.
• Adding "및" before the last list item completes a natural Korean list expression.

doc/ru/README.md (1)

21-22: Docs: new capabilities bullets look good and consistent with repo-wide changes.
Accurately reflects streaming and flexible inputs; no action needed.

example_test.go (1)

485-491: Stabilize timeout assertion — good change.
String-contains check avoids driver-specific wrapping noise and keeps the example deterministic.

builder_test.go (1)

385-391: Good: ensure directory-based Build has a valid CSV.
Precreating test.csv removes flakiness and aligns with new Build semantics.

builder.go (2)

1042-1056: Overwrite mode writes all tables to the first path’s directory.
This violates “overwrite original files” when inputs originate from multiple directories.

Would you like a targeted refactor to track table→source path and write each table back to its original location? I can draft it with a minimal map carried from Build/Open into autoSaveConnection.


1107-1135: Auto-save overwrite guard — nice.
Clear error message and prevents ambiguous writes when using readers/FS.

README.md (1)

383-391: Overwrite and output-directory modes for AddPath/AddPaths are correctly implemented and tested

  • Both EnableAutoSave("") (overwrite mode) and EnableAutoSave("<dir>") (output-directory mode) are supported for file-path inputs added via both AddPath and AddPaths.
  • Validation in validateAutoSaveConfig() allows empty-output (overwrite) only when no reader/FS inputs are present, and errors appropriately otherwise.
  • Builder tests cover:
    • EnableAutoSave("") and EnableAutoSaveOnCommit("") with path inputs succeeding (no error)【builder_test.go 1203–1211】【builder_test.go 1222–1230】
    • Overwrite on multiple commits and auto-save on close scenarios with path inputs【builder_test.go 960–959】【builder_test.go 981–988】
  • Doc snippets for both modes accurately reflect code behavior for both single and multiple paths; no inconsistencies found.

No further changes required.

Comment on lines +309 to 360
```go
	// If it's a directory, collect all supported files from it
	if info.IsDir() {
		// Recursively collect all supported files from the directory
		err := filepath.WalkDir(path, func(filePath string, d os.DirEntry, err error) error {
			if err != nil {
				return err
			}

			// Skip directories and non-supported files
			if d.IsDir() || !model.IsSupportedFile(filePath) {
				return nil
			}

			// Skip files with known issues (like duplicate columns for testing)
			if strings.Contains(filepath.Base(filePath), "duplicate_columns") {
				return nil
			}

			// Get absolute path to avoid duplicates
			absPath, err := filepath.Abs(filePath)
			if err != nil {
				return fmt.Errorf("failed to get absolute path for %s: %w", filePath, err)
			}

			// Only add if not already processed
			if !processedFiles[absPath] {
				processedFiles[absPath] = true
				b.collectedPaths = append(b.collectedPaths, filePath)
			}
			return nil
		})
		if err != nil {
			return nil, fmt.Errorf("failed to walk directory %s: %w", path, err)
		}
	} else {
		// It's a file, check if it has a supported extension
		if !model.IsSupportedFile(path) {
			return nil, fmt.Errorf("unsupported file type: %s", path)
		}

		// Get absolute path to avoid duplicates
		absPath, err := filepath.Abs(path)
		if err != nil {
			return nil, fmt.Errorf("failed to get absolute path for %s: %w", path, err)
		}

		// Only add if not already processed
		if !processedFiles[absPath] {
			processedFiles[absPath] = true
			b.collectedPaths = append(b.collectedPaths, path)
		}
	}
```

🛠️ Refactor suggestion

Don’t bake test-specific filters into production (skipping “duplicate_columns”).
This silently ignores legitimate files whose names accidentally match, surprising users.

Apply this diff:

```diff
-				// Skip files with known issues (like duplicate columns for testing)
-				if strings.Contains(filepath.Base(filePath), "duplicate_columns") {
-					return nil
-				}
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (the directory walk with the test-only filter removed):

```go
	// If it's a directory, collect all supported files from it
	if info.IsDir() {
		// Recursively collect all supported files from the directory
		err := filepath.WalkDir(path, func(filePath string, d os.DirEntry, err error) error {
			if err != nil {
				return err
			}
			// Skip directories and non-supported files
			if d.IsDir() || !model.IsSupportedFile(filePath) {
				return nil
			}
			// Get absolute path to avoid duplicates
			absPath, err := filepath.Abs(filePath)
			if err != nil {
				return fmt.Errorf("failed to get absolute path for %s: %w", filePath, err)
			}
			// Only add if not already processed
			if !processedFiles[absPath] {
				processedFiles[absPath] = true
				b.collectedPaths = append(b.collectedPaths, filePath)
			}
			return nil
		})
		if err != nil {
			return nil, fmt.Errorf("failed to walk directory %s: %w", path, err)
		}
	} else {
		// It's a file, check if it has a supported extension
		if !model.IsSupportedFile(path) {
			return nil, fmt.Errorf("unsupported file type: %s", path)
		}
		// Get absolute path to avoid duplicates
		absPath, err := filepath.Abs(path)
		if err != nil {
			return nil, fmt.Errorf("failed to get absolute path for %s: %w", path, err)
		}
		// Only add if not already processed
		if !processedFiles[absPath] {
			processedFiles[absPath] = true
			b.collectedPaths = append(b.collectedPaths, path)
		}
	}
```
🤖 Prompt for AI Agents
In builder.go around lines 309-360, remove the hard-coded test-only filter that
skips files with "duplicate_columns" in their name and the related comment;
instead add a configurable ignore pattern (or boolean) to the Builder struct
(default off/empty) and change the WalkDir logic to consult that configuration
when deciding to skip a file. Keep all other logic (supported-file check, abs
path dedupe, processedFiles map) unchanged; update tests to set the Builder's
ignore-pattern to "duplicate_columns" (or enable the test-only flag) so tests
retain the current behavior without baking test filters into production code.
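The configurable ignore-pattern approach described in the prompt can be sketched in isolation. This is a hypothetical illustration — `pathCollector`, `ignoreSubstring`, and `shouldSkip` are invented names, not the library's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// pathCollector carries an optional ignore substring (hypothetical field);
// the zero value ignores nothing, so production behavior is unchanged.
type pathCollector struct {
	ignoreSubstring string
}

// shouldSkip reports whether a file base name matches the configured
// ignore substring. The WalkDir callback would consult this instead of a
// hard-coded "duplicate_columns" check.
func (p *pathCollector) shouldSkip(base string) bool {
	return p.ignoreSubstring != "" && strings.Contains(base, p.ignoreSubstring)
}

func main() {
	prod := &pathCollector{}                                   // default: skip nothing
	test := &pathCollector{ignoreSubstring: "duplicate_columns"} // test opts in

	fmt.Println(prod.shouldSkip("duplicate_columns.csv")) // false
	fmt.Println(test.shouldSkip("duplicate_columns.csv")) // true
	fmt.Println(test.shouldSkip("users.csv"))             // false
}
```

Tests would set the ignore substring explicitly, so the filter never silently drops a user's legitimately named file in production.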

Comment on lines +430 to 443
```diff
 	// Remove compressed duplicates when uncompressed versions exist
 	b.collectedPaths = b.deduplicateCompressedFiles(b.collectedPaths)

-	// Case 1: Only file paths, use filesql driver for autosave support
-	if len(b.collectedPaths) > 0 && len(b.readers) == 0 {
-		// Create DSN with all collected paths and auto-save config
-		dsn := strings.Join(b.collectedPaths, ";")
-
-		// Append auto-save configuration to DSN if enabled
-		if b.autoSaveConfig != nil && b.autoSaveConfig.Enabled {
-			configJSON, err := json.Marshal(b.autoSaveConfig)
-			if err != nil {
-				return nil, fmt.Errorf("failed to serialize auto-save config: %w", err)
-			}
-			configEncoded := base64.StdEncoding.EncodeToString(configJSON)
-			dsn += "?autosave=" + configEncoded
-		}
+	// Unified approach: always use direct SQLite with streaming
+	// Create in-memory SQLite database using the same method as filesql driver
+	sqliteDriver := &sqlite.Driver{}
+	conn, err := sqliteDriver.Open(":memory:")
+	if err != nil {
+		return nil, fmt.Errorf("failed to create in-memory database: %w", err)
+	}
+
+	// Wrap the driver connection in sql.DB
+	db := sql.OpenDB(&directConnector{conn: conn})
```

⚠️ Potential issue

Single underlying driver.Conn without pool limits can break with database/sql.
directConnector returns the same Conn; without limiting the pool, database/sql may create multiple logical connections referencing the same Conn, risking concurrent use of a non-thread-safe Conn.

Apply this minimal safety net (keeps one physical connection):

```diff
-	db := sql.OpenDB(&directConnector{conn: conn})
+	db := sql.OpenDB(&directConnector{conn: conn})
+	// Ensure a single physical connection to avoid concurrent use of the same driver.Conn
+	db.SetMaxOpenConns(1)
+	db.SetMaxIdleConns(1)
+	db.SetConnMaxLifetime(0)
```

Note: If you later increase pool size, switch to a DSN like file:memdb?mode=memory&cache=shared and return a fresh driver.Conn per Connect.

🤖 Prompt for AI Agents

```text
In builder.go around lines 430 to 443, the current sql.OpenDB wraps a single
physical driver.Conn but doesn't restrict database/sql's pool, risking
concurrent use of that one Conn; after creating db := sql.OpenDB(...), set the
connection pool limits to a single logical connection (e.g.,
db.SetMaxOpenConns(1) and db.SetMaxIdleConns(1), optionally
db.SetConnMaxLifetime(0)) so database/sql never opens multiple concurrent
connections against the single underlying driver.Conn; if you later need more
concurrency, switch to a true in-memory DSN and return a fresh driver.Conn per
Connect.
```

Comment on lines +452 to 458

```go
	// Process reader inputs using streaming
	for _, readerInput := range b.readers {
		if err := b.streamReaderToSQLite(ctx, db, readerInput); err != nil {
			_ = db.Close() // Ignore close error during error handling
			return nil, fmt.Errorf("failed to stream reader input for table '%s': %w", readerInput.TableName, err)
		}

		// Note: Auto-save for readers-only case is not fully implemented yet
		// This requires extending the driver to support auto-save without original file paths
	}
```

🛠️ Refactor suggestion

**Close fs.FS readers after streaming to avoid descriptor leaks.**

`ReaderInput` may wrap an `fs.File` or compressed readers that implement `io.Closer`; they're never closed here.

Apply this diff:

```diff
 	for _, readerInput := range b.readers {
-		if err := b.streamReaderToSQLite(ctx, db, readerInput); err != nil {
+		if err := b.streamReaderToSQLite(ctx, db, readerInput); err != nil {
 			_ = db.Close() // Ignore close error during error handling
 			return nil, fmt.Errorf("failed to stream reader input for table '%s': %w", readerInput.TableName, err)
 		}
+		// Best-effort close if the reader is closable (fs.File, gzip, zstd, etc.)
+		if c, ok := readerInput.Reader.(io.Closer); ok {
+			_ = c.Close()
+		}
 	}
```
🤖 Prompt for AI Agents

```text
In builder.go around lines 452 to 458, the loop that streams b.readers does not
close reader resources and can leak file descriptors; after calling
b.streamReaderToSQLite(ctx, db, readerInput) you must detect if
readerInput.Reader (or the underlying reader) implements io.Closer and invoke
Close() when done. Ensure Close() is called regardless of whether
streamReaderToSQLite succeeded or returned an error (close before returning),
and surface or log any Close() error (do not ignore it silently). Implement the
Close() call directly after streaming each reader, handling the closer assertion
and propagating/annotating errors appropriately.
```

Comment on lines +488 to +510

```go
	// Create auto-save enabled database using our connector
	connector := &autoSaveConnector{
		sqliteConn:     freshConn,
		autoSaveConfig: b.autoSaveConfig,
		originalPaths:  b.collectOriginalPaths(),
	}
	db = sql.OpenDB(connector)

	// Stream all data to new connection
	for _, path := range b.collectedPaths {
		if err := b.streamFileToSQLite(ctx, db, path); err != nil {
			_ = db.Close() // Ignore close error during error handling
			return nil, fmt.Errorf("failed to stream file %s: %w", path, err)
		}
	}

	for _, readerInput := range b.readers {
		if err := b.streamReaderToSQLite(ctx, db, readerInput); err != nil {
			_ = db.Close() // Ignore close error during error handling
			return nil, fmt.Errorf("failed to stream reader input for table '%s': %w", readerInput.TableName, err)
		}
	}
}
```

🛠️ Refactor suggestion

**Mirror the pool limits for the auto-save connection.**

Same rationale as above; keep the connection count at 1.

Apply this diff:

```diff
-		db = sql.OpenDB(connector)
+		db = sql.OpenDB(connector)
+		db.SetMaxOpenConns(1)
+		db.SetMaxIdleConns(1)
+		db.SetConnMaxLifetime(0)
```
🤖 Prompt for AI Agents

```text
In builder.go around lines 488 to 510, the auto-save connection created with
sql.OpenDB(connector) does not mirror the single-connection pool limits used
elsewhere; after creating db = sql.OpenDB(connector), set the connection pool
limits to a single connection by calling db.SetMaxOpenConns(1) and
db.SetMaxIdleConns(1) (and optionally SetConnMaxIdleTime/SetConnMaxLifetime to
match other usage if present) before streaming files/readers so the auto-save
connection remains limited to one connection.
```

Comment on lines +392 to +400
- **io.Reader** (`AddReader`): **Only supports output directory mode**
```go
// ❌ Build error - overwrite mode not supported
builder.AddReader(reader, "table", model.FileTypeCSV).EnableAutoSave("")

// ✅ Must specify output directory
builder.AddReader(reader, "table", model.FileTypeCSV).EnableAutoSave("./output")
```


⚠️ Potential issue

**Docs reference the internal symbol `model.FileTypeCSV`; use the public API (likely `filesql.FileTypeCSV`).**

End users shouldn't import internal packages. The snippet should reference an exported constant from the public package.

```diff
-  builder.AddReader(reader, "table", model.FileTypeCSV).EnableAutoSave("")
+  builder.AddReader(reader, "table", filesql.FileTypeCSV).EnableAutoSave("")
@@
-  builder.AddReader(reader, "table", model.FileTypeCSV).EnableAutoSave("./output")
+  builder.AddReader(reader, "table", filesql.FileTypeCSV).EnableAutoSave("./output")
```

If the exported identifier differs (e.g., `filesql.CSV`), adjust accordingly.

🤖 Prompt for AI Agents

```text
In README.md around lines 392 to 400, the example references the internal symbol
model.FileTypeCSV; replace it with the exported public constant from the public
package (e.g., filesql.FileTypeCSV or filesql.CSV) and update the code snippet
import and usage accordingly — change the snippet to import the public package
(filesql) instead of model, use the correct exported identifier, and verify the
README compiles against the public API shown to avoid suggesting internal
package usage.
```
