Describe the problem
TTY::File::CompareFiles#call seems to read each file in chunks of a fixed block size.
When a multibyte character (CJK character, emoji, etc.) straddles a block boundary, it is split between two chunks and comes out broken.
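To illustrate the suspected mechanism outside of TTY::File, here is a minimal sketch using only the Ruby standard library. The helper `read_blocks` and the tiny block size of 4 are made up for the demonstration; they are not taken from the TTY::File source.

```ruby
require "stringio"

# Hypothetical helper (not TTY::File code): read `io` in fixed-size
# byte blocks and tag every block as UTF-8, the way a naive
# block-based reader might.
def read_blocks(io, block_size)
  blocks = []
  while (chunk = io.read(block_size))
    blocks << chunk.force_encoding(Encoding::UTF_8)
  end
  blocks
end

# "aaa" is 3 bytes and "あ" is 3 bytes in UTF-8, so a 4-byte block
# ends after the first byte of "あ".
blocks = read_blocks(StringIO.new("aaaあい"), 4)

blocks.each do |block|
  puts "#{block.bytes.map { |b| format('%02X', b) }.join(' ')} valid=#{block.valid_encoding?}"
end
```

Every block here ends or starts in the middle of a character, so none of them is valid UTF-8 on its own, even though the concatenated bytes are.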
Steps to reproduce the problem
./diff-j.rb
diff 4096-a.txt and 4096-aj.txt
--- 4096-a.txt
+++ 4096-aj.txt
@@ -1 +1 @@
-aaa(repeats 4096 times )aaa�
@@ -1 +1 @@
-A
+��い
4096-a.txt
aaa(repeats 4096 times)aaaA
4096-aj.txt
aaa(repeats 4096 times)aaaあい
check
puts TTY::File.diff("4096-a.txt", "4096-aj.txt")
Actual behaviour
The multibyte character あ is split at the byte level across two blocks, so the output contains invalid bytes:
�
��い
Expected behaviour
./diff-j.rb
diff 4096-a.txt and 4096-aj.txt
--- 4096-a.txt
+++ 4096-aj.txt
@@ -1 +1 @@
-aaa(repeats 4096 times )aaa
@@ -1 +1 @@
-A
+あい
This looks hard to solve with the current implementation, which reads the files in fixed-size blocks.
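One possible direction, offered only as a sketch and assuming the input is UTF-8: keep the block reads, but hold back any incomplete trailing character and prepend it to the next block. The names `each_utf8_block` and `complete_utf8_length` are hypothetical and are not part of TTY::File.

```ruby
require "stringio"

# Hypothetical helper: number of leading bytes of `data` that do not
# end in the middle of a UTF-8 character.
def complete_utf8_length(data)
  size = data.bytesize
  # The lead byte of the last character is at most 3 bytes from the end
  # if that character is incomplete.
  1.upto([3, size].min) do |back|
    byte = data.getbyte(size - back)
    next if (byte & 0xC0) == 0x80 # continuation byte, keep looking back

    need = if byte >= 0xF0 then 4
           elsif byte >= 0xE0 then 3
           elsif byte >= 0xC0 then 2
           else 1
           end
    return need > back ? size - back : size
  end
  size
end

# Hypothetical helper: read `io` in blocks of `block_size` bytes, but
# yield only whole characters, carrying partial ones to the next block.
def each_utf8_block(io, block_size)
  return enum_for(__method__, io, block_size) unless block_given?

  carry = "".b
  while (chunk = io.read(block_size))
    data = carry + chunk.b
    cut = complete_utf8_length(data)
    yield data.byteslice(0, cut).force_encoding(Encoding::UTF_8) if cut > 0
    carry = data.byteslice(cut, data.bytesize - cut)
  end
  # Whatever is left at EOF is passed through unchanged.
  yield carry.force_encoding(Encoding::UTF_8) unless carry.empty?
end

p each_utf8_block(StringIO.new("aaaあい"), 4).to_a
# => ["aaa", "あ", "い"]
```

The yielded blocks vary slightly in size, which may or may not be acceptable for how CompareFiles pairs up blocks from the two files; that part would need checking against the actual implementation.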
Describe your environment
- OS version: Debian 11
- Ruby version: 2.7.4
- TTY::File version: 0.10.0
diff-j.zip