
Huge input causes OOM. #131

@kenmasumitsu

Description

With the change in "Correctly split text into sentences" (#204), SudachiTokenizer now analyzes all characters of the input (previously only the first 4096).

The change itself is fine, but it introduces an OOM issue: SudachiTokenizer.reset() analyzes the entire text up front and stores the result in an ArrayList<MorphemeList>, which runs out of memory when the input is huge.

I think it would be better to perform the analysis incrementally in SudachiTokenizer.incrementToken(), instead of all at once in SudachiTokenizer.reset(), the same way Lucene's StandardTokenizer.java works. A sketch of the idea is below.
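
Something like the following (a minimal sketch, not a patch against the actual SudachiTokenizer; the `tokenizeSentences(mode, Reader)` call returning a lazy `Iterable<MorphemeList>`, and the whole-input `begin()`/`end()` offsets, are assumptions here):

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Iterator;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

import com.worksap.nlp.sudachi.Morpheme;
import com.worksap.nlp.sudachi.MorphemeList;

// Sketch: reset() only prepares a lazy cursor; incrementToken() pulls one
// morpheme at a time, so no ArrayList<MorphemeList> covering the whole
// input is ever materialized.
public final class StreamingSudachiTokenizer extends Tokenizer {

  private final com.worksap.nlp.sudachi.Tokenizer sudachi;
  private final com.worksap.nlp.sudachi.Tokenizer.SplitMode mode;

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);

  private Iterator<MorphemeList> sentences = Collections.emptyIterator();
  private Iterator<Morpheme> morphemes = Collections.emptyIterator();
  private int lastOffset = 0;

  public StreamingSudachiTokenizer(com.worksap.nlp.sudachi.Tokenizer sudachi,
                                   com.worksap.nlp.sudachi.Tokenizer.SplitMode mode) {
    this.sudachi = sudachi;
    this.mode = mode;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    // Assumed API: a sentence-by-sentence Iterable over the Reader, so the
    // analyzer never holds more than one sentence's result at a time.
    sentences = sudachi.tokenizeSentences(mode, input).iterator();
    morphemes = Collections.emptyIterator();
    lastOffset = 0;
  }

  @Override
  public boolean incrementToken() throws IOException {
    // Advance to the next sentence only when the current one is exhausted.
    while (!morphemes.hasNext()) {
      if (!sentences.hasNext()) {
        return false; // whole input consumed
      }
      morphemes = sentences.next().iterator();
    }
    Morpheme m = morphemes.next();
    clearAttributes();
    termAtt.append(m.surface());
    // Assumes begin()/end() index into the whole input; if they are
    // sentence-relative, a running base offset would have to be added.
    offsetAtt.setOffset(correctOffset(m.begin()), correctOffset(m.end()));
    lastOffset = m.end();
    return true;
  }

  @Override
  public void end() throws IOException {
    super.end();
    int off = correctOffset(lastOffset);
    offsetAtt.setOffset(off, off);
  }
}
```

This mirrors how StandardTokenizer keeps only scanner state between incrementToken() calls instead of a buffered token list, so memory stays bounded by the largest sentence rather than the whole input.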
