Commit 69a6735
authored
Update special token handling in conversion scripts for gpt2 derived tokenizers (#3746)
We still have the heads up in `README.md` regarding `bpe` tokenizers and this patch is needed for
- a couple of tokenizer tests
- some more `special` and `non-special` added tokens handling (as far as I understand it)
* Update special token handling
* Add mpt1 parent 5be6c80 commit 69a6735
File tree
5 files changed
+56
-19
lines changed5 files changed
+56
-19
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
118 | 118 | | |
119 | 119 | | |
120 | 120 | | |
| 121 | + | |
121 | 122 | | |
122 | 123 | | |
123 | 124 | | |
124 | | - | |
125 | | - | |
126 | | - | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
127 | 137 | | |
128 | 138 | | |
129 | | - | |
130 | 139 | | |
131 | 140 | | |
132 | 141 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
123 | 123 | | |
124 | 124 | | |
125 | 125 | | |
| 126 | + | |
126 | 127 | | |
127 | 128 | | |
128 | 129 | | |
129 | | - | |
130 | | - | |
131 | | - | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
132 | 142 | | |
133 | 143 | | |
134 | | - | |
135 | 144 | | |
136 | 145 | | |
137 | 146 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
136 | 136 | | |
137 | 137 | | |
138 | 138 | | |
139 | | - | |
140 | 139 | | |
141 | | - | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
142 | 144 | | |
143 | 145 | | |
144 | 146 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
139 | 139 | | |
140 | 140 | | |
141 | 141 | | |
| 142 | + | |
142 | 143 | | |
143 | 144 | | |
144 | 145 | | |
145 | | - | |
146 | | - | |
147 | | - | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
148 | 158 | | |
149 | 159 | | |
150 | | - | |
151 | 160 | | |
152 | 161 | | |
153 | 162 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
111 | 111 | | |
112 | 112 | | |
113 | 113 | | |
| 114 | + | |
114 | 115 | | |
115 | 116 | | |
116 | 117 | | |
117 | | - | |
118 | | - | |
119 | | - | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
120 | 130 | | |
121 | 131 | | |
122 | | - | |
123 | 132 | | |
124 | | - | |
125 | 133 | | |
126 | 134 | | |
127 | 135 | | |
| |||
0 commit comments