Add Chinese error message translations #3935
working version of chinese translation

I would love to contribute to my favorite and most-used package!

No problem, I am at your command.

Thanks for your research on this. I'd love to contribute!

About the package size: would it be possible to split the language files into a separate download?

I haven't had time to look at how this is implemented yet, but if the translation task does need many person-hours, it would be ideal if it could be distributed to volunteer translators with minimal setup requirements (I'd expect translators to be at least fairly familiar with R and data.table, but the steps mentioned above can be a little overwhelming). Would it be possible to create a table that translators can just fill in? One organizer could distribute the tasks to translators and standardize the frequently used words/concepts. This would make the translation part much easier. Then a script could read the table and create the needed files. I'm not sure whether this is possible or easy to implement, since I haven't looked at the details.
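A minimal sketch of the "script that reads the table" idea, written in C to match the package's `src/` code. It assumes a two-column table (English `msgid`, translated `msgstr`); `write_po_entry` is a hypothetical helper, not part of data.table or R's `tools` package. It turns one row into a PO entry, escaping the characters the PO format requires inside quotes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write s to out as a double-quoted PO string, escaping backslash,
   double quote, newline and tab. */
static void write_po_string(FILE *out, const char *s)
{
  fputc('"', out);
  for (; *s != '\0'; s++) {
    switch (*s) {
    case '\\': fputs("\\\\", out); break;
    case '"':  fputs("\\\"", out); break;
    case '\n': fputs("\\n", out);  break;
    case '\t': fputs("\\t", out);  break;
    default:   fputc(*s, out);     break;
    }
  }
  fputc('"', out);
}

/* Hypothetical helper: turn one row of the translators' table
   (English msgid, translated msgstr) into a PO entry. */
void write_po_entry(FILE *out, const char *msgid, const char *msgstr)
{
  assert(out != NULL && msgid != NULL && msgstr != NULL);
  fputs("msgid ", out);
  write_po_string(out, msgid);
  fputs("\nmsgstr ", out);
  write_po_string(out, msgstr);
  fputs("\n\n", out);
}
```

A driver would loop over the table's rows and call `write_po_entry` for each, after writing the usual PO header; the resulting `.po` file could then go through `tools::update_pkg_po('.')` like a hand-edited one.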

Yep, that's basically what I have in mind! First I want to reach out on Twitter for more potential translators; I'm just waiting to hear some first reactions from @jangorecki & @mattdowle

Sounds good to me! In #3937 I reduced the installed size by 1MB, to about 4.0MB. You mentioned a few size figures for

The idea is good.
Yes, this is a sticking point. I'm not sure of the best way around it. Right now, I think a good approach is to add a new start-up message that can point non-native users to the … Open to other suggestions here.

Another direction I think would be helpful is to provide documentation, vignettes, and an error-message table in multiple languages online (on the data.table website, the GitHub wiki, etc.). This does not rely at all on how R implements multilingual support, would not add to the package size, and is much easier to edit and contribute to.

To make error messages easy to look up on Google, is it practical to assign an error number to every message?

OK, here are some of my thoughts on how to proceed. I prefer to keep things pretty focused for now. As mentioned, we're the first major R package to try this, so we're in unexplored territory; I don't want to venture too far outside the status quo for package development while we figure out the tradeoffs of adding non-English support. So, as useful as I think things like a website, wiki, etc. in Chinese would be, let's pause on that for now. Re: adding numbered messages, I lean towards not doing it. I'm not sure how to scale that or how practical it is at this stage. (1) 15+ years of … Lastly, I think coordinating everything here on GitHub will get inconvenient/impractical very quickly. Instead, I propose using a (until now dormant) Slack channel to simplify things. This is for convenience/dev speed; however, I want to keep this as transparent as possible, (1) because of the importance of transparency in FOSS in general and (2) because such an archive could be quite useful to other R teams/packages considering a similar route in the future. Therefore, all chats should happen in public channels, and at the conclusion (or maybe regularly: if the volume of messages is large, we'll have to download occasionally given the free messaging limits in Slack), I'll post the full archive here as a text file. The current team of volunteers is: @renkun-ken @shrektan @dracodoc @hongyuanjia @Leo-Lee15 @amy17519 @zhiiiyang @GuangchuangYu Please contact me at my gmail address: MichaelChirico4 and I'll invite you to the channel (rdatatable.slack). Thanks so much, everyone! It's been very encouraging to see the outpouring of support :)

Where is
This latest commit simply applies section 1.8.1 of WRE (https://cran.r-project.org/doc/manuals/r-release/R-exts.html#C_002dlevel-messages). I've added this step to the instructions above. I think this makes sense as a single commit, though it does still need some touch-up.
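For reference, the WRE 1.8.1 pattern looks roughly like the sketch below. A real package header would also `#include <R.h>` first (which pulls in `Rconfig.h` and hence `ENABLE_NLS`); that is left out here so the snippet compiles standalone. `env_error_message` is a hypothetical example, not a data.table function.

```c
#include <assert.h>
#include <string.h>

/* WRE 1.8.1 pattern: with NLS enabled, _() looks the string up in the
   package's catalog; otherwise it is a no-op. In a package, #include <R.h>
   would come first to define ENABLE_NLS. */
#ifdef ENABLE_NLS
#include <libintl.h>
#define _(String) dgettext("data.table", String)
#else
#define _(String) (String)
#endif

/* Hypothetical example: wrapping the literal in _() is what lets
   `xgettext --keyword=_` extract it and routes it through the catalog. */
const char *env_error_message(void)
{
  return _("'env' should be an environment");
}
```

When no translation is found (or NLS is disabled), `_()` returns the English string unchanged, so wrapping messages is safe even before any `.po` file exists.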

@MichaelChirico I don't think we should be translating verbose messages; errors and warnings should be enough.
// starts can now be NA (<0): if (INTEGER(starts)[0]<0 || INTEGER(lens)[0]<0) error("starts[1]<0 or lens[1]<0");
if (!isNull(jiscols) && LENGTH(order) && !LOGICAL(on)[0]) error("Internal error: jiscols not NULL but o__ has length"); // # nocov
if (!isNull(xjiscols) && LENGTH(order) && !LOGICAL(on)[0]) error("Internal error: xjiscols not NULL but o__ has length"); // # nocov
if(!isEnvironment(env)) error("’env’ should be an environment");
Angled quotes are non-ASCII, which made xgettext fail.
* finish the Chinese message translation for group 17 * change as KingdaShi suggests Co-authored-by: Michael Chirico <[email protected]>
* message * message * added hongyuans review Co-authored-by: Michael Chirico <[email protected]>
* done some translation! * done some translation * messsage * messsage Co-authored-by: Michael Chirico <[email protected]>
*/
void frollmeanFast(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool narm, int hasna, bool verbose) {
  if (verbose)
    snprintf(end(ans->message[0]), 500, "frollmeanFast: running for input length %"PRIu64", window %d, hasna %d, narm %d\n", (uint64_t)nx, k, hasna, (int)narm);
Treat this more as a template, so the message doesn't need to be translated redundantly (see elsewhere in the same file, as well as frolladaptive).
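A sketch of the template approach, mirroring the `fadaptiverollmeanFast` change in this PR: the function name becomes a `%s` argument, so every rolling-mean variant shares one msgid. `format_running_message` is a hypothetical wrapper, and `_()` is stubbed out as a no-op here so the sketch compiles standalone.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define _(String) (String)  /* stand-in for the NLS macro; translation disabled */

/* One translatable template shared by all rolling-mean variants: the
   function name is passed as data rather than baked into the msgid. */
int format_running_message(char *buf, size_t size, const char *fun,
                           uint64_t nx, int hasna, bool narm)
{
  return snprintf(buf, size,
                  _("%s: running for input length %" PRIu64 ", hasna %d, narm %d\n"),
                  fun, nx, hasna, (int)narm);
}
```

Translators then see a single `%s: running for input length …` entry instead of one per function.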
* covering trivial translations from among the leftovers * progress on paring down leftover translations * completed translations!!! * adjustments
}
tt = vapply_1i(byval,length)
if (any(tt!=xnrow)) stop("The items in the 'by' or 'keyby' list are length (",paste(tt,collapse=","),"). Each must be length ", xnrow, "; the same length as there are rows in x (after subsetting if i is provided).")
if (any(tt!=xnrow)) stop(gettextf("The items in the 'by' or 'keyby' list are length(s) (%s). Each must be length %d; the same length as there are rows in x (after subsetting if i is provided).", paste(tt, collapse=","), xnrow, domain='R-data.table'))
Converted some messages that are awkward to translate in chunks into one formatted message, to be translated as a whole.
This should probably be done more broadly, but it's pretty manual, and Chinese grammar matches English closely enough. If we ever support Japanese, though, we'll probably need a complete overhaul...
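A C-level analogue of the same idea, sketched under assumptions (the names `join_lengths` and `by_length_message` are hypothetical, and `_()` is omitted). Keeping the whole sentence in one format string gives translators a single msgid; POSIX `%n$` positional specifiers (supported by glibc's printf family, not guaranteed by ISO C) additionally let a translation consume the arguments in a different order, which is the kind of flexibility a language like Japanese would need. R's `sprintf`, which `gettextf` calls, accepts the same `%n$` form.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join n lengths as "a,b,c" into buf, like paste(tt, collapse=",") in R.
   Output is silently truncated if buf is too small. */
static void join_lengths(char *buf, size_t size, const int *len, int n)
{
  size_t used = 0;
  buf[0] = '\0';
  for (int i = 0; i < n && used < size; i++) {
    int w = snprintf(buf + used, size - used, i > 0 ? ",%d" : "%d", len[i]);
    if (w < 0)
      break;
    used += (size_t)w;
  }
}

/* The whole sentence is one format string (one msgid). fmt is a parameter
   so a translated template may reorder arguments via %1$s / %2$d. */
int by_length_message(char *buf, size_t size, const char *fmt,
                      const int *len, int n, int nrow)
{
  char lens[256];
  join_lengths(lens, sizeof lens, len, n);
  return snprintf(buf, size, fmt, lens, nrow);
}
```

With the English template `"... length(s) (%1$s). Each must be length %2$d."` the lengths come first; a hypothetical translation could use `"%2$d ... (%1$s)"` without any change to the calling code.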
# as opposed to DT[order(.)] where na.last=TRUE, to be consistent with base
{
if (!is.data.frame(x)) stop("x must be a data.frame or data.table.")
if (!is.data.frame(x)) stop("x must be a data.frame or data.table")
The same message appears elsewhere without the trailing `.`; aligning them means only one translation is needed.

Hey @mattdowle, we are good to go! All translations completed 🎉

Under v1.12.8 the tar.gz size was 4842680 bytes, just under the 5MB CRAN limit. The … On the next submission we'll see if this new …

I can't see a NEWS item. Maybe you can add one in a follow-up PR. I imagine it'll be quite a big item, thanking all the translators too. It's a huge piece of work, so maybe you could convey that somehow with some statistics: the number of messages translated at the R and C levels, how long it took, the number of commits, the number of files and lines touched, etc.
  if (verbose)
    snprintf(end(ans->message[0]), 500, "fadaptiverollmeanFast: running for input length %"PRIu64", hasna %d, narm %d\n", (uint64_t)nx, hasna, (int) narm);
    snprintf(end(ans->message[0]), 500, _("%s: running for input length %"PRIu64", hasna %d, narm %d\n"), "fadaptiverollmeanFast", (uint64_t)nx, hasna, (int) narm);
  bool truehasna = hasna>0; // flag to re-run if NAs detected
OK, understood re this change. It will be interesting to see whether this passes Rtools ATC, re PR #4073 and the discussion here: #4073 (comment)
I ran it through win-builder ATC, and yes, it passes with no warnings. So it was something about `__func__`, not the `%s` before `PRIu64`. I'll add a link from the previous discussion pointing here too.
awesome, glad to hear

Yep, indeed I missed the NEWS item 🤦‍♂️ Will add a follow-up with tags for all the contributors as well as a summary.
Something I've been meaning to figure out for years and finally did. The hard part (setting up the `po` directory) is done; the rest is the manual work of actually translating strings.

Before proceeding, I wanted to file this and get some feedback about whether it's worth continuing.
I see benefits:
- … `po` directory). This would make `data.table` a leader in accessibility and openness in yet another dimension for the R community.

Costs:
- … `R/` and `src/` going forward. Hopefully minimal, but it's new territory: we'd be one of the only major packages stress-testing the tools R provides here. See the steps below for a bit more detail.
- … `po/` directory is included in the built package, but it would be 1000s of lines if so. The compressed file added to `inst/` is definitely included but is only 500B as of now. For comparison, the entirety of `r-source` has about 6.3MB of `.mo` files (`sum(file.info(list.files('~/github/r-source', pattern = '.mo$', recursive = TRUE, full.names = TRUE))$size)/1e6`), but I think 5KB is a very conservative estimate of how big our file could get, even with several other translations included.

If I've missed any other considerations, please file them below and/or edit them into this post.
cc @renkun-ken @shrektan @dracodoc: if you're willing to put any time/effort into filling out some translations, please say so here as well 😄 If we move forward, we can also post on Twitter, where I know we have several more followers from China and the Chinese diaspora.
The last consideration: why Chinese?
It's really about perceived utility and popularity. We could of course post a poll on Twitter before proceeding (or, more objectively: does CRAN offer statistics on downloads by IP?); the other two choices would seem to be Spanish or Japanese.
For my own reference as much as anything, here are some notes/steps for getting the package to a portable state (starting from a package with no portability, i.e. no `po` directory). WRE has a few sections on this, but I read them many times before figuring it all out (`data.table` can always be replaced by any other package name below):

1. Run `tools::update_pkg_po('.')`. This will create the initial `po` folder and the `po/R-data.table.pot` file, which has extracts of all arguments to `stop`/`warning`/`message` used in the `R` directory.
2. `src` is a lot more manual: any message that's to be translated needs to be wrapped in the macro function `_` (e.g. `error("this error")` is edited to `error(_("this error"))`); then we run `xgettext --keyword=_ -o data.table.pot *.c`, which will create a file analogous to `R-data.table.pot` for C-level messages. Some useful command line stuff for this:
3. … `.po` file in `po/` corresponding to a target language, e.g. `touch po/R-zh_CN.po` for Mandarin. The `R-` prefix means it corresponds to the R messages; see the `po/` directories in e.g. `r-source/src/library/{base,stats,utils,tools}` for locales already in use by `base`, or see the R manual for more official documentation of these codes.
4. … (`PO-Revision-Date`, `Last-Translator`, `Language-Team`, `Language` and the blank `msgid` & `msgstr` lines are necessary):
5. Run `tools::update_pkg_po('.')` again; this time, the captured messages and blank translations will be added to your `.po` file.
6. Add translations where the blank strings are in the `.po` file. Each string corresponds to one argument of e.g. `stop`, so `stop("hi", "you", "broke", "it")` in `R/` would create 4 entries in the `.pot` & `.po` files. This may make translation more difficult if the target language's grammar is different enough that the components can't be made to line up in order and make sense. I think the solution here is to use `gettext`/`ngettext`, but I'm not sure.
7. Save and run `tools::update_pkg_po('.')` again. This time, the `.po` file will be processed and the translations will be recorded in the `.mo` file in `inst/po`.
8. Reinstall `data.table` and run R in the target language. This can be done ad hoc by temporarily setting the `LANGUAGE` environment variable, e.g. `LANGUAGE=zh_CN.UTF-8 R --no-save` will start R in Chinese without editing the locale on your machine more deeply. Then trigger a few of the messages to make sure the process was successful.

NB: there was a bug in `tools` causing `update_pkg_po` to fail, which was fixed (as of this writing) extremely recently. If you encounter this bug, either install bleeding-edge R-devel or simply assign the corrected versions of `tools:::en_quote` and `tools::update_pkg_po` to your `.GlobalEnv` (as well as a few more functions internal to `tools`) and run them from the console instead of using the `tools` versions.

I think it would also be helpful to add a note to the start-up message about how to get from a non-English message to its English original. One way is to provide a link to the `.po` files explaining how to convert the error message to English before proceeding. Or perhaps numbering errors? I'm not sure of the best way here.
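One way to act on the "convert the error message to English" idea is a reverse lookup over msgid/msgstr pairs, assuming they are available in memory (e.g. parsed from the package's `.po` files). `po_entry` and `english_for` are hypothetical names for this sketch, not anything data.table or gettext provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory catalog entry: English msgid and its translation. */
typedef struct {
  const char *msgid;
  const char *msgstr;
} po_entry;

/* Return the English msgid whose translation equals `translated`, or NULL
   if nothing matches. A linear scan is enough for a one-off lookup. */
const char *english_for(const po_entry *catalog, size_t n, const char *translated)
{
  assert(catalog != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (strcmp(catalog[i].msgstr, translated) == 0)
      return catalog[i].msgid;
  }
  return NULL;
}
```

Exact matching only works for messages with no substituted values; a formatted message (one with `%s`/`%d` filled in) would need pattern matching against the template, which is one argument for the numbered-errors alternative.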