-
Notifications
You must be signed in to change notification settings - Fork 60
download subtitles and presentations and handle groups #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Awesome! Thanks for the great PR with these fixes and functionalities! I've added some comments on something to fix and some clarification needed on some parts. |
|
Is this being merged? |
| cookie["name"]: cookie["value"] for cookie in self._driver.get_cookies() | ||
| } | ||
| response = requests.get( | ||
| "https://echo360.net.au/user/enrollments", cookies=cookies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hardcoding the hostname. Should this comes from the user provided link?
| self._output_dir, "{0}".format(self._course.nice_name).strip() | ||
| ) | ||
| # replace invalid character for folder | ||
| self.regex_replace_invalid.sub("_", self._output_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this still be applied? to avoid invalid chars
| # no information about the course. | ||
| for v in self.course_data["data"]: | ||
| try: | ||
| self._course_name = v["lesson"]["video"]["published"]["courseName"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the enrollment methods always works for all institution? Should it be used as a fallback (one way or the other) instead?
| for ch in illegal_chars: | ||
| path = path.replace(ch, "_") | ||
|
|
||
| reserved_names = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what these means? what's the use of f_?
|
|
||
| def strip_illegal_path(path: str) -> str: | ||
| illegal_chars = '<>:"/\\|?*' + "".join(chr(c) for c in range(0, 32)) | ||
| for ch in illegal_chars: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't a regex be suitable for this?
| def __init__(self, video_json, driver, hostname, alternative_feeds, subtitles): | ||
| self.hostname = hostname | ||
| self._driver = driver | ||
| self._path_prefix = video_json["path_prefix"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this new dict item is set in line 212 inside an If branch. Does this always exists?
| else: | ||
| response = requests.get(media_url, cookies=cookies) | ||
| if response.status_code == 200: | ||
| print("> Downloading media {}...".format(media_filename)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the content has already been fully downloaded after the requests.get(..) in line 330. Either (i) move this print statement to be before the GET request, or (ii) use a HEAD request (and if 200) followed by a GET instead.
| head = requests.head(vtt_url, cookies=cookies) | ||
| if head.status_code == 200: | ||
| print( | ||
| f"Original subtitle name: {head.headers['Content-Disposition']}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does Original mean?
| return path | ||
|
|
||
|
|
||
| PERSISTENT_SESSION_FOLDER = "_browser_persistent_session" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep the constant near the top of the file?
|
Hey sorry not sure why github needs extra action to submit the review; the comments had stuck in limbo since Feb, thanks for pinging me. There's some changes needed, eg, some url / links are hard coded and will not work for other people in general. (e.g. Hostname in RP is using echo360.net.au) |
Apologies for putting multiple features in one PR, I was modifying it for my courses while scraping them so they were all added spontaneously.
Happy to split into multiple PRs, though I won't be able to for a while due to IRL stuff.
Features:
--subtitlesflag is passed and only downloaded if they do not exist (Resolves transcript download? #77)Fixes: