
Commit 8f1ca78
Merge pull request #86 from andrewdelman/cloud_compatibility
ecco_access pseudo-package for access modules
2 parents: 3d02248 + 363bf81

33 files changed (+6325 additions, −613 deletions)

Intro_to_PO_Tutorials/Geostrophic_balance.ipynb
Lines changed: 2 additions & 2 deletions

@@ -108,7 +108,7 @@
 "\n",
 "### Download to your local machine\n",
 "\n",
-"This sub-section uses the `ecco_download.py` module to help get the datasets needed for this tutorial onto your local machine (laptop, university computing server, etc.) where you can use them offline. Download the module using [this link](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ECCO-ACCESS/ecco_download.py), and then run the cells below.\n",
+"This sub-section uses the `ecco_download.py` module to help get the datasets needed for this tutorial onto your local machine (laptop, university computing server, etc.) where you can use them offline. Download the module using [this link](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ecco_access/ecco_download.py), and then run the cells below.\n",
 "\n",
 "> Tip: If you are running these tutorials on an instance in the Amazon Web Services (AWS) cloud, skip these cells and instead run the next section, [Download to your AWS instance](https://ecco-v4-python-tutorial.readthedocs.io/Geostrophic_balance.html#Download-to-your-AWS-instance)."
 ]
@@ -320,7 +320,7 @@
 "\n",
 "### Download to your AWS instance\n",
 "\n",
-"This sub-section uses the `ecco_s3_retrieve.py` module to download to your instance (or open remotely) the datasets you need for this tutorial. If you followed the setup instructions in the [AWS Cloud setup](https://ecco-v4-python-tutorial.readthedocs.io/AWS_Cloud_getting_started.html) tutorial you already have access to this module, or you can download it [here](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ECCO-ACCESS/ecco_s3_retrieve.py). Let's query the syntax of the `ecco_podaac_s3_get_diskaware` function that we will use to download or access the files. This function first assesses the disk space available on your instance, and downloads them to your instance if there is sufficient space, or opens them remotely on S3 otherwise.\n",
+"This sub-section uses the `ecco_s3_retrieve.py` module to download to your instance (or open remotely) the datasets you need for this tutorial. If you followed the setup instructions in the [AWS Cloud setup](https://ecco-v4-python-tutorial.readthedocs.io/AWS_Cloud_getting_started.html) tutorial you already have access to this module, or you can download it [here](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ecco_access/ecco_s3_retrieve.py). Let's query the syntax of the `ecco_podaac_s3_get_diskaware` function that we will use to download or access the files. This function first assesses the disk space available on your instance, and downloads them to your instance if there is sufficient space, or opens them remotely on S3 otherwise.\n",
 "\n",
 "> Tip: In future tutorials, you will see there is a boolean variable `incloud_access` that is usually set to `False` by default. If you set this variable to `True`, the `ecco_s3_retrieve.py` module will access the datasets on the cloud properly from your instance."
 ]

Intro_to_PO_Tutorials/Steric_height.ipynb
Lines changed: 1 addition & 1 deletion

@@ -79,7 +79,7 @@
 "- **ECCO_L4_SSH_LLC0090GRID_MONTHLY_V4R4** (Jan 2000)\n",
 "- **ECCO_L4_TEMP_SALINITY_LLC0090GRID_MONTHLY_V4R4** (Jan 2000)\n",
 "\n",
-"To download these datasets to your local machine, you can use the [ecco_download.py](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ECCO-ACCESS/ecco_download.py) module. See the [previous tutorial](https://ecco-v4-python-tutorial.readthedocs.io/Geostrophic_balance.html#Download-the-ECCO-output) or the [ECCO download](https://ecco-v4-python-tutorial.readthedocs.io/Downloading_ECCO_Datasets_from_PODAAC_Python.html) for more info on how to use this module. If you are working on an AWS instance, just set `incloud_access = True` in the 2nd cell below and the datasets will be downloaded or opened remotely using the `ecco_s3_retrieve.py` module."
+"To download these datasets to your local machine, you can use the [ecco_download.py](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ecco_access/ecco_download.py) module. See the [previous tutorial](https://ecco-v4-python-tutorial.readthedocs.io/Geostrophic_balance.html#Download-the-ECCO-output) or the [ECCO download](https://ecco-v4-python-tutorial.readthedocs.io/Downloading_ECCO_Datasets_from_PODAAC_Python.html) for more info on how to use this module. If you are working on an AWS instance, just set `incloud_access = True` in the 2nd cell below and the datasets will be downloaded or opened remotely using the `ecco_s3_retrieve.py` module."
 ]
 },
 {

Intro_to_PO_Tutorials/Thermal_wind.ipynb
Lines changed: 1 addition & 1 deletion

@@ -63,7 +63,7 @@
 "- **ECCO_L4_DENS_STRAT_PRESS_LLC0090GRID_MONTHLY_V4R4** (Jan 2000)\n",
 "- **ECCO_L4_GEOMETRY_LLC0090GRID_V4R4** (no time dimension; can use any time 1992-2017 in download functions)\n",
 "\n",
-"To download these datasets to your local machine, you can use the [ecco_download.py](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ECCO-ACCESS/ecco_download.py) module. See the [previous tutorial](https://ecco-v4-python-tutorial.readthedocs.io/Geostrophic_balance.html#Download-the-ECCO-output) or the [ECCO download](https://ecco-v4-python-tutorial.readthedocs.io/Downloading_ECCO_Datasets_from_PODAAC_Python.html) for more info on how to use this module. If you are working on an AWS instance, just set `incloud_access = True` in the 2nd cell below and the datasets will be downloaded or opened remotely using the `ecco_s3_retrieve.py` module.\n",
+"To download these datasets to your local machine, you can use the [ecco_download.py](https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Python-Tutorial/master/ecco_access/ecco_download.py) module. See the [previous tutorial](https://ecco-v4-python-tutorial.readthedocs.io/Geostrophic_balance.html#Download-the-ECCO-output) or the [ECCO download](https://ecco-v4-python-tutorial.readthedocs.io/Downloading_ECCO_Datasets_from_PODAAC_Python.html) for more info on how to use this module. If you are working on an AWS instance, just set `incloud_access = True` in the 2nd cell below and the datasets will be downloaded or opened remotely using the `ecco_s3_retrieve.py` module.\n",
 "\n",
 "We are going to load the density dataset and plot density on longitude-depth axes following the line in the grid nearest to $26^{o}$ N, i.e., ```i = 73```."
 ]
(Three one-line files follow; their filenames were lost in the page extraction.)

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-../ECCO-ACCESS/Cloud_access_to_ECCO_datasets/Tutorial_AWS_Cloud_getting_started.ipynb
+../ecco_access/Cloud_access_to_ECCO_datasets/Tutorial_AWS_Cloud_getting_started.ipynb

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-../ECCO-ACCESS/Downloading_ECCO_datasets_from_PODAAC/Tutorial_Python3_Jupyter_Notebook_Downloading_ECCO_Datasets_from_PODAAC.ipynb
+../ecco_access/Downloading_ECCO_datasets_from_PODAAC/Tutorial_Python3_Jupyter_Notebook_Downloading_ECCO_Datasets_from_PODAAC.ipynb

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-../ECCO-ACCESS/Downloading_ECCO_datasets_from_PODAAC/Tutorial_Python3_Downloading_ECCO_Subsets.ipynb
+../ecco_access/Downloading_ECCO_datasets_from_PODAAC/Tutorial_Python3_Downloading_ECCO_Subsets.ipynb

Tutorials_as_Jupyter_Notebooks/ECCO_v4_Heat_budget_closure.ipynb
Lines changed: 2 additions & 2 deletions

@@ -2149,7 +2149,7 @@
 "ds['DFrE_TH'] = ds.DFrE_TH.where(ecco_grid.hFacC.values > 0,0)\n",
 "ds['DFrI_TH'] = ds.DFrI_TH.where(ecco_grid.hFacC.values > 0,0)\n",
 "\n",
-"# Load monthly averages of vertical diffusive fluxes\n",
+"# tranpose dimensions\n",
 "ds['DFrE_TH'] = ds.DFrE_TH.transpose('time','tile','k_l','j','i')\n",
 "ds['DFrI_TH'] = ds.DFrI_TH.transpose('time','tile','k_l','j','i')\n",
 "\n",
@@ -2925,7 +2925,7 @@
 " {'time':str(curr_time_isel[0])+':'+str(curr_time_isel[-1]+1)},\\\n",
 " function(ds,ecco_vars_int_pickled,vol,time_isel=curr_time_isel,k_isel=k_isel))\n",
 " ds_to_write[varname].to_dataset().to_zarr(save_location,mode=\"a\")\n",
-" ds_to_write.close() \n",
+" ds_to_write.close()\n",
 "\n",
 "\n",
 "# the zarr archive will occupy ~15 GB, so require 20 GB free storage as a buffer\n",
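The second hunk above appends one time chunk at a time to a zarr archive (`to_zarr(save_location, mode="a")`) so that only a single chunk of budget terms is ever held in memory. A stdlib-only sketch of that append pattern, with a plain text file standing in for the zarr store and a toy computation standing in for the budget terms (all names here are hypothetical):

```python
def process_chunk(times):
    # stand-in for computing budget terms over a slice of timesteps
    return [t * 2.0 for t in times]

all_times = list(range(12))   # e.g. 12 monthly means
chunk_size = 4

# append each processed chunk to the store, so peak memory stays at one chunk
with open("archive.txt", "w") as store:
    for start in range(0, len(all_times), chunk_size):
        chunk = all_times[start:start + chunk_size]
        result = process_chunk(chunk)
        store.write(",".join(str(v) for v in result) + "\n")
```

With real ECCO output the same loop would slice the dataset by time and call `to_zarr` in append mode instead of writing text, which is why the notebook can build a ~15 GB archive without holding it in memory.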

Tutorials_as_Jupyter_Notebooks/ECCO_v4_Memory_management.ipynb
Lines changed: 3 additions & 3 deletions

@@ -1035,7 +1035,7 @@
 "id": "1a84fa35-3bcd-4a59-a2a7-b07988f0aadd",
 "metadata": {},
 "source": [
-"The same procedure did not help to free up the memory from `copy_array_3`, which was probably too interconnected with Dask since a Dask array was created from it. Usually this is not a problem for memory management since we are more likely to load Dask arrays from files, rather than creating them from NumPy arrays already in memory.\n",
+"So we were able to recover the memory from `copy_array_3` as well.\n",
 "\n",
 "Let's take a look at the memory log and compare the case that did not involve Dask with the one that did."
 ]
@@ -1089,7 +1089,7 @@
 "id": "6cc6978b-c31b-458b-a4e4-402588498f67",
 "metadata": {},
 "source": [
-"With NumPy only (no Dask) we were able to recover essentially all of the memory we used during the computation. With Dask we achieved a net loss of ~10 MB at entry #4 on the right (just over the size of one of the arrays) before losing more memory when saving `copy_array_3` to disk.\n",
+"So in both cases we were able to complete the calculation and recover the memory used (i.e., make it available again). The procedure was just a little more complicated when using Dask arrays.\n",
 "\n",
 "It is worth remembering that Dask can also significantly *help* you limit the memory usage in your workspace (as we saw earlier with a [reduction operation](#Dask-arrays-and-delayed-computations))...even though you relinquish some control over memory management when using it."
 ]
@@ -2240,7 +2240,7 @@
 "id": "af74327d-83e3-4e3b-9ffe-6544c8134de7",
 "metadata": {},
 "source": [
-"In the code above we did explicitly something that Dask does behind the scenes...looped through chunks so that our data loads are done in smaller pieces. But the change in memory is about the same as before, because by the end of the loop all of the top 25 depth levels of `THETA` have been cached. \n",
+"In the code above we did explicitly something that Dask does behind the scenes...looped through chunks so that our data loads are done in smaller pieces. But the memory used is still a lot more than we should need for 1 depth level, because chunks were cached that include the top **25** depth levels of `THETA`.\n",
 "\n",
 "### A workaround to save workspace memory\n",
 "\n",
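The chunk-looping idea discussed in the last hunk (load data in smaller pieces instead of caching many depth levels at once) can be sketched with a plain generator: each level is materialized, consumed, and released before the next one is read. The function names and sizes below are illustrative, not the notebook's actual `THETA` chunking:

```python
def load_level(k):
    # stand-in for reading one depth level of a field from disk
    return [float(k)] * 1000     # one "level" of 1000 grid points

def levels(k_max):
    for k in range(k_max):
        yield load_level(k)      # only one level is alive per iteration

# reduce over 25 levels without ever holding them all simultaneously
total = sum(sum(level) for level in levels(25))
print(total)
```

This is the same trade the notebook describes: you do the chunk bookkeeping yourself, but peak memory is bounded by one chunk rather than by everything the backend decides to cache.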

0 commit comments