
MacBook Pro Setup for Data Science at EnvReportBC

Andy Teucher edited this page Jul 22, 2019 · 13 revisions

Instructions for setting up a MacBook Pro in bcgov, tailored for data science workflows used by the Environmental Reporting BC team.

Software:

Point & Click:

  • R
  • RStudio
  • Sublime Text (paid licence; any text editor will do, including Visual Studio Code)
    • Packages (cmd+shift+p; MarkdownEditing, Markdown Preview, Gutter Color)
    • Add the subl command-line tool by running this in your Terminal:
      • ln -s /Applications/Sublime\ Text.app/Contents/SharedSupport/bin/subl /usr/local/bin/
  • Visual Studio Code
    • Extensions (Docker, Python, R, Git History)
  • Inkscape
  • Docker
  • Trello*
  • Meld - visual file diff tool (alternative to WinMerge)
  • Browsers (Chrome & Firefox)
  • VirtualBox (can also be done via brew cask install virtualbox in the Terminal)
    • Windows IE/Edge VMs can be downloaded here
    • For testing websites on Windows
  • Visual Git tool (SourceTree or Git Kraken)
  • Skype for Business
  • Cisco AnyConnect VPN - Remote Access User Guide and download link for AnyConnect on the Remote Access Services Page

* Available from the Mac App Store

Software Set-Up & Tweaks

Changing IDIR password

  • The system will send you a reminder when your IDIR password is due to be changed.
  • Change your IDIR password at: https://www.pwchange.gov.bc.ca
  • It is a good idea to change your Mac administrator password to match your IDIR password at the same time: System Preferences -> Users and Groups -> [Your login] -> Password
  • You will also need to change it in Outlook. Preferences -> Accounts

Safari:

  • Safari > Preferences > Advanced > Show full website address, Show Develop menu
  • Safari > View > Show status bar

Finder:

  • Show dotfiles: Cmd + Shift + . OR make them show all the time
  • Prevent .DS_Store files from being created on network shares:
    • defaults write com.apple.desktopservices DSDontWriteNetworkStores true

Alfred:

The Alfred App is super handy for quickly opening files/folders https://www.alfredapp.com/. You can see a demo and how to tailor the App in this YouTube video here.

Terminal:

Enable New Terminal at Folder: System Preferences > Keyboard > Shortcuts > Services (from here)

Git: Install and configure git

git --version # Prompts install of OSX commandline tools, including git
git config --global user.email "<your_email_address>"
git config --global user.name "Firstname Lastname"
git config --global core.editor "nano"
git config --global credential.helper osxkeychain
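
To confirm the settings took effect, a small check along these lines may help. This is only a sketch: `check_git_config` is a hypothetical helper name, and it simply reads back the four keys set above.

```shell
# Hypothetical helper: report which of the global git settings above are set.
# Returns 0 if all four keys are set, 1 if any is missing.
check_git_config() {
  missing=0
  for key in user.email user.name core.editor credential.helper; do
    # `git config --get` exits non-zero when the key has no value.
    if value=$(git config --global --get "$key"); then
      printf '%s = %s\n' "$key" "$value"
    else
      printf '%s is not set\n' "$key"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_git_config` in the Terminal prints one line per key, which makes a typo in the name or email easy to spot before the first commit.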

GitHub: Set-up GitHub authorization

Go to GitHub and create a new Personal Access Token with repo, user, and gist access. Copy the key to your clipboard. Now clone a repo from GitHub that you have access to, make a small change, commit and push. Enter your username, and paste the key in the password field. It should now be cached for future operations. More info here.

For connecting to GitHub through R/RStudio

Go to GitHub and create a new Personal Access Token with repo, user, and gist access. Copy the key to your .Renviron file (which you can access using usethis::edit_r_environ()) like this: GITHUB_PAT=your_token

Install Homebrew

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Homebrew Permissions issue:

https://stackoverflow.com/questions/16432071/how-to-fix-homebrew-permissions/16450503#16450503

The fix for the Homebrew permissions issue changed with High Sierra and Mojave: https://github.com/Homebrew/brew/issues/3228

High Sierra:

sudo chown -R $(whoami) /usr/local 

Mojave:

sudo chown -R $(whoami) $(brew --prefix)/*

Another fix, needed after upgrading to Mojave (https://tommcfarlin.com/reinstalling-git-on-mojave/):

xcode-select --install

Install a bunch of data-sciency tools

mostly taken from Bob Rudis' blog

brew tap caskroom/cask
 
brew install Caskroom/cask/xquartz
brew cask install java

# Tell R where Java is
R CMD javareconf JAVA_CPPFLAGS=-I/System/Library/Frameworks/JavaVM.framework/Headers
 
# For latex:  (takes a long time)
brew cask install mactex

# For pandoc citeproc (https://github.com/jgm/pandoc-citeproc)
brew install pandoc-citeproc

# Some libraries for common R packages
brew install libsvg curl libxml2 boost pandoc [email protected]

# R spatial stuff (sf etc. takes a long time) - only needed if you want to build sf from source
brew install pkg-config
brew install udunits
brew install gdal

# Install ESRI File GDB drivers so GDAL can write to .gdb
# requires you to add `export GDAL_DRIVER_PATH=/usr/local/lib/gdalplugins` to your ~/.bashrc
brew tap osgeo/osgeo4mac && brew tap --repair
brew install osgeo-gdal-filegdb

## Fira-code is a lovely coding font
brew tap caskroom/fonts
brew cask install font-fira-code

## imagemagick for command-line image manipulation and use of the 'magick' R package
brew install imagemagick

## mapshaper for editing Shapefile, GeoJSON, TopoJSON, CSV and several other data formats, written in JavaScript. 
## We use `rmapshaper` for most tasks, but `rmapshaper` has a size limit.

brew install node
npm install -g mapshaper
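
Once mapshaper is installed, a typical command-line use is simplifying a large layer and writing it out as GeoJSON. The sketch below wraps that in a function; `simplify_to_geojson` and the file names are hypothetical, and the 10% retention figure is just an illustrative choice.

```shell
# Hypothetical helper: simplify a spatial file, keeping 10% of removable
# vertices, and write the result as GeoJSON.
#   $1 = input file (e.g. a .shp), $2 = output file
simplify_to_geojson() {
  mapshaper "$1" -simplify 10% -o format=geojson "$2"
}
```

For example, `simplify_to_geojson big_layer.shp big_layer_simple.geojson` (both file names are placeholders).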

OpenShift: Install OpenShift command-line tools:

brew install openshift-cli

Homebrew maintenance:

To update homebrew and check what packages are available to update:

brew update

To upgrade packages:

brew upgrade

To clean up old versions and cached downloads:

brew cleanup

To help diagnose problems:

brew doctor

To do it all at once (this could take a while if big packages are updated)

brew update && brew upgrade && brew cleanup
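
If you run this routine often, it may be convenient to wrap it in a shell function, for example in ~/.bashrc. The following is a sketch only; `brew_maintain` is a hypothetical name.

```shell
# Hypothetical wrapper around the maintenance commands above.
brew_maintain() {
  if ! command -v brew >/dev/null 2>&1; then
    echo "brew not found; install Homebrew first" >&2
    return 1
  fi
  # Stop at the first step that fails.
  brew update && brew upgrade && brew cleanup || return 1
  # brew doctor exits non-zero when it finds potential problems, so run it
  # last and report that instead of failing the whole routine.
  brew doctor || echo "brew doctor reported warnings (see above)"
}
```

Running `brew_maintain` then performs the update, upgrade, cleanup, and diagnosis in one go.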

R

Setup local R library location

cd && mkdir Rlibrary
echo R_LIBS=~/Rlibrary >> .Renviron

Create a PAT on GitHub and set GITHUB_PAT in .Renviron (for use by devtools etc.):

cd && echo "GITHUB_PAT=<generated_github_pat>" >> .Renviron
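
Because `>>` appends, running these commands a second time (for instance after regenerating a token) leaves duplicate entries in .Renviron. One way around that is a helper that replaces an existing line instead of appending. This is a sketch; `set_renviron_var` is a hypothetical name, and it assumes one `KEY=value` pair per line.

```shell
# Hypothetical helper: set KEY=VALUE in an .Renviron-style file,
# replacing any existing line for KEY instead of appending a duplicate.
#   $1 = key, $2 = value, $3 = file (defaults to ~/.Renviron)
set_renviron_var() {
  key=$1
  value=$2
  file=${3:-$HOME/.Renviron}
  touch "$file"
  tmp=$(mktemp)
  # Keep every line that does not start with "KEY=" ...
  grep -v "^${key}=" "$file" > "$tmp" || true
  # ... then add the new value at the end.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # mktemp creates the file readable only by you, which suits a file holding a token.
  mv "$tmp" "$file"
}
```

Usage would look like `set_renviron_var R_LIBS "$HOME/Rlibrary"` and `set_renviron_var GITHUB_PAT "<generated_github_pat>"`. Restart R afterwards so the new values are picked up.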

Install R build tools:

Java

  • Download and install the Java 8 SDK
  • In the terminal, type: sudo R CMD javareconf
  • In R, type: install.packages("rJava", type = "source")

Homebrew Python 3:

brew install python

QGIS: Install QGIS

brew tap osgeo/osgeo4mac && brew tap --repair
brew install osgeo-qgis

It will likely complain at runtime that it needs additional Python packages. These can be installed with:

pip install psycopg2 matplotlib pyparsing requests future jinja2 pygments

Some shell/git tweaks

Add ~/bin to your path as a place to hold custom shell scripts, set editor to nano instead of vi:

cat << 'EOF' >> $HOME/.bashrc
# Set editor to nano instead of vi
export EDITOR="nano"
# put custom shell scripts etc in ~/bin dir, make sure available on PATH:
export PATH="$HOME/bin:$PATH"
EOF

cat << EOF >> $HOME/.bash_profile
if [ -f ~/.bashrc ]; then
   source ~/.bashrc
fi
EOF
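
After opening a new Terminal window, you can check that ~/bin actually made it onto your PATH. The snippet below is a sketch; `path_contains` is a hypothetical helper, not something the setup above installs.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in PATH.
path_contains() {
  # Wrap PATH in colons so the first and last entries match the same pattern.
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "$HOME/bin"; then
  echo "$HOME/bin is on PATH"
else
  echo "$HOME/bin is not on PATH; open a new terminal or run: source ~/.bashrc"
fi
```

Matching on whole colon-delimited entries avoids false positives from longer paths that merely start with the same text, such as ~/bin-old.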

PostgreSQL/PostGIS set-up:

Install

brew install postgresql
brew install postgis

Copy and paste (and hit return) the following script, which will set environment variables/create aliases that are used often for postgis stuff:

# EOF is quoted so that $PGDATA in the aliases is written literally, not expanded now
cat <<'EOF' >> $HOME/.bashrc
# save pg typing
export PGDATA=/usr/local/var/postgres
export PGHOST=localhost
export PGPORT=5432
# export PGDATABASE=postgis
export PGUSER=postgres

# Fiona wants PROJ_LIB set
export PROJ_LIB="/usr/local/share/proj"

alias pg_start='pg_ctl -D $PGDATA -l $PGDATA/server.log start'
alias pg_stop='pg_ctl -D $PGDATA stop -s -m fast'

# Add path to custom gdal drivers (ESRI File GDB)
export GDAL_DRIVER_PATH=/usr/local/lib/gdalplugins
EOF

Tune your postgresql installation:

These settings work well on a 2016 MacBook Pro with 16GB of memory. For more information on tuning, see the postgresql tuning wiki.

Copy and paste this script in the terminal:

cat << EOF > /usr/local/var/postgres/pgtune.conf
log_timezone = 'Canada/Pacific'
datestyle = 'iso, mdy'
timezone = 'Canada/Pacific'
lc_messages = 'en_US.UTF-8'      # locale for system error message
lc_monetary = 'en_US.UTF-8'      # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'      # locale for number formatting
lc_time = 'en_US.UTF-8'        # locale for time formatting
default_text_search_config = 'pg_catalog.english'
default_statistics_target = 100
log_min_duration_statement = 2000

max_connections = 100
max_locks_per_transaction = 64
dynamic_shared_memory_type = posix
checkpoint_timeout = 30min    # range 30s-1d
maintenance_work_mem = 2GB
effective_cache_size = 8GB
work_mem = 500MB
max_wal_size = 10GB
wal_buffers = 16MB
shared_buffers = 4GB # Min 128KB
EOF

cat << EOF >> /usr/local/var/postgres/postgresql.conf
# Include custom settings:
include = 'pgtune.conf'
EOF
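
On the 16GB machine these settings target, shared_buffers (4GB) is a quarter of memory and effective_cache_size (8GB) is half. If your machine has a different amount of RAM, scaling by the same ratios is a reasonable starting point. The sketch below assumes those two ratios; `pg_memory_settings` is a hypothetical helper and not part of PostgreSQL.

```shell
# Hypothetical helper: print starting values for a machine with $1 GB of RAM.
# Assumes shared_buffers = 25% and effective_cache_size = 50% of memory,
# which is what the 16GB settings above work out to.
pg_memory_settings() {
  ram_mb=$(( $1 * 1024 ))
  echo "shared_buffers = $(( ram_mb / 4 ))MB"
  echo "effective_cache_size = $(( ram_mb / 2 ))MB"
}

pg_memory_settings 16
```

For 16GB this prints 4096MB and 8192MB, matching the 4GB and 8GB values in pgtune.conf. The other settings (work_mem, maintenance_work_mem) depend more on workload and are not covered by this sketch.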

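And then this, to raise the kernel's shared-memory limits so PostgreSQL can use the memory configured above. Note that it overwrites any existing /etc/sysctl.conf (you will be prompted for your password):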

sudo bash -c 'cat > /etc/sysctl.conf' << EOF
kern.sysv.shmmax=17179869184
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.sysv.shmall=4194304
kern.maxprocperuid=512
kern.maxproc=2048
EOF
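
The two large numbers are related: kern.sysv.shmmax is a size in bytes (16GB here), and kern.sysv.shmall is the same amount expressed in memory pages. The arithmetic can be reproduced as follows, assuming a 4096-byte page size (check with `getconf PAGESIZE`); the variable names are only for illustration.

```shell
# shmmax is in bytes; shmall is in pages.
# Assumes a 4096-byte page, which can be confirmed with: getconf PAGESIZE
shm_gb=16
page_size=4096
shmmax=$(( shm_gb * 1024 * 1024 * 1024 ))
shmall=$(( shmmax / page_size ))
echo "kern.sysv.shmmax=$shmmax"
echo "kern.sysv.shmall=$shmall"
```

With `shm_gb=16` this reproduces the 17179869184 and 4194304 values used above; change `shm_gb` to derive values for a machine with a different amount of memory.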

Then back in the terminal:

initdb -E utf8 -U postgres -W
# at the prompt, set superuser password to 'postgres'
pg_start # start the database (can stop any time with pg_stop)
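
With the server running, a quick way to confirm everything works is to create a database and enable the PostGIS extension in it. The following is a sketch under stated assumptions: `pg_create_postgis_db` is a hypothetical helper, the default database name `postgis` is a guess based on the commented-out PGDATABASE line above, and the PGHOST/PGPORT/PGUSER variables are assumed to be set as in .bashrc.

```shell
# Hypothetical helper: create a database and enable PostGIS in it.
#   $1 = database name (defaults to "postgis")
pg_create_postgis_db() {
  db=${1:-postgis}
  # pg_isready exits non-zero if the server is not accepting connections.
  if ! pg_isready -q; then
    echo "server is not accepting connections; run pg_start first" >&2
    return 1
  fi
  createdb "$db" &&
    psql -d "$db" -c "CREATE EXTENSION IF NOT EXISTS postgis;"
}
```

Calling `pg_create_postgis_db` with no argument creates a database named postgis; pass a different name to create others.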

Sources: