Spatial Data Science Data Model¶
Installation¶
You can install SDS Data Model via pip from GitHub:
$ pip install git+https://github.com/Defra-Data-Science-Centre-of-Excellence/sds-data-model
However, if you’re installing within a Databricks notebook, you’ll need to include your Personal Access Token (PAT):
$ pip install git+https://<YOUR_PAT>@github.com/Defra-Data-Science-Centre-of-Excellence/sds-data-model
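If you would rather not paste the token directly into the command, one option is to build the URL from an environment variable. The following is a minimal sketch only: the helper name `sds_install_url` and the variable name `GITHUB_PAT` are assumptions made for illustration and are not part of this project.

```shell
# Hypothetical helper: build the pip install URL, optionally with a token.
sds_install_url() {
  # $1: GitHub Personal Access Token (pass an empty string for anonymous access)
  repo="github.com/Defra-Data-Science-Centre-of-Excellence/sds-data-model"
  if [ -n "$1" ]; then
    printf 'git+https://%s@%s\n' "$1" "$repo"
  else
    printf 'git+https://%s\n' "$repo"
  fi
}

# Usage (GITHUB_PAT is an assumed variable name; export it beforehand):
# pip install "$(sds_install_url "$GITHUB_PAT")"
```

Keeping the token in a variable means the same command works with or without a PAT.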
Aim¶
Please see the Aim page for details.
Usage¶
Please see the Usage page for details.
Local Development¶
To ensure compatibility with Databricks Runtime 10.4 LTS, this package was developed on a Linux machine running Ubuntu 20.04 LTS, using Python 3.8.10, GDAL 3.4.3, and Spark 3.2.1.
Install Python 3.8.10 using pyenv¶
See the pyenv-installer’s Installation / Update / Uninstallation instructions.
Install Python 3.8.10 with pyenv:
pyenv install 3.8.10
Then set it as the local version for the repository you’re using:
pyenv local 3.8.10
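`pyenv local` records the chosen version in a `.python-version` file in the current directory. As a rough sketch, you could confirm that the active interpreter matches that pin; the helper name `python_version_matches` is hypothetical and only for illustration.

```shell
# Hypothetical helper: succeed if the pinned and active versions match.
python_version_matches() {
  # $1: path to a .python-version file
  # $2: output of `python --version`, e.g. "Python 3.8.10"
  pinned="$(head -n 1 "$1")"
  active="${2#Python }"
  [ "$pinned" = "$active" ]
}

# Only run the check when both a pin and an interpreter are present.
if [ -f .python-version ] && command -v python >/dev/null 2>&1; then
  if python_version_matches .python-version "$(python --version 2>&1)"; then
    echo "Python version matches the pin"
  else
    echo "Python version differs from the pin" >&2
  fi
fi
```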
Install GDAL 3.4.3¶
Add the UbuntuGIS unstable Personal Package Archive (PPA) and update your package list:
sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable \
 && sudo apt-get update
Install GDAL 3.4.3. I found I also had to install python3-gdal (even though poetry installs it in a virtual environment later) to avoid version conflicts:
sudo apt-get install -y gdal-bin=3.4.3+dfsg-1~focal0 \
 libgdal-dev=3.4.3+dfsg-1~focal0 \
 python3-gdal=3.4.3+dfsg-1~focal0
Verify the installation:
ogrinfo --version
# GDAL 3.4.3, released 2022/04/22
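If you want to script this check, the version number can be pulled out of the `ogrinfo --version` output and compared with the expected value. This is a sketch under the assumption that the output has the form shown above; the helper name `gdal_version_from` is hypothetical.

```shell
# Hypothetical helper: extract "3.4.3" from "GDAL 3.4.3, released 2022/04/22".
gdal_version_from() {
  # $1: the full line printed by `ogrinfo --version`
  printf '%s\n' "$1" | sed -n 's/^GDAL \([0-9][0-9.]*\),.*/\1/p'
}

expected="3.4.3"
if command -v ogrinfo >/dev/null 2>&1; then
  found="$(gdal_version_from "$(ogrinfo --version)")"
  if [ "$found" = "$expected" ]; then
    echo "GDAL $found OK"
  else
    echo "Expected GDAL $expected, found '$found'" >&2
  fi
else
  echo "ogrinfo not found on PATH" >&2
fi
```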
Install poetry 1.2¶
See poetry’s osx / linux / bashonwindows install instructions.
Install Java¶
Java is required for Spark to work correctly. This guide details Java installation on Ubuntu.
The required commands are:
sudo apt install default-jre \
  default-jdk
Check that both the runtime environment (JRE) and the development kit (JDK) are installed:
java -version
# openjdk version "11.0.14" 2022-01-18
# OpenJDK Runtime Environment (build 11.0.14+9-Ubuntu-0ubuntu2)
# OpenJDK 64-Bit Server VM (build 11.0.14+9-Ubuntu-0ubuntu2, mixed mode, sharing)
javac -version
# javac 11.0.14
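The example output above shows Java 11. If you need the major version in a script, it can be derived from the version string. This is only a sketch: the helper name `java_major_version` is hypothetical, and it assumes the version is the first quoted string printed by `java -version`, as in the output above.

```shell
# Hypothetical helper: reduce a Java version string to its major version.
java_major_version() {
  # $1: version string such as "11.0.14" or the legacy form "1.8.0_312"
  case "$1" in
    1.*) v="${1#1.}"; printf '%s\n' "${v%%.*}" ;;
    *)   printf '%s\n' "${1%%[.+-]*}" ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr; take the first quoted string on line 1.
  raw="$(java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p')"
  echo "Java major version: $(java_major_version "$raw")"
else
  echo "java not found on PATH" >&2
fi
```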
Clone this repository¶
git clone https://github.com/Defra-Data-Science-Centre-of-Excellence/sds-data-model.git
Install dependencies using poetry¶
poetry install
License¶
Distributed under the terms of the MIT license, SDS Data Model is free and open source software.
Issues¶
If you encounter any problems, please file an issue along with a detailed description.