Spatial Data Science Data Model

Installation

You can install SDS Data Model via pip from GitHub:

$ pip install git+https://github.com/Defra-Data-Science-Centre-of-Excellence/sds-data-model

However, if you’re installing within a Databricks notebook, you’ll need to include your Personal Access Token (PAT):

$ pip install git+https://<YOUR_PAT>@github.com/Defra-Data-Science-Centre-of-Excellence/sds-data-model
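To avoid pasting the token into a notebook cell or shell history, one option is to read it from an environment variable. The sketch below assumes you have exported the token under the hypothetical name GITHUB_PAT; neither that variable nor the helper function is part of SDS Data Model.

```shell
# Repository path taken from the install command above.
repo="github.com/Defra-Data-Science-Centre-of-Excellence/sds-data-model"

# Build the pip install URL from a token held in GITHUB_PAT.
# GITHUB_PAT is an assumed variable name; use whatever your environment provides.
build_install_url() {
  if [ -z "${GITHUB_PAT:-}" ]; then
    echo "GITHUB_PAT is not set" >&2
    return 1
  fi
  printf 'git+https://%s@%s\n' "$GITHUB_PAT" "$repo"
}

# Usage (not run here):
#   url="$(build_install_url)" && pip install "$url"
```

Because the function returns a non-zero status when the token is missing, the `&&` in the usage line stops pip from running with a malformed URL.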

Aim

Please see Aim for details.

Usage

Please see Usage for details.

Local Development

To ensure compatibility with Databricks Runtime 10.4 LTS, this package was developed on a Linux machine running Ubuntu 20.04 LTS, using Python 3.8.10, GDAL 3.4.3, and Spark 3.2.1.

Install Python 3.8.10 using pyenv

See the pyenv-installer’s Installation / Update / Uninstallation instructions.

Install Python 3.8.10 globally:

pyenv install 3.8.10

Then set it as the local version for the repository you’re working in:

pyenv local 3.8.10

Install GDAL 3.4.3

Add the UbuntuGIS unstable Personal Package Archive (PPA) and update your package list:

sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable \
 && sudo apt-get update

Install GDAL 3.4.3. I found I also had to install python3-gdal to avoid version conflicts, even though I’m going to use poetry to install it in a virtual environment later:

sudo apt-get install -y gdal-bin=3.4.3+dfsg-1~focal0 \
 libgdal-dev=3.4.3+dfsg-1~focal0 \
 python3-gdal=3.4.3+dfsg-1~focal0

Verify the installation:

ogrinfo --version
# GDAL 3.4.3, released 2022/04/22
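If you want to script this check, for example in a setup script, the version line can be parsed and compared against the expected release. This is a minimal sketch; the function names are hypothetical, and it assumes the output follows the "GDAL X.Y.Z, released ..." format shown above.

```shell
expected_gdal="3.4.3"

# Extract the version number from a line like "GDAL 3.4.3, released 2022/04/22".
parse_gdal_version() {
  printf '%s\n' "$1" | sed -n 's/^GDAL \([0-9][0-9.]*\),.*/\1/p'
}

# Succeed only when the reported version matches the expected one.
check_gdal_version() {
  found="$(parse_gdal_version "$1")"
  if [ "$found" = "$expected_gdal" ]; then
    echo "GDAL $found OK"
  else
    echo "Expected GDAL $expected_gdal but found '$found'" >&2
    return 1
  fi
}

# Usage (not run here):
#   check_gdal_version "$(ogrinfo --version)"
```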

Install poetry 1.2

See poetry’s osx / linux / bashonwindows install instructions.

Install Java

Java is required for Spark to work correctly. This guide details Java installation on Ubuntu.

The required commands are:

sudo apt install default-jre \
  default-jdk

Check that both the runtime environment (jre) and development kit (jdk) are installed:

java -version
# openjdk version "11.0.14" 2022-01-18
# OpenJDK Runtime Environment (build 11.0.14+9-Ubuntu-0ubuntu2)
# OpenJDK 64-Bit Server VM (build 11.0.14+9-Ubuntu-0ubuntu2, mixed mode, sharing)

javac -version
# javac 11.0.14
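A quick way to confirm that both commands are on your PATH before moving on is sketched below. The helper name is hypothetical; it only checks that the executables can be found, not which version they report.

```shell
# Print the name of each tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage (not run here):
#   missing_tools java javac
# No output means both the runtime and the compiler were found.
```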

Clone this repository

git clone https://github.com/Defra-Data-Science-Centre-of-Excellence/sds-data-model.git

Install dependencies using poetry

poetry install

License

Distributed under the terms of the MIT license, SDS Data Model is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.
