DataCamp

DataCamp provides courses on a wide range of data science topics: beginner coding courses, introductions to new technologies, advanced coding courses, and non-coding courses aimed at data leaders, decision makers and managers. Courses are video- and text-based, with quizzes to check your knowledge. Courses that involve coding include a built-in coding environment, so there is no need to set one up yourself.

DataCamp licences are available to DEFRA staff (including arm's-length bodies, ALBs) from this link:
DataCamp sign up link

Training for beginners

If you are completely new to coding, you can get an overview of different coding languages through the Government Analysis Function here.

If you have not coded before, there are plenty of resources to get you started. We recommend the courses from the Government Analysis Function, which are open to analysts and scientists working in government.

R and RStudio training

The Government Analysis Function offers a whole host of introductory courses for R and RStudio, including:

Python training

As with R, the Government Analysis Function offers lots of courses in Python:

  • Introduction to Python
    • Perform basic coding
    • Import & export data
    • Manipulate basic data
    • Perform a linear regression
    • Create basic visualisations
  • Editing & imputation in Python
    • Identify missing values, visualise them & find duplicates in data.
    • Automatic editing: apply restrictions, check which restrictions have been violated and correct the values that break them.
    • Impute missing values using model-based methods such as mean, median, ratio & regression imputation.
    • Impute missing values using donor-based methods such as random hot-deck, sequential hot-deck, hierarchical hot-deck & k-nearest neighbours imputation.
  • Python control flow loops & functions
    • Use for & while loops
    • Use if, elif & else control flow
    • Write your own basic functions
    • Understand how functions can be applied to DataFrames
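To give a flavour of the editing and imputation topics listed above, here is a minimal sketch in plain Python. It uses only the standard library rather than a DataFrame library such as pandas, so it runs without extra installs. The records and the income field are made-up illustrative data, and it is not taken from the course. The random hot-deck shown is the simplest variant: each missing value is filled with a value drawn at random from all observed (donor) values, without imputation classes.

```python
import random
import statistics

# Hypothetical survey records; None marks a missing value.
records = [
    {"id": 1, "region": "North", "income": 21000},
    {"id": 2, "region": "North", "income": None},
    {"id": 3, "region": "South", "income": 35000},
    {"id": 3, "region": "South", "income": 35000},  # exact duplicate of the previous record
    {"id": 4, "region": "South", "income": None},
    {"id": 5, "region": "North", "income": 25000},
]


def row_key(row):
    """Hashable key for a record, so exact duplicates can be detected."""
    return tuple(sorted(row.items()))


def find_duplicates(rows):
    """Return every row that exactly repeats an earlier row."""
    seen = set()
    duplicates = []
    for row in rows:
        key = row_key(row)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


def drop_duplicates(rows):
    """Keep only the first occurrence of each distinct row."""
    seen = set()
    unique = []
    for row in rows:
        key = row_key(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute(rows, field, method="mean", seed=None):
    """Return copies of rows with missing values of `field` filled in.

    "mean" / "median": model-based, every gap gets the same summary value.
    "hot_deck": donor-based, each gap gets a randomly chosen observed value.
    """
    donors = [row[field] for row in rows if row[field] is not None]
    rng = random.Random(seed)
    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input records are left unchanged
        if new_row[field] is None:
            if method == "mean":
                new_row[field] = statistics.mean(donors)
            elif method == "median":
                new_row[field] = statistics.median(donors)
            elif method == "hot_deck":
                new_row[field] = rng.choice(donors)
            else:
                raise ValueError(f"unknown method: {method}")
        filled.append(new_row)
    return filled


clean = drop_duplicates(records)
missing = sum(1 for row in clean if row["income"] is None)
print("duplicates:", len(find_duplicates(records)))  # → duplicates: 1
print("missing income values:", missing)  # → missing income values: 2
print([row["income"] for row in impute(clean, "income", "mean")])
# → [21000, 27000, 35000, 27000, 25000]
print([row["income"] for row in impute(clean, "income", "hot_deck", seed=42)])
```

Mean and median imputation give every gap the same value, which shrinks the spread of the data; hot-deck keeps real observed values but depends on the random draw, which is why the sketch takes a seed for reproducibility.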

SQL training

The Government Analysis Function offers one course for SQL:

  • Foundations of SQL. This course gives you hands-on experience of querying, manipulating and editing databases with SQL. It uses the SQLite flavour of SQL to give you a solid foundation in the principles of relational databases and how to use them.
    • Query databases
    • Join tables
    • Edit tables
    • Manipulate databases
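The four skills listed above can be tried out locally with Python's built-in sqlite3 module, which uses SQLite, the same flavour as the course. This is a minimal sketch, not taken from the course materials; the sites and samples tables and their nitrate readings are hypothetical example data.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Manipulate the database: create two related tables.
cur.execute("CREATE TABLE sites (site_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, site_id INTEGER, "
    "nitrate REAL, FOREIGN KEY (site_id) REFERENCES sites (site_id))"
)
cur.executemany("INSERT INTO sites VALUES (?, ?)", [(1, "River A"), (2, "River B")])
cur.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(1, 1, 2.5), (2, 1, 3.1), (3, 2, 1.2)],
)

# Edit a table: correct a mis-keyed reading using a parameterised UPDATE.
cur.execute("UPDATE samples SET nitrate = ? WHERE sample_id = ?", (1.4, 3))

# Query with a join: average nitrate reading per site.
rows = cur.execute(
    """
    SELECT s.name, AVG(m.nitrate) AS mean_nitrate
    FROM sites AS s
    JOIN samples AS m ON m.site_id = s.site_id
    GROUP BY s.name
    ORDER BY s.name
    """
).fetchall()
conn.commit()
conn.close()

for name, mean_nitrate in rows:
    print(name, round(mean_nitrate, 2))
```

The ? placeholders let sqlite3 insert values safely rather than pasting them into the SQL string, which is good practice whichever database you query.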

More advanced training

Government Analysis Function courses

The Government Analysis Function also offers more advanced courses, for example:

Geospatial courses

There are lots of resources for geospatial analysis in Python or R. The DASH Platform has a geospatial cluster optimised for running geospatial analysis in Python, so geospatial code runs much faster on it. Here are some examples of available courses:

Resources for good practice

The following government resources cover good practice in writing code and carrying out analysis.

  • Quality Assurance of Code for Analysis and Research
    • This guide is about quality assurance of code in government
    • This includes core programming practices, structuring your project, code documentation, project documentation, version control, configuration, data management, peer review, and testing code
  • Reproducible Analytical Pipeline learning materials from the Data Science Campus
    • These training materials cover Reproducible Analytical Pipelines (RAP)
    • Includes an introduction to RAP, using the RMySQL R package, presentations and glossaries on git and on using git with R, a tutorial on creating Gantt charts in R, and a template for using R Markdown to create Word documents
  • Reproducible Analytical Pipeline companion
    • More on Reproducible Analytical Pipelines
    • Includes chapters on version control, packaging code, unit testing, automated testing, code coverage, dependency and reproducibility, quality assurance of the pipeline, and producing the publication
  • Spark at the ONS
    • For readers familiar with Python or R
    • Useful if you need to process big data using PySpark in Python or sparklyr in R
    • Includes Python and R code cells throughout the book to help explain the topics covered
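Testing code comes up in several of these resources, including the quality assurance guidance and the RAP companion. As an illustration of what a unit test looks like, here is a minimal sketch using Python's built-in unittest module. The function under test, percentage_change, is a hypothetical pipeline step and does not come from any of the resources above.

```python
import unittest


def percentage_change(old, new):
    """Hypothetical pipeline step: percentage change from old to new.

    Raises ValueError when old is zero, because the change is undefined.
    """
    if old == 0:
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) / old * 100


class TestPercentageChange(unittest.TestCase):
    """Each test checks one expected behaviour, including an edge case."""

    def test_increase(self):
        self.assertAlmostEqual(percentage_change(200, 250), 25.0)

    def test_decrease(self):
        self.assertAlmostEqual(percentage_change(200, 150), -25.0)

    def test_zero_baseline_raises(self):
        with self.assertRaises(ValueError):
            percentage_change(0, 10)
```

If you save this as, for example, test_percentage_change.py, running python -m unittest test_percentage_change from the same folder runs all three tests. Writing a test for the zero-baseline case documents that edge case explicitly, so a later change to the function cannot silently alter it.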