
Datasets and Packages Used

Datasets

  • The text of Moby Dick (eBook #2701) can be downloaded from Project Gutenberg

wget "https://gutenberg.org/cache/epub/2701/pg2701.txt"
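Once pg2701.txt is downloaded, a quick sanity check is to count its most frequent words in plain Python. This is only a minimal sketch, not the tutorial's mrjob or pyspark code. The `top_words` helper and its simple tokenizer (letters plus inner apostrophes) are illustrative assumptions.

```python
import re
from collections import Counter
from pathlib import Path


def top_words(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent lowercase words in text."""
    # Simplifying assumption: a word is a run of letters, optionally with inner apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words).most_common(n)


if __name__ == "__main__":
    # pg2701.txt is the file name produced by the wget command above.
    path = Path("pg2701.txt")
    if path.exists():
        # utf-8-sig drops a byte-order mark if the file starts with one.
        text = path.read_text(encoding="utf-8-sig")
        for word, count in top_words(text):
            print(f"{word}\t{count}")
    else:
        print("pg2701.txt not found; run the wget command first")
```

Ties in the counts keep the order in which words first appear, because that is how `Counter.most_common` orders equal counts.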
  • WTA tennis matches can be downloaded from Jeff Sackmann's tennis_wta GitHub repository

wget "https://github.com/JeffSackmann/tennis_wta/archive/refs/heads/master.zip"
unzip master.zip
The dataset contains .csv files with WTA matches from 1968 until 2023.
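To check that the archive unpacked correctly, the yearly match files can be read with Python's csv module. The sketch below counts wins per player. It assumes that the archive unzips to a tennis_wta-master folder, that the tour-level files are named like wta_matches_2023.csv, and that they have a winner_name column. The `count_wins` helper is hypothetical, so adjust the pattern and column name to match your copy.

```python
import csv
import glob
import os
from collections import Counter


def count_wins(data_dir: str, pattern: str = "wta_matches_[0-9][0-9][0-9][0-9].csv") -> Counter:
    """Count match wins per player across the yearly match files in data_dir."""
    wins = Counter()
    # Assumed file-name pattern: four-digit year files only, so other file families are skipped.
    for path in sorted(glob.glob(os.path.join(data_dir, pattern))):
        # errors="replace" keeps the sketch from crashing on stray non-UTF-8 bytes.
        with open(path, newline="", encoding="utf-8", errors="replace") as f:
            for row in csv.DictReader(f):
                name = row.get("winner_name")  # assumed column name
                if name:
                    wins[name] += 1
    return wins


if __name__ == "__main__":
    # "tennis_wta-master" is the folder name assumed for the unzipped archive.
    for player, n in count_wins("tennis_wta-master").most_common(5):
        print(f"{player}\t{n}")
```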

  • The Iris flower dataset is available from many sources; in this tutorial I used the version from Kaggle
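As a first look at the data, per-species averages can be computed with the standard library alone. This sketch assumes the Kaggle copy is saved as Iris.csv with the columns Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm and Species; check your download, since other sources name the columns differently. The `species_means` helper is a hypothetical illustration.

```python
import csv
import os
from collections import defaultdict

# Assumed column names of the Kaggle Iris.csv; adjust if your copy differs.
FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]


def species_means(path: str) -> dict[str, dict[str, float]]:
    """Return the mean of each feature column, grouped by the Species column."""
    sums = defaultdict(lambda: [0.0] * len(FEATURES))
    counts = defaultdict(int)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            species = row["Species"]
            counts[species] += 1
            for i, col in enumerate(FEATURES):
                sums[species][i] += float(row[col])
    # Divide each running sum by the number of rows seen for that species.
    return {
        sp: {col: sums[sp][i] / counts[sp] for i, col in enumerate(FEATURES)}
        for sp in counts
    }


if __name__ == "__main__":
    if os.path.exists("Iris.csv"):
        for species, means in species_means("Iris.csv").items():
            print(species, {k: round(v, 2) for k, v in means.items()})
    else:
        print("Iris.csv not found; download it first")
```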

Requirements

  • python3
  • mrjob: pip install mrjob
  • pyspark: pip install pyspark (PySpark also requires Java to be installed beforehand)
  • networkx: pip install networkx
  • matplotlib: pip install matplotlib
  • pandas: pip install pandas
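Before running the examples, it can help to confirm that everything above is installed. This sketch (the `check_environment` helper is hypothetical) uses importlib.util.find_spec to see whether each package is importable and shutil.which to look for a java executable on PATH. PySpark can also locate Java through JAVA_HOME, so a missing java entry is a hint rather than a hard failure.

```python
import importlib.util
import shutil

# Import names of the packages listed above (here they match the pip names).
PACKAGES = ("mrjob", "pyspark", "networkx", "matplotlib", "pandas")


def check_environment(packages=PACKAGES) -> dict[str, bool]:
    """Report which packages are importable and whether a java binary is on PATH."""
    # find_spec returns None for a top-level module that cannot be found.
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    # PySpark starts a JVM; this only checks PATH, not JAVA_HOME, so it is approximate.
    status["java"] = shutil.which("java") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```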

Literature