Quick Start
Installation and usage
Data Integration
pip
We recommend using a python 3.8 virtual environment:
pip install pybind11
pip install frankgraphbench
Install the full dataset using bash scripts located at datasets/:
cd datasets
bash ml-100k.sh # Downloaded at `datasets/ml-100k` folder
bash ml-1m.sh # Downloaded at `datasets/ml-1m` folder
Usage:
data_integration [-h] -d DATASET -i INPUT_PATH -o OUTPUT_PATH [-ci] [-cu] [-cr] [-cs] [-map] [-enrich] [-w]
Arguments:
-h: Shows the help message.
-d: Name of a supported dataset. It will be the same name of the folder created by the bash script provided for the dataset. For now, check
data_integration/dataset2class.pyto see the supported ones.-i: Input path where the full dataset is placed.
-o: Output path where the integrated dataset will be placed.
-ci: Use this flag if you want to convert item data.
-cu: Use this flag if you want to convert user data.
-cr: Use this flag if you want to convert rating data.
-cs: Use this flag if you want to convert social link data.
-map: Use this flag if you want to map dataset items with DBpedia. At least the item data should be already converted.
-enrich: Use this flag if you want to enrich dataset with DBpedia.
-w: Choose the number of workers(threads) to be used for parallel queries.
Usage Example:
data_integration -d 'ml-100k' -i 'datasets/ml-100k' -o 'datasets/ml-100k/processed' \
-ci -cu -cr -map -enrich -w 8
source
Install the require packages using python virtualenv, using:
python3 -m venv venv_data_integration/
source venv_data_integration/bin/activate
pip3 install -r requirements_data_integration.txt
Install the full dataset using bash scripts located at datasets/:
cd datasets
bash ml-100k.sh # Downloaded at `datasets/ml-100k` folder
bash ml-1m.sh # Downloaded at `datasets/ml-1m` folder
Usage:
python3 src/data_integration.py [-h] -d DATASET -i INPUT_PATH -o OUTPUT_PATH [-ci] [-cu] [-cr] [-cs] [-map] [-enrich] [-w]
Arguments:
-h: Shows the help message.
-d: Name of a supported dataset. Will be the same name of the folder created by the bash script provided for the dataset. For now, check
data_integration/dataset2class.pyto see the supported ones.-i: Input path where the full dataset is placed.
-o: Output path where the integrated dataset will be placed.
-ci: Use this flag if you want to convert item data.
-cu: Use this flag if you want to convert user data.
-cr: Use this flag if you want to convert rating data.
-cs: Use this flag if you want to convert social link data.
-map: Use this flag if you want to map dataset items with DBpedia. At least the item data should be already converted.
-enrich: Use this flag if you want to enrich dataset with DBpedia.
-w: Choose the number of workers(threads) to be used for parallel queries.
Usage Example:
python3 src/data_integration.py -d 'ml-100k' -i 'datasets/ml-100k' -o 'datasets/ml-100k/processed' \
-ci -cu -cr -map -enrich -w 8
Check Makefile for more examples.
Framework
pip
For x86 windows and linux PCs we recommend using a python 3.8 virtual environment. For Apple Silicon we recommend a python 3.11 virtual environment and using the manual install with requirements_framework_apple.txt, however the deep_walk_based embedding method does not work in this version:
pip install pybind11
pip install frankgraphbench
Usage:
framework -c 'config_files/test.yml'
Arguments:
-c: Experiment configuration file path.
The experiment config file should be a .yaml file like this:
experiment:
dataset:
name: ml-100k
item:
path: datasets/ml-100k/processed/item.csv
extra_features: [movie_year, movie_title]
user:
path: datasets/ml-100k/processed/user.csv
extra_features: [gender, occupation]
ratings:
path: datasets/ml-100k/processed/rating.csv
timestamp: True
enrich:
map_path: datasets/ml-100k/processed/map.csv
enrich_path: datasets/ml-100k/processed/enriched.csv
remove_unmatched: False
properties:
- type: subject
grouped: True
sep: "::"
- type: director
grouped: True
sep: "::"
preprocess:
- method: filter_kcore
parameters:
k: 20
iterations: 1
target: user
split:
seed: 42
test:
method: k_fold
k: 2
level: 'user'
models:
- name: deepwalk_based
config:
save_weights: True
parameters:
walk_len: 10
p: 1.0
q: 1.0
n_walks: 50
embedding_size: 64
epochs: 1
evaluation:
k: 5
relevance_threshold: 3
metrics: [MAP, nDCG]
report:
file: 'experiment_results/ml100k_enriched/run1.csv'
See the config_files/ directory for more examples.
source
Install the require packages using python virtualenv, using:
python3 -m venv venv_framework/
source venv_framework/bin/activate
pip3 install -r requirements_framework.txt
Apple silicon:
python3 -m venv venv_framework/
source venv_framework/bin/activate
pip3 install -r requirements_framework_apple.txt
Usage:
python3 src/framework.py -c 'config_files/test.yml'
Arguments:
-c: Experiment configuration file path.
The experiment config file should be a .yaml file like this:
experiment:
dataset:
name: ml-100k
item:
path: datasets/ml-100k/processed/item.csv
extra_features: [movie_year, movie_title]
user:
path: datasets/ml-100k/processed/user.csv
extra_features: [gender, occupation]
ratings:
path: datasets/ml-100k/processed/rating.csv
timestamp: True
enrich:
map_path: datasets/ml-100k/processed/map.csv
enrich_path: datasets/ml-100k/processed/enriched.csv
remove_unmatched: False
properties:
- type: subject
grouped: True
sep: "::"
- type: director
grouped: True
sep: "::"
preprocess:
- method: filter_kcore
parameters:
k: 20
iterations: 1
target: user
split:
seed: 42
test:
method: k_fold
k: 2
level: 'user'
models:
- name: deepwalk_based
config:
save_weights: True
parameters:
walk_len: 10
p: 1.0
q: 1.0
n_walks: 50
embedding_size: 64
epochs: 1
evaluation:
k: 5
relevance_threshold: 3
metrics: [MAP, nDCG]
report:
file: 'experiment_results/ml100k_enriched/run1.csv'
Check config_files/ directory for more examples.
Chart generation
Chart generation module based on: https://github.com/hfawaz/cd-diagram
pip
We recommend using a python 3.8 virtual environment
pip install pybind11
pip install frankgraphbench
After obtaining results from some experiments
Usage:
chart_generation [-h] -c CHART -p PERFORMANCE_METRIC -f INPUT_FILES -i INPUT_PATH -o OUTPUT_PATH -n FILE_NAME
Arguments:
-h: Shows the help message.
-p: Name of the performance metric within the file to use for chart generation.
-f: List of .csv files to use for generating the chart.
-i: Path where results data to generate chart is located in .csv files.
-o: Path where generated charts will be placed.
-n: Add a name (and file extension) to the chart that will be generated.
Usage Example:
chart_generation -c 'cd-diagram' -p 'MAP@5' -f "['ml-100k.csv', 'ml-1m.csv', 'lastfm.csv', 'ml-100k_enriched.csv', 'ml-1m_enriched.csv', 'lastfm_enriched.csv']" -i 'experiment_results' -o 'charts' -n 'MAP@5.pdf'
source
Install the required packages using python virtualenv, using:
python3 -m venv venv_chart_generation/
source venv_chart_generation/bin/activate
pip3 install -r requirements_chart_generation.txt
After obtaining results from some experiments
Usage:
python3 src/chart_generation.py [-h] -c CHART -p PERFORMANCE_METRIC -f INPUT_FILES -i INPUT_PATH -o OUTPUT_PATH -n FILE_NAME
Arguments:
-h: Shows the help message.
-p: Name of the performance metric within the file to use for chart generation.
-f: List of .csv files to use for generating the chart.
-i: Path where results data to generate chart is located in .csv files.
-o: Path where generated charts will be placed.
-n: Add a name (and file extension) to the chart that will be generated.
Usage Example:
python3 src/chart_generation.py -c 'cd-diagram' -p 'MAP@5' -f "['ml-100k.csv', 'ml-1m.csv', 'lastfm.csv', 'ml-100k_enriched.csv', 'ml-1m_enriched.csv', 'lastfm_enriched.csv']" -i 'experiment_results' -o 'charts' -n 'MAP@5.pdf'