Note

It is not yet clear how best to handle all the different dataset types. One option is a generic front page like this one that simply lists the catalog entries and their data types, with separate pages for the details of each dataset. Some of those detail pages could be autogenerated from the data itself, e.g. using pandas-profiling or a few simple pandas commands. The approach should be extensible to other datasets so that generation can be automated.
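As a rough illustration of the "simple pandas commands" option, the sketch below builds a plain-text "basic info" page for one tabular dataset. The function name `basic_info`, the page layout, and the sample columns are all assumptions made for illustration and are not part of the project. A pandas-profiling report could be produced at the same point for a richer page.

```python
import io

import pandas as pd


def basic_info(df: pd.DataFrame, name: str) -> str:
    """Render a plain-text detail page for one tabular dataset (hypothetical helper)."""
    # DataFrame.info() writes to a buffer instead of returning a string.
    buffer = io.StringIO()
    df.info(buf=buffer)

    sections = [
        name,
        f"rows: {len(df)}, columns: {df.shape[1]}",
        "",
        "Schema",
        buffer.getvalue().rstrip(),
        "",
        "Missing values per column",
        df.isna().sum().to_string(),
        "",
        "Summary statistics",
        df.describe(include="all").to_string(),
    ]
    return "\n".join(sections)


# Tiny stand-in table; the column names and values are invented for illustration.
sample = pd.DataFrame(
    {"id": [1, 2, 3], "rating": [4.5, None, 3.0], "city": ["Oslo", "Lima", None]}
)
page = basic_info(sample, "companies")
print(page)
```

In practice the sample frame would be replaced by whatever the catalog loads for a given entry, and the returned text written to that dataset's detail page.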

raw

Name       Type                 Path                       Details
companies  pandas.CSVDataSet    data/01_raw/companies.csv  basic info, pandas profiling
reviews    pandas.CSVDataSet    data/01_raw/reviews.csv    basic info, pandas profiling
shuttles   pandas.ExcelDataSet  data/01_raw/shuttles.xlsx  basic info, pandas profiling

intermediate

Name                    Type                   Path                                            Details
preprocessed_companies  pandas.ParquetDataSet  data/02_intermediate/preprocessed_companies.pq
preprocessed_shuttles   pandas.ParquetDataSet  data/02_intermediate/preprocessed_shuttles.pq

primary

Name               Type                   Path                                  Details
model_input_table  pandas.ParquetDataSet  data/03_primary/model_input_table.pq  basic info, pandas profiling

models

Name                                 Type                  Path                                       Details
active_modelling_pipeline.regressor  pickle.PickleDataSet  data/06_models/regressor_active.pickle
regressor_candidate.regressor        pickle.PickleDataSet  data/06_models/regressor_candidate.pickle
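The per-layer listings above could themselves be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the catalog is available as (name, type, path) tuples, and the layer is the second path component with its numeric prefix removed (`01_raw` becomes `raw`). The names `ENTRIES`, `layer_of`, `render_table`, and `front_page` are hypothetical, and reading the entries from the real catalog is left out.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical input: a few (name, type, path) tuples copied from the listings above.
ENTRIES = [
    ("companies", "pandas.CSVDataSet", "data/01_raw/companies.csv"),
    ("shuttles", "pandas.ExcelDataSet", "data/01_raw/shuttles.xlsx"),
    ("model_input_table", "pandas.ParquetDataSet", "data/03_primary/model_input_table.pq"),
    ("regressor_candidate.regressor", "pickle.PickleDataSet",
     "data/06_models/regressor_candidate.pickle"),
]


def layer_of(path: str) -> str:
    """Derive the layer from the second path component, e.g. '01_raw' -> 'raw'."""
    folder = PurePosixPath(path).parts[1]
    prefix, _, layer = folder.partition("_")
    # Fall back to the folder name if it does not follow the 'NN_layer' pattern.
    return layer if prefix.isdigit() and layer else folder


def render_table(rows):
    """Pad every column to its widest cell so the plain-text columns line up."""
    table = [("Name", "Type", "Path"), *rows]
    widths = [max(len(row[i]) for row in table) for i in range(3)]
    return [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    ]


def front_page(entries) -> str:
    """Group entries by layer and render one listing per layer."""
    by_layer = defaultdict(list)
    for name, type_, path in entries:
        by_layer[layer_of(path)].append((name, type_, path))

    lines = []
    for layer, rows in by_layer.items():  # dicts preserve insertion order
        lines += [layer, "", *render_table(rows), ""]
    return "\n".join(lines).rstrip()


page = front_page(ENTRIES)
print(page)
```

Deriving the layer from the path keeps the sketch independent of any particular catalog format, at the cost of relying on the folder naming convention; a Details column would need a separate lookup of which detail pages exist.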