Data overview

Note

Not sure how to handle all the different types of datasets. One option is a generic front page like this one that simply lists the catalog entries and their data types, with separate pages for more details. Some of those detail pages can be autogenerated from the data itself, e.g. using pandas-profiling or a few simple pandas commands. The approach should be extensible to other datasets so that generation can be automated.
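As a rough illustration of the "simple pandas commands" option, the sketch below builds a per-column summary that could back a "basic info" detail page. The function name `basic_info` and the sample data are hypothetical, and this is only one possible shape for such a summary, not a settled design.

```python
import pandas as pd


def basic_info(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with dtype, null counts, and cardinality."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
            "nulls": df.isna().sum(),
            "unique": df.nunique(),  # distinct values, missing values excluded
        }
    )


# Hypothetical sample data standing in for a catalog entry such as `companies`.
sample = pd.DataFrame(
    {"id": [1, 2, 3], "company_rating": ["90%", None, "100%"]}
)
summary = basic_info(sample)
print(summary.to_string())
```

The same function could be applied to each dataset loaded from the catalog, with the result written out as the detail page for that entry.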

raw

Name	Type	Path	Details
companies	pandas.CSVDataSet	data/01_raw/companies.csv	basic info, pandas profiling
reviews	pandas.CSVDataSet	data/01_raw/reviews.csv	basic info, pandas profiling
shuttles	pandas.ExcelDataSet	data/01_raw/shuttles.xlsx	basic info, pandas profiling

intermediate

Name	Type	Path	Details
preprocessed_companies	pandas.ParquetDataSet	data/02_intermediate/preprocessed_companies.pq
preprocessed_shuttles	pandas.ParquetDataSet	data/02_intermediate/preprocessed_shuttles.pq

primary

Name	Type	Path	Details
model_input_table	pandas.ParquetDataSet	data/03_primary/model_input_table.pq	basic info, pandas profiling

models

Name	Type	Path	Details
active_modelling_pipeline.regressor	pickle.PickleDataSet	data/06_models/regressor_active.pickle
regressor_candidate.regressor	pickle.PickleDataSet	data/06_models/regressor_candidate.pickle
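
The tables above could plausibly be generated rather than maintained by hand. The following is a minimal sketch under stated assumptions: `CATALOG` is a hypothetical in-memory stand-in for the catalog entries (it does not use any Kedro API), and the layer name is assumed to be derivable from the numbered folder in each path (e.g. `01_raw` gives `raw`).

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical stand-in for catalog entries: name -> type and filepath.
CATALOG = {
    "companies": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/companies.csv",
    },
    "shuttles": {
        "type": "pandas.ExcelDataSet",
        "filepath": "data/01_raw/shuttles.xlsx",
    },
    "model_input_table": {
        "type": "pandas.ParquetDataSet",
        "filepath": "data/03_primary/model_input_table.pq",
    },
}


def layer_of(filepath: str) -> str:
    """Derive the layer from the parent folder, e.g. `01_raw` -> `raw`."""
    folder = PurePosixPath(filepath).parent.name
    prefix, _, name = folder.partition("_")
    # Fall back to the raw folder name if it is not of the form `NN_name`.
    return name if prefix.isdigit() and name else folder


def render_overview(catalog: dict) -> str:
    """Group entries by layer and render one tab-separated table per layer."""
    by_layer = defaultdict(list)
    for name, entry in catalog.items():
        row = (name, entry["type"], entry["filepath"])
        by_layer[layer_of(entry["filepath"])].append(row)

    lines = []
    for layer, rows in by_layer.items():
        lines += [layer, "", "Name\tType\tPath"]
        lines += ["\t".join(row) for row in rows]
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


overview = render_overview(CATALOG)
print(overview)
```

A Details column could be added once the per-dataset pages exist, linking each row to its generated summary.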