Note
Not sure yet how to handle all the different types of datasets. What may work is a generic front page like this one that simply lists the catalog and data types, with separate pages for more details. We can autogenerate some of these pages from the data itself, e.g. using pandas-profiling or some simple pandas commands. This should be extendable to other datasets to enable automatic generation.
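As a rough sketch of the "simple pandas commands" route, the helper below builds a per-column summary from a DataFrame. The function name `basic_info` and the choice of summary columns are illustrative assumptions, not part of the project, and the sample frame is made-up data standing in for a dataset loaded from the catalog.

```python
import pandas as pd


def basic_info(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, non-null count, null count, unique count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
            "nulls": df.isna().sum(),
            "unique": df.nunique(),  # NaN values are not counted as a unique value
        }
    )


# Made-up sample data (hypothetical), standing in for a real catalog entry.
sample = pd.DataFrame({"id": [1, 2, 3], "rating": [4.5, None, 3.0]})
summary = basic_info(sample)
print(summary.to_string())
```

Because the helper only relies on generic DataFrame methods, the same function could be applied to any tabular dataset in the catalog, which is what would make a "basic info" page easy to generate automatically.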
raw
Name | Type | Path | Details |
---|---|---|---|
companies | pandas.CSVDataSet | data/01_raw/companies.csv | basic info, pandas profiling |
reviews | pandas.CSVDataSet | data/01_raw/reviews.csv | basic info, pandas profiling |
shuttles | pandas.ExcelDataSet | data/01_raw/shuttles.xlsx | basic info, pandas profiling |
intermediate
Name | Type | Path | Details |
---|---|---|---|
preprocessed_companies | pandas.ParquetDataSet | data/02_intermediate/preprocessed_companies.pq | |
preprocessed_shuttles | pandas.ParquetDataSet | data/02_intermediate/preprocessed_shuttles.pq | |
primary
Name | Type | Path | Details |
---|---|---|---|
model_input_table | pandas.ParquetDataSet | data/03_primary/model_input_table.pq | basic info, pandas profiling |
models
Name | Type | Path | Details |
---|---|---|---|
active_modelling_pipeline.regressor | pickle.PickleDataSet | data/06_models/regressor_active.pickle | |
regressor_candidate.regressor | pickle.PickleDataSet | data/06_models/regressor_candidate.pickle | |
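The tables on this page follow a fixed layout, so they could in principle be generated rather than written by hand. Below is a minimal sketch, assuming the catalog entries are available as a plain dict with `type` and `filepath` keys; the `details` key, the function name `catalog_table`, and the way the entries are obtained are all hypothetical.

```python
def catalog_table(entries: dict) -> str:
    """Render catalog entries as a pipe table in the layout used on this page."""
    lines = ["Name | Type | Path | Details |", "---|---|---|---|"]
    for name, spec in entries.items():
        details = spec.get("details", "")  # hypothetical key; empty when absent
        row = f"{name} | {spec['type']} | {spec['filepath']} | {details}".rstrip()
        lines.append(row + " |")
    return "\n".join(lines)


# Entries copied from the tables above; the dict structure is an assumption.
entries = {
    "companies": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/companies.csv",
        "details": "basic info, pandas profiling",
    },
    "preprocessed_shuttles": {
        "type": "pandas.ParquetDataSet",
        "filepath": "data/02_intermediate/preprocessed_shuttles.pq",
    },
}
table = catalog_table(entries)
print(table)
```

Grouping the entries by layer (raw, intermediate, primary, models) before calling the function would reproduce the per-layer tables shown here.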