Overview

Dataset statistics

Number of variables: 10
Number of observations: 77096
Missing cells: 168060
Missing cells (%): 21.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.9 MiB
Average record size in memory: 80.0 B
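The percentages and sizes above follow from the raw counts. The sketch below recomputes them; the 8 bytes per value is an assumption (it is what 64-bit integer or float columns would occupy), not something the report states.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 10
n_observations = 77096
missing_cells = 168060

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: every value is stored in 8 bytes (e.g. int64 or float64).
bytes_per_value = 8
record_bytes = n_variables * bytes_per_value
total_mib = n_observations * record_bytes / 2**20

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"record size:   {record_bytes} B")
print(f"data size:     {total_mib:.1f} MiB")
```

Under that assumption the results agree with the reported 21.8%, 80.0 B and 5.9 MiB.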

Variable types

Numeric: 10

Alerts

review_scores_rating is highly correlated overall with review_scores_comfort and 5 other fields (High correlation)
review_scores_comfort is highly correlated overall with review_scores_rating and 5 other fields (High correlation)
review_scores_amenities is highly correlated overall with review_scores_rating and 4 other fields (High correlation)
review_scores_trip is highly correlated overall with review_scores_rating and 4 other fields (High correlation)
review_scores_crew is highly correlated overall with review_scores_rating and 5 other fields (High correlation)
review_scores_price is highly correlated overall with review_scores_rating and 5 other fields (High correlation)
number_of_reviews is highly correlated overall with reviews_per_month (High correlation)
reviews_per_month is highly correlated overall with number_of_reviews (High correlation)
review_scores_location is highly correlated overall with review_scores_rating and 3 other fields (High correlation)
review_scores_rating has 21140 (27.4%) missing values (Missing)
review_scores_comfort has 21200 (27.5%) missing values (Missing)
review_scores_amenities has 21187 (27.5%) missing values (Missing)
review_scores_trip has 21263 (27.6%) missing values (Missing)
review_scores_crew has 21194 (27.5%) missing values (Missing)
review_scores_location has 21265 (27.6%) missing values (Missing)
review_scores_price has 21268 (27.6%) missing values (Missing)
reviews_per_month has 19543 (25.3%) missing values (Missing)
shuttle_id is uniformly distributed (Uniform)
shuttle_id has unique values (Unique)
number_of_reviews has 19400 (25.2%) zeros (Zeros)
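As a consistency check, the per-column missing counts in the alerts should add up to the dataset-level "Missing cells" figure, and each percentage should be the count divided by the 77096 observations. A minimal sketch using only numbers quoted in this report:

```python
n_observations = 77096

# Missing-value counts copied from the alerts.
missing = {
    "review_scores_rating": 21140,
    "review_scores_comfort": 21200,
    "review_scores_amenities": 21187,
    "review_scores_trip": 21263,
    "review_scores_crew": 21194,
    "review_scores_location": 21265,
    "review_scores_price": 21268,
    "reviews_per_month": 19543,
}

missing_pct = {name: round(100 * count / n_observations, 1)
               for name, count in missing.items()}
total_missing = sum(missing.values())

# Share of zeros in number_of_reviews (19400 zeros per the alert).
zeros_pct = round(100 * 19400 / n_observations, 1)

for name, pct in missing_pct.items():
    print(f"{name}: {pct}%")
print("total missing cells:", total_missing)
print("number_of_reviews zeros:", zeros_pct, "%")
```

The total comes to 168060, matching the dataset statistics.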

Reproduction

Analysis started: 2022-11-24 10:38:24.288959
Analysis finished: 2022-11-24 10:38:47.073557
Duration: 22.78 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json

Variables

shuttle_id
Real number (ℝ)

UNIFORM
UNIQUE

Distinct: 77096
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38548.5
Minimum: 1
Maximum: 77096
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 602.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 3855.75
Q1: 19274.75
Median: 38548.5
Q3: 57822.25
95th percentile: 73241.25
Maximum: 77096
Range: 77095
Interquartile range (IQR): 38547.5

Descriptive statistics

Standard deviation: 22255.843
Coefficient of variation (CV): 0.57734652
Kurtosis: -1.2
Mean: 38548.5
Median Absolute Deviation (MAD): 19274
Skewness: 1.7649979 × 10^-18
Sum: 2.9719352 × 10^9
Variance: 4.9532253 × 10^8
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
63561 | 1 | < 0.1%
25124 | 1 | < 0.1%
44449 | 1 | < 0.1%
4748 | 1 | < 0.1%
30795 | 1 | < 0.1%
8614 | 1 | < 0.1%
62491 | 1 | < 0.1%
40429 | 1 | < 0.1%
20507 | 1 | < 0.1%
24295 | 1 | < 0.1%
Other values (77086) | 77086 | > 99.9%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Value | Count | Frequency (%)
77096 | 1 | < 0.1%
77095 | 1 | < 0.1%
77094 | 1 | < 0.1%
77093 | 1 | < 0.1%
77092 | 1 | < 0.1%
77091 | 1 | < 0.1%
77090 | 1 | < 0.1%
77089 | 1 | < 0.1%
77088 | 1 | < 0.1%
77087 | 1 | < 0.1%
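Because shuttle_id takes every integer from 1 to 77096 exactly once, its summary statistics can be reproduced from closed-form expressions. The sketch below assumes the sample variance (divisor n - 1) and linearly interpolated quantiles; both assumptions reproduce the reported values, but the report itself does not say which estimators it used.

```python
import math

n = 77096  # shuttle_id takes each integer 1..n exactly once

total = n * (n + 1) // 2          # sum of 1..n
mean = (n + 1) / 2
variance = n * (n + 1) / 12       # sample variance (divisor n - 1) of 1..n
std = math.sqrt(variance)
cv = std / mean                   # coefficient of variation


def quantile(q):
    """Linearly interpolated q-quantile of the sorted values 1..n."""
    return 1 + q * (n - 1)


iqr = quantile(0.75) - quantile(0.25)

# Population excess kurtosis of a discrete uniform distribution on n points.
excess_kurtosis = -6 * (n**2 + 1) / (5 * (n**2 - 1))

print(total, mean, variance, round(std, 3), round(cv, 8))
print(quantile(0.05), quantile(0.25), quantile(0.75), quantile(0.95), iqr)
print(round(excess_kurtosis, 4))
```

A symmetric uniform distribution also has zero skewness, which is why the reported skewness is on the order of 10^-18 (floating-point noise).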

review_scores_rating
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 55
Distinct (%): 0.1%
Missing: 21140
Missing (%): 27.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 92.760473
Minimum: 20
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 602.4 KiB

Quantile statistics

Minimum: 20
5th percentile: 78
Q1: 90
Median: 96
Q3: 100
95th percentile: 100
Maximum: 100
Range: 80
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 9.7597506
Coefficient of variation (CV): 0.10521454
Kurtosis: 15.005726
Mean: 92.760473
Median Absolute Deviation (MAD): 4
Skewness: -3.0846607
Sum: 5190505
Variance: 95.252732
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
100 | 16274 | 21.1%
98 | 3536 | 4.6%
97 | 3341 | 4.3%
93 | 3291 | 4.3%
96 | 3278 | 4.3%
90 | 3064 | 4.0%
95 | 3042 | 3.9%
80 | 2981 | 3.9%
94 | 2080 | 2.7%
99 | 2034 | 2.6%
Other values (45) | 13035 | 16.9%
(Missing) | 21140 | 27.4%

Value | Count | Frequency (%)
20 | 212 | 0.3%
27 | 1 | < 0.1%
30 | 6 | < 0.1%
33 | 1 | < 0.1%
40 | 176 | 0.2%
45 | 1 | < 0.1%
47 | 14 | < 0.1%
48 | 2 | < 0.1%
50 | 74 | 0.1%
52 | 3 | < 0.1%

Value | Count | Frequency (%)
100 | 16274 | 21.1%
99 | 2034 | 2.6%
98 | 3536 | 4.6%
97 | 3341 | 4.3%
96 | 3278 | 4.3%
95 | 3042 | 3.9%
94 | 2080 | 2.7%
93 | 3291 | 4.3%
92 | 1702 | 2.2%
91 | 1502 | 1.9%

review_scores_comfort
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 9
Distinct (%): < 0.1%
Missing: 21200
Missing (%): 27.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.517461
Minimum: 2
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 602.4 KiB

Quantile statistics

Minimum: 2
5th percentile: 8
Q1: 9
Median: 10
Q3: 10
95th percentile: 10
Maximum: 10
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.95591115
Coefficient of variation (CV): 0.10043762
Kurtosis: 19.913285
Mean: 9.517461
Median Absolute Deviation (MAD): 0
Skewness: -3.6651314
Sum: 531988
Variance: 0.91376613
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
10 | 38081 | 49.4%
9 | 12759 | 16.5%
8 | 3348 | 4.3%
7 | 648 | 0.8%
6 | 586 | 0.8%
2 | 236 | 0.3%
4 | 135 | 0.2%
5 | 95 | 0.1%
3 | 8 | < 0.1%
(Missing) | 21200 | 27.5%

Value | Count | Frequency (%)
2 | 236 | 0.3%
3 | 8 | < 0.1%
4 | 135 | 0.2%
5 | 95 | 0.1%
6 | 586 | 0.8%
7 | 648 | 0.8%
8 | 3348 | 4.3%
9 | 12759 | 16.5%
10 | 38081 | 49.4%

Value | Count | Frequency (%)
10 | 38081 | 49.4%
9 | 12759 | 16.5%
8 | 3348 | 4.3%
7 | 648 | 0.8%
6 | 586 | 0.8%
5 | 95 | 0.1%
4 | 135 | 0.2%
3 | 8 | < 0.1%
2 | 236 | 0.3%
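For a variable with only a handful of distinct values, the descriptive statistics can be recovered directly from its frequency table. The sketch below does this for review_scores_comfort, assuming the reported variance is the sample variance (divisor n - 1) over non-missing values; that assumption is mine, checked only against the numbers printed in this report.

```python
import math

n_observations = 77096

# Value -> count, copied from the review_scores_comfort frequency table.
counts = {10: 38081, 9: 12759, 8: 3348, 7: 648, 6: 586,
          5: 95, 4: 135, 3: 8, 2: 236}

n_valid = sum(counts.values())
n_missing = n_observations - n_valid

total = sum(value * count for value, count in counts.items())
mean = total / n_valid

# Sample variance: squared deviations weighted by count, divided by n - 1.
squared_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
variance = squared_dev / (n_valid - 1)
std = math.sqrt(variance)

print("valid:", n_valid, "missing:", n_missing)
print("sum:", total, "mean:", round(mean, 6))
print("variance:", round(variance, 6), "std:", round(std, 6))
```

The recovered missing count, sum, mean and variance line up with the values listed for this variable.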

review_scores_amenities
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 9
Distinct (%): < 0.1%
Missing: 21187
Missing (%): 27.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.2805094
Minimum: 2
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 602.4 KiB

Quantile statistics

Minimum: 2
5th percentile: 7
Q1: 9
Median: 10
Q3: 10
95th percentile: 10
Maximum: 10
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.1220804
Coefficient of variation (CV): 0.1209072
Kurtosis: 10.852266
Mean: 9.2805094
Median Absolute Deviation (MAD): 0
Skewness: -2.7028513
Sum: 518864
Variance: 1.2590644
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
10 | 31261 | 40.5%
9 | 15734 | 20.4%
8 | 5785 | 7.5%
7 | 1427 | 1.9%
6 | 979 | 1.3%
2 | 272 | 0.4%
4 | 258 | 0.3%
5 | 175 | 0.2%
3 | 18 | < 0.1%
(Missing) | 21187 | 27.5%

Value | Count | Frequency (%)
2 | 272 | 0.4%
3 | 18 | < 0.1%
4 | 258 | 0.3%
5 | 175 | 0.2%
6 | 979 | 1.3%
7 | 1427 | 1.9%
8 | 5785 | 7.5%
9 | 15734 | 20.4%
10 | 31261 | 40.5%

Value | Count | Frequency (%)
10 | 31261 | 40.5%
9 | 15734 | 20.4%
8 | 5785 | 7.5%
7 | 1427 | 1.9%
6 | 979 | 1.3%
5 | 175 | 0.2%
4 | 258 | 0.3%
3 | 18 | < 0.1%
2 | 272 | 0.4%

review_scores_trip
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 9
Distinct (%): < 0.1%
Missing: 21263
Missing (%): 27.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.6399441
Minimum: 2
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 602.4 KiB

Quantile statistics

Minimum: 2
5th percentile: 8
Q1: 10
Median: 10
Q3: 10
95th percentile: 10
Maximum: 10
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.86514388
Coefficient of variation (CV): 0.089745736
Kurtosis: 26.937725
Mean: 9.6399441
Median Absolute Deviation (MAD): 0
Skewness: -4.3361867
Sum: 538227
Variance: 0.74847393
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
10 | 42622 | 55.3%
9 | 9604 | 12.5%
8 | 2241 | 2.9%
7 | 532 | 0.7%
6 | 451 | 0.6%
2 | 191 | 0.2%
4 | 117 | 0.2%
5 | 69 | 0.1%
3 | 6 | < 0.1%
(Missing) | 21263 | 27.6%

Value | Count | Frequency (%)
2 | 191 | 0.2%
3 | 6 | < 0.1%
4 | 117 | 0.2%
5 | 69 | 0.1%
6 | 451 | 0.6%
7 | 532 | 0.7%
8 | 2241 | 2.9%
9 | 9604 | 12.5%
10 | 42622 | 55.3%

Value | Count | Frequency (%)
10 | 42622 | 55.3%
9 | 9604 | 12.5%
8 | 2241 | 2.9%
7 | 532 | 0.7%
6 | 451 | 0.6%
5 | 69 | 0.1%
4 | 117 | 0.2%
3 | 6 | < 0.1%
2 | 191 | 0.2%

review_scores_crew
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 9
Distinct (%): < 0.1%
Missing: 21194
Missing (%): 27.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.6748775
Minimum: 2
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 602.4 KiB

Quantile statistics

Minimum: 2
5th percentile: 8
Q1: 10
Median: 10
Q3: 10
95th percentile: 10
Maximum: 10
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.84564741
Coefficient of variation (CV): 0.087406525
Kurtosis: 30.450189
Mean: 9.6748775
Median Absolute Deviation (MAD): 0
Skewness: -4.6564075
Sum: 540845
Variance: 0.71511955
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
10 | 44179 | 57.3%
9 | 8405 | 10.9%
8 | 2060 | 2.7%
7 | 466 | 0.6%
6 | 419 | 0.5%
2 | 197 | 0.3%
4 | 106 | 0.1%
5 | 63 | 0.1%
3 | 7 | < 0.1%
(Missing) | 21194 | 27.5%

Value | Count | Frequency (%)
2 | 197 | 0.3%
3 | 7 | < 0.1%
4 | 106 | 0.1%
5 | 63 | 0.1%
6 | 419 | 0.5%
7 | 466 | 0.6%
8 | 2060 | 2.7%
9 | 8405 | 10.9%
10 | 44179 | 57.3%

Value | Count | Frequency (%)
10 | 44179 | 57.3%
9 | 8405 | 10.9%
8 | 2060 | 2.7%
7 | 466 | 0.6%
6 | 419 | 0.5%
5 | 63 | 0.1%
4 | 106 | 0.1%
3 | 7 | < 0.1%
2 | 197 | 0.3%

review_scores_location
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 8
Distinct (%): < 0.1%
Missing: 21265
Missing (%): 27.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.4703122
Minimum: 2
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 602.4 KiB

Quantile statistics

Minimum: 2
5th percentile: 8
Q1: 9
Median: 10
Q3: 10
95th percentile: 10
Maximum: 10
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.83945496
Coefficient of variation (CV): 0.088640685
Kurtosis: 15.713622
Mean: 9.4703122
Median Absolute Deviation (MAD): 0
Skewness: -2.911843
Sum: 528737
Variance: 0.70468462
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)
Value | Count | Frequency (%)
10 | 33853 | 43.9%
9 | 16873 | 21.9%
8 | 3931 | 5.1%
6 | 481 | 0.6%
7 | 472 | 0.6%
2 | 103 | 0.1%
4 | 84 | 0.1%
5 | 34 | < 0.1%
(Missing) | 21265 | 27.6%

Value | Count | Frequency (%)
2 | 103 | 0.1%
4 | 84 | 0.1%
5 | 34 | < 0.1%
6 | 481 | 0.6%
7 | 472 | 0.6%
8 | 3931 | 5.1%
9 | 16873 | 21.9%
10 | 33853 | 43.9%

Value | Count | Frequency (%)
10 | 33853 | 43.9%
9 | 16873 | 21.9%
8 | 3931 | 5.1%
7 | 472 | 0.6%
6 | 481 | 0.6%
5 | 34 | < 0.1%
4 | 84 | 0.1%
2 | 103 | 0.1%

review_scores_price
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 9
Distinct (%): < 0.1%
Missing: 21268
Missing (%): 27.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.2836928
Minimum: 2
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 602.4 KiB

Quantile statistics

Minimum: 2
5th percentile: 8
Q1: 9
Median: 10
Q3: 10
95th percentile: 10
Maximum: 10
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.0038427
Coefficient of variation (CV): 0.10812967
Kurtosis: 13.325731
Mean: 9.2836928
Median Absolute Deviation (MAD): 0
Skewness: -2.8084432
Sum: 518290
Variance: 1.0077001
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
10 | 28060 | 36.4%
9 | 20316 | 26.4%
8 | 5305 | 6.9%
7 | 880 | 1.1%
6 | 751 | 1.0%
2 | 211 | 0.3%
4 | 183 | 0.2%
5 | 110 | 0.1%
3 | 12 | < 0.1%
(Missing) | 21268 | 27.6%

Value | Count | Frequency (%)
2 | 211 | 0.3%
3 | 12 | < 0.1%
4 | 183 | 0.2%
5 | 110 | 0.1%
6 | 751 | 1.0%
7 | 880 | 1.1%
8 | 5305 | 6.9%
9 | 20316 | 26.4%
10 | 28060 | 36.4%

Value | Count | Frequency (%)
10 | 28060 | 36.4%
9 | 20316 | 26.4%
8 | 5305 | 6.9%
7 | 880 | 1.1%
6 | 751 | 1.0%
5 | 110 | 0.1%
4 | 183 | 0.2%
3 | 12 | < 0.1%
2 | 211 | 0.3%

number_of_reviews
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 369
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.255642
Minimum: 0
Maximum: 578
Zeros: 19400
Zeros (%): 25.2%
Negative: 0
Negative (%): 0.0%
Memory size: 602.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 4
Q3: 15
95th percentile: 70
Maximum: 578
Range: 578
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 32.200955
Coefficient of variation (CV): 2.1107571
Kurtosis: 37.850035
Mean: 15.255642
Median Absolute Deviation (MAD): 4
Skewness: 5.0140523
Sum: 1176149
Variance: 1036.9015
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 19400 | 25.2%
1 | 8459 | 11.0%
2 | 5532 | 7.2%
3 | 4113 | 5.3%
4 | 3278 | 4.3%
5 | 2720 | 3.5%
6 | 2288 | 3.0%
7 | 1964 | 2.5%
8 | 1781 | 2.3%
9 | 1588 | 2.1%
Other values (359) | 25973 | 33.7%

Value | Count | Frequency (%)
0 | 19400 | 25.2%
1 | 8459 | 11.0%
2 | 5532 | 7.2%
3 | 4113 | 5.3%
4 | 3278 | 4.3%
5 | 2720 | 3.5%
6 | 2288 | 3.0%
7 | 1964 | 2.5%
8 | 1781 | 2.3%
9 | 1588 | 2.1%

Value | Count | Frequency (%)
578 | 1 | < 0.1%
529 | 1 | < 0.1%
508 | 1 | < 0.1%
507 | 1 | < 0.1%
501 | 1 | < 0.1%
481 | 1 | < 0.1%
471 | 1 | < 0.1%
468 | 1 | < 0.1%
467 | 1 | < 0.1%
461 | 1 | < 0.1%

reviews_per_month
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 949
Distinct (%): 1.6%
Missing: 19543
Missing (%): 25.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2611798
Minimum: 0.01
Maximum: 16.56
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 602.4 KiB

Quantile statistics

Minimum: 0.01
5th percentile: 0.06
Q1: 0.27
Median: 0.77
Q3: 1.71
95th percentile: 4.23
Maximum: 16.56
Range: 16.55
Interquartile range (IQR): 1.44

Descriptive statistics

Standard deviation: 1.461234
Coefficient of variation (CV): 1.1586247
Kurtosis: 7.2776849
Mean: 1.2611798
Median Absolute Deviation (MAD): 0.58
Skewness: 2.3060339
Sum: 72584.68
Variance: 2.1352049
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 1527 | 2.0%
0.03 | 1024 | 1.3%
0.07 | 856 | 1.1%
0.06 | 817 | 1.1%
0.04 | 709 | 0.9%
0.05 | 654 | 0.8%
0.08 | 621 | 0.8%
2 | 615 | 0.8%
0.09 | 609 | 0.8%
0.17 | 601 | 0.8%
Other values (939) | 49520 | 64.2%
(Missing) | 19543 | 25.3%

Value | Count | Frequency (%)
0.01 | 20 | < 0.1%
0.02 | 359 | 0.5%
0.03 | 1024 | 1.3%
0.04 | 709 | 0.9%
0.05 | 654 | 0.8%
0.06 | 817 | 1.1%
0.07 | 856 | 1.1%
0.08 | 621 | 0.8%
0.09 | 609 | 0.8%
0.1 | 586 | 0.8%

Value | Count | Frequency (%)
16.56 | 1 | < 0.1%
15.69 | 1 | < 0.1%
15 | 1 | < 0.1%
14.13 | 1 | < 0.1%
13.5 | 1 | < 0.1%
13.26 | 1 | < 0.1%
12.6 | 1 | < 0.1%
12.59 | 1 | < 0.1%
12.5 | 1 | < 0.1%
12.44 | 1 | < 0.1%

Interactions

[100 pairwise interaction plots, one for each ordered pair of the 10 variables; images not preserved in this text version]

Correlations


Auto

The auto setting chooses an interpretable pairwise metric for each pair of columns according to their types, using the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
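The mapping above amounts to a small dispatch on the two column types. The function below is a hypothetical illustration of that rule only; `auto_method` and the type labels are names made up here and are not part of the pandas-profiling API.

```python
def auto_method(type_a, type_b):
    """Return (method, value range) for a pair of column types.

    Hypothetical helper mirroring the mapping listed in this section.
    """
    kinds = {type_a, type_b}
    if not kinds <= {"numerical", "categorical"}:
        raise ValueError(f"unsupported column types: {type_a}, {type_b}")
    if kinds == {"numerical"}:
        return "Spearman's rho", (-1, 1)
    # Categorical-Categorical, or Numerical-Categorical after discretizing
    # the numerical column into bins.
    return "Cramer's V", (0, 1)


# All 10 variables in this report are numeric, so every pair uses Spearman.
print(auto_method("numerical", "numerical"))
print(auto_method("numerical", "categorical"))
```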

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
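The calculation can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The helper names and the toy data are illustrative, and assigning tied values the mean of their ranks is an assumption (it is the usual convention, but the text above does not specify tie handling).

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n (or 1/(n-1)) factors cancel, so plain sums are used.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing in x
rho = spearman(x, y)
r = pearson(x, y)
print(f"Spearman rho = {rho:.4f}, Pearson r = {r:.4f}")
```

On this toy data ρ is 1 up to floating-point rounding while r stays below 1, which is the sense in which ρ captures nonlinear monotonic relationships.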

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that, for a linear relationship, the slope of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
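A minimal sketch of this formula, together with a check of the invariance property described above. The data are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r = pearson(x, y)

# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson([2 * a + 5 for a in x], [0.5 * b - 3 for b in y])

# Exact linear relationships give r = 1 whatever the (positive) slope.
r_shallow = pearson(x, [0.1 * a + 7 for a in x])
r_steep = pearson(x, [100 * a - 2 for a in x])

print(round(r, 4), round(r_transformed, 4), round(r_shallow, 4), round(r_steep, 4))
```

A negative scale factor on one variable would flip the sign of r but leave its magnitude unchanged.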

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
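The pair-counting definition translates directly into code. The sketch below implements exactly the formula as stated (sometimes called tau-a), with made-up data; tied pairs count as neither concordant nor discordant. Library implementations may differ: SciPy's `kendalltau`, for instance, defaults to the tie-adjusted tau-b variant, so results on heavily tied columns such as the review scores need not match this simple version.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 means a tie in x or y: counted as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau(x, y)
print(tau)  # 8 concordant and 2 discordant pairs out of 10
```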

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
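One way to make nullity correlation concrete is to turn each column into a 0/1 "value present" indicator and correlate the indicators. The sketch below assumes that definition (Pearson correlation of presence indicators); the column contents are made-up toy rows, not taken from the dataset, and `nullity_correlation` is a hypothetical helper name.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when a column is fully present or fully missing, since
    its indicator then has zero variance and the correlation is undefined.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sqrt(sum((p - ma) ** 2 for p in a))
    sb = sqrt(sum((q - mb) ** 2 for q in b))
    if sa == 0 or sb == 0:
        return None
    return cov / (sa * sb)


# Toy columns (illustrative values only); None marks a missing cell.
rating = [97, None, 95, None, 98, 91]
comfort = [10, None, 9, None, 10, 10]        # missing in the same rows
per_month = [1.65, None, 0.14, 0.42, None, 0.77]
n_reviews = [133, 0, 14, 39, 0, 26]          # never missing

same_pattern = nullity_correlation(rating, comfort)
partial = nullity_correlation(rating, per_month)
undefined = nullity_correlation(rating, n_reviews)
print(same_pattern, partial, undefined)
```

Columns that go missing together, as the seven review_scores_* columns largely do in the last rows of the sample, would score close to 1 under this measure.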

Sample

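(index) | shuttle_id | review_scores_rating | review_scores_comfort | review_scores_amenities | review_scores_trip | review_scores_crew | review_scores_location | review_scores_price | number_of_reviews | reviews_per_month
0 | 63561 | 97.0 | 10.0 | 9.0 | 10.0 | 10.0 | 9.0 | 10.0 | 133 | 1.65
1 | 36260 | 90.0 | 8.0 | 9.0 | 10.0 | 9.0 | 9.0 | 9.0 | 3 | 0.09
2 | 57015 | 95.0 | 9.0 | 10.0 | 9.0 | 10.0 | 9.0 | 9.0 | 14 | 0.14
3 | 14035 | 93.0 | 10.0 | 9.0 | 9.0 | 9.0 | 10.0 | 9.0 | 39 | 0.42
4 | 10036 | 98.0 | 10.0 | 10.0 | 10.0 | 10.0 | 9.0 | 9.0 | 92 | 0.94
5 | 45163 | 91.0 | 10.0 | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 | 26 | 0.77
6 | 64643 | 95.0 | 9.0 | 10.0 | 10.0 | 10.0 | 9.0 | 9.0 | 118 | 1.12
7 | 23389 | 76.0 | 8.0 | 8.0 | 8.0 | 8.0 | 9.0 | 9.0 | 5 | 0.05
8 | 39934 | 96.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 9.0 | 38 | 0.49
9 | 57063 | 100.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 1 | 0.02

(index) | shuttle_id | review_scores_rating | review_scores_comfort | review_scores_amenities | review_scores_trip | review_scores_crew | review_scores_location | review_scores_price | number_of_reviews | reviews_per_month
77086 | 55187 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 1.0
77087 | 46301 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN
77088 | 54977 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN
77089 | 51748 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN
77090 | 44668 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN
77091 | 4368 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN
77092 | 2983 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN
77093 | 69684 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN
77094 | 21738 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN
77095 | 72645 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN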