Overview

Dataset statistics

Number of variables: 13
Number of observations: 77096
Missing cells: 188
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.6 MiB
Average record size in memory: 104.0 B

Variable types

Numeric: 5
Categorical: 6
Boolean: 2

Alerts

moon_clearance_complete has constant value "False" (Constant)
price has a high cardinality: 848 distinct values (High cardinality)
engines is highly overall correlated with crew (High correlation)
passenger_capacity is highly overall correlated with engine_type and 1 other fields (High correlation)
crew is highly overall correlated with engines and 1 other fields (High correlation)
shuttle_type is highly overall correlated with engine_type (High correlation)
engine_type is highly overall correlated with shuttle_type and 1 other fields (High correlation)
id is uniformly distributed (Uniform)
id has unique values (Unique)
engines has 4448 (5.8%) zeros (Zeros)
crew has 1271 (1.6%) zeros (Zeros)

Reproduction

Analysis started: 2022-11-24 10:37:51.080167
Analysis finished: 2022-11-24 10:38:00.491029
Duration: 9.41 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json

Variables

id
Real number (ℝ)

UNIFORM
UNIQUE

Distinct77096
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean38548.5
Minimum1
Maximum77096
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size602.4 KiB

Quantile statistics

Minimum1
5-th percentile3855.75
Q119274.75
median38548.5
Q357822.25
95-th percentile73241.25
Maximum77096
Range77095
Interquartile range (IQR)38547.5

Descriptive statistics

Standard deviation22255.843
Coefficient of variation (CV)0.57734652
Kurtosis-1.2
Mean38548.5
Median Absolute Deviation (MAD)19274
Skewness1.7649979 × 10⁻¹⁸
Sum2.9719352 × 10⁹
Variance4.9532253 × 10⁸
MonotonicityNot monotonic
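The figures above can be cross-checked with closed-form expressions. The sketch below assumes that id takes every integer from 1 to 77096 exactly once (consistent with the Unique and Uniform alerts, but an assumption nonetheless) and that the reported variance is the sample variance with n - 1 in the denominator.

```python
import math

n = 77096  # number of observations reported in the overview

# Closed forms for the integers 1..n, each occurring once.
mean = (n + 1) / 2            # reported mean: 38548.5
total = n * (n + 1) // 2      # reported sum: about 2.9719352e9
variance = n * (n + 1) / 12   # sample variance (ddof=1), reported: about 4.9532253e8
std = math.sqrt(variance)     # reported standard deviation: 22255.843
cv = std / mean               # reported coefficient of variation: 0.57734652

print(mean, total, variance, std, cv)
```

The coefficient of variation is close to 1/√3, as expected for a uniform distribution that starts near zero.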
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
63561 1
 
< 0.1%
25124 1
 
< 0.1%
44449 1
 
< 0.1%
4748 1
 
< 0.1%
30795 1
 
< 0.1%
8614 1
 
< 0.1%
62491 1
 
< 0.1%
40429 1
 
< 0.1%
20507 1
 
< 0.1%
24295 1
 
< 0.1%
Other values (77086) 77086
> 99.9%
ValueCountFrequency (%)
1 1
< 0.1%
2 1
< 0.1%
3 1
< 0.1%
4 1
< 0.1%
5 1
< 0.1%
6 1
< 0.1%
7 1
< 0.1%
8 1
< 0.1%
9 1
< 0.1%
10 1
< 0.1%
ValueCountFrequency (%)
77096 1
< 0.1%
77095 1
< 0.1%
77094 1
< 0.1%
77093 1
< 0.1%
77092 1
< 0.1%
77091 1
< 0.1%
77090 1
< 0.1%
77089 1
< 0.1%
77088 1
< 0.1%
77087 1
< 0.1%

shuttle_location
Categorical

Distinct30
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size602.4 KiB
Malta
8487 
Barbados
8328 
Rwanda
7513 
Bouvet Island (Bouvetoya)
5907 
United Kingdom
5551 
Other values (25)
41310 

Length

Max length25
Median length18
Mean length10.900215
Min length4

Characters and Unicode

Total characters840363
Distinct characters44
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNiue
2nd rowAnguilla
3rd rowRussian Federation
4th rowBarbados
5th rowSao Tome and Principe

Common Values

ValueCountFrequency (%)
Malta 8487
11.0%
Barbados 8328
10.8%
Rwanda 7513
9.7%
Bouvet Island (Bouvetoya) 5907
 
7.7%
United Kingdom 5551
 
7.2%
Micronesia 5466
 
7.1%
Nicaragua 5461
 
7.1%
Russian Federation 4759
 
6.2%
Niue 4299
 
5.6%
Sao Tome and Principe 3916
 
5.1%
Other values (20) 17409
22.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
malta 8487
 
7.1%
barbados 8328
 
7.0%
rwanda 7513
 
6.3%
bouvet 5907
 
4.9%
island 5907
 
4.9%
bouvetoya 5907
 
4.9%
united 5551
 
4.6%
kingdom 5551
 
4.6%
and 5476
 
4.6%
micronesia 5466
 
4.6%
Other values (34) 55318
46.3%

Most occurring characters

ValueCountFrequency (%)
a 126716
15.1%
i 65706
 
7.8%
n 62245
 
7.4%
o 57518
 
6.8%
e 51891
 
6.2%
d 47389
 
5.6%
s 42627
 
5.1%
(space) 42315
 
5.0%
t 38865
 
4.6%
r 32646
 
3.9%
Other values (34) 272445
32.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 672827
80.1%
Uppercase Letter 113407
 
13.5%
Space Separator 42315
 
5.0%
Open Punctuation 5907
 
0.7%
Close Punctuation 5907
 
0.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 126716
18.8%
i 65706
9.8%
n 62245
9.3%
o 57518
8.5%
e 51891
7.7%
d 47389
 
7.0%
s 42627
 
6.3%
t 38865
 
5.8%
r 32646
 
4.9%
u 30594
 
4.5%
Other values (14) 116630
17.3%
Uppercase Letter
ValueCountFrequency (%)
B 21103
18.6%
M 15317
13.5%
R 14337
12.6%
N 9760
8.6%
I 9662
8.5%
U 7371
 
6.5%
F 7277
 
6.4%
K 6697
 
5.9%
S 4778
 
4.2%
P 4370
 
3.9%
Other values (7) 12735
11.2%
Space Separator
ValueCountFrequency (%)
(space) 42315
100.0%
Open Punctuation
ValueCountFrequency (%)
( 5907
100.0%
Close Punctuation
ValueCountFrequency (%)
) 5907
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 786234
93.6%
Common 54129
 
6.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 126716
16.1%
i 65706
 
8.4%
n 62245
 
7.9%
o 57518
 
7.3%
e 51891
 
6.6%
d 47389
 
6.0%
s 42627
 
5.4%
t 38865
 
4.9%
r 32646
 
4.2%
u 30594
 
3.9%
Other values (31) 230037
29.3%
Common
ValueCountFrequency (%)
(space) 42315
78.2%
( 5907
 
10.9%
) 5907
 
10.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 840363
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 126716
15.1%
i 65706
 
7.8%
n 62245
 
7.4%
o 57518
 
6.8%
e 51891
 
6.2%
d 47389
 
5.6%
s 42627
 
5.1%
(space) 42315
 
5.0%
t 38865
 
4.6%
r 32646
 
3.9%
Other values (34) 272445
32.4%

shuttle_type
Categorical

Distinct42
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size602.4 KiB
Type V5
52147 
Type F5
15636 
Type V2
 
2932
Type G0
 
2112
Type V7
 
940
Other values (37)
 
3329

Length

Max length7
Median length7
Mean length7
Min length7

Characters and Unicode

Total characters539672
Distinct characters34
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique8 ?
Unique (%)< 0.1%

Sample

1st rowType V5
2nd rowType V5
3rd rowType V5
4th rowType V5
5th rowType V2

Common Values

ValueCountFrequency (%)
Type V5 52147
67.6%
Type F5 15636
 
20.3%
Type V2 2932
 
3.8%
Type G0 2112
 
2.7%
Type V7 940
 
1.2%
Type O3 863
 
1.1%
Type Z6 734
 
1.0%
Type E3 297
 
0.4%
Type X3 244
 
0.3%
Type F1 227
 
0.3%
Other values (32) 964
 
1.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
type 77096
50.0%
v5 52147
33.8%
f5 15636
 
10.1%
v2 2932
 
1.9%
g0 2112
 
1.4%
v7 940
 
0.6%
o3 863
 
0.6%
z6 734
 
0.5%
e3 297
 
0.2%
x3 244
 
0.2%
Other values (33) 1191
 
0.8%

Most occurring characters

ValueCountFrequency (%)
T 77100
14.3%
e 77096
14.3%
(space) 77096
14.3%
y 77096
14.3%
p 77096
14.3%
5 67918
12.6%
V 56019
10.4%
F 15864
 
2.9%
2 2960
 
0.5%
0 2329
 
0.4%
Other values (24) 9098
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 231288
42.9%
Uppercase Letter 154192
28.6%
Space Separator 77096
 
14.3%
Decimal Number 77096
 
14.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 77100
50.0%
V 56019
36.3%
F 15864
 
10.3%
G 2112
 
1.4%
O 875
 
0.6%
Z 775
 
0.5%
E 297
 
0.2%
X 244
 
0.2%
N 195
 
0.1%
A 146
 
0.1%
Other values (12) 565
 
0.4%
Decimal Number
ValueCountFrequency (%)
5 67918
88.1%
2 2960
 
3.8%
0 2329
 
3.0%
3 1404
 
1.8%
7 1268
 
1.6%
6 744
 
1.0%
1 393
 
0.5%
4 80
 
0.1%
Lowercase Letter
ValueCountFrequency (%)
e 77096
33.3%
y 77096
33.3%
p 77096
33.3%
Space Separator
ValueCountFrequency (%)
(space) 77096
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 385480
71.4%
Common 154192
 
28.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 77100
20.0%
e 77096
20.0%
y 77096
20.0%
p 77096
20.0%
V 56019
14.5%
F 15864
 
4.1%
G 2112
 
0.5%
O 875
 
0.2%
Z 775
 
0.2%
E 297
 
0.1%
Other values (15) 1150
 
0.3%
Common
ValueCountFrequency (%)
(space) 77096
50.0%
5 67918
44.0%
2 2960
 
1.9%
0 2329
 
1.5%
3 1404
 
0.9%
7 1268
 
0.8%
6 744
 
0.5%
1 393
 
0.3%
4 80
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 539672
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
T 77100
14.3%
e 77096
14.3%
(space) 77096
14.3%
y 77096
14.3%
p 77096
14.3%
5 67918
12.6%
V 56019
10.4%
F 15864
 
2.9%
2 2960
 
0.5%
0 2329
 
0.4%
Other values (24) 9098
 
1.7%

engine_type
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size602.4 KiB
Plasma
42758 
Quantum
33594 
Nuclear
 
744

Length

Max length7
Median length6
Mean length6.4453928
Min length6

Characters and Unicode

Total characters496914
Distinct characters13
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowQuantum
2nd rowQuantum
3rd rowQuantum
4th rowPlasma
5th rowPlasma

Common Values

ValueCountFrequency (%)
Plasma 42758
55.5%
Quantum 33594
43.6%
Nuclear 744
 
1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
plasma 42758
55.5%
quantum 33594
43.6%
nuclear 744
 
1.0%

Most occurring characters

ValueCountFrequency (%)
a 119854
24.1%
m 76352
15.4%
u 67932
13.7%
l 43502
 
8.8%
P 42758
 
8.6%
s 42758
 
8.6%
Q 33594
 
6.8%
n 33594
 
6.8%
t 33594
 
6.8%
N 744
 
0.1%
Other values (3) 2232
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 419818
84.5%
Uppercase Letter 77096
 
15.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 119854
28.5%
m 76352
18.2%
u 67932
16.2%
l 43502
 
10.4%
s 42758
 
10.2%
n 33594
 
8.0%
t 33594
 
8.0%
c 744
 
0.2%
e 744
 
0.2%
r 744
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
P 42758
55.5%
Q 33594
43.6%
N 744
 
1.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 496914
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 119854
24.1%
m 76352
15.4%
u 67932
13.7%
l 43502
 
8.8%
P 42758
 
8.6%
s 42758
 
8.6%
Q 33594
 
6.8%
n 33594
 
6.8%
t 33594
 
6.8%
N 744
 
0.1%
Other values (3) 2232
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 496914
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 119854
24.1%
m 76352
15.4%
u 67932
13.7%
l 43502
 
8.8%
P 42758
 
8.6%
s 42758
 
8.6%
Q 33594
 
6.8%
n 33594
 
6.8%
t 33594
 
6.8%
N 744
 
0.1%
Other values (3) 2232
 
0.4%

engine_vendor
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size602.4 KiB
ThetaBase Services
76275 
Banks, Wood and Phillips
 
474
Warwick Technology Multinational
 
214
SIT Technology Unlimited
 
74
MCW Global
 
59

Length

Max length32
Median length18
Mean length18.075387
Min length10

Characters and Unicode

Total characters1393540
Distinct characters33
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowThetaBase Services
2nd rowThetaBase Services
3rd rowThetaBase Services
4th rowThetaBase Services
5th rowThetaBase Services

Common Values

ValueCountFrequency (%)
ThetaBase Services 76275
98.9%
Banks, Wood and Phillips 474
 
0.6%
Warwick Technology Multinational 214
 
0.3%
SIT Technology Unlimited 74
 
0.1%
MCW Global 59
 
0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
thetabase 76275
49.1%
services 76275
49.1%
banks 474
 
0.3%
wood 474
 
0.3%
and 474
 
0.3%
phillips 474
 
0.3%
technology 288
 
0.2%
warwick 214
 
0.1%
multinational 214
 
0.1%
sit 74
 
< 0.1%
Other values (3) 192
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e 305462
21.9%
a 154199
11.1%
s 153498
11.0%
(space) 78332
 
5.6%
i 78013
 
5.6%
h 77037
 
5.5%
t 76777
 
5.5%
c 76777
 
5.5%
B 76749
 
5.5%
T 76637
 
5.5%
Other values (23) 240059
17.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1083239
77.7%
Uppercase Letter 231495
 
16.6%
Space Separator 78332
 
5.6%
Other Punctuation 474
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 305462
28.2%
a 154199
14.2%
s 153498
14.2%
i 78013
 
7.2%
h 77037
 
7.1%
t 76777
 
7.1%
c 76777
 
7.1%
r 76489
 
7.1%
v 76275
 
7.0%
l 1856
 
0.2%
Other values (11) 6856
 
0.6%
Uppercase Letter
ValueCountFrequency (%)
B 76749
33.2%
T 76637
33.1%
S 76349
33.0%
W 747
 
0.3%
P 474
 
0.2%
M 273
 
0.1%
I 74
 
< 0.1%
U 74
 
< 0.1%
C 59
 
< 0.1%
G 59
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 78332
100.0%
Other Punctuation
ValueCountFrequency (%)
, 474
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1314734
94.3%
Common 78806
 
5.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 305462
23.2%
a 154199
11.7%
s 153498
11.7%
i 78013
 
5.9%
h 77037
 
5.9%
t 76777
 
5.8%
c 76777
 
5.8%
B 76749
 
5.8%
T 76637
 
5.8%
r 76489
 
5.8%
Other values (21) 163096
12.4%
Common
ValueCountFrequency (%)
(space) 78332
99.4%
, 474
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1393540
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 305462
21.9%
a 154199
11.1%
s 153498
11.0%
(space) 78332
 
5.6%
i 78013
 
5.6%
h 77037
 
5.5%
t 76777
 
5.5%
c 76777
 
5.5%
B 76749
 
5.5%
T 76637
 
5.5%
Other values (23) 240059
17.2%

engines
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct15
Distinct (%)< 0.1%
Missing39
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean1.4039607
Minimum0
Maximum44
Zeros4448
Zeros (%)5.8%
Negative0
Negative (%)0.0%
Memory size602.4 KiB

Quantile statistics

Minimum0
5-th percentile0
Q11
median1
Q32
95-th percentile3
Maximum44
Range44
Interquartile range (IQR)1

Descriptive statistics

Standard deviation0.90503385
Coefficient of variation (CV)0.64462904
Kurtosis68.625371
Mean1.4039607
Median Absolute Deviation (MAD)0
Skewness3.0697699
Sum108185
Variance0.81908626
MonotonicityNot monotonic
Histogram with fixed size bins (bins=15)
ValueCountFrequency (%)
1 48742
63.2%
2 16001
 
20.8%
3 5113
 
6.6%
0 4448
 
5.8%
4 1956
 
2.5%
5 619
 
0.8%
6 135
 
0.2%
7 28
 
< 0.1%
8 6
 
< 0.1%
10 2
 
< 0.1%
Other values (5) 7
 
< 0.1%
(Missing) 39
 
0.1%
ValueCountFrequency (%)
0 4448
 
5.8%
1 48742
63.2%
2 16001
 
20.8%
3 5113
 
6.6%
4 1956
 
2.5%
5 619
 
0.8%
6 135
 
0.2%
7 28
 
< 0.1%
8 6
 
< 0.1%
9 2
 
< 0.1%
ValueCountFrequency (%)
44 1
 
< 0.1%
13 1
 
< 0.1%
12 1
 
< 0.1%
11 2
 
< 0.1%
10 2
 
< 0.1%
9 2
 
< 0.1%
8 6
 
< 0.1%
7 28
 
< 0.1%
6 135
 
0.2%
5 619
0.8%

passenger_capacity
Real number (ℝ)

Distinct17
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.1550664
Minimum1
Maximum20
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size602.4 KiB

Quantile statistics

Minimum1
5-th percentile1
Q12
median2
Q34
95-th percentile7
Maximum20
Range19
Interquartile range (IQR)2

Descriptive statistics

Standard deviation1.9725307
Coefficient of variation (CV)0.62519466
Kurtosis4.2753533
Mean3.1550664
Median Absolute Deviation (MAD)1
Skewness1.6690889
Sum243243
Variance3.8908772
MonotonicityNot monotonic
Histogram with fixed size bins (bins=17)
ValueCountFrequency (%)
2 32858
42.6%
4 15373
19.9%
1 9710
 
12.6%
6 5961
 
7.7%
3 5322
 
6.9%
5 3567
 
4.6%
8 1771
 
2.3%
7 1241
 
1.6%
10 486
 
0.6%
9 363
 
0.5%
Other values (7) 444
 
0.6%
ValueCountFrequency (%)
1 9710
 
12.6%
2 32858
42.6%
3 5322
 
6.9%
4 15373
19.9%
5 3567
 
4.6%
6 5961
 
7.7%
7 1241
 
1.6%
8 1771
 
2.3%
9 363
 
0.5%
10 486
 
0.6%
ValueCountFrequency (%)
20 1
 
< 0.1%
16 76
 
0.1%
15 15
 
< 0.1%
14 63
 
0.1%
13 45
 
0.1%
12 164
 
0.2%
11 80
 
0.1%
10 486
 
0.6%
9 363
 
0.5%
8 1771
2.3%

cancellation_policy
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size602.4 KiB
strict
33404 
flexible
25235 
moderate
18457 

Length

Max length8
Median length8
Mean length7.133444
Min length6

Characters and Unicode

Total characters549960
Distinct characters14
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowstrict
2nd rowstrict
3rd rowmoderate
4th rowstrict
5th rowstrict

Common Values

ValueCountFrequency (%)
strict 33404
43.3%
flexible 25235
32.7%
moderate 18457
23.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
strict 33404
43.3%
flexible 25235
32.7%
moderate 18457
23.9%

Most occurring characters

ValueCountFrequency (%)
e 87384
15.9%
t 85265
15.5%
i 58639
10.7%
r 51861
9.4%
l 50470
9.2%
s 33404
 
6.1%
c 33404
 
6.1%
f 25235
 
4.6%
x 25235
 
4.6%
b 25235
 
4.6%
Other values (4) 73828
13.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 549960
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 87384
15.9%
t 85265
15.5%
i 58639
10.7%
r 51861
9.4%
l 50470
9.2%
s 33404
 
6.1%
c 33404
 
6.1%
f 25235
 
4.6%
x 25235
 
4.6%
b 25235
 
4.6%
Other values (4) 73828
13.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 549960
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 87384
15.9%
t 85265
15.5%
i 58639
10.7%
r 51861
9.4%
l 50470
9.2%
s 33404
 
6.1%
c 33404
 
6.1%
f 25235
 
4.6%
x 25235
 
4.6%
b 25235
 
4.6%
Other values (4) 73828
13.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 549960
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 87384
15.9%
t 85265
15.5%
i 58639
10.7%
r 51861
9.4%
l 50470
9.2%
s 33404
 
6.1%
c 33404
 
6.1%
f 25235
 
4.6%
x 25235
 
4.6%
b 25235
 
4.6%
Other values (4) 73828
13.4%

crew
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct20
Distinct (%)< 0.1%
Missing149
Missing (%)0.2%
Infinite0
Infinite (%)0.0%
Mean1.7413025
Minimum0
Maximum23
Zeros1271
Zeros (%)1.6%
Negative0
Negative (%)0.0%
Memory size602.4 KiB

Quantile statistics

Minimum0
5-th percentile1
Q11
median1
Q32
95-th percentile4
Maximum23
Range23
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.2399381
Coefficient of variation (CV)0.71207512
Kurtosis14.264173
Mean1.7413025
Median Absolute Deviation (MAD)0
Skewness2.7301788
Sum133988
Variance1.5374466
MonotonicityNot monotonic
Histogram with fixed size bins (bins=20)
ValueCountFrequency (%)
1 43188
56.0%
2 18631
24.2%
3 7598
 
9.9%
4 3472
 
4.5%
5 1420
 
1.8%
0 1271
 
1.6%
6 713
 
0.9%
7 289
 
0.4%
8 184
 
0.2%
9 61
 
0.1%
Other values (10) 120
 
0.2%
(Missing) 149
 
0.2%
ValueCountFrequency (%)
0 1271
 
1.6%
1 43188
56.0%
2 18631
24.2%
3 7598
 
9.9%
4 3472
 
4.5%
5 1420
 
1.8%
6 713
 
0.9%
7 289
 
0.4%
8 184
 
0.2%
9 61
 
0.1%
ValueCountFrequency (%)
23 1
 
< 0.1%
20 1
 
< 0.1%
18 1
 
< 0.1%
16 16
 
< 0.1%
15 3
 
< 0.1%
14 9
 
< 0.1%
13 3
 
< 0.1%
12 20
 
< 0.1%
11 7
 
< 0.1%
10 59
0.1%

d_check_complete
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size75.4 KiB
False
46762 
True
30334 
ValueCountFrequency (%)
False 46762
60.7%
True 30334
39.3%

moon_clearance_complete
Boolean

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size75.4 KiB
False
77096 
ValueCountFrequency (%)
False 77096
100.0%

price
Categorical

Distinct848
Distinct (%)1.1%
Missing0
Missing (%)0.0%
Memory size602.4 KiB
$1,520.0
 
2873
$2,170.0
 
2752
$1,390.0
 
2674
$1,325.0
 
2333
$1,260.0
 
2250
Other values (843)
64214 

Length

Max length10
Median length8
Mean length8.0059666
Min length6

Characters and Unicode

Total characters617228
Distinct characters13
Distinct categories3 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique253 ?
Unique (%)0.3%

Sample

1st row$1,325.0
2nd row$1,780.0
3rd row$1,715.0
4th row$4,770.0
5th row$2,820.0
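
price is profiled as Categorical, presumably because its values are stored as strings with a currency symbol and a thousands separator. A minimal sketch of converting such strings to numbers before profiling is shown below; the helper name `parse_price` is hypothetical, and the sketch assumes every value follows the `$1,325.0` pattern seen in the sample rows.

```python
def parse_price(text):
    """Convert a price string such as "$1,325.0" to a float.

    Assumes a leading dollar sign and comma thousands separators,
    as in the sample rows of this report.
    """
    return float(text.replace("$", "").replace(",", ""))


# The five sample values listed above.
sample = ["$1,325.0", "$1,780.0", "$1,715.0", "$4,770.0", "$2,820.0"]
prices = [parse_price(p) for p in sample]
print(prices)
```

Once the column is numeric, the High cardinality alert would likely no longer apply, since that alert concerns categorical variables.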

Common Values

ValueCountFrequency (%)
$1,520.0 2873
 
3.7%
$2,170.0 2752
 
3.6%
$1,390.0 2674
 
3.5%
$1,325.0 2333
 
3.0%
$1,260.0 2250
 
2.9%
$2,430.0 2205
 
2.9%
$1,455.0 2150
 
2.8%
$1,650.0 2131
 
2.8%
$2,820.0 2042
 
2.6%
$1,910.0 2042
 
2.6%
Other values (838) 53644
69.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1,520.0 2873
 
3.7%
2,170.0 2752
 
3.6%
1,390.0 2674
 
3.5%
1,325.0 2333
 
3.0%
1,260.0 2250
 
2.9%
2,430.0 2205
 
2.9%
1,455.0 2150
 
2.8%
1,650.0 2131
 
2.8%
2,820.0 2042
 
2.6%
1,910.0 2042
 
2.6%
Other values (838) 53644
69.6%

Most occurring characters

ValueCountFrequency (%)
0 121718
19.7%
$ 77096
12.5%
. 77096
12.5%
, 77039
12.5%
1 62820
10.2%
2 44530
 
7.2%
5 36385
 
5.9%
3 25685
 
4.2%
7 23871
 
3.9%
4 23007
 
3.7%
Other values (3) 47981
 
7.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 385997
62.5%
Other Punctuation 154135
 
25.0%
Currency Symbol 77096
 
12.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 121718
31.5%
1 62820
16.3%
2 44530
 
11.5%
5 36385
 
9.4%
3 25685
 
6.7%
7 23871
 
6.2%
4 23007
 
6.0%
9 16649
 
4.3%
6 16134
 
4.2%
8 15198
 
3.9%
Other Punctuation
ValueCountFrequency (%)
. 77096
50.0%
, 77039
50.0%
Currency Symbol
ValueCountFrequency (%)
$ 77096
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 617228
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 121718
19.7%
$ 77096
12.5%
. 77096
12.5%
, 77039
12.5%
1 62820
10.2%
2 44530
 
7.2%
5 36385
 
5.9%
3 25685
 
4.2%
7 23871
 
3.9%
4 23007
 
3.7%
Other values (3) 47981
 
7.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 617228
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 121718
19.7%
$ 77096
12.5%
. 77096
12.5%
, 77039
12.5%
1 62820
10.2%
2 44530
 
7.2%
5 36385
 
5.9%
3 25685
 
4.2%
7 23871
 
3.9%
4 23007
 
3.7%
Other values (3) 47981
 
7.8%

company_id
Real number (ℝ)

Distinct50098
Distinct (%)65.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean25155.386
Minimum1
Maximum50098
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size602.4 KiB

Quantile statistics

Minimum1
5-th percentile2704.75
Q112935.75
median25253.5
Q337410.25
95-th percentile47520.25
Maximum50098
Range50097
Interquartile range (IQR)24474.5

Descriptive statistics

Standard deviation14300.991
Coefficient of variation (CV)0.5685061
Kurtosis-1.1684667
Mean25155.386
Median Absolute Deviation (MAD)12241.5
Skewness-0.01137149
Sum1.9393797 × 10⁹
Variance2.0451833 × 10⁸
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
29647 1086
 
1.4%
45111 297
 
0.4%
28828 184
 
0.2%
32203 176
 
0.2%
20334 167
 
0.2%
18077 125
 
0.2%
4745 114
 
0.1%
10711 108
 
0.1%
22721 106
 
0.1%
19019 102
 
0.1%
Other values (50088) 74631
96.8%
ValueCountFrequency (%)
1 2
< 0.1%
2 1
< 0.1%
3 1
< 0.1%
4 1
< 0.1%
5 1
< 0.1%
6 2
< 0.1%
7 1
< 0.1%
8 1
< 0.1%
9 2
< 0.1%
10 1
< 0.1%
ValueCountFrequency (%)
50098 2
< 0.1%
50097 1
< 0.1%
50096 1
< 0.1%
50095 1
< 0.1%
50094 1
< 0.1%
50093 1
< 0.1%
50092 1
< 0.1%
50091 1
< 0.1%
50090 1
< 0.1%
50089 2
< 0.1%

Interactions

Correlations

Auto

The auto setting selects an interpretable pairwise metric according to the types of the two columns:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
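
The mapping above can be sketched as a small dispatch function plus a discretization step. This is an illustration only: the function names are hypothetical, and equal-width binning is an assumption, since this section does not state how the report bins numerical columns.

```python
def choose_metric(type_a, type_b):
    """Return (method, value range) for a pair of column types.

    Types are the strings "Numerical" or "Categorical".
    """
    if type_a == "Numerical" and type_b == "Numerical":
        return ("Spearman's rho", (-1, 1))
    # Any pair involving a categorical column falls back to Cramer's V;
    # a numerical partner is discretized first.
    return ("Cramer's V", (0, 1))


def discretize(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (assumed strategy)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(choose_metric("Numerical", "Categorical"))
print(discretize([1, 2, 3, 4], n_bins=4))
```

More bins give a finer view of the association but leave fewer observations per cell.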

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
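
The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The data are made up for illustration, and tied values are given the average of their ranks, which is a common convention assumed here rather than stated in the report.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/(n-1) factors cancel, so raw sums of products are used.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y), pearson_r(x, y))
```

For this monotonic but nonlinear data, ρ reaches its maximum while Pearson's r stays below 1.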

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
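
A short sketch of this formula, using made-up data, also illustrates the invariance property: shifting and rescaling either variable by a positive factor leaves r unchanged.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/(n-1) factors cancel, so raw sums of products are used.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

r = pearson_r(x, y)
# Change location and scale of each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```

A negative scale factor on one variable would flip the sign of r while keeping its magnitude.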

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
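
The pair-counting definition above can be sketched directly. This is the simple form often called tau-a; when the data contain ties, implementations commonly use a tie-adjusted variant instead, so results may differ from the report's heatmap. The data are made up for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie, which counts as neither
    return (concordant - discordant) / pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Of the six pairs here, five are concordant and one is discordant.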

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
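
A minimal sketch of a bias-corrected Cramér's V follows. It uses the commonly cited form of Bergsma's correction, which shrinks both the φ² statistic and the table dimensions by terms of order 1/(n - 1); whether the report computes it in exactly this way is an assumption. The function name and data are made up for illustration.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramer's V for two equal-length categorical sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    rows, cols = Counter(a), Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row_value, row_count in rows.items():
        for col_value, col_count in cols.items():
            expected = row_count * col_count / n
            observed = joint.get((row_value, col_value), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(rows), len(cols)
    # Bias correction: shrink phi2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(phi2_corr / denom)


perfect = cramers_v_corrected(["x", "y"] * 50, ["p", "q"] * 50)
independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
print(perfect, independent)
```

The first pair of columns is perfectly associated and the second is independent by construction, so the two results sit at the ends of the [0, 1] range.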

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
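
Nullity correlation can be sketched as the Pearson correlation between presence indicators of two columns. The sketch assumes missing values are represented as `None` and uses made-up data; the function name is hypothetical.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between presence indicators (1 = present, 0 = missing).

    Returns None when either column is always present or always missing,
    because its indicator has no variation to correlate.
    """
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return None
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)


# Values are missing in the same rows, so presence is perfectly correlated.
together = nullity_correlation([1, None, 2, None], [1, None, 3, None])
# Values are missing in opposite rows.
opposite = nullity_correlation([1, None, 2, None], [None, 1, None, 2])
print(together, opposite)
```

In this dataset only engines and crew have missing values, so they are the only pair for which such a correlation is defined.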

Sample

(index) | id | shuttle_location | shuttle_type | engine_type | engine_vendor | engines | passenger_capacity | cancellation_policy | crew | d_check_complete | moon_clearance_complete | price | company_id
0 | 63561 | Niue | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | f | f | $1,325.0 | 35029
1 | 36260 | Anguilla | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | t | f | $1,780.0 | 30292
2 | 57015 | Russian Federation | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | moderate | 0.0 | f | f | $1,715.0 | 19032
3 | 14035 | Barbados | Type V5 | Plasma | ThetaBase Services | 3.0 | 6 | strict | 3.0 | f | f | $4,770.0 | 8238
4 | 10036 | Sao Tome and Principe | Type V2 | Plasma | ThetaBase Services | 2.0 | 4 | strict | 2.0 | f | f | $2,820.0 | 30342
5 | 45163 | Sao Tome and Principe | Type V5 | Plasma | ThetaBase Services | 2.0 | 4 | moderate | 2.0 | f | f | $1,715.0 | 32413
6 | 64643 | Faroe Islands | Type F5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | t | f | $1,247.0 | 35620
7 | 23389 | Micronesia | Type V5 | Quantum | ThetaBase Services | 1.0 | 1 | moderate | 1.0 | f | f | $1,845.0 | 23820
8 | 39934 | Rwanda | Type V5 | Quantum | ThetaBase Services | 1.0 | 3 | strict | 2.0 | f | f | $1,520.0 | 46528
9 | 57063 | Faroe Islands | Type F5 | Plasma | ThetaBase Services | 4.0 | 8 | strict | 5.0 | f | f | $3,275.0 | 11875

(index) | id | shuttle_location | shuttle_type | engine_type | engine_vendor | engines | passenger_capacity | cancellation_policy | crew | d_check_complete | moon_clearance_complete | price | company_id
77086 | 55187 | Malta | Type V5 | Plasma | ThetaBase Services | 0.0 | 4 | flexible | 2.0 | t | f | $1,520.0 | 15249
77087 | 46301 | United Kingdom | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | t | f | $1,455.0 | 44431
77088 | 54977 | Nicaragua | Type V5 | Quantum | ThetaBase Services | 1.0 | 1 | flexible | 1.0 | t | f | $1,364.0 | 25724
77089 | 51748 | Uzbekistan | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | flexible | 1.0 | t | f | $1,325.0 | 32743
77090 | 44668 | Rwanda | Type F5 | Quantum | ThetaBase Services | 1.0 | 1 | strict | 1.0 | t | f | $1,260.0 | 19010
77091 | 4368 | Barbados | Type V5 | Quantum | ThetaBase Services | 2.0 | 4 | flexible | 2.0 | t | f | $4,107.0 | 6654
77092 | 2983 | Bouvet Island (Bouvetoya) | Type F5 | Quantum | ThetaBase Services | 1.0 | 1 | flexible | 1.0 | t | f | $1,169.0 | 8000
77093 | 69684 | Micronesia | Type V5 | Plasma | ThetaBase Services | 0.0 | 2 | flexible | 1.0 | t | f | $1,910.0 | 14296
77094 | 21738 | Uzbekistan | Type V5 | Plasma | ThetaBase Services | 1.0 | 2 | flexible | 1.0 | t | f | $2,170.0 | 27363
77095 | 72645 | Malta | Type F5 | Quantum | ThetaBase Services | 0.0 | 2 | moderate | 2.0 | t | f | $1,455.0 | 12542