Dataset statistics
Number of variables | 13 |
---|---|
Number of observations | 77096 |
Missing cells | 188 |
Missing cells (%) | < 0.1% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 7.6 MiB |
Average record size in memory | 104.0 B |
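The overview figures can be cross-checked with a little arithmetic. The sketch below is not part of the generated report; it assumes that the only columns with missing values are the two numeric sections further down that report 39 and 149 missing entries (the variable names `missing_by_column`, `missing_pct`, and `total_mib` are illustrative).

```python
# Figures copied from the report overview.
n_rows, n_cols = 77096, 13

# Missing counts reported in the per-variable sections (engines and crew).
missing_by_column = {"engines": 39, "crew": 149}

missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# Total memory implied by the average record size of 104.0 bytes.
avg_record_bytes = 104.0
total_mib = n_rows * avg_record_bytes / 2**20

print(missing_cells)          # total missing cells
print(round(missing_pct, 4))  # percentage of all cells, well below 0.1
print(round(total_mib, 1))    # total size in MiB
```

The per-column counts add up to the 188 missing cells in the overview, and the implied memory footprint rounds to the reported 7.6 MiB.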
Variable types
Numeric | 5 |
---|---|
Categorical | 6 |
Boolean | 2 |
moon_clearance_complete has constant value "False" | Constant |
price has a high cardinality: 848 distinct values | High cardinality |
engines is highly overall correlated with crew | High correlation |
passenger_capacity is highly overall correlated with engine_type and 1 other fields | High correlation |
crew is highly overall correlated with engines and 1 other fields | High correlation |
shuttle_type is highly overall correlated with engine_type | High correlation |
engine_type is highly overall correlated with shuttle_type and 1 other fields | High correlation |
id is uniformly distributed | Uniform |
id has unique values | Unique |
engines has 4448 (5.8%) zeros | Zeros |
crew has 1271 (1.6%) zeros | Zeros |
Reproduction
Analysis started | 2022-11-24 10:37:51.080167 |
---|---|
Analysis finished | 2022-11-24 10:38:00.491029 |
Duration | 9.41 seconds |
Software version | pandas-profiling v3.5.0 |
Download configuration | config.json |
id
Real number (ℝ)
Distinct | 77096 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 38548.5 |
Minimum | 1 |
---|---|
Maximum | 77096 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 602.4 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 3855.75 |
Q1 | 19274.75 |
median | 38548.5 |
Q3 | 57822.25 |
95-th percentile | 73241.25 |
Maximum | 77096 |
Range | 77095 |
Interquartile range (IQR) | 38547.5 |
Descriptive statistics
Standard deviation | 22255.843 |
---|---|
Coefficient of variation (CV) | 0.57734652 |
Kurtosis | -1.2 |
Mean | 38548.5 |
Median Absolute Deviation (MAD) | 19274 |
Skewness | 1.7649979 × 10⁻¹⁸ |
Sum | 2.9719352 × 10⁹ |
Variance | 4.9532253 × 10⁸ |
Monotonicity | Not monotonic |
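Because id takes every integer from 1 to 77096 exactly once, the statistics above follow from closed-form expressions. The sketch below reproduces them under two assumptions that the report does not state explicitly: the variance uses the sample convention (divisor n - 1), and quantiles are linearly interpolated between order statistics.

```python
import math

n = 77096  # number of observations; ids run 1..n with no gaps

total = n * (n + 1) // 2      # sum of 1..n
mean = total / n              # equals (n + 1) / 2
variance = n * (n + 1) / 12   # sample variance (divisor n - 1) of 1..n
std = math.sqrt(variance)
cv = std / mean               # coefficient of variation


def quantile(q):
    """Linearly interpolated quantile of the sorted values 1..n."""
    return 1 + q * (n - 1)


print(total, mean, variance)
print(round(std, 3), round(cv, 8))
print(quantile(0.05), quantile(0.25), quantile(0.75), quantile(0.95))
print(quantile(0.75) - quantile(0.25))  # interquartile range
```

For large n the coefficient of variation of a discrete uniform variable approaches 1/√3 ≈ 0.577, which is the value the report shows.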
Value | Count | Frequency (%) |
63561 | 1 | < 0.1% |
25124 | 1 | < 0.1% |
44449 | 1 | < 0.1% |
4748 | 1 | < 0.1% |
30795 | 1 | < 0.1% |
8614 | 1 | < 0.1% |
62491 | 1 | < 0.1% |
40429 | 1 | < 0.1% |
20507 | 1 | < 0.1% |
24295 | 1 | < 0.1% |
Other values (77086) | 77086 |
Value | Count | Frequency (%) |
1 | 1 | |
2 | 1 | |
3 | 1 | |
4 | 1 | |
5 | 1 | |
6 | 1 | |
7 | 1 | |
8 | 1 | |
9 | 1 | |
10 | 1 |
Value | Count | Frequency (%) |
77096 | 1 | |
77095 | 1 | |
77094 | 1 | |
77093 | 1 | |
77092 | 1 | |
77091 | 1 | |
77090 | 1 | |
77089 | 1 | |
77088 | 1 | |
77087 | 1 |
shuttle_location
Categorical
Distinct | 30 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 602.4 KiB |
Malta | |
---|---|
Barbados | |
Rwanda | |
Bouvet Island (Bouvetoya) | |
United Kingdom | |
Other values (25) |
Length
Max length | 25 |
---|---|
Median length | 18 |
Mean length | 10.900215 |
Min length | 4 |
Characters and Unicode
Total characters | 840363 |
---|---|
Distinct characters | 44 |
Distinct categories | 5 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Niue |
---|---|
2nd row | Anguilla |
3rd row | Russian Federation |
4th row | Barbados |
5th row | Sao Tome and Principe |
Common Values
Value | Count | Frequency (%) |
Malta | 8487 | |
Barbados | 8328 | |
Rwanda | 7513 | |
Bouvet Island (Bouvetoya) | 5907 | 7.7% |
United Kingdom | 5551 | 7.2% |
Micronesia | 5466 | 7.1% |
Nicaragua | 5461 | 7.1% |
Russian Federation | 4759 | 6.2% |
Niue | 4299 | 5.6% |
Sao Tome and Principe | 3916 | 5.1% |
Other values (20) | 17409 |
Words
Value | Count | Frequency (%) |
malta | 8487 | 7.1% |
barbados | 8328 | 7.0% |
rwanda | 7513 | 6.3% |
bouvet | 5907 | 4.9% |
island | 5907 | 4.9% |
bouvetoya | 5907 | 4.9% |
united | 5551 | 4.6% |
kingdom | 5551 | 4.6% |
and | 5476 | 4.6% |
micronesia | 5466 | 4.6% |
Other values (34) | 55318 |
Most occurring characters
Value | Count | Frequency (%) |
a | 126716 | |
i | 65706 | 7.8% |
n | 62245 | 7.4% |
o | 57518 | 6.8% |
e | 51891 | 6.2% |
d | 47389 | 5.6% |
s | 42627 | 5.1% |
(space) | 42315 | 5.0% |
t | 38865 | 4.6% |
r | 32646 | 3.9% |
Other values (34) | 272445 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 672827 | |
Uppercase Letter | 113407 | 13.5% |
Space Separator | 42315 | 5.0% |
Open Punctuation | 5907 | 0.7% |
Close Punctuation | 5907 | 0.7% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 126716 | |
i | 65706 | |
n | 62245 | |
o | 57518 | |
e | 51891 | |
d | 47389 | 7.0% |
s | 42627 | 6.3% |
t | 38865 | 5.8% |
r | 32646 | 4.9% |
u | 30594 | 4.5% |
Other values (14) | 116630 |
Uppercase Letter
Value | Count | Frequency (%) |
B | 21103 | |
M | 15317 | |
R | 14337 | |
N | 9760 | |
I | 9662 | |
U | 7371 | 6.5% |
F | 7277 | 6.4% |
K | 6697 | 5.9% |
S | 4778 | 4.2% |
P | 4370 | 3.9% |
Other values (7) | 12735 |
Space Separator
Value | Count | Frequency (%) |
(space) | 42315 |
Open Punctuation
Value | Count | Frequency (%) |
( | 5907 |
Close Punctuation
Value | Count | Frequency (%) |
) | 5907 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 786234 | |
Common | 54129 | 6.4% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 126716 | |
i | 65706 | 8.4% |
n | 62245 | 7.9% |
o | 57518 | 7.3% |
e | 51891 | 6.6% |
d | 47389 | 6.0% |
s | 42627 | 5.4% |
t | 38865 | 4.9% |
r | 32646 | 4.2% |
u | 30594 | 3.9% |
Other values (31) | 230037 |
Common
Value | Count | Frequency (%) |
(space) | 42315 | |
( | 5907 | 10.9% |
) | 5907 | 10.9% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 840363 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 126716 | |
i | 65706 | 7.8% |
n | 62245 | 7.4% |
o | 57518 | 6.8% |
e | 51891 | 6.2% |
d | 47389 | 5.6% |
s | 42627 | 5.1% |
(space) | 42315 | 5.0% |
t | 38865 | 4.6% |
r | 32646 | 3.9% |
Other values (34) | 272445 |
shuttle_type
Categorical
Distinct | 42 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 602.4 KiB |
Type V5 | |
---|---|
Type F5 | |
Type V2 | 2932 |
Type G0 | 2112 |
Type V7 | 940 |
Other values (37) | 3329 |
Length
Max length | 7 |
---|---|
Median length | 7 |
Mean length | 7 |
Min length | 7 |
Characters and Unicode
Total characters | 539672 |
---|---|
Distinct characters | 34 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 8 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | Type V5 |
---|---|
2nd row | Type V5 |
3rd row | Type V5 |
4th row | Type V5 |
5th row | Type V2 |
Common Values
Value | Count | Frequency (%) |
Type V5 | 52147 | |
Type F5 | 15636 | 20.3% |
Type V2 | 2932 | 3.8% |
Type G0 | 2112 | 2.7% |
Type V7 | 940 | 1.2% |
Type O3 | 863 | 1.1% |
Type Z6 | 734 | 1.0% |
Type E3 | 297 | 0.4% |
Type X3 | 244 | 0.3% |
Type F1 | 227 | 0.3% |
Other values (32) | 964 | 1.3% |
Words
Value | Count | Frequency (%) |
type | 77096 | |
v5 | 52147 | |
f5 | 15636 | 10.1% |
v2 | 2932 | 1.9% |
g0 | 2112 | 1.4% |
v7 | 940 | 0.6% |
o3 | 863 | 0.6% |
z6 | 734 | 0.5% |
e3 | 297 | 0.2% |
x3 | 244 | 0.2% |
Other values (33) | 1191 | 0.8% |
Most occurring characters
Value | Count | Frequency (%) |
T | 77100 | |
e | 77096 | |
(space) | 77096 | |
y | 77096 | |
p | 77096 | |
5 | 67918 | |
V | 56019 | |
F | 15864 | 2.9% |
2 | 2960 | 0.5% |
0 | 2329 | 0.4% |
Other values (24) | 9098 | 1.7% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 231288 | |
Uppercase Letter | 154192 | |
Space Separator | 77096 | 14.3% |
Decimal Number | 77096 | 14.3% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
T | 77100 | |
V | 56019 | |
F | 15864 | 10.3% |
G | 2112 | 1.4% |
O | 875 | 0.6% |
Z | 775 | 0.5% |
E | 297 | 0.2% |
X | 244 | 0.2% |
N | 195 | 0.1% |
A | 146 | 0.1% |
Other values (12) | 565 | 0.4% |
Decimal Number
Value | Count | Frequency (%) |
5 | 67918 | |
2 | 2960 | 3.8% |
0 | 2329 | 3.0% |
3 | 1404 | 1.8% |
7 | 1268 | 1.6% |
6 | 744 | 1.0% |
1 | 393 | 0.5% |
4 | 80 | 0.1% |
Lowercase Letter
Value | Count | Frequency (%) |
e | 77096 | |
y | 77096 | |
p | 77096 |
Space Separator
Value | Count | Frequency (%) |
(space) | 77096 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 385480 | |
Common | 154192 | 28.6% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
T | 77100 | |
e | 77096 | |
y | 77096 | |
p | 77096 | |
V | 56019 | |
F | 15864 | 4.1% |
G | 2112 | 0.5% |
O | 875 | 0.2% |
Z | 775 | 0.2% |
E | 297 | 0.1% |
Other values (15) | 1150 | 0.3% |
Common
Value | Count | Frequency (%) |
(space) | 77096 | |
5 | 67918 | |
2 | 2960 | 1.9% |
0 | 2329 | 1.5% |
3 | 1404 | 0.9% |
7 | 1268 | 0.8% |
6 | 744 | 0.5% |
1 | 393 | 0.3% |
4 | 80 | 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 539672 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
T | 77100 | |
e | 77096 | |
(space) | 77096 | |
y | 77096 | |
p | 77096 | |
5 | 67918 | |
V | 56019 | |
F | 15864 | 2.9% |
2 | 2960 | 0.5% |
0 | 2329 | 0.4% |
Other values (24) | 9098 | 1.7% |
engine_type
Categorical
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 602.4 KiB |
Plasma | |
---|---|
Quantum | |
Nuclear | 744 |
Length
Max length | 7 |
---|---|
Median length | 6 |
Mean length | 6.4453928 |
Min length | 6 |
Characters and Unicode
Total characters | 496914 |
---|---|
Distinct characters | 13 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | Quantum |
---|---|
2nd row | Quantum |
3rd row | Quantum |
4th row | Plasma |
5th row | Plasma |
Common Values
Value | Count | Frequency (%) |
Plasma | 42758 | |
Quantum | 33594 | |
Nuclear | 744 | 1.0% |
Words
Value | Count | Frequency (%) |
plasma | 42758 | |
quantum | 33594 | |
nuclear | 744 | 1.0% |
Most occurring characters
Value | Count | Frequency (%) |
a | 119854 | |
m | 76352 | |
u | 67932 | |
l | 43502 | 8.8% |
P | 42758 | 8.6% |
s | 42758 | 8.6% |
Q | 33594 | 6.8% |
n | 33594 | 6.8% |
t | 33594 | 6.8% |
N | 744 | 0.1% |
Other values (3) | 2232 | 0.4% |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 419818 | |
Uppercase Letter | 77096 | 15.5% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 119854 | |
m | 76352 | |
u | 67932 | |
l | 43502 | 10.4% |
s | 42758 | 10.2% |
n | 33594 | 8.0% |
t | 33594 | 8.0% |
c | 744 | 0.2% |
e | 744 | 0.2% |
r | 744 | 0.2% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 42758 | |
Q | 33594 | |
N | 744 | 1.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 496914 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 119854 | |
m | 76352 | |
u | 67932 | |
l | 43502 | 8.8% |
P | 42758 | 8.6% |
s | 42758 | 8.6% |
Q | 33594 | 6.8% |
n | 33594 | 6.8% |
t | 33594 | 6.8% |
N | 744 | 0.1% |
Other values (3) | 2232 | 0.4% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 496914 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
a | 119854 | |
m | 76352 | |
u | 67932 | |
l | 43502 | 8.8% |
P | 42758 | 8.6% |
s | 42758 | 8.6% |
Q | 33594 | 6.8% |
n | 33594 | 6.8% |
t | 33594 | 6.8% |
N | 744 | 0.1% |
Other values (3) | 2232 | 0.4% |
engine_vendor
Categorical
Distinct | 5 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 602.4 KiB |
ThetaBase Services | |
---|---|
Banks, Wood and Phillips | 474 |
Warwick Technology Multinational | 214 |
SIT Technology Unlimited | 74 |
MCW Global | 59 |
Length
Max length | 32 |
---|---|
Median length | 18 |
Mean length | 18.075387 |
Min length | 10 |
Characters and Unicode
Total characters | 1393540 |
---|---|
Distinct characters | 33 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | ThetaBase Services |
---|---|
2nd row | ThetaBase Services |
3rd row | ThetaBase Services |
4th row | ThetaBase Services |
5th row | ThetaBase Services |
Common Values
Value | Count | Frequency (%) |
ThetaBase Services | 76275 | |
Banks, Wood and Phillips | 474 | 0.6% |
Warwick Technology Multinational | 214 | 0.3% |
SIT Technology Unlimited | 74 | 0.1% |
MCW Global | 59 | 0.1% |
Words
Value | Count | Frequency (%) |
thetabase | 76275 | |
services | 76275 | |
banks | 474 | 0.3% |
wood | 474 | 0.3% |
and | 474 | 0.3% |
phillips | 474 | 0.3% |
technology | 288 | 0.2% |
warwick | 214 | 0.1% |
multinational | 214 | 0.1% |
sit | 74 | < 0.1% |
Other values (3) | 192 | 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
e | 305462 | |
a | 154199 | |
s | 153498 | |
(space) | 78332 | 5.6% |
i | 78013 | 5.6% |
h | 77037 | 5.5% |
t | 76777 | 5.5% |
c | 76777 | 5.5% |
B | 76749 | 5.5% |
T | 76637 | 5.5% |
Other values (23) | 240059 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 1083239 | |
Uppercase Letter | 231495 | 16.6% |
Space Separator | 78332 | 5.6% |
Other Punctuation | 474 | < 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 305462 | |
a | 154199 | |
s | 153498 | |
i | 78013 | 7.2% |
h | 77037 | 7.1% |
t | 76777 | 7.1% |
c | 76777 | 7.1% |
r | 76489 | 7.1% |
v | 76275 | 7.0% |
l | 1856 | 0.2% |
Other values (11) | 6856 | 0.6% |
Uppercase Letter
Value | Count | Frequency (%) |
B | 76749 | |
T | 76637 | |
S | 76349 | |
W | 747 | 0.3% |
P | 474 | 0.2% |
M | 273 | 0.1% |
I | 74 | < 0.1% |
U | 74 | < 0.1% |
C | 59 | < 0.1% |
G | 59 | < 0.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 78332 |
Other Punctuation
Value | Count | Frequency (%) |
, | 474 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 1314734 | |
Common | 78806 | 5.7% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 305462 | |
a | 154199 | |
s | 153498 | |
i | 78013 | 5.9% |
h | 77037 | 5.9% |
t | 76777 | 5.8% |
c | 76777 | 5.8% |
B | 76749 | 5.8% |
T | 76637 | 5.8% |
r | 76489 | 5.8% |
Other values (21) | 163096 |
Common
Value | Count | Frequency (%) |
(space) | 78332 | |
, | 474 | 0.6% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1393540 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 305462 | |
a | 154199 | |
s | 153498 | |
(space) | 78332 | 5.6% |
i | 78013 | 5.6% |
h | 77037 | 5.5% |
t | 76777 | 5.5% |
c | 76777 | 5.5% |
B | 76749 | 5.5% |
T | 76637 | 5.5% |
Other values (23) | 240059 |
engines
Real number (ℝ)
Distinct | 15 |
---|---|
Distinct (%) | < 0.1% |
Missing | 39 |
Missing (%) | 0.1% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1.4039607 |
Minimum | 0 |
---|---|
Maximum | 44 |
Zeros | 4448 |
Zeros (%) | 5.8% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 602.4 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 1 |
median | 1 |
Q3 | 2 |
95-th percentile | 3 |
Maximum | 44 |
Range | 44 |
Interquartile range (IQR) | 1 |
Descriptive statistics
Standard deviation | 0.90503385 |
---|---|
Coefficient of variation (CV) | 0.64462904 |
Kurtosis | 68.625371 |
Mean | 1.4039607 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 3.0697699 |
Sum | 108185 |
Variance | 0.81908626 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
1 | 48742 | |
2 | 16001 | 20.8% |
3 | 5113 | 6.6% |
0 | 4448 | 5.8% |
4 | 1956 | 2.5% |
5 | 619 | 0.8% |
6 | 135 | 0.2% |
7 | 28 | < 0.1% |
8 | 6 | < 0.1% |
10 | 2 | < 0.1% |
Other values (5) | 7 | < 0.1% |
(Missing) | 39 | 0.1% |
Value | Count | Frequency (%) |
0 | 4448 | 5.8% |
1 | 48742 | |
2 | 16001 | 20.8% |
3 | 5113 | 6.6% |
4 | 1956 | 2.5% |
5 | 619 | 0.8% |
6 | 135 | 0.2% |
7 | 28 | < 0.1% |
8 | 6 | < 0.1% |
9 | 2 | < 0.1% |
Value | Count | Frequency (%) |
44 | 1 | < 0.1% |
13 | 1 | < 0.1% |
12 | 1 | < 0.1% |
11 | 2 | < 0.1% |
10 | 2 | < 0.1% |
9 | 2 | < 0.1% |
8 | 6 | < 0.1% |
7 | 28 | < 0.1% |
6 | 135 | 0.2% |
5 | 619 |
passenger_capacity
Real number (ℝ)
Distinct | 17 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 3.1550664 |
Minimum | 1 |
---|---|
Maximum | 20 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 602.4 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 2 |
median | 2 |
Q3 | 4 |
95-th percentile | 7 |
Maximum | 20 |
Range | 19 |
Interquartile range (IQR) | 2 |
Descriptive statistics
Standard deviation | 1.9725307 |
---|---|
Coefficient of variation (CV) | 0.62519466 |
Kurtosis | 4.2753533 |
Mean | 3.1550664 |
Median Absolute Deviation (MAD) | 1 |
Skewness | 1.6690889 |
Sum | 243243 |
Variance | 3.8908772 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
2 | 32858 | |
4 | 15373 | |
1 | 9710 | 12.6% |
6 | 5961 | 7.7% |
3 | 5322 | 6.9% |
5 | 3567 | 4.6% |
8 | 1771 | 2.3% |
7 | 1241 | 1.6% |
10 | 486 | 0.6% |
9 | 363 | 0.5% |
Other values (7) | 444 | 0.6% |
Value | Count | Frequency (%) |
1 | 9710 | 12.6% |
2 | 32858 | |
3 | 5322 | 6.9% |
4 | 15373 | |
5 | 3567 | 4.6% |
6 | 5961 | 7.7% |
7 | 1241 | 1.6% |
8 | 1771 | 2.3% |
9 | 363 | 0.5% |
10 | 486 | 0.6% |
Value | Count | Frequency (%) |
20 | 1 | < 0.1% |
16 | 76 | 0.1% |
15 | 15 | < 0.1% |
14 | 63 | 0.1% |
13 | 45 | 0.1% |
12 | 164 | 0.2% |
11 | 80 | 0.1% |
10 | 486 | 0.6% |
9 | 363 | 0.5% |
8 | 1771 |
cancellation_policy
Categorical
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 602.4 KiB |
strict | |
---|---|
flexible | |
moderate |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 7.133444 |
Min length | 6 |
Characters and Unicode
Total characters | 549960 |
---|---|
Distinct characters | 14 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | strict |
---|---|
2nd row | strict |
3rd row | moderate |
4th row | strict |
5th row | strict |
Common Values
Value | Count | Frequency (%) |
strict | 33404 | |
flexible | 25235 | |
moderate | 18457 |
Words
Value | Count | Frequency (%) |
strict | 33404 | |
flexible | 25235 | |
moderate | 18457 |
Most occurring characters
Value | Count | Frequency (%) |
e | 87384 | |
t | 85265 | |
i | 58639 | |
r | 51861 | |
l | 50470 | |
s | 33404 | 6.1% |
c | 33404 | 6.1% |
f | 25235 | 4.6% |
x | 25235 | 4.6% |
b | 25235 | 4.6% |
Other values (4) | 73828 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 549960 |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 87384 | |
t | 85265 | |
i | 58639 | |
r | 51861 | |
l | 50470 | |
s | 33404 | 6.1% |
c | 33404 | 6.1% |
f | 25235 | 4.6% |
x | 25235 | 4.6% |
b | 25235 | 4.6% |
Other values (4) | 73828 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 549960 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 87384 | |
t | 85265 | |
i | 58639 | |
r | 51861 | |
l | 50470 | |
s | 33404 | 6.1% |
c | 33404 | 6.1% |
f | 25235 | 4.6% |
x | 25235 | 4.6% |
b | 25235 | 4.6% |
Other values (4) | 73828 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 549960 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 87384 | |
t | 85265 | |
i | 58639 | |
r | 51861 | |
l | 50470 | |
s | 33404 | 6.1% |
c | 33404 | 6.1% |
f | 25235 | 4.6% |
x | 25235 | 4.6% |
b | 25235 | 4.6% |
Other values (4) | 73828 |
crew
Real number (ℝ)
Distinct | 20 |
---|---|
Distinct (%) | < 0.1% |
Missing | 149 |
Missing (%) | 0.2% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1.7413025 |
Minimum | 0 |
---|---|
Maximum | 23 |
Zeros | 1271 |
Zeros (%) | 1.6% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 602.4 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
median | 1 |
Q3 | 2 |
95-th percentile | 4 |
Maximum | 23 |
Range | 23 |
Interquartile range (IQR) | 1 |
Descriptive statistics
Standard deviation | 1.2399381 |
---|---|
Coefficient of variation (CV) | 0.71207512 |
Kurtosis | 14.264173 |
Mean | 1.7413025 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 2.7301788 |
Sum | 133988 |
Variance | 1.5374466 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
1 | 43188 | |
2 | 18631 | |
3 | 7598 | 9.9% |
4 | 3472 | 4.5% |
5 | 1420 | 1.8% |
0 | 1271 | 1.6% |
6 | 713 | 0.9% |
7 | 289 | 0.4% |
8 | 184 | 0.2% |
9 | 61 | 0.1% |
Other values (10) | 120 | 0.2% |
(Missing) | 149 | 0.2% |
Value | Count | Frequency (%) |
0 | 1271 | 1.6% |
1 | 43188 | |
2 | 18631 | |
3 | 7598 | 9.9% |
4 | 3472 | 4.5% |
5 | 1420 | 1.8% |
6 | 713 | 0.9% |
7 | 289 | 0.4% |
8 | 184 | 0.2% |
9 | 61 | 0.1% |
Value | Count | Frequency (%) |
23 | 1 | < 0.1% |
20 | 1 | < 0.1% |
18 | 1 | < 0.1% |
16 | 16 | < 0.1% |
15 | 3 | < 0.1% |
14 | 9 | < 0.1% |
13 | 3 | < 0.1% |
12 | 20 | < 0.1% |
11 | 7 | < 0.1% |
10 | 59 |
d_check_complete
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 75.4 KiB |
False | |
---|---|
True |
Value | Count | Frequency (%) |
False | 46762 | |
True | 30334 |
moon_clearance_complete
Boolean
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 75.4 KiB |
False |
---|
Value | Count | Frequency (%) |
False | 77096 |
price
Categorical
Distinct | 848 |
---|---|
Distinct (%) | 1.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 602.4 KiB |
$1,520.0 | 2873 |
---|---|
$2,170.0 | 2752 |
$1,390.0 | 2674 |
$1,325.0 | 2333 |
$1,260.0 | 2250 |
Other values (843) |
Length
Max length | 10 |
---|---|
Median length | 8 |
Mean length | 8.0059666 |
Min length | 6 |
Characters and Unicode
Total characters | 617228 |
---|---|
Distinct characters | 13 |
Distinct categories | 3 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 253 |
---|---|
Unique (%) | 0.3% |
Sample
1st row | $1,325.0 |
---|---|
2nd row | $1,780.0 |
3rd row | $1,715.0 |
4th row | $4,770.0 |
5th row | $2,820.0 |
Common Values
Value | Count | Frequency (%) |
$1,520.0 | 2873 | 3.7% |
$2,170.0 | 2752 | 3.6% |
$1,390.0 | 2674 | 3.5% |
$1,325.0 | 2333 | 3.0% |
$1,260.0 | 2250 | 2.9% |
$2,430.0 | 2205 | 2.9% |
$1,455.0 | 2150 | 2.8% |
$1,650.0 | 2131 | 2.8% |
$2,820.0 | 2042 | 2.6% |
$1,910.0 | 2042 | 2.6% |
Other values (838) | 53644 |
Words
Value | Count | Frequency (%) |
1,520.0 | 2873 | 3.7% |
2,170.0 | 2752 | 3.6% |
1,390.0 | 2674 | 3.5% |
1,325.0 | 2333 | 3.0% |
1,260.0 | 2250 | 2.9% |
2,430.0 | 2205 | 2.9% |
1,455.0 | 2150 | 2.8% |
1,650.0 | 2131 | 2.8% |
2,820.0 | 2042 | 2.6% |
1,910.0 | 2042 | 2.6% |
Other values (838) | 53644 |
Most occurring characters
Value | Count | Frequency (%) |
0 | 121718 | |
$ | 77096 | |
. | 77096 | |
, | 77039 | |
1 | 62820 | |
2 | 44530 | 7.2% |
5 | 36385 | 5.9% |
3 | 25685 | 4.2% |
7 | 23871 | 3.9% |
4 | 23007 | 3.7% |
Other values (3) | 47981 | 7.8% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 385997 | |
Other Punctuation | 154135 | 25.0% |
Currency Symbol | 77096 | 12.5% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 121718 | |
1 | 62820 | |
2 | 44530 | 11.5% |
5 | 36385 | 9.4% |
3 | 25685 | 6.7% |
7 | 23871 | 6.2% |
4 | 23007 | 6.0% |
9 | 16649 | 4.3% |
6 | 16134 | 4.2% |
8 | 15198 | 3.9% |
Other Punctuation
Value | Count | Frequency (%) |
. | 77096 | |
, | 77039 |
Currency Symbol
Value | Count | Frequency (%) |
$ | 77096 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 617228 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 121718 | |
$ | 77096 | |
. | 77096 | |
, | 77039 | |
1 | 62820 | |
2 | 44530 | 7.2% |
5 | 36385 | 5.9% |
3 | 25685 | 4.2% |
7 | 23871 | 3.9% |
4 | 23007 | 3.7% |
Other values (3) | 47981 | 7.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 617228 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 121718 | |
$ | 77096 | |
. | 77096 | |
, | 77039 | |
1 | 62820 | |
2 | 44530 | 7.2% |
5 | 36385 | 5.9% |
3 | 25685 | 4.2% |
7 | 23871 | 3.9% |
4 | 23007 | 3.7% |
Other values (3) | 47981 | 7.8% |
company_id
Real number (ℝ)
Distinct | 50098 |
---|---|
Distinct (%) | 65.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 25155.386 |
Minimum | 1 |
---|---|
Maximum | 50098 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 602.4 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 2704.75 |
Q1 | 12935.75 |
median | 25253.5 |
Q3 | 37410.25 |
95-th percentile | 47520.25 |
Maximum | 50098 |
Range | 50097 |
Interquartile range (IQR) | 24474.5 |
Descriptive statistics
Standard deviation | 14300.991 |
---|---|
Coefficient of variation (CV) | 0.5685061 |
Kurtosis | -1.1684667 |
Mean | 25155.386 |
Median Absolute Deviation (MAD) | 12241.5 |
Skewness | -0.01137149 |
Sum | 1.9393797 × 10⁹ |
Variance | 2.0451833 × 10⁸ |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
29647 | 1086 | 1.4% |
45111 | 297 | 0.4% |
28828 | 184 | 0.2% |
32203 | 176 | 0.2% |
20334 | 167 | 0.2% |
18077 | 125 | 0.2% |
4745 | 114 | 0.1% |
10711 | 108 | 0.1% |
22721 | 106 | 0.1% |
19019 | 102 | 0.1% |
Other values (50088) | 74631 |
Value | Count | Frequency (%) |
1 | 2 | |
2 | 1 | |
3 | 1 | |
4 | 1 | |
5 | 1 | |
6 | 2 | |
7 | 1 | |
8 | 1 | |
9 | 2 | |
10 | 1 |
Value | Count | Frequency (%) |
50098 | 2 | |
50097 | 1 | |
50096 | 1 | |
50095 | 1 | |
50094 | 1 | |
50093 | 1 | |
50092 | 1 | |
50091 | 1 | |
50090 | 1 | |
50089 | 2 |
Auto
The auto setting is an interpretable pairwise column metric with the following mapping:
- Variable_type-Variable_type : Method, Range
- Categorical-Categorical : Cramér's V, [0,1]
- Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
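The mapping above amounts to a small dispatch on the pair of column types. The following is a minimal sketch of that rule only; the function name `auto_metric` and the type labels `"categorical"` and `"numerical"` are hypothetical and are not taken from the profiling library.

```python
def auto_metric(type_a, type_b):
    """Return (method, value range) for a pair of column types.

    Both arguments are the illustrative labels "categorical" or "numerical".
    """
    kinds = {type_a, type_b}
    if not kinds <= {"categorical", "numerical"}:
        raise ValueError("unsupported column type in %r" % sorted(kinds))
    if kinds == {"numerical"}:
        # Two numerical columns: rank correlation.
        return "Spearman's ρ", (-1, 1)
    # Any pair involving a categorical column uses Cramér's V;
    # a numerical partner would be discretized first.
    return "Cramér's V", (0, 1)


print(auto_metric("numerical", "numerical"))
print(auto_metric("numerical", "categorical"))
print(auto_metric("categorical", "categorical"))
```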
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
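The definition can be illustrated in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. This is a sketch on made-up data, not the code the report was generated with; the helper names `average_ranks`, `pearson`, and `spearman` are illustrative, and tied values are assumed to receive the mean of their ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n (or 1/(n-1)) factors cancel, so raw sums are used.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x

print(spearman(x, y))  # essentially 1: the relationship is perfectly monotonic
print(pearson(x, y))   # below 1: the relationship is not linear
```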
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
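The invariance property is easy to check numerically. The sketch below uses made-up data and an illustrative `pearson` helper; it shifts and rescales each variable separately, using positive scale factors, and compares the two coefficients.

```python
import math


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]

# Separate changes of location and (positive) scale for each variable.
x_rescaled = [3.0 * a + 10.0 for a in x]
y_rescaled = [0.5 * b - 4.0 for b in y]

r_original = pearson(x, y)
r_rescaled = pearson(x_rescaled, y_rescaled)

print(r_original, r_rescaled)  # the two values agree up to rounding error
```

A negative scale factor on one variable would flip the sign of r while leaving its magnitude unchanged.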
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
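The pair-counting definition translates directly into code. This sketch implements exactly the formula as worded above (often called tau-a), where tied pairs count as neither concordant nor discordant; library implementations commonly apply a tie correction (tau-b) that is omitted here. The function name `kendall_tau` and the data are illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, with no tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six

print(kendall_tau(x, y))        # (5 - 1) / 6
print(kendall_tau(x, x))        # perfectly concordant
print(kendall_tau(x, x[::-1]))  # perfectly discordant
```

The quadratic loop over all pairs is fine for a small illustration but would be slow on the 77096 rows profiled here.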
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
(index) | id | shuttle_location | shuttle_type | engine_type | engine_vendor | engines | passenger_capacity | cancellation_policy | crew | d_check_complete | moon_clearance_complete | price | company_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 63561 | Niue | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | f | f | $1,325.0 | 35029 |
1 | 36260 | Anguilla | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | t | f | $1,780.0 | 30292 |
2 | 57015 | Russian Federation | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | moderate | 0.0 | f | f | $1,715.0 | 19032 |
3 | 14035 | Barbados | Type V5 | Plasma | ThetaBase Services | 3.0 | 6 | strict | 3.0 | f | f | $4,770.0 | 8238 |
4 | 10036 | Sao Tome and Principe | Type V2 | Plasma | ThetaBase Services | 2.0 | 4 | strict | 2.0 | f | f | $2,820.0 | 30342 |
5 | 45163 | Sao Tome and Principe | Type V5 | Plasma | ThetaBase Services | 2.0 | 4 | moderate | 2.0 | f | f | $1,715.0 | 32413 |
6 | 64643 | Faroe Islands | Type F5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | t | f | $1,247.0 | 35620 |
7 | 23389 | Micronesia | Type V5 | Quantum | ThetaBase Services | 1.0 | 1 | moderate | 1.0 | f | f | $1,845.0 | 23820 |
8 | 39934 | Rwanda | Type V5 | Quantum | ThetaBase Services | 1.0 | 3 | strict | 2.0 | f | f | $1,520.0 | 46528 |
9 | 57063 | Faroe Islands | Type F5 | Plasma | ThetaBase Services | 4.0 | 8 | strict | 5.0 | f | f | $3,275.0 | 11875 |
(index) | id | shuttle_location | shuttle_type | engine_type | engine_vendor | engines | passenger_capacity | cancellation_policy | crew | d_check_complete | moon_clearance_complete | price | company_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
77086 | 55187 | Malta | Type V5 | Plasma | ThetaBase Services | 0.0 | 4 | flexible | 2.0 | t | f | $1,520.0 | 15249 |
77087 | 46301 | United Kingdom | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | t | f | $1,455.0 | 44431 |
77088 | 54977 | Nicaragua | Type V5 | Quantum | ThetaBase Services | 1.0 | 1 | flexible | 1.0 | t | f | $1,364.0 | 25724 |
77089 | 51748 | Uzbekistan | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | flexible | 1.0 | t | f | $1,325.0 | 32743 |
77090 | 44668 | Rwanda | Type F5 | Quantum | ThetaBase Services | 1.0 | 1 | strict | 1.0 | t | f | $1,260.0 | 19010 |
77091 | 4368 | Barbados | Type V5 | Quantum | ThetaBase Services | 2.0 | 4 | flexible | 2.0 | t | f | $4,107.0 | 6654 |
77092 | 2983 | Bouvet Island (Bouvetoya) | Type F5 | Quantum | ThetaBase Services | 1.0 | 1 | flexible | 1.0 | t | f | $1,169.0 | 8000 |
77093 | 69684 | Micronesia | Type V5 | Plasma | ThetaBase Services | 0.0 | 2 | flexible | 1.0 | t | f | $1,910.0 | 14296 |
77094 | 21738 | Uzbekistan | Type V5 | Plasma | ThetaBase Services | 1.0 | 2 | flexible | 1.0 | t | f | $2,170.0 | 27363 |
77095 | 72645 | Malta | Type F5 | Quantum | ThetaBase Services | 0.0 | 2 | moderate | 2.0 | t | f | $1,455.0 | 12542 |