Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 77096 |
| Missing cells | 188 |
| Missing cells (%) | < 0.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 7.6 MiB |
| Average record size in memory | 104.0 B |
Variable types
| Numeric | 5 |
|---|---|
| Categorical | 6 |
| Boolean | 2 |
Alerts
| Alert | Type |
|---|---|
| moon_clearance_complete has constant value "False" | Constant |
| price has a high cardinality: 848 distinct values | High cardinality |
| engines is highly overall correlated with crew | High correlation |
| passenger_capacity is highly overall correlated with engine_type and 1 other fields | High correlation |
| crew is highly overall correlated with engines and 1 other fields | High correlation |
| shuttle_type is highly overall correlated with engine_type | High correlation |
| engine_type is highly overall correlated with shuttle_type and 1 other fields | High correlation |
| id is uniformly distributed | Uniform |
| id has unique values | Unique |
| engines has 4448 (5.8%) zeros | Zeros |
| crew has 1271 (1.6%) zeros | Zeros |
Reproduction
| Analysis started | 2022-11-24 10:37:51.080167 |
|---|---|
| Analysis finished | 2022-11-24 10:38:00.491029 |
| Duration | 9.41 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
id
Real number (ℝ)
| Distinct | 77096 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38548.5 |
| Minimum | 1 |
|---|---|
| Maximum | 77096 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 602.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3855.75 |
| Q1 | 19274.75 |
| median | 38548.5 |
| Q3 | 57822.25 |
| 95-th percentile | 73241.25 |
| Maximum | 77096 |
| Range | 77095 |
| Interquartile range (IQR) | 38547.5 |
Descriptive statistics
| Standard deviation | 22255.843 |
|---|---|
| Coefficient of variation (CV) | 0.57734652 |
| Kurtosis | -1.2 |
| Mean | 38548.5 |
| Median Absolute Deviation (MAD) | 19274 |
| Skewness | 1.7649979 × 10⁻¹⁸ |
| Sum | 2.9719352 × 10⁹ |
| Variance | 4.9532253 × 10⁸ |
| Monotonicity | Not monotonic |
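Because id takes every integer from 1 to 77096 exactly once, the descriptive statistics above can be cross-checked against closed-form expressions. The sketch below is only a sanity check; the function name `sequential_id_stats` is hypothetical, and it assumes the report uses the sample variance (dividing by n - 1), which is what the reported figures are consistent with.

```python
import math


def sequential_id_stats(n: int) -> dict:
    """Closed-form summary statistics for the integers 1..n (each occurring once)."""
    mean = (n + 1) / 2
    total = n * (n + 1) // 2
    # Sum of squared deviations is n(n^2 - 1)/12; dividing by (n - 1)
    # gives the sample variance n(n + 1)/12.
    variance = n * (n + 1) / 12
    std = math.sqrt(variance)
    return {
        "mean": mean,
        "sum": total,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation
    }


stats = sequential_id_stats(77096)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The values printed agree with the Mean, Sum, Variance, Standard deviation, and CV rows to the precision shown in the report.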
| Value | Count | Frequency (%) |
| 63561 | 1 | < 0.1% |
| 25124 | 1 | < 0.1% |
| 44449 | 1 | < 0.1% |
| 4748 | 1 | < 0.1% |
| 30795 | 1 | < 0.1% |
| 8614 | 1 | < 0.1% |
| 62491 | 1 | < 0.1% |
| 40429 | 1 | < 0.1% |
| 20507 | 1 | < 0.1% |
| 24295 | 1 | < 0.1% |
| Other values (77086) | 77086 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 |
| Value | Count | Frequency (%) |
| 77096 | 1 | |
| 77095 | 1 | |
| 77094 | 1 | |
| 77093 | 1 | |
| 77092 | 1 | |
| 77091 | 1 | |
| 77090 | 1 | |
| 77089 | 1 | |
| 77088 | 1 | |
| 77087 | 1 |
shuttle_location
Categorical
| Distinct | 30 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 602.4 KiB |
| Malta | |
|---|---|
| Barbados | |
| Rwanda | |
| Bouvet Island (Bouvetoya) | |
| United Kingdom | |
| Other values (25) |
Length
| Max length | 25 |
|---|---|
| Median length | 18 |
| Mean length | 10.900215 |
| Min length | 4 |
Characters and Unicode
| Total characters | 840363 |
|---|---|
| Distinct characters | 44 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Niue |
|---|---|
| 2nd row | Anguilla |
| 3rd row | Russian Federation |
| 4th row | Barbados |
| 5th row | Sao Tome and Principe |
Common Values
| Value | Count | Frequency (%) |
| Malta | 8487 | |
| Barbados | 8328 | |
| Rwanda | 7513 | |
| Bouvet Island (Bouvetoya) | 5907 | 7.7% |
| United Kingdom | 5551 | 7.2% |
| Micronesia | 5466 | 7.1% |
| Nicaragua | 5461 | 7.1% |
| Russian Federation | 4759 | 6.2% |
| Niue | 4299 | 5.6% |
| Sao Tome and Principe | 3916 | 5.1% |
| Other values (20) | 17409 |
Words
| Value | Count | Frequency (%) |
| malta | 8487 | 7.1% |
| barbados | 8328 | 7.0% |
| rwanda | 7513 | 6.3% |
| bouvet | 5907 | 4.9% |
| island | 5907 | 4.9% |
| bouvetoya | 5907 | 4.9% |
| united | 5551 | 4.6% |
| kingdom | 5551 | 4.6% |
| and | 5476 | 4.6% |
| micronesia | 5466 | 4.6% |
| Other values (34) | 55318 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 126716 | |
| i | 65706 | 7.8% |
| n | 62245 | 7.4% |
| o | 57518 | 6.8% |
| e | 51891 | 6.2% |
| d | 47389 | 5.6% |
| s | 42627 | 5.1% |
| (space) | 42315 | 5.0% |
| t | 38865 | 4.6% |
| r | 32646 | 3.9% |
| Other values (34) | 272445 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 672827 | |
| Uppercase Letter | 113407 | 13.5% |
| Space Separator | 42315 | 5.0% |
| Open Punctuation | 5907 | 0.7% |
| Close Punctuation | 5907 | 0.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 126716 | |
| i | 65706 | |
| n | 62245 | |
| o | 57518 | |
| e | 51891 | |
| d | 47389 | 7.0% |
| s | 42627 | 6.3% |
| t | 38865 | 5.8% |
| r | 32646 | 4.9% |
| u | 30594 | 4.5% |
| Other values (14) | 116630 |
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 21103 | |
| M | 15317 | |
| R | 14337 | |
| N | 9760 | |
| I | 9662 | |
| U | 7371 | 6.5% |
| F | 7277 | 6.4% |
| K | 6697 | 5.9% |
| S | 4778 | 4.2% |
| P | 4370 | 3.9% |
| Other values (7) | 12735 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 42315 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 5907 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 5907 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 786234 | |
| Common | 54129 | 6.4% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 126716 | |
| i | 65706 | 8.4% |
| n | 62245 | 7.9% |
| o | 57518 | 7.3% |
| e | 51891 | 6.6% |
| d | 47389 | 6.0% |
| s | 42627 | 5.4% |
| t | 38865 | 4.9% |
| r | 32646 | 4.2% |
| u | 30594 | 3.9% |
| Other values (31) | 230037 |
Common
| Value | Count | Frequency (%) |
| (space) | 42315 | |
| ( | 5907 | 10.9% |
| ) | 5907 | 10.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 840363 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 126716 | |
| i | 65706 | 7.8% |
| n | 62245 | 7.4% |
| o | 57518 | 6.8% |
| e | 51891 | 6.2% |
| d | 47389 | 5.6% |
| s | 42627 | 5.1% |
| (space) | 42315 | 5.0% |
| t | 38865 | 4.6% |
| r | 32646 | 3.9% |
| Other values (34) | 272445 |
shuttle_type
Categorical
| Distinct | 42 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 602.4 KiB |
| Type V5 | |
|---|---|
| Type F5 | |
| Type V2 | 2932 |
| Type G0 | 2112 |
| Type V7 | 940 |
| Other values (37) | 3329 |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 7 |
| Min length | 7 |
Characters and Unicode
| Total characters | 539672 |
|---|---|
| Distinct characters | 34 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 8 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Type V5 |
|---|---|
| 2nd row | Type V5 |
| 3rd row | Type V5 |
| 4th row | Type V5 |
| 5th row | Type V2 |
Common Values
| Value | Count | Frequency (%) |
| Type V5 | 52147 | |
| Type F5 | 15636 | 20.3% |
| Type V2 | 2932 | 3.8% |
| Type G0 | 2112 | 2.7% |
| Type V7 | 940 | 1.2% |
| Type O3 | 863 | 1.1% |
| Type Z6 | 734 | 1.0% |
| Type E3 | 297 | 0.4% |
| Type X3 | 244 | 0.3% |
| Type F1 | 227 | 0.3% |
| Other values (32) | 964 | 1.3% |
Words
| Value | Count | Frequency (%) |
| type | 77096 | |
| v5 | 52147 | |
| f5 | 15636 | 10.1% |
| v2 | 2932 | 1.9% |
| g0 | 2112 | 1.4% |
| v7 | 940 | 0.6% |
| o3 | 863 | 0.6% |
| z6 | 734 | 0.5% |
| e3 | 297 | 0.2% |
| x3 | 244 | 0.2% |
| Other values (33) | 1191 | 0.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| T | 77100 | |
| e | 77096 | |
| (space) | 77096 | |
| y | 77096 | |
| p | 77096 | |
| 5 | 67918 | |
| V | 56019 | |
| F | 15864 | 2.9% |
| 2 | 2960 | 0.5% |
| 0 | 2329 | 0.4% |
| Other values (24) | 9098 | 1.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 231288 | |
| Uppercase Letter | 154192 | |
| Space Separator | 77096 | 14.3% |
| Decimal Number | 77096 | 14.3% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| T | 77100 | |
| V | 56019 | |
| F | 15864 | 10.3% |
| G | 2112 | 1.4% |
| O | 875 | 0.6% |
| Z | 775 | 0.5% |
| E | 297 | 0.2% |
| X | 244 | 0.2% |
| N | 195 | 0.1% |
| A | 146 | 0.1% |
| Other values (12) | 565 | 0.4% |
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 67918 | |
| 2 | 2960 | 3.8% |
| 0 | 2329 | 3.0% |
| 3 | 1404 | 1.8% |
| 7 | 1268 | 1.6% |
| 6 | 744 | 1.0% |
| 1 | 393 | 0.5% |
| 4 | 80 | 0.1% |
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 77096 | |
| y | 77096 | |
| p | 77096 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 77096 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 385480 | |
| Common | 154192 | 28.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| T | 77100 | |
| e | 77096 | |
| y | 77096 | |
| p | 77096 | |
| V | 56019 | |
| F | 15864 | 4.1% |
| G | 2112 | 0.5% |
| O | 875 | 0.2% |
| Z | 775 | 0.2% |
| E | 297 | 0.1% |
| Other values (15) | 1150 | 0.3% |
Common
| Value | Count | Frequency (%) |
| (space) | 77096 | |
| 5 | 67918 | |
| 2 | 2960 | 1.9% |
| 0 | 2329 | 1.5% |
| 3 | 1404 | 0.9% |
| 7 | 1268 | 0.8% |
| 6 | 744 | 0.5% |
| 1 | 393 | 0.3% |
| 4 | 80 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 539672 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| T | 77100 | |
| e | 77096 | |
| (space) | 77096 | |
| y | 77096 | |
| p | 77096 | |
| 5 | 67918 | |
| V | 56019 | |
| F | 15864 | 2.9% |
| 2 | 2960 | 0.5% |
| 0 | 2329 | 0.4% |
| Other values (24) | 9098 | 1.7% |
engine_type
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 602.4 KiB |
| Plasma | |
|---|---|
| Quantum | |
| Nuclear | 744 |
Length
| Max length | 7 |
|---|---|
| Median length | 6 |
| Mean length | 6.4453928 |
| Min length | 6 |
Characters and Unicode
| Total characters | 496914 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Quantum |
|---|---|
| 2nd row | Quantum |
| 3rd row | Quantum |
| 4th row | Plasma |
| 5th row | Plasma |
Common Values
| Value | Count | Frequency (%) |
| Plasma | 42758 | |
| Quantum | 33594 | |
| Nuclear | 744 | 1.0% |
Words
| Value | Count | Frequency (%) |
| plasma | 42758 | |
| quantum | 33594 | |
| nuclear | 744 | 1.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 119854 | |
| m | 76352 | |
| u | 67932 | |
| l | 43502 | 8.8% |
| P | 42758 | 8.6% |
| s | 42758 | 8.6% |
| Q | 33594 | 6.8% |
| n | 33594 | 6.8% |
| t | 33594 | 6.8% |
| N | 744 | 0.1% |
| Other values (3) | 2232 | 0.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 419818 | |
| Uppercase Letter | 77096 | 15.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 119854 | |
| m | 76352 | |
| u | 67932 | |
| l | 43502 | 10.4% |
| s | 42758 | 10.2% |
| n | 33594 | 8.0% |
| t | 33594 | 8.0% |
| c | 744 | 0.2% |
| e | 744 | 0.2% |
| r | 744 | 0.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 42758 | |
| Q | 33594 | |
| N | 744 | 1.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 496914 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 119854 | |
| m | 76352 | |
| u | 67932 | |
| l | 43502 | 8.8% |
| P | 42758 | 8.6% |
| s | 42758 | 8.6% |
| Q | 33594 | 6.8% |
| n | 33594 | 6.8% |
| t | 33594 | 6.8% |
| N | 744 | 0.1% |
| Other values (3) | 2232 | 0.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 496914 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 119854 | |
| m | 76352 | |
| u | 67932 | |
| l | 43502 | 8.8% |
| P | 42758 | 8.6% |
| s | 42758 | 8.6% |
| Q | 33594 | 6.8% |
| n | 33594 | 6.8% |
| t | 33594 | 6.8% |
| N | 744 | 0.1% |
| Other values (3) | 2232 | 0.4% |
engine_vendor
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 602.4 KiB |
| ThetaBase Services | |
|---|---|
| Banks, Wood and Phillips | 474 |
| Warwick Technology Multinational | 214 |
| SIT Technology Unlimited | 74 |
| MCW Global | 59 |
Length
| Max length | 32 |
|---|---|
| Median length | 18 |
| Mean length | 18.075387 |
| Min length | 10 |
Characters and Unicode
| Total characters | 1393540 |
|---|---|
| Distinct characters | 33 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | ThetaBase Services |
|---|---|
| 2nd row | ThetaBase Services |
| 3rd row | ThetaBase Services |
| 4th row | ThetaBase Services |
| 5th row | ThetaBase Services |
Common Values
| Value | Count | Frequency (%) |
| ThetaBase Services | 76275 | |
| Banks, Wood and Phillips | 474 | 0.6% |
| Warwick Technology Multinational | 214 | 0.3% |
| SIT Technology Unlimited | 74 | 0.1% |
| MCW Global | 59 | 0.1% |
Words
| Value | Count | Frequency (%) |
| thetabase | 76275 | |
| services | 76275 | |
| banks | 474 | 0.3% |
| wood | 474 | 0.3% |
| and | 474 | 0.3% |
| phillips | 474 | 0.3% |
| technology | 288 | 0.2% |
| warwick | 214 | 0.1% |
| multinational | 214 | 0.1% |
| sit | 74 | < 0.1% |
| Other values (3) | 192 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 305462 | |
| a | 154199 | |
| s | 153498 | |
| (space) | 78332 | 5.6% |
| i | 78013 | 5.6% |
| h | 77037 | 5.5% |
| t | 76777 | 5.5% |
| c | 76777 | 5.5% |
| B | 76749 | 5.5% |
| T | 76637 | 5.5% |
| Other values (23) | 240059 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1083239 | |
| Uppercase Letter | 231495 | 16.6% |
| Space Separator | 78332 | 5.6% |
| Other Punctuation | 474 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 305462 | |
| a | 154199 | |
| s | 153498 | |
| i | 78013 | 7.2% |
| h | 77037 | 7.1% |
| t | 76777 | 7.1% |
| c | 76777 | 7.1% |
| r | 76489 | 7.1% |
| v | 76275 | 7.0% |
| l | 1856 | 0.2% |
| Other values (11) | 6856 | 0.6% |
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 76749 | |
| T | 76637 | |
| S | 76349 | |
| W | 747 | 0.3% |
| P | 474 | 0.2% |
| M | 273 | 0.1% |
| I | 74 | < 0.1% |
| U | 74 | < 0.1% |
| C | 59 | < 0.1% |
| G | 59 | < 0.1% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 78332 |
Other Punctuation
| Value | Count | Frequency (%) |
| , | 474 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1314734 | |
| Common | 78806 | 5.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 305462 | |
| a | 154199 | |
| s | 153498 | |
| i | 78013 | 5.9% |
| h | 77037 | 5.9% |
| t | 76777 | 5.8% |
| c | 76777 | 5.8% |
| B | 76749 | 5.8% |
| T | 76637 | 5.8% |
| r | 76489 | 5.8% |
| Other values (21) | 163096 |
Common
| Value | Count | Frequency (%) |
| (space) | 78332 | |
| , | 474 | 0.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1393540 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 305462 | |
| a | 154199 | |
| s | 153498 | |
| (space) | 78332 | 5.6% |
| i | 78013 | 5.6% |
| h | 77037 | 5.5% |
| t | 76777 | 5.5% |
| c | 76777 | 5.5% |
| B | 76749 | 5.5% |
| T | 76637 | 5.5% |
| Other values (23) | 240059 |
engines
Real number (ℝ)
| Distinct | 15 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 39 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.4039607 |
| Minimum | 0 |
|---|---|
| Maximum | 44 |
| Zeros | 4448 |
| Zeros (%) | 5.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 602.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 44 |
| Range | 44 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.90503385 |
|---|---|
| Coefficient of variation (CV) | 0.64462904 |
| Kurtosis | 68.625371 |
| Mean | 1.4039607 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.0697699 |
| Sum | 108185 |
| Variance | 0.81908626 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 48742 | |
| 2 | 16001 | 20.8% |
| 3 | 5113 | 6.6% |
| 0 | 4448 | 5.8% |
| 4 | 1956 | 2.5% |
| 5 | 619 | 0.8% |
| 6 | 135 | 0.2% |
| 7 | 28 | < 0.1% |
| 8 | 6 | < 0.1% |
| 10 | 2 | < 0.1% |
| Other values (5) | 7 | < 0.1% |
| (Missing) | 39 | 0.1% |
| Value | Count | Frequency (%) |
| 0 | 4448 | 5.8% |
| 1 | 48742 | |
| 2 | 16001 | 20.8% |
| 3 | 5113 | 6.6% |
| 4 | 1956 | 2.5% |
| 5 | 619 | 0.8% |
| 6 | 135 | 0.2% |
| 7 | 28 | < 0.1% |
| 8 | 6 | < 0.1% |
| 9 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 44 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| 11 | 2 | < 0.1% |
| 10 | 2 | < 0.1% |
| 9 | 2 | < 0.1% |
| 8 | 6 | < 0.1% |
| 7 | 28 | < 0.1% |
| 6 | 135 | 0.2% |
| 5 | 619 |
passenger_capacity
Real number (ℝ)
| Distinct | 17 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.1550664 |
| Minimum | 1 |
|---|---|
| Maximum | 20 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 602.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 2 |
| Q3 | 4 |
| 95-th percentile | 7 |
| Maximum | 20 |
| Range | 19 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.9725307 |
|---|---|
| Coefficient of variation (CV) | 0.62519466 |
| Kurtosis | 4.2753533 |
| Mean | 3.1550664 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.6690889 |
| Sum | 243243 |
| Variance | 3.8908772 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 32858 | |
| 4 | 15373 | |
| 1 | 9710 | 12.6% |
| 6 | 5961 | 7.7% |
| 3 | 5322 | 6.9% |
| 5 | 3567 | 4.6% |
| 8 | 1771 | 2.3% |
| 7 | 1241 | 1.6% |
| 10 | 486 | 0.6% |
| 9 | 363 | 0.5% |
| Other values (7) | 444 | 0.6% |
| Value | Count | Frequency (%) |
| 1 | 9710 | 12.6% |
| 2 | 32858 | |
| 3 | 5322 | 6.9% |
| 4 | 15373 | |
| 5 | 3567 | 4.6% |
| 6 | 5961 | 7.7% |
| 7 | 1241 | 1.6% |
| 8 | 1771 | 2.3% |
| 9 | 363 | 0.5% |
| 10 | 486 | 0.6% |
| Value | Count | Frequency (%) |
| 20 | 1 | < 0.1% |
| 16 | 76 | 0.1% |
| 15 | 15 | < 0.1% |
| 14 | 63 | 0.1% |
| 13 | 45 | 0.1% |
| 12 | 164 | 0.2% |
| 11 | 80 | 0.1% |
| 10 | 486 | 0.6% |
| 9 | 363 | 0.5% |
| 8 | 1771 |
cancellation_policy
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 602.4 KiB |
| strict | |
|---|---|
| flexible | |
| moderate |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 7.133444 |
| Min length | 6 |
Characters and Unicode
| Total characters | 549960 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | strict |
|---|---|
| 2nd row | strict |
| 3rd row | moderate |
| 4th row | strict |
| 5th row | strict |
Common Values
| Value | Count | Frequency (%) |
| strict | 33404 | |
| flexible | 25235 | |
| moderate | 18457 |
Words
| Value | Count | Frequency (%) |
| strict | 33404 | |
| flexible | 25235 | |
| moderate | 18457 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 87384 | |
| t | 85265 | |
| i | 58639 | |
| r | 51861 | |
| l | 50470 | |
| s | 33404 | 6.1% |
| c | 33404 | 6.1% |
| f | 25235 | 4.6% |
| x | 25235 | 4.6% |
| b | 25235 | 4.6% |
| Other values (4) | 73828 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 549960 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 87384 | |
| t | 85265 | |
| i | 58639 | |
| r | 51861 | |
| l | 50470 | |
| s | 33404 | 6.1% |
| c | 33404 | 6.1% |
| f | 25235 | 4.6% |
| x | 25235 | 4.6% |
| b | 25235 | 4.6% |
| Other values (4) | 73828 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 549960 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 87384 | |
| t | 85265 | |
| i | 58639 | |
| r | 51861 | |
| l | 50470 | |
| s | 33404 | 6.1% |
| c | 33404 | 6.1% |
| f | 25235 | 4.6% |
| x | 25235 | 4.6% |
| b | 25235 | 4.6% |
| Other values (4) | 73828 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 549960 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 87384 | |
| t | 85265 | |
| i | 58639 | |
| r | 51861 | |
| l | 50470 | |
| s | 33404 | 6.1% |
| c | 33404 | 6.1% |
| f | 25235 | 4.6% |
| x | 25235 | 4.6% |
| b | 25235 | 4.6% |
| Other values (4) | 73828 |
crew
Real number (ℝ)
| Distinct | 20 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 149 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.7413025 |
| Minimum | 0 |
|---|---|
| Maximum | 23 |
| Zeros | 1271 |
| Zeros (%) | 1.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 602.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 4 |
| Maximum | 23 |
| Range | 23 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.2399381 |
|---|---|
| Coefficient of variation (CV) | 0.71207512 |
| Kurtosis | 14.264173 |
| Mean | 1.7413025 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.7301788 |
| Sum | 133988 |
| Variance | 1.5374466 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 43188 | |
| 2 | 18631 | |
| 3 | 7598 | 9.9% |
| 4 | 3472 | 4.5% |
| 5 | 1420 | 1.8% |
| 0 | 1271 | 1.6% |
| 6 | 713 | 0.9% |
| 7 | 289 | 0.4% |
| 8 | 184 | 0.2% |
| 9 | 61 | 0.1% |
| Other values (10) | 120 | 0.2% |
| (Missing) | 149 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 1271 | 1.6% |
| 1 | 43188 | |
| 2 | 18631 | |
| 3 | 7598 | 9.9% |
| 4 | 3472 | 4.5% |
| 5 | 1420 | 1.8% |
| 6 | 713 | 0.9% |
| 7 | 289 | 0.4% |
| 8 | 184 | 0.2% |
| 9 | 61 | 0.1% |
| Value | Count | Frequency (%) |
| 23 | 1 | < 0.1% |
| 20 | 1 | < 0.1% |
| 18 | 1 | < 0.1% |
| 16 | 16 | < 0.1% |
| 15 | 3 | < 0.1% |
| 14 | 9 | < 0.1% |
| 13 | 3 | < 0.1% |
| 12 | 20 | < 0.1% |
| 11 | 7 | < 0.1% |
| 10 | 59 |
d_check_complete
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 75.4 KiB |
| False | |
|---|---|
| True |
| Value | Count | Frequency (%) |
| False | 46762 | |
| True | 30334 |
moon_clearance_complete
Boolean
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 75.4 KiB |
| False |
|---|
| Value | Count | Frequency (%) |
| False | 77096 |
price
Categorical
| Distinct | 848 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 602.4 KiB |
| $1,520.0 | 2873 |
|---|---|
| $2,170.0 | 2752 |
| $1,390.0 | 2674 |
| $1,325.0 | 2333 |
| $1,260.0 | 2250 |
| Other values (843) |
Length
| Max length | 10 |
|---|---|
| Median length | 8 |
| Mean length | 8.0059666 |
| Min length | 6 |
Characters and Unicode
| Total characters | 617228 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 3 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 253 |
|---|---|
| Unique (%) | 0.3% |
Sample
| 1st row | $1,325.0 |
|---|---|
| 2nd row | $1,780.0 |
| 3rd row | $1,715.0 |
| 4th row | $4,770.0 |
| 5th row | $2,820.0 |
Common Values
| Value | Count | Frequency (%) |
| $1,520.0 | 2873 | 3.7% |
| $2,170.0 | 2752 | 3.6% |
| $1,390.0 | 2674 | 3.5% |
| $1,325.0 | 2333 | 3.0% |
| $1,260.0 | 2250 | 2.9% |
| $2,430.0 | 2205 | 2.9% |
| $1,455.0 | 2150 | 2.8% |
| $1,650.0 | 2131 | 2.8% |
| $2,820.0 | 2042 | 2.6% |
| $1,910.0 | 2042 | 2.6% |
| Other values (838) | 53644 |
Words
| Value | Count | Frequency (%) |
| 1,520.0 | 2873 | 3.7% |
| 2,170.0 | 2752 | 3.6% |
| 1,390.0 | 2674 | 3.5% |
| 1,325.0 | 2333 | 3.0% |
| 1,260.0 | 2250 | 2.9% |
| 2,430.0 | 2205 | 2.9% |
| 1,455.0 | 2150 | 2.8% |
| 1,650.0 | 2131 | 2.8% |
| 2,820.0 | 2042 | 2.6% |
| 1,910.0 | 2042 | 2.6% |
| Other values (838) | 53644 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 121718 | |
| $ | 77096 | |
| . | 77096 | |
| , | 77039 | |
| 1 | 62820 | |
| 2 | 44530 | 7.2% |
| 5 | 36385 | 5.9% |
| 3 | 25685 | 4.2% |
| 7 | 23871 | 3.9% |
| 4 | 23007 | 3.7% |
| Other values (3) | 47981 | 7.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 385997 | |
| Other Punctuation | 154135 | 25.0% |
| Currency Symbol | 77096 | 12.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 121718 | |
| 1 | 62820 | |
| 2 | 44530 | 11.5% |
| 5 | 36385 | 9.4% |
| 3 | 25685 | 6.7% |
| 7 | 23871 | 6.2% |
| 4 | 23007 | 6.0% |
| 9 | 16649 | 4.3% |
| 6 | 16134 | 4.2% |
| 8 | 15198 | 3.9% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 77096 | |
| , | 77039 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 77096 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 617228 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 121718 | |
| $ | 77096 | |
| . | 77096 | |
| , | 77039 | |
| 1 | 62820 | |
| 2 | 44530 | 7.2% |
| 5 | 36385 | 5.9% |
| 3 | 25685 | 4.2% |
| 7 | 23871 | 3.9% |
| 4 | 23007 | 3.7% |
| Other values (3) | 47981 | 7.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 617228 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 121718 | |
| $ | 77096 | |
| . | 77096 | |
| , | 77039 | |
| 1 | 62820 | |
| 2 | 44530 | 7.2% |
| 5 | 36385 | 5.9% |
| 3 | 25685 | 4.2% |
| 7 | 23871 | 3.9% |
| 4 | 23007 | 3.7% |
| Other values (3) | 47981 | 7.8% |
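The price column is profiled as Categorical, with a high-cardinality alert, because its values are strings such as `$1,520.0`. A minimal sketch of one way to turn them into numbers before re-profiling, assuming every value follows the dollar-sign, thousands-comma, decimal-point pattern seen in the Sample rows; the helper name `parse_price` is hypothetical.

```python
def parse_price(text: str) -> float:
    """Convert a price string such as '$1,520.0' into the float 1520.0."""
    # Drop the currency symbol and thousands separators, then parse.
    return float(text.replace("$", "").replace(",", ""))


# The five Sample rows shown for the price column.
sample = ["$1,325.0", "$1,780.0", "$1,715.0", "$4,770.0", "$2,820.0"]
prices = [parse_price(p) for p in sample]
print(prices)
```

Values without a thousands separator are handled by the same code, since removing a comma that is not present leaves the string unchanged.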
company_id
Real number (ℝ)
| Distinct | 50098 |
|---|---|
| Distinct (%) | 65.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25155.386 |
| Minimum | 1 |
|---|---|
| Maximum | 50098 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 602.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2704.75 |
| Q1 | 12935.75 |
| median | 25253.5 |
| Q3 | 37410.25 |
| 95-th percentile | 47520.25 |
| Maximum | 50098 |
| Range | 50097 |
| Interquartile range (IQR) | 24474.5 |
Descriptive statistics
| Standard deviation | 14300.991 |
|---|---|
| Coefficient of variation (CV) | 0.5685061 |
| Kurtosis | -1.1684667 |
| Mean | 25155.386 |
| Median Absolute Deviation (MAD) | 12241.5 |
| Skewness | -0.01137149 |
| Sum | 1.9393797 × 10⁹ |
| Variance | 2.0451833 × 10⁸ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 29647 | 1086 | 1.4% |
| 45111 | 297 | 0.4% |
| 28828 | 184 | 0.2% |
| 32203 | 176 | 0.2% |
| 20334 | 167 | 0.2% |
| 18077 | 125 | 0.2% |
| 4745 | 114 | 0.1% |
| 10711 | 108 | 0.1% |
| 22721 | 106 | 0.1% |
| 19019 | 102 | 0.1% |
| Other values (50088) | 74631 |
| Value | Count | Frequency (%) |
| 1 | 2 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 2 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 2 | |
| 10 | 1 |
| Value | Count | Frequency (%) |
| 50098 | 2 | |
| 50097 | 1 | |
| 50096 | 1 | |
| 50095 | 1 | |
| 50094 | 1 | |
| 50093 | 1 | |
| 50092 | 1 | |
| 50091 | 1 | |
| 50090 | 1 | |
| 50089 | 2 |
Auto
The auto setting is an interpretable pairwise column metric with the following mapping:
- Variable_type-Variable_type : Method, Range
- Categorical-Categorical : Cramér's V, [0,1]
- Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
- Numerical-Numerical : Spearman's ρ, [-1,1]
This configuration uses the recommended metric for each pair of columns.
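The mapping above amounts to a lookup on the pair of variable types. The following is an illustrative sketch only: the function name `auto_metric` and the string labels are assumptions for this example, not the profiler's internal API.

```python
from typing import Tuple


def auto_metric(type_a: str, type_b: str) -> Tuple[str, Tuple[int, int]]:
    """Return (method, value range) for a pair of variable types.

    Each type must be "Numerical" or "Categorical".
    """
    kinds = {type_a, type_b}
    if not kinds <= {"Numerical", "Categorical"}:
        raise ValueError(f"unsupported variable types: {type_a}, {type_b}")
    if kinds == {"Numerical"}:
        return ("Spearman's rho", (-1, 1))
    # Any pair involving a categorical column uses Cramér's V; a numerical
    # partner would first be discretized into bins.
    return ("Cramér's V", (0, 1))


print(auto_metric("Numerical", "Numerical"))
print(auto_metric("Numerical", "Categorical"))
print(auto_metric("Categorical", "Categorical"))
```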
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
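The calculation described above can be sketched in plain Python: rank both variables (averaging ranks over ties), then apply the covariance-over-standard-deviations formula to the ranks. This is a minimal illustration with hypothetical helper names, not the code the report itself ran.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic but nonlinear in x
rho = spearman(x, y)     # equals 1 up to floating-point rounding
r = pearson(x, y)        # strictly below 1 because the relation is not linear
print(rho, r)
```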
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
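A small sketch of that formula, together with a check of the location and scale invariance mentioned above. The data are made up for illustration, and the function assumes neither variable is constant.

```python
import math


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson(x, y)

# Shifting and rescaling y (with a positive factor) leaves r unchanged.
y_rescaled = [10 * b + 7 for b in y]
r_rescaled = pearson(x, y_rescaled)
print(r, r_rescaled)
```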
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
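The pair-counting definition above corresponds to what is usually called tau-a. The sketch below implements exactly that definition on made-up data; statistical libraries commonly apply an additional tie correction (tau-b), so their results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair oppositely
        # sign == 0 means a tie in x or y; it counts toward neither total
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)  # 8 concordant and 2 discordant pairs out of 10
print(tau)
```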
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.
First rows
| | id | shuttle_location | shuttle_type | engine_type | engine_vendor | engines | passenger_capacity | cancellation_policy | crew | d_check_complete | moon_clearance_complete | price | company_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63561 | Niue | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | f | f | $1,325.0 | 35029 |
| 1 | 36260 | Anguilla | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | t | f | $1,780.0 | 30292 |
| 2 | 57015 | Russian Federation | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | moderate | 0.0 | f | f | $1,715.0 | 19032 |
| 3 | 14035 | Barbados | Type V5 | Plasma | ThetaBase Services | 3.0 | 6 | strict | 3.0 | f | f | $4,770.0 | 8238 |
| 4 | 10036 | Sao Tome and Principe | Type V2 | Plasma | ThetaBase Services | 2.0 | 4 | strict | 2.0 | f | f | $2,820.0 | 30342 |
| 5 | 45163 | Sao Tome and Principe | Type V5 | Plasma | ThetaBase Services | 2.0 | 4 | moderate | 2.0 | f | f | $1,715.0 | 32413 |
| 6 | 64643 | Faroe Islands | Type F5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | t | f | $1,247.0 | 35620 |
| 7 | 23389 | Micronesia | Type V5 | Quantum | ThetaBase Services | 1.0 | 1 | moderate | 1.0 | f | f | $1,845.0 | 23820 |
| 8 | 39934 | Rwanda | Type V5 | Quantum | ThetaBase Services | 1.0 | 3 | strict | 2.0 | f | f | $1,520.0 | 46528 |
| 9 | 57063 | Faroe Islands | Type F5 | Plasma | ThetaBase Services | 4.0 | 8 | strict | 5.0 | f | f | $3,275.0 | 11875 |
Last rows
| | id | shuttle_location | shuttle_type | engine_type | engine_vendor | engines | passenger_capacity | cancellation_policy | crew | d_check_complete | moon_clearance_complete | price | company_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 77086 | 55187 | Malta | Type V5 | Plasma | ThetaBase Services | 0.0 | 4 | flexible | 2.0 | t | f | $1,520.0 | 15249 |
| 77087 | 46301 | United Kingdom | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | strict | 1.0 | t | f | $1,455.0 | 44431 |
| 77088 | 54977 | Nicaragua | Type V5 | Quantum | ThetaBase Services | 1.0 | 1 | flexible | 1.0 | t | f | $1,364.0 | 25724 |
| 77089 | 51748 | Uzbekistan | Type V5 | Quantum | ThetaBase Services | 1.0 | 2 | flexible | 1.0 | t | f | $1,325.0 | 32743 |
| 77090 | 44668 | Rwanda | Type F5 | Quantum | ThetaBase Services | 1.0 | 1 | strict | 1.0 | t | f | $1,260.0 | 19010 |
| 77091 | 4368 | Barbados | Type V5 | Quantum | ThetaBase Services | 2.0 | 4 | flexible | 2.0 | t | f | $4,107.0 | 6654 |
| 77092 | 2983 | Bouvet Island (Bouvetoya) | Type F5 | Quantum | ThetaBase Services | 1.0 | 1 | flexible | 1.0 | t | f | $1,169.0 | 8000 |
| 77093 | 69684 | Micronesia | Type V5 | Plasma | ThetaBase Services | 0.0 | 2 | flexible | 1.0 | t | f | $1,910.0 | 14296 |
| 77094 | 21738 | Uzbekistan | Type V5 | Plasma | ThetaBase Services | 1.0 | 2 | flexible | 1.0 | t | f | $2,170.0 | 27363 |
| 77095 | 72645 | Malta | Type F5 | Quantum | ThetaBase Services | 0.0 | 2 | moderate | 2.0 | t | f | $1,455.0 | 12542 |