Data Statistic
This component computes descriptive statistics on the data, such as the
mean, maximum, minimum, and median.
The statistics that can be computed for each column are listed below.
- count: Number of values in this column
- sum: The sum of this column
- mean: The mean of this column
- variance/stddev: Variance and standard deviation of this column
- median: Median of this column
- min/max: Min and Max value of this column
- coefficient of variation: Computed as abs(stddev / mean)
- missing_count/missing_ratio: Number and ratio of missing values in
this column
- skewness: Skewness of this column; the definition can be found here
- kurtosis: Kurtosis of this column; the definition can be found here
- percentile: The value at a given percentile. Accepts values from 0% to
100%; the number before "%" must be an integer.
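The sketch below shows, in plain Python, one way the statistics listed above might be computed for a single column. It is not the component's implementation, and several choices are assumptions: missing values are represented as None or NaN, count excludes missing values, variance is the population variance (divided by n), skewness and kurtosis use the moment definitions m3 / m2^1.5 and m4 / m2^2 - 3 (excess kurtosis), and percentiles use linear interpolation between the closest ranks. The function names and the result key coefficient_of_variation are hypothetical.

```python
import math
import statistics
from typing import Dict, Iterable, List, Optional


def parse_percentile(spec: str) -> int:
    """Parse a spec such as "95%" into an integer between 0 and 100."""
    if not spec.endswith("%"):
        raise ValueError(f"percentile must end with '%': {spec!r}")
    number = spec[:-1]
    # The part before "%" must be a non-negative integer; "12.5%" is rejected.
    if not number.isdigit():
        raise ValueError(f"percentile must be an integer: {spec!r}")
    value = int(number)
    if value > 100:
        raise ValueError(f"percentile must be between 0% and 100%: {spec!r}")
    return value


def interpolated_percentile(sorted_vals: List[float], p: int) -> float:
    """Linear interpolation between the closest ranks (an assumed method)."""
    pos = (len(sorted_vals) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def column_statistics(column: List[Optional[float]],
                      percentiles: Iterable[str] = ("50%",)) -> Dict[str, float]:
    """Hypothetical helper computing the listed statistics for one column."""
    total = len(column)
    # Assumption: None and NaN both count as missing values.
    values = [v for v in column if v is not None and not math.isnan(v)]
    n = len(values)
    missing = total - n
    stats: Dict[str, float] = {
        "count": n,  # assumed to count non-missing values only
        "missing_count": missing,
        "missing_ratio": missing / total if total else 0.0,
    }
    if n == 0:
        return stats  # the remaining statistics are undefined for an all-missing column

    mean = sum(values) / n
    # Central moments divided by n (population definitions).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    stddev = math.sqrt(m2)

    stats.update({
        "sum": sum(values),
        "mean": mean,
        "variance": m2,
        "stddev": stddev,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        # abs(stddev / mean); reported as NaN when the mean is zero.
        "coefficient_of_variation": abs(stddev / mean) if mean != 0 else math.nan,
        # Skewness and excess kurtosis are undefined for a constant column.
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else math.nan,
        "kurtosis": m4 / m2 ** 2 - 3 if m2 > 0 else math.nan,
    })

    sorted_vals = sorted(values)
    for spec in percentiles:
        stats[spec] = interpolated_percentile(sorted_vals, parse_percentile(spec))
    return stats
```

For example, column_statistics([1.0, 2.0, None, 3.0, 4.0], ("25%",)) reports a count of 4, a missing_count of 1, a mean of 2.5, and a 25% percentile of 1.75. The moments are computed around the mean in a second pass rather than with the one-pass sum-of-squares formula, which avoids catastrophic cancellation on large values.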
These statistics can be used as criteria in feature selection.
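As an illustration of using these values as selection criteria, the sketch below keeps a column only if its missing ratio is at most one threshold and its coefficient of variation is at least another, so that sparse or nearly constant columns are dropped. The function name select_features, the threshold values, and the per-column dictionary layout are hypothetical; this is not the component's feature-selection interface.

```python
import math
from typing import Dict, List


def select_features(col_stats: Dict[str, Dict[str, float]],
                    max_missing_ratio: float = 0.5,
                    min_cv: float = 0.01) -> List[str]:
    """Return the names of columns that pass both (hypothetical) thresholds."""
    kept = []
    for name, stats in col_stats.items():
        if stats["missing_ratio"] > max_missing_ratio:
            continue  # too many missing values
        cv = stats["coefficient_of_variation"]
        if math.isnan(cv) or cv < min_cv:
            continue  # undefined or nearly constant: little information
        kept.append(name)
    return kept


example = {
    "age": {"missing_ratio": 0.1, "coefficient_of_variation": 0.4},
    "constant_flag": {"missing_ratio": 0.0, "coefficient_of_variation": 0.0},
    "income": {"missing_ratio": 0.7, "coefficient_of_variation": 1.2},
}
print(select_features(example))  # ['age']
```

Columns are returned in their original order, and a column whose coefficient of variation is NaN (for instance, one with a zero mean) is dropped rather than silently kept.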