Numerical summaries

Numerical summaries are statistical measures that describe the level (central tendency) and spread (dispersion) of a single variable (univariate data). Rather than relying on graphs, they condense the dataset into a few numbers that reveal its pattern and distribution.


1. Level – Measures of Central Tendency

These describe the typical or central value of the dataset.

  • Mean (Average):

    $$\text{Mean} = \frac{\text{Sum of values}}{\text{Number of values}}$$

    Example: For [10, 20, 30, 40, 50], mean = 30.

  • Median: Middle value when the data is ordered.
    For odd n, it is the middle value; for even n, it is the average of the two middle values.
    Example: For [10, 20, 30, 40, 50], median = 30.

  • Mode: Most frequently occurring value.
    Example: In [2, 2, 3, 4, 4, 4, 5], mode = 4.

👉 These indicate the level (center) of the dataset.
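
The three measures above can be checked with a short Python sketch. It assumes only the standard-library statistics module; the variable names are illustrative and not part of the original notes.

```python
import statistics

# Datasets taken from the examples above.
values = [10, 20, 30, 40, 50]
repeated = [2, 2, 3, 4, 4, 4, 5]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # middle value of the ordered data (odd n)
mode_value = statistics.mode(repeated)    # most frequently occurring value

# With an even number of values, the median is the average of the two middle values.
even_median = statistics.median([10, 20, 30, 40])

print(mean_value, median_value, mode_value, even_median)
```

For these inputs the mean and median are both 30, the mode is 4, and the even-length median is 25.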




2. Spread – Measures of Dispersion

These describe how much the data varies around the center.

  • Range:

    $$\text{Range} = \text{Max} - \text{Min}$$

    Example: 50 – 10 = 40.

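  • Variance: Average of squared deviations from the mean. The population variance divides the sum of squared deviations by n; the sample variance divides by n − 1.

  • Standard Deviation (σ): Square root of the variance; shows the typical distance from the mean.
    Example: For [10, 20, 30, 40, 50], the sample standard deviation (dividing by n − 1) is ≈ 15.81; the population standard deviation (dividing by n) is ≈ 14.14.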

  • Interquartile Range (IQR):

    $$\text{IQR} = Q3 - Q1$$

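    Example: For [10, 20, 30, 40, 50], Q1 = 20 and Q3 = 40 → IQR = 20. (Quartile conventions differ slightly; these values use linear interpolation between data points.)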

👉 These indicate the spread (variability) of the dataset.
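
As a rough check of the dispersion measures above, the following sketch uses Python's standard-library statistics module on the same example data. The "inclusive" quartile method is an assumption chosen because it reproduces Q1 = 20 and Q3 = 40; other quartile conventions give slightly different values.

```python
import statistics

values = [10, 20, 30, 40, 50]

# Range: largest value minus smallest value.
data_range = max(values) - min(values)  # 40

# Population versions divide by n; sample versions divide by n - 1.
population_variance = statistics.pvariance(values)  # 200
sample_variance = statistics.variance(values)       # 250
population_std = statistics.pstdev(values)          # about 14.14
sample_std = statistics.stdev(values)               # about 15.81

# Quartiles with linear interpolation that includes the endpoints.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, population_variance, sample_variance)
print(round(population_std, 2), round(sample_std, 2), iqr)
```

The value 15.81 corresponds to the sample standard deviation, while the "average of squared deviations" definition corresponds to the population version.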






3. Percentiles and Quartiles

  • Percentiles: Divide the ordered data into 100 equal parts (e.g., the 90th percentile is the value below which 90% of the data lies).

  • Quartiles: Divide data into 4 equal parts.

    • Q1 = 25% point

    • Q2 = Median (50%)

    • Q3 = 75% point
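
To make the link between percentiles and quartiles concrete, here is a minimal sketch of a percentile function. The helper name percentile is hypothetical, and the linear-interpolation rule is one common convention among several, so results can differ slightly from other tools.

```python
def percentile(data, p):
    """Return the p-th percentile (0 <= p <= 100) using linear interpolation."""
    ordered = sorted(data)
    if not ordered:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    # Fractional position of the percentile within the ordered data.
    rank = (p / 100) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    # Interpolate between the two neighbouring data points.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


values = [10, 20, 30, 40, 50]

q1 = percentile(values, 25)   # 25% point
q2 = percentile(values, 50)   # median
q3 = percentile(values, 75)   # 75% point
p90 = percentile(values, 90)  # roughly 46 for this data

print(q1, q2, q3, p90)
```

Here the quartiles are simply the 25th, 50th, and 75th percentiles, and Q2 matches the median of 30.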



4. Descriptive Statistics with Pandas
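
A minimal sketch of how these summaries might be obtained with pandas, assuming pandas is installed and using the example data from the earlier sections. The Series name "scores" is illustrative only.

```python
import pandas as pd

# Example data from the sections above, stored as a pandas Series.
scores = pd.Series([10, 20, 30, 40, 50], name="scores")

# describe() reports count, mean, std, min, the quartiles, and max in one call.
summary = scores.describe()
print(summary)

# Individual measures are also available as methods.
median_value = scores.median()
data_range = scores.max() - scores.min()
sample_variance = scores.var()        # divides by n - 1 by default
population_std = scores.std(ddof=0)   # divides by n
iqr = scores.quantile(0.75) - scores.quantile(0.25)

print(median_value, data_range, sample_variance, population_std, iqr)
```

Note that pandas computes std and var with n − 1 in the denominator by default, so describe() reports the sample standard deviation (about 15.81 here); passing ddof=0 gives the population version.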






Advantages in Univariate Analysis

  1. Summarizes large data into meaningful numbers.

  2. Shows both center (level) and variability (spread).

  3. Helps in comparing datasets.

  4. Forms the basis for advanced statistical and machine learning methods.