1. Resistant Lines in Bivariate Analysis

Definition:
A resistant line is a line that summarizes the relationship between two quantitative variables (bivariate data) and is not heavily influenced by outliers. It provides a robust fit even when some data points deviate greatly.

Use:
Used to describe the overall trend in a scatter plot when the data contain outliers or depart from a clean linear pattern.

Example:
If we plot height vs. weight and a few data points are extreme (very tall but light), the resistant line follows the bulk of the points more closely than the least squares regression line, which is pulled toward the extreme points.
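
The notes do not say how a resistant line is fitted. One common construction is Tukey's three-group median line, sketched below under that assumption. The function name resistant_line and the data are made up for illustration.

```python
from statistics import median

def resistant_line(xs, ys):
    """Fit y = intercept + slope * x from medians of the left, middle, and right thirds."""
    pts = sorted(zip(xs, ys))                 # order the points by x
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three points")
    k = n // 3
    # If n is not divisible by 3, the extra points go to the middle group (a simple choice).
    left, middle, right = pts[:k], pts[k:n - k], pts[n - k:]

    def summary(group):
        # Median x and median y of one group
        return median(p[0] for p in group), median(p[1] for p in group)

    (xl, yl), (xm, ym), (xr, yr) = summary(left), summary(middle), summary(right)
    slope = (yr - yl) / (xr - xl)             # uses only the outer groups
    # Average the three intercepts implied by that slope
    intercept = ((yl - slope * xl) + (ym - slope * xm) + (yr - slope * xr)) / 3
    return intercept, slope

# Made-up data: y = 2x everywhere except one extreme point at x = 9
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2, 4, 6, 8, 10, 12, 14, 16, 100]
intercept, slope = resistant_line(xs, ys)
print(intercept, slope)
```

Because medians ignore the single extreme value, the fitted slope stays at 2 here; a least squares fit on the same data would be pulled upward by the point at x = 9.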


2. Explanatory Variable and Response Variable

Explanatory Variable (Independent Variable):
The variable that explains or influences changes in another variable.
Response Variable (Dependent Variable):
The variable that responds to changes in the explanatory variable.

Example:
In studying the effect of study hours on marks:

  • Explanatory variable → Study hours
  • Response variable → Marks scored

3. Marginal Proportion in Contingency Table

Definition:
A marginal proportion is the proportion of observations in each category of one variable, obtained by dividing the marginal total by the overall total.

Example:
If 30 out of 100 students are male, the marginal proportion for “male” = 30/100 = 0.3.


4. Uses of Contingency Table

Definition:
A contingency table is used to display the frequency distribution of two or more categorical variables.

Uses:

  1. To study relationships or associations between categorical variables.
  2. To compute conditional and marginal probabilities.
  3. To perform Chi-square tests for independence.
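
As a rough illustration of use 3, the Chi-square statistic can be computed by hand from observed counts. The sketch below uses the pass/fail by gender counts from section 15 and applies no continuity correction; the variable names are illustrative.

```python
# Observed counts from the pass/fail by gender example in section 15
observed = [[20, 25],    # passed: male, female
            [10, 5]]     # failed: male, female

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were independent
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 3), dof)
```

With 1 degree of freedom the 5% critical value is 3.841, so a statistic of about 2.22 would not reject independence for these counts.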

5. Causal Explanation

Definition:
Causal explanation identifies how one variable directly influences another variable, establishing a cause-and-effect relationship.

Example:
Increased exercise → causes → decrease in body weight.


6. Multivariate Analysis

Definition:
A statistical method that analyzes more than two variables simultaneously to understand relationships among them.

Example:
Studying how income, education, and age together affect expenditure.


7. Grouping Time Series Data

Definition:
Grouping time series data means arranging observations collected over time into intervals (e.g., yearly, monthly) for better analysis.

Example:
Daily sales data grouped into monthly totals.
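
A minimal sketch of this grouping, using only the Python standard library. The sales figures are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales: (date, amount)
daily_sales = [
    (date(2024, 1, 30), 120),
    (date(2024, 1, 31), 80),
    (date(2024, 2, 1), 100),
    (date(2024, 2, 2), 50),
    (date(2024, 3, 1), 70),
]

monthly_totals = defaultdict(int)
for day, amount in daily_sales:
    # The group key is the (year, month) interval the observation falls in
    monthly_totals[(day.year, day.month)] += amount

for (year, month), total in sorted(monthly_totals.items()):
    print(f"{year}-{month:02d}: {total}")
```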


8. Two Common Methods for Resampling Time Series Data

  1. Upsampling: Increasing the frequency of data (e.g., daily → hourly).
  2. Downsampling: Decreasing the frequency of data (e.g., daily → monthly).
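
Both directions can be sketched in plain Python. Upsampling creates time points that were never observed, so a fill rule is needed; forward fill is assumed here, and interpolation is another option. Downsampling needs an aggregation rule; the mean is assumed here. The function names and data are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical daily readings
daily = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 2), 14.0),
    (datetime(2024, 1, 3), 12.0),
    (datetime(2024, 1, 4), 16.0),
]

def upsample_ffill(series, step):
    """Insert a point every `step`, carrying the last observed value forward."""
    out = []
    for (t, value), (t_next, _) in zip(series, series[1:]):
        while t < t_next:
            out.append((t, value))
            t += step
    out.append(series[-1])                    # keep the final observation
    return out

def downsample_mean(series, size):
    """Average consecutive blocks of `size` observations, labelled by block start."""
    out = []
    for i in range(0, len(series), size):
        block = series[i:i + size]
        out.append((block[0][0], sum(v for _, v in block) / len(block)))
    return out

half_daily = upsample_ffill(daily, timedelta(hours=12))   # daily -> every 12 hours
two_day = downsample_mean(daily, 2)                       # daily -> every 2 days
print(len(half_daily), two_day)
```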

9. Advantages of Resampling in Time Series Analysis

  1. Simplifies large datasets for easier visualization.
  2. Helps in identifying trends or seasonal patterns.
  3. Useful for aligning data with other time-based variables.

10. Common Time-Based Indexing Operations

  1. Shifting (lead/lag data).
  2. Resampling (upsampling/downsampling).
  3. Rolling or moving averages.
  4. Slicing data by date/time (e.g., selecting one year or month).
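
Shifting, rolling averages, and date slicing can be sketched on a small series as follows. The helper names shift and rolling_mean and the data are made up; shift handles only non-negative lags.

```python
from datetime import date

# Hypothetical daily values
dates = [date(2023, 12, 30), date(2023, 12, 31), date(2024, 1, 1),
         date(2024, 1, 2), date(2024, 1, 3)]
values = [5, 7, 6, 9, 12]

def shift(seq, periods=1):
    """Lag a series: each position holds the value from `periods` steps earlier."""
    return [None] * periods + seq[:len(seq) - periods]

def rolling_mean(seq, window):
    """Mean of the current and previous window - 1 values; None until the window fills."""
    return [None if i + 1 < window else sum(seq[i + 1 - window:i + 1]) / window
            for i in range(len(seq))]

lagged = shift(values)                  # [None, 5, 7, 6, 9]
smoothed = rolling_mean(values, 3)      # first two entries are None
only_2024 = [(d, v) for d, v in zip(dates, values) if d.year == 2024]   # slice by year
print(lagged, smoothed, only_2024)
```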

11. Inequality and Measures of Inequality

Inequality:
Refers to the unequal distribution of income, wealth, or opportunities among individuals or groups.

Measures of Inequality:

  1. Gini Coefficient – measures income inequality (0 = perfect equality, 1 = perfect inequality).
  2. Lorenz Curve – graphical representation of income distribution.
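
Both measures can be computed from a list of incomes. The sketch below assumes non-negative incomes and uses the sorted-values formula for the Gini coefficient; the function names and figures are illustrative.

```python
def gini(incomes):
    """Gini coefficient via the sorted-values (rank-weighted) formula."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need non-empty incomes with a positive total")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def lorenz_points(incomes):
    """Cumulative (population share, income share) points of the Lorenz curve."""
    xs = sorted(incomes)
    total = sum(xs)
    points, running = [(0.0, 0.0)], 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points

print(gini([10, 10, 10, 10]))       # everyone earns the same
print(gini([0, 0, 0, 10]))          # one person earns everything
print(lorenz_points([1, 1, 2, 4]))
```

With n individuals this formula tops out at (n - 1)/n, which approaches 1 as n grows; that is why the second call gives 0.75 rather than exactly 1.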

12. Univariate, Bivariate, and Multivariate Data Analysis

Type          Definition                                         Example
Univariate    Analysis of a single variable                      Analyzing the average salary of employees
Bivariate     Analysis of two variables to find a relationship   Studying relation between age and income
Multivariate  Analysis involving three or more variables         Studying how age, income, and education affect spending

13. Explanatory, Response, and Dummy Variable

  • Explanatory Variable: Variable used to explain another variable (independent).
  • Response Variable: Variable being explained or predicted (dependent).
  • Dummy Variable: A binary variable (0 or 1) representing categories for regression models.

Example:
If gender is coded as male = 1, female = 0 → it’s a dummy variable.
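
A short sketch of the coding step. The first part follows the male = 1, female = 0 example; the second part is an added assumption showing the usual extension to a variable with more than two categories, where one category is left out as the baseline. All sample values are hypothetical.

```python
genders = ["male", "female", "female", "male"]     # hypothetical sample

# One 0/1 column following the coding in the example: male = 1, female = 0
is_male = [1 if g == "male" else 0 for g in genders]

# A variable with k categories needs only k - 1 dummies; the omitted one is the baseline
regions = ["north", "south", "east", "south"]      # hypothetical sample
levels = sorted(set(regions))                      # ['east', 'north', 'south']
baseline, *others = levels                         # 'east' becomes the baseline
dummies = {f"region_{lvl}": [int(r == lvl) for r in regions] for lvl in others}

print(is_male)
print(baseline, dummies)
```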


14. Conventions for Constructing a Causal Path Model

Conventions:

  1. Arrows indicate direction of causal influence.
  2. Variables are placed logically — causes to the left, effects to the right.
  3. No circular causation.
  4. Variables should have a theoretical or logical basis for causality.

Example (Causal Path Model):

Age Group → Feeling Unsafe Walking Alone After Dark

Older individuals are more likely to feel unsafe, so the arrow runs from “Age Group” (cause) to “Feeling Unsafe” (effect).
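
Conventions 1 to 3 can be checked mechanically if the path model is stored as a directed graph. The sketch below is one possible representation, not a standard tool for path analysis: each effect maps to its direct causes, and a topological sort lists causes before effects or fails when the arrows form a loop. The name causal_order is hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter, CycleError

# Each effect maps to the set of its direct causes (arrows point cause -> effect)
path_model = {
    "Feeling Unsafe": {"Age Group"},
}

def causal_order(model):
    """Return variables ordered causes-first, or raise if the arrows form a loop."""
    try:
        return list(TopologicalSorter(model).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that make up the detected cycle
        raise ValueError(f"circular causation: {err.args[1]}") from err

print(causal_order(path_model))
```

The returned order is also a reasonable left-to-right layout for drawing the diagram.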


15. Contingency Table, Cell Frequency, and Marginal

Contingency Table:
A table showing the frequency distribution of two or more categorical variables.

Cell Frequency:
The number of observations in each cell (intersection of categories).

Marginal:
The row or column totals, which give the overall frequency of each category of one variable regardless of the other variable.

Example:

          Male   Female   Total
Passed      20       25      45
Failed      10        5      15
Total       30       30      60

  • Cell frequency: e.g., 20 (Males who passed)
  • Marginal: e.g., Total males = 30
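
The marginals of the example table, and the proportions derived from them, can be computed as in the sketch below. The nested-dictionary layout is just one convenient representation.

```python
# Cell frequencies from the example table: rows = result, columns = gender
table = {
    "Passed": {"Male": 20, "Female": 25},
    "Failed": {"Male": 10, "Female": 5},
}

# Row marginals: sum across each row
row_totals = {r: sum(cols.values()) for r, cols in table.items()}

# Column marginals: sum down each column
col_totals = {}
for cols in table.values():
    for c, count in cols.items():
        col_totals[c] = col_totals.get(c, 0) + count

grand_total = sum(row_totals.values())

# Marginal proportion = marginal total / grand total
row_props = {r: t / grand_total for r, t in row_totals.items()}

# Conditional proportion: share who passed within each gender
pass_rate = {c: table["Passed"][c] / col_totals[c] for c in col_totals}

print(row_totals, col_totals, grand_total)
print(row_props, pass_rate)
```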

16. Good Table Manners

  1. Keep tables simple and clear.
  2. Use proper headings and labels.
  3. Include totals and percentages where needed.
  4. Avoid unnecessary decimals.
  5. Maintain uniform units and spacing.

17. Type I and Type II Errors

  • Type I Error: Rejecting a true null hypothesis (false positive). Its probability is denoted α.
    Example: Concluding a medicine works when it doesn’t (null hypothesis: the medicine has no effect).
  • Type II Error: Failing to reject a false null hypothesis (false negative). Its probability is denoted β.
    Example: Concluding a medicine doesn’t work when it does.

18. Box Plot

Definition:
A graphical summary that shows the distribution of data based on the five-number summary: minimum, Q1, median, Q3, and maximum.

Use:
Shows the spread and central tendency of the data and flags potential outliers.
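
The numbers behind a box plot can be computed as sketched below. Two assumptions are made that the notes do not state: quartiles use the "inclusive" method of Python's statistics.quantiles (other methods give slightly different values), and outliers are flagged with the common 1.5 × IQR rule. The data are made up.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 50]    # hypothetical values with one extreme point

# Q1, median, Q3 (quartile conventions vary between textbooks and software)
q1, med, q3 = quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, med, q3, max(data))

# Assumed outlier rule: points beyond 1.5 * IQR from the quartiles
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(five_number, outliers)
```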


19. Types of Sources of Income

  1. Earned Income – from employment or business (e.g., salary).
  2. Investment Income – from dividends, interest, or rent.
  3. Transfer Income – from pensions, government aid, etc.

20. Ways to Make a Contingency Table Readable

  1. Use clear labels and consistent formatting.
  2. Include totals and percentages for clarity.
  3. Arrange categories logically (alphabetical or numerical order).