Data transformation

Exploratory Data Analysis

1.9 Data Transformation Techniques

Data transformation techniques help convert raw data into a clean and ready-to-use dataset. Different techniques are applied depending on the project or data pipeline.

1. Data Smoothing

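Removes noise from the data so that underlying trends are easier to see.
Techniques:
- Clustering: Group similar values into clusters; values that fall outside every cluster are treated as outliers.
- Binning: Sort the data, divide it into bins, and replace each value with its bin mean, median, or nearest bin boundary.
- Regression: Fit a function that relates one attribute to others and use the fitted values in place of the noisy ones.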
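
As an illustration of smoothing by binning, here is a minimal sketch that sorts the values, splits them into equal-frequency bins, and replaces each value with its bin mean. The function name `smooth_by_bin_means` and the sample `prices` are made up for this example, and the sketch assumes the number of values is divisible by the number of bins.

```python
def smooth_by_bin_means(values, n_bins):
    """Sort values, split into equal-frequency bins, replace each by its bin mean."""
    ordered = sorted(values)
    # Assumption: len(values) is an exact multiple of n_bins.
    size = len(ordered) // n_bins
    smoothed = []
    for start in range(0, len(ordered), size):
        bin_values = ordered[start:start + size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative data
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or bin boundaries follows the same pattern; only the replacement value changes.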

2. Attribute Construction (Feature Construction)

Creates new features from existing attributes.
Example: From impressions and cost, create CPM (cost per mille, i.e. cost per thousand impressions).
Helps compare performance using a single metric.
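
A small sketch of constructing the CPM feature. CPM conventionally means cost per mille, that is, cost per 1,000 impressions. The campaign records and field names below are hypothetical.

```python
def cpm(cost, impressions):
    """Cost per mille: cost per 1,000 impressions."""
    if impressions == 0:
        raise ValueError("impressions must be non-zero")
    return cost * 1000 / impressions


# Hypothetical campaign data; only 'cost' and 'impressions' are raw attributes.
campaigns = [
    {"name": "A", "cost": 50.0, "impressions": 20_000},
    {"name": "B", "cost": 120.0, "impressions": 80_000},
]

for campaign in campaigns:
    campaign["cpm"] = cpm(campaign["cost"], campaign["impressions"])

print([(c["name"], c["cpm"]) for c in campaigns])
# [('A', 2.5), ('B', 1.5)]
```

Campaign B costs more in total but is cheaper per thousand impressions, which is the kind of comparison the constructed feature makes possible.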

3. Data Generalization

Replaces low-level attribute values with higher-level ones by moving up a concept hierarchy.
Example: Street → City → State → Country.
Useful for categorical attributes with a large number of distinct values.
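
The hierarchy idea can be sketched with plain lookup tables. The mappings `CITY_TO_STATE` and `STATE_TO_COUNTRY` and the function `generalize` are hypothetical names for this example; a real hierarchy would usually come from a reference table.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> state -> country.
CITY_TO_STATE = {
    "Pune": "Maharashtra",
    "Mumbai": "Maharashtra",
    "Chennai": "Tamil Nadu",
}
STATE_TO_COUNTRY = {"Maharashtra": "India", "Tamil Nadu": "India"}


def generalize(city, level):
    """Return the value of `city` at the requested hierarchy level."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")


records = ["Pune", "Mumbai", "Chennai", "Pune"]
by_state = Counter(generalize(city, "state") for city in records)
print(dict(by_state))
# {'Maharashtra': 3, 'Tamil Nadu': 1}
```

Moving from city to state reduces three distinct values to two here; on real data the reduction is typically much larger.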

4. Data Aggregation

Summarizes raw data into a compact form.
Example: Calculate the average, sum, min, and max over a given time period.
Types: Time aggregation (summarizing over a time period) and spatial aggregation (summarizing over a location or region).
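
A minimal sketch of time aggregation: daily readings are grouped by month and summarized with sum, mean, min, and max. The `readings` data and the function name `aggregate_by_month` are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (day, value) readings.
readings = [
    (date(2024, 1, 5), 10.0),
    (date(2024, 1, 20), 14.0),
    (date(2024, 2, 3), 8.0),
    (date(2024, 2, 17), 12.0),
    (date(2024, 2, 28), 16.0),
]


def aggregate_by_month(rows):
    """Group values by (year, month) and compute summary statistics."""
    groups = defaultdict(list)
    for day, value in rows:
        groups[(day.year, day.month)].append(value)
    return {
        key: {
            "sum": sum(vals),
            "mean": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for key, vals in groups.items()
    }


monthly = aggregate_by_month(readings)
print(monthly[(2024, 1)])
# {'sum': 24.0, 'mean': 12.0, 'min': 10.0, 'max': 14.0}
```

Spatial aggregation works the same way with a region key in place of the (year, month) key.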

5. Data Discretization

Converts continuous data into intervals.
Example: Age → Youth, Middle-aged, Senior.
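Methods: Equal-width, Equal-frequency, MDLP (entropy-based, using the Minimum Description Length Principle).
Reduces the number of distinct values, which can speed up algorithms and suits those that expect categorical input.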
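
The two unsupervised methods can be sketched as follows. Equal-width splits the value range into k intervals of the same size; equal-frequency puts roughly the same number of values in each bin. The function names and the `ages` sample are assumptions for this example, and the equal-width version assumes the values are not all identical.

```python
def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumption: hi > lo, otherwise width is zero
    # Clamp so the maximum value lands in the last bin instead of bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bin indices so each bin holds about len(values) / k values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


ages = [5, 12, 18, 25, 33, 41, 58, 62, 90]  # illustrative data
labels = ["Youth", "Middle-aged", "Senior"]

print(equal_width_bins(ages, 3))
# [0, 0, 0, 0, 0, 1, 1, 2, 2]
print([labels[b] for b in equal_frequency_bins(ages, 3)])
# ['Youth', 'Youth', 'Youth', 'Middle-aged', 'Middle-aged', 'Middle-aged',
#  'Senior', 'Senior', 'Senior']
```

On this sample the equal-width bins are unbalanced because the single value 90 stretches the range, while equal-frequency bins stay balanced by construction.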

6. Data Normalization

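Scales data into a smaller, common range (such as 0 to 1) so that attributes are comparable.
Methods:
- Min-Max Normalization → Linear transformation: v' = (v - min) / (max - min).
- Z-Score Normalization → Based on mean and standard deviation: v' = (v - mean) / std.
- Decimal Scaling → Move the decimal point of values: v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1.
Prevents attributes with large ranges from dominating those with small ranges, which improves distance-based and gradient-based algorithms. These methods rescale values linearly, so the shape of the distribution (including any skewness) stays the same.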
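
The three methods can be sketched in a few lines each. The function names and sample data are illustrative; the z-score version uses the population standard deviation (dividing by n), which is an assumption, since some tools divide by n - 1 instead.

```python
import math


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] to [new_min, new_max]."""
    lo, hi = min(values), max(values)  # assumption: hi > lo
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


incomes = [200, 400, 600, 800, 1000]  # illustrative data
print(min_max(incomes))
# [0.0, 0.25, 0.5, 0.75, 1.0]
print(decimal_scaling([-986, 917]))
# [-0.986, 0.917]
```

Min-max is sensitive to outliers because a single extreme value sets the range; z-score is often preferred when the minimum and maximum are unknown or unreliable.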

7. Data Integration

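Combines data from different sources into one unified view.
Sources: Databases, data cubes, flat files.
Approaches: Tight coupling (data is extracted, transformed, and loaded into a single store such as a data warehouse) and loose coupling (data stays in the source systems and is queried through a common interface at request time).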
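
A minimal sketch of integrating two sources into one view. The two record lists stand in for a database table and a flat file; their contents, the mismatched key names (`customer_id` vs `cust_id`), and the function `integrate` are all hypothetical. Reconciling such naming differences is a typical part of integration.

```python
# Hypothetical source 1: customer table from a database.
customers_db = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
]

# Hypothetical source 2: orders from a flat file, with a different key name.
orders_csv = [
    {"cust_id": 1, "amount": 250.0},
    {"cust_id": 1, "amount": 100.0},
    {"cust_id": 3, "amount": 75.0},
]


def integrate(customers, orders):
    """Join orders to customers on the customer key, keeping unmatched orders."""
    by_id = {c["customer_id"]: c for c in customers}
    unified = []
    for order in orders:
        customer = by_id.get(order["cust_id"])
        unified.append({
            "customer_id": order["cust_id"],
            "name": customer["name"] if customer else None,
            "amount": order["amount"],
        })
    return unified


for row in integrate(customers_db, orders_csv):
    print(row)
```

The order for customer 3 has no match in the customer table, so its name is left as `None` rather than dropped; how to treat such records is a design decision for the pipeline.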

8. Data Manipulation

Alters or organizes data to make it readable and usable.
Helps identify patterns and generate insights.
Example: Sorting, filtering, or grouping records, such as grouping continuous age values into intervals for easier analysis.