Exploratory Data Analysis
1.9 Data Transformation Techniques
Data transformation techniques help convert raw data into a clean and ready-to-use dataset. Different techniques are applied depending on the project or data pipeline.
1. Data Smoothing
- Removes noise from data to identify trends.
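- Techniques:
  - Clustering: Group similar values into clusters; values that fall outside every cluster are treated as outliers.
  - Binning: Sort the data, divide it into bins, and smooth the values within each bin (for example, by the bin mean or median).
  - Regression: Fit a function that relates attributes and use the fitted values in place of the noisy ones.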
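As a minimal sketch of the binning technique, the snippet below smooths values by bin means. The function name `smooth_by_bin_means`, the bin size, and the sample prices are illustrative assumptions, not part of the original notes.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, split them into bins of bin_size, and replace
    every value with the mean of its bin (smoothing by bin means)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical sample data
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, bin_size=3)
print(smoothed_prices)
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or bin boundaries follows the same pattern; only the value substituted within each bin changes.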
2. Attribute Construction (Feature Construction)
- Creates new features from existing attributes.
- Example: From impressions and cost, create CPM (cost per mille, i.e., cost per 1,000 impressions).
- Helps in comparing performance using a single metric.
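The CPM example can be sketched as follows. CPM is conventionally cost per mille, that is, cost per 1,000 impressions. The campaign records and the helper name `cpm` are made up for illustration.

```python
def cpm(cost, impressions):
    """Cost per 1,000 impressions, a feature constructed from two attributes."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return cost * 1000 / impressions


# Hypothetical campaign data
campaigns = [
    {"name": "A", "cost": 50.0, "impressions": 20000},
    {"name": "B", "cost": 120.0, "impressions": 80000},
]

# Add the constructed feature to every record
for campaign in campaigns:
    campaign["cpm"] = cpm(campaign["cost"], campaign["impressions"])

print([(c["name"], c["cpm"]) for c in campaigns])
# [('A', 2.5), ('B', 1.5)]
```

Campaign B spends more in total but is cheaper per 1,000 impressions, which is the kind of comparison a single constructed metric makes easy.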
3. Data Generalization
- Converts low-level attributes into high-level attributes using a concept hierarchy.
- Example: Street → City → State → Country.
- Useful for categorical data with a large number of distinct values.
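One way to sketch generalization is with lookup tables that encode the hierarchy. The mappings, the `generalize` helper, and the sample cities below are assumptions chosen for illustration; a real hierarchy would usually come from a reference table.

```python
from collections import Counter

# Hypothetical concept hierarchy: City -> State -> Country
CITY_TO_STATE = {
    "Pune": "Maharashtra",
    "Mumbai": "Maharashtra",
    "Chennai": "Tamil Nadu",
}
STATE_TO_COUNTRY = {
    "Maharashtra": "India",
    "Tamil Nadu": "India",
}


def generalize(city, level):
    """Climb the hierarchy from a city up to the requested level."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")


cities = ["Pune", "Mumbai", "Chennai", "Pune"]
by_state = Counter(generalize(c, "state") for c in cities)
by_country = Counter(generalize(c, "country") for c in cities)
print(by_state)    # Counter({'Maharashtra': 3, 'Tamil Nadu': 1})
print(by_country)  # Counter({'India': 4})
```

Each step up the hierarchy reduces the number of distinct values, here from three cities to two states to one country.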
4. Data Aggregation
- Summarizes raw data into compact form.
- Example: Calculate average, sum, min, max for a given time period.
- Types: Time aggregation and Spatial aggregation.
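A minimal sketch of time aggregation, assuming readings arrive as (ISO date string, value) pairs. The data and the function name `aggregate_by_month` are illustrative.

```python
from collections import defaultdict

# Hypothetical daily readings: (date, value)
readings = [
    ("2024-01-03", 10.0),
    ("2024-01-17", 14.0),
    ("2024-02-05", 8.0),
    ("2024-02-20", 12.0),
    ("2024-02-28", 16.0),
]


def aggregate_by_month(rows):
    """Group values by the YYYY-MM prefix of the date and summarize each group."""
    groups = defaultdict(list)
    for date, value in rows:
        groups[date[:7]].append(value)
    return {
        month: {
            "sum": sum(vals),
            "mean": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for month, vals in groups.items()
    }


monthly = aggregate_by_month(readings)
print(monthly["2024-01"])
# {'sum': 24.0, 'mean': 12.0, 'min': 10.0, 'max': 14.0}
```

Spatial aggregation has the same shape; the grouping key would be a region or location identifier instead of a time period.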
5. Data Discretization
- Converts continuous data into intervals.
- Example: Age → Youth, Middle-aged, Senior.
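- Methods: Equal-width binning, Equal-frequency binning, MDLP (minimum description length principle).
- Can improve the efficiency of algorithms that work better with, or require, categorical inputs.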
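The first two methods can be sketched in a few lines. The function names, the choice of three bins, and the sample ages are assumptions; the equal-frequency version is simplified and may place tied values in different bins.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it to k - 1
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * k // len(values)
    return bins


ages = [18, 22, 25, 31, 40, 47, 52, 60, 78]
labels = ["Youth", "Middle-aged", "Senior"]

width_bins = equal_width_bins(ages, 3)
freq_bins = equal_frequency_bins(ages, 3)
age_groups = [labels[b] for b in width_bins]

print(width_bins)  # [0, 0, 0, 0, 1, 1, 1, 2, 2]
print(freq_bins)   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Equal-width bins can be unbalanced when the data is unevenly spread (four ages fall in the first interval here), while equal-frequency bins keep the counts even at the cost of uneven interval widths.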
6. Data Normalization
- Scales numeric attributes into a common, smaller range (e.g., 0 to 1) so that they are comparable.
- Methods:
  - Min-Max Normalization: Linear transformation onto a chosen range.
  - Z-Score Normalization: Based on the mean and standard deviation.
  - Decimal Scaling: Move the decimal point of the values.
- Prevents attributes with large ranges from dominating and improves the performance of distance-based and gradient-based algorithms.
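The three methods can be sketched with the standard library alone. The function names and sample values are illustrative; the z-score version assumes the population standard deviation, and decimal scaling divides by the smallest power of ten that brings every absolute value below 1.

```python
import math


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min] * len(values)
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max |v| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


incomes = [200, 300, 400, 600, 1000]  # hypothetical values
print(min_max(incomes))  # [0.0, 0.125, 0.25, 0.5, 1.0]
print(z_score(incomes))
print(decimal_scaling(incomes))
```

All three are linear rescalings, so they change the scale of the data but not the shape of its distribution.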
7. Data Integration
- Combines data from different sources into one unified view.
- Sources: Databases, Data cubes, Flat files.
- Approaches: Tight coupling and Loose coupling.
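As a rough sketch of building a unified view, the snippet below merges records from two sources on a shared key. The source names, the fields, and the `integrate` helper are hypothetical; real integration also has to resolve schema and value conflicts, which this sketch ignores.

```python
# Hypothetical records from two separate sources
crm_rows = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
]
billing_rows = [
    {"customer_id": 1, "total": 250.0},
    {"customer_id": 3, "total": 90.0},
]


def integrate(left, right, key):
    """Merge rows from both sources by key, keeping rows found in either one."""
    merged = {}
    for row in left + right:
        merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


unified = integrate(crm_rows, billing_rows, key="customer_id")
for row in unified:
    print(row)
# {'customer_id': 1, 'name': 'Asha', 'total': 250.0}
# {'customer_id': 2, 'name': 'Ben'}
# {'customer_id': 3, 'total': 90.0}
```

Rows present in only one source keep whatever fields that source provides, so missing attributes show up as absent keys that a later cleaning step would need to handle.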
8. Data Manipulation
- Alters or organizes data to make it readable and usable.
- Helps identify patterns and generate insights.
- Example: Grouping continuous age values into intervals for easier analysis.