Virtual Reality
SIGNIFICANCE OF EDA
➤ In science, economics, engineering, and marketing, large amounts of data are stored in electronic databases, and decisions should be based on this collected data.
➤ Datasets with many data points are hard to understand without software. To gain insights and support further decisions, analysts perform data mining, which comprises several analysis processes.
➤ Exploratory Data Analysis (EDA) is the first step in data mining. It helps analysts visualize and understand the data and form hypotheses for further analysis. EDA produces summaries and insights for the next steps without making prior assumptions about the data.
➤ Data scientists use EDA to determine what types of models and hypotheses can be developed. Its main components are data summarization, statistical analysis, and visualization.
➤ Python tools for EDA:
- Pandas – summarizing
- SciPy – statistical analysis
- Matplotlib, Plotly – visualization
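The three components can be illustrated together on a small table. This is a minimal sketch, assuming a hypothetical dataset of advertising spend and sales (the column names and values are invented for the example). Matplotlib stands in for the visualization step; Plotly is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; no display window needed
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    "ad_spend": [10, 12, 15, 18, 20, 25, 30, 35],
    "sales":    [100, 110, 130, 150, 160, 190, 220, 250],
})

# Summarizing with pandas: count, mean, std, min, quartiles, max per column
summary = df.describe()
print(summary)

# Statistical analysis with SciPy: Pearson correlation and its p-value
r, p_value = stats.pearsonr(df["ad_spend"], df["sales"])
print(f"Pearson r = {r:.3f}, p = {p_value:.4f}")

# Visualization with Matplotlib: scatter plot saved to a PNG file
fig, ax = plt.subplots()
ax.scatter(df["ad_spend"], df["sales"])
ax.set_xlabel("ad_spend")
ax.set_ylabel("sales")
fig.savefig("ad_spend_vs_sales.png")
plt.close(fig)
```

Each library covers one component here: pandas produces the summary table, SciPy tests the relationship, and Matplotlib makes it visible.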
STEPS IN EDA
1. Problem Definition
- Define the business problem before extracting insights.
- Tasks include:
o defining objectives
o defining deliverables
o outlining roles and responsibilities
o checking the current status of the data
o defining the timetable and performing a cost/benefit analysis
- Based on this, an execution plan is created.
2. Data Preparation
- Prepare the dataset before analysis.
- Tasks include:
o defining data sources
o defining schemas and tables
o understanding the characteristics of the data
o cleaning the dataset
o deleting irrelevant data
o transforming data
o dividing data into chunks for analysis
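Several of these preparation tasks map directly onto pandas operations. The sketch below is illustrative only: the raw records and column names are hypothetical, and the log transform and the chunk size of 2 are arbitrary choices made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records, invented for illustration
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4, 5],
    "age": ["34", "28", "28", None, "45", "51"],
    "income": [52000, 48000, 48000, 61000, None, 75000],
    "internal_note": ["a", "b", "b", "c", "d", "e"],
})

# Understand the characteristics of the data: column types and missing values
print(raw.dtypes)
print(raw.isna().sum())

# Clean the dataset: drop exact duplicate rows, then rows with missing values
clean = raw.drop_duplicates().dropna()

# Delete irrelevant data: a column that the analysis does not need
clean = clean.drop(columns=["internal_note"])

# Transform data: parse text ages as integers and add a log-scaled income
clean = clean.assign(
    age=pd.to_numeric(clean["age"]),
    log_income=np.log(clean["income"]),
)

# Divide the data into chunks of at most chunk_size rows for separate analysis
chunk_size = 2
chunks = [clean.iloc[i:i + chunk_size] for i in range(0, len(clean), chunk_size)]
for n, chunk in enumerate(chunks):
    print(f"chunk {n}: {len(chunk)} rows")
```

For files too large to load at once, `pd.read_csv` also accepts a `chunksize` argument and yields the file piece by piece.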
3. Data Analysis
- Involves descriptive statistics and analysis.
- Tasks include:
➤ summarizing data
➤ finding hidden correlations
➤ identifying relationships
➤ developing predictive models
➤ evaluating models and calculating their accuracy
- Techniques used for summarization:
• Summary Tables
• Graphs
• Descriptive Statistics
• Inferential Statistics
• Correlation Statistics
• Searching
• Grouping
• Mathematical Models
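Most of these summarization techniques can be sketched with pandas, NumPy, and SciPy. The example rests on several assumptions. The store data are invented. "Searching" is read here as filtering rows by a condition. Welch's t-test stands in for inferential statistics. A straight-line least-squares fit (`numpy.polyfit`) stands in for a mathematical model, with its accuracy measured by R².

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sales records, invented for illustration
df = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south", "south"],
    "price":  [2.0, 2.5, 3.0, 2.0, 2.5, 3.0],
    "units":  [120, 100, 85, 140, 118, 96],
})

# Descriptive statistics for every numeric column
print(df.describe())

# Grouping + summary table: mean and total units per region
table = df.groupby("region")["units"].agg(["mean", "sum"])
print(table)

# Correlation statistics: pairwise Pearson correlation matrix
corr = df[["price", "units"]].corr()
print(corr)

# Searching: select the rows that satisfy a condition
cheap = df[df["price"] < 2.5]
print(cheap)

# Inferential statistics: do the regions differ in mean units? (Welch's t-test)
north = df.loc[df["region"] == "north", "units"]
south = df.loc[df["region"] == "south", "units"]
t_stat, p_value = stats.ttest_ind(north, south, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Mathematical model: least-squares line units = slope * price + intercept
slope, intercept = np.polyfit(df["price"], df["units"], deg=1)
predicted = slope * df["price"] + intercept

# Model accuracy as the coefficient of determination R^2
ss_res = ((df["units"] - predicted) ** 2).sum()
ss_tot = ((df["units"] - df["units"].mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot
print(f"slope = {slope:.1f}, R^2 = {r_squared:.3f}")
```

With only three observations per region the t-test has little power, so a result like this should be read as a prompt for further analysis rather than a conclusion.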
4. Development and Representation of Results
- Present results to stakeholders in an easy-to-understand form.
- Use graphs, summary tables, maps, and diagrams.
- Graphical techniques include:
• Scatter plots
• Character plots
• Histograms
• Box plots
• Residual plots
• Mean plots
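Most of these plots can be drawn with Matplotlib. The sketch below produces five of them from randomly generated, hypothetical data. Character plots, which draw with text characters, are omitted. The residual plot assumes a straight-line fit with `numpy.polyfit`. The mean plot takes one common form, group means with standard-deviation error bars.

```python
import matplotlib
matplotlib.use("Agg")  # write to a file instead of opening a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data, invented for illustration
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + rng.normal(0, 2.0, size=100)
groups = {"A": rng.normal(5, 1, 30), "B": rng.normal(6, 1, 30), "C": rng.normal(4, 1, 30)}
labels = list(groups.keys())
positions = np.arange(1, len(groups) + 1)

# Residuals of a least-squares line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

fig, axes = plt.subplots(2, 3, figsize=(12, 7))

axes[0, 0].scatter(x, y, s=10)
axes[0, 0].set_title("Scatter plot")

axes[0, 1].hist(y, bins=15)
axes[0, 1].set_title("Histogram")

axes[0, 2].boxplot(list(groups.values()))  # boxes sit at x = 1, 2, 3
axes[0, 2].set_xticks(positions)
axes[0, 2].set_xticklabels(labels)
axes[0, 2].set_title("Box plots")

axes[1, 0].scatter(x, residuals, s=10)
axes[1, 0].axhline(0, color="gray", linestyle="--")
axes[1, 0].set_title("Residual plot")

# Mean plot: group means with one-standard-deviation error bars
means = [g.mean() for g in groups.values()]
stds = [g.std(ddof=1) for g in groups.values()]
axes[1, 1].errorbar(positions, means, yerr=stds, fmt="o-", capsize=4)
axes[1, 1].set_xticks(positions)
axes[1, 1].set_xticklabels(labels)
axes[1, 1].set_title("Mean plot")

axes[1, 2].axis("off")  # unused panel
fig.tight_layout()
fig.savefig("eda_plots.png")
plt.close(fig)
```

A residual plot with no visible pattern around zero suggests the straight-line model is adequate. Curvature or a funnel shape would point to a different model.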