Exploratory Data Analysis with Python
Gain the critical skills to visualize and analyze data using the Python language and its libraries.
(EDA-PYTHON.AJ1) / ISBN: 978-1-64459-298-4
About This Course
This course is all about practicing Exploratory Data Analysis with Python. You’ll learn to visualize, transform, and analyze data using Python’s powerful tools like Pandas, Seaborn, and Matplotlib. By delving into real-world datasets, you’ll discover patterns and insights that drive decision-making. The course is ideal for aspiring data scientists, analysts, and anyone keen to enhance their data exploration skills.
Skills You’ll Get
- Learn the basics of Exploratory Data Analysis (EDA) in Python
- Use Python libraries like Pandas, Seaborn, and Matplotlib for data analysis
- Visualize data with various types of charts and graphs
- Transform and clean datasets for analysis
- Perform statistical analysis to uncover insights
- Group and aggregate data for deeper analysis
- Analyze correlations and understand their significance
- Handle missing values and perform data imputation
- Conduct hypothesis testing and regression analysis
- Create reproducible data analysis workflows
- Develop and evaluate machine learning models
Interactive Lessons
13+ Interactive Lessons | 47+ Exercises | 63+ Quizzes | 80+ Flashcards | 80+ Glossary of terms
Gamified TestPrep
35+ Pre-Assessment Questions | 35+ Post-Assessment Questions
Hands-On Labs
77+ LiveLab | 13+ Video tutorials | 20+ Minutes
Preface
- Who this course is for
- What this course covers
- To get the most out of this course
- Conventions used
Exploratory Data Analysis Fundamentals
- Understanding data science
- The significance of EDA
- Making sense of data
- Comparing EDA with classical and Bayesian analysis
- Software tools available for EDA
- Getting started with EDA
- Summary
- Further reading
Visual Aids for EDA
- Technical requirements
- Line chart
- Bar charts
- Scatter plot
- Area plot and stacked plot
- Pie chart
- Table chart
- Polar chart
- Histogram
- Lollipop chart
- Choosing the best chart
- Other libraries to explore
- Summary
- Further reading
Activity: EDA with Personal Email
- Technical requirements
- Loading the dataset
- Data transformation
- Data analysis
- Summary
- Further reading
Data Transformation
- Technical requirements
- Background
- Merging database-style dataframes
- Transformation techniques
- Benefits of data transformation
- Summary
- Further reading
Descriptive Statistics
- Technical requirements
- Understanding statistics
- Measures of central tendency
- Measures of dispersion
- Summary
- Further reading
Grouping Datasets
- Technical requirements
- Understanding groupby()
- Groupby mechanics
- Data aggregation
- Pivot tables and cross-tabulations
- Summary
- Further reading
Correlation
- Technical requirements
- Introducing correlation
- Types of analysis
- Discussing multivariate analysis using the Titanic dataset
- Outlining Simpson's paradox
- Correlation does not imply causation
- Summary
- Further reading
Activity: Time Series Analysis
- Technical requirements
- Understanding the time series dataset
- TSA with Open Power System Data
- Summary
- Further reading
Hypothesis Testing and Regression
- Hypothesis testing
- p-hacking
- Understanding regression
- Model development and evaluation
- Summary
- Further reading
Model Development and Evaluation
- Technical requirements
- Types of machine learning
- Understanding supervised learning
- Understanding unsupervised learning
- Understanding reinforcement learning
- Unified machine learning workflow
- Summary
- Further reading
Activity: EDA on Wine Quality Data Analysis
- Technical requirements
- Disclosing the wine quality dataset
- Analyzing red wine
- Analyzing white wine
- Model development and evaluation
- Summary
- Further reading
Appendix
- String manipulation
- Using pandas vectorized string functions
- Using regular expressions
- Further reading
Exploratory Data Analysis Fundamentals
- Styling a Dataframe
- Applying a Function to a Dataframe
- Slicing and Subsetting
- Dividing NumPy Arrays
- Inspecting NumPy Arrays
- Defining NumPy Arrays
- Selecting Rows
- Reading Data from a CSV File
- Creating a Dataframe
Visual Aids for EDA
- Creating a Line Chart
- Creating a Bar Chart
- Creating a Scatter Plot
- Creating a Bubble Chart
- Creating an Area Plot
- Creating a Pie Chart
- Creating a Table Chart
- Creating a Polar Chart
- Adding the Best-Fit Line for the Normal Distribution
- Creating a Histogram
- Creating a Lollipop Chart
Activity: EDA with Personal Email
- Performing EDA with Email Data
- Extracting Email Using Regex
- Converting a Field to datetime
- Removing NaN Values
- Dropping a Column
Data Transformation
- Stacking a Dataframe
- Concatenating Dataframes
- Analyzing Dataframes
- Combining Dataframes
- Merging on Index
- Permuting a Dataframe
- Removing Duplicate Data
- Replacing Values
- Interpolating Missing Values
- Backward and Forward Filling
- Handling NaN Values
- Counting Missing Values
- Renaming Axis Indexes
- Binning
- Detecting Outliers
Descriptive Statistics
- Generating a Binomial Distribution Plot
- Generating an Exponential Distribution Plot
- Generating a Normal Distribution Plot
- Generating a Uniform Distribution Plot
- Using Statistical Functions
- Calculating Standard Deviation
- Finding Skewness and Kurtosis
- Creating a Box Plot
- Calculating Inter-Quartile Range
Grouping Datasets
- Finding Maximum Value for Each Group
- Grouping a Dataset
- Filtering Data
- Applying Aggregation Functions
- Creating a Pivot Table
- Creating a Cross-Tabulation Table
Correlation
- Calculating Correlation Coefficient
Activity: Time Series Analysis
- Sampling the Data
- Resampling the Data
- Changing the Index of a Dataframe
Hypothesis Testing and Regression
- Performing Z-Test
- Calculating the P-Value
- Performing T-Test
- Scoring the Model
- Understanding the Linear Regression Model
Model Development and Evaluation
- Using TfidfVectorizer
Activity: EDA on Wine Quality Data Analysis
- Plotting a Heatmap
- Visualizing the Data in 3D Form
Appendix
- Accessing Characters
- String Slicing
- Updating a String
- Escape Sequencing
- Formatting Strings
- Displaying the Last 10 Items from a Dataframe
- Using String Functions with a Dataframe
- Finding Words from a String
- Counting Full Stops Using Regex
- Matching Characters
Any questions? Check out the FAQs
Find answers to common questions about our exploratory data analysis Python course.
EDA in Python is a critical process in data analysis that helps you understand the main characteristics of a dataset through visualizations and statistical techniques.
- EDA: It focuses on analyzing datasets with statistical methods to find patterns, trends, and relationships. Its role is discovery.
- Data visualization: It presents these findings through charts, graphs, and plots so that insights are easier to understand. Its role is communication.
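The statistical side of EDA described above might look like the following minimal pandas sketch. It is not taken from the course materials. The toy dataset and its column names (`age`, `income`, `city`) are hypothetical, chosen only to show summary statistics, missing-value counts, a correlation, and a grouped aggregate.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [23, 35, 31, np.nan, 52, 46],
    "income": [32000, 54000, 49000, 41000, 88000, 72000],
    "city": ["A", "B", "A", "B", "A", "B"],
})

# Main characteristics of the numeric columns:
# count, mean, std, min, quartiles, and max (non-numeric columns are skipped by default).
summary = df.describe()

# Number of missing values in each column.
missing = df.isna().sum()

# Pearson correlation between the numeric columns (rows with NaN are excluded pairwise).
corr = df[["age", "income"]].corr()

# Group-level aggregation: mean income per city.
income_by_city = df.groupby("city")["income"].mean()

print(summary)
print(missing)
print(corr)
print(income_by_city)
```

The visual counterpart (histograms, scatter plots, and so on) would typically be built on the same DataFrame with Matplotlib or Seaborn.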
Python is ideal for EDA due to its powerful libraries like Pandas, Seaborn, and Matplotlib, which make data manipulation, visualization, and analysis straightforward and efficient.
Exploratory data analysis techniques in Python help in identifying patterns, spotting anomalies, testing hypotheses, and checking assumptions, all of which are crucial steps before building predictive models. These skills also allow you to take on advanced projects and pursue specialized roles in your field.
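As an illustration of "spotting anomalies," the sketch below applies the common inter-quartile range (IQR) rule with pandas. It is an assumption-labeled example and not the course's own code. The helper name `iqr_outliers`, the conventional multiplier `k = 1.5`, and the order-count data are all hypothetical.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: mark values outside [Q1 - k*IQR, Q3 + k*IQR] as outliers."""
    q1 = values.quantile(0.25)  # first quartile (linear interpolation by default)
    q3 = values.quantile(0.75)  # third quartile
    iqr = q3 - q1               # inter-quartile range
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical daily order counts containing one suspicious spike.
orders = pd.Series([12, 15, 14, 13, 16, 15, 14, 90, 13, 15])
mask = iqr_outliers(orders)
print(orders[mask])
```

Only the spike of 90 falls outside the fences here. A flagged value is a candidate for inspection, not automatically an error.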
The average salary for a data analyst with EDA skills falls between $70,000 and $100,000 per year, depending on experience, location, and industry.
By mastering EDA with Python, you’ll enhance your ability to interpret and present data, making you a valuable asset in any data-driven organization. Career opportunities include roles such as Data Analyst, Data Scientist, Business Analyst, and Data Engineer, where EDA skills are highly valued.