Data Visualization for Beginning Python Developers
1-Hour Introduction Lecture
Lecture Overview (1 minute)
Learning Objectives:
- Understand why data visualization matters
- Learn the three main Python visualization libraries
- Create common plot types (line, bar, scatter, histogram)
- Understand when to use different visualizations
- Build your first visualization from scratch
What we’ll cover: Matplotlib basics → Plot types → Seaborn styling → Quick intro to interactive plots
Part 1: Why Data Visualization Matters (5 minutes)
The Core Problem
Raw data is hard to understand. Numbers in tables don’t reveal patterns.
Example: Compare these two:
- A dataset with 1000 temperature readings as a CSV
- The same data as a line graph showing seasonal patterns
The graph tells the story instantly.
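The comparison above can be tried directly. This is a minimal sketch that assumes synthetic data: the yearly cycle, noise level, and variable names are made up for illustration and do not come from a real dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 1000 daily temperature readings with a yearly cycle plus noise
rng = np.random.default_rng(0)
days = np.arange(1000)
temps = 15 + 10 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 2, size=days.size)

# As raw numbers, the seasonal cycle is hard to see
print(np.round(temps[:10], 1))

# As a line graph, the cycle is visible at a glance
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(days, temps, linewidth=1)
ax.set_xlabel('Day')
ax.set_ylabel('Temperature (°C)')
ax.set_title('1000 Synthetic Temperature Readings')
plt.show()
```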
Three Key Reasons to Visualize
- Exploration: Discover patterns you didn’t expect
- Communication: Show stakeholders what the data means
- Verification: Spot errors or anomalies visually
Teaching Point
“Visualization is the bridge between raw numbers and human understanding. Before you build a model or write a report, visualize your data.”
Part 2: The Python Visualization Ecosystem (3 minutes)
The Three Main Libraries
Matplotlib (The Foundation)
- The oldest, most foundational library
- Low-level control, steeper learning curve
- Seaborn and pandas plotting build on it
- Use when: You need fine-grained control or are publishing static images
Seaborn (The Statistician)
- Built on top of Matplotlib
- Beautiful defaults, statistical focus
- Great for exploratory data analysis
- Use when: You are working with pandas DataFrames and want attractive plots quickly
Plotly (The Interactive)
- Web-based, interactive visualizations
- Good for dashboards and presentations
- Plotly Express offers a concise, high-level API that many beginners find approachable
- Use when: You want hover details, zooming, web-based sharing
For This Course
We’ll focus on Matplotlib (the foundation) and Seaborn (the practical tool).
Part 3: Matplotlib Fundamentals (12 minutes)
The Figure-Axes Model
Matplotlib uses a hierarchy:
- Figure: The entire window/image (think: canvas)
- Axes: The actual plot area where data appears (think: drawing surface)
- Artists: Everything you draw (lines, points, text)
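One way to make the hierarchy concrete is to inspect the objects Matplotlib hands back. The sketch below is illustrative only; the variable names `line` and `title` are chosen here and are not part of the lecture's examples.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
line, = ax.plot([1, 2, 3], [1, 4, 9])   # ax.plot returns a list of Line2D artists
title = ax.set_title('Hierarchy Demo')  # set_title returns a Text artist

print(type(fig).__name__)    # Figure
print(fig.axes[0] is ax)     # True: the Figure holds its Axes
print(line in ax.lines)      # True: the Axes holds the line artist
print(line.axes is ax)       # True: each artist knows its Axes
print(ax.figure is fig)      # True: each Axes knows its Figure
```

Every call on `ax` either creates an artist or changes one that already lives on that Axes.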
Basic Pattern
import matplotlib.pyplot as plt
# Create figure and axes
fig, ax = plt.subplots()
# Draw on axes
ax.plot([1, 2, 3], [1, 4, 9])
# Customize
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_title('My First Plot')
# Show
plt.show()
Teaching Point
“Always create your figure and axes explicitly. This pattern scales from simple plots to complex multi-panel figures.”
Working with Multiple Subplots
# Create 2x2 grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
# Flatten to 1D array for easier iteration
axes = axes.flatten()
# Plot on each
for i, ax in enumerate(axes):
    ax.plot([1, 2, 3], [i, i+1, i+2])
    ax.set_title(f'Plot {i+1}')
plt.tight_layout() # Prevent overlap
plt.show()
Key Matplotlib Methods
# Line plot (time series, trends)
ax.plot(x, y, 'b-', linewidth=2, label='Series A')
# Scatter plot (relationships)
ax.scatter(x, y, s=100, alpha=0.6, color='red')
# Bar plot (categories)
ax.bar(categories, values, color='green')
# Histogram (distributions)
ax.hist(data, bins=20, edgecolor='black')
# Styling elements
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Title')
ax.legend()
ax.grid(True, alpha=0.3)
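The reference snippets above leave `x`, `y`, `categories`, `values`, and `data` undefined. A runnable sketch, assuming made-up data chosen only to give each method something to draw, might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data, only to give each method something to draw
rng = np.random.default_rng(1)
x = np.arange(10)
y = x ** 2
categories = ['A', 'B', 'C']
values = [3, 7, 5]
data = rng.normal(0, 1, 500)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].plot(x, y, 'b-', linewidth=2, label='Series A')
axes[0, 0].legend()               # legend() picks up the label= given above
axes[0, 0].set_title('plot')

axes[0, 1].scatter(x, y, s=100, alpha=0.6, color='red')
axes[0, 1].set_title('scatter')

axes[1, 0].bar(categories, values, color='green')
axes[1, 0].set_title('bar')

axes[1, 1].hist(data, bins=20, edgecolor='black')
axes[1, 1].set_title('hist')

# Apply the same light grid to every panel
for ax in axes.flat:
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
```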
Part 4: When to Use Which Plot Type (8 minutes)
Line Plot
Use for: Time series, trends, continuous data
Example: Stock prices over time, temperature throughout the day
import pandas as pd
import matplotlib.pyplot as plt
# Sample data
dates = pd.date_range('2024-01-01', periods=30)
prices = [100 + i + (i % 5) for i in range(30)]
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(dates, prices, linewidth=2, color='steelblue')
ax.set_xlabel('Date')
ax.set_ylabel('Price ($)')
ax.set_title('Stock Price Over Time')
ax.grid(True, alpha=0.3)
plt.show()
Teaching Point: “Line plots assume order matters. Use them when your x-axis has a natural progression.”
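To see why order matters, note that `ax.plot` joins points in the order they are given. The sketch below uses a small hypothetical set of out-of-order measurements and sorts them with `np.argsort` before plotting.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical measurements recorded out of order
x = np.array([3, 1, 4, 2, 5])
y = np.array([9, 1, 16, 4, 25])

fig, (ax_raw, ax_sorted) = plt.subplots(1, 2, figsize=(10, 4))

# plot() joins points in the order given, so unsorted x zig-zags
ax_raw.plot(x, y, marker='o')
ax_raw.set_title('Unsorted x')

# Sort both arrays by x so the line follows the natural progression
order = np.argsort(x)
ax_sorted.plot(x[order], y[order], marker='o')
ax_sorted.set_title('Sorted by x')

plt.tight_layout()
plt.show()
```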
Bar Plot
Use for: Comparing categorical values, rankings, counts
Example: Sales by region, programming language popularity
fig, ax = plt.subplots(figsize=(10, 6))
languages = ['Python', 'JavaScript', 'Java', 'C++', 'Go']
popularity = [85, 72, 65, 45, 38]
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']
ax.bar(languages, popularity, color=colors)
ax.set_ylabel('Popularity Score')
ax.set_title('Programming Language Popularity 2024')
ax.set_ylim(0, 100)
# Add value labels on bars
for i, v in enumerate(popularity):
    ax.text(i, v + 2, str(v), ha='center', fontweight='bold')
plt.show()
Teaching Point: “Bars are easier to compare than scattered points. Order them by value for clarity.”
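Ordering bars by value takes one sort before plotting. This is a minimal sketch with hypothetical sales figures; the region names and numbers are invented for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical sales figures by region
sales = {'North': 120, 'South': 340, 'East': 210, 'West': 275}

# Sort (region, value) pairs from largest to smallest
ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)
regions = [name for name, _ in ranked]
amounts = [amount for _, amount in ranked]

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(regions, amounts, color='steelblue')
ax.set_ylabel('Sales')
ax.set_title('Sales by Region (sorted)')
plt.show()
```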
Scatter Plot
Use for: Relationships between two variables, outliers, clusters
Example: House price vs. size, student study hours vs. test scores
import numpy as np
fig, ax = plt.subplots(figsize=(10, 6))
# Generate correlated data
np.random.seed(42)
hours_studied = np.random.uniform(0, 10, 100)
test_scores = hours_studied * 8 + np.random.normal(0, 5, 100)
test_scores = np.clip(test_scores, 0, 100)
ax.scatter(hours_studied, test_scores, alpha=0.6, s=100, color='steelblue')
ax.set_xlabel('Hours Studied')
ax.set_ylabel('Test Score')
ax.set_title('Study Hours vs Test Performance')
ax.grid(True, alpha=0.3)
# Add trend line (evaluate the fit on sorted x so the dashed line draws cleanly)
z = np.polyfit(hours_studied, test_scores, 1)
p = np.poly1d(z)
x_line = np.sort(hours_studied)
ax.plot(x_line, p(x_line), "r--", linewidth=2, label='Trend')
ax.legend()
plt.show()
Teaching Point: “Scatter plots reveal relationships, but noise can obscure the overall trend. Add a trend line to clarify the pattern.”
Histogram
Use for: Distribution shape, frequency, data spread
Example: Customer age distribution, test score grades
fig, ax = plt.subplots(figsize=(10, 6))
# Generate sample data (normally distributed)
data = np.random.normal(loc=70, scale=15, size=1000)
ax.hist(data, bins=30, color='steelblue', edgecolor='black', alpha=0.7)
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of Test Scores')
ax.axvline(data.mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {data.mean():.1f}')
ax.axvline(np.median(data), color='green', linestyle='--', linewidth=2, label=f'Median: {np.median(data):.1f}')
ax.legend()
plt.show()
Teaching Point: “Histograms show the shape of data. Watch for skew, bimodality, or outliers.”
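Skew and bimodality are easier to recognize after seeing them once. The sketch below assumes two synthetic datasets (an exponential sample and a mix of two normal groups) picked only because they produce those shapes.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# Right-skewed: exponential data has a long tail to the right
skewed = rng.exponential(scale=10, size=1000)

# Bimodal: two normal groups mixed together
bimodal = np.concatenate([rng.normal(40, 5, 500), rng.normal(80, 5, 500)])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.hist(skewed, bins=30, color='steelblue', edgecolor='black')
ax1.axvline(skewed.mean(), color='red', linestyle='--', label='Mean')
ax1.axvline(np.median(skewed), color='green', linestyle='--', label='Median')
ax1.set_title('Right-skewed')
ax1.legend()

ax2.hist(bimodal, bins=30, color='steelblue', edgecolor='black')
ax2.set_title('Bimodal')

plt.tight_layout()
plt.show()
```

In the skewed panel the mean sits to the right of the median, which is a quick numeric hint of a long right tail.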
Part 5: Introduction to Seaborn (8 minutes)
Why Seaborn?
Seaborn is a wrapper around Matplotlib with better defaults and simpler code for statistical visualization.
Basic Philosophy
“Seaborn is for exploratory analysis. Matplotlib is when you need full control.”
Common Seaborn Plots
import seaborn as sns
import matplotlib.pyplot as plt
# Load sample data
iris = sns.load_dataset('iris') # Built-in dataset
# Set style
sns.set_theme(style="darkgrid")
# Scatter with hue (color by category)
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width',
                hue='species', s=100, ax=ax)
ax.set_title('Iris Sepal Measurements')
plt.show()
# Line plot with confidence interval
fig, ax = plt.subplots(figsize=(10, 6))
sns.lineplot(data=iris, x='sepal_length', y='petal_length',
             hue='species', ax=ax)
ax.set_title('Sepal vs Petal Length by Species')
plt.show()
# Box plot (distribution by category)
fig, ax = plt.subplots(figsize=(10, 6))
sns.boxplot(data=iris, x='species', y='sepal_length', ax=ax)
ax.set_title('Sepal Length Distribution by Species')
plt.show()
The hue Parameter
One of Seaborn’s superpowers: color data points by category without extra code.
# Without hue: you see relationships
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width')
# With hue: you see relationships BY GROUP
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width',
                hue='species')
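For contrast, here is roughly what the same grouping takes in plain Matplotlib: one `scatter` call per category. This sketch assumes a hypothetical DataFrame with made-up column names (`length`, `width`, `group`) rather than the iris data, so it runs without downloading a dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for a dataset with one categorical column
rng = np.random.default_rng(3)
df = pd.DataFrame({
    'length': rng.normal(5, 1, 90),
    'width': rng.normal(3, 0.5, 90),
    'group': np.repeat(['a', 'b', 'c'], 30),
})

fig, ax = plt.subplots(figsize=(8, 5))

# One scatter call per category; Matplotlib cycles colors automatically
for name, subset in df.groupby('group'):
    ax.scatter(subset['length'], subset['width'], label=name, alpha=0.7)

ax.set_xlabel('length')
ax.set_ylabel('width')
ax.legend(title='group')
plt.show()
```

The loop, labels, and legend here are what `hue` handles in a single argument.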
Styling
# Set overall theme
sns.set_theme(style="whitegrid") # or "dark", "white", "darkgrid"
# Set palette (colors)
sns.set_palette("husl") # or "Set2", "coolwarm", "rocket"
# Create plot with custom styling
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width',
                hue='species', palette='Set2', s=150, ax=ax)
Part 6: Quick Interactive Preview - Plotly (4 minutes)
When to Use Plotly
Interactive visualizations for dashboards, web apps, and presentations.
Basic Example
import plotly.express as px
iris = px.data.iris()
# Interactive scatter plot
fig = px.scatter(iris, x='sepal_width', y='sepal_length',
                 color='species', hover_data=['petal_length'],
                 title='Interactive Iris Explorer')
fig.show()
# Interactive line plot
import pandas as pd
import numpy as np
dates = pd.date_range('2024-01-01', periods=30)
values = np.cumsum(np.random.randn(30))
df = pd.DataFrame({'date': dates, 'value': values})
fig = px.line(df, x='date', y='value',
              title='Interactive Time Series',
              hover_data={'date': '|%B %d, %Y'})
fig.show()
Why It’s Different
- Hover for exact values
- Zoom and pan
- Click legend to show/hide
- Export as PNG
- Embed in web pages
Teaching Point
“Plotly is great for final presentations. Use Matplotlib/Seaborn for exploration.”
Part 7: Hands-On Workshop (15 minutes)
Exercise 1: Your First Visualization (5 min)
Task: Create a line plot of monthly website traffic
import matplotlib.pyplot as plt
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
visitors = [5000, 6200, 5800, 7500, 8200, 9100]
# TODO: Create figure and axes
# TODO: Plot the data
# TODO: Add labels and title
# TODO: Show the plot
Solution:
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(months, visitors, marker='o', linewidth=2, color='steelblue')
ax.set_xlabel('Month')
ax.set_ylabel('Visitors')
ax.set_title('Monthly Website Traffic')
ax.grid(True, alpha=0.3)
plt.show()
Exercise 2: Bar Plot Comparison (5 min)
Task: Compare revenue across three product lines
import matplotlib.pyplot as plt
products = ['Product A', 'Product B', 'Product C']
revenue = [450000, 280000, 395000]
# TODO: Create bar plot
# TODO: Add value labels on bars
# TODO: Format y-axis as currency
Solution:
fig, ax = plt.subplots(figsize=(8, 6))
bars = ax.bar(products, revenue, color=['#1f77b4', '#ff7f0e', '#2ca02c'])
ax.set_ylabel('Revenue ($)')
ax.set_title('Revenue by Product Line')
# Add value labels
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'${height/1000:.0f}K',
            ha='center', va='bottom', fontweight='bold')
# Format y-axis
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))
plt.show()
Exercise 3: Multi-Panel Exploration (5 min)
Task: Create a 2x2 grid exploring a dataset
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()
# TODO: Create 4 different plots, one per subplot
# Hint: scatter, box, histogram, and one more
Solution:
iris = sns.load_dataset('iris')
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()  # 2x2 array -> 1D, so axes[0]..axes[3] are single Axes
# Plot 1: Scatter
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width',
                hue='species', ax=axes[0])
axes[0].set_title('Sepal Dimensions')
# Plot 2: Box plot
sns.boxplot(data=iris, x='species', y='sepal_length', ax=axes[1])
axes[1].set_title('Sepal Length Distribution')
# Plot 3: Histogram
axes[2].hist(iris['petal_length'], bins=20, color='steelblue', edgecolor='black')
axes[2].set_title('Petal Length Distribution')
axes[2].set_xlabel('Petal Length')
# Plot 4: Violin plot
sns.violinplot(data=iris, x='species', y='petal_width', ax=axes[3])
axes[3].set_title('Petal Width by Species')
plt.tight_layout()
plt.show()
Part 8: Best Practices & Common Mistakes (4 minutes)
Do’s
- Choose the right plot type for your data type
- Label everything (axes, title, legend)
- Use colors intentionally (not randomly)
- Keep it simple (don’t overdecorate)
- Test your visualization with different data
Don’ts
- Don’t use 3D plots unless the data truly needs a third dimension (they’re harder to read than 2D)
- Don’t mix too many colors without meaning
- Don’t forget axis labels
- Don’t use dual axes unless absolutely necessary
- Don’t start the y-axis at a value other than 0 (unless there’s a good reason)
Common Mistake: Misleading Scales
# BAD: Exaggerates difference
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [98, 99, 100])
ax.set_ylim(97, 101) # Zoomed in too much
# GOOD: Shows actual proportions
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [98, 99, 100])
ax.set_ylim(0, 100) # Full context
Common Mistake: Overusing Pie Charts
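# Example data (hypothetical shares, chosen to be close in size)
labels = ['A', 'B', 'C', 'D']
sizes = [30, 27, 23, 20]
# AVOID THIS
fig, ax = plt.subplots()
ax.pie(sizes, labels=labels)  # Hard to compare slices
# DO THIS INSTEAD
fig, ax = plt.subplots()
ax.bar(labels, sizes)  # Easy to compare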
Part 9: Resources & Next Steps (1 minute)
Learning Resources
- Matplotlib Documentation: https://matplotlib.org/stable/contents.html
- Seaborn Documentation: https://seaborn.pydata.org/
- Plotly Documentation: https://plotly.com/python/
- Real Python Tutorials: Search “matplotlib” on realpython.com
Practice Datasets
- Kaggle: Free datasets for any interest
- Seaborn built-ins: sns.load_dataset('name')
- UCI ML Repository: Classic datasets
Next Steps
- Exploratory Analysis: Use visualization to understand new datasets first
- Publication Quality: Learn Matplotlib fine-tuning for papers/reports
- Dashboards: Combine multiple plots with Plotly or Streamlit
- Specialized Plots: Geographic maps, networks, 3D (when appropriate)
Project Idea
Find a dataset you care about. Create 5 different visualizations answering questions about it:
- What’s the distribution?
- Are there relationships?
- How does it compare across categories?
- Are there trends over time?
- What are the outliers?
Summary: The Decision Tree
Need to explore data quickly? → Use Seaborn with Jupyter notebooks
Need fine control for publication? → Use Matplotlib with explicit figure/axes
Need interactive web visualization? → Use Plotly
Don’t know which plot type?
- Time series → Line plot
- Comparing categories → Bar plot
- Relationship between variables → Scatter plot
- Distribution shape → Histogram
- Distribution by category → Box/Violin plot
Appendix: Complete Working Example
A small project tying everything together:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Create sample dataset: Student performance
np.random.seed(42)
students = 150
data = {
'Study_Hours': np.random.uniform(0, 8, students),
'Sleep_Hours': np.random.uniform(4, 10, students),
'GPA': np.random.uniform(2.0, 4.0, students),
'Major': np.random.choice(['CS', 'Math', 'Physics'], students)
}
df = pd.DataFrame(data)
# Explore with visualization
sns.set_theme(style="whitegrid")
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Plot 1: Study hours vs GPA
sns.scatterplot(data=df, x='Study_Hours', y='GPA',
                hue='Major', s=100, ax=axes[0, 0])
axes[0, 0].set_title('Study Hours vs GPA by Major')
# Plot 2: GPA distribution
axes[0, 1].hist(df['GPA'], bins=20, color='steelblue', edgecolor='black')
axes[0, 1].set_title('GPA Distribution')
axes[0, 1].set_xlabel('GPA')
# Plot 3: Sleep by major
sns.boxplot(data=df, x='Major', y='Sleep_Hours', ax=axes[1, 0])
axes[1, 0].set_title('Sleep Hours by Major')
# Plot 4: Study vs Sleep
sns.scatterplot(data=df, x='Sleep_Hours', y='Study_Hours',
                hue='Major', s=100, ax=axes[1, 1])
axes[1, 1].set_title('Sleep Hours vs Study Hours')
plt.tight_layout()
plt.show()
# Summary statistics to check against what the plots show:
print(f"Average GPA: {df['GPA'].mean():.2f}")
print(f"Correlation (Study hrs, GPA): {df[['Study_Hours', 'GPA']].corr().iloc[0, 1]:.3f}")
Teaching Notes
Timing Breakdown
- Parts 1-2: 8 minutes (Why + Ecosystem)
- Part 3: 12 minutes (Matplotlib fundamentals)
- Part 4: 8 minutes (Plot types with examples)
- Part 5: 8 minutes (Seaborn intro)
- Part 6: 4 minutes (Plotly preview)
- Part 7: 15 minutes (Hands-on exercises)
- Part 8: 4 minutes (Best practices)
- Part 9: 1 minute (Resources)
Interactive Elements
- Live coding: Build each example in the lecture, explain as you go
- Pause points: After each plot type, ask students which they’d use for their data
- Exercises: Have students code along for Part 7
Common Questions to Anticipate
- “Why Matplotlib if it’s harder than Seaborn?” → Answer: Foundation, control, understanding
- “Can I use Plotly for everything?” → Answer: Yes, but overkill for exploration
- “How do I save plots?” → Answer: fig.savefig('name.png', dpi=300)
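If students ask for more detail on saving, a slightly fuller sketch may help. The output directory here is a temporary folder chosen only to keep the example self-contained, and `bbox_inches='tight'` is an optional extra not mentioned in the answer above.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
ax.set_title('Saved Plot')

# Hypothetical output location; a temp directory keeps the example self-contained
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, 'name.png')
pdf_path = os.path.join(out_dir, 'name.pdf')

# dpi controls raster resolution; bbox_inches='tight' trims extra whitespace
fig.savefig(png_path, dpi=300, bbox_inches='tight')

# The file extension selects the format; PDF output is vector-based
fig.savefig(pdf_path, bbox_inches='tight')

print(os.path.exists(png_path), os.path.exists(pdf_path))
```

Saving before calling plt.show() is the safer order in scripts, since the figure may be closed once its window is dismissed.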
Assessment Ideas
- Have students create a visualization from their own dataset
- Quiz: “Which plot type would you use for…” questions
- Mini-project: Explore a Kaggle dataset and present 3 visualizations