Data Visualization for Beginning Python Developers

1-Hour Introduction Lecture

Lecture Overview (1 minute)

Learning Objectives:

Understand why data visualization matters
Learn the three main Python visualization libraries
Create common plot types (line, bar, scatter, histogram)
Understand when to use different visualizations
Build your first visualization from scratch

What we’ll cover: Matplotlib basics → Plot types → Seaborn styling → Quick intro to interactive plots

Part 1: Why Data Visualization Matters (5 minutes)

The Core Problem

Raw data is hard to understand. Numbers in a table rarely reveal patterns at a glance.

Example: Compare these two:

A dataset with 1000 temperature readings as a CSV
The same data as a line graph showing seasonal patterns

The graph tells the story instantly.

Three Key Reasons to Visualize

Exploration: Discover patterns you didn’t expect
Communication: Show stakeholders what the data means
Verification: Spot errors or anomalies visually

Teaching Point

“Visualization is the bridge between raw numbers and human understanding. Before you build a model or write a report, visualize your data.”

Part 2: The Python Visualization Ecosystem (3 minutes)

The Three Main Libraries

Matplotlib (The Foundation)

The oldest, most foundational library
Low-level control, steeper learning curve
Seaborn and pandas’ built-in plotting both build on it
Use when: You need fine-grained control, publishing static images

Seaborn (The Statistician)

Built on top of Matplotlib
Beautiful defaults, statistical focus
Great for exploratory data analysis
Use when: Working with pandas DataFrames, want quick beautiful plots

Plotly (The Interactive)

Web-based, interactive visualizations
Good for dashboards and presentations
Plotly Express offers a concise, beginner-friendly API
Use when: You want hover details, zooming, web-based sharing

For This Course

We’ll focus on Matplotlib (the foundation) and Seaborn (the practical tool).

Part 3: Matplotlib Fundamentals (12 minutes)

The Figure-Axes Model

Matplotlib uses a hierarchy:

Figure: The entire window/image (think: canvas)
Axes: The actual plot area where data appears (think: drawing surface)
Artists: Everything you draw (lines, points, text)
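
To make the hierarchy concrete, the short sketch below builds one figure and inspects the objects Matplotlib creates. The sample values and variable names are illustrative only.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# plot() returns a list of Line2D artists, one per line drawn
(line,) = ax.plot([1, 2, 3], [1, 4, 9], label="squares")
title = ax.set_title("Hierarchy demo")

# The Figure keeps track of the Axes it contains
print(fig.axes)              # a list holding our single Axes
print(ax.figure is fig)      # True: each Axes knows its parent Figure

# The Axes keeps track of the artists drawn on it
print(line in ax.get_lines())  # True: the line belongs to this Axes
print(type(title).__name__)    # Text: titles and labels are artists too
```

Everything you customize later (labels, colors, tick marks) is a method call on one of these objects, which is why the notes keep a reference to both fig and ax.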

Basic Pattern

import matplotlib.pyplot as plt

# Create figure and axes
fig, ax = plt.subplots()

# Draw on axes
ax.plot([1, 2, 3], [1, 4, 9])

# Customize
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_title('My First Plot')

# Show
plt.show()

Teaching Point

“Always create your figure and axes explicitly. This pattern scales from simple plots to complex multi-panel figures.”

Working with Multiple Subplots

# Create 2x2 grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Flatten to 1D array for easier iteration
axes = axes.flatten()

# Plot on each
for i, ax in enumerate(axes):
    ax.plot([1, 2, 3], [i, i+1, i+2])
    ax.set_title(f'Plot {i+1}')

plt.tight_layout()  # Prevent overlap
plt.show()

Key Matplotlib Methods

# Line plot (time series, trends)
ax.plot(x, y, 'b-', linewidth=2, label='Series A')

# Scatter plot (relationships)
ax.scatter(x, y, s=100, alpha=0.6, color='red')

# Bar plot (categories)
ax.bar(categories, values, color='green')

# Histogram (distributions)
ax.hist(data, bins=20, edgecolor='black')

# Styling elements
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Title')
ax.legend()
ax.grid(True, alpha=0.3)

Part 4: When to Use Which Plot Type (8 minutes)

Line Plot

Use for: Time series, trends, continuous data
Example: Stock prices over time, temperature throughout the day

import pandas as pd
import matplotlib.pyplot as plt

# Sample data
dates = pd.date_range('2024-01-01', periods=30)
prices = [100 + i + (i % 5) for i in range(30)]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(dates, prices, linewidth=2, color='steelblue')
ax.set_xlabel('Date')
ax.set_ylabel('Price ($)')
ax.set_title('Stock Price Over Time')
ax.grid(True, alpha=0.3)
plt.show()

Teaching Point: “Line plots assume order matters. Use them when your x-axis has natural progression.”
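
One practical consequence: Matplotlib connects points in the order they appear in the data, not in x order. The sketch below uses made-up temperature readings (hypothetical column names hour and temp_c) to compare an unsorted and a sorted line.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical readings that were recorded out of order
readings = pd.DataFrame({
    "hour": [15, 9, 12, 6, 18],
    "temp_c": [21.0, 14.5, 19.0, 11.0, 17.5],
})

# Sort by the x variable so the line moves left to right
ordered = readings.sort_values("hour")

fig, (ax_raw, ax_sorted) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
ax_raw.plot(readings["hour"], readings["temp_c"], marker="o")
ax_raw.set_title("Unsorted: line zigzags")
ax_sorted.plot(ordered["hour"], ordered["temp_c"], marker="o")
ax_sorted.set_title("Sorted by hour")
for ax in (ax_raw, ax_sorted):
    ax.set_xlabel("Hour of day")
ax_raw.set_ylabel("Temperature (°C)")
fig.tight_layout()
# Call plt.show() to display the figure when running as a script
```

If a line plot looks scribbled, unsorted x values are a likely cause.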

Bar Plot

Use for: Comparing categorical values, rankings, counts
Example: Sales by region, programming language popularity

fig, ax = plt.subplots(figsize=(10, 6))

languages = ['Python', 'JavaScript', 'Java', 'C++', 'Go']
popularity = [85, 72, 65, 45, 38]
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

ax.bar(languages, popularity, color=colors)
ax.set_ylabel('Popularity Score')
ax.set_title('Programming Language Popularity 2024')
ax.set_ylim(0, 100)

# Add value labels on bars
for i, v in enumerate(popularity):
    ax.text(i, v + 2, str(v), ha='center', fontweight='bold')

plt.show()

Teaching Point: “Bars are easier to compare than scattered points. Order them by value for clarity.”
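
Ordering by value takes one extra step before plotting. The sketch below reuses the illustrative popularity scores from the example above, shuffled on purpose, and sorts them from largest to smallest.

```python
import matplotlib.pyplot as plt

# Same illustrative scores as the bar plot example, deliberately out of order
scores = {"Java": 65, "Go": 38, "Python": 85, "C++": 45, "JavaScript": 72}

# Sort (label, value) pairs from largest to smallest value
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
labels = [name for name, _ in ranked]
values = [value for _, value in ranked]

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(labels, values, color="steelblue")
ax.set_ylabel("Popularity Score")
ax.set_title("Languages Ranked by Popularity Score")
ax.set_ylim(0, 100)
# Call plt.show() to display the figure when running as a script
```

Sorting suits rankings. When the categories have their own natural order (months, age groups), keep that order instead.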

Scatter Plot

Use for: Relationships between two variables, outliers, clusters
Example: House price vs. size, student study hours vs. test scores

import numpy as np

fig, ax = plt.subplots(figsize=(10, 6))

# Generate correlated data
np.random.seed(42)
hours_studied = np.random.uniform(0, 10, 100)
test_scores = hours_studied * 8 + np.random.normal(0, 5, 100)
test_scores = np.clip(test_scores, 0, 100)

ax.scatter(hours_studied, test_scores, alpha=0.6, s=100, color='steelblue')
ax.set_xlabel('Hours Studied')
ax.set_ylabel('Test Score')
ax.set_title('Study Hours vs Test Performance')
ax.grid(True, alpha=0.3)

# Add trend line (least-squares straight-line fit)
slope, intercept = np.polyfit(hours_studied, test_scores, 1)
x_line = np.array([hours_studied.min(), hours_studied.max()])
ax.plot(x_line, slope * x_line + intercept, "r--", linewidth=2, label='Trend')
ax.legend()

plt.show()

Teaching Point: “Scatter plots reveal relationships, but noise can obscure the overall trend. Add a trend line to clarify the pattern.”

Histogram

Use for: Distribution shape, frequency, data spread
Example: Customer age distribution, test score grades

fig, ax = plt.subplots(figsize=(10, 6))

# Generate sample data (normally distributed)
data = np.random.normal(loc=70, scale=15, size=1000)

ax.hist(data, bins=30, color='steelblue', edgecolor='black', alpha=0.7)
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of Test Scores')
ax.axvline(data.mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {data.mean():.1f}')
ax.axvline(np.median(data), color='green', linestyle='--', linewidth=2, label=f'Median: {np.median(data):.1f}')
ax.legend()

plt.show()

Teaching Point: “Histograms show the shape of data. Watch for skew, bimodality, or outliers.”
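
How much shape you see depends on the bins argument. The sketch below is an illustration on synthetic data (two overlapping groups, generated with NumPy's default_rng and an arbitrary seed): with very few bins the two peaks tend to merge into one hump, while a finer binning separates them.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Synthetic bimodal sample: two groups with different centers
scores = np.concatenate([
    rng.normal(loc=60, scale=6, size=500),
    rng.normal(loc=80, scale=6, size=500),
])

# Same data, two bin counts, side by side
fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharex=True)
for ax, bins in zip(axes, (3, 30)):
    ax.hist(scores, bins=bins, color="steelblue", edgecolor="black")
    ax.set_title(f"bins={bins}")
    ax.set_xlabel("Score")
axes[0].set_ylabel("Frequency")
fig.tight_layout()
# Call plt.show() to display the figure when running as a script
```

Trying two or three bin counts before drawing conclusions is a cheap way to avoid missing skew or a second peak.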

Part 5: Introduction to Seaborn (8 minutes)

Why Seaborn?

Seaborn is a high-level library built on Matplotlib, with polished defaults and concise functions for statistical visualization.

Basic Philosophy

“Seaborn is for exploratory analysis. Matplotlib is when you need full control.”
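
Because Seaborn draws onto ordinary Matplotlib Axes, the two can be mixed in one figure. The sketch below uses a small made-up DataFrame (hypothetical columns hours, score and group) so it needs no download.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up dataset for illustration
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 65, 70, 78, 85],
    "group": ["A", "B", "A", "B", "A", "B"],
})

fig, ax = plt.subplots(figsize=(8, 5))

# Seaborn draws onto the Axes we pass in and returns that same Axes
returned = sns.scatterplot(data=df, x="hours", y="score", hue="group", ax=ax)
print(returned is ax)  # True

# From here on, ordinary Matplotlib methods apply
ax.set_title("Seaborn plot, Matplotlib customization")
ax.set_xlabel("Hours studied")
ax.axhline(df["score"].mean(), color="gray", linestyle="--")
# Call plt.show() to display the figure when running as a script
```

A common workflow is to let Seaborn produce the plot and then use Matplotlib calls only for the final touches.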

Common Seaborn Plots

import matplotlib.pyplot as plt
import seaborn as sns

# Load sample data
iris = sns.load_dataset('iris')  # Built-in dataset

# Set style
sns.set_theme(style="darkgrid")

# Scatter with hue (color by category)
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', 
                hue='species', s=100, ax=ax)
ax.set_title('Iris Sepal Measurements')
plt.show()

# Line plot: repeated x values are averaged and drawn with a confidence band
fig, ax = plt.subplots(figsize=(10, 6))
sns.lineplot(data=iris, x='sepal_length', y='petal_length', 
             hue='species', ax=ax)
ax.set_title('Sepal vs Petal Length by Species')
plt.show()

# Box plot (distribution by category)
fig, ax = plt.subplots(figsize=(10, 6))
sns.boxplot(data=iris, x='species', y='sepal_length', ax=ax)
ax.set_title('Sepal Length Distribution by Species')
plt.show()

The `hue` Parameter

One of Seaborn’s superpowers: color data points by category without extra code.

# Without hue: you see relationships
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width')

# With hue: you see relationships BY GROUP
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', 
                hue='species')

Styling

# Set overall theme
sns.set_theme(style="whitegrid")  # or "dark", "white", "darkgrid"

# Set palette (colors)
sns.set_palette("husl")  # or "Set2", "coolwarm", "rocket"

# Create plot with custom styling
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', 
                hue='species', palette='Set2', s=150, ax=ax)

Part 6: Quick Interactive Preview - Plotly (4 minutes)

When to Use Plotly

Interactive visualizations for dashboards, web apps, and presentations.

Basic Example

import plotly.express as px

iris = px.data.iris()

# Interactive scatter plot
fig = px.scatter(iris, x='sepal_width', y='sepal_length', 
                 color='species', hover_data=['petal_length'],
                 title='Interactive Iris Explorer')
fig.show()

# Interactive line plot
import pandas as pd
import numpy as np

dates = pd.date_range('2024-01-01', periods=30)
values = np.cumsum(np.random.randn(30))
df = pd.DataFrame({'date': dates, 'value': values})

fig = px.line(df, x='date', y='value', 
              title='Interactive Time Series',
              hover_data={'date': '|%B %d, %Y'})
fig.show()

Why It’s Different

Hover for exact values
Zoom and pan
Click legend to show/hide
Export as PNG
Embed in web pages

Teaching Point

“Plotly is great for final presentations. Use Matplotlib/Seaborn for exploration.”

Part 7: Hands-On Workshop (15 minutes)

Exercise 1: Your First Visualization (5 min)

Task: Create a line plot of monthly website traffic

import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
visitors = [5000, 6200, 5800, 7500, 8200, 9100]

# TODO: Create figure and axes
# TODO: Plot the data
# TODO: Add labels and title
# TODO: Show the plot

Solution:

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(months, visitors, marker='o', linewidth=2, color='steelblue')
ax.set_xlabel('Month')
ax.set_ylabel('Visitors')
ax.set_title('Monthly Website Traffic')
ax.grid(True, alpha=0.3)
plt.show()

Exercise 2: Bar Plot Comparison (5 min)

Task: Compare revenue across three product lines

import matplotlib.pyplot as plt

products = ['Product A', 'Product B', 'Product C']
revenue = [450000, 280000, 395000]

# TODO: Create bar plot
# TODO: Add value labels on bars
# TODO: Format y-axis as currency

Solution:

fig, ax = plt.subplots(figsize=(8, 6))
bars = ax.bar(products, revenue, color=['#1f77b4', '#ff7f0e', '#2ca02c'])
ax.set_ylabel('Revenue ($)')
ax.set_title('Revenue by Product Line')

# Add value labels
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'${height/1000:.0f}K',
            ha='center', va='bottom', fontweight='bold')

# Format y-axis
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))

plt.show()

Exercise 3: Multi-Panel Exploration (5 min)

Task: Create a 2x2 grid exploring a dataset

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset('iris')
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()

# TODO: Create 4 different plots on each subplot
# Hint: scatter, box, histogram, and one more

Solution:

iris = sns.load_dataset('iris')
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()  # 2x2 array -> 1D, so axes[0] to axes[3] are single Axes

# Plot 1: Scatter
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', 
                hue='species', ax=axes[0])
axes[0].set_title('Sepal Dimensions')

# Plot 2: Box plot
sns.boxplot(data=iris, x='species', y='sepal_length', ax=axes[1])
axes[1].set_title('Sepal Length Distribution')

# Plot 3: Histogram
axes[2].hist(iris['petal_length'], bins=20, color='steelblue', edgecolor='black')
axes[2].set_title('Petal Length Distribution')
axes[2].set_xlabel('Petal Length')

# Plot 4: Violin plot
sns.violinplot(data=iris, x='species', y='petal_width', ax=axes[3])
axes[3].set_title('Petal Width by Species')

plt.tight_layout()
plt.show()

Part 8: Best Practices & Common Mistakes (4 minutes)

Do’s

Choose the right plot type for your data type
Label everything (axes, title, legend)
Use colors intentionally (not randomly)
Keep it simple (don’t over-decorate)
Test your visualization with different data

Don’ts

Don’t use 3D plots when a 2D plot will do (3D is harder to read)
Don’t mix too many colors without meaning
Don’t forget axis labels
Don’t use dual axes unless absolutely necessary
Don’t start the y-axis at a value other than 0 without a good reason (bar charts in particular should start at 0)

Common Mistake: Misleading Scales

# BAD: Exaggerates difference
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [98, 99, 100])
ax.set_ylim(97, 101)  # Zoomed in too much

# GOOD: Shows actual proportions
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [98, 99, 100])
ax.set_ylim(0, 100)  # Full context

Common Mistake: Overusing Pie Charts

# Sample data (illustrative values)
labels = ['A', 'B', 'C', 'D']
sizes = [28, 26, 24, 22]

# AVOID THIS
fig, ax = plt.subplots()
ax.pie(sizes, labels=labels)  # Hard to compare similar slices

# DO THIS INSTEAD
fig, ax = plt.subplots()
ax.bar(labels, sizes)  # Easy to compare

Part 9: Resources & Next Steps (1 minute)

Learning Resources

Matplotlib Documentation: https://matplotlib.org/stable/contents.html
Seaborn Documentation: https://seaborn.pydata.org/
Plotly Documentation: https://plotly.com/python/
Real Python Tutorials: Search “matplotlib” on realpython.com

Practice Datasets

Kaggle: Free datasets for any interest
Seaborn built-ins: sns.load_dataset('name')
UCI ML Repository: Classic datasets

Next Steps

Exploratory Analysis: Use visualization to understand new datasets first
Publication Quality: Learn Matplotlib fine-tuning for papers/reports
Dashboards: Combine multiple plots with Plotly or Streamlit
Specialized Plots: Geographic maps, networks, 3D (when appropriate)

Project Idea

Find a dataset you care about. Create 5 different visualizations answering questions about it:

What’s the distribution?
Are there relationships?
How does it compare across categories?
Are there trends over time?
What are outliers?

Summary: The Decision Tree

Need to explore data quickly? → Use Seaborn with Jupyter notebooks

Need fine control for publication? → Use Matplotlib with explicit figure/axes

Need interactive web visualization? → Use Plotly

Don’t know which plot type?

Time series → Line plot
Comparing categories → Bar plot
Relationship between variables → Scatter plot
Distribution shape → Histogram
Distribution by category → Box/Violin plot

Appendix: Complete Working Example

A small project tying everything together:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Create sample dataset: Student performance
np.random.seed(42)
students = 150
data = {
    'Study_Hours': np.random.uniform(0, 8, students),
    'Sleep_Hours': np.random.uniform(4, 10, students),
    'GPA': np.random.uniform(2.0, 4.0, students),
    'Major': np.random.choice(['CS', 'Math', 'Physics'], students)
}
df = pd.DataFrame(data)

# Explore with visualization
sns.set_theme(style="whitegrid")
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Study hours vs GPA
sns.scatterplot(data=df, x='Study_Hours', y='GPA', 
                hue='Major', s=100, ax=axes[0, 0])
axes[0, 0].set_title('Study Hours vs GPA by Major')

# Plot 2: GPA distribution
axes[0, 1].hist(df['GPA'], bins=20, color='steelblue', edgecolor='black')
axes[0, 1].set_title('GPA Distribution')
axes[0, 1].set_xlabel('GPA')

# Plot 3: Sleep by major
sns.boxplot(data=df, x='Major', y='Sleep_Hours', ax=axes[1, 0])
axes[1, 0].set_title('Sleep Hours by Major')

# Plot 4: Study vs Sleep
sns.scatterplot(data=df, x='Sleep_Hours', y='Study_Hours', 
                hue='Major', s=100, ax=axes[1, 1])
axes[1, 1].set_title('Sleep Hours vs Study Hours')

plt.tight_layout()
plt.show()

# Key insights from visualization:
print(f"Average GPA: {df['GPA'].mean():.2f}")
print(f"Correlation (Study hrs, GPA): {df[['Study_Hours', 'GPA']].corr().iloc[0, 1]:.3f}")

Teaching Notes

Timing Breakdown

Part 1-2: 8 minutes (Why + Ecosystem)
Part 3: 12 minutes (Matplotlib fundamentals)
Part 4: 8 minutes (Plot types with examples)
Part 5: 8 minutes (Seaborn intro)
Part 6: 4 minutes (Plotly preview)
Part 7: 15 minutes (Hands-on exercises)
Part 8: 4 minutes (Best practices)
Part 9: 1 minute (Resources)

Interactive Elements

Live coding: Build each example in the lecture, explain as you go
Pause points: After each plot type, ask students which they’d use for their data
Exercises: Have students code along for Part 7

Common Questions to Anticipate

“Why Matplotlib if it’s harder than Seaborn?” → Answer: Foundation, control, understanding
“Can I use Plotly for everything?” → Answer: Yes, but it is often overkill for quick exploration
“How do I save plots?” → Answer: fig.savefig('name.png', dpi=300)
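
A slightly fuller answer to the saving question, as a minimal sketch. The filenames are placeholders and the data is the first three months from Exercise 1.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(["Jan", "Feb", "Mar"], [5000, 6200, 5800], marker="o")
ax.set_title("Monthly Website Traffic")

# Raster output: dpi sets the resolution, bbox_inches="tight" trims extra margins
fig.savefig("traffic.png", dpi=300, bbox_inches="tight")

# Vector output: stays sharp at any zoom level, useful for papers and reports
fig.savefig("traffic.pdf", bbox_inches="tight")

# Close the figure to free memory when generating many plots in a loop
plt.close(fig)
```

Matplotlib picks the output format from the file extension, so switching between PNG, PDF and SVG only requires changing the filename.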

Assessment Ideas

Have students create a visualization from their own dataset
Quiz: “Which plot type would you use for…” questions
Mini-project: Explore a Kaggle dataset and present 3 visualizations
