Data Visualization for Beginning Python Developers

1-Hour Introduction Lecture


Lecture Overview (1 minute)

Learning Objectives:

What we’ll cover: Matplotlib basics → Plot types → Seaborn styling → Quick intro to interactive plots


Part 1: Why Data Visualization Matters (5 minutes)

The Core Problem

Raw data is hard to understand. Numbers in tables don’t reveal patterns.

Example: Compare these two:

The graph tells the story instantly.
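A minimal sketch of that comparison, assuming made-up monthly sales figures (the numbers and variable names are illustrative, not lecture data). The same values are printed as a table and drawn as a line:

```python
import matplotlib.pyplot as plt

# Illustrative values only: twelve months of made-up sales figures
months = list(range(1, 13))
sales = [120, 125, 123, 130, 128, 135, 90, 138, 142, 140, 147, 150]

# As a table: the dip is easy to miss when scanning the numbers
for month, value in zip(months, sales):
    print(f"{month:>2}  {value}")

# As a graph: the dip stands out immediately
fig, ax = plt.subplots()
ax.plot(months, sales, marker='o')
ax.set_xlabel('Month')
ax.set_ylabel('Sales')
ax.set_title('Monthly Sales')

# The lowest month, confirmed numerically
lowest_month = months[sales.index(min(sales))]
print(f"Lowest month: {lowest_month}")
# In a script, call plt.show() here to display the figure
```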

Three Key Reasons to Visualize

  1. Exploration: Discover patterns you didn’t expect
  2. Communication: Show stakeholders what the data means
  3. Verification: Spot errors or anomalies visually

Teaching Point

“Visualization is the bridge between raw numbers and human understanding. Before you build a model or write a report, visualize your data.”


Part 2: The Python Visualization Ecosystem (3 minutes)

The Three Main Libraries

Matplotlib (The Foundation)

Seaborn (The Statistician)

Plotly (The Interactive)

For This Course

We’ll focus on Matplotlib (the foundation) and Seaborn (the practical tool).


Part 3: Matplotlib Fundamentals (12 minutes)

The Figure-Axes Model

Matplotlib uses a hierarchy:

Basic Pattern

import matplotlib.pyplot as plt

# Create figure and axes
fig, ax = plt.subplots()

# Draw on axes
ax.plot([1, 2, 3], [1, 4, 9])

# Customize
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_title('My First Plot')

# Show
plt.show()

Teaching Point

“Always create your figure and axes explicitly. This pattern scales from simple plots to complex multi-panel figures.”
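The relationship behind this pattern can be inspected directly. The sketch below uses only Matplotlib; the variable name line is illustrative.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3], [1, 4, 9])

# The Figure is the top-level container and keeps a list of its Axes
print(fig.axes == [ax])       # True

# Each Axes knows which Figure it belongs to
print(ax.figure is fig)       # True

# Whatever is drawn on the Axes is stored on it as an artist
print(ax.lines[0] is line)    # True

# Each Axes owns an x-axis and a y-axis object (ticks, labels, limits)
print(type(ax.xaxis).__name__, type(ax.yaxis).__name__)  # XAxis YAxis
```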

Working with Multiple Subplots

# Create 2x2 grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Flatten to 1D array for easier iteration
axes = axes.flatten()

# Plot on each
for i, ax in enumerate(axes):
    ax.plot([1, 2, 3], [i, i+1, i+2])
    ax.set_title(f'Plot {i+1}')

plt.tight_layout()  # Prevent overlap
plt.show()
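Why flatten? A short sketch of what plt.subplots returns for a grid, assuming default arguments:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)

# For a 2x2 grid, `axes` is a 2D NumPy array of Axes objects
print(axes.shape)            # (2, 2)

# Option 1: index by (row, column)
top_right = axes[0, 1]

# Option 2: flatten to 1D and index by position (row-major order)
flat = axes.flatten()
print(flat.shape)            # (4,)
print(flat[1] is top_right)  # True

# Note: axes[0] on the unflattened array is a whole row, not a single Axes
print(axes[0].shape)         # (2,)
```

Either indexing style works; the point is to pick one and not call plotting methods on a row of the 2D array by accident.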

Key Matplotlib Methods

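# Reference snippets: assume fig, ax = plt.subplots() has been called and that
# x, y, categories, values, and data are already defined

# Line plot (time series, trends)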
ax.plot(x, y, 'b-', linewidth=2, label='Series A')

# Scatter plot (relationships)
ax.scatter(x, y, s=100, alpha=0.6, color='red')

# Bar plot (categories)
ax.bar(categories, values, color='green')

# Histogram (distributions)
ax.hist(data, bins=20, edgecolor='black')

# Styling elements
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Title')
ax.legend()
ax.grid(True, alpha=0.3)

Part 4: When to Use Which Plot Type (8 minutes)

Line Plot

Use for: Time series, trends, continuous data
Example: Stock prices over time, temperature throughout the day

import pandas as pd
import matplotlib.pyplot as plt

# Sample data
dates = pd.date_range('2024-01-01', periods=30)
prices = [100 + i + (i % 5) for i in range(30)]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(dates, prices, linewidth=2, color='steelblue')
ax.set_xlabel('Date')
ax.set_ylabel('Price ($)')
ax.set_title('Stock Price Over Time')
ax.grid(True, alpha=0.3)
plt.show()

Teaching Point: “Line plots assume order matters. Use them when your x-axis has natural progression.”
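Because a line plot connects points in the order they are given, unsorted x values produce a zigzag. A minimal sketch, assuming made-up temperature readings logged out of order (values and names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative readings recorded out of order: hour of day and temperature
hours = np.array([14, 6, 18, 10, 22, 2])
temps = np.array([21.5, 12.0, 19.0, 17.5, 14.0, 11.0])

# Sort both arrays by the x values before plotting
order = np.argsort(hours)
hours_sorted = hours[order]
temps_sorted = temps[order]

fig, (ax_raw, ax_sorted) = plt.subplots(1, 2, figsize=(10, 4))
ax_raw.plot(hours, temps, marker='o')
ax_raw.set_title('Unsorted x: zigzag')
ax_sorted.plot(hours_sorted, temps_sorted, marker='o')
ax_sorted.set_title('Sorted x: readable trend')
for ax in (ax_raw, ax_sorted):
    ax.set_xlabel('Hour of day')
    ax.set_ylabel('Temperature (°C)')
plt.tight_layout()
# In a script, call plt.show() here to display the figure
```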

Bar Plot

Use for: Comparing categorical values, rankings, counts
Example: Sales by region, programming language popularity

fig, ax = plt.subplots(figsize=(10, 6))

languages = ['Python', 'JavaScript', 'Java', 'C++', 'Go']
popularity = [85, 72, 65, 45, 38]
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

ax.bar(languages, popularity, color=colors)
ax.set_ylabel('Popularity Score')
ax.set_title('Programming Language Popularity 2024')
ax.set_ylim(0, 100)

# Add value labels on bars
for i, v in enumerate(popularity):
    ax.text(i, v + 2, str(v), ha='center', fontweight='bold')

plt.show()

Teaching Point: “Bars are easier to compare than scattered points. Order them by value for clarity.”
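Sorting before plotting takes one line. A sketch, assuming made-up regional sales figures (region names and values are illustrative):

```python
import matplotlib.pyplot as plt

# Illustrative data: sales by region, in arbitrary order
sales = {'North': 310, 'South': 450, 'East': 275, 'West': 390}

# Sort (region, value) pairs from largest to smallest value
ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)
regions = [region for region, _ in ranked]
values = [value for _, value in ranked]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(regions, values, color='steelblue')
ax.bar_label(bars)  # value labels; available in Matplotlib 3.4+
ax.set_ylabel('Sales')
ax.set_title('Sales by Region (sorted)')
# In a script, call plt.show() here to display the figure
```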

Scatter Plot

Use for: Relationships between two variables, outliers, clusters
Example: House price vs. size, student study hours vs. test scores

import numpy as np

fig, ax = plt.subplots(figsize=(10, 6))

# Generate correlated data
np.random.seed(42)
hours_studied = np.random.uniform(0, 10, 100)
test_scores = hours_studied * 8 + np.random.normal(0, 5, 100)
test_scores = np.clip(test_scores, 0, 100)

ax.scatter(hours_studied, test_scores, alpha=0.6, s=100, color='steelblue')
ax.set_xlabel('Hours Studied')
ax.set_ylabel('Test Score')
ax.set_title('Study Hours vs Test Performance')
ax.grid(True, alpha=0.3)

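# Add trend line (degree-1 least-squares fit), drawn over evenly spaced,
# sorted x values so the dashed line renders cleanly
slope, intercept = np.polyfit(hours_studied, test_scores, 1)
x_line = np.linspace(hours_studied.min(), hours_studied.max(), 100)
ax.plot(x_line, slope * x_line + intercept, "r--", linewidth=2, label='Trend')
ax.legend()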

plt.show()

Teaching Point: “Scatter plots reveal relationships, but noisy points can obscure the overall trend. Add a trend line to clarify the pattern.”
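The trend can also be quantified. A sketch using synthetic data similar to the study-hours example (the generator, seed, and variable names are assumptions for illustration): np.polyfit with degree 1 returns the slope and intercept, and np.corrcoef gives the Pearson correlation.

```python
import numpy as np

# Synthetic data: score rises about 8 points per hour, plus noise
rng = np.random.default_rng(42)
hours = rng.uniform(0, 10, 100)
scores = hours * 8 + rng.normal(0, 5, 100)

# Degree-1 least-squares fit: coefficients come back highest power first
slope, intercept = np.polyfit(hours, scores, 1)

# Pearson correlation coefficient between the two variables
r = np.corrcoef(hours, scores)[0, 1]

print(f"slope: {slope:.2f} points per hour")
print(f"intercept: {intercept:.2f}")
print(f"correlation: {r:.2f}")
```

The fitted slope should land close to the 8 points per hour built into the synthetic data, which is a useful sanity check on the trend line.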

Histogram

Use for: Distribution shape, frequency, data spread
Example: Customer age distribution, test score grades

fig, ax = plt.subplots(figsize=(10, 6))

# Generate sample data (normally distributed)
data = np.random.normal(loc=70, scale=15, size=1000)

ax.hist(data, bins=30, color='steelblue', edgecolor='black', alpha=0.7)
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of Test Scores')
ax.axvline(data.mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {data.mean():.1f}')
ax.axvline(np.median(data), color='green', linestyle='--', linewidth=2, label=f'Median: {np.median(data):.1f}')
ax.legend()

plt.show()

Teaching Point: “Histograms show the shape of data. Watch for skew, bimodality, or outliers.”
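One quick check for skew is to compare the mean with the median. A sketch, assuming synthetic right-skewed data (exponentially distributed wait times; the scenario and names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic right-skewed data: many small values, a long tail of large ones
rng = np.random.default_rng(0)
wait_times = rng.exponential(scale=10, size=1000)

mean = wait_times.mean()
median = np.median(wait_times)

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(wait_times, bins=30, color='steelblue', edgecolor='black', alpha=0.7)
ax.axvline(mean, color='red', linestyle='--', label=f'Mean: {mean:.1f}')
ax.axvline(median, color='green', linestyle='--', label=f'Median: {median:.1f}')
ax.set_xlabel('Wait time (minutes)')
ax.set_ylabel('Frequency')
ax.legend()

# In a right-skewed distribution the long tail pulls the mean above the median
print(mean > median)
# In a script, call plt.show() here to display the figure
```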


Part 5: Introduction to Seaborn (8 minutes)

Why Seaborn?

Seaborn is a statistical visualization library built on top of Matplotlib. It offers better defaults and shorter code, and it works directly with pandas DataFrames.

Basic Philosophy

“Seaborn is for exploratory analysis. Matplotlib is when you need full control.”

Common Seaborn Plots

import matplotlib.pyplot as plt
import seaborn as sns

# Load sample data
iris = sns.load_dataset('iris')  # Example dataset; downloaded on first use, then cached

# Set style
sns.set_theme(style="darkgrid")

# Scatter with hue (color by category)
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', 
                hue='species', s=100, ax=ax)
ax.set_title('Iris Sepal Measurements')
plt.show()

# Line plot with confidence interval (repeated x values are averaged;
# the shaded band is a 95% confidence interval by default)
fig, ax = plt.subplots(figsize=(10, 6))
sns.lineplot(data=iris, x='sepal_length', y='petal_length', 
             hue='species', ax=ax)
ax.set_title('Sepal vs Petal Length by Species')
plt.show()

# Box plot (distribution by category)
fig, ax = plt.subplots(figsize=(10, 6))
sns.boxplot(data=iris, x='species', y='sepal_length', ax=ax)
ax.set_title('Sepal Length Distribution by Species')
plt.show()

The hue Parameter

One of Seaborn’s superpowers: color data points by category without extra code.

# Without hue: you see relationships
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width')

# With hue: you see relationships BY GROUP
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', 
                hue='species')
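For comparison, here is roughly what hue saves you from writing in plain Matplotlib: one scatter call per group. The tiny DataFrame below is a hand-made stand-in with illustrative values, not the full iris dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hand-made stand-in data; column names mirror the iris example
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 6.4, 6.9, 6.3, 7.1],
    'sepal_width':  [3.5, 3.0, 3.2, 3.1, 3.3, 3.0],
    'species': ['setosa', 'setosa', 'versicolor',
                'versicolor', 'virginica', 'virginica'],
})

fig, ax = plt.subplots()

# One scatter call per category; Matplotlib cycles colors automatically
for species, group in df.groupby('species'):
    ax.scatter(group['sepal_length'], group['sepal_width'], label=species)

ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')
ax.legend(title='species')

handles, labels = ax.get_legend_handles_labels()
print(labels)
# In a script, call plt.show() here to display the figure
```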

Styling

# Set overall theme
sns.set_theme(style="whitegrid")  # or "dark", "white", "darkgrid"

# Set palette (colors)
sns.set_palette("husl")  # or "Set2", "coolwarm", "rocket"

# Create plot with custom styling
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', 
                hue='species', palette='Set2', s=150, ax=ax)

Part 6: Quick Interactive Preview - Plotly (4 minutes)

When to Use Plotly

Interactive visualizations for dashboards, web apps, and presentations.

Basic Example

import plotly.express as px

iris = px.data.iris()

# Interactive scatter plot
fig = px.scatter(iris, x='sepal_width', y='sepal_length', 
                 color='species', hover_data=['petal_length'],
                 title='Interactive Iris Explorer')
fig.show()

# Interactive line plot
import pandas as pd
import numpy as np

dates = pd.date_range('2024-01-01', periods=30)
values = np.cumsum(np.random.randn(30))
df = pd.DataFrame({'date': dates, 'value': values})

fig = px.line(df, x='date', y='value', 
              title='Interactive Time Series',
              hover_data={'date': '|%B %d, %Y'})
fig.show()

Why It’s Different

Teaching Point

“Plotly is great for final presentations. Use Matplotlib/Seaborn for exploration.”


Part 7: Hands-On Workshop (15 minutes)

Exercise 1: Your First Visualization (5 min)

Task: Create a line plot of monthly website traffic

import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
visitors = [5000, 6200, 5800, 7500, 8200, 9100]

# TODO: Create figure and axes
# TODO: Plot the data
# TODO: Add labels and title
# TODO: Show the plot

Solution:

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(months, visitors, marker='o', linewidth=2, color='steelblue')
ax.set_xlabel('Month')
ax.set_ylabel('Visitors')
ax.set_title('Monthly Website Traffic')
ax.grid(True, alpha=0.3)
plt.show()

Exercise 2: Bar Plot Comparison (5 min)

Task: Compare revenue across three product lines

import matplotlib.pyplot as plt

products = ['Product A', 'Product B', 'Product C']
revenue = [450000, 280000, 395000]

# TODO: Create bar plot
# TODO: Add value labels on bars
# TODO: Format y-axis as currency

Solution:

fig, ax = plt.subplots(figsize=(8, 6))
bars = ax.bar(products, revenue, color=['#1f77b4', '#ff7f0e', '#2ca02c'])
ax.set_ylabel('Revenue ($)')
ax.set_title('Revenue by Product Line')

# Add value labels
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'${height/1000:.0f}K',
            ha='center', va='bottom', fontweight='bold')

# Format y-axis
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x/1000:.0f}K'))

plt.show()

Exercise 3: Multi-Panel Exploration (5 min)

Task: Create a 2x2 grid exploring a dataset

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset('iris')
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()

# TODO: Create 4 different plots, one per subplot
# Hint: scatter, box, histogram, and one more

Solution:

iris = sns.load_dataset('iris')
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()  # 2x2 array -> 1D, so axes[0]..axes[3] are single Axes

# Plot 1: Scatter
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', 
                hue='species', ax=axes[0])
axes[0].set_title('Sepal Dimensions')

# Plot 2: Box plot
sns.boxplot(data=iris, x='species', y='sepal_length', ax=axes[1])
axes[1].set_title('Sepal Length Distribution')

# Plot 3: Histogram
axes[2].hist(iris['petal_length'], bins=20, color='steelblue', edgecolor='black')
axes[2].set_title('Petal Length Distribution')
axes[2].set_xlabel('Petal Length')

# Plot 4: Violin plot
sns.violinplot(data=iris, x='species', y='petal_width', ax=axes[3])
axes[3].set_title('Petal Width by Species')

plt.tight_layout()
plt.show()

Part 8: Best Practices & Common Mistakes (4 minutes)

Do’s

Don’ts

Common Mistake: Misleading Scales

# BAD: Exaggerates difference
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [98, 99, 100])
ax.set_ylim(97, 101)  # Zoomed in too much

# GOOD: Shows actual proportions
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [98, 99, 100])
ax.set_ylim(0, 100)  # Full context
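The effect of a truncated axis is easiest to see with bars, where the drawn height is read as the value. A sketch, assuming two made-up values and a hypothetical helper, visual_ratio, that computes how tall one bar looks relative to another for a given baseline:

```python
import matplotlib.pyplot as plt

def visual_ratio(a, b, baseline=0):
    """Ratio of drawn bar heights when the y-axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)

values = [98, 100]  # illustrative values

# True ratio: the first value is 98% of the second
print(visual_ratio(98, 100))               # 0.98
# With the axis starting at 97, the first bar looks one third as tall
print(visual_ratio(98, 100, baseline=97))

fig, (ax_trunc, ax_full) = plt.subplots(1, 2, figsize=(10, 4))
ax_trunc.bar(['A', 'B'], values)
ax_trunc.set_ylim(97, 101)
ax_trunc.set_title('Axis starts at 97')
ax_full.bar(['A', 'B'], values)
ax_full.set_ylim(0, 105)
ax_full.set_title('Axis starts at 0')
plt.tight_layout()
# In a script, call plt.show() here to display the figure
```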

Common Mistake: Overusing Pie Charts

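# Illustrative data (made-up values)
labels = ['A', 'B', 'C', 'D']
sizes = [28, 26, 24, 22]

# AVOID THIS
fig, ax = plt.subplots()
ax.pie(sizes, labels=labels)  # Similar slices are hard to compare

# DO THIS INSTEAD
fig, ax = plt.subplots()
ax.bar(labels, sizes)  # Bar heights are easy to compare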

Part 9: Resources & Next Steps (1 minute)

Learning Resources

Practice Datasets

Next Steps

  1. Exploratory Analysis: Use visualization to understand new datasets first
  2. Publication Quality: Learn Matplotlib fine-tuning for papers/reports
  3. Dashboards: Combine multiple plots with Plotly or Streamlit
  4. Specialized Plots: Geographic maps, networks, 3D (when appropriate)

Project Idea

Find a dataset you care about. Create 5 different visualizations answering questions about it:


Summary: The Decision Tree

Need to explore data quickly? → Use Seaborn with Jupyter notebooks

Need fine control for publication? → Use Matplotlib with explicit figure/axes

Need interactive web visualization? → Use Plotly

Don’t know which plot type?
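One way to encode the Part 4 guidance is a small lookup helper. This is a sketch only: suggest_plot is a hypothetical function, and its rules are a simplification of the plot-type advice in Part 4.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

def suggest_plot(x, y=None):
    """Suggest a plot type from the dtypes of one or two pandas Series."""
    if y is None:
        # A single numeric variable: look at its distribution
        return 'histogram' if is_numeric_dtype(x) else 'bar (counts)'
    if is_datetime64_any_dtype(x) and is_numeric_dtype(y):
        return 'line'      # ordered x-axis: trend over time
    if is_numeric_dtype(x) and is_numeric_dtype(y):
        return 'scatter'   # relationship between two variables
    if is_numeric_dtype(y):
        return 'bar'       # numeric value per category
    return 'unclear: inspect the data first'

# Illustrative data with one column of each kind
df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=3),
    'price': [100.0, 101.5, 99.8],
    'volume': [10, 12, 9],
    'region': ['North', 'South', 'East'],
})

print(suggest_plot(df['price']))                # histogram
print(suggest_plot(df['date'], df['price']))    # line
print(suggest_plot(df['volume'], df['price']))  # scatter
print(suggest_plot(df['region'], df['price']))  # bar
```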


Appendix: Complete Working Example

A small project tying everything together:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Create sample dataset: Student performance
np.random.seed(42)
students = 150
data = {
    'Study_Hours': np.random.uniform(0, 8, students),
    'Sleep_Hours': np.random.uniform(4, 10, students),
    'GPA': np.random.uniform(2.0, 4.0, students),
    'Major': np.random.choice(['CS', 'Math', 'Physics'], students)
}
df = pd.DataFrame(data)

# Explore with visualization
sns.set_theme(style="whitegrid")
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Study hours vs GPA
sns.scatterplot(data=df, x='Study_Hours', y='GPA', 
                hue='Major', s=100, ax=axes[0, 0])
axes[0, 0].set_title('Study Hours vs GPA by Major')

# Plot 2: GPA distribution
axes[0, 1].hist(df['GPA'], bins=20, color='steelblue', edgecolor='black')
axes[0, 1].set_title('GPA Distribution')
axes[0, 1].set_xlabel('GPA')

# Plot 3: Sleep by major
sns.boxplot(data=df, x='Major', y='Sleep_Hours', ax=axes[1, 0])
axes[1, 0].set_title('Sleep Hours by Major')

# Plot 4: Study vs Sleep
sns.scatterplot(data=df, x='Sleep_Hours', y='Study_Hours', 
                hue='Major', s=100, ax=axes[1, 1])
axes[1, 1].set_title('Sleep Hours vs Study Hours')

plt.tight_layout()
plt.show()

# Summary statistics to accompany the plots
# (the columns above are sampled independently, so expect a correlation near 0)
print(f"Average GPA: {df['GPA'].mean():.2f}")
print(f"Correlation (Study hrs, GPA): {df[['Study_Hours', 'GPA']].corr().iloc[0, 1]:.3f}")

Teaching Notes

Timing Breakdown

Interactive Elements

Common Questions to Anticipate

Assessment Ideas