Bar Charts in Matplotlib¶
Bar charts are a great way to visualize categorical data, especially when comparing quantities across different categories. Matplotlib makes it easy to create both vertical and horizontal bar charts with customizations for displaying complex datasets clearly.
Creating a Bar Chart¶
Switching to Bar Chart¶
Matplotlib defaults to line charts when using the plot() method, but changing to a bar chart is simple. By switching from plot() to bar(), we can display data as bars rather than lines.
import matplotlib.pyplot as plt
# Sample data
ages = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
dev_salaries = [38496, 42000, 46752, 49320, 53200, 56000, 62316, 64928, 67317, 68748, 73752]
# Basic bar chart
plt.bar(ages, dev_salaries, color='blue', label="All Developers")
plt.xlabel("Ages")
plt.ylabel("Median Salary (USD)")
plt.title("Median Salary (USD) by Age")
plt.legend()
plt.show()
In this example:
- ages represents the x-axis values.
- dev_salaries represents the y-axis values.
- We customize the bar color and add a label for the legend.
Adding Multiple Bar Plots¶
When comparing multiple datasets, such as salaries for different programming languages, we need to add additional bars. However, calling plt.bar() again with the same x values draws each new set of bars on top of the previous one. Here’s how to handle that:
# Additional data for Python and JavaScript developers
py_salaries = [45372, 48876, 53850, 57287, 63016, 65998, 70003, 70000, 71496, 75370, 83640]
js_salaries = [37810, 43515, 46823, 49293, 53437, 56373, 62375, 66674, 68745, 68746, 74583]
# Overlapping example
plt.bar(ages, dev_salaries, color='blue', label="All Developers")
plt.bar(ages, py_salaries, color='green', label="Python")
plt.bar(ages, js_salaries, color='red', label="JavaScript")
plt.legend()
plt.show()
Running this will show overlapping bars, making the chart hard to read. To fix this, we’ll use NumPy to offset the x-axis values.
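To see how much the overlap hides, here is a minimal sketch that counts, for each series, how many of its bars are fully covered by a bar drawn later at the same position. It assumes opaque bars of equal width and positive values; `count_hidden` and `draw_order` are illustrative names, not part of Matplotlib.

```python
# Same sample data as in the overlapping example
dev_salaries = [38496, 42000, 46752, 49320, 53200, 56000, 62316, 64928, 67317, 68748, 73752]
py_salaries = [45372, 48876, 53850, 57287, 63016, 65998, 70003, 70000, 71496, 75370, 83640]
js_salaries = [37810, 43515, 46823, 49293, 53437, 56373, 62375, 66674, 68745, 68746, 74583]

# Series in the order they are drawn; later bars are painted on top of earlier ones
draw_order = [
    ("All Developers", dev_salaries),
    ("Python", py_salaries),
    ("JavaScript", js_salaries),
]


def count_hidden(draw_order):
    """Count, per series, the bars fully covered by a later bar at the same x."""
    hidden = {}
    for i, (label, values) in enumerate(draw_order):
        later = [vals for _, vals in draw_order[i + 1:]]
        hidden[label] = sum(
            1
            for pos, value in enumerate(values)
            if any(other[pos] >= value for other in later)
        )
    return hidden


hidden_counts = count_hidden(draw_order)
print(hidden_counts)
```

With this sample data, every "All Developers" bar sits behind a taller "Python" bar, so that series would not be visible at all in the overlapping chart.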
Using NumPy for X-Indexing¶
We use NumPy to create an array of x positions, one per age, and then shift each dataset by the bar width so that the bars in a group do not overlap.
import numpy as np
x_indexes = np.arange(len(ages))
width = 0.25 # Adjust width for spacing
# Plotting with offsets
plt.bar(x_indexes - width, dev_salaries, width=width, color='blue', label="All Developers")
plt.bar(x_indexes, py_salaries, width=width, color='green', label="Python")
plt.bar(x_indexes + width, js_salaries, width=width, color='red', label="JavaScript")
plt.legend()
plt.show()
Offsetting Bars¶
Here, we use:
- x_indexes - width for the first dataset,
- x_indexes for the second,
- x_indexes + width for the third dataset.
This method spaces the bars side by side, providing a clear comparison between categories.
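The same idea extends to any number of datasets. The following is a minimal sketch, assuming every series uses the same bar width; `series_offsets` is a hypothetical helper, not a Matplotlib or NumPy function. It returns one offset per series, centered on zero, so that each group stays centered on its tick.

```python
def series_offsets(n_series, width):
    """Return one x-offset per series so the group is centered on its tick."""
    middle = (n_series - 1) / 2
    return [(i - middle) * width for i in range(n_series)]


# Three series with width 0.25 reproduce the offsets used above
offsets = series_offsets(3, 0.25)
print(offsets)  # [-0.25, 0.0, 0.25]

# Each offset would then be added to the index array, for example:
# plt.bar(x_indexes + offset, values, width=width)
```

Keeping `n_series * width` below 1 leaves a gap between neighboring groups, since the index positions are one unit apart.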
Labeling and Ticks¶
To replace the x-tick labels with age values, call plt.xticks() before plt.show().
plt.xticks(ticks=x_indexes, labels=ages)
This displays ages instead of numerical indices on the x-axis, making the chart easier to interpret.
Horizontal Bar Charts¶
When to Use Horizontal Bar Charts¶
Horizontal bar charts are effective when there are many categories or when category names are long. Here’s an example using programming language popularity.
Using Counters¶
To count programming language occurrences, Python’s Counter class is very useful. Suppose we have data from a survey on which programming languages developers use.
from collections import Counter
# Sample data: language survey responses
languages = ["Python", "JavaScript", "Java", "C++", "Python", "JavaScript", "Python", "Java"]
language_counter = Counter(languages)
This code counts occurrences of each language.
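As a quick check of what the counter holds, the sketch below repeats the sample data and inspects the result. It uses only the standard library; the lookup of "Rust" is an added illustration of how a missing key behaves.

```python
from collections import Counter

# Same survey responses as above
languages = ["Python", "JavaScript", "Java", "C++", "Python", "JavaScript", "Python", "Java"]
language_counter = Counter(languages)

print(language_counter["Python"])        # 3
print(language_counter["Rust"])          # 0, missing keys count as zero
print(language_counter.most_common(2))   # [('Python', 3), ('JavaScript', 2)]
```

Entries with equal counts are returned by most_common() in the order they were first encountered, which is why "JavaScript" comes before "Java" here.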
Extracting and Plotting Data¶
Now, we create lists for languages and counts to plot in a bar chart.
# Extracting data from counter
lang_names = list(language_counter.keys())
popularity = list(language_counter.values())
# Plotting as vertical bar chart
plt.bar(lang_names, popularity)
plt.xlabel("Programming Languages")
plt.ylabel("Number of Users")
plt.title("Popularity of Programming Languages")
plt.show()
# Plotting as horizontal bar chart
plt.barh(lang_names, popularity)
plt.xlabel("Number of Users")
plt.ylabel("Programming Languages")
plt.title("Popularity of Programming Languages")
plt.show()
This displays the categories along the y-axis, enhancing readability for many categories.
Adjusting Orientation and Labels¶
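Because barh() draws the first item at the bottom, reverse the lists before plotting so that the first entry appears at the top. This arranges the bars from most to least popular only when the lists are already in descending order, as they happen to be here.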
lang_names.reverse()
popularity.reverse()
plt.barh(lang_names, popularity)
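When the survey data is not already in descending order, reversing alone is not enough. One possible approach, sketched here with the standard library only, is to rank the entries with Counter.most_common() first and then reverse them for barh().

```python
from collections import Counter

# Same survey responses as above
languages = ["Python", "JavaScript", "Java", "C++", "Python", "JavaScript", "Python", "Java"]
language_counter = Counter(languages)

# most_common() returns (name, count) pairs from most to least frequent
ranked = language_counter.most_common()

# barh() draws the first item at the bottom, so reverse to put the top entry first
ranked.reverse()

lang_names = [name for name, _ in ranked]
popularity = [count for _, count in ranked]

print(lang_names)   # ['C++', 'Java', 'JavaScript', 'Python']
print(popularity)   # [1, 2, 2, 3]

# These lists can then be passed to plt.barh(lang_names, popularity)
```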