Data Manipulation in NumPy¶
NumPy offers a variety of functions for manipulating arrays, allowing you to reshape, join, split, and modify data efficiently. Mastering these data manipulation techniques is essential for data analysis, machine learning, and scientific computing.
Importing NumPy¶
Before you start working with NumPy, you need to import the library. It's standard practice to import NumPy using the alias np.
import numpy as np
This allows you to access NumPy functions using the np prefix, such as np.array().
# Creating a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Getting the shape of the array
array_shape = array_2d.shape
In this example, array_shape will be (2, 3), indicating that the array has 2 rows and 3 columns.
Reshaping Arrays (reshape)¶
The reshape function allows you to change the shape of an array without changing its data.
# Creating a 1D array with 6 elements
array_1d = np.arange(6)
# Reshaping the array to 2 rows and 3 columns
array_reshaped = array_1d.reshape(2, 3)
array_reshaped
array([[0, 1, 2],
       [3, 4, 5]])
Here, array_reshaped will be a 2x3 array. The total number of elements must remain the same when reshaping.
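The element-count constraint can be illustrated with a short sketch. The name inferred and the flag reshape_failed are introduced here for illustration only; passing -1 for one dimension asks NumPy to infer it from the total size.

```python
import numpy as np

array_1d = np.arange(6)

# -1 lets NumPy infer the missing dimension: 6 elements / 3 rows = 2 columns.
inferred = array_1d.reshape(3, -1)

# A target shape with a different element count (4 * 2 = 8) raises ValueError.
try:
    array_1d.reshape(4, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(inferred.shape)   # (3, 2)
print(reshape_failed)   # True
```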
Flattening Arrays (flatten and ravel)¶
Flattening an array converts it into a 1D array.
- flatten returns a copy of the array collapsed into one dimension.
- ravel returns a view (if possible) of the array collapsed into one dimension.
# Flattening the array using flatten()
flattened_array = array_reshaped.flatten()
# Flattening the array using ravel()
raveled_array = array_reshaped.ravel()
flattened_array
array([0, 1, 2, 3, 4, 5])
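Use flatten when you need an independent copy of the array, and ravel when a view is acceptable. ravel avoids copying when the memory layout allows it and falls back to a copy otherwise; when it does return a view, writing to it changes the original array.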
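The copy-versus-view difference can be checked directly. The following is a minimal sketch; np.shares_memory is used only to confirm whether two arrays use the same underlying buffer, and the name transposed_ravel is introduced for illustration.

```python
import numpy as np

array_reshaped = np.arange(6).reshape(2, 3)

flattened_array = array_reshaped.flatten()  # always a copy
raveled_array = array_reshaped.ravel()      # a view here, because the data is contiguous

flattened_array[0] = 100  # leaves array_reshaped untouched
raveled_array[1] = 200    # writes through to array_reshaped

print(array_reshaped[0])                                 # [  0 200   2]
print(np.shares_memory(array_reshaped, raveled_array))   # True

# For a transposed (non-contiguous) array, ravel has to copy.
transposed_ravel = array_reshaped.T.ravel()
print(np.shares_memory(array_reshaped, transposed_ravel))  # False
```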
Transposing Arrays (transpose and T)¶
Transposing an array reverses the order of its axes; for a 2D array, this swaps rows and columns.
# Transposing the array using transpose()
transposed_array = array_reshaped.transpose()
# Transposing the array using the T attribute
transposed_array_T = array_reshaped.T
Both forms give the same result: the 2x3 array becomes a 3x2 array.
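As a quick check of the shapes involved, here is a small sketch. The 3D array array_3d and the name reordered are hypothetical additions; they show that transpose also accepts an explicit axis order.

```python
import numpy as np

array_reshaped = np.arange(6).reshape(2, 3)
transposed_array = array_reshaped.transpose()
transposed_array_T = array_reshaped.T

print(transposed_array.shape)                                # (3, 2)
print(np.array_equal(transposed_array, transposed_array_T))  # True

# Hypothetical 3D example: swap the first two axes, keep the last one.
array_3d = np.arange(24).reshape(2, 3, 4)
reordered = array_3d.transpose(1, 0, 2)
print(reordered.shape)                                       # (3, 2, 4)
```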
# Creating two arrays
array_a = np.array([[1, 2], [3, 4]])
array_b = np.array([[5, 6]])
# Concatenating along axis 0 (rows)
concatenated_array = np.concatenate((array_a, array_b), axis=0)
# Stacking arrays vertically
vertical_stack = np.vstack((array_a, array_b))
array_a.shape
(2, 2)
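To make the results concrete, the sketch below prints the joined array and confirms that vstack matches concatenate along axis 0 for these inputs. The name side_by_side is introduced for illustration; it joins along axis 1, which requires the same number of rows in both arrays.

```python
import numpy as np

array_a = np.array([[1, 2], [3, 4]])
array_b = np.array([[5, 6]])

concatenated_array = np.concatenate((array_a, array_b), axis=0)
vertical_stack = np.vstack((array_a, array_b))

print(concatenated_array.shape)                              # (3, 2)
print(np.array_equal(concatenated_array, vertical_stack))    # True

# Joining along axis 1 needs matching row counts, so array_b is transposed to (2, 1).
side_by_side = np.concatenate((array_a, array_b.T), axis=1)
print(side_by_side.tolist())                                 # [[1, 2, 5], [3, 4, 6]]
```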
Horizontal Stack (hstack)¶
Stacks arrays horizontally (column-wise).
# Stacking arrays horizontally
array_c = np.array([[7], [8]])
horizontal_stack = np.hstack((array_a, array_c))
Depth Stack (dstack)¶
Stacks arrays along the third axis (depth).
# Stacking arrays depth-wise
depth_stack = np.dstack((array_a, array_a))
depth_stack
array([[[1, 1],
        [2, 2]],

       [[3, 3],
        [4, 4]]])
Generic Stack (stack)¶
Stacks arrays along a new axis.
# Stacking arrays along a new axis
stacked_array = np.stack((array_a, array_a), axis=0)
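The effect of the axis argument is easier to see when each axis has a different length. The sketch below uses a hypothetical 2x3 array, array_x, instead of array_a for that reason, and compares stack with dstack.

```python
import numpy as np

# array_x is a hypothetical 2x3 example so each axis length is distinguishable.
array_x = np.array([[1, 2, 3], [4, 5, 6]])

stack_axis0 = np.stack((array_x, array_x), axis=0)  # new leading axis
stack_axis2 = np.stack((array_x, array_x), axis=2)  # new trailing axis
depth_stack = np.dstack((array_x, array_x))

print(stack_axis0.shape)                          # (2, 2, 3)
print(stack_axis2.shape)                          # (2, 3, 2)
print(np.array_equal(stack_axis2, depth_stack))   # True
```

For 2D inputs, stacking along axis 2 gives the same result as dstack, while axis 0 places the inputs one after another along a new first dimension.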
# Creating an array
array_d = np.array([1, 2, 3, 4, 5, 6])
# Splitting the array into three equal-sized sub-arrays
split_arrays = np.split(array_d, 3)
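Besides an integer count, np.split also accepts a list of split indices, and np.array_split tolerates sections of unequal size. A minimal sketch follows; the names head, middle, tail, and uneven are introduced here for illustration.

```python
import numpy as np

array_d = np.array([1, 2, 3, 4, 5, 6])

# Three equal parts: [1 2], [3 4], [5 6]
split_arrays = np.split(array_d, 3)

# Split at explicit indices 1 and 4: [1], [2 3 4], [5 6]
head, middle, tail = np.split(array_d, [1, 4])

# np.split(array_d, 4) would raise ValueError because 6 is not divisible by 4;
# np.array_split allows uneven sections instead.
uneven = np.array_split(array_d, 4)

print([part.tolist() for part in split_arrays])  # [[1, 2], [3, 4], [5, 6]]
print([len(part) for part in uneven])            # [2, 2, 1, 1]
```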
Horizontal Split (hsplit)¶
Splits an array horizontally (column-wise).
# Creating a 2D array
array_e = np.array([[1, 2, 3], [4, 5, 6]])
# Splitting the array into three columns
h_split_arrays = np.hsplit(array_e, 3)
Vertical Split (vsplit)¶
Splits an array vertically (row-wise).
# Splitting the array into two rows
v_split_arrays = np.vsplit(array_e, 2)
Depth Split (dsplit)¶
Splits an array along the depth (third) axis.
# Creating a 3D array
array_f = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# Splitting the array along depth
d_split_arrays = np.dsplit(array_f, 2)
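The shapes of the pieces make the three split directions easier to compare. This sketch reuses the arrays defined above and only inspects the results.

```python
import numpy as np

array_e = np.array([[1, 2, 3], [4, 5, 6]])
array_f = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

h_split_arrays = np.hsplit(array_e, 3)  # three (2, 1) columns
v_split_arrays = np.vsplit(array_e, 2)  # two (1, 3) rows
d_split_arrays = np.dsplit(array_f, 2)  # two (2, 2, 1) slices along the third axis

print([part.shape for part in h_split_arrays])  # [(2, 1), (2, 1), (2, 1)]
print([part.shape for part in v_split_arrays])  # [(1, 3), (1, 3)]
print([part.shape for part in d_split_arrays])  # [(2, 2, 1), (2, 2, 1)]
```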
# Creating an array
array_g = np.array([1, 2, 3, 4, 5])
# Inserting value 99 at index 2
array_with_insert = np.insert(array_g, 2, 99)
# Appending value 6 to the array
array_with_append = np.append(array_g, 6)
# Deleting the element at index 1
array_with_delete = np.delete(array_g, 1)
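Note that np.insert, np.append, and np.delete each return a new array rather than modifying their input. A short sketch of the results:

```python
import numpy as np

array_g = np.array([1, 2, 3, 4, 5])

array_with_insert = np.insert(array_g, 2, 99)
array_with_append = np.append(array_g, 6)
array_with_delete = np.delete(array_g, 1)

print(array_with_insert.tolist())  # [1, 2, 99, 3, 4, 5]
print(array_with_append.tolist())  # [1, 2, 3, 4, 5, 6]
print(array_with_delete.tolist())  # [1, 3, 4, 5]
print(array_g.tolist())            # [1, 2, 3, 4, 5]  (the original is unchanged)
```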
# Assigning array_g to array_h
array_h = array_g
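Assignment does not copy anything: array_h and array_g are two names for the same array object, so modifying array_h also modifies array_g.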
View or Shallow Copy¶
A view creates a new array object that looks at the same data.
# Creating a view of array_g
array_view = array_g.view()
Changes to the data in the view will affect the original array.
Deep Copy¶
A deep copy creates a new array and copies the data.
# Creating a deep copy of array_g
array_copy = array_g.copy()
Modifying array_copy will not affect array_g.
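The three cases (plain assignment, view, and deep copy) can be compared side by side. This is a minimal sketch; the .base attribute is used here only to show which array owns the data.

```python
import numpy as np

array_g = np.array([1, 2, 3, 4, 5])

array_h = array_g            # same object, no new array
array_view = array_g.view()  # new array object, shared data
array_copy = array_g.copy()  # new array object, independent data

print(array_h is array_g)          # True
print(array_view is array_g)       # False
print(array_view.base is array_g)  # True

array_view[0] = 10   # visible through array_g
array_copy[1] = 20   # not visible through array_g

print(array_g.tolist())  # [10, 2, 3, 4, 5]
```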
Broadcasting¶
Broadcasting allows NumPy to perform element-wise operations on arrays of different but compatible shapes, without making explicit copies of the smaller array.
Understanding Broadcasting Rules¶
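- If the arrays do not have the same number of dimensions, prepend ones to the shape of the lower-dimensional array until both shapes have the same length.
- Two dimensions are compatible when their sizes are equal or one of them is 1; a dimension of size 1 is stretched to match the other. If any pair of dimensions is incompatible, NumPy raises a ValueError.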
Broadcasting Examples¶
# Creating arrays of different shapes
array_i = np.array([1, 2, 3])
array_j = np.array([[4], [5], [6]])
# Adding arrays using broadcasting
broadcast_sum = array_i + array_j
broadcast_sum
array([[5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])
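In this example, array_i has shape (3,), which is treated as (1, 3), and array_j already has shape (3, 1). Each size-1 dimension is stretched to match the other array, so both operands behave like (3, 3) arrays and the sum has shape (3, 3).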
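The broadcasting rules can be checked without performing any arithmetic. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available; the names result_shape, stretched, and shapes_compatible are introduced for illustration.

```python
import numpy as np

array_i = np.array([1, 2, 3])        # shape (3,)
array_j = np.array([[4], [5], [6]])  # shape (3, 1)

# Shape that results from broadcasting the two operands together.
result_shape = np.broadcast_shapes(array_i.shape, array_j.shape)
print(result_shape)  # (3, 3)

# broadcast_to shows how array_i is conceptually repeated along the new axis.
stretched = np.broadcast_to(array_i, result_shape)
print(stretched.tolist())  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]

# Shapes (3,) and (4,) differ and neither is 1, so they cannot be broadcast.
try:
    np.broadcast_shapes((3,), (4,))
    shapes_compatible = True
except ValueError:
    shapes_compatible = False
print(shapes_compatible)  # False
```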
# Creating an unsorted array
array_k = np.array([3, 1, 2])
# Sorting the array
sorted_array = np.sort(array_k)
# Getting the indices that would sort the array
sorted_indices = np.argsort(array_k)
Using sorted_indices, you can rearrange another array to correspond with the sorted order.
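As a sketch of that idea, the example below reorders a second array with the same indices. The labels array is hypothetical and exists only to illustrate the pairing.

```python
import numpy as np

array_k = np.array([3, 1, 2])
# Hypothetical companion array: labels[n] belongs to array_k[n].
labels = np.array(["c", "a", "b"])

sorted_array = np.sort(array_k)
sorted_indices = np.argsort(array_k)

# Indexing with sorted_indices applies the same ordering to both arrays.
sorted_labels = labels[sorted_indices]

print(sorted_indices.tolist())                                 # [1, 2, 0]
print(sorted_labels.tolist())                                  # ['a', 'b', 'c']
print(np.array_equal(array_k[sorted_indices], sorted_array))   # True
```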