NumPy: Array Basics
Creating Arrays
Creating arrays is one of the foundational tasks in NumPy. An array is a data structure that stores values of a single type, which matters for machine learning because ML data is typically represented as arrays. Here's how to create arrays in NumPy:
- From a Python List: You can create a NumPy array directly from a Python list.
- Using Functions: There are several built-in functions in NumPy that allow for array creation, such as zeros(), ones(), and arange().
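Both approaches can be sketched in a few lines. The variable names and the specific values here are illustrative choices, not part of the original material:

```python
import numpy as np

# From a Python list
from_list = np.array([1, 2, 3, 4])

# Using built-in creation functions
zeros = np.zeros(3)          # three values, all 0.0
ones = np.ones((2, 2))       # 2x2 array, all 1.0
evens = np.arange(0, 10, 2)  # start at 0, stop before 10, step by 2

print(from_list)  # [1 2 3 4]
print(zeros)      # [0. 0. 0.]
print(evens)      # [0 2 4 6 8]
```

Note that zeros() and ones() produce floating-point values unless a different type is requested.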
Data Types
In NumPy, an array is a grid of values that all share one data type. By default, NumPy infers the data type from the values you provide, but you can specify it explicitly using the dtype parameter.
Common data types include:
- int64: Integer type
- float64: Floating-point type
- complex128: Complex number type
- bool: Boolean type (True/False)
For instance, to create an integer array:
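A minimal sketch of setting the type explicitly, alongside a case where NumPy infers it. The variable names are chosen only for illustration:

```python
import numpy as np

# Explicitly request 64-bit integers
int_arr = np.array([1, 2, 3], dtype=np.int64)

# The same values stored as 64-bit floats
float_arr = np.array([1, 2, 3], dtype=np.float64)

# No dtype given: a single float in the input makes the whole array float64
inferred = np.array([1.5, 2, 3])

print(int_arr.dtype)   # int64
print(float_arr)       # [1. 2. 3.]
print(inferred.dtype)  # float64
```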
Slicing & Indexing
Slicing and indexing are vital operations when working with NumPy arrays, especially in the context of machine learning where you often need to select specific data points or split datasets.
- Indexing: As with Python lists, you can access an array's elements by index, starting from 0.
- Slicing: To select a range of elements from an array, you can use slice notation, start:stop:step. The element at the stop index is not included.
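The two operations can be sketched as follows. The sample arrays are made up for illustration:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Indexing: positions start at 0; negative indices count from the end
first = arr[0]    # 10
last = arr[-1]    # 60

# Slicing: start:stop:step, with the stop index excluded
middle = arr[1:4]       # [20 30 40]
every_other = arr[::2]  # [10 30 50]

# For 2-D arrays, give one index or slice per axis
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
element = grid[1, 2]    # row 1, column 2 -> 6
first_col = grid[:, 0]  # every row, column 0 -> [1 4]
```

The same slice notation is what you would typically use to carve a dataset into, say, the first 80% of rows and the remainder.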
Reshaping Arrays
The shape of a NumPy array is crucial when you're feeding data into machine learning models. Often, you'll need to reshape data to fit the input or output structure of a model.
The reshape() method allows you to reorganize the data within an array, providing a new shape without changing the data itself.
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape(2, 3)  # Reshapes the array into a 2x3 matrix
It's crucial to note that the total number of elements must remain the same when reshaping.
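A short sketch of what that constraint looks like in practice. It also uses -1, which asks NumPy to work out one dimension from the total element count; the variable names are illustrative:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# -1 lets NumPy infer the missing dimension: 6 elements / 3 rows = 2 columns
inferred = arr.reshape(3, -1)
print(inferred.shape)  # (3, 2)

# A shape that needs 8 elements cannot be built from 6
reshape_failed = False
try:
    arr.reshape(4, 2)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```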
Stacking & Splitting
Machine learning often requires merging datasets or splitting them for tasks like training and testing.
- Stacking: You can use vstack() for vertical stacking and hstack() for horizontal stacking.
- Splitting: The array_split() function can split an array into multiple smaller arrays.
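A small sketch of these three functions on toy data. The arrays and names are invented for illustration:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# vstack places the arrays on top of each other as rows
vertical = np.vstack((a, b))
print(vertical.shape)  # (2, 3)

# hstack joins them end to end
horizontal = np.hstack((a, b))
print(horizontal)  # [1 2 3 4 5 6]

# array_split tolerates uneven splits: 10 elements into 3 parts
parts = np.array_split(np.arange(10), 3)
print([len(p) for p in parts])  # [4, 3, 3]
```

Unlike split(), array_split() does not raise an error when the array cannot be divided evenly, which is convenient for rough train/test partitions.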
By mastering these array basics in NumPy, you'll be well-equipped to handle the data manipulation tasks inherent in the machine learning pipeline. Whether you're preprocessing data, building models, or evaluating results, a solid grasp of NumPy's array functionalities is invaluable.
Version 1.0
This is currently an early version of the learning material and it will be updated over time with more detailed information.
A video will be provided with the learning material as well.