Essential Array Operations with NumPy
Objectives
Reshape arrays to transform data structures while preserving values
Combine arrays using concatenation operations along different axes
Generate descriptive statistics from arrays using NumPy’s built-in functions
Apply the axis parameter correctly to perform row-wise and column-wise operations
Integrate reshaping, concatenation, and statistical functions to solve practical data problems
Instructor note
Teaching : 20 min
Demo: 15 min
Introduction
NumPy is the foundation of Python’s data science ecosystem. At its core is the powerful ndarray object - an efficient, versatile container for large datasets. We’ll explore three essential capabilities:
Reshaping arrays to organize data differently
Combining arrays using concatenation
Generating summary statistics to understand our data
Let’s dive into how these operations can transform the way we work with numerical data.
Reshaping Arrays
Understanding Array Dimensions
Arrays can have different dimensions:
1D arrays (vectors): Simple sequences of values
2D arrays (matrices): Tables with rows and columns
3D arrays and beyond: Multi-dimensional structures
The shape and dimension of an array tell us how data is organized:
Demo
import numpy as np
# Create a simple 1D array
a = np.ones(6)
print("Original array:")
print(a)
print(f"Dimensions: {a.ndim}") # Number of dimensions
print(f"Shape: {a.shape}") # Tuple showing size in each dimension
Output:
Original array:
[1. 1. 1. 1. 1. 1.]
Dimensions: 1
Shape: (6,)
Reshaping Arrays uisng reshape
Reshaping allows us to reorganize the same data into different dimensions
The key rule: the total number of elements must remain the same
Demo
a = np.array(range(1,7))
# Reshape our 1D array with 6 elements into a 2D array (2 rows, 3 columns)
b = a.reshape(2, 3)
print("\nReshaped to 2x3 array:")
print(b)
print(f"Dimensions: {b.ndim}")
print(f"Shape: {b.shape}")
Output
Reshaped to 2x3 array:
[[1 2 3]
[4 5 6]]
Dimensions: 2
Shape: (2, 3)
More info
Additional notes: example
Practical Example: Preparing a Simple Grayscale Image for an ML Model:
Imagine you have a tiny grayscale image, maybe from a very simple dataset. It’s represented as a 2D grid of pixel values. Many basic machine learning algorithms (like Logistic Regression or simple Neural Networks) expect input data where each row is a single sample (a single image) and each column is a feature (a single pixel value).
Our task is to take a 2D image representation and “flatten” it into a 1D row vector suitable for these algorithms.
Demo
import numpy as np
# 2. Imagine a tiny 3x3 pixel grayscale image
# Each number represents the brightness of a pixel (0=black, 255=white)
# This is a 2D NumPy array (a matrix)
image_2d = np.array([
[10, 20, 30],
[40, 50, 60],
[70, 80, 90]
])
print("Original 2D Image Array:")
print(image_2d)
print("Shape of original image:", image_2d.shape) # Output: (3, 3) -> 3 rows, 3 columns
# 3. Prepare for ML: Flatten the image
# Many ML models expect each sample (our image) as a single row.
# We need to convert the 3x3 grid into a 1x9 row (1 row, 9 features/pixels).
# Total number of pixels = 3 * 3 = 9
# Using reshape:
# We want 1 row, and NumPy can figure out the number of columns needed.
# We use '-1' to tell NumPy: "calculate the correct number of columns for me".
flattened_image = image_2d.reshape(1, 9)
# Alternatively, we could be explicit:
# flattened_image = image_2d.reshape(1, 9)
print("Flattened Image Array (Ready for ML Model):")
print(flattened_image)
print("Shape of flattened image:", flattened_image.shape) # Output: (1, 9) -> 1 row, 9 columns
Output
Original 2D Image Array:
[[10 20 30]
[40 50 60]
[70 80 90]]
Shape of original image: (3, 3)
Flattened Image Array (Ready for ML Model):
[[10 20 30 40 50 60 70 80 90]]
Shape of flattened image: (1, 9)
Using -1 as a Dimension:
NumPy can automatically calculate one dimension when you use -1:
image_2d2 = np.array([
[10, 20, 30],
[40, 50, 60],
[70, 80, 90],
[100, 50, 60],
[55, 150, 200],
[150, 100, 220]
])
print(f"Flattened image: {image_2d2.reshape(-1, 9)}")
Output
array([[ 10, 20, 30, 40, 50, 60, 70, 80, 90],
[100, 50, 60, 55, 150, 200, 150, 100, 220]])
Array Concatenation
Concatenation lets us combine multiple arrays into a single larger array. This is essential when:
Merging datasets
Building up arrays piece by piece
Combining results from different operations
More info
Additional notes: array concatenation
1D Array Concatenation
Let’s start with the simplest case - joining two 1D arrays:
# Create two 1D arrays
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
# Concatenate them
combined = np.concatenate((a, b))
print("Concatenated 1D arrays:")
print(combined)
Output
Concatenated 1D arrays:
[1 2 3 4 5 6 7 8]
2D Array Concatenation:
When working with 2D arrays, we need to specify the axis of concatenation:
axis=0: Join vertically (collapse rows) axis=1: Join horizontally (collapse columns)
Vertical Concatenation (axis=0):
# Create 2D arrays
x = np.array([[1, 2], [3, 4]]) # 2x2 array
y = np.array([[5, 6]]) # 1x2 array
# Vertical concatenation (default is axis=0)
v_combined = np.concatenate((x, y))
print("\nVertical concatenation (axis=0):")
print(v_combined)
Output
Vertical concatenation (axis=0):
[[1 2]
[3 4]
[5 6]]
Horizontal Concatenation (axis=1):
# Create arrays for horizontal concatenation
p = np.array([[1, 2], [3, 4]]) # 2x2 array
q = np.array([[5], [6]]) # 2x1 array
# Horizontal concatenation (axis=1)
h_combined = np.concatenate((p, q), axis=1)
print("\nHorizontal concatenation (axis=1):")
print(h_combined)
Output
Horizontal concatenation (axis=1):
[[1 2 5]
[3 4 6]]
Concatenation Requirements:
The key rule for numpy concatenation is:
When concatenating along a specific axis, all dimensions except that axis must match exactly.
Example 1 (fails):
a = np.array([[1, 2, 3]]) # Shape: (1, 3)
b = np.array([[4, 5, 6, 7]]) # Shape: (1, 4)
np.concatenate((a,b), axis=0)
Concatenating along axis=0 (stacking rows)
For axis=0 concatenation, all other dimensions must match
a has 3 columns, b has 4 columns
Since 3 ≠ 4, this raises an error
Example 2 (works):
c = np.array([[1, 2, 3]]) # Shape: (1, 3)
d = np.array([[4, 5, 6, 7]]) # Shape: (1, 4)
np.concatenate((c,d), axis=1)
Concatenating along axis=1 (stacking columns)
For axis=1 concatenation, all other dimensions must match
Both c and d have 1 row
Since 1 = 1, this works and creates shape (1, 7)
Simple way to remember:
axis=0: Arrays must have the same number of columns
axis=1: Arrays must have the same number of rows
axis=2 (for 3D): Arrays must have matching dimensions in axes 0 and 1
Think of it like stacking blocks:
To stack vertically (axis=0), blocks must have the same width
To stack horizontally (axis=1), blocks must have the same height
Summary Statistics
NumPy provides efficient functions to calculate statistical measures across arrays. These are essential for:
Data exploration and understanding
Identifying patterns and outliers
Summarizing large datasets
Function |
Description |
---|---|
np.sum() |
Sum of array elements |
np.min() |
Minimum value |
np.max() |
Maximum value |
np.mean() |
Arithmetic mean (average) |
np.median() |
Median value |
np.std() |
Standard deviation |
np.var() |
Variance |
axis=None (default): Operate on all elements (flattened array)
axis=0: Collapse rows and operate along columns (down)
axis=1: Collapse columns and operate along rows (across)
Demo
# Create a 2D array
data = np.array([[1, 2, 3],
[4, 5, 6]])
print("Our data:")
print(data)
# Sum of all elements
total = np.sum(data)
print(f"\nTotal sum: {total}") # 21
# Column sums (axis=0)
col_sums = np.sum(data, axis=0)
print(f"Column sums: {col_sums}") # [5 7 9]
## Collapse values in rows along the the column 0 and aggregate: [1, 4] = 5
## Collapse values in rows along the the column 1 and aggregate: [2, 5] = 7
## Collapse values in rows along the the column 2 and aggregate: [3, 6] = 9
# Row sums (axis=1)
row_sums = np.sum(data, axis=1)
print(f"Row sums: {row_sums}") # [6 15]
## Collapse values in columns along the the row 0 and aggregate: [1, 2, 3] = 6
## Collapse values in columns along the the row 1 and aggregate: [4, 5, 6] = 15
Output
Our data:
[[1 2 3]
[4 5 6]]
Total sum: 21
Column sums: [5 7 9]
Row sums: [ 6 15]
Keypoints
Reshaping Arrays: Maintain the total number of elements when reshaping; use -1 for automatic dimension calculation.
Concatenation of Arrays: Combine arrays while matching dimensions, except along the concatenation axis.
Statistical Functions: Utilize NumPy’s statistical functions for data analysis, operating across different axes.
Error Handling: Be aware of shape requirements for concatenation to avoid errors.