NumPy¶
NumPy is the fundamental Python library for numerical computing. It provides efficient data structures for handling vectors and matrices, along with fast and reliable routines for the linear algebra operations that are central to many machine learning algorithms.
We begin by importing the NumPy module.
import numpy
To use the functions, classes, and attributes provided by this module, access them using the dot (.) operator.
Example:
import numpy
x = numpy.array([1, 2, 3, 4])
mean_x = numpy.mean(x)
print(mean_x)
2.5
It is common practice to import NumPy under the short alias np.
import numpy as np
The ndarray Object¶
The fundamental data structure in NumPy is the ndarray.
Almost everything we do in machine learning ultimately reduces to operations on ndarray objects.
x = np.array([1, 2, 3, 4, 5])
print(x)
type(x)
[1 2 3 4 5]
numpy.ndarray
In a machine learning context, a one-dimensional array typically represents:
- a feature vector for a single data point,
- a parameter vector of a model,
- gradients during optimization.
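As a small illustration (the numbers below are made up), the same kind of one-dimensional array can serve as both a feature vector and a parameter vector, and a linear model's prediction is simply their dot product:

```python
import numpy as np

features = np.array([2.0, 3.0, 1.0])    # feature vector for one sample
weights = np.array([0.5, -1.0, 2.0])    # parameter vector of a linear model

# A linear model's prediction is the dot product of the two vectors
prediction = np.dot(features, weights)
```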
Creating One-Dimensional Arrays¶
np.zeros(5)
np.ones(5)
np.arange(0, 10, 2)
array([0, 2, 4, 6, 8])
Such arrays are often used to initialize parameters or construct index ranges for experiments.
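A minimal sketch of both uses (the sizes here are arbitrary):

```python
import numpy as np

w = np.zeros(3)           # model parameters initialized to zero
epochs = np.arange(1, 6)  # index range 1..5, e.g. for experiment iterations
```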
Using linspace¶
x = np.linspace(1, 10, 10)  # 10 evenly spaced values from 1 to 10, endpoints included
print(x)
[ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
Copying Arrays¶
NumPy arrays are mutable objects. Assigning one array to another variable does not create a new array; it only creates a new reference to the same data in memory.
To create an independent array, an explicit copy is required.
x = np.ones(5)
z = x          # z is another reference to the same underlying data
y = x.copy()   # y is an independent copy (or y = np.copy(x))
x[:] = 0.0
print('x = ', x)
print('y = ', y)
print('z = ', z)
x =  [0. 0. 0. 0. 0.]
y =  [1. 1. 1. 1. 1.]
z =  [0. 0. 0. 0. 0.]
Two-Dimensional Arrays and Datasets¶
X = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
X
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
A two-dimensional array naturally represents a dataset:
- rows correspond to samples,
- columns correspond to features.
X.shape
(3, 3)
X.ndim
2
Additional Array Constructions¶
NumPy provides several convenience functions for constructing matrices that appear frequently in machine learning and linear algebra.
A = np.ones((2, 3))
print(A)
[[1. 1. 1.]
 [1. 1. 1.]]
The function np.eye creates an identity matrix, which is often used in linear transformations and regularization.
B = np.eye(5)
print(B)
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
A diagonal matrix can be constructed by specifying its diagonal entries explicitly.
a = np.array([-2, 3, 5, 0]) # diagonal elements
D = np.diag(a)
print(D)
[[-2  0  0  0]
 [ 0  3  0  0]
 [ 0  0  5  0]
 [ 0  0  0  0]]
Indexing and Slicing¶
x = np.array([10, 20, 30, 40, 50])
x[1:4]
array([20, 30, 40])
X[0, :]
X[:, 1]
array([2, 5, 8])
Indexing allows us to isolate individual samples or features, a very common operation in data preprocessing.
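Boolean masks are another common selection tool in preprocessing; a small sketch with made-up labels, selecting all rows of X whose label in y equals 1:

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
y = np.array([0, 1, 1])

# A boolean mask selects whole rows at once, e.g. all samples with label 1
positives = X[y == 1]
```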
Vectorized Operations¶
x = np.array([1, 2, 3, 4])
x + 1
x * 2
x ** 2
array([ 1, 4, 9, 16])
Vectorized operations avoid explicit loops and are a key reason why NumPy-based implementations are fast.
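To see the equivalence, the loop-based and vectorized computations below produce identical results; only the vectorized one runs in optimized compiled code:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Explicit Python loop over elements
squared_loop = np.array([v ** 2 for v in x])

# Vectorized equivalent: a single array expression
squared_vec = x ** 2

assert np.array_equal(squared_loop, squared_vec)
```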
Broadcasting¶
X = np.array([[1, 2, 3],
[4, 5, 6]])
w = np.array([1, 0, -1])
X + w
array([[2, 2, 2],
[5, 5, 5]])
Broadcasting is commonly used when adding bias terms or scaling features.
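For example, a bias vector with one entry per output unit can be added to every row of a score matrix in a single expression (the numbers are illustrative):

```python
import numpy as np

# Raw scores for 2 samples and 3 output units
scores = np.array([[0.2, 0.5, 0.1],
                   [0.4, 0.1, 0.3]])
bias = np.array([1.0, -1.0, 0.0])  # one bias per output unit

# Broadcasting adds the bias vector to every row of the matrix
shifted = scores + bias
```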
Basic Statistical Operations¶
x = np.array([2, 4, 6, 8, 10])
np.mean(x)  # 6.0
np.std(x)   # ≈ 2.8284
np.min(x)   # np.int64(2)
np.max(x)
np.int64(10)
Statistics computed using NumPy are often part of feature normalization pipelines.
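A typical use is z-score standardization, where each feature column is shifted to zero mean and scaled to unit variance (a sketch with made-up data):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardize each column (feature) to zero mean and unit variance
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_std = (X - mu) / sigma
```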
Reshaping Arrays¶
x = np.arange(12)
x.reshape(4, 3)
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Reshaping is frequently required when switching between vector and matrix representations.
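Passing -1 as a dimension lets NumPy infer the size, which is a convenient way to flatten a matrix back into a vector:

```python
import numpy as np

X = np.arange(12).reshape(4, 3)

# reshape(-1) infers the length and flattens the matrix into a vector
x = X.reshape(-1)
```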
Matrix Operations¶
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 2]])
A @ B
array([[ 4, 4],
[10, 8]])
A.T
array([[1, 3],
[2, 4]])
Matrix multiplication and transpose operations are central to linear models and neural networks.
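For instance, with a design matrix X and a weight vector w, a single matrix-vector product computes the predictions for all samples at once (illustrative numbers):

```python
import numpy as np

# 3 samples with 2 features each, plus a weight vector
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -0.5])

# One matrix-vector product yields all predictions
y_pred = X @ w
```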
Linear Algebra Utilities¶
A = np.array([[2, 1], [1, 3]])
print(np.linalg.det(A))
print(np.linalg.inv(A))
5.000000000000001
[[ 0.6 -0.2]
 [-0.2  0.4]]
While higher-level libraries handle most linear algebra, NumPy remains the conceptual foundation.
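When solving a linear system, np.linalg.solve is generally preferred over forming the inverse explicitly, since it is faster and numerically more stable; a small sketch:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Solve A x = b directly instead of computing inv(A) @ b
x = np.linalg.solve(A, b)
```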
Random Number Generation¶
np.random.seed(0)
X = np.random.randn(5, 3)
y = np.random.randn(5)
X, y
(array([[ 1.76405235, 0.40015721, 0.97873798],
[ 2.2408932 , 1.86755799, -0.97727788],
[ 0.95008842, -0.15135721, -0.10321885],
[ 0.4105985 , 0.14404357, 1.45427351],
[ 0.76103773, 0.12167502, 0.44386323]]),
array([ 0.33367433, 1.49407907, -0.20515826, 0.3130677 , -0.85409574]))
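Recent NumPy versions also provide the Generator API via np.random.default_rng, which is now the recommended interface for random number generation; a sketch reproducing the setup above:

```python
import numpy as np

# The Generator API is the recommended way to draw random numbers
rng = np.random.default_rng(seed=0)
X = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
```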
Functions and Control Flow¶
After working with arrays and matrix constructions, we now introduce functions and conditional statements.
These concepts allow us to organize code logically and apply operations selectively based on conditions.
Defining Functions¶
Functions help modularize code and make computations reusable.
In machine learning, functions are often used to define transformations, loss functions, or evaluation metrics.
def square_array(x):
    """Return element-wise square of a NumPy array."""
    return x**2
x = np.array([1, 2, 3, 4])
y = square_array(x)
print(y)
[ 1 4 9 16]
Here, the function takes a NumPy array as input and returns a transformed array without modifying the original data.
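In the same spirit, a loss function such as the mean squared error is naturally written as a small function over arrays (a sketch):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.5, 2.0])
error = mse(y_true, y_pred)
```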
Using if Statements¶
Conditional statements allow code execution to depend on logical conditions.
This is useful when decisions must be made based on data values or parameter settings.
x = np.array([1, -2, 3, -4, 5])
if np.any(x < 0):
    print("The array contains negative values")
else:
    print("All values are non-negative")
The array contains negative values
The condition checks whether any element of the array satisfies the given inequality.
if Statements Inside Functions¶
Conditionals are often combined with functions to create flexible behavior.
def normalize_if_needed(x):
    """Normalize the array only if some value exceeds 1 in magnitude."""
    if np.max(np.abs(x)) > 1:
        return x / np.max(np.abs(x))
    else:
        return x
x = np.array([0.2, 0.5, 1.5, -2.0])
y = normalize_if_needed(x)
print(y)
[ 0.1 0.25 0.75 -1. ]
Such logic appears frequently in preprocessing steps, where data-dependent decisions are required.
Euclidean Norm ($\ell_2$ Norm)¶
np.linalg.norm(x)
np.float64(2.5573423705088842)
The Euclidean norm corresponds to the standard distance from the origin.
In machine learning, it is widely used in loss functions and regularization.
Other Common Vector Norms¶
np.linalg.norm(x, ord=1)       # L1 norm: 4.2
np.linalg.norm(x, ord=np.inf)  # infinity norm
np.float64(2.0)
Interpretation:
- L1 norm: sum of absolute values (promotes sparsity)
- Infinity norm: maximum absolute component
Distance Between Vectors¶
x = np.array([1, 2, 3])
y = np.array([2, 0, 4])
np.linalg.norm(x - y)
np.float64(2.449489742783178)
Distances between vectors measure similarity between data points.
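A direct application is nearest-neighbor search: compute the distance from a query point to every stored point and take the smallest (the points here are made up):

```python
import numpy as np

query = np.array([1.0, 1.0])
points = np.array([[0.0, 0.0],
                   [2.0, 2.0],
                   [1.0, 1.5]])

# Euclidean distance from the query to every stored point (row-wise)
dists = np.linalg.norm(points - query, axis=1)
nearest = np.argmin(dists)
```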
Matrix Norms¶
A = np.array([[1, 2], [3, 4]])
np.linalg.norm(A)
np.float64(5.477225575051661)
The default matrix norm is the Frobenius norm, which treats the matrix as a long vector.
Dot Product and Angles¶
x = np.array([1, 0])
y = np.array([0, 1])
np.dot(x, y)
np.int64(0)
The dot product is zero for orthogonal vectors, indicating no directional similarity.
Cosine Similarity¶
cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
cos_sim
np.float64(0.0)
Cosine similarity measures the angle between vectors and is commonly used in text and embedding models.
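Wrapping the formula in a function makes it reusable. For two vectors at a 45-degree angle, the similarity is 1/√2 (illustrative values):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 1.0])
b = np.array([1.0, 0.0])
sim = cosine_similarity(a, b)
```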
Projection of One Vector onto Another¶
x = np.array([2, 3])
y = np.array([1, 0])
proj = (np.dot(x, y) / np.dot(y, y)) * y
proj
array([2., 0.])
Projections are useful in optimization and least-squares problems.
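A defining property is that the residual x − proj is orthogonal to the direction projected onto, which is exactly the geometry behind least squares:

```python
import numpy as np

x = np.array([2.0, 3.0])
y = np.array([1.0, 0.0])

proj = (np.dot(x, y) / np.dot(y, y)) * y
residual = x - proj

# The residual is orthogonal to the direction we projected onto
orthogonal = np.dot(residual, y)
```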
Norm-Based Regularization¶
w = np.array([1.5, -0.3, 0.8])
l2_penalty = np.linalg.norm(w)**2
l1_penalty = np.linalg.norm(w, ord=1)
l2_penalty, l1_penalty
(np.float64(2.98), np.float64(2.6))
A First Taste of Learning¶
So far, we have discussed norms, distances, and projections as geometric concepts.
We now briefly show how these ideas appear naturally in a simple machine learning task using real data.
The goal here is conceptual exposure, not algorithmic completeness.
Example: Predicting House Prices (Real Dataset)¶
We consider a small real-world dataset containing housing information.
Each data point consists of a feature vector and a target value (price).
For simplicity, we will use one feature and a linear model.
Loading the Data¶
We use the well-known Boston Housing-style dataset available online in CSV format.
(Any similar regression dataset would serve the same conceptual purpose.)
# Load data from an online source
import numpy as np
import pandas as pd
url = "https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv"
data = pd.read_csv(url)
# Select one feature and the target
x = data['rm'].values # average number of rooms
y = data['medv'].values # median house value
x, y
Here:
- $x$ represents a feature vector (input data),
- $y$ represents the target values we want to predict.
A Simple Linear Model¶
We consider a linear model of the form $$ y \approx w x $$ where $w$ is a parameter to be learned from data.
Measuring Error Using Norms¶
For a given choice of $w$, the prediction error is measured using a squared Euclidean norm: $$ \|y - wx\|_2^2 $$ This is exactly the geometric notion of distance discussed earlier.
# Try a range of parameter values
w_vals = np.linspace(0, 10, 200)
errors = []
for w in w_vals:
    y_pred = w * x
    error = np.linalg.norm(y - y_pred)**2
    errors.append(error)
w_best = w_vals[np.argmin(errors)]
w_best
np.float64(3.6683417085427137)
The learning task reduces to finding the parameter $w$ that minimizes a norm-based error.
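For this one-parameter model the minimizer also has a closed form, $w^* = (x^\top y)/(x^\top x)$. The sketch below checks, on synthetic stand-in data (not the housing dataset), that grid search and the closed-form solution agree:

```python
import numpy as np

# Synthetic stand-in data (illustrative, not the housing dataset)
rng = np.random.default_rng(1)
x = rng.uniform(4.0, 8.0, size=50)           # a single feature
y = 3.5 * x + rng.normal(0.0, 1.0, size=50)  # noisy linear targets

# Closed-form minimizer of ||y - w*x||^2 for the one-parameter model
w_closed = np.dot(x, y) / np.dot(x, x)

# Grid search over the same objective should land near the same value
w_vals = np.linspace(0, 10, 2001)
errors = [np.linalg.norm(y - w * x) ** 2 for w in w_vals]
w_grid = w_vals[np.argmin(errors)]
```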