NumPy

Updated Aug 17, 2021

Overview

Python lists are flexible but slow for numerical work, especially on large datasets. NumPy (Numerical Python) addresses this with the NumPy array, which supports fast, element-wise calculations.

To use NumPy:

Install it.
```
pip install numpy 
```
Import it using:
```
import numpy as np
```
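To confirm that the install and import worked, a quick check like the one below can help. This is only a minimal sketch: the version string it prints depends on your environment, and the names numpy_version and check are illustrative.

```python
import numpy as np

# The version string varies from one environment to another
numpy_version = np.__version__
print(numpy_version)

# Building and summing a tiny array confirms the package is usable
check = np.array([1, 2, 3])
print(check.sum())  # prints 6
```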

Usage

You can create a NumPy array with np.array(). Unlike lists, NumPy arrays let you apply a calculation such as BMI to every element at once, which makes them well suited to data analysis.

Consider the height and weight data below for a sports team of 6 athletes. Each list is converted to a NumPy array.

import numpy as np 

height = [1.56, 1.62, 1.75, 1.80, 1.85, 1.89]  # Heights in meters
weight = [65.3, 68.5, 72.4, 76.0, 80.2, 89.1]  # Weights in kilograms

np_height = np.array(height)
np_weight = np.array(weight)

If you try to calculate the BMI directly with lists, Python will raise an error because mathematical operations like squaring (**) or division cannot be applied element-wise to lists:

bmi = weight / height ** 2
print(bmi)

Output:

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
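For comparison, doing the same calculation with plain lists requires an explicit loop or comprehension that handles one pair of values at a time. A minimal sketch with the same data (the name bmi_list is illustrative):

```python
height = [1.56, 1.62, 1.75, 1.80, 1.85, 1.89]  # Heights in meters
weight = [65.3, 68.5, 72.4, 76.0, 80.2, 89.1]  # Weights in kilograms

# Pair each weight with its height and compute the BMI element by element
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]

# Round only for display; bmi_list keeps full precision
print([round(b, 2) for b in bmi_list])
```

This works, but the looping logic has to be written out by hand for every calculation.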

Using NumPy arrays, you can easily perform the BMI calculation:

bmi = np_weight / np_height ** 2 
print(bmi)

Output:

[26.83267587 26.10120408 23.64081633 23.45679012 23.43316289 24.94331066]

NumPy Array Type Rules

NumPy arrays are designed to hold values of a single type, such as floats, booleans, or strings. NumPy assumes that all values inside an array share that type.

If you mix types, the values are automatically converted to one common type.
A NumPy array is simply a new kind of Python type.
It comes with its own methods, which can behave differently from list methods.

For example, the array below contains different types.

np.array([1.0, "is", True])  

The boolean and the float will be converted to strings, as shown in the output:

array(['1.0', 'is', 'True'], dtype='<U32')
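The same conversion happens with other mixes of types. As a small sketch (the variable names are illustrative), the dtype attribute shows which single type NumPy chose for each array:

```python
import numpy as np

ints_and_floats = np.array([1, 2.5])          # integers are converted to floats
bools_and_ints = np.array([True, 2])          # booleans are converted to integers
mixed_with_str = np.array([1.0, "is", True])  # everything is converted to strings

print(ints_and_floats.dtype)       # prints float64
print(bools_and_ints.dtype.kind)   # prints i (integer; the exact width depends on the platform)
print(mixed_with_str.dtype.kind)   # prints U (Unicode string)
```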

Python Lists vs. NumPy Arrays

Operations on Python lists and NumPy arrays can produce different results.

Adding two Python lists concatenates them into a single, longer list.
```
python_list = [1, 2, 3]
python_list + python_list
```
Output:
```
[1, 2, 3, 1, 2, 3]
```

Adding two NumPy arrays performs element-wise addition.

numpy_array = np.array([1, 2, 3])
numpy_array + numpy_array

Output:

array([2, 4, 6])
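Multiplication shows the same split in behavior. In this short sketch, multiplying a list by an integer repeats the list, while multiplying an array scales each element:

```python
import numpy as np

python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

repeated = python_list * 2  # list repetition
scaled = numpy_array * 2    # element-wise multiplication

print(repeated)  # prints [1, 2, 3, 1, 2, 3]
print(scaled)    # prints [2 4 6]
```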

NumPy Subsetting

Subsetting works much as it does with lists, using square brackets. NumPy also supports boolean subsetting:

Create a boolean array by comparing values.
Use the boolean array for subsetting.
The result keeps only the elements where the boolean array is True.

Using the previous example:

import numpy as np 

height = [1.56, 1.62, 1.75, 1.80, 1.85, 1.89]  # Heights in meters
weight = [65.3, 68.5, 72.4, 76.0, 80.2, 89.1]  # Weights in kilograms

np_height = np.array(height)
np_weight = np.array(weight)

bmi = np_weight / np_height ** 2 
print(bmi)

Output:

[26.83267587 26.10120408 23.64081633 23.45679012 23.43316289 24.94331066]

To retrieve the BMI of the third athlete, use an index (remember, indexing starts at 0):

bmi[2]

Output:

23.640816326530615
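Slices and negative indexes also behave as they do with lists. A brief sketch, recomputing bmi from the same data so that it runs on its own (last_bmi and middle_bmis are illustrative names):

```python
import numpy as np

np_height = np.array([1.56, 1.62, 1.75, 1.80, 1.85, 1.89])
np_weight = np.array([65.3, 68.5, 72.4, 76.0, 80.2, 89.1])
bmi = np_weight / np_height ** 2

last_bmi = bmi[-1]      # BMI of the last athlete
middle_bmis = bmi[1:4]  # second through fourth athletes (the stop index is excluded)

print(last_bmi)
print(middle_bmis)
```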

To filter BMIs greater than 24, first create a boolean array by performing the comparison:

bmi > 24

This returns a NumPy array of boolean values:

array([ True,  True, False, False, False,  True])

You can then use this boolean array to subset the original BMI array:

bmi[bmi > 24]

Output:

array([26.83267587, 26.10120408, 24.94331066]) 
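A boolean array built from one array can also subset a different array of the same length. As a sketch using the same data, this selects the heights of the athletes whose BMI is above 24 (high_bmi and heights_high_bmi are illustrative names):

```python
import numpy as np

np_height = np.array([1.56, 1.62, 1.75, 1.80, 1.85, 1.89])
np_weight = np.array([65.3, 68.5, 72.4, 76.0, 80.2, 89.1])
bmi = np_weight / np_height ** 2

high_bmi = bmi > 24                     # boolean array, one entry per athlete
heights_high_bmi = np_height[high_bmi]  # heights where the mask is True

print(heights_high_bmi)  # prints [1.56 1.62 1.89]
print(high_bmi.sum())    # prints 3, since each True counts as 1
```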

Comparison and Boolean Operators

Comparison operators like < and >= work with NumPy arrays out of the box. They perform element-wise comparisons without any extra steps.

Example:

import numpy as np
my_array = np.array([10, 15, 20, 25])
print(my_array > 15)  

Output:

[False False  True  True]  

The same is not true of the boolean operators and, or, and not. To combine boolean arrays element-wise, use the NumPy equivalents instead:

np.logical_and()
np.logical_or()
np.logical_not()
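The reason is that Python's and, or, and not try to reduce a whole array to a single True or False, which NumPy refuses to do for arrays with more than one element. The sketch below shows that failure and the element-wise alternative. For boolean arrays, the operators &, |, and ~ give the same results as the np.logical_* functions, provided each comparison is wrapped in parentheses.

```python
import numpy as np

my_array = np.array([10, 15, 20, 25])

# Python's `and` needs a single truth value, so this raises ValueError
try:
    result = my_array > 12 and my_array < 22
except ValueError as err:
    result = None
    print("ValueError:", err)

# Element-wise version with the NumPy function
both = np.logical_and(my_array > 12, my_array < 22)
print(both)  # prints [False  True  True False]

# Operator form; the parentheses are required because & binds tighter than > and <
same = (my_array > 12) & (my_array < 22)
print(np.array_equal(both, same))  # prints True
```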

Consider two NumPy arrays representing house areas:

import numpy as np
house_x = np.array([18.0, 20.0, 10.75, 9.50])
house_y = np.array([14.0, 24.0, 14.25, 9.0])

To find the areas in house_x that are greater than 18.5 or smaller than 10:

print(np.logical_or(house_x > 18.5, house_x < 10))

Output:

[False  True False  True]

To find areas smaller than 11 in both house_x and house_y:

print(np.logical_and(house_x < 11, house_y < 11))

Output:

[False False False  True]  
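The result of np.logical_and() or np.logical_or() is itself a boolean array, so it can be used for subsetting in the same way as a single comparison. A short sketch with the same house data (small_in_both is an illustrative name):

```python
import numpy as np

house_x = np.array([18.0, 20.0, 10.75, 9.50])
house_y = np.array([14.0, 24.0, 14.25, 9.0])

# True only where the area is below 11 in both arrays
small_in_both = np.logical_and(house_x < 11, house_y < 11)

print(house_x[small_in_both])  # prints [9.5]
print(house_y[small_in_both])  # prints [9.]
```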
