NumPy
Overview
Python lists are flexible but slow for data analysis, especially when working with large datasets. NumPy (Numeric Python) solves this by providing NumPy arrays, which enable fast, element-wise calculations.
To use NumPy:
-
Install it.
pip install numpy
-
Import it using:
import numpy as np
Usage
You can create a NumPy array with np.array()
. Unlike lists, NumPy arrays let you perform operations like BMI calculation across all elements efficiently, making it a powerful tool for data analysis.
Consider the two NumPy arrays below for height and weight for a sports team comprising of 6 athletes.
import numpy as np
height = [1.56, 1.62, 1.75, 1.80, 1.85, 1.89] # Heights in meters
weight = [65.3, 68.5, 72.4, 76.0, 80.2, 89.1] # Weights in kilograms
np_height = np.array(height)
np_weight = np.array(weight)
If you try to calculate the BMI directly with lists, Python will raise an error because mathematical operations like squaring (**) or division cannot be applied element-wise to lists:
bmi = weight / height ** 2
print(bmi)
Output:
TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
Using NumPy arrays, you can easily perform the BMI calculation:
bmi = np_weight / np_height ** 2
print(bmi)
Output:
array([26.83267587, 26.10120408, 23.64081633, 23.45679012, 23.43316289, 24.94331066])
NumPy Array Type Rules
NumPy arrays are designed to hold values of a single type, such as floats, booleans, or strings. It assumes that the values inside an array are of the same type.
- If you mixed types, they are automatically converted to one type
- A NumPy array is simply a new kind of Python type.
- it comes with unique methods that behave differently from lists.
For example, the array below contains different types.
np.array([1.0, "is", True])
The boolean and the float will be converted to strings, as shown in the output:
array(['1.0', 'is', 'True'], dtype='<U32')
Python Lists vs. NumPy Arrays
Operations on Python lists and NumPy arrays can produce different results.
-
Adding two Python lists combines their elements into a single list.
python_list = [1, 2, 3]
python_listOutput:
[1, 2, 3, 1, 2, 3]
-
Adding two NumPy arrays performs element-wise addition.
numpy_array = np.array([1, 2, 3])
numpy_array + numpy_arrayOutput:
array([2, 4, 6])
NumPy Subsetting
Subsetting works similarly to lists using square brackets. NumPy supports boolean subsetting:
- Create a boolean array by comparing values.
- Use the boolean array for subsetting.
- This filters data and provides useful insights.
Using the previous example:
import numpy as np
height = [1.56, 1.62, 1.75, 1.80, 1.85, 1.89] # Heights in meters
weight = [65.3, 68.5, 72.4, 76.0, 80.2, 89.1] # Weights in kilograms
np_height = np.array(height)
np_weight = np.array(weight)
bmi = np_weight / np_height ** 2
print(bmi)
Output:
array([26.83267587, 26.10120408, 23.64081633, 23.45679012, 23.43316289, 24.94331066])
To retrieve the BMI of the third athlete, use an index (remember, indexing starts at 0):
bmi[2]
Output:
23.640816326530615
To filter BMIs greater than 24, first create a boolean array by performing the comparison:
bmi > 24
This will return a Numpy array of boolean values.
array([ True, True, False, False, False, True])
You can then use this boolean array to subset the original BMI array:
bmi[bmi > 24]
Output:
array([26.83267587, 26.10120408, 24.94331066])
Comparison and Boolean Operators
Operational operators like <
and >=
worked with NumPy arrays out of the box. They allow you to perform element-wise comparisons without additional modifications.
Example:
import numpy as np
my_array = np.array([10, 15, 20, 25])
print(my_array > 15)
Output:
[False False True True]
Unfortunately, this is not true for the boolean operators like and
, or
, and not
. To use these operators with NumPy, you will need to use the equivalent:
np.logical_and()
np.logical_or()
np.logical_not()
Consider two NumPy arrays representing house areas:
import numpy as np
house_x = np.array([18.0, 20.0, 10.75, 9.50])
huose_y = np.array([14.0, 24.0, 14.25, 9.0])
To determine the areas in house_x
which greater than 18.5 but smaller than 10:
print(np.logical_or(house_x > 18.5, house_x < 10))
Output:
[False True False True]
To find areas smaller than 11 in both house_x
and huose_y
:
print(np.logical_and(house_x < 11, huose_y < 11))
Output:
[False False False True]