Reshaping Data
Updated Aug 17, 2021 ·
Wide vs. Long Data
Data can be stored in two formats:
- Wide format – One row per subject, with multiple columns for attributes
- Long format – One subject appears in multiple rows, each row represents an attribute
Wide format is easier to read, while long format is better for analysis.
We can use the melt()
method to transform a wide format data to a long format.
Using .melt()
The .melt()
method converts wide-format data into a long-format, making it more useful for analysis.
- Converts wide data to long format
- Moves selected columns into rows
- Keeps identifier columns unchanged
Consider a dataset containing financial metrics for two companies:
import pandas as pd
social_fin = pd.DataFrame({
"Company": ["Facebook", "Twitter"],
"2016": [27, 3.5],
"2017": [40, 5.0],
"2018": [56, 7.2],
"2019": [70, 8.5]
})
print(social_fin)
Output in wide form:
Company 2016 2017 2018 2019
0 Facebook 27.0 40.0 56.0 70.0
1 Twitter 3.5 5.0 7.2 8.5
Now, we use .melt()
to transform this data.
social_long = social_fin.melt(id_vars=["Company"])
print(social_long)
Output in Long Form:
Company variable value
0 Facebook 2016 27.0
1 Twitter 2016 3.5
2 Facebook 2017 40.0
3 Twitter 2017 5.0
4 Facebook 2018 56.0
5 Twitter 2018 7.2
6 Facebook 2019 70.0
7 Twitter 2019 8.5
Controlling Columns
We can choose which columns to unpivot using value_vars
. In the example below, we'll onlu unpivot the 2017 and 2018 columns.
social_long = social_fin.melt(id_vars=["Company"], value_vars=["2017", "2018"])
print(social_long)
Output:
Company variable value
0 Facebook 2017 40.0
1 Twitter 2017 5.0
2 Facebook 2018 56.0
3 Twitter 2018 7.2
Renaming Columns
We can also rename the columns for better readability.
social_long = social_fin.melt(id_vars=["Company"],
var_name="Year",
value_name="Revenue")
print(social_long)
Output:
Company Year Revenue
0 Facebook 2016 27.0
1 Twitter 2016 3.5
2 Facebook 2017 40.0
3 Twitter 2017 5.0
4 Facebook 2018 56.0
5 Twitter 2018 7.2
6 Facebook 2019 70.0
7 Twitter 2019 8.5