Count plots
In this exercise, we'll return to exploring our dataset that contains the responses to a survey sent out to young people. We might suspect that young people spend a lot of time on the internet, but how much do they report using the internet each day? Let's use a count plot to break down the number of survey responses in each category and then explore whether it changes based on age.
As a reminder, to create a count plot, we'll use the catplot() function and specify the name of the categorical variable to count (x=____), the pandas DataFrame to use (data=____), and the type of plot (kind="count").
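To make the three arguments concrete, here is a minimal sketch on a tiny made-up DataFrame. The name toy_survey and its values are assumptions for illustration only; they are not part of the survey data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data (not the real survey): a single categorical column.
toy_survey = pd.DataFrame({
    "Internet usage": [
        "few hours a day", "few hours a day", "few hours a day",
        "most of the day", "most of the day",
        "less than an hour a day",
    ]
})

# x names the categorical column, data is the DataFrame,
# and kind="count" asks catplot() for a count plot.
g = sns.catplot(x="Internet usage", data=toy_survey, kind="count")

# Each bar's height is the number of rows in that category.
bar_heights = sorted(bar.get_height() for bar in g.ax.patches)
print(bar_heights)
```

In a notebook, the matplotlib.use("Agg") line can be dropped and the figure displayed with plt.show(), as in the exercise.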
Seaborn has been imported as sns and matplotlib.pyplot has been imported as plt.
# Import Matplotlib, Seaborn, and pandas
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
url = 'https://assets.datacamp.com/production/repositories/3996/datasets/ab13162732ae9ca1a9a27e2efd3da923ed6a4e7b/young-people-survey-responses.csv'
survey_data = pd.read_csv(url)
print(survey_data)
      Unnamed: 0  Music  Techno  Movies  History  Mathematics  Pets  Spiders  \
0              0    5.0     1.0     5.0      1.0          3.0   4.0      1.0
1              1    4.0     1.0     5.0      1.0          5.0   5.0      1.0
2              2    5.0     1.0     5.0      1.0          5.0   5.0      1.0
3              3    5.0     2.0     5.0      4.0          4.0   1.0      5.0
4              4    5.0     2.0     5.0      3.0          2.0   1.0      1.0
...          ...    ...     ...     ...      ...          ...   ...      ...
1005        1005    5.0     3.0     5.0      4.0          1.0   4.0      2.0
1006        1006    4.0     4.0     5.0      4.0          5.0   5.0      1.0
1007        1007    4.0     1.0     4.0      2.0          3.0   5.0      2.0
1008        1008    5.0     2.0     5.0      3.0          1.0   4.0      3.0
1009        1009    5.0     3.0     5.0      2.0          2.0   5.0      1.0

      Loneliness  Parents' advice           Internet usage  Finances   Age  \
0            3.0              4.0          few hours a day       3.0  20.0
1            2.0              2.0          few hours a day       3.0  19.0
2            5.0              3.0          few hours a day       2.0  20.0
3            5.0              2.0          most of the day       2.0  22.0
4            3.0              3.0          few hours a day       4.0  20.0
...          ...              ...                      ...       ...   ...
1005         4.0              4.0          few hours a day       3.0  20.0
1006         1.0              4.0  less than an hour a day       3.0  27.0
1007         4.0              4.0          most of the day       1.0  18.0
1008         3.0              3.0          most of the day       3.0  25.0
1009         3.0              3.0          few hours a day       5.0  21.0

      Siblings  Gender  Village - town
0          1.0  female         village
1          2.0  female            city
2          2.0  female            city
3          1.0  female            city
4          1.0  female         village
...        ...     ...             ...
1005       1.0  female            city
1006       5.0    male         village
1007       0.0  female            city
1008       1.0  female            city
1009       1.0    male         village

[1010 rows x 16 columns]
Question
- Use sns.catplot() to create a count plot using the survey_data DataFrame with "Internet usage" on the x-axis.
# Create a count plot of internet usage
sns.catplot(x="Internet usage", data=survey_data,
            kind="count")
# Show plot
plt.show()
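The bar heights in a count plot are simply the number of rows in each category, so they can be cross-checked numerically. A minimal sketch, assuming a small made-up DataFrame (toy_survey is hypothetical, not the real survey data):

```python
import pandas as pd

# Hypothetical stand-in for survey_data; the values are made up for illustration.
toy_survey = pd.DataFrame({
    "Internet usage": [
        "few hours a day", "few hours a day", "most of the day",
        "less than an hour a day", "few hours a day", None,
    ]
})

# value_counts() tallies the rows per category; missing values are skipped by default.
usage_counts = toy_survey["Internet usage"].value_counts()
print(usage_counts)
```

Running the same call on survey_data["Internet usage"] should give the numbers behind the bars in the plot.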
Question
- Make the bars horizontal instead of vertical.
# Change the orientation of the plot
sns.catplot(y="Internet usage", data=survey_data,
            kind="count")
# Show plot
plt.show()
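The usage categories have a natural order (least to most time online), and catplot() accepts an order argument to set it explicitly instead of relying on the default order. A minimal sketch, assuming a made-up toy_survey DataFrame and a usage_order list chosen here for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data (not the real survey).
toy_survey = pd.DataFrame({
    "Internet usage": [
        "few hours a day", "most of the day", "few hours a day",
        "less than an hour a day", "few hours a day", "most of the day",
    ]
})

# Assumed ordering for illustration: least to most time online.
usage_order = [
    "less than an hour a day",
    "few hours a day",
    "most of the day",
]

# y= makes the bars horizontal; order= fixes the category order on that axis.
g = sns.catplot(y="Internet usage", data=toy_survey,
                kind="count", order=usage_order)

# For horizontal bars, the count is stored in each bar's width.
bar_widths = [bar.get_width() for bar in g.ax.patches]
print(bar_widths)
```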
Question
- Create column subplots based on "Age Category", which separates respondents into those that are younger than 21 vs. 21 and older.
# Create Age Category column by condition
import numpy as np
survey_data['Age Category'] = np.where(survey_data['Age'] >= 21, "21+", "Less than 21")
print(survey_data[['Age', 'Age Category']].head())
    Age  Age Category
0  20.0  Less than 21
1  19.0  Less than 21
2  20.0  Less than 21
3  22.0           21+
4  20.0  Less than 21
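One caveat with the np.where() approach: a comparison against a missing value evaluates to False, so any respondent with a missing Age would be labeled "Less than 21". Whether Age actually contains missing values is not shown in the output above, so the following is only a sketch of one possible safeguard, using made-up ages (toy_ages is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including one missing value; not taken from the survey.
toy_ages = pd.DataFrame({"Age": [20.0, 22.0, np.nan, 19.0]})

# NaN >= 21 is False, so np.where alone labels the missing age "Less than 21".
labels = np.where(toy_ages["Age"] >= 21, "21+", "Less than 21")

# Series.where keeps the label where Age is known and leaves a missing value elsewhere.
toy_ages["Age Category"] = pd.Series(labels, index=toy_ages.index).where(
    toy_ages["Age"].notna()
)

print(toy_ages)
```

Rows with a missing category are then left out of the count plot rather than being counted in the younger group.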
# Create column subplots based on age category
sns.catplot(y="Internet usage", data=survey_data,
            kind="count", col="Age Category")
# Show plot
plt.show()
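The two subplots show counts split by age group; the same breakdown can be tabulated with pd.crosstab() when exact numbers are needed. A minimal sketch, assuming a made-up toy_survey DataFrame with the two columns used above (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for survey_data with the two columns used in the plot.
toy_survey = pd.DataFrame({
    "Internet usage": [
        "few hours a day", "most of the day", "few hours a day",
        "few hours a day", "most of the day",
    ],
    "Age Category": [
        "Less than 21", "21+", "21+", "Less than 21", "Less than 21",
    ],
})

# Rows are usage categories, columns are age groups, cells are response counts.
counts_by_age = pd.crosstab(toy_survey["Internet usage"],
                            toy_survey["Age Category"])
print(counts_by_age)
```

Each column of the resulting table corresponds to one subplot, and each cell to one bar.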
All contents are from DataCamp.