Download the code: https://github.com/allenlu2009/colab/blob/master/dataframe_demo.ipynb
Python DataFrame
Create DataFrame
- Direct input
- Use dict
Method 1: Add the records one at a time.
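Method 1 might look like the sketch below. It assumes each record is a dict appended to a list before the DataFrame is built; the name `records` is illustrative, and the values mirror the table that follows.

```python
import pandas as pd

# Assumption: one dict per record, collected in a list one at a time.
records = []
records.append({"Name": "Allen", "Sex": "male", "Age": 33})
records.append({"Name": "Alice", "Sex": "female", "Age": 22})
records.append({"Name": "Bob", "Sex": "male", "Age": 11})

# Dict keys become column names; rows get the default 0..N-1 index.
df = pd.DataFrame(records)
print(df)
```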
| | Name | Sex | Age |
---|---|---|---|
0 | Allen | male | 33 |
1 | Alice | female | 22 |
2 | Bob | male | 11 |
Method 2: Add all the data at once.
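A minimal sketch of Method 2, assuming a single dict whose keys are column names and whose values are the full lists for each column. The name `data` is illustrative.

```python
import pandas as pd

# Assumption: each key maps to the complete list of values for that column.
data = {"Name": ["Allen", "Alice", "Bob"], "Age": [33, 22, 11]}

df = pd.DataFrame(data)
print(df)
```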
| | Name | Age |
---|---|---|
0 | Allen | 33 |
1 | Alice | 22 |
2 | Bob | 11 |
DataFrame attributes
- ndim: 2 for a 2D DataFrame; axis 0 => row; axis 1 => column
- shape: (number of rows, number of columns), not counting the index
- dtypes: the data type (object or int here) of each column
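The three attributes can be inspected as in this small sketch, which rebuilds the Method 2 table so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Allen", "Alice", "Bob"], "Age": [33, 22, 11]})

print(df.ndim)    # number of axes
print(df.shape)   # (rows, columns); the index is not counted
print(df.dtypes)  # one dtype per column
```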
Read CSV
Download a test CSV file from https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html. Pick biostats.csv.
Before reading the CSV in Colab, see the referenced Medium article on importing Google Drive.
- Use the read_csv function to read the CSV, and add skipinitialspace to strip the leading spaces!
- Two ways to call read_csv: (1) load the CSV file directly; (2) load from a URL
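To see why skipinitialspace matters, here is a hedged sketch that inlines a few rows laid out like biostats.csv (a space after each comma) so it runs offline. `CSV_TEXT` and the file name in the last comment are illustrative assumptions, not the notebook's own code.

```python
import io
import pandas as pd

# A few rows in the same layout as biostats.csv: a space follows each comma.
CSV_TEXT = '''"Name", "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Elly", "F", 30, 66, 124
'''

# Without skipinitialspace the leading spaces stay in the column names.
raw = pd.read_csv(io.StringIO(CSV_TEXT))
# With it, names and values come out clean.
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

print(list(raw.columns))
print(list(df.columns))

# read_csv also accepts a local path or a URL string in place of the buffer,
# e.g. pd.read_csv("biostats.csv", skipinitialspace=True)  # hypothetical path
```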
| | Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|---|
0 | Alex | M | 41 | 74 | 170 |
1 | Bert | M | 42 | 68 | 166 |
2 | Carl | M | 32 | 70 | 155 |
3 | Dave | M | 39 | 72 | 167 |
4 | Elly | F | 30 | 66 | 124 |
5 | Fran | F | 33 | 66 | 115 |
6 | Gwen | F | 26 | 64 | 121 |
7 | Hank | M | 30 | 71 | 158 |
8 | Ivan | M | 53 | 72 | 175 |
9 | Jake | M | 32 | 69 | 143 |
10 | Kate | F | 47 | 69 | 139 |
11 | Luke | M | 34 | 72 | 163 |
12 | Myra | F | 23 | 62 | 98 |
13 | Neil | M | 36 | 75 | 160 |
14 | Omar | M | 38 | 70 | 145 |
15 | Page | F | 31 | 67 | 135 |
16 | Quin | M | 29 | 71 | 176 |
17 | Ruth | F | 28 | 65 | 131 |
| | Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|---|
0 | Alex | M | 41 | 74 | 170 |
1 | Bert | M | 42 | 68 | 166 |
2 | Carl | M | 32 | 70 | 155 |
3 | Dave | M | 39 | 72 | 167 |
4 | Elly | F | 30 | 66 | 124 |
5 | Fran | F | 33 | 66 | 115 |
6 | Gwen | F | 26 | 64 | 121 |
7 | Hank | M | 30 | 71 | 158 |
8 | Ivan | M | 53 | 72 | 175 |
9 | Jake | M | 32 | 69 | 143 |
10 | Kate | F | 47 | 69 | 139 |
11 | Luke | M | 34 | 72 | 163 |
12 | Myra | F | 23 | 62 | 98 |
13 | Neil | M | 36 | 75 | 160 |
14 | Omar | M | 38 | 70 | 145 |
15 | Page | F | 31 | 67 | 135 |
16 | Quin | M | 29 | 71 | 176 |
17 | Ruth | F | 28 | 65 | 131 |
Basic Viewing Command
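The outputs in this section look like the results of head, tail, row slicing, and column selection. The sketch below shows those calls on a six-row sample of the data; the exact commands used in the notebook are an assumption.

```python
import pandas as pd

# Six rows of the biostats data, inlined so the sketch runs on its own.
df = pd.DataFrame({
    "Name": ["Alex", "Bert", "Carl", "Dave", "Elly", "Fran"],
    "Sex": ["M", "M", "M", "M", "F", "F"],
    "Age": [41, 42, 32, 39, 30, 33],
    "Height (in)": [74, 68, 70, 72, 66, 66],
    "Weight (lbs)": [170, 166, 155, 167, 124, 115],
})

print(df.head(3))          # first three rows
print(df.tail(3))          # last three rows
print(df[1:4])             # positional slice, end excluded: rows 1, 2, 3

cols = ["Name", "Age", "Sex"]
print(df[cols][1:4])       # pick and reorder columns, then slice rows
print(df.loc[1:4, cols])   # label slice, end included: rows 1, 2, 3, 4
```

Note the difference between the last two calls: the plain slice stops before the end position, while the loc slice includes the end label, which would explain a three-row result next to a four-row one.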
| | Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|---|
0 | Alex | M | 41 | 74 | 170 |
1 | Bert | M | 42 | 68 | 166 |
2 | Carl | M | 32 | 70 | 155 |
| | Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|---|
15 | Page | F | 31 | 67 | 135 |
16 | Quin | M | 29 | 71 | 176 |
17 | Ruth | F | 28 | 65 | 131 |
| | Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|---|
7 | Hank | M | 30 | 71 | 158 |
8 | Ivan | M | 53 | 72 | 175 |
9 | Jake | M | 32 | 69 | 143 |
| | Name | Age | Sex |
---|---|---|---|
7 | Hank | 30 | M |
8 | Ivan | 53 | M |
9 | Jake | 32 | M |
| | Name | Age | Sex |
---|---|---|---|
7 | Hank | 30 | M |
8 | Ivan | 53 | M |
9 | Jake | 32 | M |
10 | Kate | 47 | F |
Basic Index Operation
Index is a very useful key for a DataFrame. The default index is the row number, from 0 to N-1, where N is the number of rows. Besides the row number, a unique feature such as name, id, or phone number is commonly used as the index.
Turning a column into the index
- Method 1: Specify index_col directly in read_csv. The index numbers disappear and the Name column takes their place.
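A sketch of Method 1 on a few inlined rows (`CSV_TEXT` is an illustrative stand-in for the downloaded file):

```python
import io
import pandas as pd

CSV_TEXT = '''"Name", "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Elly", "F", 30, 66, 124
'''

# index_col names the column that should become the row index.
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True, index_col="Name")

print(df)
print(df.index)  # the Name labels now form the index
```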
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Alex | M | 41 | 74 | 170 |
Bert | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Elly | F | 30 | 66 | 124 |
Fran | F | 33 | 66 | 115 |
Gwen | F | 26 | 64 | 121 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Kate | F | 47 | 69 | 139 |
Luke | M | 34 | 72 | 163 |
Myra | F | 23 | 62 | 98 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Page | F | 31 | 67 | 135 |
Quin | M | 29 | 71 | 176 |
Ruth | F | 28 | 65 | 131 |
- df.index shows the labels in the index column.
- Calling reset_index brings back the numeric index.
| | Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|---|
0 | Alex | M | 41 | 74 | 170 |
1 | Bert | M | 42 | 68 | 166 |
2 | Carl | M | 32 | 70 | 155 |
3 | Dave | M | 39 | 72 | 167 |
4 | Elly | F | 30 | 66 | 124 |
5 | Fran | F | 33 | 66 | 115 |
6 | Gwen | F | 26 | 64 | 121 |
7 | Hank | M | 30 | 71 | 158 |
8 | Ivan | M | 53 | 72 | 175 |
9 | Jake | M | 32 | 69 | 143 |
10 | Kate | F | 47 | 69 | 139 |
11 | Luke | M | 34 | 72 | 163 |
12 | Myra | F | 23 | 62 | 98 |
13 | Neil | M | 36 | 75 | 160 |
14 | Omar | M | 38 | 70 | 145 |
15 | Page | F | 31 | 67 | 135 |
16 | Quin | M | 29 | 71 | 176 |
17 | Ruth | F | 28 | 65 | 131 |
Looking at df again, it has not changed. Many DataFrame functions keep the original df and create a new object, that is, inplace=False. To replace the original df, you must pass inplace=True!
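The difference can be checked with a short sketch on a three-row sample. The variable names `restored` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alex", "Bert", "Carl"], "Sex": ["M", "M", "M"], "Age": [41, 42, 32]}
).set_index("Name")

# Default inplace=False: a new DataFrame is returned and df is untouched.
restored = df.reset_index()
print(df.index.name)        # df still uses Name as its index

# inplace=True: df itself is modified and the call returns None.
result = df.reset_index(inplace=True)
print(result)
print(df.columns.tolist())  # Name is a regular column again
```

Reassigning the result, as in `df = df.reset_index()`, has the same effect as inplace=True.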
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Alex | M | 41 | 74 | 170 |
Bert | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Elly | F | 30 | 66 | 124 |
Fran | F | 33 | 66 | 115 |
Gwen | F | 26 | 64 | 121 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Kate | F | 47 | 69 | 139 |
Luke | M | 34 | 72 | 163 |
Myra | F | 23 | 62 | 98 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Page | F | 31 | 67 | 135 |
Quin | M | 29 | 71 | 176 |
Ruth | F | 28 | 65 | 131 |
| | Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|---|
0 | Alex | M | 41 | 74 | 170 |
1 | Bert | M | 42 | 68 | 166 |
2 | Carl | M | 32 | 70 | 155 |
3 | Dave | M | 39 | 72 | 167 |
4 | Elly | F | 30 | 66 | 124 |
5 | Fran | F | 33 | 66 | 115 |
6 | Gwen | F | 26 | 64 | 121 |
7 | Hank | M | 30 | 71 | 158 |
8 | Ivan | M | 53 | 72 | 175 |
9 | Jake | M | 32 | 69 | 143 |
10 | Kate | F | 47 | 69 | 139 |
11 | Luke | M | 34 | 72 | 163 |
12 | Myra | F | 23 | 62 | 98 |
13 | Neil | M | 36 | 75 | 160 |
14 | Omar | M | 38 | 70 | 145 |
15 | Page | F | 31 | 67 | 135 |
16 | Quin | M | 29 | 71 | 176 |
17 | Ruth | F | 28 | 65 | 131 |
What happens if reset_index() is called once more? Here the default inplace=False is used. An extra index column appears.
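A small sketch of that effect. The `drop=True` variant is not mentioned in the text above; it is a standard reset_index parameter included here as one possible way to avoid the extra column.

```python
import pandas as pd

# df already has the default 0..N-1 index.
df = pd.DataFrame({"Name": ["Alex", "Bert", "Carl"], "Age": [41, 42, 32]})

# The old numeric index is kept as a new column named "index".
again = df.reset_index()
print(again)

# drop=True discards the old index instead of turning it into a column.
clean = df.reset_index(drop=True)
print(clean)
```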
| | index | Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|---|---|
0 | 0 | Alex | M | 41 | 74 | 170 |
1 | 1 | Bert | M | 42 | 68 | 166 |
2 | 2 | Carl | M | 32 | 70 | 155 |
3 | 3 | Dave | M | 39 | 72 | 167 |
4 | 4 | Elly | F | 30 | 66 | 124 |
5 | 5 | Fran | F | 33 | 66 | 115 |
6 | 6 | Gwen | F | 26 | 64 | 121 |
7 | 7 | Hank | M | 30 | 71 | 158 |
8 | 8 | Ivan | M | 53 | 72 | 175 |
9 | 9 | Jake | M | 32 | 69 | 143 |
10 | 10 | Kate | F | 47 | 69 | 139 |
11 | 11 | Luke | M | 34 | 72 | 163 |
12 | 12 | Myra | F | 23 | 62 | 98 |
13 | 13 | Neil | M | 36 | 75 | 160 |
14 | 14 | Omar | M | 38 | 70 | 145 |
15 | 15 | Page | F | 31 | 67 | 135 |
16 | 16 | Quin | M | 29 | 71 | 176 |
17 | 17 | Ruth | F | 28 | 65 | 131 |
| | Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|---|
0 | Alex | M | 41 | 74 | 170 |
1 | Bert | M | 42 | 68 | 166 |
2 | Carl | M | 32 | 70 | 155 |
3 | Dave | M | 39 | 72 | 167 |
4 | Elly | F | 30 | 66 | 124 |
5 | Fran | F | 33 | 66 | 115 |
6 | Gwen | F | 26 | 64 | 121 |
7 | Hank | M | 30 | 71 | 158 |
8 | Ivan | M | 53 | 72 | 175 |
9 | Jake | M | 32 | 69 | 143 |
10 | Kate | F | 47 | 69 | 139 |
11 | Luke | M | 34 | 72 | 163 |
12 | Myra | F | 23 | 62 | 98 |
13 | Neil | M | 36 | 75 | 160 |
14 | Omar | M | 38 | 70 | 145 |
15 | Page | F | 31 | 67 | 135 |
16 | Quin | M | 29 | 71 | 176 |
17 | Ruth | F | 28 | 65 | 131 |
- Method 2: Use set_index()
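A minimal sketch of Method 2 on a three-row sample; `indexed` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alex", "Bert", "Carl"], "Sex": ["M", "M", "M"], "Age": [41, 42, 32]})

# set_index returns a new DataFrame whose index is the Name column.
indexed = df.set_index("Name")
print(indexed)
```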
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Alex | M | 41 | 74 | 170 |
Bert | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Elly | F | 30 | 66 | 124 |
Fran | F | 33 | 66 | 115 |
Gwen | F | 26 | 64 | 121 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Kate | F | 47 | 69 | 139 |
Luke | M | 34 | 72 | 163 |
Myra | F | 23 | 62 | 98 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Page | F | 31 | 67 | 135 |
Quin | M | 29 | 71 | 176 |
Ruth | F | 28 | 65 | 131 |
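loc[]
Using loc[] with an index label is a very convenient way to retrieve data.
With a numeric index, use df.loc[0], df.loc[3], etc. (plain df[0] looks for a column named 0, not a row).
With another column as the index, e.g. Name, df[2] or df["Hank"] is wrong! You must use df.loc['Hank']
or df.loc[['Hank', 'Ruth', 'Page']]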
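The following sketch runs those lookups on a six-row sample indexed by Name, so the labels differ from the 'Hank', 'Ruth', 'Page' example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bert", "Carl", "Dave", "Elly", "Fran"],
    "Sex": ["M", "M", "M", "M", "F", "F"],
    "Age": [41, 42, 32, 39, 30, 33],
}).set_index("Name")

row = df.loc["Elly"]                      # one label -> a Series
rows = df.loc[["Elly", "Alex", "Dave"]]   # list of labels -> rows in that order
part = df.loc[:, ["Sex", "Age"]]          # all rows, selected columns
print(row, rows, part, sep="\n")

# Plain brackets look up columns, so a row label raises KeyError.
try:
    df["Elly"]
except KeyError:
    print("df['Elly'] looks for a column named 'Elly'")
```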
Name | Sex | Age |
---|---|---|
Alex | M | 41 |
Bert | M | 42 |
Carl | M | 32 |
Dave | M | 39 |
Elly | F | 30 |
Fran | F | 33 |
Gwen | F | 26 |
Hank | M | 30 |
Ivan | M | 53 |
Jake | M | 32 |
Kate | F | 47 |
Luke | M | 34 |
Myra | F | 23 |
Neil | M | 36 |
Omar | M | 38 |
Page | F | 31 |
Quin | M | 29 |
Ruth | F | 28 |
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Hank | M | 30 | 71 | 158 |
Ruth | F | 28 | 65 | 131 |
Page | F | 31 | 67 | 135 |
loc[] can also take a row label and a column label to return the corresponding element, which may look like an odd usage.
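A short sketch of the row-and-column form, on a small sample indexed by Name:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Elly", "Fran"],
    "Age": [41, 30, 33],
    "Weight (lbs)": [170, 124, 115],
}).set_index("Name")

# Row label first, column label second -> a single element.
age = df.loc["Elly", "Age"]
print(age)

# Lists on both sides -> a smaller DataFrame.
pair = df.loc[["Elly", "Fran"], ["Age", "Weight (lbs)"]]
print(pair)
```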
iloc[]
Even when a column serves as the index, iloc[] can still retrieve data by integer position.
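As a sketch, the three common forms of iloc[] on a six-row sample indexed by Name (the positions are chosen to fit the sample, not copied from the notebook):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bert", "Carl", "Dave", "Elly", "Fran"],
    "Sex": ["M", "M", "M", "M", "F", "F"],
    "Age": [41, 42, 32, 39, 30, 33],
}).set_index("Name")

first = df.iloc[0]           # single position -> a Series
middle = df.iloc[1:4]        # positions 1..3, end excluded
picked = df.iloc[[1, 4, 5]]  # explicit list of positions
print(first, middle, picked, sep="\n")
```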
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Bert | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Elly | F | 30 | 66 | 124 |
Fran | F | 33 | 66 | 115 |
Gwen | F | 26 | 64 | 121 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Bert | M | 42 | 68 | 166 |
Elly | F | 30 | 66 | 124 |
Gwen | F | 26 | 64 | 121 |
Sorting
There are two kinds of sorting:
- sort_index()
- sort_values()
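Both kinds of sorting are sketched below on a six-row sample. Sorting by Age is an assumption based on the second output table, which is ordered by that column.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bert", "Carl", "Dave", "Elly", "Fran"],
    "Sex": ["M", "M", "M", "M", "F", "F"],
    "Age": [41, 42, 32, 39, 30, 33],
}).set_index("Name")

by_age = df.sort_values("Age")                        # ascending by column value
by_age_desc = df.sort_values("Age", ascending=False)  # descending
by_name = by_age.sort_index()                         # back to index (Name) order
print(by_age, by_age_desc, by_name, sep="\n")
```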
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Alex | M | 41 | 74 | 170 |
Bert | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Elly | F | 30 | 66 | 124 |
Fran | F | 33 | 66 | 115 |
Gwen | F | 26 | 64 | 121 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Kate | F | 47 | 69 | 139 |
Luke | M | 34 | 72 | 163 |
Myra | F | 23 | 62 | 98 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Page | F | 31 | 67 | 135 |
Quin | M | 29 | 71 | 176 |
Ruth | F | 28 | 65 | 131 |
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Myra | F | 23 | 62 | 98 |
Gwen | F | 26 | 64 | 121 |
Ruth | F | 28 | 65 | 131 |
Quin | M | 29 | 71 | 176 |
Elly | F | 30 | 66 | 124 |
Hank | M | 30 | 71 | 158 |
Page | F | 31 | 67 | 135 |
Carl | M | 32 | 70 | 155 |
Jake | M | 32 | 69 | 143 |
Fran | F | 33 | 66 | 115 |
Luke | M | 34 | 72 | 163 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Dave | M | 39 | 72 | 167 |
Alex | M | 41 | 74 | 170 |
Bert | M | 42 | 68 | 166 |
Kate | F | 47 | 69 | 139 |
Ivan | M | 53 | 72 | 175 |
Rename and Drop Column(s) and Index(s)
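The tables in this section are consistent with renaming columns, renaming index labels, dropping columns, and dropping rows. Here is a hedged sketch of those four steps on a six-row sample; the variable names and the exact arguments are assumptions rather than the notebook's own code.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bert", "Carl", "Dave", "Elly", "Fran"],
    "Sex": ["M", "M", "M", "M", "F", "F"],
    "Age": [41, 42, 32, 39, 30, 33],
    "Height (in)": [74, 68, 70, 72, 66, 66],
    "Weight (lbs)": [170, 166, 155, 167, 124, 115],
}).set_index("Name")

# Rename columns, then index labels; each call returns a new DataFrame.
renamed = df.rename(columns={"Height (in)": "Height", "Weight (lbs)": "Weight"})
renamed = renamed.rename(index={"Alex": "Allen", "Bert": "Bob"})

slim = renamed.drop(columns=["Sex", "Weight"])   # drop columns
fewer = renamed.drop(index=["Allen", "Fran"])    # drop rows by label
print(renamed, slim, fewer, sep="\n")
```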
Name | Sex | Age | Height | Weight |
---|---|---|---|---|
Alex | M | 41 | 74 | 170 |
Bert | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Elly | F | 30 | 66 | 124 |
Fran | F | 33 | 66 | 115 |
Gwen | F | 26 | 64 | 121 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Kate | F | 47 | 69 | 139 |
Luke | M | 34 | 72 | 163 |
Myra | F | 23 | 62 | 98 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Page | F | 31 | 67 | 135 |
Quin | M | 29 | 71 | 176 |
Ruth | F | 28 | 65 | 131 |
Name | Sex | Age | Height | Weight |
---|---|---|---|---|
Allen | M | 41 | 74 | 170 |
Bob | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Elly | F | 30 | 66 | 124 |
Fran | F | 33 | 66 | 115 |
Gwen | F | 26 | 64 | 121 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Kate | F | 47 | 69 | 139 |
Luke | M | 34 | 72 | 163 |
Myra | F | 23 | 62 | 98 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Page | F | 31 | 67 | 135 |
Quin | M | 29 | 71 | 176 |
Ruth | F | 28 | 65 | 131 |
Name | Age | Height |
---|---|---|
Allen | 41 | 74 |
Bob | 42 | 68 |
Carl | 32 | 70 |
Dave | 39 | 72 |
Elly | 30 | 66 |
Fran | 33 | 66 |
Gwen | 26 | 64 |
Hank | 30 | 71 |
Ivan | 53 | 72 |
Jake | 32 | 69 |
Kate | 47 | 69 |
Luke | 34 | 72 |
Myra | 23 | 62 |
Neil | 36 | 75 |
Omar | 38 | 70 |
Page | 31 | 67 |
Quin | 29 | 71 |
Ruth | 28 | 65 |
Name | Sex | Age | Height | Weight |
---|---|---|---|---|
Bob | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Elly | F | 30 | 66 | 124 |
Fran | F | 33 | 66 | 115 |
Gwen | F | 26 | 64 | 121 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Kate | F | 47 | 69 | 139 |
Luke | M | 34 | 72 | 163 |
Myra | F | 23 | 62 | 98 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Page | F | 31 | 67 | 135 |
Quin | M | 29 | 71 | 176 |
Advanced Techniques
Multiple Index (MultiIndex)
This is a very useful technique: call set_index with a list of keys.
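A sketch of a two-level index on a six-row sample. The key order ["Sex", "Name"] follows the later tables; the lookups at the end are illustrative additions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bert", "Carl", "Dave", "Elly", "Fran"],
    "Sex": ["M", "M", "M", "M", "F", "F"],
    "Age": [41, 42, 32, 39, 30, 33],
})

# A list of keys builds a MultiIndex: Sex is the outer level, Name the inner.
mi = df.set_index(["Sex", "Name"])

# Sorting the index groups all rows of each outer label together.
mi_sorted = mi.sort_index()
print(mi_sorted)

females = mi_sorted.loc["F"]                     # outer label -> sub-table indexed by Name
elly_age = mi_sorted.loc[("F", "Elly"), "Age"]   # full key plus column -> one element
print(females)
print(elly_age)
```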
| | Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|---|
0 | Alex | M | 41 | 74 | 170 |
1 | Bert | M | 42 | 68 | 166 |
2 | Carl | M | 32 | 70 | 155 |
3 | Dave | M | 39 | 72 | 167 |
4 | Elly | F | 30 | 66 | 124 |
5 | Fran | F | 33 | 66 | 115 |
6 | Gwen | F | 26 | 64 | 121 |
7 | Hank | M | 30 | 71 | 158 |
8 | Ivan | M | 53 | 72 | 175 |
9 | Jake | M | 32 | 69 | 143 |
10 | Kate | F | 47 | 69 | 139 |
11 | Luke | M | 34 | 72 | 163 |
12 | Myra | F | 23 | 62 | 98 |
13 | Neil | M | 36 | 75 | 160 |
14 | Omar | M | 38 | 70 | 145 |
15 | Page | F | 31 | 67 | 135 |
16 | Quin | M | 29 | 71 | 176 |
17 | Ruth | F | 28 | 65 | 131 |
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Alex | M | 41 | 74 | 170 |
Bert | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Elly | F | 30 | 66 | 124 |
Fran | F | 33 | 66 | 115 |
Gwen | F | 26 | 64 | 121 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Kate | F | 47 | 69 | 139 |
Luke | M | 34 | 72 | 163 |
Myra | F | 23 | 62 | 98 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Page | F | 31 | 67 | 135 |
Quin | M | 29 | 71 | 176 |
Ruth | F | 28 | 65 | 131 |
Sex | Name | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
M | Alex | 41 | 74 | 170 |
M | Bert | 42 | 68 | 166 |
M | Carl | 32 | 70 | 155 |
M | Dave | 39 | 72 | 167 |
F | Elly | 30 | 66 | 124 |
F | Fran | 33 | 66 | 115 |
F | Gwen | 26 | 64 | 121 |
M | Hank | 30 | 71 | 158 |
M | Ivan | 53 | 72 | 175 |
M | Jake | 32 | 69 | 143 |
F | Kate | 47 | 69 | 139 |
M | Luke | 34 | 72 | 163 |
F | Myra | 23 | 62 | 98 |
M | Neil | 36 | 75 | 160 |
M | Omar | 38 | 70 | 145 |
F | Page | 31 | 67 | 135 |
M | Quin | 29 | 71 | 176 |
F | Ruth | 28 | 65 | 131 |
Sex | Name | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
F | Elly | 30 | 66 | 124 |
F | Fran | 33 | 66 | 115 |
F | Gwen | 26 | 64 | 121 |
F | Kate | 47 | 69 | 139 |
F | Myra | 23 | 62 | 98 |
F | Page | 31 | 67 | 135 |
F | Ruth | 28 | 65 | 131 |
M | Alex | 41 | 74 | 170 |
M | Bert | 42 | 68 | 166 |
M | Carl | 32 | 70 | 155 |
M | Dave | 39 | 72 | 167 |
M | Hank | 30 | 71 | 158 |
M | Ivan | 53 | 72 | 175 |
M | Jake | 32 | 69 | 143 |
M | Luke | 34 | 72 | 163 |
M | Neil | 36 | 75 | 160 |
M | Omar | 38 | 70 | 145 |
M | Quin | 29 | 71 | 176 |
Groupby Command
Groupby comes from SQL syntax. It groups the rows by the value of one field, which makes lookups easier.
The SQL GROUP BY Statement
The GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns.
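In pandas the counterpart is groupby. The sketch below groups a six-row sample by Sex; the inspection calls (size, groups, get_group) are one plausible way to explore the result and are not taken from the notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bert", "Carl", "Dave", "Elly", "Fran"],
    "Sex": ["M", "M", "M", "M", "F", "F"],
    "Age": [41, 42, 32, 39, 30, 33],
}).set_index("Name")

grouped = df.groupby("Sex")

print(grouped.size())    # number of rows in each group
print(grouped.groups)    # group label -> index labels of its rows

males = grouped.get_group("M")  # the rows belonging to one group
print(males)
```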
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Alex | M | 41 | 74 | 170 |
Bert | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Elly | F | 30 | 66 | 124 |
Fran | F | 33 | 66 | 115 |
Gwen | F | 26 | 64 | 121 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Kate | F | 47 | 69 | 139 |
Luke | M | 34 | 72 | 163 |
Myra | F | 23 | 62 | 98 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Page | F | 31 | 67 | 135 |
Quin | M | 29 | 71 | 176 |
Ruth | F | 28 | 65 | 131 |
Name | Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|---|
Alex | M | 41 | 74 | 170 |
Bert | M | 42 | 68 | 166 |
Carl | M | 32 | 70 | 155 |
Dave | M | 39 | 72 | 167 |
Hank | M | 30 | 71 | 158 |
Ivan | M | 53 | 72 | 175 |
Jake | M | 32 | 69 | 143 |
Luke | M | 34 | 72 | 163 |
Neil | M | 36 | 75 | 160 |
Omar | M | 38 | 70 | 145 |
Quin | M | 29 | 71 | 176 |
Groupby Operation
After grouping, you can run aggregate operations such as sum(), mean(), max(), and min().
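The four aggregates can be sketched on a six-row sample as follows. Name is used as the index so that only numeric columns are aggregated; the totals therefore differ from the full 18-row tables.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bert", "Carl", "Dave", "Elly", "Fran"],
    "Sex": ["M", "M", "M", "M", "F", "F"],
    "Age": [41, 42, 32, 39, 30, 33],
    "Height (in)": [74, 68, 70, 72, 66, 66],
    "Weight (lbs)": [170, 166, 155, 167, 124, 115],
}).set_index("Name")

grouped = df.groupby("Sex")

totals = grouped.sum()    # per-group sum of each numeric column
means = grouped.mean()    # per-group average
maxima = grouped.max()    # per-group maximum
minima = grouped.min()    # per-group minimum
print(totals, means, maxima, minima, sep="\n")
```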
Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|
F | 218 | 459 | 863 |
M | 406 | 784 | 1778 |
Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|
F | 31.142857 | 65.571429 | 123.285714 |
M | 36.909091 | 71.272727 | 161.636364 |
Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|
F | 47 | 69 | 139 |
M | 53 | 75 | 176 |
Sex | Age | Height (in) | Weight (lbs) |
---|---|---|---|
F | 23 | 62 | 98 |
M | 29 | 68 | 143 |
Cleaning Data with NaN
Detecting NaN
- isnull()
- notnull()
Handling NaN
- dropna()
- fillna()
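A minimal sketch of the four calls. The missing values are made up for illustration (the biostats data has none), and filling with 0 is only one possible choice.

```python
import numpy as np
import pandas as pd

# Illustrative data: the NaN holes are invented for this sketch.
df = pd.DataFrame({
    "Name": ["Alex", "Bert", "Carl"],
    "Age": [41, np.nan, 32],
    "Weight (lbs)": [170, 166, np.nan],
})

print(df.isnull())    # True where a value is missing
print(df.notnull())   # the opposite mask

complete = df.dropna()   # keep only rows without any NaN
filled = df.fillna(0)    # replace every NaN with a fixed value
print(complete)
print(filled)
```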
Plot
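An important feature of DataFrame is that it can use the plotting functions of matplotlib.pyplot to visualize data!
There are two ways: (1) call df.plot directly; (2) use pyplot's plot.
(1) is a quick way to plot.
(2) gives access to all of pyplot's features.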
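Both ways are sketched below on a six-row sample. The chart types, titles, and the headless "Agg" backend are assumptions made so the example runs without a display; in a notebook the backend line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bert", "Carl", "Dave", "Elly", "Fran"],
    "Age": [41, 42, 32, 39, 30, 33],
    "Height (in)": [74, 68, 70, 72, 66, 66],
    "Weight (lbs)": [170, 166, 155, 167, 124, 115],
}).set_index("Name")

# (1) Quick way: df.plot draws straight from the DataFrame and returns the Axes.
ax1 = df.plot(y="Age", kind="bar", title="Age by name")

# (2) pyplot way: create the figure yourself and control every detail.
fig, ax2 = plt.subplots()
ax2.scatter(df["Height (in)"], df["Weight (lbs)"])
ax2.set_xlabel("Height (in)")
ax2.set_ylabel("Weight (lbs)")
```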