Python Project Management - Testing

[@liaoPythonImport2020] 指出 import 常常遇到的問題

[@loongProjectStructure2021] Python project structure 寫的非常好,本文直接引用作爲自己參考。

testing:

由於 Python 簡單易用,很多開始使用 Python 的人都是從一個 script 檔案開始,逐步形成多個 Python 檔案組成的程序。

在脫離 Python 幼幼班準備建立稍大型的專案的時候,學習如何組織化 Python 專案是一大要點。分成三個部分:

  1. 檔案放在同一個 directory 形成一個 package 打包。對應下面的簡單結構。
  2. 不同的 sub-packages 再用一個 (src) directory. 之後一起打包。對應下面的 src 結構的 src directory.
  3. Testing 非常重要,但是一般放在分開的 tests directory, 避免被打包。對應下面的 src 結構的 tests directory.

這裏討論 Testing.

Unittest

Pytest 的特點

  • 會自動辨識 tests directory, test_xxx.py, 以及 def test_xxx module!

  • 使用 assert 語法

  • 可以直接在 command window 執行,或是在 vs code 執行。

  • 如果在 command window 執行 python program, 例如 pytest:

    在 PC Windows 10 PowerShell (PS), 必須這樣設定 PYTHONPATH:

    1
     $env:PYTHONPATH = ".\src"
    

    在 Mac OS, 可以這樣設定 PYTHONPATH:

    1
     $export PYTHONPATH='./src'
    
  • 如果在 vs code 執行 python program, 有兩種設定方式

  1. 直接在 launch.json 設定如下。此處是相當于設定 PYTHONPATH = “./src” 也就是 VS Code {workspaceRoot/src} folder.
  2. 第二種方法是 VS code default 會 load {workspaceRoot}/.env. 也可以用 launch.json 的 envFile 設定 path (這裡也是 ./.env)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "env": {"PYTHONPATH": "./src"},
            //"env": {"PYTHONPATH":"${workspaceRoot}/src"},  // same as "./src"
            //"envFile": "${workspaceRoot}/.env",
            //"python": "${command:python.interpreterPath}",
            "justMyCode": true
        }
    ]
}

.env file content 就只有一行:

1
PYTHONPATH=./src

好像第二種方法比較不會有問題?

phone_benchmark + pytest 爲例

首先看 tree structure:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
phone_benchmark
├── data
│   ├── antutu.html
│   ├── geekbench.html
│   └── gfxbench.html
├── db
│   └── benchmark.db
├── src
│   └── phone_benchmark
│       ├── __init__.py
│       ├── gfxcrawler.py
│       └── gfxsql.py
└── tests
    ├── __init__.py
    └── test_gfxsql.py

src/phone_benchmark/gfxsql.py

先看 gfxsql.py 目的是輸入 gfxbench.html, parse and output to benchmark.db.

原則上每一個 function, 包含main, 都可以被測試。不過一般還是以主要的 function 爲主。

例如 parse_gfxbench_html().

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
import click
from bs4 import BeautifulSoup
import re

'''import sqlite3'''
import sqlite3

# process title to remove space and special characters including return for a legal filename
def process_title(title):
    '''replace space, hyfen, and other special characters with underscore using regular expression'''
    title = re.sub(r'[\r\n\t\s\(\)-/\:*?<>|]', '_', title)
    '''remove leading and continuous underscores'''
    title = re.sub(r'^_+', '', title)
    '''split the title by underscore and return the first element'''
    titleLst = title.split('_')
    return titleLst[0]

def parse_gfxbench_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    '''Get title'''
    gfx_title = process_title( soup.title.string )
    '''extract the phone name'''
    phone_lst = []
    lst_all= soup.find_all('li', class_='name') 
    for item in lst_all:
        linkname = list(item.stripped_strings)
        phone_lst.append(linkname[0])
    '''extract the phone score'''
    score_lst = []
    fps_lst = []
    lst_all= soup.find_all('li', class_='points')
    for item in lst_all:
        linkname = list(item.stripped_strings)
        score_lst.append(linkname[0])
        '''extract the number from the string'''
        fps_lst.append(float(re.findall(r'\d+\.?\d*', linkname[2])[0]))
    '''extract the gpu name'''
    gpu_lst = []
    lst_all= soup.find_all('li', class_='gpu-info')
    for item in lst_all:
        linkname = list(item.stripped_strings)
        '''remove the trademark symbol'''
        gpu_lst.append(linkname[0].replace('™', ''))
    '''extract the api name'''
    api_lst = []
    lst_all= soup.find_all('li', class_='api')
    for item in lst_all:
        linkname = list(item.stripped_strings)
        api_lst.append(linkname[0])
    '''extract date'''
    date_lst = []
    lst_all= soup.find_all('li', class_='date')
    for item in lst_all:
        linkname = list(item.stripped_strings)
        date_lst.append(linkname[0])    
    '''clean the gfx data and convert to database format'''
    (gfx_field, gfx_records) = clean_gfx_data(phone_lst, gpu_lst, api_lst, date_lst, score_lst, fps_lst)
    return (gfx_title, gfx_field, gfx_records)

def clean_gfx_data(phone_lst, gpu_lst, api_lst, date_lst, score_lst, fps_lst):
    '''unify the gfx data format'''
    gfx_itemname = ['Phone', 'GPU', 'API', 'DATE', 'SCORE', 'FPS']
    gfx_item = []
    for i in range(len(phone_lst)):
        gfx_item.append([phone_lst[i], gpu_lst[i], api_lst[i], date_lst[i], score_lst[i], fps_lst[i]])
    return (gfx_itemname, gfx_item)
    

@click.command()
def main():
    with open(r'./data/gfxbench.html','r',encoding="utf-8") as f:
        gfxbench_html = f.read()
    f.close()

    (gfx_title, gfx_field, gfx_records) = parse_gfxbench_html(gfxbench_html)

    db_table = gfx_title
    create_db_table(db_table)
    click.echo(gfx_title)

if __name__ == '__main__':
    main()

tests/test_gfxsql.py

我們看 test_gfxsql.py

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
from click.testing import CliRunner
from phone_benchmark import gfxsql


def test_main():
    runner = CliRunner()
    result = runner.invoke(gfxsql.main)
    assert 'GFXBench' in result.output

def test_parse_gfxbench_html():
    with open(r'./data/gfxbench.html','r',encoding="utf-8") as f:
        gfxbench_html = f.read()
    f.close()
    (gfx_title, gfx_field, gfx_records) = gfxsql.parse_gfxbench_html(gfxbench_html)
    assert gfx_title == 'GFXBench'
    assert gfx_field == ['Phone', 'GPU', 'API', 'DATE', 'SCORE', 'FPS']
    assert len(gfx_records) == 20

if __name__ == '__main__':
    test_main()
    test_parse_gfxbench_html()

直接在 command window 執行 pytest : evoke tests\test_gfxsql.py

其中的兩項 test: test_main() and test_parse_gfxbench_html()

1
2
3
4
5
6
7
8
(base) PS C:\Users\allen\OneDrive\ml_code\work\phone_benchmark_prj> pytest
========================= test session starts =============================
platform win32 -- Python 3.8.5, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: C:\Users\allen\OneDrive\ml_code\work\phone_benchmark_prj
collected 2 items

tests\test_gfxsql.py ..                                                                                          [100%]
========================= 2 passed, 0 warning in 0.33s ======================