Python Project Management - Testing

[@liaoPythonImport2020] 指出 import 常常遇到的問題

[@loongProjectStructure2021] Python project structure 寫的非常好，本文直接引用作爲自己參考。

testing:

由於 Python 簡單易用，很多開始使用 Python 的人都是從一個 script 檔案開始，逐步形成多個 Python 檔案組成的程序。

在脫離 Python 幼幼班準備建立稍大型的專案的時候，學習如何組織化 Python 專案是一大要點。分成三個部分：

檔案放在同一個 directory 形成一個 package 打包。對應下面的簡單結構。
不同的 sub-packages 再用一個 (src) directory. 之後一起打包。對應下面的 src 結構的 src directory.
Testing 非常重要，但是一般放在分開的 tests directory, 避免被打包。對應下面的 src 結構的 tests directory.

這裏討論 Testing.

Unittest

Pytest 的特點

會自動辨識 tests directory, test_xxx.py, 以及 def test_xxx module!
使用 assert 語法
可以直接在 command window 執行，或是在 vs code 執行。
如果在 command window 執行 python program, 例如 pytest:

在 PC Windows 10 PowerShell (PS), 必須這樣設定 PYTHONPATH:
1
$env:PYTHONPATH = ".\src"
在 Mac OS, 可以這樣設定 PYTHONPATH:
1
$export PYTHONPATH='./src'
如果在 vs code 執行 python program, 有兩種設定方式

直接在 launch.json 設定如下。此處是相當于設定 PYTHONPATH = “./src” 也就是 VS Code {workspaceRoot/src} folder.
第二種方法是 VS code default 會 load {workspaceRoot}/.env. 也可以用 launch.json 的 envFile 設定 path (這裡也是 ./.env)

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "env": {"PYTHONPATH": "./src"},
            //"env": {"PYTHONPATH":"${workspaceRoot}/src"},  // same as "./src"
            //"envFile": "${workspaceRoot}/.env",
            //"python": "${command:python.interpreterPath}",
            "justMyCode": true
        }
    ]
}

.env file content 就只有一行：

1	`PYTHONPATH=./src`

好像第二種方法比較不會有問題?

phone_benchmark + pytest 爲例

首先看 tree structure:

phone_benchmark
├── data
│   ├── antutu.html
│   ├── geekbench.html
│   └── gfxbench.html
├── db
│   └── benchmark.db
├── src
│   └── phone_benchmark
│       ├── __init__.py
│       ├── gfxcrawler.py
│       └── gfxsql.py
└── tests
    ├── __init__.py
    └── test_gfxsql.py

src/phone_benchmark/gfxsql.py

先看 gfxsql.py 目的是輸入 gfxbench.html, parse and output to benchmark.db.

原則上每一個 function, 包含main, 都可以被測試。不過一般還是以主要的 function 爲主。

例如 parse_gfxbench_html().

import click
from bs4 import BeautifulSoup
import re

'''import sqlite3'''
import sqlite3

# process title to remove space and special characters including return for a legal filename
def process_title(title):
    '''replace space, hyfen, and other special characters with underscore using regular expression'''
    title = re.sub(r'[\r\n\t\s\(\)-/\:*?<>|]', '_', title)
    '''remove leading and continuous underscores'''
    title = re.sub(r'^_+', '', title)
    '''split the title by underscore and return the first element'''
    titleLst = title.split('_')
    return titleLst[0]

def parse_gfxbench_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    '''Get title'''
    gfx_title = process_title( soup.title.string )
    '''extract the phone name'''
    phone_lst = []
    lst_all= soup.find_all('li', class_='name') 
    for item in lst_all:
        linkname = list(item.stripped_strings)
        phone_lst.append(linkname[0])
    '''extract the phone score'''
    score_lst = []
    fps_lst = []
    lst_all= soup.find_all('li', class_='points')
    for item in lst_all:
        linkname = list(item.stripped_strings)
        score_lst.append(linkname[0])
        '''extract the number from the string'''
        fps_lst.append(float(re.findall(r'\d+\.?\d*', linkname[2])[0]))
    '''extract the gpu name'''
    gpu_lst = []
    lst_all= soup.find_all('li', class_='gpu-info')
    for item in lst_all:
        linkname = list(item.stripped_strings)
        '''remove the trademark symbol'''
        gpu_lst.append(linkname[0].replace('™', ''))
    '''extract the api name'''
    api_lst = []
    lst_all= soup.find_all('li', class_='api')
    for item in lst_all:
        linkname = list(item.stripped_strings)
        api_lst.append(linkname[0])
    '''extract date'''
    date_lst = []
    lst_all= soup.find_all('li', class_='date')
    for item in lst_all:
        linkname = list(item.stripped_strings)
        date_lst.append(linkname[0])    
    '''clean the gfx data and convert to database format'''
    (gfx_field, gfx_records) = clean_gfx_data(phone_lst, gpu_lst, api_lst, date_lst, score_lst, fps_lst)
    return (gfx_title, gfx_field, gfx_records)

def clean_gfx_data(phone_lst, gpu_lst, api_lst, date_lst, score_lst, fps_lst):
    '''unify the gfx data format'''
    gfx_itemname = ['Phone', 'GPU', 'API', 'DATE', 'SCORE', 'FPS']
    gfx_item = []
    for i in range(len(phone_lst)):
        gfx_item.append([phone_lst[i], gpu_lst[i], api_lst[i], date_lst[i], score_lst[i], fps_lst[i]])
    return (gfx_itemname, gfx_item)
    

@click.command()
def main():
    with open(r'./data/gfxbench.html','r',encoding="utf-8") as f:
        gfxbench_html = f.read()
    f.close()

    (gfx_title, gfx_field, gfx_records) = parse_gfxbench_html(gfxbench_html)

    db_table = gfx_title
    create_db_table(db_table)
    click.echo(gfx_title)

if __name__ == '__main__':
    main()

tests/test_gfxsql.py

我們看 test_gfxsql.py

from click.testing import CliRunner
from phone_benchmark import gfxsql


def test_main():
    runner = CliRunner()
    result = runner.invoke(gfxsql.main)
    assert 'GFXBench' in result.output

def test_parse_gfxbench_html():
    with open(r'./data/gfxbench.html','r',encoding="utf-8") as f:
        gfxbench_html = f.read()
    f.close()
    (gfx_title, gfx_field, gfx_records) = gfxsql.parse_gfxbench_html(gfxbench_html)
    assert gfx_title == 'GFXBench'
    assert gfx_field == ['Phone', 'GPU', 'API', 'DATE', 'SCORE', 'FPS']
    assert len(gfx_records) == 20

if __name__ == '__main__':
    test_main()
    test_parse_gfxbench_html()

直接在 command window 執行 pytest : evoke tests\test_gfxsql.py

其中的兩項 test: test_main() and test_parse_gfxbench_html()

(base) PS C:\Users\allen\OneDrive\ml_code\work\phone_benchmark_prj> pytest
========================= test session starts =============================
platform win32 -- Python 3.8.5, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: C:\Users\allen\OneDrive\ml_code\work\phone_benchmark_prj
collected 2 items

tests\test_gfxsql.py ..                                                                                          [100%]
========================= 2 passed, 0 warning in 0.33s ======================