Building a Transcription App by Calling Whisper from Python

Introduction
The program looks like this
3. Add the app to the Windows "Send to" menu
Summary

Introduction

Last time, I tried out Whisper on Google Colab.
This time, to make it easier to use, I will go as far as packaging it as an EXE.

The program looks like this

I arbitrarily named the file "whisper_demo.py".
Feel free to modify it to suit your needs.

import sys
import whisper
import ffmpeg
import re


def whisper_mp3(attach):
    for i, file in enumerate(attach):
        print(f'{i}:{file}')

        # Anything that is not already an MP3 is converted to MP3 with ffmpeg
        if not file.lower().endswith('.mp3'):
            file = ffmpeg_mp3(file)

        # Write the transcript next to the audio file, swapping .mp3 for .txt
        outfile = re.sub(r'\.mp3$', '.txt', file, flags=re.IGNORECASE)
        result = whisper_proc(file)

        # UTF-8 avoids encoding errors from the Windows default code page
        with open(outfile, "w", encoding="utf-8") as f:
            f.write(result)


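def add_line(s):
    """Terminate a segment with a newline.

    Segments of 43 characters or more are also broken once after the
    40th character; shorter ones are left on a single line.
    """
    new_s = s
    s_count = len(s)
    s_max_count = 40
    if s_count >= s_max_count:
        if (s_count - s_max_count) >= 3:
            new_s = s[:s_max_count] + "\n" + s[s_max_count:]
    return new_s + "\n"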


def whisper_proc(file):
    model = whisper.load_model("base")
    result = model.transcribe(file, fp16=False)
    # return result["text"]
    segments = result["segments"]
    subs = []
    for data in segments:
        text = add_line(data["text"])
        subs.append(text)
    return ''.join(subs)


def ffmpeg_mp3(file):
    result_file = file + '.mp3'
    ffmpeg.run(
        ffmpeg.output(
            ffmpeg.input(file),
            result_file)
        )
    return result_file

if __name__ == '__main__':
    whisper_mp3(sys.argv[1:])
    print('Transcription finished.')
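
The `add_line` helper in the listing inserts at most one extra line break, so a very long segment still ends up with an over-long second line. As a minimal sketch of one possible customization, the hypothetical `wrap_line` below (not part of the original script) cuts a segment into chunks of at most 40 characters. The function name and the `width` parameter are assumptions chosen for illustration, and unlike `add_line` it has no 3-character tolerance.

```python
def wrap_line(s, width=40):
    """Split s into chunks of at most `width` characters, one per line.

    Hypothetical alternative to add_line: it breaks repeatedly instead
    of only once, and always ends the result with a newline.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Slice the string every `width` characters; an empty segment
    # still produces a single empty line, as add_line does.
    chunks = [s[i:i + width] for i in range(0, len(s), width)] or [""]
    return "\n".join(chunks) + "\n"
```

To try it, replace the call `add_line(data["text"])` in `whisper_proc` with `wrap_line(data["text"])`. Slicing by character count suits Japanese text, which has no spaces to break on; for English audio a word-aware wrapper would probably read better.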

2. Build an EXE with pyinstaller
① Install the following additional packages into the virtual environment.

pip install six
pip install tqdm

② Run the following command to build the EXE.

pyinstaller whisper_demo.py --onefile --copy-metadata tqdm --copy-metadata regex --copy-metadata requests --copy-metadata packaging --copy-metadata filelock --copy-metadata numpy --copy-metadata tokenizers --collect-data whisper

For more on pyinstaller, see my earlier article.
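
The pyinstaller command above is long mainly because `--copy-metadata` is repeated once per package. As a sketch only, the same argument list can be assembled in Python, which makes it easier to add or remove a package later. The helper name `build_pyinstaller_args` is hypothetical, and the script only prints the command rather than running it.

```python
import shlex

# Packages whose metadata the command in this article copies into the bundle.
METADATA_PACKAGES = [
    "tqdm", "regex", "requests", "packaging",
    "filelock", "numpy", "tokenizers",
]


def build_pyinstaller_args(script, metadata_packages, data_packages):
    """Return the pyinstaller command line as a list of arguments."""
    args = ["pyinstaller", script, "--onefile"]
    for name in metadata_packages:
        args += ["--copy-metadata", name]
    for name in data_packages:
        args += ["--collect-data", name]
    return args


command = build_pyinstaller_args("whisper_demo.py", METADATA_PACKAGES, ["whisper"])
# Print the command in a copy-and-paste friendly form.
print(shlex.join(command))
```

If you would rather launch the build from the script, passing `command` to `subprocess.run(command, check=True)` inside the same virtual environment should be equivalent to typing the command by hand.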