Whisper-based speech recognition server#


In the last blog post, Token authenticated aiohttp server, an authenticated aiohttp server was introduced. In this post, we build on top of the previous posts to create an authenticated Automatic Speech Recognition (ASR) server that uses the recently released Whisper model by OpenAI.

Speech recognition implementation#

There are different ways to implement this. The choice here is somewhat arbitrary: I chose Whisper because it is accurate and simple to integrate. One way to implement the recognition step is to use Whisper's transcribe function. This can be done as follows:

transcribe#
import io
import sys
import time
import logging

import soundfile as sf
import whisper
from aiohttp import web

logger = logging.getLogger(__name__)

# load model once at startup
model = whisper.load_model("base")


async def stt(request):
    """
    Speech-to-text routine that transcribes incoming wave audio and returns
    the resulting text.

    Args:
        request: incoming aiohttp request carrying the raw audio bytes.

    Returns:
        JSON response with the transcription or an error message.
    """
    response = {}
    stime = time.time()
    bytes_data = await request.content.read()
    logger.debug("size of received bytes data: %s bytes", sys.getsizeof(bytes_data))

    try:
        # decode the received bytes into a float32 waveform
        audio, fs = sf.read(io.BytesIO(bytes_data), dtype="float32")
        result = model.transcribe(audio=audio)
        tproc = round(time.time() - stime, 3)
        response = {"text": result["text"], "proc_duration": tproc, "status": "success"}

    except Exception as e:
        response = {"Error": str(e), "status": "failed"}
        logger.error("-> Error: %s", e)

    return web.json_response(response)
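One detail worth noting: when transcribe is handed a raw array instead of a file path, Whisper expects mono audio sampled at 16 kHz, while the sample rate fs returned by soundfile is simply ignored above. Below is a minimal sketch of a helper that downmixes and resamples before transcription; it assumes scipy and numpy are available, and the name load_audio_for_whisper and the WHISPER_SR constant are illustrative additions rather than part of the original code:

import io

import numpy as np
import soundfile as sf
from scipy.signal import resample_poly

WHISPER_SR = 16000  # sample rate Whisper's transcribe assumes for raw arrays

def load_audio_for_whisper(wav_bytes: bytes) -> np.ndarray:
    """Decode wave bytes to a float32 mono waveform at 16 kHz."""
    audio, fs = sf.read(io.BytesIO(wav_bytes), dtype="float32")
    if audio.ndim > 1:
        # downmix multi-channel audio to mono
        audio = audio.mean(axis=1)
    if fs != WHISPER_SR:
        # resample_poly accepts integer up/down rates
        audio = resample_poly(audio, WHISPER_SR, fs).astype("float32")
    return audio

The stt handler could then call load_audio_for_whisper(bytes_data) in place of the direct sf.read call.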

The config#

We will be using the same config specified in Token authenticated aiohttp server/config. Note the allowed_tokens section, which can be replaced by your own server's allowed tokens.
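For reference, here is a minimal sketch of what config.json could look like, based only on the keys accessed in the code below; the token values and the 100 MB request_max_size are illustrative placeholders:

{
    "server": {
        "http": {
            "request_max_size": 104857600,
            "allowed_tokens": ["token1", "token2"]
        }
    }
}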

Code#

Putting the previously listed steps together, the full server looks as follows in Python:

import io
import sys
import json
import time
import logging

import soundfile as sf
import whisper
from aiohttp import web
from typing import Callable, Coroutine, Tuple


# init logging
logger = logging.getLogger(__name__)
logging.basicConfig(format="[%(asctime)s.%(msecs)03d] p%(process)s {%(pathname)s: %(funcName)s: %(lineno)d}: %(levelname)s: %(message)s", datefmt="%Y-%m-%d %p %I:%M:%S")
logger.setLevel(logging.DEBUG)

# load model
model = whisper.load_model("base")

# parse conf
with open("config.json", "rb") as config_file:
    conf = json.loads(config_file.read())


async def ping(request):
    logger.debug("-> Received PING")
    return web.json_response({"text": "pong", "status": "success"})


async def stt(request):
    """
    Speech-to-text routine that transcribes incoming wave audio and returns
    the resulting text.

    Args:
        request: incoming aiohttp request carrying the raw audio bytes.

    Returns:
        JSON response with the transcription or an error message.
    """
    response = {}
    stime = time.time()
    bytes_data = await request.content.read()
    logger.debug("size of received bytes data: %s bytes", sys.getsizeof(bytes_data))

    try:
        # decode the received bytes into a float32 waveform
        audio, fs = sf.read(io.BytesIO(bytes_data), dtype="float32")
        result = model.transcribe(audio=audio)
        tproc = round(time.time() - stime, 3)
        response = {"text": result["text"], "proc_duration": tproc, "status": "success"}

    except Exception as e:
        response = {"Error": str(e), "status": "failed"}
        logger.error("-> Error: %s", e)

    return web.json_response(response)


def token_auth_middleware(user_loader: Callable,
                          request_property: str = 'user',
                          auth_scheme: str = 'Token',
                          exclude_routes: Tuple = tuple(),
                          exclude_methods: Tuple = tuple()) -> Coroutine:
    """
    Checks an auth token and adds the user returned by user_loader to the request.
    """
    @web.middleware
    async def middleware(request, handler):
        try:
            scheme, token = request.headers['Authorization'].strip().split(' ')
        except KeyError:
            raise web.HTTPUnauthorized(reason='Missing authorization header')
        except ValueError:
            raise web.HTTPForbidden(reason='Invalid authorization header')

        if auth_scheme.lower() != scheme.lower():
            raise web.HTTPForbidden(reason='Invalid token scheme')

        user = await user_loader(token)
        if user:
            request[request_property] = user
        else:
            raise web.HTTPForbidden(reason="Token doesn't exist")
        return await handler(request)
    return middleware


async def init():
    """
    Init web application.
    """
    async def user_loader(token: str):
        user = {'uuid': 'fake-uuid'} if token in conf["server"]["http"]["allowed_tokens"] else None
        return user

    app = web.Application(client_max_size=conf["server"]["http"]["request_max_size"],
                          middlewares=[token_auth_middleware(user_loader)])

    app.router.add_route('GET', '/ping', ping)
    # the transcription endpoint receives the audio in the request body, hence POST
    app.router.add_route('POST', '/stt', stt)
    return app


if __name__ == '__main__':
    web.run_app(init())

Testing#

The previous code, when executed, will start a server running on http://localhost:8080/. To test your server, you can use curl: curl -H 'Authorization: Token token1' http://localhost:8080/ping (note that you will need to pass a token as specified in the config and code). The transcription endpoint can be tested by posting the audio file to transcribe: curl -X POST -H 'Authorization: Token token1' --data-binary @WAVEFILENAME.wav http://127.0.0.1:8080/stt.
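If you prefer to test from Python rather than curl, a small client sketch could look as follows; it assumes the requests library is installed, and sample.wav and token1 are placeholders (the token must match one of the allowed_tokens in your config):

import requests

SERVER = "http://127.0.0.1:8080"
HEADERS = {"Authorization": "Token token1"}  # must match an allowed token in config.json

# health check
print(requests.get(f"{SERVER}/ping", headers=HEADERS).json())

# transcription: send the raw wave bytes as the request body
with open("sample.wav", "rb") as wav_file:
    response = requests.post(f"{SERVER}/stt", headers=HEADERS, data=wav_file.read())
print(response.json())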

Alternatively, you can use Postman, a tool for developers to design, build and test their APIs. For this, you will need to create your POST request, specify the API/server link, add the wave file to transcribe to the body (as binary), and specify your token in the headers. This can be done under the Authorization tab: choose OAuth 2.0 as the type, set the "Access Token" to an allowed token (as specified in the config), and set the "Header Prefix" to Token (the same scheme as defined in def token_auth_middleware(.. auth_scheme: str = 'Token'..)).



Figure 20: Postman authorization config#


The server responses should look as follows:

  • Ping response: {"text": "pong", "status": "success"}

  • Stt response: {"text": "transcription", "proc_duration": 0.xxx , "status": "success"}

Conclusion#

This post built on top of the previous posts (Basic aiohttp server and Token authenticated aiohttp server) to deliver an authenticated ASR server based on Whisper.


Updated on 27 December 2022

👨‍💻 edited and reviewed on 27.12.2022