How to install Vosk on Windows

Installation Instructions

To begin the Vosk voice recognition installation, ensure that your system meets the necessary requirements. Vosk supports various platforms, including Windows, macOS, and Linux. Follow the steps below to set up Vosk on your machine:

Step 1: Install Dependencies

Before installing Vosk, you need to install some dependencies. Use the following commands based on your operating system:

  • For Ubuntu/Debian:
    sudo apt-get update
    sudo apt-get install python3 python3-pip
    
  • For Windows:
    Download and install Python from python.org.

Step 2: Install Vosk API

Once the dependencies are in place, you can install the Vosk API using pip. Run the following command:

pip install vosk
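Before moving on, it can help to confirm that pip actually registered the package. The sketch below is an optional check, not part of Vosk itself; the helper name `installed_version` is ours, and it relies only on `importlib.metadata` from the standard library (Python 3.8+):

```python
# Optional sanity check: ask pip's metadata whether vosk is installed.
# importlib.metadata is in the standard library on Python 3.8+.
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg):
    """Return the installed version of pkg, or None if it is not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

print("vosk:", installed_version("vosk") or "not installed")
```

If this prints "not installed", re-run `pip install vosk` in the same environment you plan to run your scripts from.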

Step 3: Download Language Models

Vosk requires language models to function effectively. You can download models from the official Vosk models page. For example, to download the English model, use:

wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip

Unzip the downloaded model:

unzip vosk-model-en-us-0.22.zip
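Windows has no `wget` or `unzip` by default, so the download-and-extract step can be done from Python instead. This is a minimal sketch; the helper name `extract_model` is ours, not part of the Vosk API, and the download line is commented out because the archive is large:

```python
# Download and unpack a Vosk model archive using only the standard library.
import zipfile
import urllib.request

def extract_model(zip_path, dest="."):
    """Extract a downloaded Vosk model archive into dest."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)

# Example (large download):
# url = "https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip"
# urllib.request.urlretrieve(url, "vosk-model-en-us-0.22.zip")
# extract_model("vosk-model-en-us-0.22.zip")
```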

Step 4: Running a Sample

To test your installation, you can run a sample Python script. Create a new Python file and add the following code:

import sys
import os
from vosk import Model, KaldiRecognizer
import pyaudio

if not os.path.exists("model"):
    print("Please download the model from https://alphacephei.com/vosk/models")
    sys.exit(1)

model = Model("model")
rec = KaldiRecognizer(model, 16000)

# Open a 16 kHz, 16-bit, mono microphone stream to match the recognizer rate.
mic = pyaudio.PyAudio()
stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000,
                  input=True, frames_per_buffer=8000)
stream.start_stream()

try:
    while True:
        data = stream.read(4000, exception_on_overflow=False)
        if rec.AcceptWaveform(data):
            print(rec.Result())         # final result for a completed utterance
        else:
            print(rec.PartialResult())  # interim hypothesis while speaking
except KeyboardInterrupt:
    # Stop cleanly on Ctrl+C.
    stream.stop_stream()
    stream.close()
    mic.terminate()

Step 5: Testing the Setup

Run the script using:

python your_script.py

Speak into your microphone, and you should see the recognized text output in the console.

Additional Resources

For further exploration, refer to the official Vosk documentation at https://alphacephei.com/vosk/. This guide provides a comprehensive overview of the Vosk setup process for voice recognition, ensuring a smooth installation experience.


Master Offline Speech Recognition with Vosk for Python

Table of Contents

  1. Introduction
  2. The Limitations of Speech Recognition Libraries
  3. Introducing Vosk: An Offline Speech Recognition Library
  4. Why Offline Speech Recognition is Important
  5. Installing Vosk Library
  6. Installing PyAudio for Windows Users
  7. Downloading the Vosk Model
  8. Integrating Vosk into Python Code
  9. Implementing Speech Recognition with Vosk
  10. Parsing the Result of Speech Recognition
  11. Conclusion

Introduction

In this article, we will explore the concept of offline speech recognition and how it can be implemented using the Vosk library in Python. Speech recognition has become an increasingly popular technology, with applications ranging from virtual assistants like Alexa or Siri to IoT devices and wearable technology. However, most speech recognition libraries require an internet connection to function, which may not always be feasible. That’s where Vosk comes in, offering an alternative solution for offline speech recognition. Let’s dive in and see how it works!

The Limitations of Speech Recognition Libraries

Before we delve into the details of Vosk, it's essential to understand the limitations of existing speech recognition libraries. The most common speech recognition services, such as those behind Alexa or Siri, rely heavily on an internet connection: for them to function, you need to be online. This is problematic for projects that require offline speech recognition or for scenarios where internet connectivity is unavailable.

Introducing Vosk: An Offline Speech Recognition Library

Vosk is a Python library that provides offline speech recognition capabilities. Unlike other libraries that rely on an internet connection, Vosk works by using a downloadable model that is stored and processed locally on your computer. This makes it possible to perform speech recognition without the need for an internet connection. By using Vosk, you can develop projects that have offline speech recognition functionalities, making them more versatile and independent.

Why Offline Speech Recognition is Important

Offline speech recognition is crucial for several reasons. First, it lets you use speech recognition in environments where internet connectivity is unavailable or unreliable. This is particularly useful for IoT devices or wearable technology, where a connection is not always guaranteed. Additionally, the ability to recognize speech offline adds redundancy to your applications, ensuring they keep working even when the internet is inaccessible.

Installing Vosk Library

To get started with Vosk, you need to install the library in your Python environment. If you're using PyCharm, open your project's settings, search for "vosk" in the interpreter's package list, and click the "Install" button. It's important to note that this tutorial targets Python 3.7 and 3.8, so make sure you choose the correct version based on your project.

Installing PyAudio for Windows Users

For Windows users, there is an additional step before installing Vosk. On Windows, pip may fail to build and install PyAudio directly. To work around this, download the appropriate PyAudio wheel from the Python Extension Packages for Windows website. Choose the build that matches your Python installation (e.g., 64-bit), download it, open a command prompt in the download directory, and run pip install <filename> to install PyAudio.

Downloading the Vosk Model

To use Vosk, you need to download a Vosk model for the language you want to recognize. Models are available for various languages, including English, Chinese, Russian, and more. Visit the official Vosk website and download the desired model; it comes as a zip file. Extract the contents of the zip file into your project folder.

Integrating Vosk into Python Code

Once you have installed Vosk and downloaded a model, you can start integrating it into your Python code. Begin by importing the necessary modules: Model and KaldiRecognizer from vosk, plus pyaudio. Create an instance of the Vosk model by calling model = Model("<path_to_model_folder>"), where <path_to_model_folder> is the absolute path to the folder where you extracted the Vosk model.

Implementing Speech Recognition with Vosk

With Vosk integrated into your Python code, you can now implement speech recognition functionality. Start by creating an instance of the recognizer: recognizer = KaldiRecognizer(model, 16000). Initialize the microphone with mic = pyaudio.PyAudio() and open a stream using stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8192). Implement a loop that continuously reads audio chunks from the stream. Pass each chunk to recognizer.AcceptWaveform(data); when it returns True, retrieve the recognized speech with text = recognizer.Result(). Print the recognized text or perform any other desired actions based on it.

Parsing the Result of Speech Recognition

The result of speech recognition with Vosk is returned as a JSON-formatted string. To extract the recognized text, you could slice the string (e.g., text[14:-3]), but parsing the JSON is more robust, and you can then process the result further according to your specific requirements.
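Since the recognizer returns JSON, `json.loads` from the standard library is the natural way to pull out the text. The raw string below only mimics the shape of `recognizer.Result()`; the actual fields come from the recognizer at runtime:

```python
# Parse a Vosk result string with json.loads instead of slicing by index.
import json

raw = '{"text": "hello world"}'   # stand-in for recognizer.Result()
text = json.loads(raw)["text"]
print(text)  # hello world
```

Unlike `text[14:-3]`, this keeps working if the result string gains extra fields or whitespace.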

Conclusion

In this article, we explored the concept of offline speech recognition and how it can be achieved using the Vosk library in Python. We discussed the limitations of existing speech recognition libraries and the importance of having offline speech recognition capabilities. We covered the steps required to install Vosk, including installing PyAudio for Windows users and downloading the necessary Vosk model. We also learned how to integrate Vosk into Python code and implement speech recognition functionality. Finally, we looked at how to parse the result of speech recognition. With Vosk, you can now develop projects that offer offline speech recognition capabilities, increasing their flexibility and usability.

Highlights

  • Vosk is a Python library that provides offline speech recognition capabilities.
  • Offline speech recognition is crucial for scenarios where internet connectivity is unavailable or unreliable.
  • Vosk uses a downloadable model to perform speech recognition locally, eliminating the need for an internet connection.
  • To use Vosk, you need to install the library, download the Vosk model, and integrate it into your Python code.
  • The result of speech recognition with Vosk is returned as a JSON string, which can be parsed to extract the recognized text.

FAQ

Q: Can Vosk be used for multiple languages?
A: Yes, Vosk supports multiple languages, including English, Chinese, Russian, and more.

Q: Does Vosk require an internet connection to function?
A: No, Vosk is an offline speech recognition library, meaning it doesn’t require an internet connection.

Q: Can I use Vosk on Windows machines?
A: Yes, Vosk is compatible with Windows machines. However, Windows users may need to install PyAudio separately for it to work.

Q: Is Vosk compatible with Python 3.8?
A: Yes, Vosk supports both Python 3.7 and Python 3.8 installations.

Q: Can I customize the Vosk model for better recognition accuracy?
A: Yes, Vosk allows for fine-tuning and customization of the model to improve recognition accuracy, although it requires additional steps and expertise.

Converting audio to text is a popular and widely used technology. In this article we explain how to recognize speech from an audio file on your own PC without using online services.

We were recently tasked with converting audio recordings to text for further analysis. The mandatory requirements were offline processing, modest system resource demands, and the ability to automate the process. We chose Python and the vosk-api library.

What Vosk Can Do

Vosk is an open-source, standalone speech recognition toolkit. It offers models for 17 languages and dialects (at the time of writing). The small Vosk models (about 50 MB) can convert speech to text on the fly. More accurate models also exist, with sizes up to 2 GB.

There are implementations of the library for Python, Java, NodeJS, C#, C++, and other languages.

It can run on Windows, Linux, and Android.

Installation

We will need Python 3.8 and the libraries PyAudio == 0.2.11 and vosk == 0.3.1.2.

The next step is to download a recognition model. At the moment, two models are available for Russian:

  • vosk-model-small-ru-0.4 (50 MB)

  • vosk-model-ru-0.10 (2 GB)

The large model recognizes slightly better but takes up 40 times more space.

After unpacking, the model directory will contain the subdirectories am, conf, graph, and others.

If you see an error like:

RuntimeError: Cannot open config file: Z:\Python\Trifonov\vosk\vosk-model-ru-0.10/mfcc.conf

you need to find that file in one of the model's subfolders and move it to the model's root directory. In our case, mfcc.conf can be found in the conf folder; move it one level up. I ran into this error on Windows. To get things running, I had to move the entire contents of the am, conf, graph, ivector, and rnnlm folders into the model root.
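If moving the files by hand is tedious, the same flattening can be scripted. This is a rough sketch under the assumption that moving subfolder contents into the root is all that is needed; the helper name `flatten_model` is ours, not part of vosk:

```python
# Move the contents of the model's subfolders (am, conf, graph, ...) into the
# model's root directory, as a workaround for the "Cannot open config file"
# error. flatten_model is a hypothetical helper, not part of the vosk API.
import os
import shutil

def flatten_model(model_dir, subdirs=("am", "conf", "graph", "ivector", "rnnlm")):
    for sub in subdirs:
        src = os.path.join(model_dir, sub)
        if not os.path.isdir(src):
            continue
        for name in os.listdir(src):
            shutil.move(os.path.join(src, name), os.path.join(model_dir, name))

# flatten_model(r"Z:\Python\vosk\vosk-model-ru-0.10")
```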

Usage

An important parameter is the sampling rate. The large model supports a rate of 8000 Hz, so data must be read from the microphone at the same rate.

On-the-fly recognition from a microphone:

from vosk import Model, KaldiRecognizer
import pyaudio

model = Model(r"/home/user/vosk-model-ru-0.10")  # full path to the model
rec = KaldiRecognizer(model, 8000)

p = pyaudio.PyAudio()
stream = p.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=8000,
    input=True,
    frames_per_buffer=8000
)
stream.start_stream()

while True:
    data = stream.read(4000)
    if len(data) == 0:
        break
    print(rec.Result() if rec.AcceptWaveform(data) else rec.PartialResult())

print(rec.FinalResult())

To recognize audio from files, they must first be converted to WAV format with a sampling rate supported by the chosen model, in my case 8000 Hz.
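The conversion itself is easiest to delegate to ffmpeg, assumed here to be installed and on PATH. This sketch only builds the command line; the helper name `convert_cmd` is ours, and the run call is left commented out:

```python
# Build an ffmpeg command that converts any audio file to 16-bit mono PCM WAV
# at the model's sampling rate. Assumes ffmpeg is installed and on PATH.
import subprocess

def convert_cmd(src, dst, rate=8000):
    """ffmpeg arguments: -ac 1 = mono, -ar = sample rate, pcm_s16le = 16-bit WAV."""
    return ["ffmpeg", "-y", "-i", src,
            "-ac", "1", "-ar", str(rate), "-acodec", "pcm_s16le", dst]

# subprocess.run(convert_cmd("meeting.mp3", "test.wav"), check=True)
```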

Code listing for recognizing an audio file:

from vosk import Model, KaldiRecognizer
import json
import wave

model = Model(r"/home/user/vosk-model-ru-0.10")

wf = wave.open("test.wav", "rb")
rec = KaldiRecognizer(model, 8000)

result = ''
last_n = False

while True:
    data = wf.readframes(8000)
    if len(data) == 0:
        break

    if rec.AcceptWaveform(data):
        res = json.loads(rec.Result())

        if res['text'] != '':
            result += f" {res['text']}"
            last_n = False
        elif not last_n:
            result += '\n'
            last_n = True

res = json.loads(rec.FinalResult())
result += f" {res['text']}"

print(result)

As an example, I used the large model to recognize the Russian president's 2021 New Year address (the raw output below is left untranslated):

уважаемые граждане россии дорогие друзья всего через несколько минут две тысячи двадцатый заканчивает встречая его ровно год назад мы с вами как и люди во всем мире конечно же думали мечтали о добрых перемен и тогда никто не мог представить через какие испытания всем нам придётся пройти и сейчас кажется что уходящий год вместил в себя груз нескольких лет он был трудным для каждого из нас с тревогами и большими материальными сложностей с переживаниями а для кого то горькими утратами близких любимых людей но безусловно уходящий год был связан и с надеждами на преодолении невзгод с гордостью за тех кто проявил свои лучшие человеческие и профессиональные качества с осознанием того как много значат надёжный искренне настоящие отношения между людьми дружбы и доверия между нами

Recognition quality depends heavily on the amount of noise in the source file. Here is a less successful example with the same model (one minute of a YouTube video):

сенсор встречается уже поздний базы багажа его нужно то сам что вот я тебе все скажу ну да точнее его машина сломалась у меня монастыря и нежелательно не знаешь нужно надо пройти сначала думаю да а уж потом переходить через вроде как следствие тени это уже это уже изменить эту нишу а когда вот у нас все равно два быть дотронуться прости очень много всего нужно фанат и пройдя очень много кружков и очень многое даже власть имущих неважно как сбор отдавать бывший министр что заяц сэр очень такой хороший дядька мне посоветовал и незамедлительно он выдаёт рады нас видеть смита трейдеры лазеров что у нас перед зрителями

It is also worth noting that this speech recognition library is not trained to detect slang or profanity, but it does let you fine-tune models on your own data. This feature is described in the documentation: https://alphacephei.com/vosk/adaptation.

The vosk library showed good results when processing audio under ideal, noise-free conditions, but recognition quality drops significantly once noise appears.

On a modest office PC, I was able to process a four-hour recording in 20 minutes.

Speaker 1: Today, we will be installing Vosk. I hope I pronounced that correctly. Probably not. This is the GitHub page with the source code for the offline API. We won’t use this today. In the future, we will try this out with our own code, but for today, we will install Vosk from here. It requires Python 3.9, so we will be using Anaconda. In a new Conda prompt, I’m going to create a new environment initialized with Python 3.9. Copy-paste the command to activate the new environment. Now run pip install vosk. Now we just need to install the wheel. The URL on this page is for a Linux wheel, so we will go to the GitHub page in the releases section and get the URL for the Windows wheel. Copy the URL.

Speaker 2: And run pip install for that URL. And now it is installed. On this page, we can see an example usage. I’m going to copy the command to Notepad. I have a test audio file from a previous video.

Speaker 1: Today, we will be installing Wav2Lip, which will let us make pictures and videos talk the words we want. I'm going to copy the path to this audio file, and change to that directory in the command prompt, and then modify the command to use this file as the input file.

Speaker 2: Now let’s run the command.

Speaker 1: The first time it runs, it is going to install the necessary files and model. By default, it will use the small model, which is not as accurate as the large one. We can specify what model we want to use as one of the command line arguments, but for now, let’s see what happens with the default one.

Speaker 2: It is finished. Let’s check the output.

Speaker 1: That looks good to me. Here are all of the input parameters you can use. You can specify a different model by name or by path. You can also use a model from one of the other supported languages. You can also go to the models page and see what models are available, along with other details. And that is all there is to it. In the future, we will take a closer look at the source code, and how we can use the offline API in custom code.
