Microsoft Azure Speech Service

spx (Docker version)

Installation

$ sudo docker pull msftspeech/spx
Using default tag: latest
latest: Pulling from msftspeech/spx
08a6abff8943: Pull complete
bb79cebf0976: Pull complete
c63ce881ac8a: Pull complete
0ad24d223958: Pull complete
7c6a85675d6b: Pull complete
65a90f31fc95: Pull complete
ea88efd477c8: Pull complete
47b3b2c55069: Pull complete
Digest: sha256:d3436c4bb023e21d07204aebb4b4025be2a4887c25e7afea3759da4e527267fe
Status: Downloaded newer image for msftspeech/spx:latest
docker.io/msftspeech/spx:latest
$ sudo docker image ls
REPOSITORY       TAG       IMAGE ID       CREATED      SIZE
msftspeech/spx   latest    d1ffdd590c84   6 days ago   969MB

help

$ sudo docker run -it --rm msftspeech/spx help
SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.

  ______ ___ _  __
 /  ___// _ \ \/ /
 \___ \/ ___/   <
/____ /_/  /__/\_\

USAGE: spx <command> [...]

COMMANDS

  spx config [...]          (see: spx help config)

  spx recognize [...]       (see: spx help recognize)
  spx synthesize [...]      (see: spx help synthesize)

  spx intent [...]          (see: spx help intent)
  spx translate [...]       (see: spx help translate)

  spx batch [...]           (see: spx help batch)
  spx csr [...]             (see: spx help csr)

  spx profile [...]         (see: spx help profile)
  spx speaker [...]         (see: spx help speaker)

  spx webjob [...]          (see: spx help webjob)

EXAMPLES

  spx config @region --set westus2
  spx config @key --set 436172626F6E20697320636F6F6C2121

  spx synthesize --text "Hello"
  spx synthesize --files *.txt;*.ssml --audio output {id}.wav

  spx recognize --microphone
  spx recognize --files *.wav --output results.tsv

  spx translate --target de --microphone
  spx translate --source en-US --target de;es;fr --file hello.wav

  spx config --set @appId=1b9b6355-e76a-48c5-861c-bb54dd25a806

ADDITIONAL TOPICS

  spx help setup
  spx help examples

  spx help find --topics "examples"
  spx help find --text "example"
  spx help find *

  spx help documentation

Registering the region and key

$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx config  @region --set japanwest
SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.

region (saved at '/data')

  japanwest
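
The "saved at '/data'" message suggests the setting is written into the mounted host directory (~/Music here), so later runs presumably need the same -v mount to see it. A minimal sketch of a wrapper that avoids retyping the long docker command line. The function name spx_docker and the SPX_SUDO / SPX_DATA variables are invented for this note and are not part of spx:

```shell
# Hypothetical wrapper around the Docker image used on this page.
# SPX_SUDO and SPX_DATA are names invented for this sketch.
spx_docker() {
    # ${SPX_SUDO-sudo}: use sudo unless SPX_SUDO is set (even to empty).
    ${SPX_SUDO-sudo} docker run -it \
        -v "${SPX_DATA:-$HOME/Music}:/data" \
        --rm msftspeech/spx "$@"
}

# Usage (equivalent to the command above):
#   spx_docker config @region --set japanwest
```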

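Register the key. A leading space is added to the command so that it does not remain in the shell history (bash honors this only when HISTCONTROL includes ignorespace or ignoreboth).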

$  sudo docker run -it -v ~/Music:/data --rm msftspeech/spx config @key --set deadbeef1234deadbeef1234deadbeef
SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.

key (saved at '/data')

  deadbeef1234deadbeef1234deadbeef
  

Check that the key is not left in the history. If it is, delete it.

$ history | grep 'msftspeech/spx config'
  372  sudo docker run -it --rm msftspeech/spx config  @region --set japanwest
  373  history | grep 'msftspeech/spx config'
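
The leading-space trick depends on the shell configuration, so here is a sketch of an alternative that reads the key from stdin instead of typing it on the command line. spx_set_key and SPX_CMD are names invented for this note. The sketch also assumes that "config" works without a terminal, which is why -it is left out:

```shell
# Sketch for bash. A leading space only hides a command from history when
# HISTCONTROL contains ignorespace (ignoreboth implies it).
HISTCONTROL=ignoreboth

# Hypothetical helper: read the key from stdin so it never appears on the
# typed command line. SPX_CMD is a name invented here; -it is left out
# because stdin may be a pipe (assumes "config" needs no terminal).
SPX_CMD=${SPX_CMD:-"sudo docker run -v $HOME/Music:/data --rm msftspeech/spx"}

spx_set_key() {
    IFS= read -r spx_key || return 1
    [ -n "$spx_key" ] || { echo "empty key" >&2; return 1; }
    # Intentionally unquoted so that SPX_CMD splits into words.
    $SPX_CMD config @key --set "$spx_key"
}

# If the key did end up in the history, bash can drop the entry by number
# and rewrite the history file:
#   history -d <number> && history -w
```

Usage would be something like "spx_set_key < keyfile". The key is still visible in the process list while the command runs, so this only addresses the history file.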

TTS

$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx synthesize --text "Enjoy using the Speech CLI." --audio output sample.wav
$ ffprobe ~/Music/sample.wav 2>&1 | grep Input -A9
Input #0, wav, from '/home/ubuntu/Music/sample.wav':
  Duration: 00:00:02.88, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

Checking the Japanese voices

$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx synthesize --voices | grep -E '^  \[' -A9999 | tr -d '^\ \ ' | jq '.[] | select(.Locale == "ja-JP")'
{
  "Name": "MicrosoftServerSpeechTexttoSpeechVoice(ja-JP,NanamiNeural)",
  "DisplayName": "Nanami",
  "LocalName": "七海",
  "ShortName": "ja-JP-NanamiNeural",
  "Gender": "Female",
  "Locale": "ja-JP",
  "LocaleName": "Japanese(Japan)",
  "StyleList": [
    "chat",
    "customerservice",
    "cheerful"
  ],
  "SampleRateHertz": "24000",
  "VoiceType": "Neural",
  "Status": "GA"
}
{
  "Name": "MicrosoftServerSpeechTexttoSpeechVoice(ja-JP,KeitaNeural)",
  "DisplayName": "Keita",
  "LocalName": "圭太",
  "ShortName": "ja-JP-KeitaNeural",
  "Gender": "Male",
  "Locale": "ja-JP",
  "LocaleName": "Japanese(Japan)",
  "SampleRateHertz": "24000",
  "VoiceType": "Neural",
  "Status": "GA"
}
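
The grep and tr steps above presumably exist to cut away the text printed before the JSON array. Once the array is isolated, the jq part can be reduced to a small filter that prints only the ShortName values. A sketch, assuming jq is installed and stdin is a JSON array shaped like the output above; voice_short_names is a name invented for this note:

```shell
# Hypothetical helper: print the ShortName of every voice for one locale.
# Reads the JSON array of voices on stdin.
voice_short_names() {
    jq -r --arg loc "$1" '.[] | select(.Locale == $loc) | .ShortName'
}

# Usage:
#   ... | voice_short_names ja-JP
```

The ShortName is the value passed to --voice.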

TTS with Nanami

$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx synthesize --text "日本語も話せます." --voice "ja-JP-NanamiNeural" --audio output japan.wav

STT

$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx recognize --file sample.wav
$ cat ~/Music/output.132950280064316023.tsv
audio.input.id  recognizer.session.started.sessionid    recognizer.recognized.result.text
sample  8bcc7a9b0dd14132841cd19684202714        Enjoy using the speech CLI.
$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx recognize --file japan.wav
$ cat output.132950295870105988.tsv
audio.input.id  recognizer.session.started.sessionid    recognizer.recognized.result.text
japan   79ab40f8761049a3a8a572b0fbf084c2        Nihongo mohana semas
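
Each run writes a new output.<number>.tsv, so the file name has to be looked up every time. A sketch of two helpers for picking the newest file and printing one column by header name. latest_tsv and tsv_column are names invented for this note, and the sketch assumes the columns are tab-separated and that the number in the file name grows over time:

```shell
# Hypothetical helpers for the output.<number>.tsv files written by spx.

# Print the newest result file in a directory. Assumes the number in the
# file name grows over time and keeps the same width, so a plain sort works.
latest_tsv() {
    ls "$1"/output.*.tsv 2>/dev/null | LC_ALL=C sort | tail -n 1
}

# Print one column of a TSV file, selected by its header name.
# Assumes the columns are tab-separated.
tsv_column() {
    awk -F '\t' -v col="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
        c       { print $c }
    ' "$1"
}

# Usage:
#   tsv_column "$(latest_tsv ~/Music)" recognizer.recognized.result.text
```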

Speech-to-text translation

According to "spx help translate language", --source and --target are specified in BCP-47 format.

$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx translate --file sample.wav --source en-US --target ja
$ cat output.132950304946172724.tsv
audio.input.id  recognizer.session.started.sessionid    recognizer.recognized.result.text       recognizer.recognized.result.translated.text
sample  25bab9dffa7743ffbdff600e72b3f2ba        Enjoy using the speech CLI.     音声 CLI を使用してお楽しみください。

Even when a large audio file is passed, the whole result seems to come back on a single line.

$ wget https://media.libreplanet.org/mgoblin_media/media_entries/2730/rms-2022-state-of-the-free-software-movement.ogg
$ ffmpeg -i "rms-2022-state-of-the-free-software-movement.ogg" "rms-2022-state-of-the-free-software-movement.wav"
$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx translate --file rms-2022-state-of-the-free-software-movement.wav --source en-US --target ja
  :
SESSION STARTED: 4abfe53fee164f57bde9e1406764f37f

Connection CONNECTED...
TRANSLATING into 'ja': 通常は起こりますが、もちろん (from '')
TRANSLATING into 'ja': 通常起こるが、もちろんそれらは今起こっている (from '')
TRANSLATING into 'ja': 通常起こるが、もちろん、彼らは今私は感謝したいと思います起こっています (from '')
TRANSLATING into 'ja': 通常起こるが、もちろん、彼らは今起こっている私はあなたのすべてに感謝したいと思います (from '')
TRANSLATING into 'ja': 通常起こるが、もちろん、彼らは今起こっている私は貢献しているあなたのすべてに感謝したいと思います (from '')
TRANSLATED into 'ja': 通常、それは起こりますが、もちろん起こっています。今、私は貢献した皆さん全員に感謝します。 (from 'Usually happen, but of course they're happening. Now I'd to thank all of you who have contributed.')

TRANSLATING into 'ja': 特にGNUソフトウェアに (from '')
TRANSLATING into 'ja': 特にGNUソフトウェアだけでなく、 (from '')
TRANSLATING into 'ja': 特にGNUソフトウェアだけでなく、他のソフトウェアにも (from '')
TRANSLATED into 'ja': 特にGNUソフトウェアだけでなく、他の自由ソフトウェアにも。 (from 'Especially to GNU software, but also to other free software.')
  :
$ ls -ltra | tail -2
-rw-r--r-- 1 root   root       139499 Apr 21 16:55 output.132950309862177089.tsv
-rw-r--r-- 1 root   root    147947031 Apr 21 16:55 log-132950309862177089.log
$ wc output.132950309862177089.tsv
     2  10888 139499 output.132950309862177089.tsv
$ head -1 output.132950309862177089.tsv | wc
      1       4     134
$ head -2 output.132950309862177089.tsv | wc
      2   10888  139499
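
The wc output shows only two lines: a 134-byte header and one data line holding everything else. A sketch for making that line readable by breaking it after each Japanese full stop. split_sentences is a name invented for this note, and splitting on "。" alone is a simplification:

```shell
# Hypothetical helper: insert a newline after every "。" read from stdin.
# Intended for a single long input line.
split_sentences() {
    awk '{ gsub(/。/, "。\n"); printf "%s", $0 }'
}

# Usage (column 4 is the translated text, assuming tab-separated columns):
#   tail -n 1 output.132950309862177089.tsv | cut -f 4 | split_sentences | head
```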

.NET Linux version

You can also download the Linux binaries as a .zip file instead of using NuGet. Extract spx-netcore-30-linux-x64.zip to a new ~/spx directory, run "sudo chmod +r+x spx" on the binary, and add the ~/spx path to the PATH system variable.

As of 2022-04-18, this link ( https://aka.ms/speech/spx-linux ) seems to serve the Windows version.

$ wc spx-netcore31-win-x64.zip
   8134   48429 2396464 spx-netcore31-win-x64.zip
$ sha512sum spx-netcore31-win-x64.zip
f246087e14bda634c2abeef0e12d6c0d6d582f0addf9d87845a47c49607690a7ec59ce4290555ea62eb57a83914cee448c5bef31e8a7f313be686ddef2d2271d  spx-netcore31-win-x64.zip
$ unzip -l spx-netcore31-win-x64.zip
Archive:  spx-netcore31-win-x64.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      443  2022-04-14 18:58   CodeSignSummary-f78b4f9d-5d51-4ca8-859d-591c020c3954.md
    11590  2022-04-14 18:42   LICENSE.txt
  2112408  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.core.dll
   231344  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.csharp.dll
   131480  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.extension.audio.sys.dll
   109464  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.extension.codec.dll
   276888  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.extension.kws.dll
   272792  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.extension.lu.dll
   151960  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.extension.silk_codec.dll
   693680  2019-11-09 00:56   Newtonsoft.Json.dll
   852888  2022-04-14 18:58   spx.dll
   185240  2022-04-14 18:45   spx.exe
      154  2022-04-14 18:44   spx.runtimeconfig.json
   229840  2020-11-10 15:53   System.Interactive.Async.dll
  1088976  2020-11-10 15:53   System.Linq.Async.dll
     1743  2022-04-14 18:42   THIRD_PARTY_NOTICE.txt
---------                     -------
  6350890                     16 files
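
The listing contains spx.exe and .dll files, which is what points to a Windows build. A sketch of the same check as a reusable test on the "unzip -l" output; is_windows_build is a name invented for this note, and looking for spx.exe is only a heuristic:

```shell
# Hypothetical check: read "unzip -l" output on stdin and succeed when the
# archive contains a file named spx.exe.
is_windows_build() {
    grep -Eq '(^|[[:space:]/])spx\.exe$'
}

# Usage:
#   unzip -l spx-netcore31-win-x64.zip | is_windows_build && echo "Windows build"
```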

install

Installing on Ubuntu 20.04 LTS amd64

.NET Core 3.1 SDK

$ sudo snap install dotnet-sdk --classic --channel=3.1

If another version is already installed, the installation fails, so either uninstall it or run unalias to free up the dotnet alias.

$ dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
port: running "dotnet-runtime-60.dotnet" failed: cannot create transient scope: DBus error "org.freedesktop.systemd1.NoSuchUnit": [Unit snap.dotnet-runtime-60.dotnet.24fc25e9-291a-485c-9974-0243fa4d1d0d.scope not found.]
$ sudo snap unalias dotnet
Removed:
  - dotnet-runtime-60.dotnet as dotnet
$ snap install dotnet-sdk --classic --channel=3.1
dotnet-sdk (3.1/stable) 3.1.418 from Microsoft .NET Core (dotnetcore✓) installed
$ dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI

Welcome to .NET Core 3.1!
---------------------
SDK Version: 3.1.418

Telemetry
---------
The .NET Core tools collect usage data in order to help us improve your experience. It is collected by Microsoft and shared with the community. You can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT environment variable to '1' or 'true' using your favorite shell.

Read more about .NET Core CLI Tools telemetry: https://aka.ms/dotnet-cli-telemetry

----------------
Explore documentation: https://aka.ms/dotnet-docs
Report issues and find source on GitHub: https://github.com/dotnet/core
Find out what's new: https://aka.ms/dotnet-whats-new
Learn about the installed HTTPS developer cert: https://aka.ms/aspnet-core-https
Use 'dotnet --help' to see available commands or visit: https://aka.ms/dotnet-cli-docs
Write your first app: https://aka.ms/first-net-core-app
--------------------------------------------------------------------------------------
Tools directory '/home/ubuntu/.dotnet/tools' is not currently on the PATH environment variable.
If you are using bash, you can add it to your profile by running the following command:

cat << \EOF >> ~/.bash_profile
# Add .NET Core SDK tools
export PATH="$PATH:/home/ubuntu/.dotnet/tools"
EOF

You can add it to the current session by running the following command:

export PATH="$PATH:/home/ubuntu/.dotnet/tools"
$ spx help
A fatal error occurred. The required library libhostfxr.so could not be found.
If this is a self-contained application, that library should exist in [/home/ubuntu/.dotnet/tools/.store/microsoft.cognitiveservices.speech.cli/1.21.0/microsoft.cognitiveservices.speech.cli/1.21.0/tools/netcoreapp3.1/any/].
If this is a framework-dependent application, install the runtime in the global location [/usr/share/dotnet] or use the DOTNET_ROOT environment variable to specify the runtime location or register the runtime location in [/etc/dotnet/install_location].

The .NET Core runtime can be found at:
  - https://aka.ms/dotnet-core-applaunch?missing_runtime=true&arch=x64&rid=ubuntu.20.04-x64
$ cat << \EOF >> ~/.bash_profile
> # Add .NET Core SDK tools
> export PATH="$PATH:/home/ubuntu/.dotnet/tools"
> EOF
$ export DOTNET_ROOT=/snap/dotnet-sdk/current
$ spx
Process terminated. Couldn't find a valid ICU package installed on the system. Set the configuration flag System.Globalization.Invariant to true if you want to run with no globalization support.
   at System.Environment.FailFast(System.String)
   at System.Globalization.GlobalizationMode.GetGlobalizationInvariantMode()
   at System.Globalization.GlobalizationMode..cctor()
   at System.Globalization.CultureData.CreateCultureWithInvariantData()
   at System.Globalization.CultureData.get_Invariant()
   at System.Globalization.CultureInfo..cctor()
   at System.IO.TextWriter..cctor()
   at System.IO.StreamWriter..ctor(System.IO.Stream, System.Text.Encoding, Int32, Boolean)
   at System.Console.CreateOutputWriter(System.IO.Stream)
   at System.Console+<>c.<get_Out>b__26_0()
   at System.Threading.LazyInitializer.EnsureInitializedCore[[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef, System.Object ByRef, System.Func`1<System.__Canon>)
   at System.Threading.LazyInitializer.EnsureInitialized[[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef, System.Object ByRef, System.Func`1<System.__Canon>)
   at System.Console.get_Out()
   at System.ConsolePal.EnsureInitializedCore()
   at System.ConsolePal+ControlCHandlerRegistrar.Register()
   at System.Console.add_CancelKeyPress(System.ConsoleCancelEventHandler)
   at Azure.AI.Speech.Tools.Carbonx2.Program.Main(System.String[])
Aborted (core dumped)
$ DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1 spx help | head
SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.

  ______ ___ _  __
 /  ___// _ \ \/ /
 \___ \/ ___/   <
/____ /_/  /__/\_\

USAGE: spx <command> [...]
$ echo "export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1" | tee -a ~/.profile
export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1
$ export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1
$ spx help | head
SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.

  ______ ___ _  __
 /  ___// _ \ \/ /
 \___ \/ ___/   <
/____ /_/  /__/\_\

USAGE: spx <command> [...]
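
In the session above DOTNET_ROOT is only set for the current shell, and a variable has to be exported in the profile to reach spx in later logins. A sketch that persists the three settings used here without duplicating lines on repeated runs. append_once and persist_spx_env are names invented for this note:

```shell
# Append a line to a file only if that exact line is not already there.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Hypothetical helper: persist the three settings used in this session.
# Single quotes keep $PATH and $HOME unexpanded until login.
persist_spx_env() {
    profile=${1:-$HOME/.profile}
    append_once "$profile" 'export PATH="$PATH:$HOME/.dotnet/tools"'
    append_once "$profile" 'export DOTNET_ROOT=/snap/dotnet-sdk/current'
    append_once "$profile" 'export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1'
}

# Usage:
#   persist_spx_env ~/.bash_profile
```

The target file is an argument because bash login shells read ~/.bash_profile in preference to ~/.profile when both exist.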

microsoft/azure/speech_service.txt · Last modified: 2022/04/22 19:56 by matoken