$ sudo docker pull msftspeech/spx
Using default tag: latest
latest: Pulling from msftspeech/spx
08a6abff8943: Pull complete
bb79cebf0976: Pull complete
c63ce881ac8a: Pull complete
0ad24d223958: Pull complete
7c6a85675d6b: Pull complete
65a90f31fc95: Pull complete
ea88efd477c8: Pull complete
47b3b2c55069: Pull complete
Digest: sha256:d3436c4bb023e21d07204aebb4b4025be2a4887c25e7afea3759da4e527267fe
Status: Downloaded newer image for msftspeech/spx:latest
docker.io/msftspeech/spx:latest
$ sudo docker image ls
REPOSITORY       TAG      IMAGE ID       CREATED      SIZE
msftspeech/spx   latest   d1ffdd590c84   6 days ago   969MB
$ sudo docker run -it --rm msftspeech/spx help
SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.
  ______ ___ _  __
 /  ___// _ \ \/ /
 \___ \/ ___/   <
/____ /_/  /__/\_\
USAGE: spx <command> [...]
COMMANDS
  spx config [...]        (see: spx help config)
  spx recognize [...]     (see: spx help recognize)
  spx synthesize [...]    (see: spx help synthesize)
  spx intent [...]        (see: spx help intent)
  spx translate [...]     (see: spx help translate)
  spx batch [...]         (see: spx help batch)
  spx csr [...]           (see: spx help csr)
  spx profile [...]       (see: spx help profile)
  spx speaker [...]       (see: spx help speaker)
  spx webjob [...]        (see: spx help webjob)
EXAMPLES
  spx config @region --set westus2
  spx config @key --set 436172626F6E20697320636F6F6C2121
  spx synthesize --text "Hello"
  spx synthesize --files *.txt;*.ssml --audio output {id}.wav
  spx recognize --microphone
  spx recognize --files *.wav --output results.tsv
  spx translate --target de --microphone
  spx translate --source en-US --target de;es;fr --file hello.wav
  spx config --set @appId=1b9b6355-e76a-48c5-861c-bb54dd25a806
ADDITIONAL TOPICS
  spx help setup
  spx help examples
  spx help find --topics "examples"
  spx help find --text "example"
  spx help find *
  spx help documentation
Register the region and key
$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx config @region --set japanwest
SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.
region (saved at '/data')
japanwest
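Every command in these notes repeats the same docker run prefix. A small wrapper function could cut the typing. The following is only a sketch: the function name spx_docker and the SPX_DATA_DIR / SPX_DRY_RUN variables are made up here for illustration and are not part of the Speech CLI.

```shell
# Hypothetical wrapper around the containerized Speech CLI.
# SPX_DATA_DIR (assumed name): host directory mounted at /data, defaults to ~/Music.
# SPX_DRY_RUN  (assumed name): when non-empty, print the command instead of running it.
spx_docker() {
  if [ -n "${SPX_DRY_RUN:-}" ]; then
    echo "sudo docker run -it -v ${SPX_DATA_DIR:-$HOME/Music}:/data --rm msftspeech/spx $*"
  else
    sudo docker run -it -v "${SPX_DATA_DIR:-$HOME/Music}:/data" --rm msftspeech/spx "$@"
  fi
}

# Usage (not executed here):
#   spx_docker config @region --set japanwest
#   SPX_DRY_RUN=1 spx_docker help
```

Because the config is saved under /data, keeping the mounted directory the same across calls is what lets later commands find the region and key.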
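Register the key. The command starts with a space so that it does not end up in the shell history (this depends on bash's HISTCONTROL containing ignorespace or ignoreboth; Ubuntu's default ~/.bashrc sets ignoreboth).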
$  sudo docker run -it -v ~/Music:/data --rm msftspeech/spx config @key --set deadbeef1234deadbeef1234deadbeef
SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.
key (saved at '/data')
deadbeef1234deadbeef1234deadbeef
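Whether a leading space really keeps a command out of the history depends on the HISTCONTROL setting of the interactive shell. A minimal sketch of a check, assuming bash; the function name ignores_leading_space is invented for this example:

```shell
# Succeeds if the given HISTCONTROL value (a colon-separated list)
# tells bash to skip commands that begin with a space.
ignores_leading_space() {
  case ":$1:" in
    *:ignorespace:*|*:ignoreboth:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, from the interactive shell where the key will be typed:
#   ignores_leading_space "$HISTCONTROL" && echo "safe" || echo "NOT safe"
```

HISTCONTROL is normally set in ~/.bashrc and not exported, so the check is only meaningful when run in the interactive shell itself, not in a separate script.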
Check that the key was not left in the history. If it was, delete it.
$ history | grep 'msftspeech/spx config'
  372  sudo docker run -it --rm msftspeech/spx config @region --set japanwest
  373  history | grep 'msftspeech/spx config'
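If the key did end up in the saved history file, the offending lines can be dropped from it. The sketch below works on a history file path passed as an argument (typically ~/.bash_history, which is an assumption about the shell setup); the function names are made up for this example.

```shell
# Print, with line numbers, the lines of a history file that register an spx key.
find_key_lines() {
  grep -n -- 'spx config @key' "$1"
}

# Rewrite the history file without those lines.
scrub_key_lines() {
  tmp=$(mktemp) || return 1
  # grep exits 1 when every line was filtered out; that is not an error here
  grep -v -- 'spx config @key' "$1" > "$tmp" || true
  cat "$tmp" > "$1"   # overwrite in place so the file keeps its permissions
  rm -f "$tmp"
}

# Usage (not executed here):
#   find_key_lines ~/.bash_history
#   scrub_key_lines ~/.bash_history
```

For an entry that is still only in the current session, bash's builtin `history -d <number>` removes it by the number shown in the `history` output.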
$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx synthesize --text "Enjoy using the Speech CLI." --audio output sample.wav
$ ffprobe ~/Music/sample.wav 2>&1 | grep Input -A9
Input #0, wav, from '/home/ubuntu/Music/sample.wav':
  Duration: 00:00:02.88, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
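The 256 kb/s that ffprobe reports follows directly from the stream parameters, since pcm_s16le is uncompressed: sample rate times bits per sample times channels. A quick check in shell arithmetic (the function names are only for this example):

```shell
# Uncompressed PCM bitrate in kb/s: rate (Hz) * bits per sample * channels / 1000
pcm_kbps() {
  echo $(( $1 * $2 * $3 / 1000 ))
}

# Approximate audio payload in bytes for a duration given in centiseconds
# (WAV header not included).
pcm_bytes() {
  echo $(( $1 * $2 / 8 * $3 * $4 / 100 ))
}

pcm_kbps 16000 16 1        # 16 kHz, 16-bit, mono, as reported for sample.wav
pcm_bytes 16000 16 1 288   # 2.88 s of audio
```

The first call prints 256, matching the ffprobe output; the second gives the rough size of the audio data for the 2.88 s clip.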
Check the available Japanese voices
$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx synthesize --voices | grep -E '^ \[' -A9999 | tr -d '^\ \ ' | jq '.[] | select(.Locale == "ja-JP")'
{
  "Name": "MicrosoftServerSpeechTexttoSpeechVoice(ja-JP,NanamiNeural)",
  "DisplayName": "Nanami",
  "LocalName": "七海",
  "ShortName": "ja-JP-NanamiNeural",
  "Gender": "Female",
  "Locale": "ja-JP",
  "LocaleName": "Japanese(Japan)",
  "StyleList": [
    "chat",
    "customerservice",
    "cheerful"
  ],
  "SampleRateHertz": "24000",
  "VoiceType": "Neural",
  "Status": "GA"
}
{
  "Name": "MicrosoftServerSpeechTexttoSpeechVoice(ja-JP,KeitaNeural)",
  "DisplayName": "Keita",
  "LocalName": "圭太",
  "ShortName": "ja-JP-KeitaNeural",
  "Gender": "Male",
  "Locale": "ja-JP",
  "LocaleName": "Japanese(Japan)",
  "SampleRateHertz": "24000",
  "VoiceType": "Neural",
  "Status": "GA"
}
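When only the value to pass to --voice is needed, the ShortName fields can be pulled out with grep and sed alone. This is a rough text-based sketch, not a JSON parser (jq as used above is the more robust route), and the function name short_names_for_locale is invented here:

```shell
# Read the voice list on stdin and print the ShortName of every voice
# whose ShortName starts with the given locale prefix (e.g. ja-JP).
short_names_for_locale() {
  grep -o "\"ShortName\": *\"$1-[^\"]*\"" | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage (not executed here):
#   sudo docker run -it -v ~/Music:/data --rm msftspeech/spx synthesize --voices \
#     | short_names_for_locale ja-JP
```

It relies on the ShortName beginning with the locale, which holds for the two voices listed above.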
TTS with Nanami (七海)
$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx synthesize --text "日本語も話せます." --voice "ja-JP-NanamiNeural" --audio output japan.wav
$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx recognize --file sample.wav
$ cat ~/Music/output.132950280064316023.tsv
audio.input.id   recognizer.session.started.sessionid   recognizer.recognized.result.text
sample           8bcc7a9b0dd14132841cd19684202714       Enjoy using the speech CLI.
$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx recognize --file japan.wav
$ cat output.132950295870105988.tsv
audio.input.id   recognizer.session.started.sessionid   recognizer.recognized.result.text
japan            79ab40f8761049a3a8a572b0fbf084c2       Nihongo mohana semas
According to spx help translate language, --source and --target take language tags in BCP-47 format.
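BCP-47 tags are things like en-US, ja-JP or plain ja. A loose shape check can catch typos such as en_US before a request is sent. The pattern below is a deliberately simplified sketch (it does not cover the full BCP-47 grammar, for example single-letter extension subtags), and looks_like_bcp47 is a name invented here:

```shell
# Succeeds if the argument looks like a simple BCP-47 tag:
# a 2-3 letter language code, optionally followed by hyphen-separated subtags.
looks_like_bcp47() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$'
}

# Usage:
#   looks_like_bcp47 en-US && echo ok
#   looks_like_bcp47 en_US || echo "not a BCP-47 style tag"
```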
$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx translate --file sample.wav --source en-US --target ja
$ cat output.132950304946172724.tsv
audio.input.id   recognizer.session.started.sessionid   recognizer.recognized.result.text   recognizer.recognized.result.translated.text
sample           25bab9dffa7743ffbdff600e72b3f2ba       Enjoy using the speech CLI.         音声 CLI を使用してお楽しみください。
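The output file has one header row followed by data, so a single column can be picked out by its header name. A small awk sketch, assuming the file is tab-separated as the .tsv extension suggests; tsv_column is a name made up for this example:

```shell
# Print the values of the column whose header equals NAME.
# Usage: tsv_column FILE NAME
tsv_column() {
  awk -F '\t' -v name="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
    col     { print $col }
  ' "$1"
}

# Usage (not executed here):
#   tsv_column output.132950304946172724.tsv recognizer.recognized.result.translated.text
```

If the named column does not exist, nothing is printed.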
Even for a large audio file, the result seems to come back as a single row.
$ wget https://media.libreplanet.org/mgoblin_media/media_entries/2730/rms-2022-state-of-the-free-software-movement.ogg
$ ffmpeg -i "rms-2022-state-of-the-free-software-movement.ogg" "rms-2022-state-of-the-free-software-movement.wav"
$ sudo docker run -it -v ~/Music:/data --rm msftspeech/spx translate --file rms-2022-state-of-the-free-software-movement.wav --source en-US --target ja
:
SESSION STARTED: 4abfe53fee164f57bde9e1406764f37f
Connection CONNECTED...
TRANSLATING into 'ja': 通常は起こりますが、もちろん (from '')
TRANSLATING into 'ja': 通常起こるが、もちろんそれらは今起こっている (from '')
TRANSLATING into 'ja': 通常起こるが、もちろん、彼らは今私は感謝したいと思います起こっています (from '')
TRANSLATING into 'ja': 通常起こるが、もちろん、彼らは今起こっている私はあなたのすべてに感謝したいと思います (from '')
TRANSLATING into 'ja': 通常起こるが、もちろん、彼らは今起こっている私は貢献しているあなたのすべてに感謝したいと思います (from '')
TRANSLATED into 'ja': 通常、それは起こりますが、もちろん起こっています。今、私は貢献した皆さん全員に感謝します。 (from 'Usually happen, but of course they're happening. Now I'd to thank all of you who have contributed.')
TRANSLATING into 'ja': 特にGNUソフトウェアに (from '')
TRANSLATING into 'ja': 特にGNUソフトウェアだけでなく、 (from '')
TRANSLATING into 'ja': 特にGNUソフトウェアだけでなく、他のソフトウェアにも (from '')
TRANSLATED into 'ja': 特にGNUソフトウェアだけでなく、他の自由ソフトウェアにも。 (from 'Especially to GNU software, but also to other free software.')
:
$ ls -ltra | tail -2
-rw-r--r-- 1 root root    139499 Apr 21 16:55 output.132950309862177089.tsv
-rw-r--r-- 1 root root 147947031 Apr 21 16:55 log-132950309862177089.log
$ wc output.132950309862177089.tsv
     2  10888 139499 output.132950309862177089.tsv
$ head -1 output.132950309862177089.tsv | wc
      1       4     134
$ head -2 output.132950309862177089.tsv | wc
      2   10888  139499
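Each run writes a new output.<number>.tsv, so finding the file that belongs to the last run means looking at modification times, as the ls -ltra above does. A small helper could wrap that; latest_output is a name invented here, and the sketch assumes file names without spaces or newlines, which holds for the names shown above.

```shell
# Print the most recently modified output.*.tsv in DIR (default: current directory).
latest_output() {
  ls -t "${1:-.}"/output.*.tsv 2>/dev/null | head -n 1
}

# Usage (not executed here):
#   wc "$(latest_output ~/Music)"
```

Note that the files are owned by root in the listing above, since the container wrote them, so reading works but deleting them needs sudo.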
Instead of using NuGet, you can also download the Linux binaries as a .zip file. Extract spx-netcore-30-linux-x64.zip to a new ~/spx directory, run "sudo chmod +r+x spx" on the binary, and add the ~/spx path to your PATH system variable.
As of 2022-04-18, this link ( https://aka.ms/speech/spx-linux ) appears to serve the Windows build instead.
$ wc spx-netcore31-win-x64.zip
   8134   48429 2396464 spx-netcore31-win-x64.zip
$ sha512sum spx-netcore31-win-x64.zip
f246087e14bda634c2abeef0e12d6c0d6d582f0addf9d87845a47c49607690a7ec59ce4290555ea62eb57a83914cee448c5bef31e8a7f313be686ddef2d2271d  spx-netcore31-win-x64.zip
$ unzip -l spx-netcore31-win-x64.zip
Archive:  spx-netcore31-win-x64.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      443  2022-04-14 18:58   CodeSignSummary-f78b4f9d-5d51-4ca8-859d-591c020c3954.md
    11590  2022-04-14 18:42   LICENSE.txt
  2112408  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.core.dll
   231344  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.csharp.dll
   131480  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.extension.audio.sys.dll
   109464  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.extension.codec.dll
   276888  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.extension.kws.dll
   272792  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.extension.lu.dll
   151960  2022-04-14 11:47   Microsoft.CognitiveServices.Speech.extension.silk_codec.dll
   693680  2019-11-09 00:56   Newtonsoft.Json.dll
   852888  2022-04-14 18:58   spx.dll
   185240  2022-04-14 18:45   spx.exe
      154  2022-04-14 18:44   spx.runtimeconfig.json
   229840  2020-11-10 15:53   System.Interactive.Async.dll
  1088976  2020-11-10 15:53   System.Linq.Async.dll
     1743  2022-04-14 18:42   THIRD_PARTY_NOTICE.txt
---------                     -------
  6350890                     16 files
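The giveaway in the listing is spx.exe. A quick check on the unzip -l output can flag a Windows archive before anything is extracted. This sketch only looks for that one file name and says nothing about what a real Linux archive contains; archive_flavor is a name made up here.

```shell
# Read `unzip -l` output on stdin; print "windows" if it lists spx.exe, else "other".
archive_flavor() {
  if grep -q '[[:space:]]spx\.exe$'; then
    echo windows
  else
    echo other
  fi
}

# Usage (not executed here):
#   unzip -l spx-netcore31-win-x64.zip | archive_flavor
```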
Installing on Ubuntu 20.04 LTS amd64
.NET Core 3.1 SDK
$ sudo snap install dotnet-sdk --classic --channel=3.1
If another version is already installed, the installation fails, so either uninstall it or unalias it to free up the dotnet command.
$ dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
port: running "dotnet-runtime-60.dotnet" failed: cannot create transient scope: DBus error "org.freedesktop.systemd1.NoSuchUnit": [Unit snap.dotnet-runtime-60.dotnet.24fc25e9-291a-485c-9974-0243fa4d1d0d.scope not found.]
$ sudo snap unalias dotnet
Removed:
  - dotnet-runtime-60.dotnet as dotnet
$ snap install dotnet-sdk --classic --channel=3.1
dotnet-sdk (3.1/stable) 3.1.418 from Microsoft .NET Core (dotnetcore✓) installed
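Before unaliasing, it can help to see which snap currently owns the dotnet alias. The sketch below filters the output of `snap aliases`, assuming its columns are Command, Alias and Notes in that order after a header row; alias_owner is a name invented for this example.

```shell
# Read `snap aliases` output on stdin and print the snap command
# that the given alias points to (empty if the alias is not defined).
alias_owner() {
  awk -v a="$1" 'NR > 1 && $2 == a { print $1 }'
}

# Usage (not executed here):
#   snap aliases | alias_owner dotnet
```

In the session above this would be expected to report dotnet-runtime-60.dotnet, the alias that `sudo snap unalias dotnet` removed.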
$ dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
Welcome to .NET Core 3.1!
---------------------
SDK Version: 3.1.418
Telemetry
---------
The .NET Core tools collect usage data in order to help us improve your experience. It is collected by Microsoft and shared with the community. You can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT environment variable to '1' or 'true' using your favorite shell.
Read more about .NET Core CLI Tools telemetry: https://aka.ms/dotnet-cli-telemetry
----------------
Explore documentation: https://aka.ms/dotnet-docs
Report issues and find source on GitHub: https://github.com/dotnet/core
Find out what's new: https://aka.ms/dotnet-whats-new
Learn about the installed HTTPS developer cert: https://aka.ms/aspnet-core-https
Use 'dotnet --help' to see available commands or visit: https://aka.ms/dotnet-cli-docs
Write your first app: https://aka.ms/first-net-core-app
--------------------------------------------------------------------------------------
Tools directory '/home/ubuntu/.dotnet/tools' is not currently on the PATH environment variable.
If you are using bash, you can add it to your profile by running the following command:
cat << \EOF >> ~/.bash_profile
# Add .NET Core SDK tools
export PATH="$PATH:/home/ubuntu/.dotnet/tools"
EOF
You can add it to the current session by running the following command:
export PATH="$PATH:/home/ubuntu/.dotnet/tools"
$ spx help
A fatal error occurred. The required library libhostfxr.so could not be found.
If this is a self-contained application, that library should exist in [/home/ubuntu/.dotnet/tools/.store/microsoft.cognitiveservices.speech.cli/1.21.0/microsoft.cognitiveservices.speech.cli/1.21.0/tools/netcoreapp3.1/any/].
If this is a framework-dependent application, install the runtime in the global location [/usr/share/dotnet] or use the DOTNET_ROOT environment variable to specify the runtime location or register the runtime location in [/etc/dotnet/install_location].
The .NET Core runtime can be found at:
  - https://aka.ms/dotnet-core-applaunch?missing_runtime=true&arch=x64&rid=ubuntu.20.04-x64
$ cat << \EOF >> ~/.bash_profile
> # Add .NET Core SDK tools
> export PATH="$PATH:/home/ubuntu/.dotnet/tools"
> EOF
$ export DOTNET_ROOT=/snap/dotnet-sdk/current
$ spx
Process terminated. Couldn't find a valid ICU package installed on the system. Set the configuration flag System.Globalization.Invariant to true if you want to run with no globalization support.
   at System.Environment.FailFast(System.String)
   at System.Globalization.GlobalizationMode.GetGlobalizationInvariantMode()
   at System.Globalization.GlobalizationMode..cctor()
   at System.Globalization.CultureData.CreateCultureWithInvariantData()
   at System.Globalization.CultureData.get_Invariant()
   at System.Globalization.CultureInfo..cctor()
   at System.IO.TextWriter..cctor()
   at System.IO.StreamWriter..ctor(System.IO.Stream, System.Text.Encoding, Int32, Boolean)
   at System.Console.CreateOutputWriter(System.IO.Stream)
   at System.Console+<>c.<get_Out>b__26_0()
   at System.Threading.LazyInitializer.EnsureInitializedCore[[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef, System.Object ByRef, System.Func`1<System.__Canon>)
   at System.Threading.LazyInitializer.EnsureInitialized[[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](System.__Canon ByRef, System.Object ByRef, System.Func`1<System.__Canon>)
   at System.Console.get_Out()
   at System.ConsolePal.EnsureInitializedCore()
   at System.ConsolePal+ControlCHandlerRegistrar.Register()
   at System.Console.add_CancelKeyPress(System.ConsoleCancelEventHandler)
   at Azure.AI.Speech.Tools.Carbonx2.Program.Main(System.String[])
Aborted (core dumped)
$ DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1 spx help | head
SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.
  ______ ___ _  __
 /  ___// _ \ \/ /
 \___ \/ ___/   <
/____ /_/  /__/\_\
USAGE: spx <command> [...]
$ echo "export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1" | tee -a ~/.profile
export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1
$ export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1
$ spx help | head
SPX - Azure Speech CLI, Version 1.21.0
Copyright (c) 2020 Microsoft Corporation. All Rights Reserved.
  ______ ___ _  __
 /  ___// _ \ \/ /
 \___ \/ ___/   <
/____ /_/  /__/\_\
USAGE: spx <command> [...]
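Three settings were needed in the end: the tools directory on PATH, DOTNET_ROOT, and DOTNET_SYSTEM_GLOBALIZATION_INVARIANT. Appending them to a profile file by hand tends to leave duplicates when the steps are repeated, so an idempotent append may be handy. A minimal sketch; append_once is a name invented here:

```shell
# Append LINE to FILE unless an identical line is already there.
# Usage: append_once LINE FILE
append_once() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Usage (not executed here):
#   append_once 'export DOTNET_ROOT=/snap/dotnet-sdk/current' ~/.bash_profile
#   append_once 'export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1' ~/.bash_profile
```

Which file to target depends on the shell: a bash login shell reads ~/.bash_profile if it exists and then skips ~/.profile, so once ~/.bash_profile has been created it is probably the safer place for all three lines.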