☁️ Kumo wo Mokumoku Manabu (Quietly Studying the Cloud)

Rambling notes, mostly about what I have learned on cloud computing services

My First Natural Language Processing (NLP) Hands-on

Seminar overview


URL

https://aws-seminar.smktg.jp/public/seminar/view/1354

Date and time

2020/03/03 14:00-17:30

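Format

Originally planned as an in-person session at the AWS seminar room,
but changed to an online session because of the spread of COVID-19

Requirements

・A PC that can connect to WiFi and play audio
・An AWS account ID (an account separate from production is recommended)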

Amazon Polly

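Overview

Converts text to speech
→ Japanese voices are Mizuki (female) and Takumi (male)
→ Other languages offer a wider choice, combining male/female with younger/middle-aged voices

Features

Readings can be corrected.
SSML and pronunciation customization are available.
→ Define readings in advance for words that need a specific pronunciation, such as personal names and place names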
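As a rough illustration of the points above, the sketch below builds a request for Polly's `synthesize_speech` call through boto3. The helper names, the sample SSML string, and the output path are assumptions for illustration; only the Japanese voices mentioned in these notes are covered.

```python
def build_polly_request(text, voice_id="Takumi", use_ssml=False, lexicon_names=None):
    """Build keyword arguments for Polly's synthesize_speech call."""
    if voice_id not in ("Mizuki", "Takumi"):
        raise ValueError("this sketch only covers the Japanese voices")
    params = {
        "Text": text,
        "TextType": "ssml" if use_ssml else "text",
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
    }
    if lexicon_names:
        # Lexicons hold the predefined readings (names, places, ...).
        params["LexiconNames"] = list(lexicon_names)
    return params


def synthesize_to_file(text, path, **kwargs):
    """Call Polly and write the MP3 stream to `path` (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(text, **kwargs))
    with open(path, "wb") as f:
        f.write(response["AudioStream"].read())


# Hypothetical SSML input with a short pause between two phrases.
request = build_polly_request(
    '<speak>こんにちは<break time="300ms"/>Pollyです</speak>',
    voice_id="Mizuki",
    use_ssml=True,
)
print(request["TextType"], request["VoiceId"])
```

Keeping the request building separate from the network call makes it easy to check the parameters without an AWS account.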


Amazon Transcribe

Overview

The MP3 must be stored in S3 beforehand; otherwise the service cannot be used.
Transcription is defined as a job.
It takes longer than Polly or Translate.
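Conversion takes about as long as the audio itself, perhaps slightly less.
Terms that appear repeatedly can be registered as a custom vocabulary.
Personal information in the transcript, such as home addresses and credit card numbers, can be redacted (blacked out) in the output.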
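The job definition described above might look like the following sketch, which prepares the arguments for boto3's `start_transcription_job`. The bucket, key, job name, and vocabulary name are placeholders, and `ja-JP` is assumed as the language.

```python
def build_transcription_job(job_name, bucket, key, vocabulary_name=None):
    """Build keyword arguments for Transcribe's start_transcription_job."""
    params = {
        "TranscriptionJobName": job_name,
        "LanguageCode": "ja-JP",  # assumption: Japanese audio
        "MediaFormat": "mp3",
        # The audio has to be in S3 already; Transcribe reads it from this URI.
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
    }
    if vocabulary_name:
        # Optional custom vocabulary created beforehand in Transcribe.
        params["Settings"] = {"VocabularyName": vocabulary_name}
    return params


def start_job(job_name, bucket, key, vocabulary_name=None):
    """Submit the job (needs AWS credentials and an uploaded MP3)."""
    import boto3  # imported lazily so the builder works without boto3

    transcribe = boto3.client("transcribe")
    return transcribe.start_transcription_job(
        **build_transcription_job(job_name, bucket, key, vocabulary_name)
    )


# Placeholder names, for illustration only.
print(build_transcription_job("demo-job", "my-bucket", "audio/speech.mp3"))
```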

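Amazon Comprehend

Overview

Rather than returning every word in a text, it extracts the main terms (key phrases).
Sentiment reports whether the input text is positive, negative, or neutral.
→ News articles are written objectively, so they mostly come out as Neutral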
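To make the two outputs concrete, here is a minimal sketch that condenses the responses of boto3's `detect_sentiment` and `detect_key_phrases`. The `summarize` helper, the score threshold, and the sample responses are assumptions; the samples are hand-written in the documented response shape and are not real service output.

```python
def summarize(sentiment_response, key_phrase_response, min_score=0.9):
    """Reduce two Comprehend responses to a label, a confidence and key phrases."""
    label = sentiment_response["Sentiment"]  # e.g. "NEUTRAL"
    # SentimentScore keys are capitalized: Positive, Negative, Neutral, Mixed.
    confidence = sentiment_response["SentimentScore"][label.capitalize()]
    phrases = [
        p["Text"]
        for p in key_phrase_response["KeyPhrases"]
        if p["Score"] >= min_score
    ]
    return {"sentiment": label, "confidence": confidence, "key_phrases": phrases}


def analyze(text, language_code="ja", region_name="us-west-2"):
    """Call Comprehend for both analyses (needs AWS credentials)."""
    import boto3  # imported lazily so summarize() works without boto3

    comprehend = boto3.client("comprehend", region_name=region_name)
    return summarize(
        comprehend.detect_sentiment(Text=text, LanguageCode=language_code),
        comprehend.detect_key_phrases(Text=text, LanguageCode=language_code),
    )


# Hand-written sample responses, for illustration only.
sample_sentiment = {
    "Sentiment": "NEUTRAL",
    "SentimentScore": {"Positive": 0.02, "Negative": 0.01, "Neutral": 0.96, "Mixed": 0.01},
}
sample_phrases = {
    "KeyPhrases": [
        {"Text": "the Oscars", "Score": 0.99},
        {"Text": "prize", "Score": 0.5},
    ]
}
print(summarize(sample_sentiment, sample_phrases))
```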

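Features

The service in the Tokyo Region has only just been released and is not yet complete.
Its service limits (number of concurrent executions) are a concern;
the Oregon Region has more headroom for concurrent executions.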
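Pointing a client at Oregon means passing its region code, `us-west-2`, and a missing hyphen such as `us-west2` produces an endpoint that cannot be reached. The sketch below is one way to catch that early; the regular expression only checks the general shape of a region code and does not confirm that the region exists, and the helper names are hypothetical.

```python
import re

# Shape check only: two letters, one or more "-word" parts, then "-digit".
REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d$")


def looks_like_region(name):
    """Return True when `name` has the usual shape of an AWS region code."""
    return bool(REGION_PATTERN.match(name))


def comprehend_client(region_name="us-west-2"):
    """Create a Comprehend client in the given region (Oregon by default)."""
    if not looks_like_region(region_name):
        raise ValueError(f"suspicious region name: {region_name!r}")
    import boto3  # imported lazily so the check works without boto3

    return boto3.client("comprehend", region_name=region_name)


for candidate in ("us-west-2", "ap-northeast-1", "us-west2"):
    print(candidate, looks_like_region(candidate))
```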

Amazon Translate

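Overview

Translates text between languages
→ In the sample, text from an English news site was translated into Japanese

Features

Custom terminology (TranslationMemory) lets you define how your own terms, such as EC2, are translated.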
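A custom terminology can be supplied as a small CSV file whose header row names the source and target language codes. The sketch below builds such a file and passes it to boto3's `import_terminology` and `translate_text`; the terminology name and the term pairs are illustrative assumptions, not the ones used in the seminar.

```python
import csv
import io


def build_terminology_csv(source_lang, target_lang, pairs):
    """Return CSV bytes: a language-code header followed by term pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([source_lang, target_lang])
    writer.writerows(pairs)
    return buffer.getvalue().encode("utf-8")


def translate_with_terminology(text, name, csv_bytes):
    """Register the terminology, then translate en -> ja with it applied."""
    import boto3  # imported lazily so the CSV builder works without boto3

    translate = boto3.client("translate")
    translate.import_terminology(
        Name=name,
        MergeStrategy="OVERWRITE",
        TerminologyData={"File": csv_bytes, "Format": "CSV"},
    )
    result = translate.translate_text(
        Text=text,
        SourceLanguageCode="en",
        TargetLanguageCode="ja",
        TerminologyNames=[name],
    )
    return result["TranslatedText"]


# Illustrative pairs: keep a product name as-is and fix a film title.
terms = build_terminology_csv(
    "en", "ja", [("Amazon EC2", "Amazon EC2"), ("Parasite", "パラサイト")]
)
print(terms.decode("utf-8"))
```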

Hands-on content

Maybe the English text I picked was not a good choice...

Article used for translation

The Japan Times Alpha
https://alpha.japantimes.co.jp/article/top_news/202002/37147/

South Korean dark comedy Parasite made movie history at the Oscars on Feb. 9,
becoming the first non-English-language film to win the best picture award — Hollywood’s biggest prize of all.

Translating the article

I had not defined any terminology in advance, so the resulting text is quite something.

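韓国の大学コメディ寄生虫は二月九日にオスカーで映画詩を作りハリウッドの最大の賞を受賞した最初の日英語映画になりました
(Roughly: "The South Korean university comedy Parasitic Worm made a film poem at the Oscars on February 9 and became the first Japanese-English film to win Hollywood's biggest prize.")

Preparing for transcription

Converted the text above to MP3 with Amazon Polly and downloaded it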

Transcription

Ran the transcription from Lambda

Ran the Transcribe job from Cloud9

Uploading speech_20200303053215852.mp3...
Starting transcription job...
Waiting for job to complete...
Still waiting...
Still waiting...
Still waiting...
Still waiting...
Still waiting...
Detecting key phrases...
Detecting sentiment...
Indexing document...
<Response [201]>
{'_index': 'support-calls', '_type': 'call', '_id': 'speech_20200303053215852', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}


Process exited with code: 0
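
The repeated "Still waiting..." lines in the output above come from polling the job status. The hands-on script itself is not reproduced in these notes, so the following is only a sketch of how such a loop could be written against boto3's `get_transcription_job`; the function name, interval, and retry limit are assumptions.

```python
import time


def wait_for_job(client, job_name, poll_seconds=5, max_polls=120, sleep=time.sleep):
    """Poll a Transcribe job until it completes; return the transcript URI."""
    for _ in range(max_polls):
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        # QUEUED or IN_PROGRESS: report and try again after a pause.
        print("Still waiting...")
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_name} did not finish after {max_polls} polls")
```

With a real client this would be called as `wait_for_job(boto3.client("transcribe"), "my-job")`. Passing `sleep` as a parameter keeps the loop easy to exercise with a stub client.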

Sentiment analysis

This time, the processing results were loaded into Elasticsearch (ES) and visualized in Kibana.
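
The indexing step could be sketched as follows, using `requests` with `requests_aws4auth` to sign the call to the Elasticsearch endpoint. The index name `support-calls` and type `call` are taken from the job output earlier in this post; the document field names and the endpoint are assumptions.

```python
def build_call_document(job_name, transcript, sentiment, key_phrases):
    """Assemble the JSON document to index (field names are hypothetical)."""
    return {
        "job": job_name,
        "transcript": transcript,
        "sentiment": sentiment,
        "key_phrases": list(key_phrases),
    }


def document_url(endpoint, doc_id):
    """URL of a document in the support-calls index, type call."""
    return f"{endpoint.rstrip('/')}/support-calls/call/{doc_id}"


def index_document(endpoint, region, doc_id, document):
    """PUT the document with a SigV4-signed request (needs AWS credentials)."""
    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    creds = boto3.Session().get_credentials()
    auth = AWS4Auth(
        creds.access_key, creds.secret_key, region, "es", session_token=creds.token
    )
    return requests.put(document_url(endpoint, doc_id), auth=auth, json=document)


# Placeholder endpoint and values, for illustration only.
print(document_url("https://search-example.us-west-2.es.amazonaws.com", "demo-call"))
```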

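Summary

  1. Polly's speech has improved (it no longer sounds halting)
  2. Conversion errors can (probably) be avoided by defining reserved terms in advance
  3. Transcribing Amazon Connect call recordings (S3) has also become easy

Work log from here on

I hit errors partway through setting up the Cloud9 environment, so I am pasting the log here for my own reference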


admin:~/environment $ aws s3 mb s3://20200303transcribeshibao --region us-west2
make_bucket failed: s3://20200303transcribeshibao Could not connect to the endpoint URL: "https://20200303transcribeshibao.s3.us-west2.amazonaws.com/"
admin:~/environment $
admin:~/environment $ aws s3 mb s3://20200303transcribeshibao --region us-west2
make_bucket failed: s3://20200303transcribeshibao Could not connect to the endpoint URL: "https://20200303transcribeshibao.s3.us-west2.amazonaws.com/"
admin:~/environment $
admin:~/environment $
admin:~/environment $ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=43 time=6.32 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=43 time=6.45 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=43 time=6.33 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 6.327/6.373/6.456/0.058 ms
admin:~/environment $
admin:~/environment $
admin:~/environment $ python --version
Python 3.6.10
admin:~/environment $
admin:~/environment $ pip --version
pip 9.0.3 from /usr/lib/python2.7/dist-packages (python 2.7)
admin:~/environment $
admin:~/environment $ sudo update-alternatives --config python

There are 2 programs which provide 'python'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/bin/python2.7
   2           /usr/bin/python3.6

Enter to keep the current selection[+], or type selection number: 2
admin:~/environment $
admin:~/environment $
admin:~/environment $ pip --version
pip 9.0.3 from /usr/lib/python3.6/dist-packages (python 3.6)
admin:~/environment $
admin:~/environment $
admin:~/environment $ pip install boto3
Requirement already satisfied: boto3 in /usr/local/lib/python3.6/site-packages
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/site-packages (from boto3)
Requirement already satisfied: botocore<1.14.0,>=1.13.41 in /usr/local/lib/python3.6/site-packages (from boto3)
Requirement already satisfied: s3transfer<0.3.0,>=0.2.0 in /usr/local/lib/python3.6/site-packages (from boto3)
Requirement already satisfied: python-dateutil<2.8.1,>=2.1; python_version >= "2.7" in /usr/local/lib/python3.6/site-packages (from botocore<1.14.0,>=1.13.41->boto3)
Requirement already satisfied: urllib3<1.26,>=1.20; python_version >= "3.4" in /usr/local/lib/python3.6/site-packages (from botocore<1.14.0,>=1.13.41->boto3)
Requirement already satisfied: docutils<0.16,>=0.10 in /usr/local/lib/python3.6/site-packages (from botocore<1.14.0,>=1.13.41->boto3)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/site-packages (from python-dateutil<2.8.1,>=2.1; python_version >= "2.7"->botocore<1.14.0,>=1.13.41->boto3)
You are using pip version 9.0.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
admin:~/environment $
admin:~/environment $
admin:~/environment $ sudo pip install requests_aws4auth
Collecting requests_aws4auth
  Downloading https://files.pythonhosted.org/packages/a4/4e/29ac240bff356eb630bcc729da7cdce6f55f21c442af325ca60dd27d4a3f/requests_aws4auth-0.9-py2.py3-none-any.whl (54kB)
    100% |████████████████████████████████| 61kB 2.9MB/s
Requirement already satisfied: requests in /usr/local/lib/python3.6/site-packages (from requests_aws4auth)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/site-packages (from requests->requests_aws4auth)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.6/site-packages (from requests->requests_aws4auth)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/site-packages (from requests->requests_aws4auth)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/site-packages (from requests->requests_aws4auth)
Installing collected packages: requests-aws4auth
Successfully installed requests-aws4auth-0.9
You are using pip version 9.0.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
admin:~/environment $
admin:~/environment $
admin:~/environment $ pip install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/54/0c/d01aa759fdc501a58f431eb594a17495f15b88da142ce14b5845662c13f3/pip-20.0.2-py2.py3-none-any.whl (1.4MB)
    100% |████████████████████████████████| 1.4MB 897kB/s
Installing collected packages: pip
  Found existing installation: pip 9.0.3
    Uninstalling pip-9.0.3:
Exception:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/shutil.py", line 550, in move
    os.rename(src, real_dst)
PermissionError: [Errno 13] Permission denied: '/usr/bin/pip' -> '/tmp/pip-9yqqvahx-uninstall/usr/bin/pip'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3.6/dist-packages/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/usr/lib/python3.6/dist-packages/pip/req/req_set.py", line 778, in install
    requirement.uninstall(auto_confirm=True)
  File "/usr/lib/python3.6/dist-packages/pip/req/req_install.py", line 754, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/usr/lib/python3.6/dist-packages/pip/req/req_uninstall.py", line 115, in remove
    renames(path, new_path)
  File "/usr/lib/python3.6/dist-packages/pip/utils/__init__.py", line 267, in renames
    shutil.move(old, new)
  File "/usr/lib64/python3.6/shutil.py", line 555, in move
    os.unlink(src)
PermissionError: [Errno 13] Permission denied: '/usr/bin/pip'
You are using pip version 9.0.3, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
admin:~/environment $
admin:~/environment $
admin:~/environment $ sudo pip install --upgrade pip
Cache entry deserialization failed, entry ignored
Collecting pip
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/54/0c/d01aa759fdc501a58f431eb594a17495f15b88da142ce14b5845662c13f3/pip-20.0.2-py2.py3-none-any.whl (1.4MB)
    100% |████████████████████████████████| 1.4MB 898kB/s
Installing collected packages: pip
  Found existing installation: pip 9.0.3
    Uninstalling pip-9.0.3:
      Successfully uninstalled pip-9.0.3
Successfully installed pip-20.0.2
admin:~/environment $
admin:~/environment $
admin:~/environment $
admin:~/environment $
admin:~/environment $ pip install boto3
bash: /usr/bin/pip: No such file or directory
admin:~/environment $
admin:~/environment $ sudo pip install boto3
sudo: pip: command not found
admin:~/environment $
admin:~/environment $
admin:~/environment $ which pip
/usr/local/bin/pip
admin:~/environment $
admin:~/environment $ pip install boto3
bash: /usr/bin/pip: No such file or directory
admin:~/environment $
admin:~/environment $ /usr/local/bin/pip install boto3
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: boto3 in /usr/local/lib/python3.6/site-packages (1.10.41)
Requirement already satisfied: botocore<1.14.0,>=1.13.41 in /usr/local/lib/python3.6/site-packages (from boto3) (1.13.41)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/site-packages (from boto3) (0.9.4)
Requirement already satisfied: s3transfer<0.3.0,>=0.2.0 in /usr/local/lib/python3.6/site-packages (from boto3) (0.2.1)
Requirement already satisfied: urllib3<1.26,>=1.20; python_version >= "3.4" in /usr/local/lib/python3.6/site-packages (from botocore<1.14.0,>=1.13.41->boto3) (1.25.7)
Requirement already satisfied: python-dateutil<2.8.1,>=2.1; python_version >= "2.7" in /usr/local/lib/python3.6/site-packages (from botocore<1.14.0,>=1.13.41->boto3) (2.8.0)
Requirement already satisfied: docutils<0.16,>=0.10 in /usr/local/lib/python3.6/site-packages (from botocore<1.14.0,>=1.13.41->boto3) (0.15.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/site-packages (from python-dateutil<2.8.1,>=2.1; python_version >= "2.7"->botocore<1.14.0,>=1.13.41->boto3) (1.14.0)
admin:~/environment $
admin:~/environment $
admin:~/environment $ /usr/local/bin/pip install requests
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: requests in /usr/local/lib/python3.6/site-packages (2.22.0)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.6/site-packages (from requests) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/site-packages (from requests) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/site-packages (from requests) (1.25.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/site-packages (from requests) (2019.11.28)
admin:~/environment $
admin:~/environment $
admin:~/environment $ /usr/local/bin/pip isntall requests_aws4auth
ERROR: unknown command "isntall" - maybe you meant "install"
admin:~/environment $
admin:~/environment $
admin:~/environment $ /usr/local/bin/pip install requests_aws4auth
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: requests_aws4auth in /usr/local/lib/python3.6/site-packages (0.9)
Requirement already satisfied: requests in /usr/local/lib/python3.6/site-packages (from requests_aws4auth) (2.22.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/site-packages (from requests->requests_aws4auth) (2019.11.28)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.6/site-packages (from requests->requests_aws4auth) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/site-packages (from requests->requests_aws4auth) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/site-packages (from requests->requests_aws4auth) (1.25.7)
admin:~/environment $
admin:~/environment $
admin:~/environment $ echo $PATH
/home/ec2-user/.nvm/versions/node/v10.19.0/bin:/home/ec2-user/.rvm/gems/ruby-2.6.3/bin:/home/ec2-user/.rvm/gems/ruby-2.6.3@global/bin:/home/ec2-user/.rvm/rubies/ruby-2.6.3/bin:/home/ec2-user/.rvm/gems/ruby-2.6.3/bin:/home/ec2-user/.rvm/gems/ruby-2.6.3@global/bin:/home/ec2-user/.rvm/rubies/ruby-2.6.3/bin:/usr/local/bin:/bin:/usr/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/home/ec2-user/.rvm/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/aws/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/home/ec2-user/.rvm/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin
admin:~/environment $
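
In the log above, `pip` keeps resolving to /usr/bin/pip even after the upgrade moved it to /usr/local/bin/pip, most likely because the shell had remembered the old location. One common way to sidestep this kind of mismatch is to run pip as a module of the interpreter you actually intend to use. The helper names below are hypothetical and this is only a sketch of that approach.

```python
import sys


def pip_command(*args):
    """Build a pip invocation bound to the currently running interpreter."""
    return [sys.executable, "-m", "pip", *args]


def pip_install(*packages):
    """Install packages for the current user with this interpreter's pip."""
    import subprocess

    subprocess.check_call(pip_command("install", "--user", *packages))


# Print the command that would install the packages used in this hands-on.
print(" ".join(pip_command("install", "--user", "boto3", "requests", "requests_aws4auth")))
```

From a shell, the equivalent is `python -m pip install --user boto3`, which avoids depending on which `pip` executable happens to be first on `PATH`.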