This document shows you how to use the Google Cloud Speech-to-Text transform plugin to convert audio files into text files.

...

Before you begin

  1. Ensure that you have enabled the Speech-to-Text API.

  2. This article builds a pipeline that reads data from Google Cloud Storage. Upload the following speech file (hello.wav) to a Cloud Storage bucket.

    View file: hello.wav
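
The upload can be done from the Cloud Console. As an alternative, the following is a minimal Python sketch of the same step. It assumes the google-cloud-storage client library is installed and that credentials are already configured. The helper name upload_audio and the bucket name my-audio-bucket are hypothetical and not part of the original article. The client is passed in as an argument so the helper can be exercised without network access.

```python
from pathlib import Path


def upload_audio(client, bucket_name: str, local_path: str, object_name: str = "") -> str:
    """Upload a local audio file to a bucket and return its gs:// path.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    source = Path(local_path)
    if not source.is_file():
        raise FileNotFoundError(f"Audio file not found: {source}")

    # Default the object name to the local file name, for example "hello.wav".
    object_name = object_name or source.name

    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(str(source))
    return f"gs://{bucket_name}/{object_name}"


# Example usage (assumption: google-cloud-storage is installed, credentials
# are configured, and "my-audio-bucket" is replaced with a real bucket):
#
#   from google.cloud import storage
#   print(upload_audio(storage.Client(), "my-audio-bucket", "hello.wav"))
```

The returned gs:// path is the kind of value you would then enter as the Path of the Google Cloud Storage source.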

Instructions

  1. In the left navigation panel, navigate to Studio.

  2. In the left panel, under the Source section, select a source. This example uses Google Cloud Storage.

  3. Under the Transform section, select Google Cloud Speech-to-Text.

  4. Under the Sink section, select a sink. This example uses Google Cloud Storage. Note that you can use another source and sink, depending on where your audio data comes from and where you want to send the transcribed text.

  5. On the canvas, connect the three plugins, from source to transform to sink.

  6. Hover over the source, Google Cloud Storage. Click the Properties button that appears.

  7. In the Google Cloud Storage Properties window, set the Path to your Cloud Storage bucket path, and make sure the Format is “blob”.

  8. Click the X button at the top right to save your changes.

  9. Hover over the transform, Google Cloud Speech-to-Text. Click the Properties button that appears.

  10. Specify the Sampling Rate, Parts, and Text fields. This example uses a Sampling Rate of 16000.

  11. Click Get Schema, and then Apply, to automatically apply the output schema.

  12. Click the X button at the top right to save your changes.

  13. Hover over the sink, Google Cloud Storage. Click the Properties button that appears. Set the Path to an output bucket.

  14. Name your pipeline and click Deploy.

  15. Click Run to run the pipeline. It takes a few minutes for the pipeline to run.

  16. Once the pipeline succeeds, you can view your transcribed text data in Cloud Storage or whichever sink you chose.
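
The Sampling Rate specified on the transform should match the audio file. The following is a minimal sketch for checking a WAV file locally with Python's standard wave module before running the pipeline. It assumes the file is an uncompressed PCM WAV file; the function names describe_wav and matches_sampling_rate are hypothetical helpers, and the default of 16000 simply mirrors the value used in this example.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic audio properties read from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate if rate else 0.0,
        }


def matches_sampling_rate(path: str, expected_hz: int = 16000) -> bool:
    """Check whether the file's sample rate equals the value set on the transform."""
    return describe_wav(path)["sample_rate_hz"] == expected_hz


# Example usage (assumption: hello.wav is in the current directory):
#
#   print(describe_wav("hello.wav"))
#   print(matches_sampling_rate("hello.wav", 16000))
```

If the check returns False, one option is to set the Sampling Rate field to the value reported by describe_wav instead.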
