Document AI Batch Source
Introduction
The Document AI plugin lets users run Document AI processors to process invoices, parse forms, extract key-value pairs, and more. Users can also use this plugin to make predictions with AutoML custom models that are exposed as Document AI processors.
NOTE: These plugins will incur additional cost.
https://cloud.google.com/document-ai/docs
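The processors described above are addressed by a resource name. The sketch below assumes the Document AI v1 REST API and its regional endpoints of the form `{location}-documentai.googleapis.com` (for example `us` or `eu`). `DocumentAiEndpoints` and its methods are hypothetical names; only the resource-name and URL patterns come from the public API. AutoML custom models exposed as processors would presumably be addressed the same way, by processor ID.

```java
/**
 * Hypothetical helper that builds Document AI v1 REST URLs for a processor.
 * Assumes regional endpoints of the form https://{location}-documentai.googleapis.com.
 */
final class DocumentAiEndpoints {
  private DocumentAiEndpoints() {}

  /** Resource name, e.g. projects/my-project/locations/us/processors/abc123. */
  static String processorName(String project, String location, String processorId) {
    return "projects/" + requireNonBlank(project, "project")
        + "/locations/" + requireNonBlank(location, "location")
        + "/processors/" + requireNonBlank(processorId, "processorId");
  }

  /** URL for synchronous (online) processing of a single document. */
  static String processUrl(String project, String location, String processorId) {
    return baseUrl(location) + processorName(project, location, processorId) + ":process";
  }

  /** URL for asynchronous batch processing of documents stored in GCS. */
  static String batchProcessUrl(String project, String location, String processorId) {
    return baseUrl(location) + processorName(project, location, processorId) + ":batchProcess";
  }

  private static String baseUrl(String location) {
    return "https://" + requireNonBlank(location, "location") + "-documentai.googleapis.com/v1/";
  }

  // Rejects null, empty or whitespace-only configuration values early.
  private static String requireNonBlank(String value, String field) {
    if (value == null || value.isBlank()) {
      throw new IllegalArgumentException(field + " must not be blank");
    }
    return value;
  }
}
```

For a batch source reading from GCS, `batchProcessUrl` is probably the more natural fit, since batch processing reads its inputs from GCS and writes its results back to GCS directly.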
Use case(s)
- As a user, I would like to parse my invoices and form/key-value-pair documents in PDF format to extract entities, using Data Fusion pipelines that orchestrate the end-to-end journey from a data source (GCS) to a data sink (BigQuery).
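To make the GCS-to-BigQuery journey concrete, the sketch below flattens the entities extracted from one document into one row per entity, a long, narrow layout that works for any processor regardless of which entity types it returns. `ExtractedEntity` is a hypothetical, simplified stand-in for the `type`, `mentionText` and `confidence` fields of a Document AI `Entity`; the column names and the confidence threshold are assumptions, not an agreed schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified subset of a Document AI Entity: type, mentionText, confidence. */
record ExtractedEntity(String type, String mentionText, double confidence) {}

final class EntityRows {
  private EntityRows() {}

  /**
   * Flattens one document's entities into one row per entity. Column names are
   * illustrative only; entities below minConfidence are dropped.
   */
  static List<Map<String, Object>> toRows(
      String sourceUri, List<ExtractedEntity> entities, double minConfidence) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (ExtractedEntity entity : entities) {
      if (entity.confidence() < minConfidence) {
        continue;
      }
      // LinkedHashMap keeps a stable column order, which is convenient for debugging.
      Map<String, Object> row = new LinkedHashMap<>();
      row.put("source_uri", sourceUri);
      row.put("entity_type", entity.type());
      row.put("mention_text", entity.mentionText());
      row.put("confidence", entity.confidence());
      rows.add(row);
    }
    return rows;
  }
}
```

A wide layout (one column per entity type) would be easier to query for a single processor such as the invoice parser, but it would tie the output schema to that processor's entity types.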
User Stories
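- As a data pipeline developer, I should be able to use Document AI processors in a Data Fusion pipeline to extract entities from PDF documents stored in GCS and write the results to BigQuery.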
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configuration
Invoice API
https://cloud.google.com/document-understanding/alpha/docs/quickstart-invoice
User Facing Name | Type | Description | Default value | Notes |
---|---|---|---|---|
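For invoices already stored in GCS, the plugin would likely call the processor's `batchProcess` method rather than processing files one at a time. The sketch below builds that request body, assuming the Document AI v1 REST API (the quickstart linked above is for an earlier alpha API, which may use a different request format) and assuming every input is a PDF. `BatchProcessRequest` is a hypothetical name, and the JSON is assembled by hand only to keep the example dependency-free; a real plugin would more likely use the Document AI Java client library or a JSON library.

```java
import java.util.List;

/** Hypothetical builder for a Document AI v1 batchProcess request body. */
final class BatchProcessRequest {
  private BatchProcessRequest() {}

  /**
   * Builds the JSON body for processors/{id}:batchProcess. Every input is assumed
   * to be a PDF in GCS; the processed documents are written under outputPrefix.
   */
  static String build(List<String> pdfUris, String outputPrefix) {
    if (pdfUris.isEmpty()) {
      throw new IllegalArgumentException("at least one input document is required");
    }
    StringBuilder sb = new StringBuilder("{\"inputDocuments\":{\"gcsDocuments\":{\"documents\":[");
    for (int i = 0; i < pdfUris.size(); i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append("{\"gcsUri\":").append(quote(requireGcs(pdfUris.get(i))))
          .append(",\"mimeType\":\"application/pdf\"}");
    }
    sb.append("]}},\"documentOutputConfig\":{\"gcsOutputConfig\":{\"gcsUri\":")
        .append(quote(requireGcs(outputPrefix)))
        .append("}}}");
    return sb.toString();
  }

  // Batch processing reads from and writes to GCS, so only gs:// URIs are accepted.
  private static String requireGcs(String uri) {
    if (uri == null || !uri.startsWith("gs://")) {
      throw new IllegalArgumentException("expected a gs:// URI but got: " + uri);
    }
    return uri;
  }

  /** Minimal JSON string quoting: escapes quotes, backslashes and control characters. */
  private static String quote(String s) {
    StringBuilder out = new StringBuilder("\"");
    for (char c : s.toCharArray()) {
      if (c == '"' || c == '\\') {
        out.append('\\').append(c);
      } else if (c < 0x20) {
        out.append(String.format("\\u%04x", (int) c));
      } else {
        out.append(c);
      }
    }
    return out.append('"').toString();
  }
}
```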
Table Parsing API
https://cloud.google.com/document-ai/docs/process-tables
User Facing Name | Type | Description | Default value | Notes |
---|---|---|---|---|
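Document AI does not return table cells as plain strings: each cell's layout carries a `textAnchor` whose `textSegments` hold start and end offsets into the document's full `text`. The sketch below shows that resolution step, turning table rows into a grid of strings that could map onto output records. `TextSegment`, `TableCell` and `TableRow` are hypothetical, simplified mirrors of the API's messages, and for simplicity the offsets are treated as Java `String` indices; documents with non-ASCII text may need a more careful mapping.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simplified text segment: [startIndex, endIndex) into Document.text. */
record TextSegment(int startIndex, int endIndex) {}

/** Hypothetical simplified table cell: only the segments of its layout's text anchor. */
record TableCell(List<TextSegment> segments) {}

/** Hypothetical simplified table row. */
record TableRow(List<TableCell> cells) {}

final class TableText {
  private TableText() {}

  /** Concatenates the text covered by each segment and trims surrounding whitespace. */
  static String resolve(String documentText, List<TextSegment> segments) {
    StringBuilder sb = new StringBuilder();
    for (TextSegment s : segments) {
      // append(CharSequence, start, end) copies documentText[start, end).
      sb.append(documentText, s.startIndex(), s.endIndex());
    }
    return sb.toString().strip();
  }

  /** Resolves every cell of every row, in the order given (e.g. header rows, then body rows). */
  static List<List<String>> toGrid(String documentText, List<TableRow> rows) {
    List<List<String>> grid = new ArrayList<>();
    for (TableRow row : rows) {
      List<String> values = new ArrayList<>();
      for (TableCell cell : row.cells()) {
        values.add(resolve(documentText, cell.segments()));
      }
      grid.add(values);
    }
    return grid;
  }
}
```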
Form Parsing or KV API
https://cloud.google.com/document-ai/docs/process-forms
User Facing Name | Type | Description | Default value | Notes |
---|---|---|---|---|
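Form parsing follows the same pattern: each entry in a page's `formFields` has a `fieldName` and a `fieldValue`, both resolved through text anchors into the document's `text`. The sketch below collects them into an ordered key-value map. `AnchorSegment` and `FormField` are hypothetical simplified types, and keeping only the first value when a field name repeats is an arbitrary choice for illustration; a real output schema might instead emit every pair as its own record.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simplified text segment: [startIndex, endIndex) into Document.text. */
record AnchorSegment(int startIndex, int endIndex) {}

/** Hypothetical simplified form field: text-anchor segments of its name and value. */
record FormField(List<AnchorSegment> name, List<AnchorSegment> value) {}

final class FormFields {
  private FormFields() {}

  /** Concatenates the text covered by each segment and trims surrounding whitespace. */
  static String anchorText(String documentText, List<AnchorSegment> segments) {
    StringBuilder sb = new StringBuilder();
    for (AnchorSegment s : segments) {
      sb.append(documentText, s.startIndex(), s.endIndex());
    }
    return sb.toString().strip();
  }

  /** Builds an insertion-ordered map of field name to value; repeated names keep the first value. */
  static Map<String, String> toKeyValues(String documentText, List<FormField> fields) {
    Map<String, String> pairs = new LinkedHashMap<>();
    for (FormField field : fields) {
      String key = anchorText(documentText, field.name());
      if (key.isEmpty()) {
        continue; // skip fields whose name resolved to no text
      }
      pairs.putIfAbsent(key, anchorText(documentText, field.value()));
    }
    return pairs;
  }
}
```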
Design / Implementation Tips
Design - To be filled in later
Approach(s)
Properties
Security
Limitation(s)
Future Work
Test Case(s) - To be filled in later
- Test case #1
- Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
References
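- https://cloud.google.com/document-ai/docs
- https://cloud.google.com/document-understanding/alpha/docs/quickstart-invoice
- https://cloud.google.com/document-ai/docs/process-tables
- https://cloud.google.com/document-ai/docs/process-forms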