What is Document Processing in AppSheet?
Document processing is a feature that automatically extracts useful content from documents stored as PDF, GIF, PNG, JPEG or TIFF files. It is built on Google’s Document AI solution, and the supported file formats and document types will grow along with it.
What documents are supported?
Currently, AppSheet supports content extraction from invoices, receipts and W9 forms. This means that files in a folder that are images or PDFs of invoices, receipts, or W9s can be processed automatically and their meaningful content extracted. Other documents or files in the folder will not have content extracted.
What is the life-cycle of documents in AppSheet?
The document processing feature in AppSheet applies machine intelligence to the files in your specified folder on Google Drive to extract their content. Each extraction ends in one of the following statuses:
EXTRACTION_ERROR: the content could not be extracted, usually because the file format is unsupported or the file does not contain relevant content.
POOR_QUALITY: the content was extracted, but the extraction model does not have strong confidence in all of the information retrieved, or some critical fields are missing (for example, the Total Amount on an invoice).
QUOTA_EXCEEDED: you have used up your document quota for the current cycle. These documents are automatically re-extracted when quota is available.
SUCCESS: the extraction was successful and the confidence for the majority of the fields is high (an empty status is also considered SUCCESS).
Only documents with a SUCCESS status are automatically available in your application table. All other statuses are logged to your application’s audit log, where each extraction issue appears as a log entry (one per document). When you click on the details of an entry, you will see a reference to the source file and to the row in a sheet where the extracted data is stored, as well as details about the issue.
To move the document content into your application table, review the data and fix any issues. The URL under the Source File heading should take you straight to a view of the original document. The URL under the Data location heading should take you to a highlighted row for that document. Once you are satisfied that the data in the row is correct, set the IsEdited column of that row to TRUE. Your document data will then be available in your application within 15 minutes at most. Note: Please do not edit the StatusCode column, as doing so may prevent events from firing in your application.
Additionally, if you have a business plan, you can set up email alerts for the document events that need your attention.
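The life-cycle described in this section can be summarized in a few lines of Python. This is only an illustrative sketch, not AppSheet code: the status names, the IsEdited flag and the treatment of an empty status come from the text above, while the ExtractedRow class, the next_step function, its return values, and the assumption that a QUOTA_EXCEEDED row simply waits for re-extraction are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractedRow:
    """Hypothetical stand-in for one row of extracted document data."""
    source_file: str
    status_code: str = ""    # an empty status is treated like SUCCESS
    is_edited: bool = False  # mirrors the IsEdited column


def next_step(row: ExtractedRow) -> str:
    """Return what happens next to a document, given its status."""
    if row.status_code in ("", "SUCCESS"):
        # Successful extractions reach the application table automatically.
        return "available"
    if row.status_code == "QUOTA_EXCEEDED":
        # Assumption: these rows wait for automatic re-extraction.
        return "waiting_for_quota"
    if row.status_code in ("EXTRACTION_ERROR", "POOR_QUALITY"):
        # Logged to the audit log; available once reviewed and IsEdited is TRUE.
        return "available" if row.is_edited else "needs_review"
    raise ValueError(f"Unknown status code: {row.status_code}")


rows = [
    ExtractedRow("invoice_001.pdf", "SUCCESS"),
    ExtractedRow("receipt_017.png", "POOR_QUALITY"),
    ExtractedRow("receipt_018.png", "POOR_QUALITY", is_edited=True),
    ExtractedRow("w9_scan.tiff", "QUOTA_EXCEEDED"),
]
for row in rows:
    print(row.source_file, next_step(row))
```

Note that the sketch only reads the status code and never changes it, in line with the advice not to edit the StatusCode column.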
Why is there a separate currency code column?
For documents that contain monetary values, the amounts and the currency code are stored in separate columns. The main reason for this is to give you more flexibility with the extracted data. For example, it supports a centralized repository of documents that may span multiple currencies, and it lets you customize downstream processes. Each document is assumed to have a single currency, and if the extraction appears to contain inconsistencies, they are flagged.
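As a rough illustration of why a separate currency code is useful, the Python sketch below sums document totals per currency instead of mixing them. It is not AppSheet code, and the column names total_amount and currency_code, as well as the sample values, are hypothetical placeholders for whatever your table uses.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by_currency(rows):
    """Sum the (hypothetical) total_amount column per currency_code."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for row in rows:
        totals[row["currency_code"]] += Decimal(row["total_amount"])
    return dict(totals)


# Made-up sample rows, one per document; each document has a single currency.
rows = [
    {"total_amount": "120.50", "currency_code": "USD"},
    {"total_amount": "79.50", "currency_code": "USD"},
    {"total_amount": "45.10", "currency_code": "EUR"},
]
print(totals_by_currency(rows))
```

Because the amount column holds only a number, it can be summed or compared directly, and the currency code can drive any currency-specific step that follows.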
Where can I store my files for Document processing?
Currently, AppSheet only supports processing documents stored on Google Drive. We are working to expand the file storage providers supported for document processing and will announce new ones as they become available.
What does ‘collection of files’ mean?
In addition to document content extraction, AppSheet now supports exposing the contents of your folder as a table in your application. This feature allows app creators to expose files directly in their app and to use file metadata in their application logic (e.g. filtering by name, date modified, etc).
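The kind of metadata-based logic mentioned above can be sketched in plain Python. This only illustrates the idea and does not use AppSheet expression syntax: the FileRecord class, its fields, the filter_files function, and the sample files are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileRecord:
    """Hypothetical stand-in for one row of the folder table."""
    name: str
    modified: date


def filter_files(files, name_contains="", modified_on_or_after=None):
    """Keep files whose name contains the text and that were modified recently enough."""
    result = []
    for f in files:
        if name_contains.lower() not in f.name.lower():
            continue  # the name does not match (case-insensitive)
        if modified_on_or_after is not None and f.modified < modified_on_or_after:
            continue  # the file was last modified before the cutoff date
        result.append(f)
    return result


# Made-up sample files.
files = [
    FileRecord("Invoice_March.pdf", date(2021, 3, 31)),
    FileRecord("Invoice_January.pdf", date(2021, 1, 31)),
    FileRecord("Receipt_March.png", date(2021, 3, 15)),
]
recent_invoices = filter_files(files, "invoice", date(2021, 3, 1))
print([f.name for f in recent_invoices])
```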