Make sure to read and understand the core concepts affecting app performance. If you can focus on improving Sync performance, it will have the greatest positive impact on the end-users of the app. In this article, we'll focus on "read-only" Syncs as this affects all apps. There are additional approaches to improve the speed of Syncs that involve data updates and these are addressed here.
There are six ways to improve Sync performance -- each is described in its own sub-section.
- Choose an efficient data provider
- Reduce the amount of data transferred during Sync
- Reduce the amount of computation (for spreadsheet formulas and virtual columns) performed during Sync
- Fetch or compute data in parallel (relevant if there are many tables in the app or many expensive virtual columns)
- Use a fast network (this is relevant for the connection between the device and the AppSheet Server)
- Make Sync happen in the background so that the user is not waiting for a Sync to occur
The AppSheet platform provides tools (like the Performance Profiler) for you to understand, analyze and improve app performance.
1. Choose an efficient data provider
AppSheet supports many different data providers. All the data providers vary in terms of the initial latency in responding to a request. Here are some general rules of thumb:
- Database providers are generally better than file system/spreadsheet providers.
- Among spreadsheet file systems, Google Sheets is the most performant as it has been built from the ground-up to support web-based concurrent data access.
- If using Excel, it is more efficient to use Office365 than a pure file system like Dropbox or Box. With Office365, spreadsheet formulas can be run in the cloud by Office365, whereas with Dropbox and Box, AppSheet has to download the entire Excel file and evaluate formulas locally.
In reality, this may not be a choice you have because your data set may already reside in a specific data provider.
2. Reduce the amount of data transferred
The first step is to remove unnecessary data in the table/worksheet itself.
- Delete empty rows, including empty rows at the bottom of the sheet. If your worksheets contain hundreds or thousands of empty rows, you can greatly improve performance by deleting them.
- Delete empty columns, including empty columns to the right of the sheet. If your worksheets contain empty columns, you can greatly improve performance by deleting them.
A spreadsheet "workbook" (i.e. file) contains one or more worksheets. The next step is to ensure that there isn't unnecessary data in the spreadsheet workbook.
- Only include worksheets that the application requires and remove the rest. Extra worksheets will add overhead when the data provider is fetching the requested data. Any changes to those worksheets also interferes with caching optimizations.
- Place read-only tables in their own workbook. There are greater opportunities to cache read-only tables. When placed in their own workbook, the platform can more accurately track if they have been updated.
Various settings in your AppSheet app can reduce the amount of data transferred. The app may explicitly reduce the amount of data via:
- Security filters --- these filters eliminate rows that do not satisfy the filter criteria. If the data comes from a spreadsheet source, the entire spreadsheet is read by the AppSheet Server, the security filter is applied, and the subset of rows that satisfy the condition are sent on to the app. If the data comes from a database source and the filter is simple (eg: [ColumnX] = USEREMAIL()), then the filter can be "pushed" into the data retrieval from the data source. In other words, the data retrieval from the data source also becomes much more efficient and less data is transferred to the AppSheet provider.
- Partitioning --- this is a mechanism to deal with very large data sets by dividing them into partitions. Each user only ever needs the data in one partition. Naturally, this significantly reduces the amount of data transferred.
The app may also implicitly reduce the amount of data transferred by avoiding redundant data transfer. Data transfer is redundant if the app already has a cached copy of the latest data available (from a previous sync). In order for caching to be effective, the system needs to record when a cached copy was maintained and check if the data source has been updated since then. There are two optional app settings that control this behavior:
- Server caching --- the AppSheet Server can maintain cached copies of read-only tables for upto half an hour. If your app depends on reference data that doesn't change often, be sure to mark the table as Read-Only and enable server caching.
- Delta sync --- when this option is enabled, the app maintains timestamps for the last time each table was fetched. On each sync, the AppSheet server tries to determine if the table has been updated after that timestamp. Only then is the table data retrieved from the cloud data source. This mechanism has the potential to significantly speed up sync, but it also has the potential to cause data errors. AppSheet is only able to check Delta Sync with cloud providers that use spreadsheet files. It checks the LastModifiedTime property of the file to determine if it has been updated (updates could happen directly to the spreadsheet). In cloud file systems like Google Drive, this timestamp is not perfectly accurate. Further, if the sheet involves formulas fetching data from external sources, the file itself may never reflect a modified timestamp although the computed data in it might change. As a result, this setting requires caution when used.
3. Reduce the amount of computation
Computations happen during sync at two places: (a) in spreadsheet formulas and (b) in AppSheet virtual columns. Both can add significant overhead to Sync times.
Spreadsheets use formulas to compute data. Some formulas can be very expensive and long-running. Such formulas can cause significant delays and in extreme cases even timeouts when using AppSheet.
- Ensure that the spreadsheet only uses simple/cheap formulas.
- Cross-sheet formulas and those using external services (like Google Finance) are likely to severely impact sync performance.
- It is particularly important to minimize worksheet formulas when the data source is Excel on Dropbox or Box, because these formulas must be re-computed by the AppSheet Server each time the Excel worksheet is read.
Virtual columns in a table are defined by App Formulas and are computed by the AppSheet Server on every sync. App Formulas are very powerful and expressive, but it is possible to write very inefficient App Formulas. For example, in an Order Capture app with three tables (Products, Customers and Orders), let us assume there is a virtual column defined on the Order table to find the sum of all prior Orders made by the same Customer for the same Product. Depending on the amount of data in the app, this will probably be very expensive to compute against the entire data set. The Performance Profiler will indicate if virtual columns are contributing significantly to sync time.
4. Fetch or compute data in parallel
So far, we have ignored the fact that most apps have multiple tables. Let's say your app has five tables. Are they fetched from the data source one after the other or are they fetched in parallel? Ideally, this would be as fully parallel as possible, but doing this is resource-intensive for the AppSheet server (each parallel request requires a concurrent "thread" of computation in the server and there is a limit on the total number of these that can run at the same time). Consequently, we control the degree of parallelism based on the subscription plan of the app owner --- the more expensive plans are given a greater degree of parallelism.
A similar approach is applied to the computation of virtual column values as well.
Increasing parallelism doesn't necessarily always result in faster sync. For example, if you have three tables, two of which take just a second to fetch and the third takes a whole minute to fetch, there will be no perceptible difference when increasing the degree of parallelism. The table that takes a whole minute to fetch dominates all other activities.
You can try out the effect of higher-degrees of parallelism in the app emulator hosted inside the app editor. Note that the effects vary depending on the amount of data in the tables as well. So while the effects may be minimal when developing your app with initial datasets, they may be more pronounced as the app is used and the data sets grow.
5. Use a faster network
Most people intuitively believe that a "faster" network should lead to a faster Sync. The same intuition also leads people to believe that the biggest bottleneck in sync is the speed of the data network (2G vs LTE for example) used by the mobile devices. In most situations however, this is an oversimplification.
In our observation,
- Cloud provider latency is usually the biggest contributor to slow syncs. The actual data transfer between cloud provider and AppSheet server is pretty efficient (the internet has high throughput).
- Network latency between the device and the AppSheet server can also add significant delay. We observe this typically with customers who are geographically remote from the AppSheet server cloud.
- The actual data transfer from the AppSheet Server to the mobile device is usually very efficient (i.e. even though mobile networks may not have great throughput, it is more than sufficient for the data volumes being transferred). The fact that AppSheet compresses the data before transmission helps with this.
This is a very important observation and it is because AppSheet apps typically have relatively small data volumes. Even if the underlying data set is very large, each app user typically only needs a small subset of the data and this is achieved via mechanisms like Security Filters and Data Partitioning.
It is difficult for one of our customers to do much to lower latency. The AppSheet servers are located in the Microsoft Azure hosted cloud in the US. A database server that is geographically co-located with the AppSheet servers will have the lowest latency. In 2018, we expect to start using servers that are hosted in other geographic regions as well --- this should help reduce latencies of device-to-AppSheet-Server communication as well as of AppSheet-Server-to-Cloud-Provider communication.
6. Make sync happen in the background
All our discussion so far has been to reduce the actual latency of Sync. However, since all we really care about is latency _perceived_ by the end-user, a very sensible additional step is to minimize the impact of sync latency on the end-user by delaying the sync and/or applying it in the background.
There is a tradeoff in making this choice. The app is perceived to run faster. However, the data may be stale. You need to pick a suitable balance for your app, and utilize various Sync options to tell AppSheet how you want the system to behave.
- With all sync options disabled, the default behavior of the app is to sync immediately if a row is added/updated/deleted by the app. Otherwise, the app will only sync if the user explicitly hits the Sync button. This is the only behavior supported by the Standard plan.
- If Sync-On-Start is enabled, the app forcibly syncs every time it is restarted. This is a good mechanism to limit the staleness of the data.
- If Delayed Sync is enabled, then updates made by the app are not immediately synced. Instead, they remain queued up on the device. They will only be synced later. For apps that are used in remote regions (eg: wilderness surveys), this is a good option. The changes get queued up on the device. When the user comes back to the office or some place with good network access, they can explicitly hit Sync and this pushes all the changes to the backend. Note that the longer that syncs are delayed, the greater the possibility for data conflicts because someone else may have updated the same data.
- In addition to Delayed Sync, the app may also enable the option for Automatic Background Sync. This tries to hide latency without data staleness and is appropriate for environments where there is good network connectivity. Any updates made by the app immediately kick off a sync in the background, while the user can continue to use the app. If there are no updates made by the app, the system will still periodically run a sync to ensure that it has the latest data. This option is fairly cavalier with the use of battery, network, and server resources. It will probably run many more syncs than are actually necessary, and this will need to be weighed against the benefit of having an app that is always close-to-uptodate without users waiting for sync