Flat-file JSONL delivery
How does this work?
We create an AWS S3 bucket and run push scripts on cron agents. First we load the bulk of the initial data; after that, only incremental updates are delivered.
The files are partitioned by date/timestamp in their filenames. For every table the user subscribes to, we create a dedicated feed script (e.g. personnel_feed.py, company_feed.py); a sketch of such a script is shown below.
The update frequency and the size of the initial export vary with the current load on our servers and the replication lag.
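A minimal sketch of what one per-table feed script might look like, assuming boto3 credentials are configured; the bucket name, the fetch_rows() helper, and the key layout are illustrative placeholders, not the actual implementation.

```python
# Sketch of one per-table feed script (e.g. personnel_feed.py).
import gzip
import io
import json
from datetime import datetime, timezone

import boto3

BUCKET = "example-delivery-bucket"   # hypothetical bucket name
TABLE = "personnel"                  # one script per subscribed table


def fetch_rows(since):
    """Placeholder: yield dict rows created/updated after `since`."""
    yield {"id": 1, "name": "Jane Doe", "updated_at": since.isoformat()}


def push_update(since):
    # Serialize rows as gzipped JSONL in memory.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for row in fetch_rows(since):
            gz.write((json.dumps(row) + "\n").encode("utf-8"))
    buf.seek(0)

    # Date/timestamp in the filename so update files sort chronologically.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    key = f"{TABLE}/{TABLE}_{stamp}.jsonl.gz"
    boto3.client("s3").upload_fileobj(buf, BUCKET, key)


if __name__ == "__main__":
    # The cron agent would call this with the last successful run time.
    push_update(datetime(2024, 1, 1, tzinfo=timezone.utc))
```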
What steps are required from the user’s side?
We would need the user's AWS account ARN (it looks something like arn:aws:iam::000000000000:root), which we add to the bucket policy so that the appropriate read permissions can be granted.
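As a rough illustration of that permissions step, the sketch below grants read-only access to the supplied ARN via a bucket policy; the bucket name is a placeholder and the exact actions granted may differ in practice.

```python
# Grant the user's ARN read access to the delivery bucket (illustrative).
import json

import boto3

BUCKET = "example-delivery-bucket"            # hypothetical bucket name
USER_ARN = "arn:aws:iam::000000000000:root"   # ARN supplied by the user

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": USER_ARN},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
        }
    ],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```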
What is the size of the initial export and of the update files?
The complete dataset is about 1.5 TB compressed, but the size varies depending on which data feeds are included and the geographies they cover.
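If useful, the recipient can verify the delivered size on their side by summing object sizes in the bucket; this is a small sketch using the same placeholder bucket name as above.

```python
# Sum object sizes in the delivery bucket to check the export size.
import boto3

BUCKET = "example-delivery-bucket"  # hypothetical bucket name

paginator = boto3.client("s3").get_paginator("list_objects_v2")
total_bytes = 0
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]

print(f"Compressed export size: {total_bytes / 1024**4:.2f} TiB")
```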