Some clients have datasets that are too large for the standard SFTP upload workflow — we’re talking tens of gigabytes of EDI files, remittances, or crosswalks that need to be loaded in bulk.
Large data loads should not flow through our team manually. If a client has bulk data sitting in their own cloud environment (Azure, AWS, GCP), the most efficient path is a direct upload to our ingestion bucket — not downloading it to a local machine and re-uploading it.

Why manual handoffs don’t work

When someone on our team has to sit in the middle of a large data transfer:
  • Download time — Pulling tens of gigabytes to a local machine takes hours
  • Re-upload time — Pushing those files back up to our SFTP or cloud storage takes more hours
  • Extraction overhead — Compressed archives need to be unpacked before they can be processed, multiplying the data moved
  • Fragility — Long-running transfers from local machines fail due to network interruptions, sleep/shutdown, or bandwidth limits
  • Wasted effort — An engineer scripting a one-off transfer is not doing engineering work
A 15GB dataset that could be transferred cloud-to-cloud in minutes can take an entire day when routed through a local machine.
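The gap is easy to sanity-check with back-of-envelope arithmetic. The link speeds below are illustrative assumptions, not measured values:

```python
def transfer_seconds(size_gb: float, mbps: float) -> float:
    """Seconds to move size_gb at a sustained rate of mbps (megabits per second)."""
    return size_gb * 8_000 / mbps  # 1 GB ~= 8,000 megabits

# Routed through a laptop: download at 100 Mbps, then re-upload at 20 Mbps
via_laptop = transfer_seconds(15, 100) + transfer_seconds(15, 20)

# Cloud-to-cloud at a sustained 2 Gbps (2,000 Mbps)
cloud_to_cloud = transfer_seconds(15, 2_000)

print(f"{via_laptop / 3600:.1f} hours via laptop")  # 2.0 hours, before extraction and retries
print(f"{cloud_to_cloud / 60:.1f} minutes cloud-to-cloud")  # 1.0 minutes
```

And that two-hour figure is the best case: add archive extraction, a failed transfer restarted from zero, and an engineer babysitting the process, and "an entire day" is realistic.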

1. Get the source location

Ask the client where their data currently lives — Azure Blob, AWS S3, GCP Cloud Storage, or an on-prem server.

2. Provide direct upload credentials

Generate temporary, scoped credentials (e.g., pre-signed URLs or a time-limited IAM role) that give the client write access to the correct ingestion path: /{customer-slug}/{file-category}/.

3. Client uploads directly

The client (or their IT team) transfers files directly from their environment to our ingestion bucket. Cloud-to-cloud transfers avoid the local machine bottleneck entirely.
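As an illustration of what the client side can look like against a pre-signed URL, here is a minimal stdlib sketch. A real multi-gigabyte transfer should stream in chunks and retry on failure rather than read the whole file into memory; tools like the aws CLI or azcopy handle that for you:

```python
import urllib.request


def put_file(upload_url: str, path: str) -> int:
    """HTTP PUT a local file to a pre-signed upload URL; returns the status code.

    Minimal sketch: reads the whole file into memory, so it is only suitable
    for small files. Use a streaming client with retries for large transfers.
    """
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(upload_url, data=body, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```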

4. Validate and process

Once files land in the ingestion bucket, the standard processing pipeline takes over — validation, parsing, and entity creation happen automatically.

When direct upload isn’t possible

If the client can’t upload directly (e.g., compliance restrictions, no cloud environment, or limited IT support):
  • Cloud-to-cloud transfer — when the client has data in Azure/AWS/GCP; transfer between cloud providers server-side
  • Dedicated EC2 instance — when data needs transformation before upload; spin up an instance in our VPC to avoid local machine bandwidth limits
  • Chunked SFTP upload — when the client can only use SFTP; break the dataset into smaller batches and upload over multiple sessions
The key principle: data should move between servers, not through laptops. Any time a human is downloading and re-uploading gigabytes, there’s a better way.
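For the chunked-SFTP fallback, splitting can be done with the standard `split` utility (`split -b 1G file file.part`) or a short script; this sketch is one way to do it:

```python
import os


def chunk_file(src: str, chunk_bytes: int, out_dir: str = "chunks") -> list[str]:
    """Split src into chunk_bytes-sized parts for separate SFTP sessions.

    Reassemble on the receiving side with:  cat file.part* > file
    """
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    with open(src, "rb") as f:
        index = 0
        while True:
            data = f.read(chunk_bytes)
            if not data:  # end of file
                break
            part = os.path.join(out_dir, f"{os.path.basename(src)}.part{index:04d}")
            with open(part, "wb") as out:
                out.write(data)
            parts.append(part)
            index += 1
    return parts
```

Ship a checksum (e.g., `sha256sum`) of the original file along with the parts so the receiving side can verify the reassembled result.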

Planning ahead

When onboarding a new client, ask about data volume early:
  • How large is the initial historical load? If it’s over a few gigabytes, plan for a direct transfer.
  • What format is the data in? Compressed archives need extraction — factor that into the approach.
  • Where does the data live today? Cloud-to-cloud is almost always the fastest path.
  • What’s the ongoing volume? If regular uploads will be large, set up a repeatable pipeline rather than a one-off process.