Resumable high-volume API ingestion for an ad-tech platform
An advertising-technology platform needed a large, paginated external API pulled into SQL on a schedule — tens of thousands of records, deeply nested JSON, a strict rate limit, and a hard rule that a failure halfway through must never corrupt the dataset or restart the whole sync.
The engagement
One long-running job hits the platform's execution ceiling and the API's rate limit at the same time. Durable Functions were heavier than the team wanted to operate. And the source's nested arrays do not map cleanly to relational tables — a naive load either drops data or produces something nobody can query.
I built the sync as chained HTTP-triggered Functions: a timer starts the first run, and each run processes a bounded page range, then invokes the next run with a resumption token that records the last successful record id. Rate limits are absorbed with adaptive backoff and jitter rather than a fixed sleep. Records land through bulk SQL upserts into a normalized schema, with mappers flattening the nested arrays into columns you can actually query.
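The chaining and backoff described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the production code: `ResumeToken`, `RateLimited`, `fetch_page`, `save_page`, and `pages_per_run` are hypothetical names, record ids are assumed to be integers, and capped exponential backoff with full jitter stands in for whatever adaptive policy the real sync uses.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


class RateLimited(Exception):
    """Raised by the (hypothetical) page fetcher when the API reports a rate limit."""


@dataclass(frozen=True)
class ResumeToken:
    next_page: int       # first page the next run should fetch
    last_record_id: int  # highest record id already persisted


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def fetch_with_backoff(
    fetch_page: Callable[[int], List[dict]],
    page: int,
    max_attempts: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Retry a rate-limited page fetch, sleeping a jittered delay between attempts."""
    for attempt in range(max_attempts):
        try:
            return fetch_page(page)
        except RateLimited:
            sleep(backoff_delay(attempt))
    raise RuntimeError(f"page {page} still rate limited after {max_attempts} attempts")


def run_once(
    token: ResumeToken,
    fetch_page: Callable[[int], List[dict]],
    save_page: Callable[[List[dict]], object],
    pages_per_run: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[ResumeToken]:
    """Process a bounded page range; return the next run's token, or None when done."""
    for page in range(token.next_page, token.next_page + pages_per_run):
        records = fetch_with_backoff(fetch_page, page, sleep=sleep)
        if not records:
            return None  # an empty page is taken to mean the source is exhausted
        save_page(records)
        # Advance the token only after the page has been saved, so a crash
        # before this line replays at most the page that was in flight.
        token = ResumeToken(page + 1, max(r["id"] for r in records))
    return token
```

In a deployment along these lines, each HTTP-triggered run would call `run_once` with the token it received and, if the result is not `None`, pass that token in its request to the next run. Keeping each run to a fixed number of pages is what holds it under the execution ceiling.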
The sync runs unattended, resumes cleanly from any interruption, and stays inside both the platform's and the API's limits. Because it is incremental and id-tracked, a mid-run failure costs one page — not the whole job — and the normalized output is something the analysts can build on.
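Why a replayed page is harmless can be shown with a small, self-contained example. Everything specific here is an assumption made for illustration: the `campaign` and `creative` tables, the nested record shape, and the use of Python's built-in `sqlite3` (with SQLite's `ON CONFLICT ... DO UPDATE` upsert, available from SQLite 3.24) as a stand-in for the real SQL target.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical parent/child schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE campaign (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE creative ("
        "id INTEGER PRIMARY KEY, "
        "campaign_id INTEGER NOT NULL REFERENCES campaign(id), "
        "format TEXT NOT NULL)"
    )
    return conn


def flatten(record: dict):
    """Split one nested record into a parent row and its child rows."""
    parent = (record["id"], record["name"])
    children = [
        (c["id"], record["id"], c["format"]) for c in record.get("creatives", [])
    ]
    return parent, children


def upsert_page(conn: sqlite3.Connection, records: list) -> None:
    """Write one page in a single transaction; replaying it changes nothing."""
    parents, children = [], []
    for record in records:
        parent, kids = flatten(record)
        parents.append(parent)
        children.extend(kids)
    with conn:  # commits on success, rolls back the whole page on error
        conn.executemany(
            "INSERT INTO campaign (id, name) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
            parents,
        )
        conn.executemany(
            "INSERT INTO creative (id, campaign_id, format) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET "
            "campaign_id = excluded.campaign_id, format = excluded.format",
            children,
        )
```

Because every row is keyed by its source id and each page commits as a unit, rerunning the page that was in flight during a failure either rewrites identical rows or fills in the ones that never landed. On another database engine the same idea would be expressed with that engine's own upsert or merge statement.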
Practice areas applied
- Cloud & Platform Architecture. Scale confidently with Azure landing zones, modular APIs, and reference architectures that eliminate brittle infrastructure and rework.