Python Engineer to Architect High-Volume Data Pipeline (Social Engagement Data)
We are a data agency looking to replace an expensive legacy vendor with an in-house solution.
We need a Senior Python Developer to build a high-efficiency data pipeline that aggregates public engagement data (Likes/Comments) from professional social networks.
The Goal:
Build a "Glass Box" scraper that runs on our cloud infrastructure. We want full ownership of the code and direct billing for the underlying resources (Proxies/APIs).
The Specs (Must Have):
- Volume: Capability to process 200,000 - 300,000 lookups per week.
- Inputs: We provide post URLs or Keywords.
- Outputs: CSV/JSON with User Name, Headline, and Profile URL.
- Cost Constraint: The system must operate (infrastructure-wise) for under $1,200/month at full volume.
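For scale, that constraint works out to well under a tenth of a cent per lookup. A rough calculation (assuming full volume every week of the month):

```python
# Rough per-lookup budget implied by the constraint above
# (assumes full volume: 300k lookups every week of the month).
WEEKLY_LOOKUPS = 300_000
WEEKS_PER_MONTH = 52 / 12              # ~4.33
MONTHLY_BUDGET_USD = 1_200

monthly_lookups = WEEKLY_LOOKUPS * WEEKS_PER_MONTH   # ~1,300,000
per_lookup = MONTHLY_BUDGET_USD / monthly_lookups    # ~$0.00092

print(f"~{monthly_lookups:,.0f} lookups/month, about ${per_lookup:.5f} each")
```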
The Architecture:
We believe the best approach is a Python script leveraging enterprise APIs to handle the heavy lifting (e.g., Apify, Scrapingdog, or Bright Data). We do not want a Selenium bot running on a laptop. We want a cloud-deployed script (AWS Lambda/DigitalOcean) that manages rotation and rate limits via these APIs.
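To make that concrete, here is a minimal sketch of the shape we have in mind. It is illustrative only: the endpoint, parameter names, environment variable, and event payload are assumptions, not a spec, and should be validated against whichever provider you recommend.

```python
"""Minimal sketch of the kind of cloud function we have in mind.
Assumptions (not a spec): a Scrapingdog-style GET endpoint, an API key
in an environment variable, and one post URL per invocation. Confirm
the endpoint and parameter names against the provider's docs."""
import json
import os
import time

import requests

SCRAPE_ENDPOINT = "https://api.scrapingdog.com/scrape"  # illustrative
API_KEY = os.environ["SCRAPINGDOG_API_KEY"]             # assumed env var


def fetch(post_url: str, retries: int = 3) -> str:
    """Fetch one post via the scraping API; the provider handles proxy
    rotation, so this side only backs off on transient failures."""
    params = {"api_key": API_KEY, "url": post_url}
    for attempt in range(retries):
        resp = requests.get(SCRAPE_ENDPOINT, params=params, timeout=60)
        if resp.ok:
            return resp.text
        time.sleep(2 ** attempt)  # back off on 429/5xx before retrying
    resp.raise_for_status()       # surface the final error


def handler(event, context):
    """AWS Lambda-style entry point: expects {"post_url": "..."}."""
    html = fetch(event["post_url"])
    # Parsing of engagement data (names, headlines, profile URLs)
    # would happen here before writing results to S3 or a queue.
    return {"statusCode": 200, "body": json.dumps({"bytes": len(html)})}
```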
Requirements:
- Deep experience with Apify Actors or Scrapingdog.
- Experience with Residential Proxies (configuring bandwidth to minimize waste).
- Ability to parse large JSON datasets efficiently (see the sketch after this list).
- Ownership: You build it, we own the code.
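On the parsing point, something along these lines is what we expect: stream large provider dumps rather than loading them into memory, and emit the CSV described in the specs. A sketch, with hypothetical field names that depend on the provider's actual response schema:

```python
"""Memory-efficient parsing of a large JSON result dump, assuming the
provider returns a top-level array of records with (hypothetical) keys
"name", "headline", "profile_url". Uses ijson to stream one record at
a time instead of loading the whole file into RAM."""
import csv

import ijson  # pip install ijson


def json_to_csv(json_path: str, csv_path: str) -> None:
    with open(json_path, "rb") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.DictWriter(
            dst, fieldnames=["name", "headline", "profile_url"]
        )
        writer.writeheader()
        # ijson.items(..., "item") yields one array element at a time,
        # so a multi-GB dump never has to fit in memory.
        for record in ijson.items(src, "item"):
            writer.writerow({
                "name": record.get("name", ""),
                "headline": record.get("headline", ""),
                "profile_url": record.get("profile_url", ""),
            })
```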
To Apply:
Please tell us which API or proxy provider you would recommend to hit a volume of 300k lookups/week while keeping ongoing infrastructure costs under $1,200/month.