Roles and Responsibilities:
1. Assist with onboarding a large catalog of datasets
2. Acquire, transform, and deploy scalable, production-grade ETL/data pipelines for efficient data utilization
3. Troubleshoot data issues and perform root-cause analysis to proactively resolve operational problems
4. Identify and implement process improvements, automating manual workflows to streamline data delivery
5. Monitor all projects and perform quality checks
Mandatory Skills:
1. Experience with Python, using PySpark/Spark to perform DataFrame operations (see the DataFrame sketch after this list)
2. Database experience writing SQL queries against large datasets (see the SQL sketch after this list)
3. Analytical thinker with strong attention to detail and good verbal and written communication skills
4. Experience creating end-to-end ETL pipelines (see the pipeline sketch after this list)
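
The first skill calls for PySpark DataFrame operations. Below is a minimal sketch of what that looks like in practice; the input path and the column names (status, created_at, amount, customer_id) are hypothetical, chosen only for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dataframe-ops-sketch").getOrCreate()

    # Hypothetical input path; any columnar source (Parquet, Delta, etc.) would do.
    orders = spark.read.parquet("/data/raw/orders")

    # Typical DataFrame operations: filter rows, derive a column, aggregate, sort.
    daily_revenue = (
        orders
        .filter(F.col("status") == "COMPLETED")
        .withColumn("order_date", F.to_date("created_at"))
        .groupBy("order_date")
        .agg(
            F.sum("amount").alias("total_revenue"),
            F.countDistinct("customer_id").alias("unique_customers"),
        )
        .orderBy("order_date")
    )

    daily_revenue.show(5)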
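
The SQL skill pairs naturally with Spark SQL, which runs the same kind of query over large datasets. A sketch, assuming the same hypothetical orders data registered as a temporary view:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

    # Register the (hypothetical) orders dataset as a view queryable with SQL.
    spark.read.parquet("/data/raw/orders").createOrReplaceTempView("orders")

    # On large datasets, aggregate and filter inside the query rather than
    # pulling raw rows back to the driver.
    top_customers = spark.sql("""
        SELECT customer_id,
               COUNT(*)    AS order_count,
               SUM(amount) AS lifetime_value
        FROM orders
        WHERE status = 'COMPLETED'
        GROUP BY customer_id
        HAVING COUNT(*) >= 10
        ORDER BY lifetime_value DESC
        LIMIT 100
    """)

    top_customers.show()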
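
Finally, a minimal end-to-end ETL pipeline sketch: extract from a raw source, transform, and load to a curated target. The paths, schema, and partitioning choice are all assumptions for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    def run_pipeline(source_path: str, target_path: str) -> None:
        """Extract raw CSV, clean it, load partitioned Parquet (paths hypothetical)."""
        spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

        # Extract: read raw CSV with a header row.
        raw = spark.read.option("header", True).csv(source_path)

        # Transform: normalize types, drop incomplete records, stamp the load date.
        cleaned = (
            raw
            .withColumn("amount", F.col("amount").cast("double"))
            .dropna(subset=["customer_id", "amount"])
            .withColumn("load_date", F.current_date())
        )

        # Load: write Parquet partitioned by load date for efficient downstream reads.
        cleaned.write.mode("overwrite").partitionBy("load_date").parquet(target_path)

    if __name__ == "__main__":
        run_pipeline("/data/raw/orders.csv", "/data/curated/orders")

Partitioning on a date column rather than a raw timestamp keeps the partition count small, which is usually the safer default for downstream queries.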