
Ateet Gupta
Data Engineering, PySpark, Azure, ETL Pipelines, SQL, Python
Work Experience
Manager
Capgemini • Full-time
Jul 2024 - Jan 2025 • 6 mos
• Performed data transformations and aggregations using SQL and Spark to derive actionable insights for business stakeholders.
• Implemented Delta Lake for efficient data storage, enabling ACID transactions and data versioning for improved data governance.
• Analyzed and optimized data processing jobs to improve performance and reduce execution times through effective resource management and query optimization.
• Improved query performance and data processing efficiency using Spark optimizations, reducing processing time from about 5 hours to about 2 hours and increasing throughput.
Senior Manager - Data Science
American Express • Full-time
Jan 2022 - Apr 2024 • 2 yrs 3 mos
• Led a team of data engineers and data analysts in developing and maintaining scalable data pipelines for processing and analyzing large datasets, reducing data processing time by 25%.
• Guided and designed the architecture for PySpark ETL processes that extract data from various sources and load it into the Cornerstone data warehouse with 100% consistency.
• Accelerated the migration of ETL code from Hive to PySpark for better optimization and throughput, reducing processing time by 30%.
• Collaborated with the Finance business team to automate the financial reconciliation process using PySpark, and created an on-demand Power BI dashboard that helps the business team reconcile credit card spending data between the Cornerstone database and the IBM TM1 account book.
Senior Data Scientist
Optum • Full-time
Feb 2015 - Nov 2021 • 6 yrs 9 mos
• Collaborated with the Optum Risk team to design and implement solutions for tracking and improving performance on HEDIS (Healthcare Effectiveness Data and Information Set) measures using PySpark, Azure Databricks, and Azure Data Factory.
• Developed scalable data pipelines using PySpark on Azure Databricks to process and transform large healthcare datasets efficiently.
• Used Azure Data Factory to schedule and automate data pipelines, ensuring seamless data flow from survey responses and healthcare data sources to target systems.
• Used Python and Natural Language Processing (NLP) techniques to analyze UHG member survey data, extracting insights from free-text responses.