
Ateet Gupta

@ateetjss

Data Engineering, PySpark, Azure, ETL Pipeline, SQL, Python

India
English
About me
I am a qualified data engineer with 12 years of experience in data engineering. My work includes designing and implementing ETL (Extract, Transform, Load) processes to move data between systems, migrating on-premises SQL databases to cloud Delta formats, designing robust ETL pipelines with Azure Data Factory, and writing Technical Design Documents that turn requirements into working solutions on cloud infrastructure....

Skills


Check out my services

Data engineering consulting
I will integrate all your data and engineer your data pipelines

Portfolio

Work experience


Manager

Capgemini • Full-time

Jul 2024 - Jan 2025 • 6 mos

• Conducted data transformations and aggregations using SQL and Spark to derive actionable insights for business stakeholders.
• Implemented Delta Lake for efficient data storage, enabling ACID transactions and version control for improved data governance.
• Analyzed and optimized data processing jobs to improve performance and reduce execution times through effective resource management and query optimization.
• Improved query performance and data processing efficiency through Spark optimizations, cutting processing time from about 5 hours to about 2 hours and increasing throughput.


Senior Manager - Data Science

American Express • Full-time

Jan 2022 - Apr 2024 • 2 yrs 3 mos

• Led a team of data engineers and data analysts in developing and maintaining scalable data pipelines for processing and analyzing large datasets, reducing data processing time by 25%.
• Guided and designed the architecture for PySpark ETL processes that extract data from various sources and load it into the Cornerstone data warehouse with 100% consistency.
• Accelerated the migration of ETL code from Hive to PySpark for better optimization and throughput, reducing the time taken by 30%.
• Collaborated with the Finance Business Team to automate the financial reconciliation process using PySpark and created an on-demand Power BI dashboard, helping the business team reconcile credit card spending data between the Cornerstone database and the IBM TM1 account book.

Senior Data Scientist

Optum • Full-time

Feb 2015 - Nov 2021 • 6 yrs 9 mos

• Collaborated with the Optum Risk team to design and implement solutions for tracking and improving performance on various HEDIS (Healthcare Effectiveness Data and Information Set) measures using PySpark, Azure Databricks, and Azure Data Factory.
• Developed scalable data pipelines using PySpark on Azure Databricks, ensuring efficient processing and transformation of large healthcare datasets.
• Used Azure Data Factory to schedule and automate data pipelines, ensuring seamless data flow from survey responses and healthcare data to the target systems.
• Used Python and Natural Language Processing (NLP) techniques to analyze UHG member survey data, extracting insights from free-text responses.