soumya_saxena_

Soumya S

@soumya_saxena_

Lead Data Engineer

India
English, Hindi
About me
As a Senior Data Engineer at NAB, I apply my expertise in cloud computing and data engineering to design, develop, and maintain scalable data pipelines. I optimize complex queries and dashboards using AWS, PySpark, SQL, and Python, ensuring timely and accurate data delivery...

Check out my services

Data warehouse
I will build ETL data pipelines using Python, PySpark, SQL, and AWS

Work experience

Lead Data Engineer (Lancesoft)

McKinsey & Company • Full-time

May 2025 - Present • 1 yr

- Designed and deployed a low-latency real-time ingestion pipeline (AWS API Gateway → Kinesis Firehose → S3 → Snowflake) for multi-source event data (Zoom, Cvent, Salesforce, Mulesoft, Splashthat), enabling a unified Event360 platform tracking 500+ global events annually.
- Engineered data normalization, deduplication, and cleansing workflows, improving accuracy, reducing duplicates by 30%, and ensuring seamless downstream analytics.
- Developed a dynamic event capture mechanism to ingest 100% of incoming event streams, increasing coverage by 25%.
- Resolved data quality issues by analyzing upstream sources, correcting pipeline logic, and implementing automated validation and reconciliation in Snowflake.
- Migrated Snowflake authentication for Spark connectors in Glue from legacy username/password to secure OAuth, enhancing compliance and security.
- Enhanced Clientlink datasets with new business-critical columns, updating Glue scripts, Athena schemas, and Snowflake procedures/tables, improving reporting and decision-making.

Senior Data Engineer

NAB • Full-time

Mar 2023 - May 2025 • 2 yrs 2 mos

- Engineered and maintained scalable real-time and batch data pipelines using AWS Glue, Spark, Airflow, and Kinesis, contributing to customer behavior analytics and personalization initiatives.
- Engineered an end-to-end data pipeline to collect, store, and process real-time service outage data from CloudWatch and IT teams; used AWS Glue for ETL transformation, ran Athena queries for actionable insights, and automated monthly reporting, reducing manual tasks by 50% and enabling faster decision-making.
- Designed distributed data workflows for identity resolution and fraud detection by integrating relational and NoSQL databases, enhancing customer targeting and lifetime value strategies.
- Revamped ETL processes and optimized SQL queries, reducing execution time by 15%.

Data Engineer

Amazon • Full-time

Feb 2020 - Feb 2023 • 3 yrs

- Implemented AWS-based data solutions (Glue, Athena, Redshift, S3), reducing infrastructure costs by 20% using Parquet format and optimized query strategies.
- Optimized Redshift data models by using sort and distribution keys, reducing query execution time by 30% and improving storage efficiency for analytical workloads.
- Developed the Dwell Time Metric by leveraging clickstream data and automated ETL workflows, enabling deeper insights into customer engagement.
- Supported new marketplace launches by integrating and validating large datasets via Python and SQL ETL pipelines, ensuring accurate and timely data delivery.
- Improved system scalability and cost-efficiency by implementing serverless architectures, auto-scaling, and right-sizing AWS resources.