josemanuel_diaz

Jose D

@josemanuel_diaz

Data Engineer · Python · SQL · Spark · AWS · GCP · Airflow · dbt

Spain
Spanish, English
About me
I’m a freelance Data Engineer (ex-Amazon, Slido/Cisco, Santander). I build reliable data pipelines and analytics foundations so dashboards, ML and GenAI run on data you can trust.

WHAT I DO
• ETL/ELT in Python and SQL (batch + streaming)
• Airflow for orchestration, monitoring and SLAs
• dbt for clean models, tests and documentation
• Spark/PySpark for scale
• AWS and GCP (S3/GCS, BigQuery, Dataproc/Dataflow, Glue/Lambda)
• Warehouses: BigQuery, Snowflake, Redshift, PostgreSQL
• Production standards: testing, CI/CD, Docker, Terraform...

Skills


Check out my services

Data warehouse
I will build production-ready dbt models with tests for Snowflake or BigQuery
Data ETL
I will build Apache Airflow DAGs with backfills, retries, and alerting

Portfolio

Work experience

Freelancer.com

Freelance Data Engineer

Freelancer.com • Freelance

Jan 2025 - Present • 1 yr 4 mos

Portfolio: https://jmdu99.github.io/portfolio/

These are some of the projects I’ve delivered:

1) Project: Hybrid Batch and Streaming Pipeline for IoT, Legacy, and PostgreSQL Data Integration with NiFi, Kafka, Spark, Airflow, dbt, and Snowflake
Industry: eHealth
Client Type: Startup

2) Project: Batch and Streaming Pipelines for LMS, SIS, SaaS, and Log Data into BigQuery with Fivetran, Dataproc (Spark), Dataflow (Beam), and Cloud Composer
Industry: EdTech
Client Type: Mid-sized company

Skills: Python · SQL · Apache NiFi · Apache Kafka · Spark Streaming · Apache Airflow · dbt · Amazon S3 · Snowflake (DWH) · Docker · Terraform · Shell Scripting · Apache Spark · Apache Beam · Google Cloud Storage · BigQuery · Fivetran · Pub/Sub · Dataflow

Santander

SQL & Python Developer - Models & Data

Santander • Full-time

Jun 2024 - Nov 2024 • 5 mos

Customer: BANCO SANTANDER, S.A.

Tasks:
- Created and maintained SQL processes by chaining functions written in PL/pgSQL.
- Extracted data from SQL tables with Python, using libraries such as psycopg2 and SQLAlchemy.

Technologies: Bash · Python · SQL · Process automation · PL/pgSQL · SQLAlchemy · PostgreSQL
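
The extraction task above can be illustrated with a minimal sketch of the Python DB-API pattern (connect, cursor, execute, fetchall). This is an assumption-laden example, not the code used at Santander: it swaps in the standard-library sqlite3 module for PostgreSQL so that it runs without a database server, and the `accounts` table, its columns, and the `fetch_accounts` and `demo` functions are hypothetical names chosen for illustration.

```python
import sqlite3


def fetch_accounts(conn, min_balance):
    """Return (id, balance) rows with balance >= min_balance, ordered by id."""
    cur = conn.cursor()
    # sqlite3 uses "?" placeholders; psycopg2 uses "%s" for the same purpose.
    cur.execute(
        "SELECT id, balance FROM accounts WHERE balance >= ? ORDER BY id",
        (min_balance,),
    )
    rows = cur.fetchall()
    cur.close()
    return rows


def demo():
    """Build a throwaway in-memory table (hypothetical data) and query it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)"
        )
        conn.executemany(
            "INSERT INTO accounts (id, balance) VALUES (?, ?)",
            [(1, 50.0), (2, 150.0), (3, 300.0)],
        )
        return fetch_accounts(conn, 100.0)
    finally:
        conn.close()
```

Passing the threshold as a bound parameter rather than formatting it into the SQL string keeps the query safe from injection, and the same habit carries over to psycopg2 and SQLAlchemy.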

Amazon

Business Intelligence Engineer - EU Supply Chain

Amazon • Full-time

Aug 2023 - Nov 2023 • 3 mos

I supported the Supply Chain team in developing stochastic optimization models, with a particular focus on the INSO model (Inbound Network S&OP Plan Optimization).

Tasks:
- Built an automated system that compiles Excel reports and delivers them to stakeholders via email. The system uses AWS services (EC2, S3, Lambda, Glue) and BDT enterprise data analytics products such as Hoot and Datanet.
- Developed a QuickSight dashboard to monitor the inputs and outputs of the INSO model. This involved moving data calculations from Excel to SQL using Common Table Expressions (CTEs) and creating effective visualizations.
- Refactored the INSO code to improve efficiency. This included moving input/output management from local storage to AWS S3 or Redshift, using TOML files for script configuration, and implementing parallel processing with MPire. Additional improvements included docstrings, type hinting, code formatting with Black, and linting with Flake8.
- Studied independently for the AWS Certified Cloud Practitioner certification to deepen my cloud computing expertise.

Technologies: Amazon Web Services (AWS) · Amazon EC2 · Python · AWS Lambda · ETL · Amazon QuickSight · Amazon S3 · Amazon Redshift · Amazon Athena · SQL · Process automation · AWS Identity and Access Management (IAM) · Microsoft Excel · AWS Glue
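
To give a flavor of what moving a spreadsheet calculation into SQL with a CTE can look like, here is a small sketch. It is illustrative only and does not reproduce the INSO dashboard queries: the `shipments` table, its columns, the sample rows, and the `weekly_shares` function are all hypothetical, and the standard-library sqlite3 module stands in for the actual warehouse so the example runs as-is.

```python
import sqlite3

# The CTE computes one total per week (roughly a SUMIF in a spreadsheet);
# the outer query divides each row by its week's total to get a share.
QUERY = """
WITH weekly_totals AS (
    SELECT week, SUM(units) AS total_units
    FROM shipments
    GROUP BY week
)
SELECT s.week, s.node, s.units, 1.0 * s.units / t.total_units AS share
FROM shipments AS s
JOIN weekly_totals AS t ON t.week = s.week
ORDER BY s.week, s.node
"""


def weekly_shares():
    """Load hypothetical sample rows and return each node's share per week."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shipments (week INTEGER, node TEXT, units INTEGER)"
        )
        conn.executemany(
            "INSERT INTO shipments (week, node, units) VALUES (?, ?, ?)",
            [(1, "A", 25), (1, "B", 75), (2, "A", 50), (2, "B", 50)],
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Naming the intermediate result (`weekly_totals`) keeps the calculation readable in the way a helper column does in a spreadsheet, while letting the database do the aggregation.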