Kalyani Vemasani

Baltimore

Summary

Data Engineer with 3 years of experience building and optimizing large-scale data pipelines and models for analytics. Skilled in Python, SQL, PySpark, and AWS (S3, EMR, Glue, Step Functions), as well as Hive and Snowflake for ETL/ELT and data warehousing. Improved pipeline performance by up to 35% through automation and workflow orchestration, and ensured data quality and reliability in production. Experienced in collaborating with product managers and analysts to translate business needs into data-driven insights.

Overview

5 years of professional experience

Work History

Data Engineer

Nuix
06.2025 - Current
  • Designed and deployed a scalable AWS-based data pipeline integrating S3, Snowflake, and web APIs, processing over 2 TB of data daily using PySpark, SQL transformations, and EMR.
  • Developed 7+ PySpark ETL pipelines for extraction, transformation, and loading into Snowflake and S3, incorporating Python scripts for data validation.
  • Designed a master PySpark job to consolidate intermediate datasets into curated outputs in Parquet format.
  • Automated orchestration and error handling with AWS Step Functions for cluster provisioning, execution and recovery.
  • Tuned performance using EMR Auto Scaling and optimized transformations, reducing runtime by ~25%.
  • Monitored pipeline health via CloudWatch and worked with product managers and analysts to gather requirements and deliver data reliably.

Big Data Production Support

Accenture Solutions
10.2021 - 10.2023
  • Supported production data warehouse workflows in Hive, Snowflake, and SQL databases, ensuring SLA compliance.
  • Troubleshot incidents using PySpark, Python scripts, and HiveQL, reducing resolution time by ~30%.
  • Tuned PySpark jobs on AWS EMR and automated monitoring checks using Airflow to streamline ETL/ELT processes.
  • Used AWS S3 and Glue to store large datasets, explore metadata, and debug ETL issues.
  • Designed ad-hoc Hive reports for business users and monitored systems with Splunk & AppDynamics.

Associate System Engineer

TATA Consultancy Services
03.2021 - 07.2021
  • Developed PySpark ETL jobs for data cleansing, transformations, and aggregations on AWS S3 and HDFS.
  • Automated ingestion from FTP to Hive tables using Sqoop, improving efficiency by ~35%.
  • Built batch processing pipelines using NiFi, Kafka, and Sqoop to ingest structured and semi-structured data into Hadoop.
  • Managed Hive ETL workflows for validation, data quality checks, and reporting support.
  • Monitored and troubleshot applications using Splunk dashboards and alerts.

Education

Master of Science - Information Systems

University of Maryland, Baltimore County
12.2025

Skills

  • Java
  • SQL
  • Python
  • AWS
  • Snowflake
  • Hadoop
  • IntelliJ
  • Jupyter Notebook
  • PySpark
  • Airflow
  • Kafka
  • NiFi
  • ETL/ELT
  • Git
  • GitHub
  • JIRA
  • Agile/Scrum
  • Windows
  • Linux
  • Splunk
  • Grafana
  • AppDynamics
  • Kibana
  • Dynatrace

Tutoring Experience

Math Tutor

Baltimore County Public Schools
09.2024 - 12.2025

  • Coached students in mathematics, focusing on foundational skills, problem-solving, and confidence-building to improve performance in core subjects and standardized assessments.
  • Collaborated with educators to align coaching strategies with curriculum goals and state standards.
  • Tracked student progress using data-driven methods and adapted teaching techniques to meet diverse learning needs.
  • Contributed to school-wide initiatives promoting academic growth and equity in math education.
