Senior PySpark Data Engineer - Assistant Vice President
Citigroup
Job Description
About the Role
We are seeking a highly skilled and experienced Senior PySpark Data Engineer to join our dynamic data engineering team. The ideal candidate will have a strong background in building and managing large-scale data processing systems and a proven track record of working with cutting-edge Big Data technologies. You will be responsible for designing, developing, and maintaining our data pipelines, ensuring they are efficient, reliable, and scalable to meet our growing business needs.
Key Responsibilities
- Design, develop, and maintain robust, scalable, and high-performance data pipelines using PySpark.
- Develop, schedule, and monitor complex data workflows using orchestration tools like Apache Airflow.
- Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver high-quality data solutions.
- Optimize and tune Spark jobs for performance and efficiency.
- Implement data quality checks and ensure data integrity across all data pipelines.
- Design and implement data models for optimal storage and retrieval.
- Mentor junior data engineers and promote best practices in data engineering.
- Ensure compliance with data governance and security policies.
- Troubleshoot and resolve data-related issues in a timely manner.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- 6+ years of professional experience in a data engineering role.
- Extensive hands-on experience with PySpark and advanced Python programming skills.
- Proven experience with Big Data ecosystems, including Cloudera and/or Databricks.
- Hands-on experience with distributed query engines like Starburst (Trino/Presto).
- Proficiency in designing and managing complex workflows using scheduling tools, particularly Apache Airflow.
- Strong expertise in SQL and experience with relational and non-relational databases.
- Solid understanding of data warehousing concepts, ETL/ELT processes, and data modeling techniques.
- Experience working in a Linux/Unix environment.
- Experience with GitHub and CI/CD pipelines.
------------------------------------------------------
Job Family Group:
Technology
------------------------------------------------------
Job Family:
Applications Development
------------------------------------------------------
Time Type:
Full time
------------------------------------------------------
Most Relevant Skills
Please see the requirements listed above.
------------------------------------------------------
Other Relevant Skills
For complementary skills, please see above and/or contact the recruiter.
------------------------------------------------------
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.