Large Scale Distributed Data Systems Specialist

4 days ago


Singapore beBeeDataEngineer Full time $180,000 - $240,000

As a seasoned data engineering professional, you will play a key role in the design and implementation of large-scale distributed systems.

Job Description

The ideal candidate will possess a strong background in data engineering or big data development, with hands-on experience in Spark (Core, SQL, Streaming). They should also have a solid understanding of Hadoop architecture and be proficient in Hive data modeling, query optimization, and performance tuning.

  • A strong track record of designing and implementing ETL/ELT pipelines using Spark for batch and streaming data processing.
  • Proven expertise in managing and optimizing Hadoop clusters (HDFS, YARN) for scalability and reliability.
  • Experience with building and maintaining Hive data models, partitions, and queries for analytics and reporting.
  • Expertise in improving query and pipeline performance through tuning, partitioning, bucketing, and caching.
  • Ability to ensure data quality, governance, and security across the big data ecosystem.
  • Familiarity with cloud big data platforms (AWS EMR, Azure HDInsight, or GCP Dataproc) is highly desirable.

Key Responsibilities
  1. Data Engineering: Design and implement ETL/ELT pipelines using Spark for batch and streaming data processing.
  2. Hadoop Cluster Management: Manage and optimize Hadoop clusters (HDFS, YARN) for scalability and reliability.
  3. Hive Data Modeling: Build and maintain Hive data models, partitions, and queries for analytics and reporting.
  4. Performance Optimization: Improve query and pipeline performance through tuning, partitioning, bucketing, and caching.
  5. Data Governance: Ensure data quality, governance, and security across the big data ecosystem.
  6. Collaboration: Work with data scientists, analysts, and architects to support advanced analytics and BI.


  • Singapore beBeeReliability Full time $125,000 - $175,000

    A senior site reliability engineer is needed to ensure the smooth operation of large-scale distributed systems. Design, deploy, and manage CI/CD pipelines to deliver software consistently. Administer, scale, and optimize Kubernetes deployments for high availability and fault tolerance. Architect and maintain microservices infrastructure for seamless...


  • Singapore beBeeDistributedTraining Full time $125,000 - $175,000

    Distributed Training & Inference Optimization Specialist. We are looking for a skilled specialist to maximize the performance and efficiency of large-scale training and inference workloads on our GPU clusters. Key Responsibilities: Optimize LLM training frameworks: Maximize GPU utilization and reduce training time using PyTorch, DeepSpeed, Megatron-LM, and...


  • Singapore RISKDATA CONSULTING PTE. LTD. Full time

    We are hiring a Data Engineering Technologist - Large-Scale Distributed Systems with the requirements below. Responsibilities: Design and maintain ETL/ELT pipelines using Spark for batch and streaming data. Manage and optimize Hadoop clusters (HDFS, YARN) for scalability and reliability. Build and maintain Hive data models, partitions, and queries for...


  • Singapore RISKDATA CONSULTING PTE. LTD. Full time $104,000 - $130,878 per year

    We are hiring a Data Engineering Technologist – Large-Scale Distributed Systems with the requirements below. Responsibilities: Design and maintain ETL/ELT pipelines using Spark for batch and streaming data. Manage and optimize Hadoop clusters (HDFS, YARN) for scalability and reliability. Build and maintain Hive data models, partitions, and queries for analytics and...


  • Singapore beBeeData Full time $120,000 - $180,000

    We are seeking a skilled Big Data Engineer to join our E-commerce Recommendation Infrastructure team. Job Description: This role involves designing and implementing large-scale recommendation systems, ensuring high-performance storage and computing systems, and troubleshooting production issues. You will work closely with applied machine learning engineers to...


  • Singapore beBeeRecommendation Full time $80,000 - $120,000

    About the Role: We are seeking a seasoned Data Architect to lead the development of our recommendation system. This is an exceptional opportunity to leverage your expertise and passion for building scalable, stable, and high-performance recommendation systems. Key Responsibilities: Design and implement offline data architectures for large-scale recommendation...


  • Singapore beBeeDataInfrastructure Full time $80,000 - $120,000

    System Architect for Large-Scale Recommendation Systems. Craft a robust data infrastructure to power offline recommendation systems serving over a billion users. Design an efficient architecture for real-time and offline data processing. Implement scalable storage solutions and computation models. Key Responsibilities: Develop large-scale distributed...


  • Singapore beBeeDataEngineering Full time

    Large Scale Data Engineering Expert. We are seeking an experienced Large Scale Data Engineering Expert to join our team, where you will play a vital role in designing and maintaining large-scale distributed systems. The ideal candidate will have hands-on experience with Spark (Core, SQL, Streaming), a good understanding of Hadoop...