Data Infrastructure Architect

6 days ago


Singapore beBeeDataInfrastructure Full time

Job Title: Data Infrastructure Architect

About the Role

We are seeking an experienced Data Infrastructure Architect to build and maintain scalable data infrastructure and to convert research into production-ready solutions for synthetic tabular data generation.

You will architect and operate large-scale data curation, scraping, and cleaning pipelines that deliver massive datasets for pretraining and fine-tuning large language models on tabular and unstructured data.

This is an individual contributor role suited for someone who thrives in a fast-paced environment. The ideal candidate has experience scaling data and machine learning systems to handle billions of records and can build complex data pipelines for enterprise applications.

You'll work closely with software, machine learning, and applied research teams to optimize performance and ensure seamless integration of systems.

Key Responsibilities
  1. Data Infrastructure Development:
  • Design and build data ingestion pipelines from enterprise relational databases (e.g. Oracle, SQL Server, PostgreSQL, MySQL) and files (e.g. Parquet, CSV) for large-scale synthetic data pipelines.
  • Architect and maintain data warehouses and data lakes (e.g. Delta Lake) optimized for synthetic data training and generation workflows.
  • Transform Pandas-based research code into scalable, production-ready pipelines.
  • Build automated data quality monitoring and validation systems to ensure data integrity throughout the pipeline lifecycle.
  • Implement comprehensive data lineage tracking and audit capabilities for regulatory compliance and privacy validation.
  • Design robust error-handling mechanisms with automatic retries and data recovery for pipeline failures.
  • Track performance metrics such as data throughput, latency, and processing times to ensure efficient pipeline operations at scale.
  • Implement monitoring and alerting (e.g. Prometheus, Grafana) for pipeline health, throughput, and data quality metrics.
  2. Massive-Scale Data Collection & Ingestion:
  • Design and build distributed web scraping clusters to extract data from millions of pages.
  • Build LLM-aided data filtering systems that use automated model scoring to evaluate and prioritize high-quality content.
  3. Understanding of ML Concepts:
  • Working understanding of machine learning concepts, training workflows, and algorithms, and familiarity with tools such as PyTorch and Hugging Face.
Requirements
  • Bachelor's degree in Computer Science, Software Engineering, Data Engineering, or a related field, with a strong foundation in distributed systems and data processing.
  • Expertise in scaling data pipelines and machine learning systems to handle billions of rows in enterprise environments.
  • 3+ years of experience building scalable data solutions with Python and libraries and frameworks such as Pandas, NumPy, Scikit-learn, PyTorch, Spark, Dask, Airflow, and Dagster.
  • Expertise in automated data quality frameworks, including rule-based and AI-based automation for format validation, anomaly detection, and statistical validation.
  • Proficiency in building ETL/ELT pipelines and managing data across relational databases, data lakes, and cloud storage.
  • Experience in building data monitoring and alerting systems.
  • Hands-on experience with web scraping tools (Scrapy, Selenium, Puppeteer).
  • Experience building ML data pipelines and supporting infrastructure for training and deploying machine learning models at scale.
Good to Have
  • Experience with data governance frameworks and compliance requirements (GDPR, CCPA, PDPA) in data processing systems.
  • Experience with containerization and orchestration using Docker, Kubernetes, and cloud-native deployment strategies.
About Us

This is a unique opportunity for someone looking to actively build and scale systems in a fast-moving start-up. If you've successfully scaled machine learning and data systems to billions of rows and thrive in a dynamic environment, this role is for you.

Benefits
  • Flexible time-off arrangements.
  • Flexible work arrangements: work from the office at One North or from home on some days.

