Site Reliability Engineer at ScitiX

Position: Site Reliability Engineer
Posted: 20 Jun 2026
Expired: 20 Jul 2026
Company: ScitiX
Location: Singapore, SG
Job Type: Full Time

Job Description:

About the Company

We build the city where AI lives. ScitiX is building the digital foundation for AI to run reliably over time and scale into repeatable delivery. As models keep improving, real-world impact is still held back by fragmented infrastructure—training, fine-tuning, and inference spread across disconnected tools and cloud services, with compute, data, orchestration, billing, access control, and compliance out of sync.

ScitiX brings these pieces together with a cloud-native platform for unified management and intelligent scheduling of heterogeneous compute. We pool general-purpose compute, AI accelerators, and HPC across public, private, and hybrid environments to deliver efficient cross-platform scheduling and stable capacity. Centered on “Your data in. Your AI service out.”, we provide an end-to-end path that connects the full AI lifecycle within one system.

With ScitiX, teams move from experiment to production faster, run services more steadily, use resources more efficiently, and manage costs with clearer boundaries — AI doesn’t just run, it keeps running at scale, consistently.

About the Role

Responsible for Kubernetes deployment, day-to-day operations and maintenance, and troubleshooting for each training cluster.

Responsibilities

  • Design and develop monitoring and automation features for the cluster management platform, and continuously improve cluster management and control capabilities.
  • Assist in analyzing and troubleshooting issues related to cluster containers, operating systems, networks, storage, etc.
  • Manage cluster quotas for each business, analyze utilization rates, and plan capacity accordingly.
  • Participate in operations and maintenance duty, handle faults promptly, and respond to user issues and requests.

Qualifications

  • Bachelor's degree or above in computer science or a related field.
  • 3+ years of industry experience, including solid skills in operating, maintaining, and debugging Linux platforms, with proficiency in troubleshooting, configuration optimization, and performance analysis.
  • Proficient in at least one programming language such as Python, Go, or Shell.
  • Familiar with the Kubernetes architecture and the functional characteristics of each component, with extensive hands-on experience deploying and optimizing Kubernetes CNI, CSI, LB, etc.
  • Experience in large-scale training cluster construction and optimization is preferred.

Preferred Skills

  • Good communication and coordination skills.
  • Demonstrated independent thinking capabilities and troubleshooting skills.
  • Mandarin skills preferred for coordinating with our international partners.
