Manager, Customer Reliability Engineering


High-productivity application development platform for real-time, event-driven collaborative systems

The Manager of Customer Reliability Engineering (CRE Manager) is responsible for ensuring that customers, partners, and Customer Success teams are trained and ready to deploy, maintain, and ensure the availability of VANTIQ systems. These systems may run in the public VANTIQ Cloud or in private or semi-private clusters managed by customers or co-managed by VANTIQ and customers.

Protecting, provisioning, and delivering stable operation of VANTIQ systems is the lifeblood of the company.

The CRE Manager reports to the Customer Success organization and works closely with the Site Reliability Engineers (SREs) in Engineering as well as with Customer Success teams.

The successful candidate has an intimate knowledge of what it takes to deploy and support highly available, scalable, cloud-based solutions in a zero-downtime model as well as experience working directly with customers.

This position is based in the Bay Area in California only.

Critical for success are:

  • Your expertise in PaaS/SaaS/Cloud Operations and in Infrastructure and Operations
  • Your love for customer success, consulting and building solutions for customers
  • Your entrepreneurial spirit and your independent, creative thinking
  • Your ability to communicate at the executive level
  • Your experience with Project Management and the project life cycle
  • Your ability to get things done

In service of keeping VANTIQ’s revenue-critical systems up and running, the CRE Manager will focus on:

  • Teaching customers how to deploy, maintain, and ensure the availability of their clusters
  • Designing operational processes and defining roles and responsibilities between customers and VANTIQ teams, depending on the type of deployment
  • Coordinating and communicating release activities to the SEs and the Customer Success staff (VANTIQ cloud or other options) so that all stakeholders are prepared for new releases and maintenance
  • Cultivating awareness among Customer Success teams worldwide so that customers and prospects are ready for each new release or maintenance window

We’re always on call to keep our services up and running, ensuring that the users developing and deploying VANTIQ applications have the best experience possible.

Additional Responsibilities

  • Act as an escalation point for Customer Success teams for new releases, maintenance and outages.
  • Recommend and continually optimize deployment practices and methodologies by working with customers, Customer Success and SREs.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health (24x7).
  • Ensure that our hosted services are compliant with our security audit obligations by patching and maintaining the infrastructure to the required level and within agreed SLAs (Corporate Security Policy, GDPR, etc.).
  • Practice sustainable incident response, and create customer communication plans as well as Remediation Plans and Root Cause Analyses.
  • Document and refine internal policies and procedures.

Required Qualifications:



  • A passion for true Customer Success
  • BS degree in Computer Science or related technical field involving systems engineering (e.g., physics or mathematics), or equivalent practical experience.
  • Project management, consulting, or support experience required.
  • Experience managing products 24x7 that are deployed to a large-scale, cloud-based infrastructure, whether private (OpenStack) or public (AWS, Azure, Alibaba Cloud, GCP).
  • Creative thinking and a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Enthusiasm and drive to learn technologies; to build, maintain, and enhance tools; and to deploy them in production.


Desired Qualifications:

  • Experience in one or more of the following: C, C++, Java, Python, Go, Perl, Ruby or shell scripting.
  • Detailed troubleshooting skills to investigate system performance bottlenecks and bugs.
  • Good working knowledge of security principles.
  • Experience with RESTful services and service-oriented architecture / microservices.
  • Expertise in designing, analyzing and troubleshooting large-scale distributed systems. Proficiency in network, distributed, asynchronous, and concurrent programming.
  • Experience working in a fast-growing, early-stage startup with worldwide operations.



VANTIQ’s goal is to provide the best technology platform for enterprises to digitize their business while keeping humans in charge.

The VANTIQ application platform-as-a-service enables users to develop, deploy and run real-time enterprise applications driven by data streams from IoT, connected products, social media, enterprise systems of record, and people. All applications created with VANTIQ are event-driven, allowing businesses to respond in real time to any business event.


To apply for this job email your details to

