High-productivity application development platform for real-time, event-driven collaborative systems
The Manager of Customer Reliability Engineering (CRE Manager) is responsible for ensuring that customers, partners, and Customer Success teams are trained and ready to deploy, maintain, and ensure the availability of VANTIQ systems. These systems may run in the public VANTIQ Cloud or in private or semi-private clusters managed by customers or co-managed by VANTIQ and customers.
Protecting, provisioning, and delivering stable operation of VANTIQ systems is the lifeblood of the company.
The CRE Manager reports to the Customer Success organization and works closely with the Site Reliability Engineers (SREs) in Engineering as well as with Customer Success teams.
The successful candidate has an intimate knowledge of what it takes to deploy and support highly available, scalable, cloud-based solutions in a zero-downtime model as well as experience working directly with customers.
This position is based in the Bay Area in California only.
Critical for success are:
Your expertise in PaaS/SaaS/cloud operations and infrastructure
Your love for customer success, consulting and building solutions for customers
Your entrepreneurial spirit and your independent, creative thinking
Your ability to communicate at the executive level
Your experience with project management and the project life cycle
Your ability to get things done
In service of keeping VANTIQ’s revenue-critical systems up and running, the CRE Manager will focus on the following:
Teach customers how to deploy, maintain, and ensure the availability of their clusters
Design operational processes and define roles and responsibilities between customers and VANTIQ teams depending on the type of deployment
Coordinate and communicate release activities for the SEs and the Customer Success staff (VANTIQ Cloud or other deployment options) so that all stakeholders are prepared for new releases and maintenance.
Cultivate awareness among Customer Success teams worldwide to ensure that customers and prospects are ready for each new release and each maintenance activity.
Stay on call to keep our services up and running, ensuring that the users developing and deploying VANTIQ applications have the best experience possible.
Act as an escalation point for Customer Success teams for new releases, maintenance and outages.
Recommend and continually optimize deployment practices and methodologies by working with customers, Customer Success, and SREs.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health (24x7).
Ensure that our hosted services comply with our security audit obligations (Corporate Security Policy, GDPR, etc.) by patching and maintaining the infrastructure to the required level and within agreed SLAs.
Practice sustainable incident response and create customer communication plans, remediation plans, and root cause analyses.
Document and refine internal policies and procedures.
A passion for true Customer Success
BS degree in Computer Science or related technical field involving systems engineering (e.g., physics or mathematics), or equivalent practical experience.
Project management, consulting, or support experience required.
Experience managing, on a 24x7 basis, products deployed to large-scale, cloud-based infrastructure: private (OpenStack) and/or public (AWS, Azure, Alibaba Cloud, GCP).
Creative thinking and a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Enthusiasm and drive to learn technologies; to build, maintain, and enhance tools; and to deploy them in production.
Experience in one or more of the following: C, C++, Java, Python, Go, Perl, Ruby or shell scripting.
Detailed troubleshooting skills to investigate system performance bottlenecks and bugs
Good working knowledge of security principles
Experience with RESTful services and service-oriented architecture/microservices
Expertise in designing, analyzing and troubleshooting large-scale distributed systems. Proficiency in network, distributed, asynchronous, and concurrent programming.
Experience working in a fast-growing, early-stage start-up with worldwide operations.
VANTIQ’s goal is to provide the best technology platform for enterprises to digitize their business while keeping humans in charge.
The VANTIQ application platform-as-a-service enables users to develop, deploy, and run real-time enterprise applications driven by data streams from IoT, connected products, social, enterprise systems of record, and people. All applications created with VANTIQ are event-driven, allowing businesses to respond in real time to any business event.
VANTIQ is a platform that you or your software development partners can use to quickly develop applications that transform the way you operate your business. New technologies such as AI, IoT, and blockchain are easily brought together with your legacy systems by VANTIQ. VANTIQ applications allow you to analyze data flowing in and around your operations and enable your machines or people to take appropriate actions in real time – when the actions have the biggest impact.
Applications built with VANTIQ include facility security (using AI and facial recognition technology); field service management (using real-time location tracking); logistics and supply chain management (using IoT sensors and GPS); and many more.
Such applications are highly customized to your specific needs and are built with VANTIQ in days or weeks – not the months or years it would normally take. VANTIQ makes your organization’s digital transformation quicker and easier than you ever thought possible.