Staff Production Operations Engineer

August 21

🏢 In-office - Toronto

Index Exchange

The most efficient ad marketplace for media owners to monetize content and marketers to deliver relevant ad experiences

501 - 1000

Description

• Lead efforts to ensure our systems and networks operate seamlessly.
• Oversee internal metrics, execute effective incident responses, and contribute to system optimizations.
• Own the monitoring and maintenance of the health, security, and performance of on-premises and hybrid-cloud infrastructure.
• Execute timely and effective incident responses, minimizing downtime and ensuring swift resolution.
• Build and update disaster recovery plans and security protocols and drive the maintenance of system backups.
• Respond to alerts within our established SLOs and assist in incident triage, ensuring that the right teams are engaged to address issues promptly.
• Influence the team’s direction and foster accountability, trust, and focus on goals.
• Act as a primary contact for operational issues, providing technical support and ensuring issues are resolved efficiently.
• Collaborate with product and software engineering teams to provide operational insights and relay requirements.
• Foster a collaborative environment by bringing people together to come up with better designs and approaches to complex problems.
• Identify opportunities for system optimization and performance improvements. Engage technical leads and management and drive the change.
• Research and implement advancements in technology and industry best practices.
• Develop and maintain automation frameworks to streamline processes and reduce manual tasks.
• Own risk identification and mitigation to proactively address present or anticipated operational challenges.
• Identify and implement catalysts for future optimization, including provisioning techniques, deployment optimization, ancillary services, pipelines, Ansible playbooks, power usage, bandwidth, etc.
• Create and ensure maintenance of comprehensive documentation for system configurations, processes, and incident resolution procedures.
• Participate in knowledge sharing and provide cross-training to other departments.
• Maintain runbooks and technical documentation, ensuring familiarity with internal and external escalation pathways.

Requirements

• In-depth understanding of the Linux operating environment: kernel tuning, network stack tuning, system observability & instrumentation, and security & access management.
• Solid understanding of layer 2-7 networking fundamentals and the relationship between servers & services, and the transit of their packets through network hardware.
• In-depth experience engineering and maintaining a private-cloud infrastructure: bare-metal, vSphere, KVM, Kubernetes.
• Experience with tools like Ansible, Terraform, Docker, Kafka, Nexus.
• Experience with observability platforms: Prometheus, ELK, Jaeger, Grafana, Nagios, Zabbix.
• Familiarity with Big Data tools: Hadoop, HDFS, Spark, HBase.
• Ability to write code in Go, Python, Bash, or Perl for automation.
• 7+ years of proven experience in one of the following roles: DevOps, Linux System Administration, Site Reliability Engineering.
• Built or maintained a private-cloud infrastructure running CentOS/Rocky Linux on a mix of bare-metal, virtualization, and containerization.
• Managed public cloud environments such as AWS, GCP, Azure and their federation into on-premises environments.
• Life-cycle management of bare-metal servers such as Dell and Supermicro in globally distributed data centers (e.g. break-fix, baseband/firmware updates).
• Built or maintained on-premises and cloud Kubernetes clusters: kubeadm, EKS, GKE.
• Built or operated automation & orchestration frameworks for deployment & maintenance pipelines (e.g. Kafka, StackStorm, Ansible, Argo CD, Terraform) to push out code or configuration updates and build new infrastructure systems.

Benefits

• Comprehensive health, dental, and vision plans at no cost to you
• Time off and flexible work schedules
• Retirement plan with a 5% company match
• Stock options and equity packages
• Generous parental leave
• Monthly wellness stipend plus fitness discounts and quarterly wellness group activities
• Community engagement opportunities and donation-matching program
• Annual virtual company retreats and regular community-led team events
• One day off per year to volunteer
• A workplace that supports a diverse, equitable, and inclusive environment
