Anywhere
Posted 2 years ago

The Role:
You will join a team that works with cutting-edge technologies and aims to make full use of cloud services. Your role will be to monitor critical applications and services to ensure their availability and minimize downtime. You will also ensure that the underlying infrastructure runs smoothly and that systems and tools work as expected, help developers with troubleshooting, and advise them when alerts fire.

The main responsibilities of the position include:

Monitor critical application metrics and create alerts
Build or use software to help the DevOps, development and support teams
Fix support escalation issues
Maintain documentation and runbooks
Conduct post-incident reviews

Main requirements:

5+ years of experience in an SRE, DevOps or similar role
Experience in incident, problem and change management practices
Extensive working experience with CI/CD procedures and tools (e.g. GitLab CI)
Strong experience using containers and Kubernetes
Experience with Infrastructure as Code (Terraform, CloudFormation)
Ability to work as part of a distributed team
Experience with monitoring tools (Prometheus, Grafana, New Relic)

The following will be considered an advantage:

Familiarity with database concepts
Working experience with at least one cloud provider (preferably AWS)
Experience with Apache Kafka configuration and troubleshooting
Experience with ELK configuration and troubleshooting
Scripting skills (Bash, PowerShell, Python, Go, etc.)

Benefit from:

Attractive remuneration package
Intellectually stimulating work environment
Continuous personal development and international training opportunities
Attractive relocation package and support for a smooth relocation for you and your family

All applications will be treated with strict confidentiality!

To apply for this job please visit euremotejobs.com.

Site Reliability Engineer
