Senior Site Reliability Engineer (Database) at Wikimedia Foundation
The Wikimedia Foundation is the non-profit organization that hosts and operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge, free of interference. We host the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive.
We are recruiting to fill the position below:
Job Title: Senior Site Reliability Engineer (Database)
- We are seeking a Senior Site Reliability Engineer (Database). Our objective is to make the sum of all human knowledge available to everyone, and we persist most of this knowledge in MariaDB.
- Our project sites are some of the most highly trafficked on the internet, with more page views per engineer than any other site.
- As a Senior Site Reliability Engineer for databases at the Wikimedia Foundation, you will be part of a small, focused team of skilled and experienced engineers.
- In this role, you will be responsible for ensuring the health of our database systems - including their availability and performance.
- Your responsibilities will include troubleshooting issues, planning for disaster recovery, and enhancing and maintaining backups. You do not have to be a database expert, but you must be willing to train to become one.
- The work we do is crucial and is used by hundreds of millions of people. This is a unique opportunity to have a huge impact.
- Implementation, maintenance and troubleshooting of relational database systems in production and staging environments
- Database performance tuning, high availability, replication, backups, and general optimization.
- Supporting the development and deployment of new services and systems.
- Handling configuration management, (Debian) package maintenance, patching and building, working with upstream on bug identification and resolution.
- Improving observability (alerting, metrics, monitoring) of database infrastructure
- Multi-datacenter design, capacity and infrastructure planning
- Taking part in incident response, diagnosis and follow-up on system outages or alerts across Wikimedia’s production infrastructure, and participating in an on-call rotation
- Sharing our values and working in accordance with them.
- B.Sc or M.Sc in Computer Science or equivalent work experience.
- 5+ years of experience in a DBA/SRE/Operations/DevOps role as part of a team
- Experience with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.), as well as modern observability infrastructure (Prometheus, Grafana, Graphite, Logstash/Kibana, Icinga/Nagios, etc.).
- Advanced knowledge of Linux and IO/data storage concepts, internals and troubleshooting.
- Experience remotely managing both bare-metal servers and virtualized environments.
- Proficiency in automation, programming, and scripting.
- Experience with high traffic and highly available website architectures and operations.
- Strong English language skills.
- Ability to work independently in a fast-paced environment and as an effective part of a globally distributed team, using ticket tracking systems and asynchronous communication tools.
- Advanced level of experience with MariaDB or MySQL database administration and replication topologies at scale
- Proficiency in SQL
- Experience in architecture, design, and implementation of persistent data storage & query infrastructure
- Strong track record of open source contributions is a major plus
- Solid knowledge of relational database concepts and working experience with storage systems and architectures
- Experience with LAMP stack technologies (PHP/HHVM, memcached/Redis); MediaWiki experience is a definite plus
- Experience with advanced distributed storage and database systems (Swift, Ceph, Cassandra, etc.) or graph databases (Titan, Blazegraph, etc.) is a big plus.
How to Apply
Interested and qualified candidates should:
Click here to apply