Job Description
We partner with the most important institutions in the world to transform how they use data and technology. Our software has been used to stop terrorist attacks, discover new medicines, gain an edge in global financial markets, and more. If these types of projects excite you, we'd love for you to join us.
The Role: You'll be a key part of a cross-functional team that helps critical institutions solve their most pressing problems. Together with a team of analysts, developers, technical project managers, and systems experts, you will be directly responsible for keeping a Palantir system running smoothly and securely. You'll serve as the team's expert and owner for all things systems administration. You are the first line of defense against a variety of threats to the uptime of your servers.
On calm days, you'll proactively keep things safe and secure by implementing industry best practices, security updates, and Palantir-developed systems automation. When the unexpected occurs, you'll follow the trail of monitoring alerts and log messages to the source of the trouble, triaging outages and working with your team to understand what went wrong and how to fix it for good.
Core Responsibilities
- Administer enterprise Linux servers, including operating system patching, security hardening, monitoring, and troubleshooting.
- Administer AWS cloud accounts with Terraform, and troubleshoot and debug them via the AWS Console and CLI.
- Handle the operations of data storage and indexing systems, including monitoring, backup management, and upgrades.
- Configure and maintain web servers, including monitoring and configuration management.
- Work with customer IT teams to coordinate changes and troubleshoot intersystem problems.
Technologies We Use
- Amazon Web Services and on-premises servers
- CentOS and Red Hat Enterprise Linux
- Prometheus and Grafana
- Oracle, Postgres, Cassandra, and Elasticsearch
- NGINX and Envoy HTTP servers
- Puppet, Ansible, Python, and shell scripting
What We Value
- Experience with Linux system administration
- Ability to troubleshoot server hardware failures.
- Understanding of Amazon Web Services
- Applied knowledge in operating system security and hardening.
- Ability to automate repetitive tasks using Ansible, Python, or a similar tool or language.
- Experience in patch and configuration management in enterprise production environments
- Exposure to the operation and configuration of database and web server technologies
- Unwavering commitment to operational security and best practices