Senior Observability Engineer
Company: TriNet
The role of a Senior Observability Engineer is to design, implement, and maintain comprehensive observability solutions for complex systems and applications. This position requires a deep understanding of monitoring and observability practices, as well as expertise in using various tools and technologies to collect and analyze performance, logging, and metrics data.
Essential Duties/Responsibilities
- Monitoring Setup and Configuration: Set up and configure the monitoring tools to collect data from various systems, applications, and network components. This involves defining monitoring metrics, configuring data collection agents, and ensuring proper connectivity and access.
- Alert Management: Monitor alerts generated by the tools and perform triage to identify critical issues. Analyze alert patterns, fine-tune alert thresholds, and configure alert escalation workflows to ensure timely response and resolution.
- Performance Analysis and Troubleshooting: Utilize the tools’ features and functionalities to analyze performance metrics, logs, and traces. Conduct investigations and root cause analysis to troubleshoot and resolve performance issues, identifying bottlenecks and areas for optimization.
- Incident Response: Collaborate with cross-functional teams to respond to and resolve incidents in a timely manner. Engage in incident management processes, including incident triage, communication, and coordination with relevant stakeholders, and participate in post-incident reviews to identify areas for improvement.
- Dashboard and Visualization: Create and maintain dashboards and visualizations using tools like Grafana, providing a consolidated view of system health, performance, and key metrics. Customize dashboards to meet specific business and operational requirements and share them with relevant teams and stakeholders.
- Capacity Planning and Scalability: Monitor resource utilization and performance trends to forecast capacity requirements. Collaborate with capacity planning teams to plan and provision resources based on anticipated growth and workload patterns, ensuring scalability and optimal performance.
- Tool Administration and Maintenance: Perform routine administration tasks for the observability tools, such as user management, access control, and system upgrades or patching. Monitor the health and availability of the tools themselves, ensuring their reliability and functionality.
- Documentation and Knowledge Sharing: Document monitoring configurations, troubleshooting procedures, and best practices for future reference. Contribute to internal knowledge bases and collaborate with the team to share insights and lessons learned.
- Tool Integration and Automation: Integrate observability tools with other systems and workflows, such as ticketing systems, incident management platforms, and automation frameworks. Automate monitoring configurations, data collection, and reporting processes to improve efficiency and reduce manual effort.
- Continuous Improvement and Research: Stay updated with the latest developments in observability practices and technologies. Research and evaluate new tools and techniques that could enhance the monitoring and observability capabilities of the organization. Continuously improve existing monitoring setups, workflows, and processes to align with industry best practices.
- Performs other duties as assigned
- Complies with all policies and standards
QUALIFICATIONS
Education
- Bachelor’s Degree in computer science or other highly technical, scientific subject area preferred
Work Experience
- Typically 5+ years of experience with systems engineering and/or information technology
Knowledge, Skills and Abilities
- Demonstrated knowledge of and experience administering application and cloud infrastructure monitoring
- Hands-on experience with Prometheus and Grafana
- Hands-on experience with Elasticsearch (AWS OpenSearch) and Oracle Logging Analytics, or similar tools such as Datadog, Splunk, or Sumo Logic
- Hands-on experience with the APM tool AppDynamics, or similar tools such as Dynatrace or New Relic
- Scripting Language experience (Python preferred)
- Strong understanding of web services and Swagger is a plus
- Experience with CI/CD pipelines
- Attitude to thrive in a fun, fast-paced environment.
- Ability to excel at problem solving, adapt easily to change, and contribute effectively both individually and as part of cross-functional teams.
- Proficiency in Infrastructure as Code (IaC), particularly CDK and Terraform, is highly desirable.
- Passion for DevOps, Application/API monitoring, automation, and reliability
Work Environment:
- Work in a clean, pleasant, and comfortable home or office setting. The work environment characteristics described here are representative of those an employee encounters while performing the essential functions of this job. Reasonable accommodations may be made to enable persons with disabilities to perform the essential functions.
- This position may be considered remote and requires reliable and consistent internet service.
Travel Requirements
Minimal
The salary range for this role is $76,000 to $182,400. The candidate’s final salary offer will be based on the candidate’s skills, education, work location and experience.
A candidate’s compensation may also include bonuses consistent with TriNet’s corporate bonus plan.
Additionally, subject to applicable eligibility requirements, TriNet offers permanent full-time employees a variety of benefits including medical, dental, and vision plans, life and disability insurance, a 401(K) savings plan, an employee stock purchase plan, eleven (11) Company observed holidays, PTO and a comprehensive leave program. Please click the following link for detailed information about our benefits offerings:
Please Note: TriNet reserves the right to change or modify job duties and assignments at any time. The above job description is not all encompassing. Position functions and qualifications may vary depending on business necessity.
TriNet is an Equal Opportunity Employer and does not discriminate against applicants based on race, religion, color, disability, medical condition, legally protected genetic information, national origin, gender, sexual orientation, marital status, gender identity or expression, sex (including pregnancy, childbirth or related medical conditions), age, veteran status or other legally protected characteristics. Any applicant with a mental or physical disability who requires an accommodation during the application process should contact recruiting@trinet.com to request such an accommodation.
Expected salary: $76,000 – $182,400 per year
Location: USA
Job date: Sat, 19 Oct 2024 06:40:11 GMT