DevOps – Site Reliability Engineer – Document Management

Full Time
New York
Posted 2 months ago


  • Design, code, test and deliver software to automate manual operational work
  • Troubleshoot priority incidents, facilitate blameless post-incident evaluations, and ensure permanent closure of incidents
  • Engage with development teams throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes
  • Identify application patterns and analytics in support of better service level objectives
  • Design and develop self-healing and resiliency patterns
  • Design and develop performance tests, identify bottlenecks and opportunities for optimization and capacity demands, and present solutions for continuous improvements
  • Implement best-in-class monitoring frameworks to achieve end-to-end flow monitoring and noiseless alerting
  • Develop automated software and product upgrades, change management and release management solutions
  • Influence developers and other teams across the globe to ensure resiliency and stability standards
  • Effectively split time between operational work and engineering work
  • Contribute to around-the-clock support coverage as needed


  • BS/BA degree or equivalent experience in a software engineering discipline
  • 5 years of experience working with at least one technology stack, including designing, coding, testing, and delivering software
  • Previous or current experience with software development, infrastructure development, or development and operations
  • Working knowledge or understanding of infrastructure components such as routers, load balancers, cloud products, container systems, compute, storage, and networks
  • Excellent debugging and troubleshooting skills
  • Experience with Linux infrastructure, SQL databases (MS SQL), CI/CD tools (Jenkins, Jules, Maven), scripting languages such as JavaScript, Python, Perl, or Ruby, and Scrum/Kanban/Agile methodologies
  • 5+ years of experience in Core Java, Oracle, MS SQL Server, Unix Shell Scripting
  • Basic knowledge of development in J2EE, Spring Boot, MVC, etc.
  • Experience with support models: Incident Management, Problem Management
  • Knowledge of version control systems such as Git
  • Understanding of NoSQL databases, e.g., MarkLogic, Hadoop, MongoDB
  • Working knowledge of centralized logging (Splunk) or logging as a service
  • Preferred: experience with application monitoring tools such as Apica, AppDynamics, or Dynatrace
  • Nice to have: experience with cloud-native technologies, e.g., AWS, Kubernetes, and Pivotal GAIA
  • Good interpersonal and communication skills with all levels of management and end users; able to participate in incident management and issue resolution across multiple teams
  • Able to perform production releases and support triage of issues during and after releases
  • Problem solver: thinks through problems and comes up with alternatives to resolve them in the short term and/or long term
  • Able to prioritize and manage time efficiently

Job Features

Job Category: Administrative, Banking, Finance, Information Technology, Operations, Project Management, Regulation, Technology
