Summary
The Monitoring Engineer is responsible for all system and performance monitoring in the RealPage DataCenter. This person will work with each product team to understand the overall product architecture and then implement monitoring systems and rules to proactively prevent downtime and performance issues.
PRIMARY RESPONSIBILITIES
- Plan, lead, and implement monitoring projects within the MOM/SCOM environment.
- Maintain and document system configuration as well as all ongoing moves and changes within the SCOM environment.
- Evaluate monitoring and performance monitoring systems and implement chosen systems.
- Work closely with the NOC to ensure monitoring systems are used effectively.
- Work with product teams to help them define appropriate alerts for their environments.
- Work with product teams and the systems architects to help them understand performance bottlenecks in applications.
- Administer agent-based and agentless environments within SCOM.
- Understand and use PowerShell for SCOM/MOM reporting.
- Create management packs and object groups.
- Create overrides and custom alerts for applications or app teams.
- Document processes and changes within the environment.
- Define system scope and objectives based on user-defined needs.
- Identify potential performance and process improvements.
- Manage various operational requests and troubleshoot issues.
- Additional duties as assigned.
REQUIRED KNOWLEDGE/SKILLS/ABILITIES
- Bachelor’s degree in computer science or equivalent experience
- Minimum of 3 years of experience in a Systems Analyst or Network Engineer role.
- Minimum of 2 years of hands-on experience analyzing, supporting, and creating monitoring rules/methods within a System Center Operations Manager (SCOM) and/or Microsoft Operations Manager (MOM) environment.
- Minimum of 1 year of hands-on experience with Linux infrastructure monitoring systems. Nagios experience is highly preferred.
- Experience with Active Directory, core Windows services (DHCP, DNS, WINS), Exchange, SQL, virtualization, and/or storage.
- Demonstrated ability to analyze system and/or event logs within SQL/Oracle/Linux environments.
- Demonstrated knowledge of scripting languages including PowerShell, VBScript, Perl, JavaScript, and/or Java.
- MCSE, A+, and Network+ certifications highly preferred.
- Ability to work independently or within a team environment
- Outstanding analytical and problem-solving skills
- Demonstrated ability to manage multiple projects simultaneously
- Strong written and verbal communication skills
- Strong interpersonal and customer service skills required
- Ability to effectively prioritize and escalate issues
- Ability to work extended hours when needed