What a cloud provider technical support engineer does

A support engineer performs initial diagnostics using monitoring and logging systems and, based on the findings, forms hypotheses about how to resolve the incident. For example, the engineer describes which errors were detected, what has already been done to fix them and what is planned next, and recommends what the customer needs to do on their side.

In this article I will explain why cloud support is not about reciting an encyclopedia but about regular challenges and non-standard tasks, and what experienced engineers and architects do in the support department.

Support specialists work across three lines

The first line is L1

This is the duty shift. It monitors the state of the virtual infrastructure through notifications from the monitoring system and handles service requests and incidents. When technical problems arise, the team tries to work proactively and notifies L2 and L3. More often than not, the problem is fixed before customer requests start coming in.

As a rule, the duty engineer checks the load on the customer’s virtual servers, generates load reports, updates the OS on VMs, modifies virtual resources on request (for example, increases the amount of RAM on a customer’s virtual server), and activates or deactivates services: connects or disconnects virtual infrastructure and other #CloudMTS services and products.

The first line processes approximately 70% of customer requests.

The second line is L2

If the first line cannot resolve the incident (for example, there is no network access to the virtual machine), the duty engineer runs diagnostics, collects logs from the customer and passes the information to the second line.
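
As a rough illustration of this kind of first-line check, the sketch below tests whether a TCP port on a virtual machine accepts connections before the case is escalated. The function name `tcp_port_reachable` and the default timeout are assumptions made for this example; the article does not say which tools the duty shift actually uses.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A duty engineer could call `tcp_port_reachable(vm_address, 22)` with the customer’s VM address: a `False` result is one piece of evidence to attach to the ticket along with the collected logs, not a full diagnosis.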

L2 engineers support the customer’s infrastructure, each in their own area: networking, virtualization, or backup systems. Every day, the engineers on duty collect and work through the requests for their group, train the duty shift, or handle other important tasks, such as closing lower-priority cases left unfinished from the previous day and preparing reports.

The third line is L3

Its specialists are responsible for the hardware architecture and its operation. They work with the internal infrastructure, carry out maintenance, perform routine and emergency updates, and write working instructions for L2.

Approximately 50% of our tasks are automated; for example, virtual infrastructure is started with scripts. But the core work (handling requests, responding to monitoring events, and so on) is done manually.

We use a monitoring system written in C, PHP and Java. It records all incidents from the physical servers.

We use a combination of two solutions to monitor the infrastructure. The first allows us to generate a report for a certain period of time, build a graph of resource consumption and system behavior. The second captures all the actions of users and administrators: logging in, attaching machines, creating snapshots. You can open events for an individual virtual machine and track what happened to it.
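
To make the second kind of data more concrete, here is a minimal sketch of filtering an action log down to a single virtual machine. The `AuditEvent` fields, the action names, the sample records and the `events_for_vm` helper are all hypothetical; the data model of the real solution is not described in this article.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    actor: str   # user or administrator who performed the action
    action: str  # e.g. "login", "attach_vm", "create_snapshot"
    vm_id: str   # virtual machine the action applies to


def events_for_vm(events: list[AuditEvent], vm_id: str) -> list[AuditEvent]:
    """Return the events recorded for one VM in chronological order."""
    return sorted(
        (e for e in events if e.vm_id == vm_id),
        key=lambda e: e.timestamp,
    )


# Invented sample records, for illustration only.
log = [
    AuditEvent(datetime(2023, 5, 1, 10, 30), "admin", "create_snapshot", "vm-01"),
    AuditEvent(datetime(2023, 5, 1, 9, 15), "alice", "login", "vm-02"),
    AuditEvent(datetime(2023, 5, 1, 9, 0), "alice", "attach_vm", "vm-01"),
]

for event in events_for_vm(log, "vm-01"):
    print(event.timestamp.isoformat(), event.actor, event.action)
```

Sorting by timestamp matters here because the point of such a view is to reconstruct the sequence of actions that led up to an incident.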

Hard skills and certification are important for support engineers

Hard skills are important for the first-line engineer on duty: they need to know how networks are organized and what TCP/IP and DNS are, and to have basic system administration skills: diagnostics, load checks, system updates, and working with the user directory. Certification is not required, but it is a plus: it makes it easier to get up to speed with the processes.

Among soft skills, responsibility, the ability to communicate with customers and the ability to respond quickly are useful.

To work on the second line, it is desirable to have professional training from vendors of virtualization platforms, networking equipment, server hardware and enterprise software. Most of our engineers are certified specialists in one area or another.

Hard skills depend on the area in which the engineer wants to develop. These can be:

  • experience with virtualization and certification from vendors of virtualization systems;
  • advanced knowledge of the networking stack (e.g., BGP) and certification in networking technologies;
  • skills in working with backup systems.

The main soft skill is communication. Support talks to customers a lot, and the work is largely a team effort: the more information you can gather, the faster you can help the customer and solve their problem.

L3 engineers must also be certified in one of these areas and have architect-level skills in working with systems:

  • writing terms of reference (ToR);
  • deployment of virtualization systems;
  • working with hardware infrastructure;
  • deployment of backup systems;
  • clustering and infrastructure-level system upgrades.