SCOM alerts

Various alerts that show up in SCOM, and what to do about them.

1. Max Concurrent API reached in Server XYZ

MaxConcurrentApi is a registry key which specifies the maximum number of simultaneous, logon-related, application programming interface (API) calls that can be transmitted across a secure channel at any one time.
Windows authentication, Exchange, SharePoint and LOB application outages can occur because of the low default value of MaxConcurrentApi, which is a ceiling on the number of NTLM or Kerberos PAC password validations a server can handle at one time (link1).
Solution1 : Raise the MaxConcurrentApi registry value on the server or servers which are seeing the issue.
Solution2 : Depending on the SCOM version and its update level, this may be a false alarm (link2)

2. NTFS – Delayed Write Lost
If this occurs for VMs (virtual machines) running on VMware, it can be ignored. It is a VMware bug (link).

3. SQL server “Stolen Server Memory”
First, what it is :
Stolen memory describes buffers that are in use for sorting or for hashing operations (query workspace memory), or for those buffers that are being used as a generic memory store for allocations to store internal data structures such as locks, transaction context, and connection information. The lazywriter process is not permitted to flush Stolen buffers out of the buffer pool.
The memory is usually taken from the buffer pool. If you run DBCC MEMORYSTATUS and the output shows a high Stolen Pages value, some process is taking more memory from the buffer pool than necessary, and you need to find that process.
Solution1 : if it does not recur constantly, treat this as a warning-level alert.
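
To see whether stolen memory is actually high, and which component holds it, the performance counter and memory clerk views can be queried directly. This is only a minimal sketch, assuming SQL Server 2012 or later (where the "Stolen Server Memory (KB)" counter and the pages_kb column exist). Excluding the buffer pool clerk is also an assumption, made because that clerk mostly represents database pages rather than stolen memory.

```sql
-- Memory Manager counters, converted from KB to MB.
-- Assumption: SQL Server 2012+; older versions expose different counter names.
SELECT RTRIM(counter_name) AS counter_name,
       cntr_value / 1024   AS value_mb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN ('Stolen Server Memory (KB)',
                       'Total Server Memory (KB)',
                       'Target Server Memory (KB)');

-- Largest memory clerks, to get a hint about who is holding the memory.
-- Assumption: the buffer pool clerk is excluded because it mostly holds
-- database pages, not stolen memory.
SELECT TOP (10)
       type,
       SUM(pages_kb) / 1024 AS allocated_mb
FROM sys.dm_os_memory_clerks
WHERE type <> 'MEMORYCLERK_SQLBUFFERPOOL'
GROUP BY type
ORDER BY SUM(pages_kb) DESC;
```

If the stolen value stays a small fraction of total server memory, the alert is most likely the warning-level case described above.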

4. A process serving application pool ‘DefaultAppPool’ failed to respond to a ping. The process id was ‘3792’.
Go to IIS Manager, click “Application Pools”, and check their status on the right-hand side (link) :

What else can be checked :
a) The amount of free space on the disks
b) CPU and RAM usage for the time interval in question
This kind of problem usually resolves itself (IIS restarts the affected AppPool on its own), but it should still be checked.

5. The transaction log for database ‘XYZW’ is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
Why this happens is explained very well here.
*****
After the transaction is committed and after the data pages are preserved on disk, there is no need for SQL Server to hold on to the transaction log data anymore.
BUT
If you have your database set to recovery mode FULL, SQL Server does not reuse any part of the log file until it is backed up with a transaction log backup.
*****
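
Following from the quoted explanation, it can help to see which databases use the FULL recovery model and when their log was last backed up. The query below is a sketch only: it assumes backup history is still present in msdb. In the BACKUP LOG statement, the database name is taken from the alert text and the target path is a purely hypothetical placeholder.

```sql
-- Recovery model, current log reuse wait, and last log backup per database.
-- Assumption: msdb backup history has not been purged.
SELECT d.name,
       d.recovery_model_desc,
       d.log_reuse_wait_desc,
       MAX(b.backup_finish_date) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
      AND b.type = 'L'               -- 'L' = transaction log backup
GROUP BY d.name, d.recovery_model_desc, d.log_reuse_wait_desc
ORDER BY d.name;

-- Hypothetical log backup; the path is a placeholder, adjust before use.
BACKUP LOG [XYZW] TO DISK = N'X:\Backup\XYZW_log.trn';
```

A database in FULL recovery with a NULL or very old last_log_backup is a likely candidate for this alert.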
What needs to be checked :
5a. Whether there is still free space on the disk that holds the database itself, and on the disk that holds the logs (it does not have to be the same disk)
5b. As the error description itself says : the log_reuse_wait_desc column from sys.databases, which is checked with a SQL query run from SQL Server Management Studio :
SELECT name,log_reuse_wait_desc FROM sys.databases;
The log_reuse_wait_desc column contains the reason why the SQL Server currently can’t reuse the log file of that database.
log_reuse_wait_desc : nvarchar(60) : Description of what the reuse of transaction log space is currently waiting on, as of the last checkpoint.
The problem is that this query takes forever, so it is simpler to look directly :
System Databases/master/Views/System Views/sys.databases/right-click/“Select Top 1000 Rows”, and the “log_reuse_wait_desc” item is there as well :

Each of these rows corresponds to one database (system databases are listed first, then user databases).
Here is a link to the meaning of the individual columns.
A handier form of the query :
SELECT TOP 1000 [name]
,[log_reuse_wait_desc]
FROM [master].[sys].[databases]
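
For check 5a, log usage and file locations can also be read from within SQL Server instead of from the operating system. This is a rough sketch and does not replace checking free disk space on the volumes themselves.

```sql
-- Log size (MB) and percentage of the log in use, for every database.
DBCC SQLPERF(LOGSPACE);

-- Where each data and log file lives, and its current size.
SELECT DB_NAME(database_id)              AS database_name,
       type_desc,                        -- ROWS = data file, LOG = log file
       physical_name,
       CAST(size AS bigint) * 8 / 1024   AS size_mb   -- size is in 8 KB pages
FROM sys.master_files
ORDER BY database_name, type_desc;
```

The physical_name column shows which drive to inspect when the database and its log sit on different disks.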
