

[Information Security] AI for Security (12)

재안안 2024. 6. 22. 19:49


[6] AI for Security

Anomaly Detection
- Find rare items or events which deviate significantly from the majority of the data.
- Define what normal events look like, then flag whatever deviates from them.
- No data on faulty actions (behaviors) is needed.
- Point Anomaly
- Contextual Anomaly
- Collective Anomaly

@Point Anomaly
- A single instance of data is anomalous if it's too far off from the rest.
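
As a minimal sketch of a point anomaly check, the snippet below flags values whose z-score is large. The function name `point_anomalies`, the threshold of 3 standard deviations, and the login counts are all assumptions made for illustration, not part of the original notes.

```python
from statistics import mean, stdev

def point_anomalies(values, threshold=3.0):
    """Return the values that lie more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical data: login attempts per minute, with one extreme value.
logins = [4, 5, 6, 5] * 5 + [90]
flagged = point_anomalies(logins)
print(flagged)
```

A z-score is only one possible distance measure; it assumes the normal data is roughly unimodal.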

@Contextual Anomaly
- The abnormality is context specific.
- This type of anomaly is common in time-series data.
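
A contextual anomaly can be sketched by scoring each value only against other values from the same context. In this hedged example the contexts ("day", "night"), the request counts, and the helper name `contextual_anomalies` are hypothetical. The value 100 is ordinary during the day but unusual at night.

```python
from collections import defaultdict
from statistics import mean, stdev

def contextual_anomalies(readings, threshold=3.0):
    """readings: list of (context, value). Flag values that are unusual for their own context."""
    by_context = defaultdict(list)
    for context, value in readings:
        by_context[context].append(value)

    flagged = []
    for context, value in readings:
        group = by_context[context]
        sigma = stdev(group) if len(group) > 1 else 0.0
        # Compare against the mean and spread of the same context only.
        if sigma > 0 and abs(value - mean(group)) / sigma > threshold:
            flagged.append((context, value))
    return flagged

# Hypothetical requests per minute, labeled by time of day.
day = [("day", v) for v in [100, 102, 98, 100] * 5]
night = [("night", v) for v in [5, 6, 4, 5] * 5] + [("night", 100)]
readings = day + night
flagged = contextual_anomalies(readings)
print(flagged)
```

If the context labels are dropped and all values are scored together, the night-time value of 100 no longer stands out, which is the difference from a point anomaly.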

@Collective Anomaly
- A set of data instances is anomalous as a group, even if each instance looks normal on its own.

Algorithms for Anomaly Detection
- One Class Support Vector Machine
- Isolation Forest
- Local Outlier Factor

One Class Support Vector Machines
- Find a hyperplane that separates the data points from the origin with maximum margin in the kernel feature space.
- Fast
- Not suitable for high-dimensional data
- clf = svm.OneClassSVM(nu=0.01, kernel="rbf", gamma=0.00001)
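
The constructor call above can be placed in a small end-to-end sketch. The training data here is synthetic, and `gamma=0.1` is an assumption chosen to suit this toy data rather than the value in the notes. Only normal samples are used for fitting.

```python
import numpy as np
from sklearn import svm

# Synthetic "normal" behavior: 200 two-dimensional points around the origin.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# One point near the training data and one far away from it.
X_test = np.array([[0.1, -0.2], [8.0, 8.0]])

# nu bounds the fraction of training points treated as outliers.
clf = svm.OneClassSVM(nu=0.01, kernel="rbf", gamma=0.1)
clf.fit(X_train)

pred = clf.predict(X_test)               # +1 = inlier, -1 = outlier
scores = clf.decision_function(X_test)   # larger means more normal
print(pred, scores)
```

`gamma` controls how tightly the boundary wraps the training data, so it usually needs tuning for each feature scale.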

Isolation Forest
- Use the tree depth at which a point gets isolated, which varies with data density.
- Inliers are usually isolated deep in a decision tree (dense regions need many splits).
- Outliers are usually isolated at a shallow depth (sparse regions need few splits).
- Fast and explainable
- Not suitable for high-dimensional data
- Train an ensemble of random decision trees (similar in structure to a Random Forest) until every point in the dataset is isolated, then calculate an anomaly score.

@Isolation Forest Terms
- s(X, M) = 2^(- E[h(X)] / c(M))
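- X = data point, M = number of sampled data points
- h(X) = path length (depth) at which X is isolated in one tree
- E[h(X)] = the average of h(X) over all trees
- c(M) = 2H(M - 1) - 2(M - 1)/N if M > 2; 1 if M == 2; 0 otherwise
- H(i) = i-th harmonic number, approximately ln(i) + 0.5772
- N = number of test data points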

- If s is close to 1 then X is likely to be an anomaly.
- If s is well below 0.5 then X is likely to be normal; if every point scores around 0.5, the data has no distinct anomaly.
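
The score formula can be tried out with a small pure-Python sketch. Several simplifications are assumptions of this example: the whole dataset is used as the sample and is also the data being scored (so N = M), trees are grown without a depth limit, and instead of storing whole trees, each call to the hypothetical helper `path_length` grows only the random branch that the query point follows. The data points are made up.

```python
import random

def harmonic(i):
    """i-th harmonic number H(i)."""
    return sum(1.0 / k for k in range(1, i + 1))

def c(m, n):
    """Normalizer c(M) as defined in the notes above."""
    if m > 2:
        return 2.0 * harmonic(m - 1) - 2.0 * (m - 1) / n
    return 1.0 if m == 2 else 0.0

def path_length(x, points, rng, depth=0):
    """Depth at which x is isolated by random axis-aligned splits (x must be in points)."""
    if len(points) <= 1:
        return depth
    # Only features on which the remaining points still differ can be split.
    dims = [d for d in range(len(x))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return depth
    dim = rng.choice(dims)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    # Keep only the points that fall on the same side of the split as x.
    same_side = [p for p in points if (p[dim] < split) == (x[dim] < split)]
    return path_length(x, same_side, rng, depth + 1)

def anomaly_score(x, points, n_trees=200, seed=0):
    """s(X, M) = 2^(-E[h(X)] / c(M)), with E[h(X)] averaged over n_trees random trees."""
    rng = random.Random(seed)
    avg_h = sum(path_length(x, points, rng) for _ in range(n_trees)) / n_trees
    m = len(points)
    return 2.0 ** (-avg_h / c(m, m))

# Hypothetical data: a small dense cluster plus one distant point.
cluster = [(i * 0.1, (i * 7 % 10) * 0.1) for i in range(15)]
outlier = (10.0, 10.0)
data = cluster + [outlier]

s_outlier = anomaly_score(outlier, data)
s_inlier = anomaly_score(cluster[6], data)
print(s_outlier, s_inlier)
```

The distant point is usually cut off by the first split, so its average path length is short and its score is pushed toward 1.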

Local Outlier Factor
- Density-based approach
- Compare the density of a point with the average density of its k nearest neighbors.
- Can find outliers even in high-density areas
- Can handle high-dimensional data
- Robust to data poisoning
- Slow
- Calculate the local reachability density of every point in the dataset, then calculate the local outlier factor

@Local Outlier Factor Terms
- d(A, B) : distance between point A and B
- k-distance(A) : distance from A to its k-th nearest neighbor (the maximum distance among A's k nearest neighbors)
- N_k(A) : the set of A's k nearest neighbors
- reachdist_k(A, B) : max(k-distance(B), d(A, B))

- lrd_k(P) = N_k(P).length / sum(N_k(P).map(P' => reachdist_k(P, P')))
- The inverse of the average reachability distance from P to its k nearest neighbors

- LOF_k(P) = sum(N_k(P).map(P' => lrd_k(P') / lrd_k(P))) / N_k(P).length
- The average of the ratios lrd_k(P') / lrd_k(P) over every point P' in N_k(P)

- If LOF_k(P) is close to 1 then P has a density similar to its neighbors
- If LOF_k(P) < 1 then inlier (higher density than its neighbors)
- If LOF_k(P) > 1 then outlier (lower density than its neighbors)
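
The LOF terms can be combined into a short pure-Python sketch. It uses the standard reachability distance reachdist_k(A, B) = max(k-distance(B), d(A, B)), takes exactly k neighbors (ties are broken arbitrarily), and assumes all points are distinct. The function name `lof_scores` and the grid data are hypothetical.

```python
import math

def lof_scores(points, k):
    """Return LOF_k for every point, following the lrd and LOF definitions."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # N_k(P): indices of the k nearest other points
    k_distance = []  # distance from P to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def reach_dist(a, b):
        return max(k_distance[b], dist[a][b])

    # lrd_k(P): inverse of the mean reachability distance from P to its neighbors.
    lrd = [len(neighbors[i]) / sum(reach_dist(i, j) for j in neighbors[i])
           for i in range(n)]

    # LOF_k(P): mean ratio of the neighbors' lrd to P's own lrd.
    return [sum(lrd[j] / lrd[i] for j in neighbors[i]) / len(neighbors[i])
            for i in range(n)]

# Hypothetical data: a 3x3 grid of points plus one distant point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
points = grid + [(10.0, 10.0)]
scores = lof_scores(points, k=3)
print(scores)
```

The all-pairs distance matrix makes this sketch quadratic in the number of points, which matches the note that LOF is slow.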

Intrusion Detection System
- A device or software application that monitors a network or a system for malicious activity or policy violations.
- Network-based intrusion detection system(NIDS)
- Host-based intrusion detection system(HIDS)
- IDS types
- 1. Signature-based system
- Look for specific patterns defined as malicious actions by rule.
- 2. Anomaly-based system
- Look for activities that deviate from a model of trustworthy activity.
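
The two IDS types can be contrasted in a toy sketch. Everything here is hypothetical: the two regex signatures, the event format ("METHOD /path?query"), and the baseline built from a handful of trusted requests. Real systems use far richer rules and models.

```python
import re

# Signature-based: hand-written rules describing known malicious patterns.
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)union\s+select"),
    "path_traversal": re.compile(r"\.\./"),
}

def signature_alerts(event):
    """Return the names of all signatures that match the event."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(event)]

# Anomaly-based: learn what trusted activity looks like, flag anything else.
def profile(event):
    """Reduce an event to (method, path without query string)."""
    method, _, rest = event.partition(" ")
    return (method, rest.split("?", 1)[0])

def build_baseline(trusted_events):
    return {profile(e) for e in trusted_events}

def is_anomalous(event, baseline):
    return profile(event) not in baseline

baseline = build_baseline(["GET /index.html", "GET /search?q=cats", "POST /login"])

events = [
    "GET /index.html",
    "GET /search?q=1 UNION SELECT password FROM users",
    "POST /admin/backup",
]
for e in events:
    print(e, signature_alerts(e), is_anomalous(e, baseline))
```

In this sketch the injection attempt is caught only by its signature, while the unseen `POST /admin/backup` request matches no rule and is caught only by the baseline, which is why the two approaches are often combined.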

HIDS with Anomaly Detection
- Find anomalous actions that differ from normal activity and alert administrators to them.
- Asynchronous anomaly detection phase