[6] AI for Security
Anomaly Detection
- Find rare items or events which deviate significantly from the majority of the data.
- Define what normal events look like.
- No data on faulty actions (behaviors) is needed; only normal data is required.
- Point Anomaly
- Contextual Anomaly
- Collective Anomaly
@Point Anomaly
- A single instance of data is anomalous if it's too far off from the rest.
@Contextual Anomaly
- The abnormality is context specific.
- This type of anomaly is common in time-series data.
@Collective Anomaly
- A set of data instances is anomalous as a group, even if each instance looks normal on its own.
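The difference between a point anomaly and a contextual anomaly can be sketched with a simple z-score check. This is only an illustration, not a method from the notes: the function names, the temperature readings, and the threshold of 2.0 are all assumptions made for the example.

```python
import statistics

def point_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

def contextual_anomalies(readings, threshold=2.0):
    """Apply the same z-score check separately inside each context group.

    `readings` is a list of (context, value) pairs; returns indices into `readings`.
    """
    by_context = {}
    for i, (context, value) in enumerate(readings):
        by_context.setdefault(context, []).append((i, value))
    flagged = []
    for items in by_context.values():
        local = point_anomalies([v for _, v in items], threshold)
        flagged.extend(items[j][0] for j in local)
    return sorted(flagged)

# Hypothetical temperature readings (degrees Celsius), labeled by season.
summer = [29, 30, 31, 30, 28, 32, 30, 29, 31, 30]
winter = [0, 1, -1, 2, 0, 1, -2, 0, 1, 28]
readings = [("summer", v) for v in summer] + [("winter", v) for v in winter]

global_flags = point_anomalies([v for _, v in readings])
context_flags = contextual_anomalies(readings)
```

Here 28 degrees is unremarkable across the whole year, so the global check misses it, while the per-season check flags it as abnormal for winter.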
Algorithms for Anomaly Detection
- One Class Support Vector Machine
- Isolation Forest
- Local Outlier Factor
One Class Support Vector Machines
- Find a hyperplane that separates all the data points from the origin.
- Fast
- Not suitable for high-dimensional data
- clf = svm.OneClassSVM(nu = 0.01, kernel = "rbf", gamma = 0.00001)
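A minimal sketch of how such a classifier might be trained and queried with scikit-learn. The synthetic training data, the query points, and the value gamma = 0.1 are assumptions chosen for this toy dataset; suitable values of nu and gamma depend on the real data.

```python
import numpy as np
from sklearn import svm

# Synthetic "normal" behavior: 500 two-dimensional points around the origin (assumed data).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# nu is an upper bound on the fraction of training points treated as outliers.
# gamma is picked for this toy data, not taken from the notes.
clf = svm.OneClassSVM(nu=0.01, kernel="rbf", gamma=0.1)
clf.fit(X_train)

# One point near the training data and one far away from it.
X_test = np.array([[0.1, -0.2], [8.0, 8.0]])
labels = clf.predict(X_test)  # +1 = inlier, -1 = outlier
```

Only normal data is passed to fit, which matches the idea that no examples of faulty behavior are needed.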
Isolation Forest
- Use the tree depth at which a point is isolated, which varies with data density.
- Usually, inlier data is isolated deep in the decision tree.
- In contrast, outlier data is isolated at a shallow depth of the decision tree.
- Fast and explainable
- Not suitable for high-dimensional data
- Grow an ensemble of randomly split decision trees until every point in the dataset is isolated, then calculate the anomaly score.
@Isolation Forest Terms
- s(X, M) = 2^(- E[h(X)] / c(M))
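- X = data point, M = # of sample data
- h(X) = path length (depth) at which X is isolated in one tree
- E[h(X)] = the average h(X) of all trees
- c(M) = M > 2 ? 2H(M - 1) - (2(M - 1)/N) : M == 2 ? 1 : 0
- N = # of test data, H(i) = harmonic number, approximately ln(i) + 0.5772
- If s is close to 1 then x is likely to be an anomaly.
- If s is well below 0.5 then x is likely to be normal; if every point scores around 0.5, the sample has no distinct anomaly.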
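The score s can be obtained from scikit-learn's IsolationForest, sketched below. scikit-learn documents score_samples as the opposite of the anomaly score defined in the original paper, so negating it recovers s. The synthetic data, the query points, and the hyperparameters are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: 256 two-dimensional points around the origin (assumed data).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))

# max_samples plays the role of M, the number of points sampled for each tree.
forest = IsolationForest(n_estimators=100, max_samples=256, random_state=0)
forest.fit(X)

# One point in the dense center and one far from all training data.
X_query = np.array([[0.0, 0.0], [6.0, 6.0]])

# score_samples returns the negated anomaly score, so flip the sign to get s.
s = -forest.score_samples(X_query)
labels = forest.predict(X_query)  # +1 = inlier, -1 = outlier
```

The far point is isolated after only a few splits, so its average path length is short and its s is closer to 1 than the score of the central point.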
Local Outlier Factor
- Density-based approach
- Compare the density of a point with the average density of its K nearest neighbors.
- Can find local outliers, such as a point sitting just outside a high-density area
- Can handle high-dimensional data
- Robust to data poisoning
- Slow
- Calculate the local reachability density of every point in the dataset, then calculate the local outlier factor
@Local Outlier Factor Terms
- d(A, B) : distance between point A and B
- k-distance(A) : distance from A to its K-th nearest neighbor
- N_k(A) : the set of all K-nearest neighbors of point A
- reachdist_k(A, B) : max(k-distance(B), d(A, B))
- lrd_k(P) = N_k(P).length / sum(N_k(P).map(P' => reachdist_k(P, P')))
- The inverse of the average reachdist from P to its k nearest neighbors
- LOF_k(P) = sum(N_k(P).map(P' => lrd_k(P') / lrd_k(P))) / N_k(P).length
- The average of the lrd ratios over every other point P' in N_k(P)
- If LOF_k(P) is close to 1 then P has a density similar to its neighbors
- If LOF_k(P) < 1 then inlier (Higher density than its neighbors)
- If LOF_k(P) > 1 then outlier (Lower density than its neighbors)
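The terms above can be turned into a small NumPy sketch. It uses the standard definition of reachability distance, max(k-distance(P'), d(P, P')), with the k-distance of the neighbor P'. The function name, the brute-force distance matrix, and the grid test data are assumptions for illustration; the code does not handle duplicate points, which would cause a division by zero.

```python
import numpy as np

def local_outlier_factor(X, k):
    """Compute LOF_k for every row of X with brute-force pairwise distances."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    rows = np.arange(n)

    # d(A, B): pairwise Euclidean distances; a point is not its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)

    # N_k(A): indices of the k nearest neighbors (ties broken arbitrarily).
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # k-distance(A): distance to the k-th nearest neighbor.
    k_distance = dist[rows, neighbors[:, -1]]

    # reachdist_k(P, P') = max(k-distance(P'), d(P, P')) for each neighbor P'.
    reach = np.maximum(k_distance[neighbors], dist[rows[:, None], neighbors])

    # lrd_k(P): inverse of the average reachability distance to the neighbors.
    lrd = k / reach.sum(axis=1)

    # LOF_k(P): average ratio of the neighbors' lrd to P's own lrd.
    return lrd[neighbors].mean(axis=1) / lrd

# A 3x3 grid of points plus one distant point at index 9 (assumed data).
grid = [[x, y] for x in range(3) for y in range(3)]
lof = local_outlier_factor(grid + [[10, 10]], k=3)
```

The grid points get values near 1, while the distant point gets a value far above 1. For real workloads, scikit-learn provides sklearn.neighbors.LocalOutlierFactor, which exposes the negated score as negative_outlier_factor_.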
Intrusion Detection System
- A device or software application that monitors a network or a system for malicious activity or policy violations.
- Network-based intrusion detection system (NIDS)
- Host-based intrusion detection system (HIDS)
- IDS types
- 1. Signature-based system
- Look for specific patterns that rules define as malicious actions.
- 2. Anomaly-based system
- Look for activities that deviate from known trustworthy activities.
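The two IDS types can be contrasted with a toy sketch. Everything here is hypothetical: the signature patterns, the log lines, the request-rate baseline, and the tolerance are made up for illustration and are not taken from any real IDS.

```python
import re

# Hypothetical signatures: rule name -> pattern that marks a request as malicious.
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)union\s+select"),
    "path_traversal": re.compile(r"\.\./"),
}

def signature_alerts(log_line):
    """Signature-based check: return the names of all rules the line matches."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(log_line)]

def anomaly_alert(requests_per_minute, baseline, tolerance=3.0):
    """Anomaly-based check: flag a rate far from the trusted baseline.

    `baseline` holds request rates observed during known-good operation.
    """
    mean = sum(baseline) / len(baseline)
    variance = sum((x - mean) ** 2 for x in baseline) / len(baseline)
    return abs(requests_per_minute - mean) > tolerance * variance ** 0.5

# Hypothetical inputs.
attack_line = "GET /item?id=1 UNION SELECT password FROM users"
normal_line = "GET /index.html"
baseline_rates = [20, 22, 19, 21, 20, 18]
```

The signature check only catches attacks that someone has already written a rule for, while the anomaly check can flag unseen behavior but needs a trustworthy baseline.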
HIDS with Anomaly Detection
- Find anomalous actions that differ from normal activities and alert administrators to them.
- Asynchronous anomaly detection phase