Anomaly Detection¶
Anomaly detection is the identification of rare datapoints which differ significantly from the majority of the data.
Supervised view: anomalies are what some user labels as anomalies.
Unsupervised view: anomalies are outliers (points of low probability) in the data.
Sklearn docs on anomaly detection.
Before discussing the algorithms for outlier detection, let's first define a couple of terms.
Outlier Detection
: Algorithms try to fit the region where the training examples are most concentrated; anything outside the fitted boundary is an outlier. No labels are needed. For example, the SVM is not a very good outlier detector in this context since it is very sensitive to outliers: outliers can easily distort its decision boundary.
Novelty Detection
: Here we assume that the training set is not polluted by many outliers, and we want to detect whether a new observation is an outlier (also called a novelty in this setting). In this setting the SVM is a good novelty detector. The sketch below contrasts the two settings using scikit-learn's API.
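To make the distinction concrete, here is a minimal sketch (not one of the original cells; the data and parameter values are illustrative assumptions) contrasting the two settings with scikit-learn's API, using the two estimators covered later in this notebook.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng0 = np.random.RandomState(0)
X_train = 0.3 * rng0.randn(100, 2)                 # mostly "normal" points
X_new = rng0.uniform(low=-4, high=4, size=(5, 2))  # unseen observations

# Outlier detection: fit and predict on the same (possibly contaminated) data.
iso = IsolationForest(random_state=0).fit(X_train)
train_labels = iso.predict(X_train)                # -1 = outlier, +1 = inlier

# Novelty detection: fit on data assumed to be mostly clean, then score new points.
ocsvm = OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1).fit(X_train)
new_labels = ocsvm.predict(X_new)                  # -1 = novelty, +1 = regular observation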
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.font_manager
from sklearn import svm
from sklearn.ensemble import IsolationForest
# Useful for pretty-printing numpy arrays.
from IPython.display import HTML, display
import tabulate
def pp(a, show_head=True):
    '''
    args: show_head -> if True print only first 5 rows.
    return: None
    '''
    if a.ndim < 2:
        a = [a]
    if show_head:
        display(HTML(tabulate.tabulate(a[:5], tablefmt='html')))
        return
    display(HTML(tabulate.tabulate(a, tablefmt='html')))
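A quick usage example (hypothetical, not one of the original cells; it relies on the numpy import and the pp defined above):
pp(np.arange(5))                                      # a 1-D array is shown as a single row
pp(np.arange(100).reshape(10, 10))                    # only the first 5 rows are displayed
pp(np.arange(100).reshape(10, 10), show_head=False)   # all 10 rows are displayed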
Aside: Plotting Contours (Decision Boundaries)¶
numpy.meshgrid is a very useful function for plotting contours.
Assume we have a function that takes a variable x (a scalar) as input and returns a scalar y. We can easily plot how the function value (y) varies as x varies on a 2-d plot; if the relationship is linear, we'll see a straight line. In the ML context, this straight line can act as a decision boundary: anything above it is the positive class and anything below it is the negative class. However, if a function takes two variables (say, x and y) as input and returns a third variable z, we cannot as easily plot how the function behaves as both x and y vary. To this end, we use contours to plot the function on a 2-d surface; you've seen contour lines before in a Mathematics class. Along a contour line, the value of the function is the same. In the ML context, if your features are 2-d (x, y), then the classifier outputs a z given both x and y, and the plotted contour curve acts as the decision boundary: anything inside the curve is the positive class and anything outside is the negative class, analogous to the straight-line boundary.
How can we plot a contour?
We can use plt.contour to plot a contour. As input, we pass it many closely spaced combinations of x and y, which we can generate with numpy.meshgrid and numpy.linspace.
coordinates_vector1 = np.linspace(-7, 7, 150)
print('coordinates_vector1:')
pp(coordinates_vector1)
coordinates_vector2 = np.linspace(-7, 7, 150)
print('coordinates_vector2:')
pp(coordinates_vector2)
coordinates_vector1:
-7 | -6.90604 | -6.81208 | -6.71812 | ... | 6.71812 | 6.81208 | 6.90604 | 7 (150 evenly spaced values from -7 to 7)
coordinates_vector2:
-7 | -6.90604 | -6.81208 | -6.71812 | ... | 6.71812 | 6.81208 | 6.90604 | 7 (150 evenly spaced values from -7 to 7)
# make a grid using the coordinate vectors.
xx, yy = np.meshgrid(coordinates_vector1, coordinates_vector2)
print('xx- the vector x is repeated along the first dimension, '
      'i.e. every row holds the same elements.')
print('xx:')
pp(xx)
print('yy- the vector y is repeated along the second dimension, '
      'i.e. every column holds the same elements.')
print('yy:')
pp(yy)
xx- the vector x is repeated along the first dimension, i.e. every row holds the same elements.
xx:
(first 5 rows shown; each row is the full 150-value vector -7 | -6.90604 | -6.81208 | ... | 6.81208 | 6.90604 | 7)
yy- the vector y is repeated along the second dimension, i.e. every column holds the same elements.
yy:
(first 5 rows shown; each row is a single repeated value: -7, -6.90604, -6.81208, -6.71812, -6.62416, ...)
xx and yy together give all the grid coordinates at which we need to evaluate our function to plot a good visual contour. Instead of an SVM or logistic regression, let's plot the contour of an arbitrary function that takes x and y as input:
z = np.sin(xx**2 + yy**2) / (xx**2 + yy**2)
pp(z)
(first 5 rows of the 150×150 array z shown; the values oscillate around zero, e.g. -0.00585084 | 0.00663063 | 0.00959562 | -0.0012151 | -0.0105466 | ...)
plt.contour(xx, yy, z)
<matplotlib.contour.QuadContourSet at 0x7f8bef780e10>
For a given contour curve, z is the same everywhere on it.
plt.contourf(xx, yy, z) # filled contours
<matplotlib.contour.QuadContourSet at 0x7f8bef9435c0>
One-Class SVM¶
A One-Class Support Vector Machine is an unsupervised learning algorithm (trained without any labels). It learns a boundary around the training points and is therefore able to classify any point that lies outside that boundary as an outlier.
Similar to the margin-maximization idea behind the ordinary SVM, the idea of the One-Class SVM is to find a function that is positive in regions with a high density of points and negative in regions of low density.
By One-Class SVM or One-Class SVC we'll mean the same thing.
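Before fitting it on a 2-d dataset, here is a small illustrative check (a sketch on assumed synthetic data, not part of the original analysis) of the two main hyperparameters: nu acts roughly as an upper bound on the fraction of training points that end up outside the learned frontier, while gamma controls how tightly the RBF boundary wraps around the data.
# Rough check of nu on a single Gaussian blob (illustrative assumption, similar
# in spirit to the dataset generated below). Uses the svm module imported above.
rng_demo = np.random.RandomState(0)
X_demo = 0.3 * rng_demo.randn(200, 2)
for nu in (0.05, 0.1, 0.3):
    model = svm.OneClassSVM(nu=nu, kernel='rbf', gamma=0.1).fit(X_demo)
    frac_flagged = (model.predict(X_demo) == -1).mean()
    print(f'nu={nu}: fraction of training points flagged as outliers = {frac_flagged:.2f}')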
Let's observe its decision boundaries (contours) using a random dataset.
X = 0.3 * np.random.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
X_train.shape
(200, 2)
# Following our discussion on meshgrid above.
xx, yy = np.meshgrid(np.linspace(-5, 5, 500), np.linspace(-5, 5, 500))
Random Dataset Generation¶
rng = np.random.RandomState(42) # for reproducible results.
# Generate train data
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * rng.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
# Generate some abnormal novel observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))
Fit SVM¶
# fit the model
clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)
OneClassSVM(gamma=0.1, nu=0.1)
Note that the One-Class SVM's predict returns -1 for outliers and +1 for inliers.
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
n_error_train = y_pred_train[y_pred_train == -1].size
n_error_test = y_pred_test[y_pred_test == -1].size
n_error_outliers = y_pred_outliers[y_pred_outliers == 1].size
Plot Contours/Decision Boundaries¶
Using the decision_function of the One-Class SVM.
# evaluate the decision function on the grid and plot its contours
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.title("Novelty Detection")
plt.contour(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
# plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
# plt.contourf(xx, yy, Z, levels=[0, Z.max()], colors='palevioletred')
<matplotlib.contour.QuadContourSet at 0x7f8bf08ba630>
Filled contour
plt.title("Novelty Detection")
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
<matplotlib.contour.QuadContourSet at 0x7f8bf0a23da0>
Notice the decision boundaries.
Overlay datapoints¶
plt.title("Novelty Detection")
plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
a = plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
plt.contourf(xx, yy, Z, levels=[0, Z.max()], colors='palevioletred')
s = 40
b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white', s=s, edgecolors='k')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='blueviolet', s=s,
edgecolors='k')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='gold', s=s,
edgecolors='k')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([a.collections[0], b1, b2, c],
["learned frontier", "training observations",
"new regular observations", "new abnormal observations"],
loc="upper left",
prop=matplotlib.font_manager.FontProperties(size=11))
plt.xlabel(
"error train: %d/200 ; errors novel regular: %d/40 ; "
"errors novel abnormal: %d/40"
% (n_error_train, n_error_test, n_error_outliers))
plt.show()
Isolation Forest¶
Similar to the One-Class SVM, Isolation Forest is also an unsupervised method for detecting outliers. The primary difference is that its decision boundary is not easily distorted by many outliers, which makes it a better outlier detector than the One-Class SVM.
Working:
The IsolationForest ‘isolates’ observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.
Pointers to Anomalies:
Random partitioning produces noticeably shorter paths for anomalies. Hence, when a forest of random trees collectively produces shorter path lengths for particular samples, those samples are highly likely to be anomalies, as the toy sketch below illustrates.
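A toy sketch of this intuition (an illustration only, not sklearn's actual implementation; the function, data, and query points below are assumptions): repeatedly pick a random feature and a random split between that feature's min and max among the remaining points, keep the side of the split that contains the query point, and count how many splits it takes to isolate it. Points far from the dense cluster tend to be isolated in far fewer splits.
def isolation_path_length(X, point, rng, max_depth=50):
    """Count random axis-aligned splits needed to isolate `point` from `X`."""
    data = X.copy()
    depth = 0
    while len(data) > 1 and depth < max_depth:
        feature = rng.randint(data.shape[1])                  # random feature
        lo, hi = data[:, feature].min(), data[:, feature].max()
        if lo == hi:
            break
        split = rng.uniform(lo, hi)                           # random split value
        # keep only the side of the split that contains the query point
        if point[feature] < split:
            data = data[data[:, feature] < split]
        else:
            data = data[data[:, feature] >= split]
        depth += 1
    return depth

rng_toy = np.random.RandomState(0)      # separate RNG so the `rng` used above is untouched
X_toy = 0.3 * rng_toy.randn(200, 2)     # a dense cluster around the origin
inlier, outlier = np.array([0.0, 0.0]), np.array([4.0, 4.0])
print('avg path length, inlier :',
      np.mean([isolation_path_length(X_toy, inlier, rng_toy) for _ in range(100)]))
print('avg path length, outlier:',
      np.mean([isolation_path_length(X_toy, outlier, rng_toy) for _ in range(100)]))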
Let's observe its decision boundaries.
Using the same dataset we used for the One-Class SVM¶
# fit the model IsolationForest
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
IsolationForest(max_samples=100, random_state=RandomState(MT19937) at 0x7F8BF0D72258)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
Plot Contour Decision Boundaries¶
# evaluate the decision function on a coarser grid for plotting the contours
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.title("IsolationForest")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
<matplotlib.contour.QuadContourSet at 0x7f8bf1816da0>
Notice the bright region of high density.
Overlay datapoints¶
plt.title("IsolationForest")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white',
s=20, edgecolor='k')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='green',
s=20, edgecolor='k')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red',
s=20, edgecolor='k')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([b1, b2, c],
["training observations",
"new regular observations", "new abnormal observations"],
loc="upper left")
plt.show()