How weird can dimensions get?
Let us put some oranges in a box.
Consider the following problem. We have a box with a balloon in each corner, as shown in the figure below, and we want to place an orange in the middle of the box so that it is protected by the balloons and the box. What is the maximum size of the orange that can be placed in the middle?
Let’s start with a one-dimensional box and balloons, i.e. line segments. From the figure below we can see that the balloons leave no room for the orange at all. So in one dimension we cannot put an orange in the box. Let’s move on to two dimensions.
In two dimensions the box is a square and the balloons are circles. From the figure below we can see that there is some space in the middle where we can place the orange. With the help of high-school geometry we get about 0.4 (more precisely √2 − 1 ≈ 0.414) as the maximum radius of the orange, which is itself a circle in two dimensions.
Now let’s try this in three dimensions. In 3D the gap is harder to visualise, but I have done the work for you: the orange can have a maximum radius of about 0.7 (more precisely √3 − 1 ≈ 0.732).
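The geometry above can be checked in a few lines of Python. This is a sketch assuming the standard setup implied by the numbers in this article: a box of side 4 with a unit-radius balloon in each corner, centred at (±1, …, ±1), so the distance from the centre of the box to the centre of any balloon is √d. The function name is my own:

```python
import math

def orange_radius(d):
    """Radius of the central orange in a d-dimensional box of side 4
    with 2**d unit-radius balloons, one in each corner at (+-1, ..., +-1).
    The centre-to-balloon-centre distance is sqrt(d), so the orange can
    grow until it touches a balloon: radius = sqrt(d) - 1."""
    return math.sqrt(d) - 1

for d in (1, 2, 3, 9, 10):
    print(f"{d}D: radius = {orange_radius(d):.3f}")
```

Running this reproduces the values in the text: 0 in 1D, about 0.414 in 2D, about 0.732 in 3D, exactly 2 in 9D, and more than 2 in 10D, where the orange first pokes out of the box.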
Now you see the trend. As the number of dimensions increases, the size of the orange also increases. The graph below shows how the radius of the orange grows as we increase the dimension.
Now you can see that in nine dimensions the radius of our hyperorange is 2, which means that the diameter of our orange equals the side length of the box, despite being surrounded by 512 hyperspheres of radius 1, one in each of the 512 corners of this 9D hypercube. If the orange is as large as the box, how are the balloons protecting anything?
But it doesn’t end here. It becomes even crazier when we go to 10 dimensions. In 10D the hyperorange outgrows the hypercube that was meant to contain it: it extends past the faces of the cube, even though we constructed it to fit inside the box, with the balloons still sitting in every corner. It is difficult for our 3D brains to picture 10 dimensions (or more), but the equations check out: the orange is simultaneously inside the box and extending beyond it.
The moral is that our intuition can fail us when we get into a space with many dimensions. This is important because we work with data that has tens of thousands of features.
Any time we work with data that has more than 3 features, we have entered the world of higher dimensions, and we should not reason by analogy with what we know from our experience with two and three dimensions. We need to keep our eyes open and rely on math and logic, rather than intuition and analogy.
Curse of Dimensionality
Consider the heights of 10 people. We plot those heights on a 1-dimensional graph, i.e. a line, shown below. I use different colours to mark children and adults, and say that children are less than 5 feet tall while adults are taller than 5 feet. So 5 feet is a boundary that separates adults from children.
But this might not be enough, so we consider another feature, weight, and say that children normally weigh less than 55 kg. Now we can separate adults from children properly. The problem is that we can no longer decide which boundary separates them most efficiently: there are many decision boundaries that classify adults versus children.
If we consider yet another feature, say experience, which we measure on a scale of 10, then we get a graph that looks like this:
Do you see the problem now? As we keep increasing the number of dimensions, or features, we get more and more empty space, which in turn makes it harder for our classifier to separate the classes. With so much empty space between the training points, the algorithm may not be able to find the right decision boundary, so the next time it receives new data it may place that data on the wrong side.
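The growing emptiness is easy to measure: with a fixed number of samples, the average distance from each point to its nearest neighbour balloons as we add dimensions. Here is a minimal sketch in plain Python, using 100 uniform points in the unit hypercube (the function name and sample size are my own choices):

```python
import random

def avg_nearest_neighbour(n_points, d, seed=0):
    """Average nearest-neighbour distance for n_points points
    sampled uniformly in the unit hypercube [0, 1]^d."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(pts):
        # Euclidean distance to the closest other point.
        nearest = min(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for j, q in enumerate(pts) if j != i
        )
        total += nearest
    return total / n_points

for d in (1, 2, 10, 50):
    print(f"{d}D: avg nearest-neighbour distance = "
          f"{avg_nearest_neighbour(100, d):.3f}")
```

The same 100 points that sit shoulder to shoulder on a line end up far apart in 10 or 50 dimensions, which is exactly the void space described above.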
How to avoid the curse of dimensionality?
Regrettably, there is no fixed rule for how many features should be used to avoid this problem. It depends on the amount of training data, the complexity of the decision boundary, and the type of classifier used.
Ideally, if we had an infinite amount of training data there would be no curse of dimensionality, as we could use any number of features. But if it takes N samples to cover one dimension at a given density, then it takes N^2 samples to cover two dimensions at the same density, N^3 for three dimensions, and so on.
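The N, N^2, N^3 growth is plain exponentiation, but it is worth seeing the numbers. A tiny illustrative helper (the name is mine), assuming 10 samples per axis are enough in one dimension:

```python
def samples_needed(n_per_axis, d):
    """If n_per_axis samples give adequate coverage along one axis,
    keeping the same sampling density in d dimensions requires
    n_per_axis ** d samples."""
    return n_per_axis ** d

for d in (1, 2, 3, 10):
    print(f"{d} feature(s): {samples_needed(10, d):,} samples")
```

With just 10 features, the same density already demands ten billion samples, which is why adding features without adding data makes the space emptier, not richer.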
Furthermore, overfitting can occur both when estimating relatively few parameters in a high-dimensional space and when estimating many parameters in a low-dimensional space.
If we have few data points, it is always better to work with few features. Dimensionality reduction is used to tackle the curse of dimensionality: dimensionality-reduction techniques transform sparse features into dense ones, and they are also used for feature selection and feature extraction.
So what is dimensionality reduction?
In the example above of classifying people as children or adults, we saw that as we increase the number of dimensions, classification accuracy starts to decrease. We can also see that weight and height are correlated, so one of the two is redundant and we can drop either feature. Dimensionality reduction, then, is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
Some Algorithms used for Dimensionality Reduction
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- Generalised Discriminant Analysis (GDA)
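Of these, PCA is simple enough to sketch directly with NumPy’s eigendecomposition of the covariance matrix. The height/weight numbers below are synthetic, chosen only to mimic the correlated example above; they are not real data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, strongly correlated features: weight tracks height.
height = rng.normal(5.0, 1.0, 200)            # feet
weight = 11 * height + rng.normal(0, 3, 200)  # kg
X = np.column_stack([height, weight])

# PCA by hand: centre the data, eigendecompose the covariance matrix,
# and project onto the eigenvector with the largest eigenvalue.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: ascending eigenvalues
top = eigvecs[:, -1]                    # direction of largest variance
X1 = Xc @ top                           # 2 features reduced to 1

explained = eigvals[-1] / eigvals.sum()
print(f"variance explained by one component: {explained:.2%}")
```

Because the two features are nearly redundant, a single principal component keeps almost all of the variance, which is exactly why dropping one of the correlated height/weight dimensions costs so little.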
Advantages of dimensionality reduction
- It helps in data compression by reducing features
- It makes machine learning algorithms computationally efficient
Disadvantage of dimensionality reduction
- It may lead to some loss of information
- It may reduce model accuracy