Previously, I have went through heuristically the description of topology using homology groups in this entry. [Ho 2015] This is the essence of algebraic topology. We describe the topology using Betti numbers, the rank of the homolog groups. What they mean can be summarized as: [Bubenik 2015]
“… homology in degree 0 describes the connectedness of the data; homology in degree 1 detects holes and tunnels; homology in degree 2 captures voids; and so on.
Concept of Persistence
However, in computational problems, it is the discrete points that we are dealing with. We formulate their connectedness through constructing complexes, as described by my another blog entry. [Ho 2015] From the Wolfram Demonstration that I quoted previously, connectedness depends on some parameters, such as the radii of points that are considered connected. Whether it is Čech Complex, RP complex, or Alpha complex, the idea is similar. With discrete data, therefore, there is no definite answer how the connectedness among the points are, as it depends on the parameters.
Therefore, the concept of persistence has been developed to tackle this problem. This is the core concept for computational topology. There are a lot of papers about persistence, but the most famous work is done by Zomorodian and Carlsson, who algebraically studied it. [Zomorodian & Carlsson 2005] The idea is that as one increases the radii of points, the complexes change, and so do the homology groups. By varying the radii, we can observe which topology persists.
From the diagram above, we can see that as the radii ε increase, the diagram becomes more connected. To understand the changes of homologies, there are a few ways. In the diagram above, barcodes represent the “life span” of a connected component as ε increases. The Betti numbers of a certain degree (0, 1, or 2 in this example) at a certain value of ε is the number of barcodes at that degree. For example, look at the left most vertical dashed line, , as there are 10 barcodes existing for . Note there are indeed 10 separate connected components. For the second leftmost vertical dashed line, (6 connected components), and (2 holes).
Another way is using the persistence diagram, basically plotting the “birth” and “death” times of all the barcodes above. For an explanation of persistence diagram, please refer to this blog entry by Sebastien Bubeck, [Bubeck 2013] or the paper by Fasy et. al. [Fasy et. al. 2014] Another way to describe persistent topology is the persistence landscape. [Bubenik 2015]
TDA Package in R
There is a package in R that wraps Dionysus and PHAT, called TDA. To install it, simply open an R session, and enter
To load it, simply enter
We know that for a circle, , as it has on connected components, and a hole. Prepare the circle and store it in X by the function circleUnif:
X<- circleUnif(n=1000, r=1) plot(X)
Then we can see a 2-dimensional circle like this:
To calculate the persistent homology, use the function gridDiag:
diag.info<- gridDiag(X=X, FUN=kde, h=0.3, lim=cbind(c(-1, 1), c(-1, 1)), by=0.01, sublevel = FALSE, library = 'PHAT', printProgress=FALSE)
To plot the barcodes and persistence diagram, enter:
par(mfrow=c(2,1)) plot(diag.info$diagram) plot(diag.info$diagram, barcode=TRUE)
In the plots, black refers to degree 0, and red refers to degree 1.
We can play the same game by adding a horizontal line to cut the circle into two halves:
X<- circleUnif(n=1000, r=1) hl<- matrix(c(seq(-1, 1, 0.02), rep(0, 201)), ncol=2) X<- rbind(X, hl) plot(X)
And the barcodes and persistence diagram are:
We can try this with three-dimensional objects like sphere, or torus, but I never finished the calculation in reasonable speeds.
- K.-Y. Ho, “TDA (3) – Homology and Betti Numbers,” WordPress (2015).
- K.-Y. Ho, “TDA (1) – Starting the Journey of Topological Data Analysis (TDA),” WordPress (2015).
- P. Bubenik, “Statistical Topological Data Analysis using Persistence Landscape”, Journal of Machine Learning Research 16, 77-102 (2015). [arXiv:1207.6437]
- K.-Y. Ho, “TDA (2) – Constructing Connectivities,” WordPress (2015).
- “Simplicial Homology of the Alpha Complex,” http://demonstrations.wolfram.com/SimplicialHomologyOfTheAlphaComplex/ | Wolfram Demonstrations Project
- A. Zomorodian, G. Carlsson, “Computing Persistent Homology,” Discrete Comput. Geom. 33, 249-274 (2005). [PDF in Stanford]
- R. Ghrist, “Barcodes: The persistent topology of data,” Bulletin-American Mathematical Society 45, 1-15 (2008). [PDF in AMS]
- L. Wasserman, “Topological Data Analysis,” WordPress (2012).
- S. Bubeck, “Topological Inference,” WordPress (2013).
- B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan, A. Singh, “Confidence Sets for Persistence Diagram,” Annals of Statistics 42, 2301-2339 (2014). [arXiv:1303.7177]
- Ayasdi Core.
- Dionysus (C++ library for persistent homology. It has a Python binding.).
- PHAT (Persistent Homology Algorithm Toolbox).
- CRAN – Package TDA. [article PDF]
- G. Carlsson, “Topology and Data”, Bull. Amer. Math. Soc. 46, 255-308 (2009).
- A. Zomorodian, “Topology for Computing“, Camrbidge (2009).