Previously, I have went through heuristically the description of topology using homology groups in this entry. [Ho 2015] This is the essence of algebraic topology. We describe the topology using Betti numbers, the rank of the homolog groups. What they mean can be summarized as: [Bubenik 2015]

“… homology in degree 0 describes the connectedness of the data; homology in degree 1 detects holes and tunnels; homology in degree 2 captures voids; and so on.

**Concept of Persistence**

However, in computational problems, it is the discrete points that we are dealing with. We formulate their connectedness through constructing complexes, as described by my another blog entry. [Ho 2015] From the Wolfram Demonstration that I quoted previously, connectedness depends on some parameters, such as the radii of points that are considered connected. Whether it is Čech Complex, RP complex, or Alpha complex, the idea is similar. With discrete data, therefore, there is no definite answer how the connectedness among the points are, as it depends on the parameters.

Therefore, the concept of persistence has been developed to tackle this problem. This is the core concept for computational topology. There are a lot of papers about persistence, but the most famous work is done by Zomorodian and Carlsson, who algebraically studied it. [Zomorodian & Carlsson 2005] The idea is that as one increases the radii of points, the complexes change, and so do the homology groups. By varying the radii, we can observe which topology *persists*.

Another way is using the persistence diagram, basically plotting the “birth” and “death” times of all the barcodes above. For an explanation of persistence diagram, please refer to this blog entry by Sebastien Bubeck, [Bubeck 2013] or the paper by Fasy *et. al.* [Fasy *et. al.* 2014] Another way to describe persistent topology is the persistence landscape. [Bubenik 2015]

## TDA Package in R

There are a lot of tools to perform topological data analysis. Ayasdi Core is a famous one. There are open sources C++ libraries such as Dionysus, or PHAT. There is a Python binding for Dionysus too.

There is a package in R that wraps Dionysus and PHAT, called TDA. To install it, simply open an R session, and enter

install.package('TDA')

To load it, simply enter

library(TDA)

We know that for a circle, , as it has on connected components, and a hole. Prepare the circle and store it in X by the function circleUnif:

X<- circleUnif(n=1000, r=1) plot(X)

Then we can see a 2-dimensional circle like this:

To calculate the persistent homology, use the function gridDiag:

diag.info<- gridDiag(X=X, FUN=kde, h=0.3, lim=cbind(c(-1, 1), c(-1, 1)), by=0.01, sublevel = FALSE, library = 'PHAT', printProgress=FALSE)

To plot the barcodes and persistence diagram, enter:

par(mfrow=c(2,1)) plot(diag.info$diagram) plot(diag.info$diagram, barcode=TRUE)

In the plots, black refers to degree 0, and red refers to degree 1.

We can play the same game by adding a horizontal line to cut the circle into two halves:

X<- circleUnif(n=1000, r=1) hl<- matrix(c(seq(-1, 1, 0.02), rep(0, 201)), ncol=2) X<- rbind(X, hl) plot(X)

And the barcodes and persistence diagram are:

We can try this with three-dimensional objects like sphere, or torus, but I never finished the calculation in reasonable speeds.

- K.-Y. Ho, “TDA (3) – Homology and Betti Numbers,” WordPress (2015).
- K.-Y. Ho, “TDA (1) – Starting the Journey of Topological Data Analysis (TDA),” WordPress (2015).
- P. Bubenik, “Statistical Topological Data Analysis using Persistence Landscape”,
*Journal of Machine Learning Research***16**, 77-102 (2015). [arXiv:1207.6437] - K.-Y. Ho, “TDA (2) – Constructing Connectivities,” WordPress (2015).
- “Simplicial Homology of the Alpha Complex,” http://demonstrations.wolfram.com/SimplicialHomologyOfTheAlphaComplex/ | Wolfram Demonstrations Project
- A. Zomorodian, G. Carlsson, “Computing Persistent Homology,”
*Discrete Comput. Geom.***33**, 249-274 (2005). [PDF in Stanford] - R. Ghrist, “Barcodes: The persistent topology of data,”
*Bulletin-American Mathematical Society***45**, 1-15 (2008). [PDF in AMS] - L. Wasserman, “Topological Data Analysis,” WordPress (2012).
- S. Bubeck, “Topological Inference,” WordPress (2013).
- B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan, A. Singh, “Confidence Sets for Persistence Diagram,”
*Annals of Statistics***42**, 2301-2339 (2014). [arXiv:1303.7177] - Ayasdi Core.
- Dionysus (C++ library for persistent homology. It has a Python binding.).
- PHAT (Persistent Homology Algorithm Toolbox).
- CRAN – Package TDA. [article PDF]
- G. Carlsson, “Topology and Data”,
*Bull. Amer. Math. Soc.***46**, 255-308 (2009). - A. Zomorodian, “Topology for Computing“, Camrbidge (2009).

Thank you a lot for this great introduction. I am just starting with TDA and didn’t know where to start.

LikeLike

Thank you for your comment. I am also learning too! I found this more than interesting.

LikeLike

This is such a great walk through, thanks! Truly nicely done. I’m struggling however going from the TDA package vignette examples to applying to my own discrete data, mostly in understanding if there’s a best function to use for distance and how to determine what that function is. Do you have any recommendations on how to learn? Thanks for your thoughts 🙂

LikeLike

Thank you for the comment. TDA package in R is only fine, and it actually makes use of some other package such as dionysus. To learn this, I would really recommend Zomoradian’s book. But for implementation, I am still not sure which open-source codes are better….

LikeLike

Oh sorry I think I was vague. I am learning the TDA package and that has been nice, but in building the persistent homology how does one determine which Diag function to use, and if using gridDiag which function to use for the FUN parameter? For example I could use the ripsDiag or I could use gridDiag and use kde, kernelDist, knnDE, etc. So that is where I am not sure the best approach to getting the “most accurate picture” of the homologies of the point cloud.

LikeLike

sorry for my late reply. to be honest I am not an expert of this R package. However, for a more general theory about which Diag function, it depends on the nature of that particular problem. just like with all these discrete points, there is no general recipe which kind of simplicial complexes to use

LikeLike

Ok thanks for the response!

LikeLike