Previously, I went heuristically through the description of topology using homology groups in this entry. [Ho 2015] This is the essence of algebraic topology. We describe the topology using Betti numbers, the ranks of the homology groups. What they mean can be summarized as: [Bubenik 2015]

“… homology in degree 0 describes the connectedness of the data; homology in degree 1 detects holes and tunnels; homology in degree 2 captures voids; and so on.”

## Concept of Persistence

However, in computational problems, it is discrete points that we are dealing with. We formulate their connectedness by constructing complexes, as described in another blog entry of mine. [Ho 2015] As the Wolfram Demonstration that I quoted previously shows, connectedness depends on some parameters, such as the radius within which points are considered connected. Whether it is the Čech complex, the Vietoris-Rips (VR) complex, or the Alpha complex, the idea is similar. With discrete data, therefore, there is no definite answer to how the points are connected, as it depends on the parameters.
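To make the dependence on the parameter concrete, here is a minimal sketch in base R. It does not build a Čech or Rips complex; it only counts the connected components of the graph that links any two points closer than a distance threshold, using single-linkage clustering as a stand-in. The function name `count_components`, the toy points `pts`, and the use of a distance threshold `eps` (rather than a ball radius) are assumptions made for illustration.

```r
# Count connected components of the graph that links any two points
# whose Euclidean distance is at most eps. Single-linkage clustering cut
# at height eps yields exactly these components.
count_components <- function(X, eps) {
  tree <- hclust(dist(X), method = "single")
  length(unique(cutree(tree, h = eps)))
}

# Toy data (hypothetical): three points on a line, two close, one far away.
pts <- matrix(c(0, 1, 10,   # x coordinates
                0, 0, 0),   # y coordinates
              ncol = 2)

# The number of components drops as the threshold grows.
counts <- sapply(c(0.5, 1.5, 9.5), function(eps) count_components(pts, eps))
print(counts)
```

The same point set therefore has three, two, or one component depending on the threshold, which is exactly the ambiguity described above.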

Therefore, the concept of persistence has been developed to tackle this problem. This is the core concept of computational topology. There are a lot of papers about persistence, but the most famous work is by Zomorodian and Carlsson, who studied it algebraically. [Zomorodian & Carlsson 2005] The idea is that as one increases the radius around the points, the complexes change, and so do the homology groups. By varying the radius, we can observe which topological features persist.

From the diagram above, we can see that as the radius ε increases, the points become more connected. There are a few ways to track how the homology changes. In the diagram above, each bar of the barcode represents the “life span” of a homological feature (a connected component in degree 0, a hole in degree 1, and so on) as ε increases. The Betti number of a given degree (0, 1, or 2 in this example) at a given value of ε is the number of bars of that degree present at that ε. For example, at the leftmost vertical dashed line, $\beta_0=10$, as there are 10 bars for $H_0$. Note that there are indeed 10 separate connected components. At the second leftmost vertical dashed line, $\beta_0=6$ (6 connected components) and $\beta_1=2$ (2 holes).
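Reading Betti numbers off a barcode can be sketched in a few lines of base R. The bars below are made-up numbers, not the ones in the figure, and the column names `dimension`, `Birth`, and `Death`, the helper `betti_at`, and the half-open convention (a bar is alive from its birth up to, but not including, its death) are assumptions for this illustration.

```r
# Betti numbers at scale eps: count, per degree, the bars that are alive,
# i.e. Birth <= eps < Death (half-open convention, assumed here).
betti_at <- function(bars, eps) {
  alive <- bars[, "Birth"] <= eps & eps < bars[, "Death"]
  degrees <- 0:max(bars[, "dimension"])
  sapply(degrees, function(k) sum(alive & bars[, "dimension"] == k))
}

# Hypothetical barcode: three bars in degree 0, one bar in degree 1.
bars <- rbind(c(0, 0.0, 2.0),
              c(0, 0.0, 0.3),
              c(0, 0.0, 0.5),
              c(1, 0.6, 1.2))
colnames(bars) <- c("dimension", "Birth", "Death")

b_small <- betti_at(bars, 0.1)  # beta_0, beta_1 at eps = 0.1
b_large <- betti_at(bars, 0.8)  # beta_0, beta_1 at eps = 0.8
print(b_small)
print(b_large)
```

Drawing a vertical line at ε through the barcode and counting the bars it crosses, degree by degree, is the graphical version of this computation.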

Another way is to use the persistence diagram, which basically plots the “birth” and “death” times of all the bars above. For an explanation of the persistence diagram, please refer to this blog entry by Sebastien Bubeck, [Bubeck 2013] or the paper by Fasy et al. [Fasy et al. 2014] Yet another way to describe persistent topology is the persistence landscape. [Bubenik 2015]
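As a rough sketch of what a persistence diagram shows, the base R snippet below plots birth against death for a made-up set of bars. The numbers, the column names, and the colour choice (black for degree 0, red for degree 1) are assumptions for illustration only. Points far above the diagonal correspond to long-lived features.

```r
# Hypothetical bars: (degree, birth, death).
bars <- rbind(c(0, 0.0, 2.0),
              c(0, 0.0, 0.3),
              c(1, 0.6, 1.2))
colnames(bars) <- c("dimension", "Birth", "Death")

# Persistence (life span) of each feature: death minus birth.
persistence <- bars[, "Death"] - bars[, "Birth"]
print(persistence)

# Persistence diagram: one point per bar; col 1 is black, col 2 is red.
plot(bars[, "Birth"], bars[, "Death"],
     col = bars[, "dimension"] + 1, pch = 19,
     xlim = c(0, 2), ylim = c(0, 2),
     xlab = "Birth", ylab = "Death")
abline(0, 1)  # diagonal: features that die as soon as they are born
```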

## TDA Package in R

There are a lot of tools for topological data analysis. Ayasdi Core is a famous one. There are open-source C++ libraries such as Dionysus and PHAT. There is a Python binding for Dionysus too.

There is a package in R that wraps Dionysus and PHAT, called TDA. To install it, simply open an R session, and enter

install.packages('TDA')


To load it, simply enter

library(TDA)


We know that for a circle, $\beta_0=\beta_1=1$, as it has one connected component and one hole. Prepare the circle and store it in X using the function circleUnif:

X<- circleUnif(n=1000, r=1)
plot(X)
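
For readers who want to see what such a sample looks like without the package, a similar point cloud can be sketched in base R by drawing angles uniformly and mapping them to the circle. The helper name `sample_circle` is hypothetical, and this is only an approximation of what circleUnif produces, not its actual implementation.

```r
# Sample n points uniformly (with respect to arc length) from a circle
# of radius r centred at the origin. Hypothetical helper, base R only.
sample_circle <- function(n, r = 1) {
  theta <- runif(n, min = 0, max = 2 * pi)
  cbind(x = r * cos(theta), y = r * sin(theta))
}

set.seed(1)                        # for reproducibility of the sketch
pts <- sample_circle(n = 1000, r = 1)
radii <- sqrt(rowSums(pts^2))      # every point should sit at distance r
plot(pts, asp = 1)                 # asp = 1 keeps the circle round
```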


Then we can see a 2-dimensional circle like this:

To calculate the persistent homology, use the function gridDiag:

diag.info<- gridDiag(X=X, FUN=kde, h=0.3, lim=cbind(c(-1, 1), c(-1, 1)), by=0.01, sublevel = FALSE, library = 'PHAT', printProgress=FALSE)
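
To get a feeling for what is being filtered here, the sketch below evaluates a Gaussian kernel density estimate on a grid in base R, assuming that is what `kde` with bandwidth `h` computes. The helper `gauss_kde`, the coarser grid step of 0.1, and the evenly spaced points on the circle are assumptions for illustration; this is not the package's own code. The density is high along the ring and low at the centre, so the ring and its hole show up as features of the upper level sets (`sublevel = FALSE`).

```r
# Gaussian kernel density estimate with bandwidth h, evaluated at each
# row of `grid`. Hypothetical helper written for this sketch.
gauss_kde <- function(X, grid, h) {
  apply(grid, 1, function(g) {
    d2 <- (X[, 1] - g[1])^2 + (X[, 2] - g[2])^2
    mean(exp(-d2 / (2 * h^2))) / (2 * pi * h^2)
  })
}

# 200 evenly spaced points on the unit circle (deterministic stand-in
# for a random sample).
theta <- seq(0, 2 * pi, length.out = 201)[-1]
X <- cbind(cos(theta), sin(theta))

# Grid over [-1, 1] x [-1, 1], analogous to the lim and by arguments.
axis_vals <- seq(-1, 1, by = 0.1)
grid <- as.matrix(expand.grid(x = axis_vals, y = axis_vals))
dens <- gauss_kde(X, grid, h = 0.3)

# Compare the density at the centre with the density on the ring.
centre_vs_ring <- gauss_kde(X, rbind(c(0, 0), c(1, 0)), h = 0.3)
print(centre_vs_ring)
```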


To plot the barcodes and persistence diagram, enter:

par(mfrow=c(2,1))
plot(diag.info$diagram)
plot(diag.info$diagram, barcode=TRUE)


In the plots, black refers to degree 0, and red refers to degree 1.

We can play the same game by adding a horizontal line to cut the circle into two halves:

X<- circleUnif(n=1000, r=1)
hl<- matrix(c(seq(-1, 1, 0.02), rep(0, 101)), ncol=2)  # 101 x-values paired with y = 0
X<- rbind(X, hl)
plot(X)


And the barcodes and persistence diagram are:

We can try this with three-dimensional objects such as a sphere or a torus, but I never managed to finish the calculation in a reasonable time.

## 7 thoughts on “Persistence”

1. Thank you a lot for this great introduction. I am just starting with TDA and didn’t know where to start.


1. Thank you for your comment. I am also learning too! I found this more than interesting.


2. Kay says:

This is such a great walk through, thanks! Truly nicely done. I’m struggling however going from the TDA package vignette examples to applying to my own discrete data, mostly in understanding if there’s a best function to use for distance and how to determine what that function is. Do you have any recommendations on how to learn? Thanks for your thoughts 🙂


1. Thank you for the comment. The TDA package in R is only fine, and it actually makes use of other libraries such as Dionysus. To learn this, I would really recommend Zomorodian’s book. But for implementation, I am still not sure which open-source codes are better….


1. Kay says:

Oh sorry I think I was vague. I am learning the TDA package and that has been nice, but in building the persistent homology how does one determine which Diag function to use, and if using gridDiag which function to use for the FUN parameter? For example I could use the ripsDiag or I could use gridDiag and use kde, kernelDist, knnDE, etc. So that is where I am not sure the best approach to getting the “most accurate picture” of the homologies of the point cloud.


2. Sorry for my late reply. To be honest, I am not an expert in this R package. However, as for the more general question of which Diag function to use, it depends on the nature of the particular problem. As with any set of discrete points, there is no general recipe for which kind of simplicial complex to use.


1. Kay says:

Ok thanks for the response!
