Almost three years ago, I wrote a blog entry titled Useful Python Packages, which listed the packages that I deemed essential. How has the list changed over the past three years?
First of all, three years ago, most people were still writing Python 2.7, but now there is a trend to switch to Python 3. I admit that I have not started the switch yet, but before long I will have no choice but to do so.
What are some of the essential packages?
- numpy: numerical Python, containing most basic numerical routines such as matrix manipulation, linear algebra, random sampling, numerical integration etc. There is a built-in wrapper for Fortran (f2py) as well. In fact, numpy is so important that some Linux distributions ship it with Python.
- scipy: scientific Python, containing functions useful for scientific computing, such as sparse matrices, numerical solvers for differential equations, advanced linear algebra, special functions etc.
- networkx: package that handles various types of networks
- PuLP: linear programming
- cvxopt: convex optimization
- matplotlib: basic plotting.
- ggplot: the Python counterpart of R's ggplot2, for producing publication-quality plots.
- pandas: data manipulation, working with data frames in Python, and saving/loading various formats such as CSV and Excel
- scikit-learn: machine-learning library in Python, containing classes and functions for supervised and unsupervised learning
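To give a flavor of the first two entries, here is a minimal sketch of numpy's linear algebra routines and scipy's numerical integration. The matrix, the vector, and the integrand are made-up values chosen purely for illustration.

```python
import numpy as np
from scipy import integrate

# numpy: solve the linear system A x = b (A and b are illustrative values)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# scipy: numerically integrate sin(t) from 0 to pi;
# quad returns the estimate and an estimate of the absolute error
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

print(x)     # solution vector of the linear system
print(area)  # approximate value of the integral
```

The same pattern of importing a submodule and calling one function on plain arrays carries over to most of the numerical packages above.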
Deep Learning Frameworks
- TensorFlow: partly because of Google’s marketing effort, TensorFlow is now the industry standard for building deep learning networks, with a rich set of mathematical functions, especially for neural network cells, and GPU support
- Keras: high-level layer routines for deep learning neural networks, with TensorFlow, Theano, or CNTK as the backend
- PyTorch: a rival to TensorFlow
Natural Language Processing
- nltk: natural language processing toolkit for Python, containing a bag-of-words model, tokenizers, stemmers, chunkers, lemmatizers, part-of-speech taggers etc.
- gensim: a fast natural language processing package useful for topic modeling, word embedding, latent semantic indexing etc.
- shorttext: text mining package good for handling short sentences, providing high-level routines for training neural network classifiers or for generating features represented by topic models or autoencoders.
- spacy: industry standard for common natural language processing tools
I can probably list more, but I think I have covered most of them. If you cannot find a package that does what you need, it is probably time for you to write a brand new one.