What is a Python Module? An Introduction to Python's Modular Programming
A Python module is a file containing Python code. It can contain docstrings, variables, constants, classes, objects, statements, and functions.
Modules are accessed using the import statement. A module is loaded only once per interpreter session, regardless of the number of times it is imported.
Syntax
import module_name
from module_name import name
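As a minimal sketch of the two import forms and of the load-once behavior, the following uses the standard math module. The variable names (same_function, same_module, math_again) are illustrative only.

```python
import sys
import math              # form 1: bind the module name
from math import sqrt    # form 2: bind a single name from the module

# Both forms refer to the same function object.
same_function = math.sqrt is sqrt

# A repeated import does not reload the module; Python reuses the
# entry cached in sys.modules.
import math as math_again
same_module = (math_again is math) and (sys.modules["math"] is math)

print(same_function, same_module)
```

With the first form, names are reached through the module (math.sqrt). With the second form, the imported name is used directly (sqrt).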
Mathematical functions
Mathematical operations can be performed by importing the math module. Some of its functions are:
(i) sqrt(x) : Returns the square root of x.
(ii) pow(x, y) : Returns x raised to the power y, as a float.
(iii) fabs(x) : Returns the absolute value of x, as a float.
(iv) ceil(x) : Returns the smallest integer greater than or equal to x.
(v) floor(x) : Returns the largest integer less than or equal to x.
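The functions listed above can be tried with a short sketch like the following. The input values and variable names are arbitrary and chosen only for illustration.

```python
import math

root = math.sqrt(16)        # square root: 4.0
power = math.pow(2, 3)      # 2 raised to the power 3: 8.0 (a float)
absolute = math.fabs(-7)    # absolute value: 7.0 (a float)
up = math.ceil(4.2)         # smallest integer >= 4.2: 5
down = math.floor(-4.2)     # largest integer <= -4.2: -5

print(root, power, absolute, up, down)
```

Note that floor() rounds toward negative infinity, so floor(-4.2) is -5 rather than -4.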
Random Functions
Python offers the random module, which can generate random numbers. Some of its functions are as follows:
(i) random() : Returns a random float greater than or equal to 0 and less than 1.
(ii) choice(seq) : Returns one randomly selected element from a non-empty sequence, such as a list or tuple.
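A minimal sketch of both functions follows. The list of colors is a made-up example, and the seed is set only so that repeated runs give the same results.

```python
import random

random.seed(42)  # optional: fixed seed so that runs are repeatable

value = random.random()            # float in the range [0.0, 1.0)
colors = ["red", "green", "blue"]  # illustrative sequence
pick = random.choice(colors)       # one element chosen at random

print(value, pick)
```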
Statistics Module
To access Python’s statistics functions, we need to import the functions from the statistics module. Some statistics functions are as follows:
(i) mean() : Returns the arithmetic mean of the data, which can be a sequence or an iterable.
(ii) median() : Returns the middle value of numeric data. When the number of values is even, it returns the mean of the two middle values.
(iii) mode() : Returns the most frequently occurring value in the data.
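The three functions can be compared on one small data set, as in this sketch. The scores list is invented for illustration.

```python
import statistics

scores = [85, 93, 45, 89, 85]  # illustrative data

average = statistics.mean(scores)    # (85 + 93 + 45 + 89 + 85) / 5 = 79.4
middle = statistics.median(scores)   # sorted: 45, 85, 85, 89, 93 -> 85
common = statistics.mode(scores)     # 85 appears twice -> 85

print(average, middle, common)
```

The single low score pulls the mean below the median, which is one reason to look at both values.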
Popular Python libraries used in data science
Scientific Computing and Statistics
NumPy (Numerical Python)—Python does not have a built-in array data structure. It uses lists, which are convenient but relatively slow. NumPy provides the high-performance ndarray data structure to represent lists and matrices, and it also provides routines for processing such data structures.
SciPy (Scientific Python)—Built on NumPy, SciPy adds routines for scientific processing, such as integrals, differential equations, additional matrix processing and more. scipy.org controls SciPy and NumPy.
StatsModels—Provides support for estimations of statistical models, statistical tests and statistical data exploration.
Data Manipulation and Analysis
Pandas—An extremely popular library for data manipulations. Pandas makes abundant use of NumPy’s ndarray. Its two key data structures are Series (one dimensional) and DataFrames (two dimensional).
Visualization
Matplotlib—A highly customizable visualization and plotting library. Supported plots include regular, scatter, bar, contour, pie, quiver, grid, polar axis, 3D and text.
Seaborn—A higher-level visualization library built on Matplotlib. Seaborn adds a nicer look-and-feel, additional visualizations and enables you to create visualizations with less code.
Machine Learning, Deep Learning and Reinforcement Learning
scikit-learn—A widely used machine-learning library. Machine learning is a subset of AI. Deep learning is a subset of machine learning that focuses on neural networks.
Keras—One of the easiest to use deep-learning libraries. Keras runs on top of TensorFlow (Google), CNTK (Microsoft’s cognitive toolkit for deep learning) or Theano (Université de Montréal).
TensorFlow—From Google, this is the most widely used deep learning library. TensorFlow works with GPUs (graphics processing units) or Google’s custom TPUs (Tensor processing units) for performance. TensorFlow is important in AI and big data analytics—where processing demands are huge. You’ll use the version of Keras that’s built into TensorFlow.
OpenAI Gym—A library and environment for developing, testing and comparing reinforcement-learning algorithms.
Natural Language Processing (NLP)
NLTK (Natural Language Toolkit)—Used for natural language processing (NLP) tasks.
TextBlob—An object-oriented NLP text-processing library built on the NLTK and pattern NLP libraries. TextBlob simplifies many NLP tasks.
Gensim—Similar to NLTK. Commonly used to build an index for a collection of documents, then determine how similar another document is to each of those in the index.