## Introduction to Python for Data Sciences | Franck Iutzeler |

\n", "\n", "

\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1- Packages\n", "\n", "\n", "\n", "Python has a large standard library, commonly cited as one of Python's greatest strengths, providing tools suited to many tasks. As of May 2017, the Python Package Index (PyPI), the official repository of third-party software for Python, contained over 107,000 packages.\n", "\n", "A *package* is a collection of *modules*, i.e. groups of functions, classes, constants, types, etc." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Built-in modules\n", "\n", "To use a module, you have to *import* it with the `import` statement." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import math" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now use its contents in your code by prefixing them with the module name. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.0\n" ] } ], "source": [ "x = math.cos(2 * math.pi)\n", "\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To explore the functions and other contents of a module/library:\n", " * Use the web documentation (e.g. 
the Python 3 documentation for the `math` library)\n", " * Use the built-in `help` function" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on built-in module math:\n", "\n", "NAME\n", " math\n", "\n", "DESCRIPTION\n", " This module provides access to the mathematical functions\n", " defined by the C standard.\n", "\n", "FUNCTIONS\n", " acos(x, /)\n", " Return the arc cosine (measured in radians) of x.\n", " \n", " acosh(x, /)\n", " Return the inverse hyperbolic cosine of x.\n", " \n", " asin(x, /)\n", " Return the arc sine (measured in radians) of x.\n", " \n", " asinh(x, /)\n", " Return the inverse hyperbolic sine of x.\n", " \n", " atan(x, /)\n", " Return the arc tangent (measured in radians) of x.\n", " \n", " atan2(y, x, /)\n", " Return the arc tangent (measured in radians) of y/x.\n", " \n", " Unlike atan(y/x), the signs of both x and y are considered.\n", " \n", " atanh(x, /)\n", " Return the inverse hyperbolic tangent of x.\n", " \n", " ceil(x, /)\n", " Return the ceiling of x as an Integral.\n", " \n", " This is the smallest integer >= x.\n", " \n", " comb(n, k, /)\n", " Number of ways to choose k items from n items without repetition and without order.\n", " \n", " Evaluates to n! / (k! * (n - k)!) 
when k <= n and evaluates\n", " to zero when k > n.\n", " \n", " Also called the binomial coefficient because it is equivalent\n", " to the coefficient of k-th term in polynomial expansion of the\n", " expression (1 + x)**n.\n", " \n", " Raises TypeError if either of the arguments are not integers.\n", " Raises ValueError if either of the arguments are negative.\n", " \n", " copysign(x, y, /)\n", " Return a float with the magnitude (absolute value) of x but the sign of y.\n", " \n", " On platforms that support signed zeros, copysign(1.0, -0.0)\n", " returns -1.0.\n", " \n", " cos(x, /)\n", " Return the cosine of x (measured in radians).\n", " \n", " cosh(x, /)\n", " Return the hyperbolic cosine of x.\n", " \n", " degrees(x, /)\n", " Convert angle x from radians to degrees.\n", " \n", " dist(p, q, /)\n", " Return the Euclidean distance between two points p and q.\n", " \n", " The points should be specified as sequences (or iterables) of\n", " coordinates. Both inputs must have the same dimension.\n", " \n", " Roughly equivalent to:\n", " sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))\n", " \n", " erf(x, /)\n", " Error function at x.\n", " \n", " erfc(x, /)\n", " Complementary error function at x.\n", " \n", " exp(x, /)\n", " Return e raised to the power of x.\n", " \n", " expm1(x, /)\n", " Return exp(x)-1.\n", " \n", " This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.\n", " \n", " fabs(x, /)\n", " Return the absolute value of the float x.\n", " \n", " factorial(x, /)\n", " Find x!.\n", " \n", " Raise a ValueError if x is negative or non-integral.\n", " \n", " floor(x, /)\n", " Return the floor of x as an Integral.\n", " \n", " This is the largest integer <= x.\n", " \n", " fmod(x, y, /)\n", " Return fmod(x, y), according to platform C.\n", " \n", " x % y may differ.\n", " \n", " frexp(x, /)\n", " Return the mantissa and exponent of x, as pair (m, e).\n", " \n", " m is a float and e is an int, such that x = m * 
2.**e.\n", " If x is 0, m and e are both 0. Else 0.5 <= abs(m) < 1.0.\n", " \n", " fsum(seq, /)\n", " Return an accurate floating point sum of values in the iterable seq.\n", " \n", " Assumes IEEE-754 floating point arithmetic.\n", " \n", " gamma(x, /)\n", " Gamma function at x.\n", " \n", " gcd(x, y, /)\n", " greatest common divisor of x and y\n", " \n", " hypot(...)\n", " hypot(*coordinates) -> value\n", " \n", " Multidimensional Euclidean distance from the origin to a point.\n", " \n", " Roughly equivalent to:\n", " sqrt(sum(x**2 for x in coordinates))\n", " \n", " For a two dimensional point (x, y), gives the hypotenuse\n", " using the Pythagorean theorem: sqrt(x*x + y*y).\n", " \n", " For example, the hypotenuse of a 3/4/5 right triangle is:\n", " \n", " >>> hypot(3.0, 4.0)\n", " 5.0\n", " \n", " isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0)\n", " Determine whether two floating point numbers are close in value.\n", " \n", " rel_tol\n", " maximum difference for being considered \"close\", relative to the\n", " magnitude of the input values\n", " abs_tol\n", " maximum difference for being considered \"close\", regardless of the\n", " magnitude of the input values\n", " \n", " Return True if a is close in value to b, and False otherwise.\n", " \n", " For the values to be considered close, the difference between them\n", " must be smaller than at least one of the tolerances.\n", " \n", " -inf, inf and NaN behave similarly to the IEEE 754 Standard. That\n", " is, NaN is not close to anything, even itself. 
inf and -inf are\n", " only close to themselves.\n", " \n", " isfinite(x, /)\n", " Return True if x is neither an infinity nor a NaN, and False otherwise.\n", " \n", " isinf(x, /)\n", " Return True if x is a positive or negative infinity, and False otherwise.\n", " \n", " isnan(x, /)\n", " Return True if x is a NaN (not a number), and False otherwise.\n", " \n", " isqrt(n, /)\n", " Return the integer part of the square root of the input.\n", " \n", " ldexp(x, i, /)\n", " Return x * (2**i).\n", " \n", " This is essentially the inverse of frexp().\n", " \n", " lgamma(x, /)\n", " Natural logarithm of absolute value of Gamma function at x.\n", " \n", " log(...)\n", " log(x, [base=math.e])\n", " Return the logarithm of x to the given base.\n", " \n", " If the base not specified, returns the natural logarithm (base e) of x.\n", " \n", " log10(x, /)\n", " Return the base 10 logarithm of x.\n", " \n", " log1p(x, /)\n", " Return the natural logarithm of 1+x (base e).\n", " \n", " The result is computed in a way which is accurate for x near zero.\n", " \n", " log2(x, /)\n", " Return the base 2 logarithm of x.\n", " \n", " modf(x, /)\n", " Return the fractional and integer parts of x.\n", " \n", " Both results carry the sign of x and are floats.\n", " \n", " perm(n, k=None, /)\n", " Number of ways to choose k items from n items without repetition and with order.\n", " \n", " Evaluates to n! / (n - k)! 
when k <= n and evaluates\n", " to zero when k > n.\n", " \n", " If k is not specified or is None, then k defaults to n\n", " and the function returns n!.\n", " \n", " Raises TypeError if either of the arguments are not integers.\n", " Raises ValueError if either of the arguments are negative.\n", " \n", " pow(x, y, /)\n", " Return x**y (x to the power of y).\n", " \n", " prod(iterable, /, *, start=1)\n", " Calculate the product of all the elements in the input iterable.\n", " \n", " The default start value for the product is 1.\n", " \n", " When the iterable is empty, return the start value. This function is\n", " intended specifically for use with numeric values and may reject\n", " non-numeric types.\n", " \n", " radians(x, /)\n", " Convert angle x from degrees to radians.\n", " \n", " remainder(x, y, /)\n", " Difference between x and the closest integer multiple of y.\n", " \n", " Return x - n*y where n*y is the closest integer multiple of y.\n", " In the case where x is exactly halfway between two multiples of\n", " y, the nearest even value of n is used. 
The result is always exact.\n", " \n", " sin(x, /)\n", " Return the sine of x (measured in radians).\n", " \n", " sinh(x, /)\n", " Return the hyperbolic sine of x.\n", " \n", " sqrt(x, /)\n", " Return the square root of x.\n", " \n", " tan(x, /)\n", " Return the tangent of x (measured in radians).\n", " \n", " tanh(x, /)\n", " Return the hyperbolic tangent of x.\n", " \n", " trunc(x, /)\n", " Truncates the Real x to the nearest Integral toward 0.\n", " \n", " Uses the __trunc__ magic method.\n", "\n", "DATA\n", " e = 2.718281828459045\n", " inf = inf\n", " nan = nan\n", " pi = 3.141592653589793\n", " tau = 6.283185307179586\n", "\n", "FILE\n", " (built-in)\n", "\n", "\n" ] } ], "source": [ "help(math)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on built-in function sqrt in module math:\n", "\n", "sqrt(x, /)\n", " Return the square root of x.\n", "\n" ] } ], "source": [ "help(math.sqrt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the full name as a prefix can make code verbose and hard to read (e.g. `scipy.optimize.minimize`), so Python provides shorter ways to import:\n", "* `import name as nickname`: the prefix to use is now `nickname`" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.141592653589793\n" ] } ], "source": [ "import math as m\n", "\n", "print(m.pi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* `from name import function1, constant1`: `function1` and `constant1` can now be used directly, without a prefix. You can even import everything with `from name import *`, but this is risky because the imported names may conflict with or override existing ones; it is therefore not advised, except for user-generated modules." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.0\n" ] } ], "source": [ "from math import e, log\n", "\n", "print(log(e**4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Installing packages\n", "\n", "\n", "Python can be extended with a large number of packages (i.e. collections of functions and small programs) provided by the community. To install a package `SomePackage`, the recommended way is now to use `pip`:\n", "\n", "`python -m pip install SomePackage` or simply `pip install SomePackage`\n", "\n", "\n", "See https://docs.python.org/3.9/installing/index.html for details on installing packages (and if you do not have `pip` installed, see https://packaging.python.org/tutorials/installing-packages/#requirements-for-installing-packages )\n", "\n", "\n", "*Warning:* this is the preferred way; however:\n", "* If you are using Anaconda, it is preferable to install packages directly through the Anaconda interface; see https://docs.anaconda.com/anaconda/navigator/tutorials/manage-packages/ \n", "* If you don't have administrator rights on the machine (e.g. at university), use `pip install --user SomePackage` to install the package for your user account only." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once installed, you can import the packages as above. 
" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import scipy" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on package scipy:\n", "\n", "NAME\n", " scipy\n", "\n", "DESCRIPTION\n", " SciPy: A scientific computing package for Python\n", " ================================================\n", " \n", " Documentation is available in the docstrings and\n", " online at https://docs.scipy.org.\n", " \n", " Contents\n", " --------\n", " SciPy imports all the functions from the NumPy namespace, and in\n", " addition provides:\n", " \n", " Subpackages\n", " -----------\n", " Using any of these subpackages requires an explicit import. For example,\n", " ``import scipy.cluster``.\n", " \n", " ::\n", " \n", " cluster --- Vector Quantization / Kmeans\n", " fft --- Discrete Fourier transforms\n", " fftpack --- Legacy discrete Fourier transforms\n", " integrate --- Integration routines\n", " interpolate --- Interpolation Tools\n", " io --- Data input and output\n", " linalg --- Linear algebra routines\n", " linalg.blas --- Wrappers to BLAS library\n", " linalg.lapack --- Wrappers to LAPACK library\n", " misc --- Various utilities that don't have\n", " another home.\n", " ndimage --- N-D image package\n", " odr --- Orthogonal Distance Regression\n", " optimize --- Optimization Tools\n", " signal --- Signal Processing Tools\n", " signal.windows --- Window functions\n", " sparse --- Sparse Matrices\n", " sparse.linalg --- Sparse Linear Algebra\n", " sparse.linalg.dsolve --- Linear Solvers\n", " sparse.linalg.dsolve.umfpack --- :Interface to the UMFPACK library:\n", " Conjugate Gradient Method (LOBPCG)\n", " sparse.linalg.eigen --- Sparse Eigenvalue Solvers\n", " sparse.linalg.eigen.lobpcg --- Locally Optimal Block Preconditioned\n", " Conjugate Gradient Method (LOBPCG)\n", " spatial --- Spatial data structures and algorithms\n", " special --- 
Special functions\n", " stats --- Statistical Functions\n", " \n", " Utility tools\n", " -------------\n", " ::\n", " \n", " test --- Run scipy unittests\n", " show_config --- Show scipy build configuration\n", " show_numpy_config --- Show numpy build configuration\n", " __version__ --- SciPy version string\n", " __numpy_version__ --- Numpy version string\n", "\n", "PACKAGE CONTENTS\n", " __config__\n", " _build_utils (package)\n", " _distributor_init\n", " _lib (package)\n", " cluster (package)\n", " conftest\n", " constants (package)\n", " fft (package)\n", " fftpack (package)\n", " integrate (package)\n", " interpolate (package)\n", " io (package)\n", " linalg (package)\n", " misc (package)\n", " ndimage (package)\n", " odr (package)\n", " optimize (package)\n", " setup\n", " signal (package)\n", " sparse (package)\n", " spatial (package)\n", " special (package)\n", " stats (package)\n", " version\n", "\n", "CLASSES\n", " builtins.DeprecationWarning(builtins.Warning)\n", " numpy.ModuleDeprecationWarning\n", " builtins.IndexError(builtins.LookupError)\n", " numpy.AxisError(builtins.ValueError, builtins.IndexError)\n", " builtins.RuntimeError(builtins.Exception)\n", " numpy.TooHardError\n", " builtins.RuntimeWarning(builtins.Warning)\n", " numpy.ComplexWarning\n", " builtins.UserWarning(builtins.Warning)\n", " numpy.RankWarning\n", " numpy.VisibleDeprecationWarning\n", " builtins.ValueError(builtins.Exception)\n", " numpy.AxisError(builtins.ValueError, builtins.IndexError)\n", " builtins.bytes(builtins.object)\n", " numpy.bytes_(builtins.bytes, numpy.character)\n", " builtins.object\n", " numpy.DataSource\n", " numpy.MachAr\n", " numpy.broadcast\n", " numpy.busdaycalendar\n", " numpy.dtype\n", " numpy.finfo\n", " numpy.flatiter\n", " numpy.format_parser\n", " numpy.generic\n", " numpy.bool_\n", " numpy.datetime64\n", " numpy.flexible\n", " numpy.character\n", " numpy.bytes_(builtins.bytes, numpy.character)\n", " numpy.str_(builtins.str, numpy.character)\n", " 
numpy.void\n", " numpy.record\n", " numpy.number\n", " numpy.inexact\n", " numpy.complexfloating\n", " numpy.complex128(numpy.complexfloating, builtins.complex)\n", " numpy.complex256\n", " numpy.complex64\n", " numpy.floating\n", " numpy.float128\n", " numpy.float16\n", " numpy.float32\n", " numpy.float64(numpy.floating, builtins.float)\n", " numpy.integer\n", " numpy.signedinteger\n", " numpy.int16\n", " numpy.int32\n", " numpy.int64\n", " numpy.int8\n", " numpy.longlong\n", " numpy.timedelta64\n", " numpy.unsignedinteger\n", " numpy.uint16\n", " numpy.uint32\n", " numpy.uint64\n", " numpy.uint8\n", " numpy.ulonglong\n", " numpy.object_\n", " numpy.iinfo\n", " numpy.ndarray\n", " numpy.chararray\n", " numpy.matrix\n", " numpy.memmap\n", " numpy.recarray\n", " numpy.ndenumerate\n", " numpy.ndindex\n", " numpy.nditer\n", " numpy.poly1d\n", " numpy.ufunc\n", " numpy.vectorize\n", " builtins.str(builtins.object)\n", " numpy.str_(builtins.str, numpy.character)\n", " contextlib.ContextDecorator(builtins.object)\n", " numpy.errstate\n", " \n", " class AxisError(builtins.ValueError, builtins.IndexError)\n", " | AxisError(axis, ndim=None, msg_prefix=None)\n", " | \n", " | Axis supplied was invalid.\n", " | \n", " | Method resolution order:\n", " | AxisError\n", " | builtins.ValueError\n", " | builtins.IndexError\n", " | builtins.LookupError\n", " | builtins.Exception\n", " | builtins.BaseException\n", " | builtins.object\n", " | \n", " | Methods defined here:\n", " | \n", " | __init__(self, axis, ndim=None, msg_prefix=None)\n", " | Initialize self. 
See help(type(self)) for accurate signature.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors defined here:\n", " | \n", " | __weakref__\n", " | list of weak references to the object (if defined)\n", " | \n", " | ----------------------------------------------------------------------\n", " | Static methods inherited from builtins.ValueError:\n", " | \n", " | __new__(*args, **kwargs) from builtins.type\n", " | Create and return a new object. See help(type) for accurate signature.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Methods inherited from builtins.BaseException:\n", " | \n", " | __delattr__(self, name, /)\n", " | Implement delattr(self, name).\n", " | \n", " | __getattribute__(self, name, /)\n", " | Return getattr(self, name).\n", " | \n", " | __reduce__(...)\n", " | Helper for pickle.\n", " | \n", " | __repr__(self, /)\n", " | Return repr(self).\n", " | \n", " | __setattr__(self, name, value, /)\n", " | Implement setattr(self, name, value).\n", " | \n", " | __setstate__(...)\n", " | \n", " | __str__(self, /)\n", " | Return str(self).\n", " | \n", " | with_traceback(...)\n", " | Exception.with_traceback(tb) --\n", " | set self.__traceback__ to tb and return self.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors inherited from builtins.BaseException:\n", " | \n", " | __cause__\n", " | exception cause\n", " | \n", " | __context__\n", " | exception context\n", " | \n", " | __dict__\n", " | \n", " | __suppress_context__\n", " | \n", " | __traceback__\n", " | \n", " | args\n", " \n", " class ComplexWarning(builtins.RuntimeWarning)\n", " | The warning raised when casting a complex dtype to a real dtype.\n", " | \n", " | As implemented, casting a complex number to a real discards its imaginary\n", " | part, but this behavior may not be what the user actually wants.\n", " | \n", " | Method 
resolution order:\n", " | ComplexWarning\n", " | builtins.RuntimeWarning\n", " | builtins.Warning\n", " | builtins.Exception\n", " | builtins.BaseException\n", " | builtins.object\n", " | \n", " | Data descriptors defined here:\n", " | \n", " | __weakref__\n", " | list of weak references to the object (if defined)\n", " | \n", " | ----------------------------------------------------------------------\n", " | Methods inherited from builtins.RuntimeWarning:\n", " | \n", " | __init__(self, /, *args, **kwargs)\n", " | Initialize self. See help(type(self)) for accurate signature.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Static methods inherited from builtins.RuntimeWarning:\n", " | \n", " | __new__(*args, **kwargs) from builtins.type\n", " | Create and return a new object. See help(type) for accurate signature.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Methods inherited from builtins.BaseException:\n", " | \n", " | __delattr__(self, name, /)\n", " | Implement delattr(self, name).\n", " | \n", " | __getattribute__(self, name, /)\n", " | Return getattr(self, name).\n", " | \n", " | __reduce__(...)\n", " | Helper for pickle.\n", " | \n", " | __repr__(self, /)\n", " | Return repr(self).\n", " | \n", " | __setattr__(self, name, value, /)\n", " | Implement setattr(self, name, value).\n", " | \n", " | __setstate__(...)\n", " | \n", " | __str__(self, /)\n", " | Return str(self).\n", " | \n", " | with_traceback(...)\n", " | Exception.with_traceback(tb) --\n", " | set self.__traceback__ to tb and return self.\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors inherited from builtins.BaseException:\n", " | \n", " | __cause__\n", " | exception cause\n", " | \n", " | __context__\n", " | exception context\n", " | \n", " | __dict__\n", " | \n", " | __suppress_context__\n", " | \n", " | __traceback__\n", " | \n", " | 
args\n", " \n", " class DataSource(builtins.object)\n", " | DataSource(destpath='.')\n", " | \n", " | DataSource(destpath='.')\n", " | \n", " | A generic data source file (file, http, ftp, ...).\n", " | \n", " | DataSources can be local files or remote files/URLs. The files may\n", " | also be compressed or uncompressed. DataSource hides some of the\n", " | low-level details of downloading the file, allowing you to simply pass\n", " | in a valid file path (or URL) and obtain a file object.\n", " | \n", " | Parameters\n", " | ----------\n", " | destpath : str or None, optional\n", " | Path to the directory where the source file gets downloaded to for\n", " | use. If `destpath` is None, a temporary directory will be created.\n", " | The default path is the current directory.\n", " | \n", " | Notes\n", " | -----\n", " | URLs require a scheme string (``http://``) to be used, without it they\n", " | will fail::\n", " | \n", " | >>> repos = np.DataSource()\n", " | >>> repos.exists('www.google.com/index.html')\n", " | False\n", " | >>> repos.exists('http://www.google.com/index.html')\n", " | True\n", " | \n", " | Temporary directories are deleted when the DataSource is deleted.\n", " | \n", " | Examples\n", " | --------\n", " | ::\n", " | \n", " | >>> ds = np.DataSource('/home/guido')\n", " | >>> urlname = 'http://www.google.com/'\n", " | >>> gfile = ds.open('http://www.google.com/')\n", " | >>> ds.abspath(urlname)\n", " | '/home/guido/www.google.com/index.html'\n", " | \n", " | >>> ds = np.DataSource(None) # use with temporary file\n", " | >>> ds.open('/home/guido/foobar.txt')\n", " |