Got Pandas?

Practical Data Wrangling with Pandas

Keith E. Maull, PhD

Pandas is ...

  • fast and built on top of NumPy, allowing easy import and conversion of data to / from NumPy

and provides ...

  • flexible, consistent data import and export from a wide array of sources, including SQL, CSV, Excel, etc.
  • tabular / matrix data representation with heterogeneous labeled or unlabeled columns
  • intuitive handling of missing data
  • sophisticated slicing, indexing and subsetting of data
  • support for hierarchical labeling of data
  • support for time series data, including time/date conversion, moving windows, etc.
  • and much more ...
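
A few of the features listed above can be sketched in a handful of lines. This is only an illustration: the data, the column names (station, temp) and the dates are made up for this example and do not come from the tutorial.

```python
import numpy as np
import pandas as pd

# Illustrative data only: six daily readings, one of them missing.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "station": ["a", "a", "a", "b", "b", "b"],
        "temp": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
    },
    index=dates,
)

# Missing data: count the gaps, then fill them with the column mean.
n_missing = int(df["temp"].isna().sum())
filled = df["temp"].fillna(df["temp"].mean())

# Slicing and subsetting: a label-based date slice and a boolean filter.
first_three_days = df.loc["2024-01-01":"2024-01-03"]
station_b = df[df["station"] == "b"]

# Time series: a 2-day moving average over the filled values.
rolling_mean = filled.rolling(window=2).mean()

# Hierarchical labeling: index the rows by (station, date).
by_station = df.set_index("station", append=True).swaplevel()
```

Each step uses the date index built at the top, which is what makes the date slice and the moving window possible without any extra bookkeeping.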

Why Pandas?

Pandas has become known as the go-to library in the Python data science stack.

Pandas is not a replacement for NumPy, but rather a supplement to it.

With its sophisticated indexing, Pandas offers a more powerful way to access and prepare data for analysis in NumPy, and in many cases it becomes a necessary complement to the features NumPy already provides.
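
One way to picture this division of labor is the minimal sketch below: Pandas selects the data by label and condition, and NumPy does the arithmetic on the resulting array. The table and its column names (height_cm, weight_kg) are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements; names and values are made up for illustration.
df = pd.DataFrame(
    {
        "height_cm": [150.0, 160.0, 170.0, 180.0],
        "weight_kg": [50.0, 60.0, 70.0, 80.0],
    },
    index=["ann", "bob", "cat", "dan"],
)

# Pandas selects rows by condition and columns by label ...
tall = df.loc[df["height_cm"] > 155, ["height_cm", "weight_kg"]]

# ... and NumPy works on the underlying array.
values = tall.to_numpy()          # plain 2-D float array
col_means = values.mean(axis=0)   # per-column means

# The result can be wrapped back into a labeled Series.
means = pd.Series(col_means, index=tall.columns)
```

The round trip through to_numpy() and back into a Series shows that the two libraries are meant to be used together rather than in place of each other.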

Pandas brings the fun back into data engineering, and once mastered, it is one of the many tools required for high-quality data analysis in Python.

How Pandas?

Pandas should be installed on Python 3. Older Pandas releases also ran on Python 2, but Python 2 has reached end of life and current Pandas releases no longer support it.

  • If you've installed Anaconda, there is nothing more to do: Pandas is included by default in the Anaconda distribution.

  • Otherwise, you can download Pandas binaries from PyPI, or install it with pip:

pip install pandas
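
A quick way to check that the installation worked is to import the library and print its version from a Python session. This is just a sanity check, and the version string printed will depend on what is installed.

```python
import pandas as pd

# If this import succeeds, Pandas is installed in the current environment.
# The printed version string depends on the installed release.
print(pd.__version__)
```

If the import fails with ModuleNotFoundError, pip most likely installed Pandas into a different Python environment than the one running the session.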

For (a lot) more about installation, please see:

... on to Part I: Pandas Data Structures.