Introduction to Machine Learning with IPython and scikit-learn

Workshop description

The purpose of this workshop is to get first hands-on experience in building predictive models using the PyData stack (NumPy, scikit-learn, IPython and maybe a bit of pandas if we have time).

Scikit-learn is a versatile machine learning library for Python that blends well with the NumPy and SciPy ecosystem. It is used by a growing user base of academic researchers as well as data scientists and engineers in the tech industry.
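To give a flavour of what "building a predictive model" with scikit-learn looks like, here is a minimal sketch of the usual estimator workflow: fit on training data, then predict on held-out data. It assumes NumPy and scikit-learn are installed. The iris dataset, the k-nearest-neighbors classifier and the 120/30 split are illustrative choices for this example, not necessarily what the workshop itself uses. The split is done by hand with NumPy so it does not depend on where `train_test_split` lives in a given scikit-learn version.

```python
# Minimal sketch of the scikit-learn fit/predict workflow (illustrative only).
# Assumes NumPy and scikit-learn are installed.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Load the classic iris dataset: 150 samples, 4 numeric features, 3 classes.
iris = load_iris()
X, y = iris.data, iris.target

# Shuffle the sample indices with a fixed seed, then hold out 30 samples for testing.
rng = np.random.RandomState(0)
indices = rng.permutation(len(X))
train_idx, test_idx = indices[:120], indices[120:]

# Fit a k-nearest-neighbors classifier on the training split only.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X[train_idx], y[train_idx])

# Predict labels for the unseen samples and measure the fraction predicted correctly.
predicted = model.predict(X[test_idx])
accuracy = model.score(X[test_idx], y[test_idx])
print("test accuracy: %.2f" % accuracy)
```

Every scikit-learn estimator follows the same `fit` / `predict` / `score` pattern, so swapping `KNeighborsClassifier` for another classifier changes only one line.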

IPython, with its notebook interface, is an interactive programming environment that is particularly well suited for data exploration, modelling, and sharing analysis results, notably via nbviewer.ipython.org.

Target audience

Anyone who wants to learn more about how to build predictive models from data in Python. Some previous programming experience is necessary and basic Python knowledge is recommended. People new to Python programming should work through a tutorial such as http://docs.python.org/2/tutorial/ before the workshop.

Prerequisites

Bring your own laptop and make sure to install the following open source packages:

- Python 2.7 (Python 3.3 should work as well but is not yet as mainstream as 2.7)
- NumPy 1.7.1+
- SciPy 0.12+
- scikit-learn 0.14.1+
- IPython 1.1+
- pandas 0.12+

Some of these libraries can be hard to build and install with generic package installers such as pip or easy_install. It is therefore recommended to use the Anaconda binary distribution, which installs all of these projects at once with a single free installer (no registration is required, but the download is about 250 MB, so please download it in advance):

http://continuum.io/downloads

Choose the installer that matches your operating system.

Once Anaconda is installed, check that the “python” command on your PATH can load all the required libraries in the correct versions by typing the following in a command prompt:

python --version
python -c "import numpy; print(numpy.__version__)"
python -c "import scipy; print(scipy.__version__)"
python -c "import pandas; print(pandas.__version__)"
python -c "import sklearn; print(sklearn.__version__)"
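The one-liners above print each version, but you still have to compare them with the Prerequisites list by eye. As an optional alternative, the following sketch does the comparison automatically; save it as a file and run it with the same "python" command. The minimum versions are copied from the Prerequisites list; the helper names parse_version() and check() are hypothetical, written for this example only. The version parsing is deliberately simple (leading integers only) and is an assumption that suits the version strings of these packages, not a general-purpose version comparator.

```python
# Sketch: verify that each workshop package imports and meets the minimum
# version from the Prerequisites list. parse_version() and check() are
# hypothetical helpers written for this example; works on Python 2.7 and 3.
from __future__ import print_function

import importlib
import re
import sys

# (import name, minimum version) pairs; scikit-learn is imported as "sklearn".
REQUIRED = [
    ("numpy", (1, 7, 1)),
    ("scipy", (0, 12)),
    ("sklearn", (0, 14, 1)),
    ("IPython", (1, 1)),
    ("pandas", (0, 12)),
]


def parse_version(text):
    """Turn '1.7.1' or '0.14.1rc1' into a tuple of leading integers."""
    numbers = []
    for part in text.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        numbers.append(int(match.group()))
    return tuple(numbers)


def check(name, minimum):
    """Return a short status string for one package."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "MISSING"
    version = getattr(module, "__version__", "0")
    # Tuples compare element by element, so (0, 12, 0) >= (0, 12) is True
    # while (1, 7) >= (1, 7, 1) is False.
    if parse_version(version) >= minimum:
        return "OK (%s)" % version
    return "TOO OLD (%s, need %s)" % (version, ".".join(map(str, minimum)))


print("python   %s" % sys.version.split()[0])
results = dict((name, check(name, minimum)) for name, minimum in REQUIRED)
for name, _ in REQUIRED:
    print("%-8s %s" % (name, results[name]))
```

Any line reporting MISSING or TOO OLD points to the package to install or upgrade before the workshop.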

You should also be able to launch IPython notebook using this command:

ipython notebook

Executing that command should open a new browser window with the notebook interface.

If you get any error message while performing the steps above, please feel free to send me an email at olivier.grisel@ensta.org with the details (your operating system version, how you installed Python and the libraries, and the full error message, copied and pasted).

Workshop length

2×1.5 hours

Presenter

Olivier Grisel