ben_nuttall wrote:It's because Python's import hierarchy starts with files in the current folder before looking in the installed modules.
It is odd, though. I wonder if numpy should be using relative imports "import .number" rather than "import number"?
Indeed - the default Python search path looks something like this (from my Ubuntu machine):
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/home/dave/.local/lib/python2.7/site-packages', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PILcompat', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.7', '/usr/lib/python2.7/dist-packages/ubuntu-sso-client', '/usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode']
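You can see the first entry there is '', which simply means the current directory (when you run a script rather than the interactive interpreter, the first entry is the directory containing the script instead). If you look through the stack trace from the top you can get an idea of what's going on: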
- You run something like "python numbers.py" so the Python interpreter starts up and places your module as "__main__" in sys.modules (which is the cache of loaded modules)
- numbers.py executes "from astro_pi import AstroPi". Python first checks whether "astro_pi" is in the sys.modules cache; it isn't so Python searches for the astro_pi package in sys.path and when it finds it, places "astro_pi" in sys.modules
- astro_pi/__init__.py executes "from .astro_pi import AstroPi" which means it wants to import the astro_pi module from within its package (also called astro_pi). Python searches for "astro_pi.astro_pi" in sys.modules, doesn't find it, and so searches for it under the astro_pi package's path (the "." makes this a relative import so it doesn't search the whole of sys.path). It's found and bunged in sys.modules as "astro_pi.astro_pi"
- astro_pi/astro_pi.py executes "import numpy as np". Python searches for "numpy" in sys.modules, doesn't find it, and searches sys.path for it. It's found and added to sys.modules as "numpy"
- numpy/__init__.py executes "from . import add_newdocs". This is another relative import so Python searches sys.modules for "numpy.add_newdocs", doesn't find it, then searches numpy's path for add_newdocs.py, finds it and adds it to sys.modules as "numpy.add_newdocs"
- Dave's fingers get tired from all this typing and I'm sure you've got the idea by now so things continue like this until...
- numpy/core/numerictypes.py attempts to execute "import numbers". Python searches sys.modules for "numbers", doesn't find it, searches sys.path and finds it in the current directory. Python adds it to sys.modules as "numbers". Numpy gets a module it didn't expect containing stuff it doesn't want and things go "bang"!
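That last step is easy to reproduce without numpy. Here's a minimal sketch (Python 3, default interpreter settings assumed; the file contents and the run_script helper are made up for illustration). It writes a throwaway main.py, optionally with a decoy numbers.py next to it, runs it in a child interpreter, and reports which "numbers" that interpreter ended up with:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins: DECOY plays the user's numbers.py, MAIN plays any
# code that does a plain "import numbers" expecting the standard library.
DECOY = 'WHOAMI = "the numbers.py next to the script"\n'
MAIN = (
    "import numbers\n"
    "print(getattr(numbers, 'WHOAMI', 'the standard library numbers'))\n"
)


def run_script(files, entry="main.py"):
    """Write files into a fresh directory, run entry there, return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in files.items():
            Path(tmp, name).write_text(source)
        result = subprocess.run(
            [sys.executable, entry],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_script({"main.py": MAIN, "numbers.py": DECOY}))
    print(run_script({"main.py": MAIN}))
```

With the decoy present the child should report the file next to the script; without it, the standard library module. Nothing about main.py changes between the two runs, only what happens to be sitting in its directory.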
Now, Ben pointed out that numpy ought to be doing a relative import of numbers, like "from . import numbers" (the "import .numbers" spelling isn't actually valid syntax), to ensure that only numpy's path is searched and not the whole of sys.path. That would indeed be sensible ... if the numbers module that numpy is looking for was part of numpy itself. Unfortunately it's not. Instead it's looking for the numbers module from the standard library; it's in Python 2 and 3 and it defines the abstract base classes for numbers (Integral, Rational, Real, etc.). Because numpy's looking for something outside itself it has no option but to use an absolute import, and thus the problem arises.
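For anyone who hasn't met the standard library's numbers module, here's a small sketch of what it offers. The numeric_tower helper and the Celsius class are hypothetical names of my own; Celsius is just a stand-in for a library's own scalar type (presumably the sort of thing numpy wants the module for, though that's my assumption):

```python
import numbers
from fractions import Fraction


def numeric_tower(value):
    """Return the names of the numbers ABCs that value is an instance of."""
    tower = (numbers.Integral, numbers.Rational, numbers.Real,
             numbers.Complex, numbers.Number)
    return [cls.__name__ for cls in tower if isinstance(value, cls)]


class Celsius:
    """Hypothetical third-party scalar type (illustration only)."""


# A library can declare that its type belongs in the tower without
# inheriting from the ABC. This only works with the real numbers module.
numbers.Real.register(Celsius)

if __name__ == "__main__":
    for value in (3, Fraction(1, 3), 2.5, 1j, Celsius()):
        print(type(value).__name__, numeric_tower(value))
```

If "import numbers" hands back the user's script instead, attributes like numbers.Real simply don't exist, which is the "bang" described above.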
So, the solution is simply: don't name your scripts after top level packages in the standard library (and preferably top level third party packages too). Admittedly the standard library of Python is unusually large (the "batteries included" philosophy and all that) so this is harder than in other languages, but sticking with patterns like "my_numbers.py" or "numbers_1.py" tends to remove such problems (naming packages is a tougher problem!).
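If you want to check a name before you use it, something like the following sketch can help. The shadows_stdlib function is my own hypothetical helper, not part of Python; it leans on sys.stdlib_module_names, which only exists from Python 3.10, and falls back to a rougher check of the standard library directory on older versions:

```python
import sys
import sysconfig
from pathlib import Path


def shadows_stdlib(script_name):
    """Return True if script_name shares its name with a top-level stdlib module."""
    name = Path(script_name).stem
    known = getattr(sys, "stdlib_module_names", None)  # only in Python 3.10+
    if known is not None:
        return name in known
    # Rough fallback for older interpreters: built-in modules plus whatever
    # sits directly in the stdlib directory (this misses some extension modules).
    stdlib_dir = Path(sysconfig.get_paths()["stdlib"])
    return (
        name in sys.builtin_module_names
        or (stdlib_dir / (name + ".py")).is_file()
        or (stdlib_dir / name / "__init__.py").is_file()
    )


if __name__ == "__main__":
    for candidate in ("numbers.py", "my_numbers.py", "numbers_1.py"):
        print(candidate, shadows_stdlib(candidate))
```

It says nothing about third party packages, so it's a sanity check rather than a guarantee.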
Now, onto Dougie's criticism and solution: just shove the current directory to the end of the sys.path and all will be well! In the scenario presented above that's absolutely correct. Unfortunately from the point of view of language design, it's just shoved the problem elsewhere...
Consider the case where the user now decides that "numbers" should be a module that their main script calls, so they've got a "main.py" and "numbers.py". They run "python main.py" and the first thing it does is "import numbers". Now Python searches sys.path and finds ... the standard library's numbers module instead of the user's expected module and things go "bang!" again. The core problem is still the same: the user has named a module the same thing as a top-level module/package - the only difference is when things go bang.
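The "same problem, different victim" point can be seen without running any scripts at all. This is a sketch under a few assumptions: it's Python 3, the standard library lives in an ordinary directory (not a zip), and which_numbers_wins is a name I've made up. It asks the import machinery's path finder which numbers.py it would pick for each ordering of the two directories:

```python
import importlib.machinery
import sysconfig
import tempfile
from pathlib import Path


def which_numbers_wins():
    """Return the winner of 'import numbers' for both orderings of the path."""
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    with tempfile.TemporaryDirectory() as script_dir:
        # Stand-in for the user's own numbers.py sitting next to main.py.
        Path(script_dir, "numbers.py").write_text("# the user's module\n")

        def winner(search_path):
            spec = importlib.machinery.PathFinder.find_spec("numbers", search_path)
            if spec is None or spec.origin is None:
                return "not found"
            local = Path(spec.origin).parent == Path(script_dir)
            return "local" if local else "stdlib"

        script_dir_first = winner([script_dir, stdlib_dir])
        script_dir_last = winner([stdlib_dir, script_dir])
    return script_dir_first, script_dir_last


if __name__ == "__main__":
    print(which_numbers_wins())
```

Whichever directory comes first wins, so reordering sys.path only changes who gets the surprise: numpy in one case, the user's main.py in the other.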
There are a number of "proper" solutions to this but for one reason or another none of them are going to happen:
- Ensure everything sits under a namespace unlikely to be duplicated. For example, stick the entire standard library in a "stdlib" package so instead of "import numbers" you'd have to do "import stdlib.numbers" and instead of "import os.path" you'd have "import stdlib.os.path". This is more or less Java's method and it's a rather good one. Unfortunately it's also a huge incompatible change so it's not going to happen.
- Introduce syntax to distinguish between imports of modules from the current directory (or below) and packages installed in the system. This is more or less C's method (the difference between #include <foo.h> and #include "foo.h"). However, that leads to nasty problems in the module cache (not a problem in C as modules don't get "cached" like that - although given C's complete lack of namespaces plenty of other problems can arise without careful naming!).
- Prevent imports from the current directory entirely (i.e. remove '' from sys.path). In other words, you're allowed a top level script (imported as __main__ as usual) but anything you want to import has to be installed as a package. Unfortunately this imposes a nasty burden on anyone wanting to break up a long script into manageable chunks and would probably just lead to people hacking '' back into sys.path anyway.
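The first of those ideas can at least be applied to your own code today, even if the standard library can't move. A rough sketch follows; the package name my_project and the helper import_both are invented for the example. A module called numbers that lives inside a package is only reachable as my_project.numbers, so it never competes with the top-level name:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_both():
    """Import a packaged numbers module and the stdlib one side by side."""
    with tempfile.TemporaryDirectory() as project_root:
        pkg = Path(project_root, "my_project")  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "numbers.py").write_text("ANSWER = 42\n")

        sys.path.insert(0, project_root)
        try:
            importlib.invalidate_caches()
            mine = importlib.import_module("my_project.numbers")
            stdlib = importlib.import_module("numbers")
        finally:
            sys.path.remove(project_root)
        # The two modules are cached under different names in sys.modules.
        return mine.ANSWER, hasattr(stdlib, "Integral")


if __name__ == "__main__":
    print(import_both())
```

Of course this just moves the naming problem up a level, to choosing a package name nobody else has used.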
There's probably more but I can't think of them off the top of my head. Anyway, suffice it to say it's not a trivial problem that one can solve just by shuffling things around in sys.path. At this stage in Python's life (i.e. thoroughly mature and widely deployed) there's no good solution.
All I can say in its defence is that a pretty serious amount of thought has gone into the evolution of the import system over time: PEP-328 and PEP-395, to name but a few. They're worth a read to get an idea of how subtle and difficult language design is.