PySpark: Import a User-Defined Module or .py Files
Solution 1:
It turned out that since I'm submitting my application in client mode, the machine I run the spark-submit
command from runs the driver program and needs access to the module files.
I added my module to the PYTHONPATH
environment variable on the node I'm submitting my job from by adding the following line to my .bashrc
file (or by executing it before submitting the job).
export PYTHONPATH=$PYTHONPATH:/home/welshamy/modules
And that solved the problem. Since the path is on the driver node, I don't have to zip and ship the module with --py-files or use sc.addPyFile().
The key to solving any PySpark module import error is understanding whether the driver node, the worker nodes, or both need the module files.
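As a rough illustration of that distinction (the wesam.clean function is hypothetical, not the original author's code): the first call below only needs the module on the driver, while the map() call runs inside the executors, so the module must also be shipped to them, e.g. with --py-files as described in the note that follows.

# mycode.py -- submitted with: ./bin/spark-submit --py-files wesam.zip mycode.py
from pyspark import SparkContext
import wesam  # found on the driver via PYTHONPATH (or the shipped zip)

sc = SparkContext(appName="driver-vs-worker")

# Driver-only use: runs in the driver process, no shipping required.
header = wesam.clean("  Some Header ")

# Worker use: the function executes on the executors, so wesam.py
# must reach them too -- hence the --py-files wesam.zip above.
cleaned = sc.parallelize(["  Foo ", " BAR"]).map(wesam.clean).collect()

sc.stop()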
Important
If the worker nodes need your module files, then you need to pass them as a zip archive with --py-files,
and this argument must precede your .py file argument. Notice the order of arguments in these examples:
This is correct:
./bin/spark-submit --py-files wesam.zip mycode.py
This is not correct:
./bin/spark-submit mycode.py --py-files wesam.zip
Solution 2:
Put mycode.py and wesam.py in the same location and try
sc.addPyFile("wesam.py")
It might work.
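A minimal sketch of how that might look inside mycode.py, again assuming wesam.py sits next to it and exposes a hypothetical clean() function; the import is done after addPyFile so the file has already been registered for distribution to the executors:

# mycode.py -- wesam.py lives in the same directory
from pyspark import SparkContext

sc = SparkContext(appName="addPyFile-example")
sc.addPyFile("wesam.py")  # ship the module file to the executors

import wesam  # resolves locally on the driver; executors get it via addPyFile

# The helper now works on the executors as well as the driver.
result = sc.parallelize(["  Foo ", " BAR"]).map(wesam.clean).collect()
print(result)
sc.stop()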