Cloud, Data Platforms and Visualization, Machine Learning and AI, Technology

How to reuse custom Python libraries across AWS Glue jobs: A step-by-step guide

Rajmohan Krishna Moorthy · August 25, 2021 · 2 mins read

AWS Glue initially supported only a limited set of Python libraries. We ran into issues whenever we needed other libraries such as pandas or Paramiko, and we had even more trouble sharing or reusing custom libraries and modules across different Glue jobs.

But we solved it! Here’s how.

The solution lies in wheel files (Python package archives with a .whl extension). Glue added support for wheel files as dependencies of Python shell jobs in September 2019, and this lets us import external libraries, or even our own custom modules, into AWS Glue easily.

What are wheel files?

Wheels are a component of the Python ecosystem that help make package installations faster and provide more stability in the package distribution process. A ‘wheel’ file is basically a ZIP-format archive with a specially formatted filename and the .whl extension. It is designed to contain all the files for a PEP 376 compatible installation in a way that is very close to the on-disk format.
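
To make the "specially formatted filename" concrete, here is a small sketch that splits a wheel filename into the fields described in PEP 427 and lists the members of a ZIP archive. The helper names parse_wheel_filename and list_wheel_contents are our own, and the in-memory archive is only a stand-in for a wheel, not a valid one.

```python
import io
import zipfile


def parse_wheel_filename(filename):
    """Split a wheel filename into the fields described in PEP 427."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    # Hyphens inside the project name are escaped to underscores in wheel
    # filenames, so splitting on "-" separates the fields cleanly.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {
        "name": name,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def list_wheel_contents(wheel_file):
    """Return the member names of a wheel, which is a plain ZIP archive."""
    with zipfile.ZipFile(wheel_file) as archive:
        return archive.namelist()


if __name__ == "__main__":
    print(parse_wheel_filename("paramiko-2.7.2-py2.py3-none-any.whl"))

    # Stand-in archive built in memory; a real wheel also carries metadata
    # files such as WHEEL and RECORD.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("util_module/__init__.py", "")
        archive.writestr("util_module/date_util.py", "# helpers\n")
    buffer.seek(0)
    print(list_wheel_contents(buffer))
```

The tags "py2.py3", "none", and "any" in the Paramiko filename say that the wheel is pure Python and not tied to a specific interpreter ABI or platform.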

How to create wheel files?

We can package our Python code as a wheel file. To do this, we need a folder structure that includes a “setup.py” file.

Folder Structure:

    – module
        – module-named-folder
            – class.py
            – __init__.py (empty file)
        – setup.py

Example:

    – util
        – util_module
            – __init__.py
            – common_util.py
            – date_util.py
        – setup.py

Here’s a sample setup.py
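
A minimal sketch for the util example above: the SETUP_PY string holds a small setup.py, and the hypothetical scaffold_project helper writes it next to the package skeleton. The version number and the placeholder module contents are assumptions, so adjust them to your own project.

```python
import tempfile
from pathlib import Path

# Contents of a minimal setup.py for the "util" example.
SETUP_PY = '''\
from setuptools import find_packages, setup

setup(
    name="util_module",
    version="0.1",  # assumed version number
    # find_packages() picks up util_module/ because it contains __init__.py
    packages=find_packages(),
)
'''


def write_if_missing(path, text):
    """Create a file with the given text unless it already exists."""
    if not path.exists():
        path.write_text(text)


def scaffold_project(root):
    """Create the util/ layout and return the files found under it."""
    root = Path(root)
    package_dir = root / "util_module"
    package_dir.mkdir(parents=True, exist_ok=True)
    write_if_missing(package_dir / "__init__.py", "")
    for module in ("common_util.py", "date_util.py"):
        write_if_missing(package_dir / module, "# shared helpers go here\n")
    write_if_missing(root / "setup.py", SETUP_PY)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_project(Path(tmp) / "util"):
            print(path)
```

Existing files are left untouched, so the helper can also be pointed at a project that already contains your modules.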

To build your code as a wheel file, run the following command from the folder that contains setup.py.

> python setup.py bdist_wheel

It will create build, dist, and util_module.egg-info folders. The dist folder will have the wheel file (“*.whl”). Now, we can add this wheel file to the Glue job.
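
If you would rather drive the build from a script, the same command can be run through subprocess. This is a sketch: build_wheel and find_wheels are hypothetical helper names, and the build assumes that setuptools (and, on older setuptools versions, the wheel package) is installed for the interpreter in use.

```python
import subprocess
import sys
from pathlib import Path


def find_wheels(project_dir):
    """Return the .whl files in the project's dist folder, sorted by name."""
    return sorted(Path(project_dir, "dist").glob("*.whl"))


def build_wheel(project_dir):
    """Run `python setup.py bdist_wheel` in project_dir and list the result."""
    subprocess.run(
        [sys.executable, "setup.py", "bdist_wheel"],
        cwd=project_dir,
        check=True,  # raise CalledProcessError if the build fails
    )
    return find_wheels(project_dir)
```

Calling build_wheel("util") from the parent folder returns the paths of the wheel files that the build placed in util/dist.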

Adding wheel files to a Glue Job

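Upload the wheel file to an S3 bucket, then navigate to AWS Glue > Jobs and click the ‘Add Job’ button. Enter the S3 path of the wheel file as the job’s Python library path.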
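
The same job can be defined programmatically. The sketch below builds the arguments for boto3's Glue create_job call, with the wheel passed through the "--extra-py-files" default argument. The helper names, job name, role ARN, and S3 paths are placeholders of ours, and boto3 is imported only when a real client is needed.

```python
def python_shell_job_definition(job_name, role_arn, script_s3_uri, wheel_s3_uris):
    """Build keyword arguments for the Glue create_job API call."""
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",  # Glue Python shell job
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Several wheel files are passed as a comma-separated list.
            "--extra-py-files": ",".join(wheel_s3_uris),
        },
    }


def create_job(definition, glue_client=None):
    """Create the job; pass a client explicitly or let boto3 build one."""
    if glue_client is None:
        import boto3  # imported lazily so the builder works without boto3

        glue_client = boto3.client("glue")
    return glue_client.create_job(**definition)


if __name__ == "__main__":
    # Placeholder values; replace them with your own resources.
    definition = python_shell_job_definition(
        job_name="example-util-job",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        script_s3_uri="s3://example-bucket/scripts/job.py",
        wheel_s3_uris=["s3://example-bucket/libs/util_module-0.1-py3-none-any.whl"],
    )
    print(definition["DefaultArguments"])
```

Keeping the definition separate from the API call makes it easy to review or reuse the same settings across several jobs.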

Now, here’s how we import the reusable wheel file in a Glue job
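
A sketch of such a job script, assuming the wheel built from the util example is attached to the job. The packages inside a wheel are imported by their package name, like any installed library. The functions inside common_util and date_util are your own, so none are called here, and the import is guarded only so that the sketch also runs where the wheel is not installed.

```python
# Sketch of a Glue Python shell job script that uses the shared wheel.
try:
    # util_module is the package folder that was built into the wheel.
    from util_module import common_util, date_util
except ImportError:
    # The wheel is not attached (for example, when running this locally).
    common_util = date_util = None


def main():
    """Report whether the shared modules from the wheel could be loaded."""
    if date_util is None:
        print("util_module not found: attach the wheel via the Python library path")
        return False
    # Call your own helpers here, for example date_util.<your_function>(...).
    print("Loaded", common_util.__name__, "and", date_util.__name__)
    return True


if __name__ == "__main__":
    main()
```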

Now, let’s consider a different use case, where we need to use external packages across several Glue jobs.

I had a scenario where I wanted to use the ‘Paramiko’ library to connect to my SFTP server from my Glue Python job. To use it in my Glue job, I cloned the code from GitHub and used its “setup.py” to create a .whl file for the library. Here are the steps that I followed.

  1. git clone https://github.com/paramiko/paramiko/
  2. cd paramiko
  3. python setup.py bdist_wheel

After the build finishes, you will find the “paramiko-2.7.2-py2.py3-none-any.whl” file in the dist folder. Upload it to an S3 bucket, and you can then reference it from your Glue job through the Python library path (“--extra-py-files”).
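
The upload step can also be scripted with boto3. In this sketch the helper names, bucket, and key prefix are placeholders of ours; the returned S3 URI is the value to use as the Python library path.

```python
from pathlib import Path


def wheel_s3_location(bucket, prefix, wheel_path):
    """Return the (key, s3 URI) a wheel file will have after upload."""
    filename = Path(wheel_path).name
    clean_prefix = prefix.strip("/")
    key = f"{clean_prefix}/{filename}" if clean_prefix else filename
    return key, f"s3://{bucket}/{key}"


def upload_wheel(wheel_path, bucket, prefix="", s3_client=None):
    """Upload the wheel and return its S3 URI for --extra-py-files."""
    if s3_client is None:
        import boto3  # imported lazily so the path helper works without boto3

        s3_client = boto3.client("s3")
    key, uri = wheel_s3_location(bucket, prefix, wheel_path)
    s3_client.upload_file(str(wheel_path), bucket, key)
    return uri


if __name__ == "__main__":
    # Placeholder bucket and prefix; nothing is uploaded by this line.
    print(
        wheel_s3_location(
            "example-bucket", "libs", "dist/paramiko-2.7.2-py2.py3-none-any.whl"
        )
    )
```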

Now navigate to AWS Glue > Jobs and click the ‘Add Job’ button.

Here’s how we import the reusable wheel file in a Glue job
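
As a sketch of the SFTP use case, the function below lists a remote directory with Paramiko once the wheel is attached to the job. The function name, host, credentials, and directory are placeholders, and in a real job the credentials should come from a secure source rather than the script. The optional ssh_client parameter is our addition so that the function can be exercised without a server.

```python
def list_sftp_directory(host, username, password, remote_dir=".", port=22,
                        ssh_client=None):
    """Return the sorted file names found in remote_dir on the SFTP server."""
    if ssh_client is None:
        import paramiko  # provided by the wheel attached to the Glue job

        ssh_client = paramiko.SSHClient()
        # AutoAddPolicy trusts unknown host keys; convenient for a sketch,
        # but prefer loading known host keys in production.
        ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh_client.connect(
        hostname=host, port=port, username=username, password=password
    )
    try:
        sftp = ssh_client.open_sftp()
        try:
            return sorted(sftp.listdir(remote_dir))
        finally:
            sftp.close()
    finally:
        ssh_client.close()
```

In the job script this could be called as list_sftp_directory("sftp.example.com", "user", "secret", "/incoming"), with all four values replaced by your own.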

References

  • https://aws.amazon.com/about-aws/whats-new/2019/09/aws-glue-now-supports-wheel-files-as-dependencies-for-glue-python-shell-jobs/
  • https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html

 

 
