I started learning Python a few years ago because of its overall ecosystem. It has a variety of frameworks for almost all major use cases, and many organizations use Python to build software, whether in the cloud or on-premises. Thanks to exceptional collaboration in open source communities, there are tools available to code every component of a system. In my journey with Python so far, I have been organizing my projects into a structure that covers everything from infrastructure to tests to CI/CD. A modern Python project structure, in my experience, requires a few elements so that teams can collaborate effectively and, above all, increase their productivity. Even though this article refers to the Python language and to tools specific to Python, the same principles can be applied to a project in a different language. This post describes how to set up a modern Python project structure, mainly for a cloud infrastructure.
Prerequisites
- Basic understanding of the agile software development life cycle
- Basic understanding of the Python programming language (Python 3)
- Basic understanding of cloud infrastructure and development
Assumptions
- Infrastructure on a cloud (AWS, Azure or GCP)
- GitHub as the source code repository
- VS Code as the IDE
Why is the modern project structure so important?
As you might be aware, all software follows a development cycle. In recent years, that cycle has usually been agile. At the most basic level, every cycle goes through design, development, testing and release. An agile team has only a few developers, and they have to repeat these steps over and over. To be quick and efficient, the team needs to replace manual steps with automation. Automation means writing code for every area of the software, from infrastructure to release. This makes the structure of the project important, because the source code becomes the single source of truth.
Infrastructure as code
This, in my opinion, is the most critical part of modern software development. Having the whole infrastructure as code (IaC) helps to quickly replicate the cloud resources without any unknowns. I have seen manual release deployments in many cases. They are very error-prone and add major delays to the software rollout. I have been using CDKs (Cloud Development Kits), specifically AWS CDK and CDKTF (CDK for Terraform), to create IaC. They let you write infrastructure in multiple languages, including Python.
The infrastructure code sits at the base of the project, so you create the resources for your application component first. For example, for a microservice, the infrastructure code will create the resources and then deploy the application code for the service. My previous post shows this structure using AWS CDK.
Tests
Unit tests and integration tests are required to ensure the quality of software delivery. Infrastructure as code using CDKs also lets you write tests for the infrastructure itself. The tests live at the root of the project structure, alongside the CDK and application code. You can use pytest or unittest. My previous post shows the tests folder structure; it can be extended to include integration tests in the same folder.
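To make this concrete, here is a minimal sketch of a unit test written with the standard library's unittest, which pytest can also collect and run. The helper `resource_name` and its naming rule are hypothetical and exist only to give the test something to check; they do not come from any CDK.

```python
import unittest


def resource_name(service: str, environment: str) -> str:
    """Build a lowercase, hyphen-separated name such as ``orders-dev``.

    Hypothetical helper, used only to illustrate a unit test.
    """
    service = service.strip().lower()
    environment = environment.strip().lower()
    if not service or not environment:
        raise ValueError("service and environment are required")
    return f"{service}-{environment}"


class ResourceNameTests(unittest.TestCase):
    def test_name_is_lowercase_and_hyphenated(self):
        self.assertEqual(resource_name("Orders", "Dev"), "orders-dev")

    def test_empty_service_is_rejected(self):
        with self.assertRaises(ValueError):
            resource_name("", "dev")
```

In a real project the helper would live in the application or infrastructure code and the test class in a file such as tests/unit/test_naming.py (an assumed name), so that the test runner discovers it automatically.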
You can create individual folders under tests for unit and integration tests. Each of those folders can in turn contain a sub-folder for each module and its tests.
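One possible layout can be sketched as a small scaffolding script. The folder names below (infrastructure, app and the sub-folders under tests) are assumptions chosen for illustration, not a required convention, and the demo at the end runs in a temporary directory so nothing is written into a real project.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: every folder name here is illustrative only.
LAYOUT = [
    "infrastructure",            # CDK / IaC code
    "app",                       # application code
    "tests/unit/infrastructure",
    "tests/unit/app",
    "tests/integration/app",
]


def scaffold(root: Path) -> list[Path]:
    """Create the folders in LAYOUT under ``root`` and return them in order."""
    created = []
    for relative in LAYOUT:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp))
    created_names = [p.relative_to(tmp).as_posix() for p in created]
    all_exist = all(p.is_dir() for p in created)
```

Keeping unit and integration tests in separate folders makes it easy to run only the fast unit tests on every check-in and leave the slower integration tests for later stages.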
Consistent coding style
Every developer writes source code differently and formats it differently. Any team with multiple developers will end up with source code in multiple styles. There are multiple tools to standardize the formatting and style.
Enforce the standards with pre-commit hooks at check-in time and again at merge time.
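As a rough illustration of what such a hook does, here is a sketch of a plain Git pre-commit hook written in Python. It is not the configuration format of the pre-commit framework, and the choice of black and flake8 as the checks is my assumption; substitute whichever formatter and linter your team has agreed on.

```python
import subprocess
import sys

# Assumed checks: black and flake8 are examples, not tools named in this post.
CHECKS = [
    [sys.executable, "-m", "black", "--check"],
    [sys.executable, "-m", "flake8"],
]


def select_python_files(paths: list[str]) -> list[str]:
    """Keep only the Python source files from a list of paths."""
    return [path for path in paths if path.endswith(".py")]


def staged_files() -> list[str]:
    """Ask Git for the files that are added, copied or modified in the index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def main() -> int:
    """Return 0 if every check passes on the staged Python files, else 1."""
    files = select_python_files(staged_files())
    if not files:
        return 0
    for check in CHECKS:
        if subprocess.run([*check, *files]).returncode != 0:
            return 1
    return 0
```

To try it as a real hook, the script would be saved as an executable file at .git/hooks/pre-commit with a Python shebang line and a final `sys.exit(main())` call; Git aborts the commit whenever the hook exits with a non-zero status.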
Source code documentation
Docstrings in Python code make it possible to generate documentation directly from the code. Enforce docstrings with pydocstyle so that developers are required to add relevant documentation for their code. You can then generate documentation with tools such as pdoc3 and keep it up to date.
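To show the kind of rule being enforced, here is a small sketch that uses the standard library's ast module to list functions and classes without a docstring. This is not how pydocstyle is implemented and it covers only one of the many checks pydocstyle performs; the `SAMPLE` source is made up for the demonstration.

```python
import ast


def find_missing_docstrings(source: str) -> list[str]:
    """Return the names of functions and classes in ``source`` with no docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing


# Made-up module source: one documented function, one undocumented.
SAMPLE = '''
def documented():
    """Return a constant."""
    return 1


def undocumented():
    return 2
'''

missing = find_missing_docstrings(SAMPLE)
```

Here `missing` contains only `undocumented`, which is the function a docstring check would flag before the change is allowed in.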
CI/CD
Once the individual areas of the project are in source code, CI/CD brings them together. With infrastructure as code, coding standards and documentation standards in place, a CI/CD pipeline is needed to apply them repeatedly and consistently for every change by every team member. I have personally used Jenkins, Azure DevOps, AWS CodePipeline and GitHub Actions, among others. You can pick the CI/CD tool of your preference. The most common workflows for CI/CD with any of these tools are:
- Integration: This workflow is executed for every check-in and pull request. It validates the code, enforces all the standards and the minimum test coverage, and executes all the unit test cases. If any of the steps fails, the check-in or the pull request is marked as rejected in GitHub
- Release: This workflow is executed for every merge into a specific branch (development, stage or main) to build and release the changes into the respective environment. It deploys the infrastructure as well as the application binaries into the created cloud resources. Ensure that the environments are created in the GitHub repo and that sensitive details related to the deployment are stored in secrets per environment
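The logic of the two workflows above can be sketched in a few lines of Python, independent of the CI/CD tool. Everything specific here is an assumption for illustration: the step commands, the package name app, the 80 percent coverage threshold and the environment name production for the main branch.

```python
import subprocess
import sys

# Assumed integration steps; the tools, paths and threshold are illustrative.
INTEGRATION_STEPS = [
    ("format check", [sys.executable, "-m", "black", "--check", "."]),
    ("docstring check", [sys.executable, "-m", "pydocstyle", "."]),
    ("unit tests", [sys.executable, "-m", "pytest", "--cov=app",
                    "--cov-fail-under=80", "tests/unit"]),
]

# Assumed mapping from release branches to deployment environments.
BRANCH_ENVIRONMENTS = {
    "development": "development",
    "stage": "stage",
    "main": "production",
}


def run_steps(steps):
    """Run (name, command) pairs in order.

    Returns the name of the first failing step, or None if all steps pass.
    """
    for name, command in steps:
        print(f"running: {name}")
        if subprocess.run(command).returncode != 0:
            return name
    return None


def environment_for(branch):
    """Return the environment a merge to ``branch`` releases to, or None."""
    return BRANCH_ENVIRONMENTS.get(branch)
```

An integration job would call `run_steps(INTEGRATION_STEPS)` and fail the check-in if it returns a step name, while a release job would look up `environment_for(branch)` and deploy only when it returns an environment. In practice the CI/CD tool's own workflow definition plays this role; the sketch only shows the stop-at-first-failure and branch-to-environment ideas.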
GitHub Pages
GitHub Pages lets you host a website for each repo. You can use it to publish the auto-generated API documentation on the repo's GitHub page. If the docstrings in the source code are kept up to date and the documentation is generated in the CI/CD pipeline, the repo website will always contain the latest documentation without any manual step.
To enable Pages for your repo, go to Settings and then Pages. Select a branch and the path to the auto-generated documentation.
GitHub settings
Besides all the elements described above, there are a few important GitHub settings that can help to streamline the development process in a large team:
- Branch protection rules: These let you enforce merge rules and other rules for a branch in the repo. The most common rules applied through branch protection are shown in the screenshot below
- Pull request template: For a team with multiple developers, a pull request template is a way to create a checklist for the developers, e.g. self-review is completed, docstrings are included, test cases are included, etc. This link provides the details to create the template