Improve your Python Skills
Are you a student learning programming with Python? An experienced Java developer having to jump into a Python codebase? Every Python enthusiast started by writing mediocre code and got better by practicing. Here are the few pieces of advice that took me from writing, quite frankly, bad code to writing code that I'm proud of and that gets deployed to thousands of servers.
Embrace the habits of the community
There is this saying that "the language is just a tool". What people often mean is that computer science is based on core concepts that hold true no matter the language. A hash table has roughly the same properties in every language, and your knowledge of threads remains relevant in Java, Python and even Go.
People often miss the fact that a language is more than a means to feed instructions to a CPU. More than anything, a language is a community and its habits. Anyone writing code is free to adhere to the community's habits or not. However, when people start interacting with each other, having a common ground comes in very handy. In Python, people often refer to this common ground as code being "idiomatic" or "Pythonic".
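To make "Pythonic" a bit more concrete, here is a small sketch of my own (the names list and variable names are made up for illustration). Both versions build the same list, but the second one uses enumerate and a list comprehension, which is what most Python developers would expect to read.

```python
names = ["Ada", "Grace", "Guido"]  # made-up sample data

# Works, but reads like a line-by-line translation from C or Java.
labels_by_index = []
for i in range(len(names)):
    labels_by_index.append(str(i) + ": " + names[i])

# Idiomatic: enumerate() yields (index, item) pairs and the list
# comprehension builds the result in a single expression.
labels_pythonic = ["{}: {}".format(i, name) for i, name in enumerate(names)]

print(labels_pythonic)  # ['0: Ada', '1: Grace', '2: Guido']
```

Neither version is wrong. The second one is simply the one the community has settled on.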
The Python community is a social entity that evolves over time, and its habits evolve with it. What was considered good practice a few years ago may be something to avoid now. You can clearly see this evolution in the Python standard library: compare the deprecated asyncore and the newer asyncio. They even look different: one has code commented out, probably from a time when version control was not so popular, while the other uses far more docstrings.
Read lots of code
If there is one point to remember from all this advice, it is this one: read as much code as you can. Reading other people's code serves multiple purposes.
You will learn how to organize and format your code. You will discover modules in the standard library that you never thought existed. Likewise, you will discover awesome third-party libraries that can be reused quickly. You will find out what makes code easy or difficult for others to read.
You will also train yourself to jump into new codebases. After the tour of the office and the introduction to new colleagues over coffee, one of the first things someone starting a new job has to do is get familiar with the company's code. This can be overwhelming, as the codebase may be huge, written by many developers and full of legacy parts. All the training you can get beforehand will prepare you for this.
A huge benefit of GitHub being the central place where almost all open source software is developed is that it has never been easier to read code.
I remember at my first real job after graduating, I was amazed by a coworker: while I was struggling to find some information in the documentation of an open source project, he was just going through the code. He was usually faster than me and his answers were way more accurate. I realized that reading code is definitely not a skill they teach at school!
Work with people
Which brings me to working with people. Getting code reviewed by peers is one of the best ways to improve. Even two beginners reviewing each other's code can help each other grasp new concepts and develop healthy habits.
There is a reason why even small companies embrace code reviews and agile methodologies: working with people helps produce better quality code.
Sitting close to someone can also help you progress in unexpected ways, for instance by seeing how people use their development tools. I used to code in Python using vim until I saw a coworker using PyCharm. At that point I changed my tools and it really made a difference.
Use an IDE that helps you
Developers tend to be very attached to their text editor. If you haven't found your editor yet and are getting serious about Python, I highly encourage you to take a look at PyCharm.
PyCharm comes with sane defaults for developing Python. A beginner can learn a lot about writing idiomatic code just by looking at the suggestions PyCharm makes.
PyCharm has a community edition which is free to use and already very good. Most developers I know start with it and end up buying the paid version, mostly to encourage these friendly people from the Czech Republic.
Now don't get me wrong: there is nothing PyCharm gives you that is not feasible with a well-tuned vim or emacs. Hell, you don't even need syntax highlighting to write awesome code. Have you ever seen David Beazley coding?
Know your PEP8
PEP8 is a document providing a set of guidelines about code style that developers are free to follow or not.
Nowadays most open source and enterprise software tends to follow PEP8 to some extent, sometimes relaxing a few constraints like the line length.
Learning how to write PEP8 compliant code will help you the day you need to submit a pull request to an open source library. It is common for projects with many developers working on the same codebase to verify that each commit follows the guidelines. They often run a tool like pep8 (since renamed pycodestyle), flake8 or pyflakes during the Continuous Integration build. If the code you write is not compliant, it cannot be merged.
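As a rough illustration, here is a short snippet that follows a handful of the PEP8 conventions a checker or reviewer will look at: naming, blank lines, default arguments and comparisons to None. The Circle class and describe function are invented for the example, and it only covers a small part of the guide.

```python
import math

# Module-level constants are written in UPPER_CASE.
DEFAULT_RADIUS = 1.0


class Circle:
    """Class names use CapWords."""

    def __init__(self, radius=DEFAULT_RADIUS):
        # No spaces around "=" for keyword defaults.
        self.radius = radius

    def area(self):
        # Functions, methods and variables use snake_case.
        return math.pi * self.radius ** 2


def describe(circle=None):
    # Compare against None with "is", never with "==".
    if circle is None:
        circle = Circle()
    return "radius={} area={:.2f}".format(circle.radius, circle.area())


print(describe())  # radius=1.0 area=3.14
```

Note the four-space indentation, the two blank lines around top-level definitions and the lines kept under 79 characters.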
Style guidelines have been an issue in many languages for decades. Newer languages like Go and Rust solve this problem by having a tool that automatically formats the code in the recommended way, putting an end to endless discussions.
Know the difference between bytes and str
Python 3 draws a clear line between raw data, represented as bytes, and text, represented as str.
Bytes that contain text can be decoded into a str with the decode method, and text that must be stored or sent over the network can be encoded into bytes with the encode method.
A common pattern in Python is to manipulate bytes at the boundaries between the application and other systems (network, file system or other processes) and to manipulate str in the core of the application.
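Here is a minimal sketch of that pattern. The functions shout and handle_request are hypothetical names I made up, and UTF-8 is an assumption: in a real application the encoding is dictated by whatever protocol or file format you talk to.

```python
def shout(text: str) -> str:
    """Core logic: only ever sees str."""
    return text.upper() + "!"


def handle_request(raw: bytes) -> bytes:
    """Boundary: decode incoming bytes, encode outgoing text."""
    text = raw.decode("utf-8")       # bytes -> str on the way in
    reply = shout(text)              # str everywhere in the core
    return reply.encode("utf-8")     # str -> bytes on the way out


# Pretend these bytes came from a socket ("\u00e9" is an accented e).
incoming = "caf\u00e9".encode("utf-8")

print(len("caf\u00e9"), len(incoming))   # 4 characters, 5 bytes
print(handle_request(incoming).decode("utf-8"))
```

The character count and the byte count differ, which is exactly why mixing the two types up leads to bugs.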
The key is to never lose track of what you hold. Because Python 2 used the same type for both, it was hard to keep track of that. In Python 3 it is much easier once you have grasped this concept.
Use type annotations
If you target Python 3.5 and above you can use type annotations. It is a somewhat controversial syntax that explicitly documents the types of function arguments, return values and, from Python 3.6 on, variables.
import json

def read_config(file: str) -> dict:
    # Parse the JSON file at the given path and return its content.
    with open(file, 'r') as f:
        return json.load(f)
This example shows that clear naming and an extra 13 characters are enough to document what the function does. Tools like IDEs or mypy can use these annotations to catch bugs during development.
Coupled with static analysis, these annotations help prevent errors that typically surface only at runtime, like TypeError or AttributeError, much as a statically typed language would.
Annotations are completely optional and have no meaningful impact on performance: the Python interpreter stores them but never enforces them. They are only useful for you and your tools.
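A quick way to convince yourself that annotations are not checked at runtime is the sketch below. The repeat function is a made-up example. The second call swaps the argument types and still runs, whereas a static checker such as mypy would flag it.

```python
def repeat(word: str, times: int) -> str:
    return word * times


# The annotations are available as plain metadata on the function...
print(repeat.__annotations__)

# ...but nothing checks them at runtime. The arguments below are
# swapped, yet the call succeeds because 3 * "ab" is valid Python.
print(repeat("ab", 3))   # ababab
print(repeat(3, "ab"))   # ababab
```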
Avoid overusing classes
This is one of the strangest parts of Python. It is an object oriented language in which literally everything is an object: a class is an object, a function is an object, a module is an object... Yet in practice, Python code does not use classes that often.
I was once reviewing the code of a developer I didn't know. After a few seconds of reading his code I could have sworn that he had a strong Java background: everything was wrapped in classes. When I asked, he told me that he knew mostly Java and that this code was literally his first lines of Python. Indeed, "object oriented" can mean wildly different things from one language to another.
Again the standard library itself is a good example: more often than not, functionality is provided as a function in a module rather than as a method of a class:
import json
jsonified = json.dumps([1, 2, 3])
In other languages it is common to see this pattern instead:
# Hypothetical: the json module has no Json class.
from json import Json
json = Json()
jsonified = json.dumps([1, 2, 3])
In practice, Python classes are used when some data shares a set of functions. When no function is associated with the data, the class is often hidden behind a namedtuple.
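For plain data with no behavior attached, a namedtuple from the collections module gives you named fields without writing a class by hand. The Server type and its fields below are invented for the example.

```python
from collections import namedtuple

# One line replaces a hand-written class with __init__, __repr__ and __eq__.
Server = namedtuple("Server", ["host", "port"])

web = Server(host="example.org", port=443)

print(web.host, web.port)            # fields are accessible by name
print(web == ("example.org", 443))   # and it still behaves like a tuple

host, port = web                     # so unpacking works as well
```

Like any tuple, it is immutable: assigning to web.port raises an AttributeError.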
Avoid overusing packages
In Python a file ending in .py is called a module and a directory containing modules a package.
When I started with Python I had this inexplicable need to organize my code like I would organize my music library: as a deeply nested set of directories. I spent more time moving files around and fighting circular imports than actually writing code.
Then I slowly realized that it didn't make my code any easier to manage. I looked at the layout of many open source projects, big and small, and came to the conclusion that most of them keep the hierarchy very flat, a prime example being the standard library itself.
A word of caution though: Django code is organized around directories. Do not break your application just because you read on a blog that less is better.
Understand how packaging works
Python has a great standard library, but it is nothing in comparison with the 100,000 libraries and applications available on PyPI, just a pip install away.
There is a catch, however: Python was a language first and turned into an ecosystem only recently, which means that the tooling and practices are not top of the line. But it's moving in the right direction, really.
Now if you do not know what Python virtual environments are, read the article about Virtualenvs in the Hitchhiker’s Guide to Python. Virtual environments are a bit like cloud servers: you create one, install a lot of crap in it to try something and get rid of it as soon as you are done. They let you fearlessly install and try new dependencies without messing with your system.
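To show how disposable these environments are, here is a sketch that creates one and throws it away. It uses the venv module from the standard library rather than the virtualenv tool described in the guide. The directory name scratch-env is arbitrary, and in day-to-day work you would more likely run python3 -m venv from a shell.

```python
import os
import tempfile
import venv

# Create a throwaway environment inside a temporary directory.
# with_pip=False keeps the sketch fast; use with_pip=True to get pip.
with tempfile.TemporaryDirectory() as workdir:
    env_dir = os.path.join(workdir, "scratch-env")
    venv.create(env_dir, with_pip=False)

    # Every environment carries a pyvenv.cfg file describing it.
    created = os.path.exists(os.path.join(env_dir, "pyvenv.cfg"))
    print("environment created:", created)

# Leaving the "with" block deletes the directory and the environment.
removed = not os.path.exists(env_dir)
print("environment removed:", removed)
```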
If you want to publish your work on PyPI, have a look at the Python Packaging User Guide and of course read the setup.py of your favorite library.
Forget about Python 2 if you can
From my experience, small to medium Python shops have largely settled on Python 3. The release of Python 3.5 was such a turning point in the history of the language that I no longer see any project being started in 2.7.
I recently had to port a codebase from Python 2.7 to 3.5. The effort needed to rewire my brain for legacy Python made me realize how good Python 3 is, and how well the abstraction it provides over data and strings fits my way of thinking.
I love Python 3 and I love being away from Python 2.