
Planet Python

Last update: April 07, 2020 07:47 AM UTC

April 07, 2020


Codementor

Flask Delicious Tutorial: Building a Library Management System Part 2 - Start With A Loaded Skeleton

In this part, we include some front-end libs to have a great start!

April 07, 2020 07:36 AM UTC


Mike Driscoll

Python 101 – Working with Strings

You will be using strings very often when you program. A string is a sequence of characters surrounded by single, double or triple quotes. Python 3 defines string as a “Text Sequence Type”. You can cast other types to a string using the built-in str() function.

In this article you will learn how to:

  • Create strings
  • Use string methods
  • Format strings
  • Concatenate strings
  • Slice strings

Let’s get started by learning the different ways to create strings!

Creating Strings

Here are some examples of creating strings:

name = 'Mike'
first_name = 'Mike'
last_name = "Driscoll"
triple = """multi-line
string"""

When you use triple quotes, you may use three double quotes at the beginning and end of the string or three single quotes. Also, note that using triple quotes allows you to create multi-line strings. Any whitespace within the string will also be included.
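
Here is a quick REPL check of that behavior (the variable name poem is just an example):

>>> poem = """Roses are red
...     violets are blue"""
>>> poem
'Roses are red\n    violets are blue'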

Here is an example of converting an integer to a string:

>>> number = 5
>>> str(number)
'5'

In Python, backslashes can be used to create escape sequences. Here are a couple of examples:

  • \b – backspace
  • \n – line feed
  • \r – ASCII carriage return
  • \t – tab

There are several others that you can learn about if you read Python’s documentation.
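
For example, here is a quick demonstration of \n and \t (the exact tab width depends on your terminal):

>>> print('first line\nsecond line')
first line
second line
>>> print('name:\tMike')
name:   Mike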

You can also use backslashes to escape quotes:

>>> 'This string has a single quote, \', in the middle'
"This string has a single quote, ', in the middle"

If you did not have the backslash in the code above, you would receive a SyntaxError:

>>> 'This string has a single quote, ', in the middle'
Traceback (most recent call last):
  Python Shell, prompt 59, line 1
invalid syntax: <string>, line 1, pos 38

This occurs because the string ends at that second single quote. It is usually better to mix double and single quotes to get around this issue:

>>> "This string has a single quote, ', in the middle"
"This string has a single quote, ', in the middle"

In this case, you create the string using double quotes and put a single quote inside of it. This is especially helpful when working with contractions, such as “don’t”, “can’t”, etc.

Now let’s move along and see what methods you can use with strings!

String Methods

In Python, everything is an object. You will learn how useful this can be in chapter 18 when you learn about introspection. For now, just know that strings have methods (or functions) that you can call on them.

Here are three examples:

>>> name = 'mike'
>>> name.capitalize()
'Mike'
>>> name.upper()
'MIKE'
>>> 'MIke'.lower()
'mike'

The method names give you a clue as to what they do. For example, .capitalize() will change the first letter in the string to a capital letter.

To get a full listing of the methods and attributes that you can access, you can use Python’s built-in dir() function:

>>> dir(name)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
'__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__',
'__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize',
'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index',
'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable',
'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace',
'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip',
'swapcase', 'title', 'translate', 'upper', 'zfill']

The first third of the listing is made up of special methods that are sometimes called “dunder methods” (AKA double-underscore methods) or “magic methods”. You can ignore these for now as they are used more for intermediate and advanced use-cases. The items in the list above that don’t have double-underscores at the beginning are the ones that you will probably use the most.

You will find that the .strip() and .split() methods are especially useful when parsing or manipulating text.

You can use .strip() and its variants, .rstrip() and .lstrip() to strip off white space from the string, including tab and new line characters. This is especially useful when you are reading in a text file that you need to parse.

In fact, you will often end up stripping end-of-line characters from strings and then using .split() on the result to parse out sub-strings.
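
As a small sketch of that workflow (the comma-separated line is made up for illustration):

>>> line = 'apple,banana,cherry\n'
>>> line.strip().split(',')
['apple', 'banana', 'cherry']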

Let’s do a little exercise where you will learn how to parse out the 2nd word in a string.

To start, here’s a string:

>>> my_string = 'This is a string of words'
>>> my_string
'This is a string of words'

Now to get the parts of a string, you can call .split(), like this:

>>> my_string.split()
['This', 'is', 'a', 'string', 'of', 'words']

The result is a list of strings. Now normally you would assign this result to a variable, but for demonstration purposes, you can skip that part.

Instead, since you now know that the result is a list, you can use indexing to get the second element:

>>> 'This is a string of words'.split()[1]
'is'

Remember, in Python, list elements start at 0 (zero), so when you tell it you want element 1 (one), that is the second element in the list.

When doing string parsing for work, I personally have found that you can use the .strip() and .split() methods pretty effectively to get almost any data that you need. Occasionally you will find that you might also need to use Regular Expressions (regex), but most of the time these two methods are enough.

String Formatting

String formatting or string substitution is where you have a string that you would like to insert into another string. This is especially useful when you need to create a template, such as a form letter. But you will use string substitution a lot for debugging output, printing to standard out and much more.

Python has three different ways to accomplish string formatting:

  • Using the % Method
  • Using .format()
  • Using formatted string literals (f-strings)

This book will focus on f-strings the most and also use .format() from time to time. But it is good to understand how all three work.

Let’s take a few moments to learn more about string formatting.

Formatting Strings Using %s (printf-style)

Using the % method is Python’s oldest method of string formatting. It is sometimes referred to as “printf-style string formatting”. If you have used C or C++ in the past, then you may already be familiar with this type of string substitution. For brevity, you will learn the basics of using % here.

Note: This type of formatting can be quirky to work with and has been known to lead to common errors, such as displaying Python tuples and dictionaries incorrectly. Using either of the other two methods is preferred in that case.

The most common use of the % sign is %s, which converts any Python object to a string using str().

Here is an example:

>>> name = 'Mike'
>>> print('My name is %s' % name)
My name is Mike

In this code, you take the variable name and insert it into another string using the special %s syntax. To make it work, you need to use % outside of the string followed by the string or variable that you want to insert.

Here is a second example that shows you can pass an int into a string and have it automatically converted for you:

>>> age = 18
>>> print('You must be at least %s to continue' % age)
You must be at least 18 to continue

This sort of thing is especially useful when you need to convert an object but don’t know what type it is.

You can also do string formatting with multiple variables. In fact, there are two ways to do this.

Here’s the first one:

>>> name = 'Mike'
>>> age = 18
>>> print('Hello %s. You must be at least %i to continue!' % (name, age))
Hello Mike. You must be at least 18 to continue!

In this example, you create two variables and use %s and %i. The %i indicates that you are going to pass an integer. To pass in multiple items, you use the percent sign followed by a tuple of the items to insert.

You can make this clearer by using names, like this:

>>> print('Hello %(name)s. You must be at least %(age)i to continue!' % {'name': name, 'age': age})
Hello Mike. You must be at least 18 to continue!

When the argument on the right side of the % sign is a dictionary (or another mapping type), then the formats in the string must refer to the parenthesized key in the dictionary. In other words, if you see %(name)s, then the dictionary to the right of the % must have a name key.

If you do not include all the keys that are required, you will receive an error:

>>> print('Hello %(name)s. You must be at least %(age)i to continue!' % {'age': age})
Traceback (most recent call last):
   Python Shell, prompt 23, line 1
KeyError: 'name'

For more information about using the printf-style string formatting, you should see the following link:

https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting

Now let’s move on to using the .format() method.

Formatting Strings Using .format()

Python strings have supported the .format() method for a long time. While this book will focus on using f-strings, you will find that .format() is still quite popular.

For full details on how formatting works, see the following:

https://docs.python.org/3/library/string.html#formatstrings

Let’s take a look at a few short examples to see how .format() works:

>>> age = 18
>>> name = 'Mike'
>>> print('Hello {}. You must be at least {} to continue!'.format(name, age))
Hello Mike. You must be at least 18 to continue!

This example uses positional arguments. Python looks for two instances of {} and will insert the variables accordingly. If you do not pass in enough arguments, you will receive an error like this:

>>> print('Hello {}. You must be at least {} to continue!'.format(age))
Traceback (most recent call last):
    Python Shell, prompt 33, line 1
IndexError: tuple index out of range

This error indicates that you do not have enough items inside the .format() call.

You can also use named arguments in a similar way to the previous section:

>>> age = 18
>>> name = 'Mike'
>>> print('Hello {name}. You must be at least {age} to continue!'.format(name=name, age=age))
Hello Mike. You must be at least 18 to continue!

Instead of passing a dictionary to .format(), you can pass in the parameters by name. In fact, if you do try to pass in a dictionary, you will receive an error:

>>> print('Hello {name}. You must be at least {age} to continue!'.format({'name': name, 'age': age}))
Traceback (most recent call last):
  Python Shell, prompt 34, line 1
KeyError: 'name'

There is a workaround for this though:

>>> print('Hello {name}. You must be at least {age} to continue!'.format(**{'name': name, 'age': age}))
Hello Mike. You must be at least 18 to continue!

This looks a bit weird, but in Python when you see a double asterisk (**) used like this, it means that you are passing named parameters to the function. So Python is converting the dictionary to name=name, age=age for you.
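
You can see the same mechanism with any function that takes named parameters. Here is a minimal sketch, where greet() is a made-up function:

>>> def greet(name, age):
...     return '{} is {}'.format(name, age)
...
>>> data = {'name': 'Mike', 'age': 18}
>>> greet(**data)
'Mike is 18'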

You can also repeat a variable multiple times in the string using .format():

>>> name = 'Mike'
>>> print('Hello {name}. Why do they call you {name}?'.format(name=name))
Hello Mike. Why do they call you Mike?

Here you refer to {name} twice in the string and you are able to replace both of them using .format().

If you want, you can also interpolate values using numbers:

>>> print('Hello {1}. You must be at least {0} to continue!'.format(name, age))
Hello 18. You must be at least Mike to continue!

Because most things in Python start at 0 (zero), in this example you ended up passing the age to {1} and the name to {0}.

A common coding style when working with .format() is to create a formatted string and save it to a variable to be used later:

>>> age = 18
>>> name = 'Mike'
>>> greetings = 'Hello {name}. You must be at least {age} to continue!'
>>> greetings.format(name=name, age=age)
'Hello Mike. You must be at least 18 to continue!'

This allows you to reuse greetings and pass in updated values for name and age later on in your program.

You can also specify the string width and alignment:

>>> '{:<20}'.format('left aligned')
'left aligned        '
>>> '{:>20}'.format('right aligned')
'       right aligned'
>>> '{:^20}'.format('centered')
'      centered      '

Left alignment is the default. The colon (:) tells Python that you are going to apply some kind of formatting. In the first example, you specify that the string be left aligned and 20 characters wide. The second example is also 20 characters wide, but it is right aligned. Finally, the ^ tells Python to center the string within a 20-character width.

If you want to pass in a variable like in the previous examples, here is how you would do that:

>>> '{name:^20}'.format(name='centered')
'      centered      '

Note that the name must come before the : inside of the {}.
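
You can also put a fill character right after the colon, before the alignment symbol. Here is a quick example:

>>> '{:*^20}'.format('centered')
'******centered******'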

At this point, you should be pretty familiar with the way .format() works.

Let’s go ahead and move along to f-strings!

Formatting Strings with f-strings

Formatted string literals or f-strings are strings that have an “f” at the beginning and curly braces inside of them that contain expressions, much like the ones you saw in the previous section. These expressions tell the f-string about any special processing that needs to be done to the inserted string, such as justification, float precision, etc.

The f-string was added in Python 3.6. You can read more about it and how it works by checking out PEP 498 here:

https://www.python.org/dev/peps/pep-0498/

The expressions that are contained inside of f-strings are evaluated at runtime. This makes it impossible to use an f-string as a docstring to a function, method or class if it contains an expression. The reason being that docstrings are defined at function definition time.

Let’s go ahead and look at a simple example:

>>> name = 'Mike'
>>> age = 20
>>> f'Hello {name}. You are {age} years old'
'Hello Mike. You are 20 years old'

Here you create the f-string by putting an “f” right before the single, double or triple quote that begins your string. Then inside of the string, you use the curly braces, {}, to insert variables into your string.

However, your curly braces must enclose something. If you create an f-string with empty braces, you will get an error:

>>> f'Hello {}. You are {} years old'
SyntaxError: f-string: empty expression not allowed

The f-string can do things that neither %s nor .format() can do, though. Because f-strings are evaluated at runtime, you can put any valid Python expression inside of them.

For example, you could increase the age variable:

>>> age = 20
>>> f'{age+2}'
'22'

Or call a method or function:

>>> name = 'Mike'
>>> f'{name.lower()}'
'mike'

You can also access dictionary values directly inside of an f-string:

>>> sample_dict = {'name': 'Tom', 'age': 40}
>>> f'Hello {sample_dict["name"]}. You are {sample_dict["age"]} years old'
'Hello Tom. You are 40 years old'
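
The format specifications you saw with .format() also work inside f-string braces, placed after the expression and a colon. Here are a couple of quick examples:

>>> import math
>>> f'{math.pi:.3f}'
'3.142'
>>> f'{"centered":^20}'
'      centered      '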

However, backslashes are not allowed in f-string expressions:

>>> print(f'My name is {name\n}')
SyntaxError: f-string expression part cannot include a backslash

But you can use backslashes outside of the expression in an f-string:

>>> name = 'Mike'
>>> print(f'My name is {name}\n')
My name is Mike

One other thing that you can’t do is add a comment inside of an expression in an f-string:

>>> f'My name is {name # name of person}'
SyntaxError: f-string expression part cannot include '#'

In Python 3.8, f-strings added support for =, which expands the output to include the text of the expression, an equal sign, and then the evaluated result. That sounds kind of complicated, so let’s look at an example:

>>> username = 'jdoe'
>>> f'Your {username=}'
"Your username='jdoe'"

This example demonstrates that the text inside of the expression, username=, is added to the output, followed by the actual value of username in quotes.

f-strings are very powerful and extremely useful. They will simplify your code quite a bit if you use them wisely. You should definitely give them a try.

Let’s find out what else you can do with strings!

String Concatenation

Strings also allow concatenation, which is a fancy word for joining two strings into one.

To concatenate strings together, you can use the + sign:

>>> first_string = 'My name is'
>>> second_string = 'Mike'
>>> first_string + second_string
'My name isMike'

Oops! It looks like the strings merged in a weird way because you forgot to add a space to the end of the first_string. You can change it like this:

>>> first_string = 'My name is '
>>> second_string = 'Mike'
>>> first_string + second_string
'My name is Mike'

Another way to merge strings is to use the .join() method. The .join() method accepts an iterable, such as a list, of strings and joins them together.

>>> first_string = 'My name is '
>>> second_string = 'Mike'
>>> ''.join([first_string, second_string])
'My name is Mike'

This will make the strings join right next to each other. You could put something inside of the string that you are joining though:

>>> '***'.join([first_string, second_string])
'My name is ***Mike'

In this case, .join() inserts *** between the first string and the second string.

More often than not, you can use an f-string rather than concatenation or .join() and the code will be easier to follow.
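
For example, the concatenation above could be written as a single f-string:

>>> first_string = 'My name is'
>>> second_string = 'Mike'
>>> f'{first_string} {second_string}'
'My name is Mike'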

String Slicing

Slicing in strings works in much the same way that it does for Python lists. Let’s take the string “Mike”. The letter “M” is at position zero and the letter “e” is at position 3.

If you want to grab characters 0-3, you would use this syntax: my_string[0:4]

What that means is that you want the substring starting at position zero up to but not including position 4.

Here are a few examples:

>>> 'this is a string'[0:4]
'this'
>>> 'this is a string'[:4]
'this'
>>> 'this is a string'[-4:]
'ring'

The first example grabs the first four letters from the string and returns them. If you want to, you can drop the zero as that is the default and use [:4] instead, which is what example two does.

You can also use negative position values. So [-4:] means that you want to start at the end of the string and get the last four letters of the string.

You should play around with slicing on your own and see what other slices you can come up with.
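
For instance, slices also accept a step value, and a negative step walks the string backwards. Here are two quick experiments:

>>> 'this is a string'[::2]
'ti sasrn'
>>> 'this is a string'[::-1]
'gnirts a si siht'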

Wrapping Up

Python strings are powerful and quite useful. They can be created using single, double or triple quotes. Strings are objects, so they have methods. You also learned about string concatenation, string slicing and three different methods of string formatting.

The newest flavor of string formatting is the f-string. It is also the most powerful and the currently preferred method for formatting strings.

Related Reading

The post Python 101 – Working with Strings appeared first on The Mouse Vs. The Python.

April 07, 2020 05:06 AM UTC


Podcast.__init__

Building The Seq Language For Bioinformatics

Summary

Bioinformatics is a complex and computationally demanding domain. The intuitive syntax of Python and extensive set of libraries make it a great language for bioinformatics projects, but it is hampered by the need for computational efficiency. Ariya Shajii created the Seq language to bridge the divide between the performance of languages like C and C++ and the ecosystem of Python with built-in support for commonly used genomics algorithms. In this episode he describes his motivation for creating a new language, how it is implemented, and how it is being used in the life sciences. If you are interested in experimenting with sequencing data then give this a listen and then give Seq a try!

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, node balancers, a 40 Gbit/s public network, fast object storage, and a brand new managed Kubernetes platform, all controlled by a convenient API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they’ve got dedicated CPU and GPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on great conferences. And now, the events are coming to you, with no travel necessary! We have partnered with organizations such as ODSC, and Data Council. Upcoming events include the Observe 20/20 virtual conference on April 6th and ODSC East which has also gone virtual starting April 16th. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host as usual is Tobias Macey and today I’m interviewing Ariya Shajii about Seq, a programming language built for bioinformatics and inspired by Python

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Seq is and your motivation for creating it?
    • What was lacking in other languages or libraries for your use case that is made easier by creating a custom language?
    • If someone is already working in Python, possibly using BioPython, what might motivate them to consider migrating their work to Seq?
  • Can you give an impression of the scope and nature of the tasks or projects that a biologist or geneticist might build with Seq?
  • What was your process for identifying and prioritizing features and algorithms that would be beneficial to the target audience?
  • For someone using Seq can you describe their workflow and how it might differ from performing the same task in Python?
  • How is Seq implemented?
    • What are some of the features that are included to simplify the work of bioinformatics?
    • What was your process of designing the language and runtime?
    • How has the scope or direction of the project evolved since it was first conceived?
  • What impact do you anticipate Seq having on the domain of bioinformatics and genomics?
  • What have you found to be the most interesting, unexpected, and/or challenging aspects of building a language for this problem domain?
  • What is in store for the future of Seq?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

April 07, 2020 01:35 AM UTC

April 06, 2020


Go Deh

Spin the table: Solution!

An answer to a puzzle set by Matt Parker on his YouTube channel:



The Puzzle:


  1. A Circular table with positions 1..7
  2. Delegates numbered 1..7 arranged arbitrarily around the table.
  3. Delegates start with only one match to a correct seating position.
  4. Is it ALWAYS possible to spin the table and get two or more matches?

Find arrangements where it is not possible to spin the table to get more than one match.

The Model.

Let's number the positions on the table with integers 1 .. 7; and also give the delegates numbers 1 .. 7. If the delegate number equals the table number then that constitutes a seating match.

Storing the table order as a fixed list of [1, 2, ..7], all the rotations of the table can be modelled by the additional six rotations of that initial list, each found by popping the last number from the previous rotation and appending it to the front of the list, i.e.:
[[1, 2, .. 7],
 [7, 1, .. 6],
 [6, 7, 1, .. 5],
 ..
 [2, 3, .. 7, 1]]

This is stored in variable spun.

The permutations of delegate numbers 1 .. 7 are all the possible orderings of delegates.
Every "base" perm will have six other perms which are just other rotations of the "base" ordering . If only the permutations generated with an initial 1 are used, (for example), then we can assume that any of the other six rotations of the permutation will also be an answer.

The Code.



# -*- coding: utf-8 -*-
"""
Spin the table.

Created on Wed Mar 25 19:59:49 2020

@author: Paddy3118
"""

from itertools import permutations


table = list(range(1, 8))
spun = []  # spinning table positions
for i in range(7):
    spun.append(table)
    table = table[-1:] + table[:-1]  # rotate once

def match_count(table, delegates):
    return sum(1 for t, d in zip(table, delegates)
               if t == d)

def max_matches(delegates):
    return max(match_count(table, delegates) for table in spun)

def spin_the_table():
    return [d for d in permutations(table)
            if d[0] == 1 and match_count(table, d) == 1 and max_matches(d) < 2]


if __name__ == '__main__':
    answer = spin_the_table()
    print(f"Found {len(answer)} initial arrangements of the delegates:")
    for a in answer:
        print(' ', str(a)[1:-1])
    print("The 6 rotations of each individual answer above are also solutions")

Answer:



Found 19 initial arrangements of the delegates:
1, 3, 5, 7, 2, 4, 6
1, 3, 6, 2, 7, 5, 4
1, 3, 7, 6, 4, 2, 5
1, 4, 2, 7, 6, 3, 5
1, 4, 6, 3, 2, 7, 5
1, 4, 7, 2, 6, 5, 3
1, 4, 7, 3, 6, 2, 5
1, 4, 7, 5, 3, 2, 6
1, 5, 2, 6, 3, 7, 4
1, 5, 4, 2, 7, 3, 6
1, 5, 7, 3, 6, 4, 2
1, 6, 2, 5, 7, 4, 3
1, 6, 4, 2, 7, 5, 3
1, 6, 4, 3, 7, 2, 5
1, 6, 4, 7, 3, 5, 2
1, 6, 5, 2, 4, 7, 3
1, 7, 4, 6, 2, 5, 3
1, 7, 5, 3, 6, 2, 4
1, 7, 6, 5, 4, 3, 2
The 6 rotations of each individual answer above are also solutions


Matt Parker's Solution Video: (He mentions my solution at time 05:00).

END.

April 06, 2020 05:35 PM UTC


Real Python

How to Make an Instagram Bot With Python and InstaPy

What do SocialCaptain, Kicksta, Instavast, and many other companies have in common? They all help you reach a greater audience, gain more followers, and get more likes on Instagram while you hardly lift a finger. They do it all through automation, and people pay them a good deal of money for it. But you can do the same thing—for free—using InstaPy!

In this tutorial, you’ll learn how to build a bot with Python and InstaPy, which automates your Instagram activities so that you gain more followers and likes with minimal manual input. Along the way, you’ll learn about browser automation with Selenium and the Page Object Pattern, which together serve as the basis for InstaPy.

In this tutorial, you’ll learn:

  • How Instagram bots work
  • How to automate a browser with Selenium
  • How to use the Page Object Pattern to make your code more maintainable and testable
  • How to build an Instagram bot with InstaPy

You’ll begin by learning how Instagram bots work before you build one.


How Instagram Bots Work

How can an automation script gain you more followers and likes? Before answering this question, think about how an actual person gains more followers and likes.

They do it by being consistently active on the platform. They post often, follow other people, and like and leave comments on other people’s posts. Bots work exactly the same way: They follow, like, and comment on a consistent basis according to the criteria you set.

The better the criteria you set, the better your results will be. You want to make sure you’re targeting the right groups because the people your bot interacts with on Instagram will be more likely to interact with your content.

For example, if you’re selling women’s clothing on Instagram, then you can instruct your bot to like, comment on, and follow mostly women or profiles whose posts include hashtags such as #beauty, #fashion, or #clothes. This makes it more likely that your target audience will notice your profile, follow you back, and start interacting with your posts.

How does it work on the technical side, though? You can’t use the Instagram Developer API since it is fairly limited for this purpose. Enter browser automation. It works in the following way:

  1. You serve it your credentials.
  2. You set the criteria for who to follow, what comments to leave, and which type of posts to like.
  3. Your bot opens a browser, types in https://instagram.com on the address bar, logs in with your credentials, and starts doing the things you instructed it to do.

Next, you’ll build the initial version of your Instagram bot, which will automatically log in to your profile. Note that you won’t use InstaPy just yet.

How to Automate a Browser

For this version of your Instagram bot, you’ll be using Selenium, which is the tool that InstaPy uses under the hood.

First, install Selenium. During installation, make sure you also install the Firefox WebDriver since the latest version of InstaPy dropped support for Chrome. This also means that you need the Firefox browser installed on your computer.
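
If you haven’t installed it yet, a typical installation looks like this (you may also need to download geckodriver, Firefox’s WebDriver executable, and put it on your PATH):

$ python3 -m pip install selenium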

Now, create a Python file and write the following code in it:

 1 from time import sleep
 2 from selenium import webdriver
 3 
 4 browser = webdriver.Firefox()
 5 
 6 browser.get('https://www.instagram.com/')
 7 
 8 sleep(5)
 9 
10 browser.close()

Run the code and you’ll see that a Firefox browser opens and directs you to the Instagram login page. Here’s a line-by-line breakdown of the code:

  1. Lines 1 and 2 import sleep() and the Selenium webdriver module.
  2. Line 4 opens Firefox.
  3. Line 6 directs the browser to the Instagram home page.
  4. Line 8 waits for five seconds so that you can see the result.
  5. Line 10 closes the browser.

This is the Selenium version of Hello, World. Now you’re ready to add the code that logs in to your Instagram profile. But first, think about how you would log in to your profile manually. You would do the following:

  1. Go to https://www.instagram.com/.
  2. Click the login link.
  3. Enter your credentials.
  4. Hit the login button.

The first step is already done by the code above. Now change it so that it clicks on the login link on the Instagram home page:

 1 from time import sleep
 2 from selenium import webdriver
 3 
 4 browser = webdriver.Firefox()
 5 browser.implicitly_wait(5)
 6 
 7 browser.get('https://www.instagram.com/')
 8 
 9 login_link = browser.find_element_by_xpath("//a[text()='Log in']")
10 login_link.click()
11 
12 sleep(5)
13 
14 browser.close()

Note the highlighted lines:

  1. Line 5 sets an implicit wait of five seconds, so the browser waits for elements to appear before raising an error.
  2. Line 9 finds the login link by its text using an XPath expression.
  3. Line 10 clicks the login link.

Run the script and you’ll see your script in action. It will open the browser, go to Instagram, and click on the login link to go to the login page.

On the login page, there are three important elements:

  1. The username input
  2. The password input
  3. The login button

Next, change the script so that it finds those elements, enters your credentials, and clicks on the login button:

 1 from time import sleep
 2 from selenium import webdriver
 3 
 4 browser = webdriver.Firefox()
 5 browser.implicitly_wait(5)
 6 
 7 browser.get('https://www.instagram.com/')
 8 
 9 login_link = browser.find_element_by_xpath("//a[text()='Log in']")
10 login_link.click()
11 
12 sleep(2)
13 
14 username_input = browser.find_element_by_css_selector("input[name='username']")
15 password_input = browser.find_element_by_css_selector("input[name='password']")
16 
17 username_input.send_keys("<your username>")
18 password_input.send_keys("<your password>")
19 
20 login_button = browser.find_element_by_xpath("//button[@type='submit']")
21 login_button.click()
22 
23 sleep(5)
24 
25 browser.close()

Here’s a breakdown of the changes:

  1. Line 12 sleeps for two seconds to allow the page to load.
  2. Lines 14 and 15 find username and password inputs by CSS. You could use any other method that you prefer.
  3. Lines 17 and 18 type your username and password in their respective inputs. Don’t forget to fill in <your username> and <your password>!
  4. Line 20 finds the login button by XPath.
  5. Line 21 clicks on the login button.

Run the script and you’ll be automatically logged in to your Instagram profile.

You’re off to a good start with your Instagram bot. If you were to continue writing this script, then the rest would look very similar. You would find the posts that you like by scrolling down your feed, find the like button by CSS, click on it, find the comments section, leave a comment, and continue.

The good news is that all of those steps can be handled by InstaPy. But before you jump into using InstaPy, there is one other thing that you should know about to better understand how InstaPy works: the Page Object Pattern.

How to Use the Page Object Pattern

Now that you’ve written the login code, how would you write a test for it? It would look something like the following:

def test_login_page(browser):
    browser.get('https://www.instagram.com/accounts/login/')
    username_input = browser.find_element_by_css_selector("input[name='username']")
    password_input = browser.find_element_by_css_selector("input[name='password']")
    username_input.send_keys("<your username>")
    password_input.send_keys("<your password>")
    login_button = browser.find_element_by_xpath("//button[@type='submit']")
    login_button.click()

    errors = browser.find_elements_by_css_selector('#error_message')
    assert len(errors) == 0

Can you see what’s wrong with this code? It doesn’t follow the DRY principle. That is, the code is duplicated in both the application and the test code.

Duplicating code is especially bad in this context because Selenium code is dependent on UI elements, and UI elements tend to change. When they do change, you want to update your code in one place. That’s where the Page Object Pattern comes in.

With this pattern, you create page object classes for the most important pages or fragments that provide interfaces that are straightforward to program to and that hide the underlying widgetry in the window. With this in mind, you can rewrite the code above and create a HomePage class and a LoginPage class:

from time import sleep

class LoginPage:
    def __init__(self, browser):
        self.browser = browser

    def login(self, username, password):
        username_input = self.browser.find_element_by_css_selector("input[name='username']")
        password_input = self.browser.find_element_by_css_selector("input[name='password']")
        username_input.send_keys(username)
        password_input.send_keys(password)
        login_button = self.browser.find_element_by_xpath("//button[@type='submit']")
        login_button.click()
        sleep(5)

class HomePage:
    def __init__(self, browser):
        self.browser = browser
        self.browser.get('https://www.instagram.com/')

    def go_to_login_page(self):
        self.browser.find_element_by_xpath("//a[text()='Log in']").click()
        sleep(2)
        return LoginPage(self.browser)

The code is the same except that the home page and the login page are represented as classes. The classes encapsulate the mechanics required to find and manipulate the data in the UI. That is, there are methods and accessors that allow the software to do anything a human can.

One other thing to note is that when you navigate to another page using a page object, it returns a page object for the new page. Note the returned value of go_to_login_page(). If you had another class called FeedPage, then login() of the LoginPage class would return an instance of that: return FeedPage(self.browser).

Here’s how you can put the Page Object Pattern to use:

from selenium import webdriver

browser = webdriver.Firefox()
browser.implicitly_wait(5)

home_page = HomePage(browser)
login_page = home_page.go_to_login_page()
login_page.login("<your username>", "<your password>")

browser.close()

It looks much better, and the test above can now be rewritten to look like this:

def test_login_page(browser):
    home_page = HomePage(browser)
    login_page = home_page.go_to_login_page()
    login_page.login("<your username>", "<your password>")

    errors = browser.find_elements_by_css_selector('#error_message')
    assert len(errors) == 0

With these changes, you won’t have to touch your tests if something changes in the UI.

For more information on the Page Object Pattern, refer to the official documentation and to Martin Fowler’s article.

Now that you’re familiar with both Selenium and the Page Object Pattern, you’ll feel right at home with InstaPy. You’ll build a basic bot with it next.

Note: Both Selenium and the Page Object Pattern are widely used for other websites, not just for Instagram.

How to Build an Instagram Bot With InstaPy

In this section, you’ll use InstaPy to build an Instagram bot that will automatically like, follow, and comment on different posts. First, you’ll need to install InstaPy:

$ python3 -m pip install instapy

This will install instapy on your system.

Note: The best practice is to use virtual environments for every project so that the dependencies are isolated.
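
For example, a minimal setup using the built-in venv module might look like this (the environment name venv is arbitrary):

$ python3 -m venv venv
$ source venv/bin/activate
$ python3 -m pip install instapy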

Essential Features

Now you can rewrite the code above with InstaPy so that you can compare the two options. First, create another Python file and put the following code in it:

from instapy import InstaPy

InstaPy(username="<your_username>", password="<your_password>").login()

Replace the username and password with yours, run the script, and voilà! With just one line of code, you achieved the same result.

Even though your results are the same, you can see that the behavior isn’t exactly the same. In addition to simply logging in to your profile, InstaPy does some other things, such as checking your internet connection and the status of the Instagram servers. This can be observed directly on the browser or in the logs:

INFO [2019-12-17 22:03:19] [username]  -- Connection Checklist [1/3] (Internet Connection Status)
INFO [2019-12-17 22:03:20] [username]  - Internet Connection Status: ok
INFO [2019-12-17 22:03:20] [username]  - Current IP is "17.283.46.379" and it's from "Germany/DE"
INFO [2019-12-17 22:03:20] [username]  -- Connection Checklist [2/3] (Instagram Server Status)
INFO [2019-12-17 22:03:26] [username]  - Instagram WebSite Status: Currently Up

Pretty good for one line of code, isn’t it? Now it’s time to make the script do more interesting things than just logging in.

For the purpose of this example, assume that your profile is all about cars, and that your bot is intended to interact with the profiles of people who are also interested in cars.

First, you can like some posts that are tagged #bmw or #mercedes using like_by_tags():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.like_by_tags(["bmw", "mercedes"], amount=5)

Here, you gave the method a list of tags to like and the number of posts to like for each given tag. In this case, you instructed it to like ten posts, five for each of the two tags. But take a look at what happens after you run the script:

INFO [2019-12-17 22:15:58] [username]  Tag [1/2]
INFO [2019-12-17 22:15:58] [username]  --> b'bmw'
INFO [2019-12-17 22:16:07] [username]  desired amount: 14  |  top posts [disabled]: 9  |  possible posts: 43726739
INFO [2019-12-17 22:16:13] [username]  Like# [1/14]
INFO [2019-12-17 22:16:13] [username]  https://www.instagram.com/p/B6MCcGcC3tU/
INFO [2019-12-17 22:16:15] [username]  Image from: b'mattyproduction'
INFO [2019-12-17 22:16:15] [username]  Link: b'https://www.instagram.com/p/B6MCcGcC3tU/'
INFO [2019-12-17 22:16:15] [username]  Description: b'Mal etwas anderes \xf0\x9f\x91\x80\xe2\x98\xba\xef\xb8\x8f Bald ist das komplette Video auf YouTube zu finden (n\xc3\xa4here Infos werden folgen). Vielen Dank an @patrick_jwki @thehuthlife  und @christic_  f\xc3\xbcr das bereitstellen der Autos \xf0\x9f\x94\xa5\xf0\x9f\x98\x8d#carporn#cars#tuning#bagged#bmw#m2#m2competition#focusrs#ford#mk3#e92#m3#panasonic#cinematic#gh5s#dji#roninm#adobe#videography#music#bimmer#fordperformance#night#shooting#'
INFO [2019-12-17 22:16:15] [username]  Location: b'K\xc3\xb6ln, Germany'
INFO [2019-12-17 22:16:51] [username]  --> Image Liked!
INFO [2019-12-17 22:16:56] [username]  --> Not commented
INFO [2019-12-17 22:16:57] [username]  --> Not following
INFO [2019-12-17 22:16:58] [username]  Like# [2/14]
INFO [2019-12-17 22:16:58] [username]  https://www.instagram.com/p/B6MDK1wJ-Kb/
INFO [2019-12-17 22:17:01] [username]  Image from: b'davs0'
INFO [2019-12-17 22:17:01] [username]  Link: b'https://www.instagram.com/p/B6MDK1wJ-Kb/'
INFO [2019-12-17 22:17:01] [username]  Description: b'Someone said cloud? \xf0\x9f\xa4\x94\xf0\x9f\xa4\xad\xf0\x9f\x98\x88 \xe2\x80\xa2\n\xe2\x80\xa2\n\xe2\x80\xa2\n\xe2\x80\xa2\n#bmw #bmwrepost #bmwm4 #bmwm4gts #f82 #bmwmrepost #bmwmsport #bmwmperformance #bmwmpower #bmwm4cs #austinyellow #davs0 #mpower_official #bmw_world_ua #bimmerworld #bmwfans #bmwfamily #bimmers #bmwpost #ultimatedrivingmachine #bmwgang #m3f80 #m5f90 #m4f82 #bmwmafia #bmwcrew #bmwlifestyle'
INFO [2019-12-17 22:17:34] [username]  --> Image Liked!
INFO [2019-12-17 22:17:37] [username]  --> Not commented
INFO [2019-12-17 22:17:38] [username]  --> Not following

By default, InstaPy will like the first nine top posts in addition to your amount value. In this case, that brings the total number of likes per tag to fourteen (nine top posts plus the five you specified in amount).

Also note that InstaPy logs every action it takes. As you can see above, it mentions which post it liked as well as its link, description, location, and whether the bot commented on the post or followed the author.

You may have noticed that there are delays after almost every action. That’s by design. It prevents your profile from getting banned on Instagram.

Now, you probably don’t want your bot liking inappropriate posts. To prevent that from happening, you can use set_dont_like():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.set_dont_like(["naked", "nsfw"])
session.like_by_tags(["bmw", "mercedes"], amount=5)

With this change, posts that have the words naked or nsfw in their descriptions won’t be liked. Note that the settings call comes before like_by_tags() so that it is in effect when the liking happens. You can flag any other words that you want your bot to avoid.

Next, you can tell the bot to not only like the posts but also to follow some of the authors of those posts. You can do that with set_do_follow():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.set_dont_like(["naked", "nsfw"])
session.set_do_follow(True, percentage=50)
session.like_by_tags(["bmw", "mercedes"], amount=5)

If you run the script now, then the bot will follow fifty percent of the users whose posts it liked. As usual, every action will be logged.

You can also leave some comments on the posts. There are two things that you need to do. First, enable commenting with set_do_comment():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.set_dont_like(["naked", "nsfw"])
session.set_do_follow(True, percentage=50)
session.set_do_comment(True, percentage=50)
session.like_by_tags(["bmw", "mercedes"], amount=5)

Next, tell the bot what comments to leave with set_comments():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.set_dont_like(["naked", "nsfw"])
session.set_do_follow(True, percentage=50)
session.set_do_comment(True, percentage=50)
session.set_comments(["Nice!", "Sweet!", "Beautiful :heart_eyes:"])
session.like_by_tags(["bmw", "mercedes"], amount=5)

Run the script and the bot will leave one of those three comments on half the posts that it interacts with.

Now that you’re done with the basic settings, it’s a good idea to end the session with end():

from instapy import InstaPy

session = InstaPy(username="<your_username>", password="<your_password>")
session.login()
session.set_dont_like(["naked", "nsfw"])
session.set_do_follow(True, percentage=50)
session.set_do_comment(True, percentage=50)
session.set_comments(["Nice!", "Sweet!", "Beautiful :heart_eyes:"])
session.like_by_tags(["bmw", "mercedes"], amount=5)
session.end()

This will close the browser, save the logs, and prepare a report that you can see in the console output.

Additional Features in InstaPy

InstaPy is a sizable project that has a lot of thoroughly documented features. The good news is that if you’re feeling comfortable with the features you used above, then the rest should feel pretty similar. This section will outline some of the more useful features of InstaPy.

Quota Supervisor

You can’t scrape Instagram all day, every day. The service will quickly notice that you’re running a bot and will ban some of its actions. That’s why it’s a good idea to set quotas on some of your bot’s actions. Take the following for example:

session.set_quota_supervisor(enabled=True, peak_comments_daily=240, peak_comments_hourly=21)

The bot will keep commenting until it reaches its hourly and daily limits. It will resume commenting after the quota period has passed.

Headless Browser

This feature allows you to run your bot without the GUI of the browser. This is super useful if you want to deploy your bot to a server where you may not have or need the graphical interface. It’s also less CPU intensive, so it improves performance. You can use it like so:

session = InstaPy(username='test', password='test', headless_browser=True)

Note that you set this flag when you initialize the InstaPy object.

Using AI to Analyze Posts

Earlier you saw how to ignore posts that contain inappropriate words in their descriptions. What if the description is good but the image itself is inappropriate? You can integrate your InstaPy bot with ClarifAI, which offers image and video recognition services:

session.set_use_clarifai(enabled=True, api_key='<your_api_key>')
session.clarifai_check_img_for(['nsfw'])

Now your bot won’t like or comment on any image that ClarifAI considers NSFW. You get 5,000 free API calls per month.

Relationship Bounds

It’s often a waste of time to interact with posts by people who have a lot of followers. In such cases, it’s a good idea to set some relationship bounds so that your bot doesn’t waste your precious computing resources:

session.set_relationship_bounds(enabled=True, max_followers=8500)

With this, your bot won’t interact with posts by users who have more than 8,500 followers.

For many more features and configurations in InstaPy, check out the documentation.

Conclusion

InstaPy allows you to automate your Instagram activities with minimal fuss and effort. It’s a very flexible tool with a lot of useful features.

In this tutorial, you learned:

  • How Instagram bots work
  • How to automate a browser with Selenium and the Page Object Pattern
  • How to build a bot that likes, follows, and comments with InstaPy
  • How to use some of InstaPy’s additional features, such as quotas and headless mode

Read the InstaPy documentation and experiment with your bot a little bit. Soon you’ll start getting new followers and likes with a minimal amount of effort. I gained a few new followers myself while writing this tutorial.

If there’s anything you’d like to ask or share, then please reach out in the comments below.



April 06, 2020 02:00 PM UTC


Stack Abuse

Unpacking in Python: Beyond Parallel Assignment

Introduction

Unpacking in Python refers to an operation that consists of assigning an iterable of values to a tuple (or list) of variables in a single assignment statement. As a complement, the term packing can be used when we collect several values in a single variable using the iterable unpacking operator, *.

Historically, Python developers have generically referred to this kind of operation as tuple unpacking. However, since this Python feature has turned out to be quite useful and popular, it's been generalized to all kinds of iterables. Nowadays, a more modern and accurate term would be iterable unpacking.

In this tutorial, we'll learn what iterable unpacking is and how we can take advantage of this Python feature to make our code more readable, maintainable, and pythonic.

Additionally, we'll cover some practical examples of how to use the iterable unpacking feature in the context of assignment operations, for loops, function definitions, and function calls.

Packing and Unpacking in Python

Python allows a tuple (or list) of variables to appear on the left side of an assignment operation. Each variable in the tuple can receive one value (or more, if we use the * operator) from an iterable on the right side of the assignment.

For historical reasons, Python developers used to call this tuple unpacking. However, since this feature has been generalized to all kinds of iterables, a more accurate term would be iterable unpacking, and that's what we'll call it in this tutorial.

Unpacking operations have been quite popular among Python developers because they can make our code more readable and elegant. Let's take a closer look at unpacking in Python and see how this feature can improve our code.

Unpacking Tuples

In Python, we can put a tuple of variables on the left side of an assignment operator (=) and a tuple of values on the right side. The values on the right will be automatically assigned to the variables on the left according to their position in the tuple. This is commonly known as tuple unpacking in Python. Check out the following example:

>>> (a, b, c) = (1, 2, 3)
>>> a
1
>>> b
2
>>> c
3

When we put tuples on both sides of an assignment operator, a tuple unpacking operation takes place. The values on the right are assigned to the variables on the left according to their relative position in each tuple. As you can see in the above example, a will be 1, b will be 2, and c will be 3.

To create a tuple object, we don't need to use a pair of parentheses () as delimiters. This also works for tuple unpacking, so the following syntaxes are equivalent:

>>> (a, b, c) = 1, 2, 3
>>> a, b, c = (1, 2, 3)
>>> a, b, c = 1, 2, 3

Since all these variations are valid Python syntax, we can use any of them, depending on the situation. Arguably, the last syntax is more commonly used when it comes to unpacking in Python.

When we are unpacking values into variables using tuple unpacking, the number of variables on the left side tuple must exactly match the number of values on the right side tuple. Otherwise, we'll get a ValueError.

For example, in the following code, we use two variables on the left and three values on the right. This will raise a ValueError telling us that there are too many values to unpack:

>>> a, b = 1, 2, 3
Traceback (most recent call last):
  ...
ValueError: too many values to unpack (expected 2)

Note: The only exception to this is when we use the * operator to pack several values in one variable as we'll see later on.

On the other hand, if we use more variables than values, then we'll get a ValueError but this time the message says that there are not enough values to unpack:

>>> a, b, c = 1, 2
Traceback (most recent call last):
  ...
ValueError: not enough values to unpack (expected 3, got 2)

If we use a different number of variables and values in a tuple unpacking operation, then we'll get a ValueError. That's because Python needs to unambiguously know what value goes into what variable, so it can do the assignment accordingly.

Unpacking Iterables

The tuple unpacking feature got so popular among Python developers that the syntax was extended to work with any iterable object. The only requirement is that the iterable yields exactly one item per variable in the receiving tuple (or list).

Check out the following examples of how iterable unpacking works in Python:

>>> # Unpacking strings
>>> a, b, c = '123'
>>> a
'1'
>>> b
'2'
>>> c
'3'
>>> # Unpacking lists
>>> a, b, c = [1, 2, 3]
>>> a
1
>>> b
2
>>> c
3
>>> # Unpacking generators
>>> gen = (i ** 2 for i in range(3))
>>> a, b, c = gen
>>> a
0
>>> b
1
>>> c
4
>>> # Unpacking dictionaries (keys, values, and items)
>>> my_dict = {'one': 1, 'two':2, 'three': 3}
>>> a, b, c = my_dict  # Unpack keys
>>> a
'one'
>>> b
'two'
>>> c
'three'
>>> a, b, c = my_dict.values()  # Unpack values
>>> a
1
>>> b
2
>>> c
3
>>> a, b, c = my_dict.items()  # Unpacking key-value pairs
>>> a
('one', 1)
>>> b
('two', 2)
>>> c
('three', 3)

When it comes to unpacking in Python, we can use any iterable on the right side of the assignment operator. The left side can be filled with a tuple or with a list of variables. Check out the following example, in which we use a list of variables on the left side of the assignment statement:

>>> [a, b, c] = 1, 2, 3
>>> a
1
>>> b
2
>>> c
3

It works the same way if we use a range object:

>>> x, y, z = range(3)
>>> x
0
>>> y
1
>>> z
2

Even though this is valid Python syntax, it's not commonly used in real code and may be a little bit confusing for beginner Python developers.

Finally, we can also use set objects in unpacking operations. However, since sets are unordered collections, the order of the assignments can be arbitrary and can lead to subtle bugs. Check out the following example:

>>> a, b, c = {'a', 'b', 'c'}
>>> a
'c'
>>> b
'b'
>>> c
'a'

If we use sets in unpacking operations, then the final order of the assignments can be quite different from what we want and expect. So, it's best to avoid using sets in unpacking operations unless the order of assignment isn't important to our code.

Packing With the * Operator

The * operator is known, in this context, as the tuple (or iterable) unpacking operator. It extends the unpacking functionality to allow us to collect or pack multiple values in a single variable. In the following example, we pack a tuple of values into a single variable by using the * operator:

>>> *a, = 1, 2
>>> a
[1, 2]

For this code to work, the left side of the assignment must be a tuple (or a list). That's why we use a trailing comma. This tuple can contain as many variables as we need. However, it can only contain one starred expression.

We can form a starred expression using the unpacking operator, *, along with a valid Python identifier, just like the *a in the above code. The rest of the variables in the left side tuple are called mandatory variables because they must be filled with concrete values; otherwise, we'll get an error. Here's how this works in practice.

Packing the trailing values in b:

>>> a, *b = 1, 2, 3
>>> a
1
>>> b
[2, 3]

Packing the starting values in a:

>>> *a, b = 1, 2, 3
>>> a
[1, 2]
>>> b
3

Packing one value in a because b and c are mandatory:

>>> *a, b, c = 1, 2, 3
>>> a
[1]
>>> b
2
>>> c
3

Packing no values in a (a defaults to []) because b, c, and d are mandatory:

>>> *a, b, c, d = 1, 2, 3
>>> a
[]
>>> b
1
>>> c
2
>>> d
3

Supplying no value for a mandatory variable (e), so an error occurs:

>>> *a, b, c, d, e = 1, 2, 3
 ...
ValueError: not enough values to unpack (expected at least 4, got 3)

Packing values in a variable with the * operator can be handy when we need to collect the elements of a generator in a single variable without using the list() function. In the following examples, we use the * operator to pack the elements of a generator expression and a range object into individual variables:

>>> gen = (2 ** x for x in range(10))
>>> gen
<generator object <genexpr> at 0x7f44613ebcf0>
>>> *g, = gen
>>> g
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
>>> ran = range(10)
>>> *r, = ran
>>> r
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In these examples, the * operator packs the elements in gen and ran into g and r respectively. With this syntax, we avoid the need to call list() to create a list of values from a range object, a generator expression, or a generator function.

Notice that we can't use the unpacking operator, *, to pack multiple values into one variable without adding a trailing comma to the variable on the left side of the assignment. So, the following code won't work:

>>> *r = range(10)
  File "<input>", line 1
SyntaxError: starred assignment target must be in a list or tuple

If we try to use the * operator to pack several values into a single variable, then we need to use the singleton tuple syntax. For example, to make the above example work, we just need to add a comma after the variable r, like in *r, = range(10).

Using Packing and Unpacking in Practice

Packing and unpacking operations can be quite useful in practice. They can make your code clear, readable, and pythonic. Let's take a look at some common use-cases of packing and unpacking in Python.

Assigning in Parallel

One of the most common use-cases of unpacking in Python is what we can call parallel assignment. Parallel assignment allows you to assign the values in an iterable to a tuple (or list) of variables in a single and elegant statement.

For example, let's suppose we have a database about the employees in our company and we need to assign each item in the list to a descriptive variable. If we ignore how iterable unpacking works in Python, we can find ourselves writing code like this:

>>> employee = ["John Doe", "40", "Software Engineer"]
>>> name = employee[0]
>>> age = employee[1]
>>> job = employee[2]
>>> name
'John Doe'
>>> age
'40'
>>> job
'Software Engineer'

Even though this code works, the index handling can be clumsy, hard to type, and confusing. A cleaner, more readable, and pythonic solution can be coded as follows:

>>> name, age, job = ["John Doe", "40", "Software Engineer"]
>>> name
'John Doe'
>>> age
'40'
>>> job
'Software Engineer'

Using unpacking in Python, we can solve the problem of the previous example with a single, straightforward, and elegant statement. This tiny change makes our code easier to read and understand for newcomer developers.

Swapping Values Between Variables

Another elegant application of unpacking in Python is swapping values between variables without using a temporary or auxiliary variable. For example, let's suppose we need to swap the values of two variables a and b. To do this, we can stick to the traditional solution and use a temporary variable to store the value to be swapped as follows:

>>> a = 100
>>> b = 200
>>> temp = a
>>> a = b
>>> b = temp
>>> a
200
>>> b
100

This procedure takes three steps and a new temporary variable. If we use unpacking in Python, then we can achieve the same result in a single and concise step:

>>> a = 100
>>> b = 200
>>> a, b = b, a
>>> a
200
>>> b
100

In the statement a, b = b, a, we're reassigning a to b and b to a in one line of code. This is a lot more readable and straightforward. Also, notice that with this technique, there is no need for a new temporary variable.
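
The same technique extends to rearranging items inside a mutable sequence, such as swapping the first and last elements of a list (this example is just an illustration):

>>> numbers = [1, 2, 3, 4]
>>> numbers[0], numbers[-1] = numbers[-1], numbers[0]
>>> numbers
[4, 2, 3, 1]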

Collecting Multiple Values With *

When we're working with some algorithms, there may be situations in which we need to split the values of an iterable or a sequence into chunks of values for further processing. The following example shows how to use a list and slicing operations to do so:

>>> seq = [1, 2, 3, 4]
>>> first, body, last = seq[0], seq[1:3], seq[-1]
>>> first, body, last
(1, [2, 3], 4)
>>> first
1
>>> body
[2, 3]
>>> last
4

Even though this code works as we expect, dealing with indices and slices can be a little bit annoying, difficult to read, and confusing for beginners. It also has the drawback of making the code rigid and difficult to maintain. In this situation, the iterable unpacking operator, *, and its ability to pack several values in a single variable can be a great tool. Check out this refactoring of the above code:

>>> seq = [1, 2, 3, 4]
>>> first, *body, last = seq
>>> first, body, last
(1, [2, 3], 4)
>>> first
1
>>> body
[2, 3]
>>> last
4

The line first, *body, last = seq makes the magic happen here. The iterable unpacking operator, *, collects the elements in the middle of seq in body. This makes our code more readable, maintainable, and flexible. You may be thinking, why more flexible? Well, suppose that seq changes its length down the road and you still need to collect the middle elements in body. In this case, since we're using unpacking in Python, no changes are needed for our code to work. Check out this example:

>>> seq = [1, 2, 3, 4, 5, 6]
>>> first, *body, last = seq
>>> first, body, last
(1, [2, 3, 4, 5], 6)

If we were using sequence slicing instead of iterable unpacking in Python, then we would need to update our indices and slices to correctly catch the new values.

The use of the * operator to pack several values in a single variable can be applied in a variety of configurations, provided that Python can unambiguously determine what element (or elements) to assign to each variable. Take a look at the following examples:

>>> *head, a, b = range(5)
>>> head, a, b
([0, 1, 2], 3, 4)
>>> a, *body, b = range(5)
>>> a, body, b
(0, [1, 2, 3], 4)
>>> a, b, *tail = range(5)
>>> a, b, tail
(0, 1, [2, 3, 4])

We can move the * operator in the tuple (or list) of variables to collect the values according to our needs. The only condition is that Python can unambiguously determine which value (or values) to assign to each variable.

It's important to note that we can't use more than one starred expression in the assignment. If we do so, then we'll get a SyntaxError as follows:

>>> *a, *b = range(5)
  File "<input>", line 1
SyntaxError: two starred expressions in assignment

If we use two or more * in an assignment expression, then we'll get a SyntaxError telling us that two starred expressions were found. This is because Python can't unambiguously determine what value (or values) we want to assign to each variable.

Dropping Unneeded Values With *

Another common use-case of the * operator is to use it with a dummy variable name to drop some useless or unneeded values. Check out the following example:

>>> a, b, *_ = 1, 2, 0, 0, 0, 0
>>> a
1
>>> b
2
>>> _
[0, 0, 0, 0]

For a more insightful example of this use-case, suppose we're developing a script that needs to determine the Python version we're using. To do this, we can use the sys.version_info attribute. This attribute returns a tuple containing the five components of the version number: major, minor, micro, releaselevel, and serial. But we just need major, minor, and micro for our script to work, so we can drop the rest. Here's an example:

>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=8, micro=1, releaselevel='final', serial=0)
>>> major, minor, micro, *_ = sys.version_info
>>> major, minor, micro
(3, 8, 1)

Now, we have three new variables with the information we need. The rest of the information is stored in the dummy variable _, which can be ignored by our program. This makes it clear to newcomer developers that we don't want to (or need to) use the information stored in _, because this character has no apparent meaning.

Note: By default, the underscore character _ is used by the Python interpreter to store the resulting value of the statements we run in an interactive session. So, in this context, the use of this character to identify dummy variables can be ambiguous.
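
Here's a quick illustration of that interpreter behavior:

>>> 2 + 2
4
>>> _
4

Once we assign to _ ourselves, as in the examples above, the name simply refers to our packed values instead.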

Returning Tuples in Functions

Python functions can return several values separated by commas. Since we can define tuple objects without using parentheses, this kind of operation can be interpreted as returning a tuple of values. If we code a function that returns multiple values, then we can perform iterable packing and unpacking operations with the returned values.

Check out the following example in which we define a function to calculate the square and cube of a given number:

>>> def powers(number):
...     return number, number ** 2, number ** 3
...
>>> # Packing returned values in a tuple
>>> result = powers(2)
>>> result
(2, 4, 8)
>>> # Unpacking returned values to multiple variables
>>> number, square, cube = powers(2)
>>> number
2
>>> square
4
>>> cube
8
>>> *_, cube = powers(2)
>>> cube
8

If we define a function that returns comma-separated values, then we can do any packing or unpacking operation on these values.
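
Several built-in functions follow the same pattern. For example, divmod() returns the quotient and remainder of a division as a tuple, which we can unpack directly:

>>> quotient, remainder = divmod(7, 3)
>>> quotient
2
>>> remainder
1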

Merging Iterables With the * Operator

Another interesting use-case for the unpacking operator, *, is the ability to merge several iterables into a final sequence. This functionality works for lists, tuples, and sets. Take a look at the following examples:

>>> my_tuple = (1, 2, 3)
>>> (0, *my_tuple, 4)
(0, 1, 2, 3, 4)
>>> my_list = [1, 2, 3]
>>> [0, *my_list, 4]
[0, 1, 2, 3, 4]
>>> my_set = {1, 2, 3}
>>> {0, *my_set, 4}
{0, 1, 2, 3, 4}
>>> [*my_set, *my_list, *my_tuple, *range(1, 4)]
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
>>> my_str = "123"
>>> [*my_set, *my_list, *my_tuple, *range(1, 4), *my_str]
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, '1', '2', '3']

We can use the iterable unpacking operator, *, when defining sequences to unpack the elements of a subsequence (or iterable) into the final sequence. This will allow us to create sequences on the fly from other existing sequences without calling methods like append(), insert(), and so on.

The last two examples show that this is also a more readable and efficient way to concatenate iterables. Instead of writing list(my_set) + my_list + list(my_tuple) + list(range(1, 4)) + list(my_str) we just write [*my_set, *my_list, *my_tuple, *range(1, 4), *my_str].

Unpacking Dictionaries With the ** Operator

In the context of unpacking in Python, the ** operator is called the dictionary unpacking operator. The use of this operator was extended by PEP 448. Now, we can use it in function calls, in comprehensions and generator expressions, and in displays.

A basic use-case for the dictionary unpacking operator is to merge multiple dictionaries into one final dictionary with a single expression. Let's see how this works:

>>> numbers = {"one": 1, "two": 2, "three": 3}
>>> letters = {"a": "A", "b": "B", "c": "C"}
>>> combination = {**numbers, **letters}
>>> combination
{'one': 1, 'two': 2, 'three': 3, 'a': 'A', 'b': 'B', 'c': 'C'}

If we use the dictionary unpacking operator inside a dictionary display, then we can unpack dictionaries and combine them to create a final dictionary that includes the key-value pairs of the original dictionaries, just like we did in the above code.

An important point to note is that, if the dictionaries we're trying to merge have repeated or common keys, then the values of the right-most dictionary will override the values of the left-most dictionary. Here's an example:

>>> letters = {"a": "A", "b": "B", "c": "C"}
>>> vowels = {"a": "a", "e": "e", "i": "i", "o": "o", "u": "u"}
>>> {**letters, **vowels}
{'a': 'a', 'b': 'B', 'c': 'C', 'e': 'e', 'i': 'i', 'o': 'o', 'u': 'u'}

Since the a key is present in both dictionaries, the value that prevails comes from vowels, which is the right-most dictionary. This happens because Python starts adding the key-value pairs from left to right. If, in the process, Python finds keys that already exist, then the interpreter updates those keys with the new value. That's why the value of the a key is lowercased in the above example.
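
This overriding behavior also gives us a concise way to create an updated copy of a dictionary without mutating the original:

>>> letters = {"a": "A", "b": "B", "c": "C"}
>>> updated = {**letters, "a": "a"}
>>> updated
{'a': 'a', 'b': 'B', 'c': 'C'}
>>> letters
{'a': 'A', 'b': 'B', 'c': 'C'}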

Unpacking in For-Loops

We can also use iterable unpacking in the context of for loops. When we run a for loop, the loop assigns one item of its iterable to the target variable in every iteration. If the item to be assigned is an iterable, then we can use a tuple of target variables. The loop will unpack the iterable at hand into the tuple of target variables.

As an example, let's suppose we have a file containing data about the sales of a company as follows:

Product Price Sold Units
Pencil 0.22 1500
Notebook 1.30 550
Eraser 0.75 1000
... ... ...

From this table, we can build a list of three-element tuples. Each tuple will contain the name of the product, the price, and the sold units. With this information, we want to calculate the income of each product. To do this, we can use a for loop like this:

>>> sales = [("Pencil", 0.22, 1500), ("Notebook", 1.30, 550), ("Eraser", 0.75, 1000)]
>>> for item in sales:
...     print(f"Income for {item[0]} is: {item[1] * item[2]}")
...
Income for Pencil is: 330.0
Income for Notebook is: 715.0
Income for Eraser is: 750.0

This code works as expected. However, we're using indices to access the individual elements of each tuple. This can be difficult to read and understand for newcomer developers.

Let's take a look at an alternative implementation using unpacking in Python:

>>> for product, price, sold_units in sales:
...     print(f"Income for {product} is: {price * sold_units}")
...
Income for Pencil is: 330.0
Income for Notebook is: 715.0
Income for Eraser is: 750.0

We're now using iterable unpacking in our for loop. This makes our code way more readable and maintainable because we're using descriptive names to identify the elements of each tuple. This tiny change will allow a newcomer developer to quickly understand the logic behind the code.
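
Tuple unpacking in for loops also combines nicely with enumerate() when we need the position of each item as well. Reusing the sales list from above:

>>> for index, (product, price, sold_units) in enumerate(sales):
...     print(f"{index}: Income for {product} is: {price * sold_units}")
...
0: Income for Pencil is: 330.0
1: Income for Notebook is: 715.0
2: Income for Eraser is: 750.0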

It's also possible to use the * operator in a for loop to pack several items in a single target variable:

>>> for first, *rest in [(1, 2, 3), (4, 5, 6, 7)]:
...     print("First:", first)
...     print("Rest:", rest)
...
First: 1
Rest: [2, 3]
First: 4
Rest: [5, 6, 7]

In this for loop, we're catching the first element of each sequence in first. Then the * operator catches a list of values in its target variable rest.

Finally, the structure of the target variables must agree with the structure of the iterable. Otherwise, we'll get an error. Take a look at the following example:

>>> data = [((1, 2), 2), ((2, 3), 3)]
>>> for (a, b), c in data:
...     print(a, b, c)
...
1 2 2
2 3 3
>>> for a, b, c in data:
...     print(a, b, c)
...
Traceback (most recent call last):
  ...
ValueError: not enough values to unpack (expected 3, got 2)

In the first loop, the structure of the target variables, (a, b), c, agrees with the structure of the items in the iterable, ((1, 2), 2). In this case, the loop works as expected. In contrast, the second loop uses a structure of target variables that doesn't agree with the structure of the items in the iterable, so the loop fails and raises a ValueError.

Packing and Unpacking in Functions

We can also use Python's packing and unpacking features when defining and calling functions. This is quite a useful and popular use-case of packing and unpacking in Python.

In this section, we'll cover the basics of how to use packing and unpacking in Python functions either in the function definition or in the function call.

Note: For a more insightful and detailed material on these topics, check out Variable-Length Arguments in Python with *args and **kwargs.

Defining Functions With * and **

We can use the * and ** operators in the signature of Python functions. This will allow us to call the function with a variable number of positional arguments (*), with a variable number of keyword arguments (**), or both. Let's consider the following function:

>>> def func(required, *args, **kwargs):
...     print(required)
...     print(args)
...     print(kwargs)
...
>>> func("Welcome to...", 1, 2, 3, site='StackAbuse.com')
Welcome to...
(1, 2, 3)
{'site': 'StackAbuse.com'}

The above function requires at least one argument called required. It can accept a variable number of positional and keyword arguments as well. In this case, the * operator collects or packs extra positional arguments in a tuple called args and the ** operator collects or packs extra keyword arguments in a dictionary called kwargs. Both args and kwargs are optional and automatically default to () and {} respectively.

Even though the names args and kwargs are widely used by the Python community, they're not a requirement for these techniques to work. The syntax just requires * or ** followed by a valid identifier. So, if you can give meaningful names to these arguments, then do it. That will certainly improve your code's readability.
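
For example, a small function like the following (an illustration, not part of the original examples) reads much better with numbers than with the generic args:

>>> def average(*numbers):
...     return sum(numbers) / len(numbers)
...
>>> average(1, 2, 3, 4)
2.5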

Calling Functions With * and **

When calling functions, we can also benefit from the use of the * and ** operators to unpack collections of arguments into separate positional or keyword arguments, respectively. This is the inverse of using * and ** in the signature of a function. In the signature, the operators mean collect or pack a variable number of arguments in one identifier. In the call, they mean unpack an iterable into several arguments.

Here's a basic example of how this works:

>>> def func(welcome, to, site):
...     print(welcome, to, site)
...
>>> func(*["Welcome", "to"], **{"site": 'StackAbuse.com'})
Welcome to StackAbuse.com

Here, the * operator unpacks sequences like ["Welcome", "to"] into positional arguments. Similarly, the ** operator unpacks dictionaries into arguments whose names match the keys of the unpacked dictionary.
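
A classic illustration of unpacking in a function call is transposing a matrix by unpacking its rows into the built-in zip():

>>> matrix = [[1, 2, 3], [4, 5, 6]]
>>> list(zip(*matrix))
[(1, 4), (2, 5), (3, 6)]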

We can also combine this technique and the one covered in the previous section to write quite flexible functions. Here's an example:

>>> def func(required, *args, **kwargs):
...     print(required)
...     print(args)
...     print(kwargs)
...
>>> func("Welcome to...", *(1, 2, 3), **{"site": 'StackAbuse.com'})
Welcome to...
(1, 2, 3)
{'site': 'StackAbuse.com'}

The use of the * and ** operators, when defining and calling Python functions, will give them extra capabilities and make them more flexible and powerful.

Conclusion

Iterable unpacking turns out to be a pretty useful and popular feature in Python. This feature allows us to unpack an iterable into several variables. On the other hand, packing consists of catching several values into one variable using the unpacking operator, *.

In this tutorial, we've learned how to use iterable unpacking in Python to write more readable, maintainable, and pythonic code.

With this knowledge, we are now able to use iterable unpacking in Python to solve common problems like parallel assignment and swapping values between variables. We're also able to use this Python feature in other structures like for loops, function calls, and function definitions.

April 06, 2020 12:56 PM UTC


Python Software Foundation

Python Software Foundation Fellow Members for Q1 2020

We are happy to announce our newest PSF Fellow Members for Q1 2020!

Q1 2020


Al Sweigart
Website, Twitter, GitHub

Alexandre Savio
Twitter, Website

Darya Chyzhyk
Twitter, GitHub, LinkedIn

Kenneth Love

Kevin O'Brien
Twitter, GitHub, LinkedIn

Serhiy Storchaka

Thea Flowers
Website, Blog, GitHub

Tom Christie
Website, Twitter

Congratulations! Thank you for your continued contributions. We have added you to our Fellow roster online.

The above members have contributed to the Python ecosystem by teaching Python, creating education material, contributing to circuitpython, contributing to and maintaining packaging, organizing Python events and conferences, starting Python communities in their home countries, and overall being great mentors in our community. Each of them continues to help make Python more accessible around the world. To learn more about the new Fellow members, check out their links above.

Let's continue to recognize Pythonistas all over the world for their impact on our community. The criteria for Fellow members are available online: https://www.python.org/psf/fellows/. If you would like to nominate someone to be a PSF Fellow, please send a description of their Python accomplishments and their email address to psf-fellow at python.org. We are accepting nominations for quarter 2 through May 20, 2020.

Help Wanted!


The Fellow Work Group is looking for more members from all around the world! If you are a PSF Fellow and would like to help review nominations, please email us at psf-fellow at python.org. More information is available at: https://www.python.org/psf/fellows/.

April 06, 2020 11:02 AM UTC


Python Insider

Python 2.7.18 release candidate 1 available

A first release candidate for Python 2.7.18 is now available for download. Python 2.7.18 will be the last release of the Python 2.7 series, and thus Python 2.

April 06, 2020 10:40 AM UTC


PyBites

Effective Developers Leverage Their Toolset

Compound interest is the 8th wonder of the world. He who understands it, earns it; he who doesn’t, pays it. - Albert Einstein

Last week I did a couple of shared screen sessions debugging and teaching.

I paused and reflected on the tools I used and how I sharpened my sword over the years.

This is not an article on how to deploy software with Docker, how to use git, or how to set up your env, although it has some shell and Vim goodness.

It's more about how small tweaks made me more productive as a programmer and learner.

Command line

My favorite tool of all time!

A silly example: what day is my birthday?

I could search the web.

I could write a Python program.

Or just use Unix:

$ cal 4 2020
    April 2020
Su Mo Tu We Th Fr Sa
        1  2  3  4
5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30

Nice, this year the 18th will be on a Saturday :)

I often go to my terminal and use simple shell commands before even writing a script (e.g. to iterate over files is as easy as: for i in *; do ls $i; done - the ls can be any operation).

Combining this with some regex (perl -pe 's///g') can be pretty powerful.

Another example are shell aliases, some of my common ones:

$ alias pvenv
alias pvenv='/Library/Frameworks/Python.framework/Versions/3.8/bin/python3.8 -m venv venv && source venv/bin/activate'
$ alias ae
alias ae='source venv/bin/activate'
$ alias brc
alias brc='vim ~/.bashrc'
$ alias lt
alias lt='ls -lrth'

I even use it to go to my Kindle highlights:

$ alias kindle
alias kindle='open https://read.amazon.com/notebook'

I can just type kindle and get to my book notes instantly.

(Talking about notes, I was actually going to write some code to download my notes, but there is already a tool for this: Bookcision, so sometimes the best code is the code you never write!)


OK one more.

I have been a nerd about terminal search for a while, but an easier way is to just use a clever Python tool called howdoi (demonstrated for its elegant code in The Hitchhiker's Guide to Python):

$ alias howdoi
alias howdoi='cd $HOME/code/howdoi && source venv/bin/activate && howdoi'

Check this out:

$ howdoi argparse
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("a")
args = parser.parse_args()

if args.a == 'magic.name':
    print 'You nailed it!'

Text editing

I never felt more awkward than when I had to use Vim in a terminal after starting at Sun Microsystems.

However, when I got past the steep learning curve, I became quite fast, using repeated replacement (dot (.) or :s/string/replace/g), leveraging settings (.vimrc), and later on even Python syntax checking (preventing saving when there are errors in Vim -> this is really cool!)

Here are some more Vim tricks if interested.

Navigating code bases

At first I was typing find . -name '*string*' | xargs grep ... the whole time.

Pretty inefficient.

Till somebody suggested ag (aka The Silver Searcher) and my dev life became so much better.


There are many more examples, the key lessons though are:


Again these are sometimes just small tweaks, but they shave off minutes a day, hours a week, days a year.

Of course, for the more serious work I would write shell or Python scripts (preferably the latter).

It feels awesome knowing the time and effort you invested learning the tools of the craft is seriously paying off.


Keep Calm and Code in Python!

-- Bob

With so many avenues to pursue in Python it can be tough to know what to do. If you're looking for some direction or want to take your Python code and career to the next level, schedule a call with us now. We can help you!

April 06, 2020 08:00 AM UTC


BreadcrumbsCollector

What is Celery beat and how to use it – part 2, patterns and caveats

Celery beat is a nice Celery’s add-on for automatic scheduling periodic tasks (e.g. every hour). For more basic information, see part 1 – What is Celery beat and how to use it.

In this part, we’re gonna talk about common applications of Celery beat, reoccurring patterns and pitfalls waiting for you.

Ensuring a task is only executed one at a time

There are particular tasks we want to run periodically with at most one instance at a time. For example, data synchronization with an external service or doing calculations on our own data. In both cases, it does not make sense to have two identical tasks running at the same time. It may either result in a considerable load spike or lead to data corruption.

The first one will be a nightmare from a maintenance and performance point of view since one has to manually kill duplicated tasks. For example, two or more tasks doing heavy operations on a relational database may consume so many resources that the database will not be able to service usual requests from clients.

When it comes to data corruption, the consequences are even more dreadful. Such a situation may lead to money loss and potentially time-consuming manual intervention in data to undo the damage. For example, imagine we synchronize payments from an external service in a periodic task. If our implementation does not secure us against processing the same payment twice, we could end up giving someone twice as much as they should have received.

Luckily, there are at least two ways to secure against such unpleasant situations:

Separate queue with a dedicated Celery worker with a single process

This solution requires no serious code changes in Celery tasks. Instead, we have to: create a separate queue for the task, route the task to that queue, and run a dedicated Celery worker with a single process that consumes only that queue.

How does it work? Well, a single worker process ensures there is exactly one task processed at a time. If for some reason a task takes longer than anticipated (e.g. longer than the interval of the Celery beat scheduler), another one is not started. It may still pile up in the broker, though.
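
As a rough sketch of that setup (the module, app, and task names here are assumptions, not code from the linked repo), the routing and the worker command could look like this:

# tasks.py - a minimal sketch
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

# Route the must-not-overlap task to its own dedicated queue
app.conf.task_routes = {"tasks.sync_payments": {"queue": "solo"}}

@app.task
def sync_payments():
    ...  # heavy work that must not run concurrently

# Then consume that queue with exactly one worker process:
#   celery -A tasks worker -Q solo --concurrency=1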

For the complete example, see github repo – Celery Beat Example.

On the bright side, this solution is very reliable and trivial. No extra code for handling locks is needed. No code changes in tasks are required. This is a natural solution by design.

However, this solution also has serious disadvantages. One of them is the maintenance of an additional Celery worker. If it is idle for most of the time, it is pure waste. On the other hand, if we have more tasks that could use execution one at a time, we may reuse the same worker. It may still require a bit of fine-tuning plus monitoring to see if we are under- or over-utilizing our dedicated worker.

Use a lock to guarantee only a single task execution at a time

The second solution uses a classic pessimistic lock. When a Celery worker gets a task from the queue, we need to acquire a lock first. If it fails, we abort. If we acquired the lock successfully, we apply a timeout on it (so the lock automatically disappears if a worker crashes) and start work. When we are finished, we can release the lock. This guarantees us to have only one worker at a time processing a given task.

Recipe from Celery Task Cookbook (and why it is not something you want in production)

Celery Task Cookbook provides an explanation and code snippet for achieving that, but continue reading before you put this in production! This implementation is flawed for two reasons:

Using the cache is problematic because it is designed for a different application - caching data. The cache can be unavailable (and this is a 100% normal situation). As far as I remember, Django's cache.add does not inform us whether it failed because a given key exists (the lock has already been acquired) or because the cache server (Memcached in the example) is not working. To mitigate this, we need monitoring & alerting set up to see if our cache is up and running. Otherwise, we would be seeing in logs that processing is already taking place, which is obviously false. Another problem with the cache is that it does not provide an appropriate API for locking. Note that the author of the recipe mentions one has to use a cache backend that has the add operation implemented as an atomic operation. What is more, a cache is supposed to be safe to delete at any time. This is a sign it is a hacky and brittle solution. In the example to come, I will show you how one can leverage Redis to achieve reliable locking.

Before jumping into Redis locking, let's talk about a scenario when a lock-protected task takes longer than the lock timeout. With the code in the example, the lock would expire and if another such task was waiting in the queue, it would start executing even before the previous one finished (whose lock had expired, meh)! That's a catastrophe. Removing the timeout from the lock is not an option because every time a task crashes it would require manual intervention to remove the lock. We may also approach this by introducing a very long expiry time on the lock. That's an option provided we can guarantee the task will finish in a shorter time. When I say guarantee, I do not mean guessing (Oh, it SHOULD last for 4 minutes). Either use timeouts internally or use Celery's feature Time Limits.

Tailored Redis-based lock

Knowing that using the Django cache is not really a reliable option, let's explore an alternative. Redis is much better in terms of locking because a) it has atomic commands to set a value if a key does not exist and apply a timeout at once, and b) one may use a Lua script to atomically release the lock, but only if you own it. If you are just interested in the algorithm, see Distributed locks with Redis.

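The code itself is embedded in the original post as a gist; a minimal sketch of such a Redis-based lock (the function and key names are my assumptions, not the author's exact snippet) could look like this:

import uuid

import redis

client = redis.Redis()

# Release the lock only if we still own it (atomic check-and-delete)
RELEASE_LUA = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""
release_lock = client.register_script(RELEASE_LUA)

def run_with_lock(key, timeout_seconds, work):
    token = str(uuid.uuid4())
    # SET key value NX EX timeout - acquire the lock and its expiry atomically
    if not client.set(key, token, nx=True, ex=timeout_seconds):
        return  # someone else holds the lock - abort
    try:
        work()
    finally:
        release_lock(keys=[key], args=[token])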

Celery Beat tasks running very often (e.g. every few seconds)

Now, for tasks that are scheduled to run every few seconds, we must be very cautious. The solution with a dedicated worker in Celery does not really work great there, because tasks will quickly pile up in the queue, ultimately leading to broker failure. The situation is a bit better for lock-protected tasks because multiple workers can quickly empty the queue of tasks if they ever pile up. It will just make your CPU cores red for a moment.

If you, however, think about preventing piling up tasks in a queue with the expiration feature, beware of the scenario with a single dedicated worker. Tasks do not magically disappear from the queue once they expire. They are still there until a worker gets them, and only then are they revoked. In other words, if you have tasks piling up in queues but you applied expiry times, you are not safe yet. If tasks are getting produced faster than you consume them, it will lead to broker failure.

It may happen if you have created a dedicated worker for a calculation task that should run every few seconds. This is a very short time frame and it is easy to exceed. Celery beat will keep on putting tasks in the queue until it fills up completely and bang, the broker is down.

Let me share with you one last recipe for dealing with the scenario – create a separate Python script (completely outside Celery) that will be doing that work. Its code will look as follows:

from time import sleep

INTERVAL = 5

while True:
    do_the_work()  # the actual periodic work goes here
    sleep(INTERVAL)

For such frequent calculations, it makes no sense to stress Celery and put extra work into securing this against concurrent executions.

Summary

Even though Celery and Celery Beat may look trivial to use, there are many potential pitfalls you have to avoid.

It is not Celery’s fault, though. It is just that queues are not simple 😉 If you are more interested about what may go wrong in such systems, have a read about Queueing theory. It is applicable not only in computer programs!

PS: Don’t forget to checkout a repository with working examples from this post: Celery Beat Example

The post What is Celery beat and how to use it – part 2, patterns and caveats appeared first on Breadcrumbs Collector.

April 06, 2020 08:00 AM UTC


Codementor

Flask Delicious Tutorial : Building a Library Management System Part 1 - Planning

Learn Python Web Dev By Building A Library Management System

April 06, 2020 07:09 AM UTC


Mike Driscoll

PyDev of the Week: Pablo Galindo Salgado

This week we welcome Pablo Galindo Salgado (@pyblogsal) as our PyDev of the Week! Pablo is a core developer of the Python programming language. He is also a speaker at several Python related conferences. If you’d like to see what projects he is contributing to, you can check out his Github profile.

Let’s spend some time getting to know Pablo better!

Can you tell us a little about yourself (hobbies, education, etc):

I am currently working at Bloomberg L.P. in the Python infrastructure team, supporting all our Python developers and providing critical infrastructure and libraries to make sure everyone has a better experience programming in Python. But before working in the software industry I used to be in academia as a theoretical physicist researching general relativity and in particular, black hole physics. This is something that I still do as a hobby (although without the pressures of publication and funding) because I still love it! For instance, I have given some talks at some Python conferences related to this
(https://www.youtube.com/watch?v=p0Fc2jWVbrk) and I continue developing and researching improved algorithms to simulate and visualize different spacetimes. For example, here you have some simulated Kerr Newman black holes with accretion disks around them I have worked on recently:

Accretion Disks
and here with some textures mapped to the sky sphere and the accretion disk:
Texture mapped accretion disks

When I am not burning my CPU cores doing core dev work in CPython I burn them doing more and more simulations. I love to still work on this because it brings together two of my passions: theoretical physics and coding! Sometimes, to optimize your code you need a better equation instead of a better algorithm!

Why did you start using Python?

I started using Python when I started working on my PhD in order to orchestrate and glue some simulation code in C (and sometimes FORTRAN 77!) and to do data processing and analysis. I immediately fell in love with the language and later with the community and the ecosystem (as the famous phrase goes, "Come for the language, stay for the community"). I had been exposed to many other languages before but I started using more and more Python because it had something mesmerizing that made programming very, very fun. Also, in the scientific world (and excluding the huge world of data science and machine learning) it allows for fast prototyping of ideas and integration of pre-existing systems in a way that other people can use and extend easily and intuitively. Also, as opposed to many other scientific tools and suites: it is free and open-source, which I think is fundamental for making science more transparent, available, and reproducible!

What other programming languages do you know and which is your favourite?

Apart from Python, I am fluent in C and Fortran (for real!) and I am confident coding in Rust and C++. Apart from that, I can proudly say that I have coded several times in some form of Javascript without making things explode, and I can copy some pre-existing HTML and CSS and modify it to make some cool front ends. I coded for many, many years in the Wolfram Language (Mathematica) but this is something that I don't do anymore, although sometimes I miss some of the functional patterns that it has.

Even if it is not widely used today and gets its fair share of legitimate criticism, I still love C. It may be the Stockholm syndrome talking but I find it very simple and (more often than not) elegant. I also find that it has a good level of abstraction when I need to reason about some low-level effects or I need to be "closer to the metal". Also, when I started coding in C I had a fair amount of experience with FORTRAN (a.k.a. FORTRAN77) and let me tell you something: finding for the first time that you can code without leaving the first five columns of every line empty is a life-changing experience. Another life-changing experience is when you find out much later in your life that such nonsense was for compatibility with punch-cards.

What projects are you working on now?

All my open source work is mainly on CPython, but apart from my general work as a core dev, since last year I have also been working a lot on a project together with Guido van Rossum, Lysandros Nikolaou, Emily Morehouse, and others to replace the current parser in CPython with a new shiny PEG parser!

I am very passionate about parsers and formal languages (I have given several talks about them and how the one in CPython works, such as https://www.youtube.com/watch?v=1_23AVsiQEc&t=2018s), so this project is very exciting to explore because with it we will be able not only to eliminate several ugly hacks in the current LL(1) grammar but also to write some constructs that are now impossible. In particular, I have been fighting for a while trying to allow grouping parentheses in context managers like

with (
    something_very_long_here as A,
    something_else_very_long_here as B,
    more_long_lines as C,
):
    ...

but this sadly is not possible with the current parsing machinery. Believe me: I have tried every trick in the book. Also, we hope that with the new parser many parts of the existing grammar can be rewritten in a more maintainable and readable way. Stay tuned to know more!

What other parts of Python core do you work on?

As a Python core developer, I mainly work on the Python runtime and VM, especially in the parser, the AST, the compiler, and the garbage collector. But apart from those focus areas, I have worked all around the place in the standard library: posix, multiprocessing, functools, math, builtins... I also spend a huge amount of time squashing bugs and race conditions and taking care of the CPython CI and the buildbot system (check out this blog post in the PSF blog about it: http://pyfound.blogspot.com/2019/06/pablo-galindo-salgado-nights-watch-is.html).

Talking to some other core devs, recently I found out that I am among the 3 “most active” core developers since I was promoted (https://github.com/python/cpython/graphs/contributors?from=2018-05-07&to=2020-03-25&type=c)!

I also focus a lot on making CPython more approachable to contributors and future core devs. For instance, I have recently written a very complete design document for one of the most undocumented areas of CPython (https://devguide.python.org/garbage_collector/). I am also spending a lot of time mentoring contributors with the hope that some of them become future core devs! Mentoring is very demanding but I think it is a very important part of making sure that Python stays alive and that we have an open and inclusive community.

I am also very grateful for having some of the most incredible, talented and candid people around me in the core dev team as they are one of the main reasons I contribute every day!

Which Python libraries are your favorite (core or 3rd party)?

This is a hard question! I will abuse the fact that the question does not set a limit of libraries to list a few:

From the standard library: gc, ast, posix (¯\_(ツ)_/¯), functools, tracemalloc and itertools.
3rd party: Cython and numpy.

I see you were part of the team behind PEP 570. How were you involved?

I did the full implementation, shepherded the discussion and wrote most of the PEP document in a not very impressive English, but thanks to my other co-authors and Carol Willing (who is my role model for several things, including documenting and explaining
complex things in an easy way) the document has improved massively since the first version I wrote.

Do you have a favourite obscure Python feature or module?

I love chained comparisons! For instance when you write:

if a < c < d:
   ...

instead of:

if a < c and c < d:
   ...

I also love that you can use them in a less intuitive way like:

>>> x = "'Hola'"
>>> x[-1] == x[0] in {"'", "other", "string"}
True

This feature has a slightly dark side that can confuse a lot of people, especially when written with other operators:

>>> False == False in [False]
True

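The result above makes more sense once you expand the chain: a chained comparison is equivalent to an and of its parts, not a left-to-right evaluation of the operators:

>>> (False == False) and (False in [False])
True
>>> (False == False) in [False]  # what many people expect it to mean
False
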
Sadly, using chained comparisons is marginally slower 🙁

Is there anything else you’d like to say?

Thanks a lot for asking me to do this interview, and also thanks to everyone that has survived until the end of it 🙂

Thanks for doing the interview, Pablo!

The post PyDev of the Week: Pablo Galindo Salgado appeared first on The Mouse Vs. The Python.

April 06, 2020 05:05 AM UTC

April 04, 2020


Codementor

How To Get Started With Machine Learning

This is a tutorial on image classification from the basics, with complete code and explanations.

April 04, 2020 04:09 PM UTC


Weekly Python StackOverflow Report

(ccxxii) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2020-04-04 13:57:17 GMT


  1. Complete set of punctuation marks for Python (not just ASCII) - [21/4]
  2. Python round to next highest power of 10 - [16/2]
  3. Check if pandas column contains all elements from a list - [13/6]
  4. Add ID found in list to new column in pandas dataframe - [10/5]
  5. list of lists to list of ints python - [8/6]
  6. How can I create a nested JSON file from a Pandas dataframe in Python? - [8/3]
  7. Why does creating a list of tuples using list comprehension requires parentheses? - [8/3]
  8. Match all combinations of list with other list - [5/4]
  9. Downloading files from ftp server using python but files not opening after downloading - [5/3]
  10. Applying/Composing a function N times to a pandas column, N being different for each row - [5/2]

April 04, 2020 01:57 PM UTC


Talk Python to Me

#258 Thriving in a remote developer environment

If you are listening to this episode when it came out, April 4th, 2020, there's a good chance you are listening at home, or on a walk. But it's probably not while commuting to an office as much of the world is practicing social distancing and working from home. Maybe this is a new experience, brought upon quickly by the global lockdowns, or maybe it's something you've been doing for awhile.

Either way, being effective while working remotely, away from the office, is an increasingly valuable skill that most of us in the tech industry have to quickly embrace.

On this episode, I'll exchange stories about working from home with Jayson Phillips. He's been writing code and managing a team from his home office for years and has brought a ton of great tips to share with us all.

Links from the show:

Jayson on Twitter: https://twitter.com/_jjphillips/
Jayson's twitter thread on remote work: https://twitter.com/_jjphillips/status/1216158067491889152
Clockwise: https://www.getclockwise.com/
Calendly: https://calendly.com
Ideas on Making Remote Work... Work For You: https://jaysonjphillips.com/blog/2020/02/ideas-on-making-remote-work-work-for-you/
[Book] Remote - Office Not Required: https://www.amazon.com/Remote-Office-Required-Jason-Fried-ebook/dp/B00C0ALZ0W/

Sponsors:

Brilliant: https://talkpython.fm/brilliant
Linode: https://talkpython.fm/linode
Talk Python Training: https://talkpython.fm/training

April 04, 2020 08:00 AM UTC


Catalin George Festila

Python 2.7.8 : Using python scripts with Revit Dynamo.

Dynamo is a visual programming tool that extends the power of Revit by providing access to the Revit API (Application Programming Interface). Dynamo works with nodes; each node has inputs and outputs and performs a specific task. This is a short tutorial about how you can use your Python skills with Revit and the Dynamo software. First, you need to start Revit. I used the Revit 2020 version. Then from

April 04, 2020 04:14 AM UTC


Amjith Ramanujam

Examples are Awesome

There are two things I look for whenever I check out an open source project or library that I want to use.

1. Screenshots (A picture is worth a thousand words).

2. Examples (Don't tell me what to do, show me how to do it).

Having a fully working example (or many examples) helps me shape my thought process.

Here are a few projects that are excellent examples of this.

1. https://github.com/prompt-toolkit/python-prompt-toolkit

A CLI framework for building rich command line interfaces. The project comes with a collection of small self-sufficient examples that showcase every feature available in the framework and a nice little tutorial.

2. https://github.com/coleifer/peewee

A small ORM for Python that ships with multiple web projects to showcase how to use the ORM effectively. I'm always overwhelmed by SQLAlchemy's documentation site. PeeWee is a breath of fresh air with a clear purpose and succinct documentation.

3. https://github.com/coleifer/huey

An asynchronous task queue for Python that is simpler than Celery and more featureful than RQ. This project also ships with an awesome set of examples that show how to integrate the task queue with Django, Flask or standalone use case.

The beauty of these examples is that they're self-documenting and show us how the different pieces in the library work with each other as well as external code outside of their library such as Flask, Django, Asyncio etc.

Examples save the users hours of sifting through documentation to piece together how to use a library.

Please include examples in your project.

April 04, 2020 01:53 AM UTC

April 03, 2020


Codementor

Analysis of the progress of COVID-19 in the world with Data Science.

Analysis of the progress of COVID-19 in the world with Data Science.

April 03, 2020 10:48 PM UTC


Paweł Fertyk

Getting started with Django middleware

Django comes with a lot of useful features. One of them is middleware. In this post I'll give a short explanation how middleware works and how to start writing your own.

The source code included in this post is available on GitHub.

General concept

Middleware allows you to process requests from a browser before they reach a Django view, as well as responses from views before they reach a browser. Django keeps a list of middleware for each project. You can find it in settings, under the name MIDDLEWARE. Each new Django project already has a bunch of middleware added to that list, and in most cases you should not remove anything from that list. You can, however, add your own.

Middleware is applied in the same order it is added to the list in Django settings. When a browser sends a request, it is processed like this:

Browser -> M_1 -> M_2 -> ... -> M_N -> View

A view receives a request, performs some operations, and returns a response. On its way to the browser, the response has to go through each middleware again, but in reversed order:

Browser <- M_1 <- M_2 <- ... <- M_N <- View

This is a very brief explanation. More detailed description can be found in Django documentation.

A simple example

We will start with a simple middleware that measures the time it takes to process a request. All examples in this post use Django 3.0.5 and Python 3.6.9.

Project setup

First, create a Django project with a single application. Ignore the migrations, examples from this post will not use a database. Create a file called middleware.py in your application: that's where we will put most of the code.

django-admin startproject django_middleware
cd django_middleware
python manage.py startapp intro
touch intro/middleware.py

Your project should look like this:

django_middleware/
├── django_middleware
│   ├── asgi.py
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── intro
│   ├── admin.py
│   ├── apps.py
│   ├── __init__.py
│   ├── middleware.py
│   ├── migrations
│   │   └── __init__.py
│   ├── models.py
│   ├── tests.py
│   └── views.py
└── manage.py

Don't forget to register your application in django_middleware/settings.py:

INSTALLED_APPS = [
    'intro',
    ...
]

Now you can run the project:

python manage.py runserver

Writing Django middleware

According to Django documentation, there are 2 ways of creating a middleware: as a function and as a class. We will use the first method, but the last example will show you how to create a class too.

The general structure of a middleware in Django looks like this (example copied from Django docs):

def simple_middleware(get_response):
    # One-time configuration and initialization.
    def middleware(request):
        # Code to be executed for each request before
        # the view (and later middleware) are called.
        response = get_response(request)
        # Code to be executed for each request/response after
        # the view is called.
        return response
    return middleware

The simple_middleware function is called once, when Django initializes the middleware and adds it to the list of all middleware used in a project. The middleware function is called for every request made to the server. Everything before the line response = get_response(request) is called when the request goes from the browser to the server. Everything after this line is called when the response goes from the server back to the browser.

What does the line response = get_response(request) do? In short, it calls the next middleware on the list. If this is the last middleware, the view gets called: it receives the request, performs some operations, and generates the response. That response is then returned to the last middleware on the list, which in turn sends it to the previous one, until there is no more middleware and the response is sent to the browser.

In our example, we want to check how long the whole process of handling a request takes. Edit intro/middleware.py file like this:

import time


def timing(get_response):
    def middleware(request):
        t1 = time.time()
        response = get_response(request)
        t2 = time.time()
        print("TOTAL TIME:", (t2 - t1))
        return response
    return middleware

In this example, we measure the time in seconds (time.time()) before and after the request, and we print the difference.

The next step is to install the middleware, to let Django know that we are going to use it. All we have to do is to add it to django_middleware/settings.py:

MIDDLEWARE = [
    'intro.middleware.timing',
    ...
]

Note: in this example, intro is the name of our Django application, middleware is the name of a Python file that contains our code, and timing is the name of a middleware function in that file.

Now we are ready to test it. Open your browser and navigate to localhost:8000. In the browser you should see the default Django project page (the one with the rocket). In the command line (where you called python manage.py runserver) you should see something similar to this:

TOTAL TIME: 0.0013387203216552734
[04/Apr/2020 17:15:34] "GET / HTTP/1.1" 200 16351

Modifying the request

Our middleware does quite well, printing information to the command line. But we can go a step further: how about adding something to the request, so that our views can use it later? Since we are in the timing business, how about adding the date and time the request took place?

This modification will be quite easy. Edit intro/middleware.py file like this:

import time
import datetime


def timing(get_response):
    def middleware(request):
        request.current_time = datetime.datetime.now()
        t1 = time.time()
        response = get_response(request)
        t2 = time.time()
        print("TOTAL TIME:", (t2 - t1))
        return response
    return middleware

We've added 2 lines: import datetime and request.current_time = datetime.datetime.now(). Together, they will add the current time to our request. Now, we need a view to display that time. Edit intro/views.py:

from django.http import HttpResponse


def showtime(request):
    return HttpResponse('Request time is: {}'.format(request.current_time))

For such a simple example we do not need a template, we can create a HttpResponse object directly in our code.

Now we need a URL for our view. Create a file intro/urls.py and edit it:

from django.urls import path
from .views import showtime

urlpatterns = [
    path('', showtime),
]

Remember to edit django_middleware/urls.py too:

from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path('', include('intro.urls')),
    path('admin/', admin.site.urls),
]

Let's test it. Open localhost:8000 in your browser. You should see something like this:

Request time shown in browser

Refresh the page several times, to check that you will get different results (the time should be updated for each request).

Something more useful: processing exceptions

It's time for a bit more interesting example. Consider this real-life situation: you write a program and it doesn't work. Happens to the best of us, don't worry. What do you usually do then? Do you check Stack Overflow for answers? You probably do, practically all coders do. How about we create a middleware that would do the search for us?

Django middleware can include a function that will be called each time an exception is raised. That function is called process_exception and it takes 2 arguments: a request that caused the exception and the exception itself.

If our middleware is defined as a function, then we can implement process_exception like this:

def simple_middleware(get_response):
    def middleware(request):
        return get_response(request)

    def process_exception(request, exception):
        # Do something useful with the exception
        pass

    middleware.process_exception = process_exception
    return middleware

In our case, we want to send our exception to Stack Overflow and get links to the most relevant questions.

Short introduction to APIs

If you haven't used APIs before, don't worry. The general idea is: just like you send questions to the internet using a web browser, an API is a way for your code to send questions automatically.

Stack Exchange is kind enough to host an API for querying their websites. The base URL is https://api.stackexchange.com/2.2/search, after which you can put search params. And so, if you want to check 3 top results (sorted by votes) from Stack Overflow, tagged as "python" and dealing with Django, you can send a request like this: https://api.stackexchange.com/2.2/search?site=stackoverflow&pagesize=3&sort=votes&order=desc&tagged=python&intitle=django. Go ahead and check it in your browser. You should see something like this:

Stack Exchange API results

In Python, to send a request like this, we will use a module called requests.

Stack Overflow middleware

Let's create a new middleware called stackoverflow:

import requests
from django.http import HttpResponse

# Previous imports and timing middleware should remain unchanged


def stackoverflow(get_response):
    def middleware(request):
        # This method does nothing, all we want is exception processing
        return get_response(request)

    def process_exception(request, exception):
        url = 'https://api.stackexchange.com/2.2/search'
        params = {
            'site': 'stackoverflow',
            'order': 'desc',
            'sort': 'votes',
            'pagesize': 3,
            'tagged': 'python;django',
            'intitle': str(exception),
        }
        response = requests.get(url, params=params)
        html = ''
        for question in response.json()['items']:
            html += '<h2><a href="{link}">{title}</a></h2>'.format(**question)
        return HttpResponse(html)

    middleware.process_exception = process_exception

    return middleware

Every time a view raises an exception, our process_exception method will be called. We use the requests module to call Stack Exchange API. Most parameters are self-explanatory. They are the same as we used in the browser example, but instead of putting them all in a URL manually, we let the requests module do it for us. We just changed the tags (to search for Python and Django) and we use our exception as a string (str(exception)) to search the title of available questions. After we get a response from Stack Overflow, we put together a HTML containing a link to each relevant question. Hopefully, we can find an answer to our problem there. Finally, that HTML is returned to the browser.

Please note that the response from Stack Overflow is not a normal web page, but instead it is a bunch of information in a format called JSON. That's why we call response.json() to get our results.
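Concretely, response.json() hands us a plain Python dict. The matching questions live in a list under the 'items' key, and each item carries (among many other fields) the 'title' and 'link' that our middleware formats into HTML. Inside process_exception, where the response object already exists, you could inspect it like this:

data = response.json()          # parse the JSON body into Python objects
for question in data['items']:  # one dict per matching question
    print(question['title'])
    print(question['link'])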

Of course, we need to install this new middleware:

MIDDLEWARE = [
    'intro.middleware.stackoverflow',
    'intro.middleware.timing',
    ...
]

The only problem we have now is that our view works perfectly. We need to break it a bit if we want our new middleware to have some exceptions to process. Edit intro/views.py:

def showtime(request):
    raise Exception('Django middleware')
    # return HttpResponse('Request time is: {}'.format(request.current_time))

Keep in mind that the process_exception method will be called only for actual exceptions raised in a view. Returning HttpResponseServerError or any other error status code does not count.
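To make the distinction concrete, here is a small sketch with two hypothetical views: only the first would ever reach process_exception, because the second returns its error response normally and the middleware chain simply passes it along.

from django.http import HttpResponseServerError

def broken(request):
    # An uncaught exception: process_exception WILL be called
    raise Exception('Django middleware')

def merely_erroring(request):
    # A 500 returned normally: process_exception will NOT be called
    return HttpResponseServerError('Something went wrong')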

It's time to test it. Open localhost:8000 in your browser. You should see something like this:

Stack Overflow answers our call

The middleware we just created is a bit more complicated than the initial examples. As your code grows, it might be a better idea to manage middleware as classes, not functions. Our Stack Overflow middleware as a class would look like this:

class StackOverflow:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        return self.get_response(request)

    def process_exception(self, request, exception):
        url = 'https://api.stackexchange.com/2.2/search'
        params = {
            'site': 'stackoverflow',
            'order': 'desc',
            'sort': 'votes',
            'pagesize': 3,
            'tagged': 'python;django',
            'intitle': str(exception),
        }
        response = requests.get(url, params=params)
        html = ''
        for question in response.json()['items']:
            html += '<h2><a href="{link}">{title}</a></h2>'.format(**question)
        return HttpResponse(html)

Most of the code looks similar, but in the class version we need to store the get_response callback on the instance and use it on every call to __call__. If you prefer this version, don't forget to change the settings:

MIDDLEWARE = [
    'intro.middleware.StackOverflow',
    ...
]

Conclusion

These were very simple examples, but middleware can be used for many other things, like checking an authorization token, finding a proper user, and attaching that user to the request. I'm sure you can find a lot of ideas on your own, and hopefully this post will help you get started. If you think something is missing or if you spot a mistake, please let me know!

April 03, 2020 10:00 PM UTC


Real Python

The Real Python Podcast – Episode #3: Effective Python and Python at Google Scale

In this episode, Christopher interviews Brett Slatkin about the 2nd edition of his book Effective Python. Brett talks about the revisions he made for the book and updating it for the newest versions of Python 3. Christopher asks who the book's intended reader is. Brett also discusses working on Google App Engine and what it's like to develop and maintain Python applications at Google scale. Brett shares a brief anecdote about working with Guido van Rossum while they both worked at Google. He also provides advice about maintaining a large and aging Python code base.



April 03, 2020 12:00 PM UTC


Python Software Foundation

Announcing a new Sponsorship Program for Python Packaging

The Packaging Working Group of the Python Software Foundation is launching an all-new sponsorship program to sustain and improve Python's packaging ecosystem. Funds raised through this program will go directly towards improving the tools that your company uses every day and sustaining the continued operation of the Python Package Index.
With this program we are asking companies that rely on Python, its ecosystem of packaging tools, and PyPI to help us build a dependable basis to continue our efforts. 

Improving the packaging ecosystem

Since 2017, the Packaging Working Group has secured multiple grants, completed one contract, and received a generous gift -- all with the goal of improving the Python packaging ecosystem for all users. Most of these projects were funded by not-for-profit organizations and all of them were one-time awards with specific objectives.
Results from these funded projects include:
Companies have asked us how they can help fund the platform they depend on. With this new sponsorship program, the Working Group can sustainably fund packaging improvements not directed by a specific grant or contract and benefit millions of Python users around the world. Greater budget flexibility and a deeper reserve will help us invest in what the community needs.

Sustaining PyPI

As of April 2020, the Python Package Index responds to 800 million requests and delivers 200 million packages totalling 400 terabytes in a typical day. Our users include hobbyists, scientists, companies, students, governments, nonprofits, and more.
Existing sponsors donate their services, which keeps PyPI free to users and to the PSF, aside from a subset of one staff member's time. Without these donations, the costs to operate PyPI each month would be staggering.
These critical service donations must not be taken for granted. Sponsoring the Packaging Working Group through this new program creates and maintains a stable reserve. We'll need that reserve in the event that we lose any of these in-kind service donations and must pay some or all of PyPI's operating costs.

Show your support!

As a company, your team can review the details of this new sponsorship program in our prospectus. Should you have any questions you can contact us at sponsorship@pypi.org. When you're ready, apply here. We are excited to hear from you!
If your company cannot donate: Even as an individual, your contributions count! No matter the size or frequency, please support us if you are able at donate.pypi.org.

April 03, 2020 11:41 AM UTC


Codementor

30 Days Of Python | Day 3 Project: A Simple Earnings Calculator

Welcome to the first project in 30 Days Of Python! Today we'll be creating an earnings calculator for employees. Check it out, and the rest of the series too!

April 03, 2020 08:27 AM UTC


BreadcrumbsCollector

When to use the Clean Architecture?

Enthusiasm, doubt, opposition

There are a few possible reactions after learning about the Clean Architecture or Hexagonal Architecture (AKA Ports & Adapters), or even a merely innocent service layer in Django. Some developers are enthusiastic and try to apply these techniques immediately; some are hesitant, full of doubts. The rest are strongly opposed, openly declaring this an abomination. Then they say we already have excellent tools, like Django. Then they argue others don't know about the advanced features of common tools. Then they call you a Java developer in disguise.

As a speaker and the author of the book Implementing the Clean Architecture, I have faced reactions from all over this spectrum. What the two extremes fail to do is ask the right question – WHEN? When should the Clean Architecture be used?

Frankly, the Clean Architecture (like any other technique) is not what most overenthusiastic adopters think it is. On the other hand, headstrong opponents see it as a bluff and try to call it. There is no bluff. It was never meant to be a silver bullet.

By the way, if you are an evangelist of one of these techniques and do not mention they are not one-size-fits-all solutions, start immediately. Please.

No silver bullet

So the Clean Architecture is like a hammer – applicable to a specific set of problems. And no, it does not deprecate our current toolset, in particular Django.

Speaking of silver bullets… during my talks about the Clean Architecture, I quote No Silver Bullet – Essence and Accident in Software Engineering, a paper written by Frederick Brooks. You can easily find it online. The gist is that there is no magic trick, no single technique or approach (the titular silver bullet), that will dramatically improve the efficiency of software development.

The other worthy takeaway from the paper is the division of complexity into two categories – accidental and essential. Accidental complexity is something we can eradicate with finite effort, e.g. refactoring code or using a 3rd-party library (so we don't have to implement the solution entirely on our own). So it can be dealt with. Essential complexity is a completely different creature. It cannot be avoided UNLESS we negotiate a change in business requirements.

Essential complexity will catch you

In other words, if business requirements are complex, so will be the code – no matter how many programming aces one has up their sleeve. If a program has 50 features, it will be inherently complex. This gives us the first reason to take an interest in the Clean Architecture, which is…

Managing essential complexity

Assuming one failed to reduce the scope of business requirements, essential complexity is inevitable. Since we cannot avoid it, we have to learn how to manage it. One is no longer able to model a solution with a database browser-like application.

On the other hand, if one does not have to manage essential complexity, the Clean Architecture is simply unnecessary. And it makes things worse because that’s an extra burden. This is introducing accidental complexity.

How to tell if one is dealing with essential complexity? Talk with the right people, the ones who call the shots. The ones who drive business requirements. Remember, they learn along the way the same way you do.

What you are looking for is called the Core Domain (Domain-Driven Design terminology). The Core Domain is the set of business rules that are the main reason for the project to be built. Take cloud services that provide authorization & identity management, like Auth0. For our projects, login/logout is rarely something we want to sweat over. It is critical, but there are quality libraries for that and, look, cloud services. What one project may treat as a Generic Domain (DDD again) is, for others, the core of their business. Auth0's site even claims that "Identity is Complex". Well, for them it certainly is. And they make it easy for others.

Core Domain is often part of the project that stands for a competitive advantage. If it is going to outperform competitors, simple solutions may not suffice. So the only piece of advice I can give is to recognize different aspects or functionalities the project you work on has or is meant to have.

Where else to look for essential complexity?

Core Domain may not be the only place which deserves more sophisticated techniques. There is a relatively easy heuristic to tell:

the chances are you have a perfect match for the Clean Architecture or another such technique.

An example: our project had a feature of subscriptions. A member:

Even though it was not the main feature, it was still quite complex and crucial – because it was making extra money.

But you won’t catch many of such subtleties unless you listen and talk with stakeholders or their representative, e.g. product owner/project manager. Do it, your effectiveness and possibly a success of the project depends on it. And someone thought that being a programmer means less human interaction. Which smoothly brings us to the second reason for using the Clean Architecture…

Facilitate communication

For simple CRUD-flavoured applications, one can be comfortable with translating business requirements into database tables operations on the go. Once the project grows and more essential complexity appears, it is no longer that easy.

That’s pure biology. Our brains are too small to comprehend too much information at once. Which makes it harder and harder to represent domain knowledge in the code if it gets scattered in all corners of the application. That’s why we focus on business rules and provide special places to put the logic.

Communication is crucial for effective work, for example by reducing the risk of misunderstanding or increasing the chances of discovering edge cases. It will also help you find out whether the problem is essentially complex (use the Clean Architecture) or not.

In the end, we talk and think in a more abstract way. We think in and code Entities or Use Cases. Eventually, data has to be saved in the database and read from it, but that's not our main concern. We realise it's more like a side effect.
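To make that less abstract, here is a minimal, hypothetical sketch in Python (all the names are made up for illustration, borrowing the subscriptions example from earlier): the Use Case speaks only in terms of an Entity and an abstract repository, and the database hides behind that boundary as a detail.

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Subscription:
    # an Entity: a pure business object, no ORM in sight
    member_id: int
    active: bool = True


class SubscriptionRepository(ABC):
    # a port; Django ORM, SQLAlchemy, or an in-memory fake plugs in behind it
    @abstractmethod
    def get(self, member_id: int) -> Subscription: ...

    @abstractmethod
    def save(self, subscription: Subscription) -> None: ...


class CancelSubscription:
    # a Use Case: reads like the business rule it implements
    def __init__(self, repository: SubscriptionRepository) -> None:
        self._repository = repository

    def execute(self, member_id: int) -> None:
        subscription = self._repository.get(member_id)
        subscription.active = False
        self._repository.save(subscription)  # persistence as a side effect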

Software Development is a learning process, Working code is a side effect.

Alberto Brandolini

“But this is not Pythonic! You’re imitating Java/C# etc in our pure language!”

(Yes, I actually heard this twice and a friend of mine at least once).

It is not a matter of programming language, it is ALWAYS a matter of what problem one tries to solve with it. Then, only after recognizing the problem, one uses appropriate techniques.

If there was some mastermind, responsible for dispatching projects to companies specialized in particular technologies, perhaps all we would be getting were projects where Django shines. But it doesn’t work that way. And that’s good, really good. You are challenged, so you have to learn new things and as a result, we all thrive.

Look at the history of node.js. A few years back, no one would have supposed that this funny and often astonishing (see the WAT talk by Gary Bernhardt) browser language would become considerable competition for Python on the backend. Yeah, the language lacked a proper ecosystem, and it was dynamically and weakly typed. Look what's happening now.

Now, look at Python. It has also advanced in the last decade. We have type annotations, mypy, and dataclasses, just to name a few. Finally, dependency management got some traction (pipenv, poetry, PSF-financed pip development…). Language progress is simply a response to the needs of the market. Or rather, it is a future created by contributors who see a need for tools that would make their lives easier while working on real-world projects. Python is no different from other languages in that matter.

All this is fine, but the majority of my project does not have that essential complexity

The truth is that the Clean Architecture is not applicable throughout the entire project. Luckily, we are not doomed. It does not mean we have to choose between overengineering in most places or underengineering in our Core Domain.

It is just an indication of another need. Need for modularization. We need a way to use different architectural styles throughout the project. It doesn’t really matter if it will be a modular monolith or microservices. We just need a little flexibility to choose the most appropriate style for different parts of the project.

However, it is unlikely you will ever see a need for modularization (not to mention the Clean Architecture) if you switch companies or projects every few months (say, about 2-3 months, definitely less than 6). You will not have enough exposure to a single business domain to learn its intricacies. You will simply have no occasion to learn. While software developers tend to think, at some stages of their careers, that clean code can save the world, the fact is our code is merely a derivative of communication with stakeholders. Programming skills are a cure only for accidental complexity.

Should I bother even if I don’t need the Clean Architecture now?

Yes, you should. It will still advance you as a programmer. And if you are lucky, you will work on more challenging projects in the future without sweating too much.

Even if it’s not helpful for you now and you decide you don’t want to commit to it, why not just remember there is such a technique and dive into it when there is a need for that?

The post When to use the Clean Architecture? appeared first on Breadcrumbs Collector.

April 03, 2020 08:00 AM UTC


Randy Zwitch

Building pyarrow with CUDA support

The other day I was looking to read an Arrow buffer on GPU using Python, but as far as I could tell, none of the provided pyarrow packages on conda or pip are built with CUDA support. Like many of the packages in the compiled-C-wrapped-by-Python ecosystem, Apache Arrow is thoroughly documented, but the number of permutations of how you could choose to build pyarrow with CUDA support quickly becomes overwhelming.

In this post, I’ll show how to build pyarrow with CUDA support on Ubuntu using Docker and virtualenv. These directions are approximately the same as the official Apache Arrow docs, just that I explain them step-by-step and show only the single build toolchain I used.

Step 1: Docker with GPU support

Even though I use Ubuntu 18.04 LTS on a workstation with an NVIDIA GPU, whenever I undertake a project like this, I like to use a Docker container to keep everything isolated. The last thing you want to do is to debug environment errors, changing dependencies for one project and breaking something else. Thankfully, NVIDIA Docker developer images are available via DockerHub:

docker run -it --gpus=all --rm nvidia/cuda:10.1-devel-ubuntu18.04 bash

Here, the -it flag puts us inside the container at a bash prompt, --gpus=all allows the Docker container to access my workstation’s GPUs and --rm deletes the container after we’re done to save space.

Step 2: Setting up the Ubuntu Docker container

When you pull Docker containers from DockerHub, frequently they are bare-bones in terms of libraries included, and usually can also be updated. For building pyarrow, it’s useful to install the following:

apt update && apt upgrade -y

apt install git \
wget \
libssl-dev \
autoconf \
flex \
bison \
llvm-7 \
clang \
cmake \
python3-pip \
libjemalloc-dev \
libboost-dev \
libboost-filesystem-dev \
libboost-system-dev \
libboost-regex-dev  \
python3-dev -y

In a later step, we’ll use the Arrow third-party dependency script to ensure all needed dependencies are present, but these are a good start.

Step 3: Cloning Apache Arrow from GitHub

Cloning Arrow from GitHub is pretty straightforward. The git checkout apache-arrow-0.15.0 line is optional; I needed version 0.15.0 for the project I was exploring, but if you want to build from the master branch of Arrow, you can omit that line.

git clone https://github.com/apache/arrow.git /repos/arrow
cd /repos/arrow
git submodule init && git submodule update
git checkout apache-arrow-0.15.0
export PARQUET_TEST_DATA="${PWD}/cpp/submodules/parquet-testing/data"
export ARROW_TEST_DATA="${PWD}/testing/data"

Step 4: Installing remaining Apache Arrow dependencies

As mentioned in Step 2, some of the dependencies for building Arrow are system-level and can be installed via apt. To ensure that we have all the remaining third-party dependencies, we can use the provided script in the Arrow repository:

pip3 install virtualenv
virtualenv pyarrow
source ./pyarrow/bin/activate
pip install six numpy pandas cython pytest hypothesis
mkdir dist
export ARROW_HOME=$(pwd)/dist
export LD_LIBRARY_PATH=$(pwd)/dist/lib:$LD_LIBRARY_PATH

cd cpp
./thirdparty/download_dependencies.sh $HOME/arrow-thirdparty

The script downloads all of the necessary libraries as well as sets environment variables that are picked up later, which is amazingly helpful.

Step 5: Building Apache Arrow C++ library

pyarrow links against the Arrow C++ library, so it needs to be present before we can build the pyarrow wheel:

mkdir build && cd build

cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_LIBDIR=lib \
-DARROW_FLIGHT=ON \
-DARROW_GANDIVA=ON \
-DARROW_ORC=ON \
-DARROW_WITH_BZ2=ON \
-DARROW_WITH_ZLIB=ON \
-DARROW_WITH_ZSTD=ON \
-DARROW_WITH_LZ4=ON \
-DARROW_WITH_SNAPPY=ON \
-DARROW_WITH_BROTLI=ON \
-DARROW_PARQUET=ON \
-DARROW_PYTHON=ON \
-DARROW_PLASMA=ON \
-DARROW_BUILD_TESTS=ON \
-DARROW_CUDA=ON \
..

make -j
make install

This is a pretty standard workflow for building a C or C++ library. We create a build directory, call cmake from inside of that directory to set up the options we want to use, then use make and then make install to compile and install the library, respectively. I chose all of the -DARROW_* options above just as a copy/paste from the Arrow documentation; Arrow doesn’t take long to build using these options, but it’s possibly the case that only -DARROW_PYTHON=ON and -DARROW_CUDA=ON are truly necessary to build pyarrow.

Step 6: Building pyarrow wheel

With the Apache Arrow C++ bindings built, we can now build the Python wheel:

cd /repos/arrow/python
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_CUDA=1
python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel

As cmake and make run, you’ll eventually see the following in the build logs, which shows that we’re getting the behavior we want:

cmake --build . --config release --
[  5%] Compiling Cython CXX source for _cuda...
[  5%] Built target _cuda_pyx
Scanning dependencies of target _cuda
[ 11%] Building CXX object CMakeFiles/_cuda.dir/_cuda.cpp.o
[ 16%] Linking CXX shared module release/_cuda.cpython-36m-x86_64-linux-gnu.so
[ 16%] Built target _cuda

When the process finishes, the final wheel will be in the /repos/arrow/python/dist directory.

Step 7 (optional): Validate build

If you want to validate that your pyarrow wheel has CUDA installed, you can run the following:

(pyarrow) root@9260485caca3:/repos/arrow/python/dist# pip install pyarrow-0.15.1.dev0+g40d468e16.d20200402-cp36-cp36m-linux_x86_64.whl
Processing ./pyarrow-0.15.1.dev0+g40d468e16.d20200402-cp36-cp36m-linux_x86_64.whl
Requirement already satisfied: six>=1.0.0 in /repos/arrow/pyarrow/lib/python3.6/site-packages (from pyarrow==0.15.1.dev0+g40d468e16.d20200402) (1.14.0)
Requirement already satisfied: numpy>=1.14 in /repos/arrow/pyarrow/lib/python3.6/site-packages (from pyarrow==0.15.1.dev0+g40d468e16.d20200402) (1.18.2)
Installing collected packages: pyarrow
Successfully installed pyarrow-0.15.1.dev0+g40d468e16.d20200402
(pyarrow) root@9260485caca3:/repos/arrow/python/dist# python
Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyarrow import cuda
>>>

When the line from pyarrow import cuda runs without error, then we know that our pyarrow build with CUDA was successful.
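As a slightly stronger smoke test, you could try allocating a small buffer in GPU memory. This is a minimal sketch against the pyarrow.cuda API as I understand it; it assumes device 0 is your GPU:

from pyarrow import cuda

ctx = cuda.Context(0)       # grab the first GPU
buf = ctx.new_buffer(1024)  # allocate 1 KiB of device memory
print(buf.size)             # should print 1024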

April 03, 2020 12:00 AM UTC


Armin Ronacher

App Assisted Contact Tracing

I don't know how I thought the world would look 10 years ago, but a pandemic that prevents us from going outside was not what I was picturing. It's been about three weeks now that my family and I have been staying at home in Austria instead of going to work or having the kids at daycare; two of those weeks were under mandatory social distancing because of SARS-CoV-2.

And as cute as social distancing and “flattening the curve” sounds at first, the consequences to our daily lives are beyond anything I could have imagined would happen in my lifetime.

What is still conveniently forgotten is that the curve really only stays flat if we keep doing this for a very, very long time. And quite frankly, I'm not sure for how long our society will be able to do it. Even just closing restaurants is costing tens of thousands of jobs, and closing schools is going to set back the lives of many children growing up. Many people are currently separated from their loved ones with no easy way to get to them, because international travel ground to a halt.

Technology to the Rescue

So to cut a very long story short: with the help of technology, we can get away without social distancing. This is why: the most efficient way to fight the outbreak of a pandemic is isolating cases. If you can catch them before they can infect others, you can starve the virus. Now the issue is obviously that people are running around with the virus who can infect others but are not yet symptomatic. So we can only do the next best thing: if we can find all the people they had contact with once they finally become symptomatic, we can narrow down the search radius for tests.

So a very successful approach could be:

  1. find a covid-19 suspect
  2. test the person
  3. when they are positive, test all of their close contacts

So how do we find those contacts? The tool of choice in many countries is already an app. It sends out a beacon signal and collects the beacon signals of other users nearby. When someone tests positive, healthcare services can notify their contacts.

Avoiding Orwell

Now this is where it gets interesting. Let's take Austria, where I live, for instance. We have around 9 million residents here. Let's assume we're aiming for 60% of residents using that app. That sounds like a surveillance-state and scalability nightmare for a country known for building scalable apps.

But let's think for a moment what is actually necessary to achieve our goal: it turns out we could largely achieve what we want without a centralized infrastructure.

Let's set the window of people we care about to something like 5 days. This means that if someone tests positive, that person's contacts from the last 5 days ideally get informed about a covid case they had contact with. How do we design such a system so that it's not a privacy-invading behemoth?

Upon installation, the app would roll a random ID and store it. It then encrypts the ID it just created with the public key of a central governmental authority and broadcasts it to other people around via Bluetooth. It cycles this ID at regular intervals.

When another device (say, the one belonging to the person who later turns out to be infected) sees this ID, it measures signal strength and time observed. When enough time was spent with the other person and the contact was "close enough", it records the broadcast (the encrypted ID) on the device. The device also deletes records older than 5 days.

When a person is identified as infected, they need to export the contacts from their app and send them to the health ministry. The ministry could use its private key to decrypt the IDs and then get in touch with the potential contacts.
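To make the idea concrete, here is a rough, hypothetical sketch of the device-side bookkeeping in Python. The RSA-OAEP encryption comes from the cryptography package, but every name, threshold, and parameter here is illustrative, not a finished protocol:

import os
import time

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

WINDOW = 5 * 24 * 3600  # keep contacts for 5 days

def roll_id():
    # a fresh random ID, cycled at regular intervals
    return os.urandom(16)

def broadcast_payload(current_id, authority_public_key_pem):
    # encrypt our current ID with the health authority's public key,
    # so only the authority can ever learn it
    public_key = serialization.load_pem_public_key(authority_public_key_pem)
    return public_key.encrypt(
        current_id,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )

def record_contact(contacts, payload, signal_strength_dbm, seconds_observed):
    # only keep encounters that were close enough for long enough
    # (both thresholds here are made-up values)
    if signal_strength_dbm > -60 and seconds_observed > 300:
        contacts.append((time.time(), payload))

def prune(contacts):
    # forget everything older than the 5-day window
    cutoff = time.time() - WINDOW
    return [(ts, payload) for ts, payload in contacts if ts >= cutoff]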

How do they do that? One option involves a system like a push notification service. That would obviously require the device to register its unique ID with a central server and a push notification channel, but this would not reveal much.

Another option could be to do the check manually, which would also work for non-connected, IoT-type solutions. You could implement such a system as a token that you regularly bring to a designated place to check whether you are now considered a contact person. For instance, one could deploy check-in stations at public transport hubs: you hold your token against the station, and if one of your contacts was infected, it beeps.

Either way, the central authority would not know who you are. Your only point of contact would be when you become a covid case. Most importantly, this system could be built in a way that makes it completely useless for tracking people but still useful for contact tracing.

The Phone in your Pocket

I had conversations with a lot of people over the last few days about contact tracing apps, and I noticed, particularly from technically minded people, an aversion to the idea of contact tracing via apps. This does not surprise me, because it's an emotional topic. However, it does hammer home the point that people are very good at misjudging data privacy.

Almost every person I know uses Google Maps on their phone with location history enabled. With that, they also participate in a large data collection project where their location is constantly being transmitted to Google. Google uses this information to judge how fluid traffic is on the roads, how many people are at stores, how busy public transit is, etc. All that data is highly valuable, and people love to use it. I know I do. I'm also apparently entirely okay with that, even though I know there is an associated risk.

The Future

My point here is a simple one: contact tracing, if done well, is significantly less privacy-infringing than what many tech companies already do, and we're okay with that.

I also believe that contact tracing via apps or hardware tokens is our best chance to return to a largely normal life without giving up all our civil liberties. I really hope that we're going to have informed and reasonable technical discussions about how to do contact tracing right and give this a fair chance.

April 03, 2020 12:00 AM UTC