Planet Python

Last update: February 20, 2020 07:48 AM UTC

February 20, 2020


Programiz

Python Programming

February 20, 2020 04:37 AM UTC


Moshe Zadka

Forks and Threats

What is a threat? From a game-theoretical perspective, a threat is an attempt to get a better result by saying: "if you do not give me this result, I will do something that is bad for both of us". Note that it has to be bad for both sides: if it is good for the threatening side, they would do it anyway, and if it is good for the threatened side, it is not a threat.

Threats rely on credibility and reputation: the threatening side has to be believed for the threat to be useful. One way to gain that reputation is to follow up on threats, and have that be a matter of public record. This means that the threatening side needs to take into account that they might have to act on the threat, thereby doing something against their own interests. This leads to the concept of a "credible" or "proportionate" threat.

For most of our analysis, we will use the example of a teacher union striking. Similar analysis can be applied to nuclear war, or other cases. People mostly have positive feelings for teachers, and when teacher unions negotiate, they want to take advantage of those feelings. However, the one thing that leads people to be annoyed with teachers is a strike: it throws a large, unplanned scheduling crisis into people's lives.

In our example, a teacher union striking over, say, a minor salary raise disagreement is not credible: the potential harm is small, while the strike will significantly harm the teachers' image.

However, strikes are, to a first approximation, the only tool teacher unions have in their arsenal. Again, take the case of a minor salary raise. Threatening a strike is so disproportionate that the threat has no credibility. We turn to one of the fundamental insights of game theory: rational actors treat utility as linear in probability. So, while starting a strike that is twice as long is not twice as bad, increasing the probability of starting a strike from 0 to 1 is twice as bad (exactly!) as increasing the probability from 0 to 0.5.

(If you are a Bayesian who does not believe in 0 and 1 as probabilities, note that the argument works with approximations too: increasing the probability from a small e to 1-e is approximately twice as bad as increasing it from e to 0.5.)

All one side has is a strike. Assume the utility of a strike to that side is -1,000,000. Assume the utility of winning the salary negotiation is 1. They can threaten that if their position is not accepted, they will generate a random number between 0 and 1, and if it is below 1/1,000,000, they will start the strike. Now the threat is credible. But to gain that reputation, this number has to be generated in public, in an uncertain way: otherwise, no reputation is gained for following up on threats.
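The arithmetic behind this threat can be checked with a small sketch. The numbers are the ones assumed above; the function name `expected_strike_cost` is hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# Values assumed in the text above.
STRIKE_UTILITY = -1_000_000   # utility of a strike to the threatening side
WIN_UTILITY = 1               # utility of winning the salary negotiation


def expected_strike_cost(probability):
    """Expected utility of committing to strike with the given probability."""
    return probability * STRIKE_UTILITY


certain = expected_strike_cost(Fraction(1))
half = expected_strike_cost(Fraction(1, 2))
tiny = expected_strike_cost(Fraction(1, 1_000_000))

print(certain / half)            # 2: utility is linear in probability
print(tiny)                      # -1: same magnitude as the prize
print(abs(tiny) <= WIN_UTILITY)  # True: the threat is proportionate
```

With a strike probability of one in a million, the expected cost of following through is no larger than what is being negotiated over, which is what makes the threat proportionate.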

In practice, usually the randomness is generated by "inflaming the base". The person in charge will give impassioned speeches on how important this negotiation is. With some probability, their base will pressure them to start the strike, without them being able to resist it.

Specifically, note that often a strike is determined by a direct vote of the members, not the union leaders. This means that union leaders can credibly say, "please do not vote for the strike, we are against it". With some probability that depends on how much they inflamed the base, the membership will ignore the request. The more impassioned the speech, the higher the probability. By limiting their direct control over the decision to strike, union leaders gain the ability to threaten probabilistically.

Nuclear war and union strikes are both well-studied topics in applied game theory. The explanation above is a standard part of many textbooks: in my case, I summarized the explanation from Games of Strategy, pg. 487.

What is not well studied are the dynamics of open source projects. There, we have a set of owners who can directly influence such decisions as which patches land, and when versions are released. More people will offer patches, or ask for a release to happen. The only credible threat they have is to fork the project if they do not like how it is managed. But forking is often a disproportionate threat: a patch not landing often just means an ugly work-around in user code. There is a cost, but the cost of maintaining a fork is much greater.

But similar to a union strike, or launching a nuclear war, we can consider a "probabilistic fork". Rant on Twitter, or on appropriate mailing lists. Link to the discussion, especially to places which do not show the owners in the best light. Someone might decide to "rage-fork". More rants, or more extreme rants, increase the probability. A fork has to be possible in the first place: this is why the best way to evaluate whether something is open source is to consider "how possible is a fork".

This is why the possibility of a fork changes the dynamics of a project, even if forks are rare: because the main thing that happens is "low-probability maybe-forks".

February 20, 2020 04:00 AM UTC


Mike Driscoll

Python 101 2nd Edition Fully Funded + Stretch Goals

The second edition of my book, Python 101, has been successfully funded on Kickstarter. As is tradition, I have added a couple of stretch goals for adding more content to this already hefty book.

Here are the goals:

1) $5000 – Get 4 Bonus Chapters

These chapters would cover the following topics:

  • Assignment Expressions
  • How to Create a GUI
  • How to Create Graphs
  • How to Work with Images in Python

2) $7500 – Add Chapter Review Questions

The additional chapters are pretty exciting to me as they are fun things to do with Python while also being useful. The assignment expression chapter is also something that is new in Python and may be of use to you soon.

Adding chapter review questions was something I have always wanted to do with Python 101. Hopefully you will find that idea interesting as well.

If you are interested in getting the book or supporting this site, you can head over to Kickstarter now. There are some really good deals for some of my other books there too!

The post Python 101 2nd Edition Fully Funded + Stretch Goals appeared first on The Mouse Vs. The Python.

February 20, 2020 02:12 AM UTC

February 19, 2020


Real Python

Null in Python: Understanding Python's NoneType Object

If you have experience with other programming languages, like C or Java, then you’ve probably heard of the concept of null. Many languages use this to represent a pointer that doesn’t point to anything, to denote when a variable is empty, or to mark default parameters that you haven’t yet supplied. null is often defined to be 0 in those languages, but null in Python is different.

Python uses the keyword None to define null objects and variables. While None does serve some of the same purposes as null in other languages, it’s another beast entirely. As the null in Python, None is not defined to be 0 or any other value. In Python, None is an object and a first-class citizen!

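In this tutorial, you’ll learn what None is and how to test for it, how to use it as a default parameter, what None and NoneType mean in a traceback, how None fits into type checking, and how it works under the hood.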

Understanding Null in Python

None is the value a function returns when there is no return statement in the function:

>>> def has_no_return():
...     pass
...
>>> has_no_return()
>>> print(has_no_return())
None

When you call has_no_return(), there’s no output for you to see. When you print a call to it, however, you’ll see the hidden None it returns.

In fact, None so frequently appears as a return value that the Python REPL won’t print None unless you explicitly tell it to:

>>> None
>>> print(None)
None

None by itself has no output, but printing it displays None to the console.

Interestingly, print() itself has no return value. If you try to print a call to print(), then you’ll get None:

>>> print(print("Hello, World!"))
Hello, World!
None

It may look strange, but print(print("...")) shows you the None that the inner print() returns.

None is also often used as a signal for missing or default parameters. For instance, None appears twice in the docs for list.sort:

>>> help(list.sort)
Help on method_descriptor:

sort(...)
    L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*

Here, None is the default value for the key parameter as well as the type hint for the return value. The exact output of help can vary from platform to platform. You may get different output when you run this command in your interpreter, but it will be similar.

Using Python’s Null Object None

Often, you’ll use None as part of a comparison. One example is when you need to check and see if some result or parameter is None. Take the result you get from re.match. Did your regular expression match a given string? You’ll see one of two results:

  1. Return a Match object: Your regular expression found a match.
  2. Return a None object: Your regular expression did not find a match.

In the code block below, you’re testing if the pattern "Goodbye" matches a string:

>>> import re
>>> match = re.match(r"Goodbye", "Hello, World!")
>>> if match is None:
...     print("It doesn't match.")
...
It doesn't match.

Here, you use is None to test if the pattern matches the string "Hello, World!". This code block demonstrates an important rule to keep in mind when you’re checking for None: use the identity operators is and is not rather than the equality operators == and !=.

The equality operators can be fooled when you’re comparing user-defined objects that override them:

>>> class BrokenComparison:
...     def __eq__(self, other):
...         return True
...
>>> b = BrokenComparison()
>>> b == None  # Equality operator
True
>>> b is None  # Identity operator
False

Here, the equality operator == returns the wrong answer. The identity operator is, on the other hand, can’t be fooled because you can’t override it.

Note: For more info on how to compare with None, check out Do’s and Dont’s: Python Programming Recommendations.

None is falsy, which means not None is True. If all you want to know is whether a result is falsy, then a test like the following is sufficient:

>>> some_result = None
>>> if some_result:
...     print("Got a result!")
... else:
...     print("No result.")
...
No result.

The output doesn’t show you that some_result is exactly None, only that it’s falsy. If you must know whether or not you have a None object, then use is and is not.

Empty lists, dictionaries, sets, and strings, as well as 0 and False, are all falsy as well.

For more on comparisons, truthy, and falsy values, check out How to Use the Python or Operator.
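Here is a short sketch of why the difference between "falsy" and "exactly None" matters. The function `count_errors` is hypothetical: it returns None when there is no log to inspect and 0 when the log is clean, so a plain truthiness test can’t tell the two apart.

```python
def count_errors(log_lines):
    """Hypothetical helper: None means 'no log', 0 means 'log had no errors'."""
    if log_lines is None:
        return None
    return sum(1 for line in log_lines if "ERROR" in line)


def describe(result):
    # Identity test first, so that 0 is not mistaken for a missing result.
    if result is None:
        return "no log available"
    return f"{result} error(s) found"


print(describe(count_errors(None)))          # no log available
print(describe(count_errors(["all good"])))  # 0 error(s) found
print(bool(count_errors(["all good"])))      # False, even though a result exists
```

Testing with `if result:` alone would lump the clean log together with the missing one, which is why describe() checks `is None` explicitly.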

Declaring Null Variables in Python

In some languages, variables come to life from a declaration. They don’t have to have an initial value assigned to them. In those languages, the initial default value for some types of variables might be null. In Python, however, variables come to life from assignment statements. Take a look at the following code block:

>>> print(bar)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'bar' is not defined
>>> bar = None
>>> print(bar)
None

Here, you can see that a variable with the value None is different from an undefined variable. All variables in Python come into existence by assignment. A variable will only start life as null in Python if you assign None to it.

Using None as a Default Parameter

Very often, you’ll use None as the default value for an optional parameter. There’s a very good reason for using None here rather than a mutable type such as a list. Imagine a function like this:

def bad_function(new_elem, starter_list=[]):
    starter_list.append(new_elem)
    return starter_list

bad_function() contains a nasty surprise. It works fine when you call it with an existing list:

>>> my_list = ['a', 'b', 'c']
>>> bad_function('d', my_list)
['a', 'b', 'c', 'd']

Here, you add 'd' to the end of the list with no problems.

But if you call this function a couple of times with no starter_list parameter, then you start to see incorrect behavior:

>>> bad_function('a')
['a']
>>> bad_function('b')
['a', 'b']
>>> bad_function('c')
['a', 'b', 'c']

The default value for starter_list evaluates only once at the time the function is defined, so the code reuses it every time you don’t pass an existing list.
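One way to see this for yourself is to inspect the function’s `__defaults__` attribute, which holds the default values created at definition time. This is a minimal sketch that redefines bad_function() so that it runs on its own:

```python
def bad_function(new_elem, starter_list=[]):
    starter_list.append(new_elem)
    return starter_list


first = bad_function('a')
second = bad_function('b')

# Both calls returned the very same list object: the shared default.
print(first is second)            # True
print(bad_function.__defaults__)  # (['a', 'b'],)
```

Because the default list lives on the function object itself, every call that relies on the default keeps mutating that one list.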

The right way to build this function is to use None as the default value, then test for it and instantiate a new list as needed:

 1 >>> def good_function(new_elem, starter_list=None):
 2 ...     if starter_list is None:
 3 ...         starter_list = []
 4 ...     starter_list.append(new_elem)
 5 ...     return starter_list
 6 ...
 7 >>> good_function('e', my_list)
 8 ['a', 'b', 'c', 'd', 'e']
 9 >>> good_function('a')
10 ['a']
11 >>> good_function('b')
12 ['b']
13 >>> good_function('c')
14 ['c']

good_function() behaves as you want by making a new list with each call where you don’t pass an existing list. It works because your code will execute lines 2 and 3 every time it calls the function with the default parameter.

Using None as a Null Value in Python

What do you do when None is a valid input object? For instance, what if good_function() could either add an element to the list or not, and None was a valid element to add? In this case, you can define a class specifically for use as a default, while being distinct from None:

>>> class DontAppend: pass
...
>>> def good_function(new_elem=DontAppend, starter_list=None):
...     if starter_list is None:
...         starter_list = []
...     if new_elem is not DontAppend:
...         starter_list.append(new_elem)
...     return starter_list
...
>>> good_function(starter_list=my_list)
['a', 'b', 'c', 'd', 'e']
>>> good_function(None, my_list)
['a', 'b', 'c', 'd', 'e', None]

Here, the class DontAppend serves as the signal not to append, so you don’t need None for that. That frees you to add None when you want.

You can use this technique when None is a possibility for return values, too. For instance, dict.get returns None by default if a key is not found in the dictionary. If None was a valid value in your dictionary, then you could call dict.get like this:

>>> class KeyNotFound: pass
...
>>> my_dict = {'a':3, 'b':None}
>>> for key in ['a', 'b', 'c']:
...     value = my_dict.get(key, KeyNotFound)
...     if value is not KeyNotFound:
...         print(f"{key}->{value}")
...
a->3
b->None

Here you’ve defined a custom class KeyNotFound. Now, instead of returning None when a key isn’t in the dictionary, you can return KeyNotFound. That frees you to return None when that’s the actual value in the dictionary.
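A common variation on this technique, not the one used above, is to create the sentinel with object() instead of defining an empty class. The name `_MISSING` and the function `lookup` are made up for this sketch:

```python
_MISSING = object()  # unique sentinel; no other object is identical to it


def lookup(mapping, key):
    """Report whether key is present, even when its value is None."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:
        return f"{key} is missing"
    return f"{key}->{value}"


my_dict = {'a': 3, 'b': None}
for key in ['a', 'b', 'c']:
    print(lookup(my_dict, key))
```

Either form works for the same reason: the identity test with is can only succeed for the sentinel itself, so a stored None is never confused with a missing key.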

Deciphering None in Tracebacks

When NoneType appears in your traceback, it means that something you didn’t expect to be None actually was None, and you tried to use it in a way that you can’t use None. Almost always, it’s because you’re trying to call a method on it.

For instance, you called append() on my_list many times above, but if my_list somehow became anything other than a list, then append() would fail:

>>> my_list.append('f')
>>> my_list
['a', 'b', 'c', 'd', 'e', None, 'f']
>>> my_list = None
>>> my_list.append('g')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'append'

Here, your code raises the very common AttributeError because the underlying object, my_list, is not a list anymore. You’ve set it to None, which doesn’t know how to append(), and so the code throws an exception.

When you see a traceback like this in your code, look for the attribute that raised the error first. Here, it’s append(). From there, you’ll see the object you tried to call it on. In this case, it’s my_list, as you can tell from the code just above the traceback. Finally, figure out how that object got to be None and take the necessary steps to fix your code.

Checking for Null in Python

There are two type checking cases where you’ll care about null in Python. The first case is when you’re returning None:

>>> def returns_None() -> None:
...     pass

This case is similar to when you have no return statement at all, which returns None by default.

The second case is a bit more challenging. It’s where you’re taking or returning a value that might be None, but also might be some other (single) type. This case is like what you did with re.match above, which returned either a Match object or None.
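As a rough sketch of that second case, a function that wraps re.match can annotate its return value with Optional. The function name `find_greeting` is hypothetical, and using `re.Match` as the annotation assumes a Python version recent enough to expose that class:

```python
import re
from typing import Optional


def find_greeting(text: str) -> Optional[re.Match]:
    """Return a Match object if text starts with 'Hello', otherwise None."""
    return re.match(r"Hello", text)


match = find_greeting("Hello, World!")
if match is not None:
    print(match.group())         # Hello
print(find_greeting("Goodbye"))  # None
```

The annotation tells a type checker that callers have to handle None before using the result as a Match.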

The process is similar for parameters:

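from typing import Any, List, Optional
def good_function(new_elem: Any, starter_list: Optional[List] = None) -> List:
    if starter_list is None:
        starter_list = []  # Create a fresh list when none was passed in
    starter_list.append(new_elem)
    return starter_list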

Here, you modify good_function() from above and import Optional from typing so that starter_list can be annotated as Optional[List], which means either a list or None.

Taking a Look Under the Hood

In many other languages, null is just a synonym for 0, but null in Python is a full-blown object:

>>> type(None)
<class 'NoneType'>

This line shows that None is an object, and its type is NoneType.

None itself is built into the language as the null in Python:

>>> dir(__builtins__)
['ArithmeticError', ..., 'None', ..., 'zip']

Here, you can see None in the listing for __builtins__, which in an interactive session refers to the builtins module.

None is a keyword, just like True and False. But because of this, you can’t reach None directly from __builtins__ as you could, for instance, ArithmeticError. However, you can get it with a getattr() trick:

>>> __builtins__.ArithmeticError
<class 'ArithmeticError'>
>>> __builtins__.None
  File "<stdin>", line 1
    __builtins__.None
                    ^
SyntaxError: invalid syntax
>>> print(getattr(__builtins__, 'None'))
None

When you use getattr(), you can fetch the actual None from __builtins__, which you can’t do by simply asking for it with __builtins__.None.

Even though Python prints the word NoneType in many error messages, NoneType is not an identifier in Python. It’s not in builtins. You can only reach it with type(None).

None is a singleton. That is, the NoneType class only ever gives you the same single instance of None. There’s only one None in your Python program:

>>> my_None = type(None)()  # Create a new instance
>>> print(my_None)
None
>>> my_None is None
True

Even though you try to create a new instance, you still get the existing None.

You can prove that None and my_None are the same object by using id():

>>> id(None)
4465912088
>>> id(my_None)
4465912088

Here, the fact that id outputs the same integer value for both None and my_None means they are, in fact, the same object.

Note: The actual value produced by id will vary across systems, and even between program executions. Under CPython, the most popular Python runtime, id() does its job by reporting the memory address of an object. Two objects that live at the same memory address are the same object.

If you try to assign to None, then you’ll get a SyntaxError:

>>> None = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SyntaxError: can't assign to keyword
>>> None.age = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'age'
>>> setattr(None, 'age', 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'age'
>>> setattr(type(None), 'age', 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'NoneType'

All the examples above show that you can’t modify None or NoneType. They are true constants.

You can’t subclass NoneType, either:

>>> class MyNoneType(type(None)):
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type 'NoneType' is not an acceptable base type

This traceback shows that the interpreter won’t let you make a new class that inherits from type(None).

Conclusion

None is a powerful tool in the Python toolbox. Like True and False, None is an immutable keyword. As the null in Python, you use it to mark missing values and results, and even default parameters where it’s a much better choice than mutable types.

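Now you can test for None with is and is not, choose None as a default parameter, read tracebacks that mention NoneType, and use None in type hints.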


February 19, 2020 02:00 PM UTC


Peter Bengtsson

Build pyenv Python versions on macOS Catalina 10.15

I'm still working on getting pyenv in my bloodstream. It seems like totally the right tool for having different versions of Python available on macOS that don't suddenly break when you run brew upgrade periodically. But everything I tried failed with an error similar to this:

python-build: use openssl from homebrew
python-build: use readline from homebrew
Installing Python-3.7.0...
python-build: use readline from homebrew

BUILD FAILED (OS X 10.15.x using python-build 20XXXXXX)

Inspect or clean up the working tree at /var/folders/mw/0ddksqyn4x18lbwftnc5dg0w0000gn/T/python-build.20190528163135.60751
Results logged to /var/folders/mw/0ddksqyn4x18lbwftnc5dg0w0000gn/T/python-build.20190528163135.60751.log

Last 10 log lines:
./Modules/posixmodule.c:5924:9: warning: this function declaration is not a prototype [-Wstrict-prototypes]
    if (openpty(&master_fd, &slave_fd, NULL, NULL, NULL) != 0)
        ^
./Modules/posixmodule.c:6018:11: error: implicit declaration of function 'forkpty' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
    pid = forkpty(&master_fd, NULL, NULL, NULL);
          ^
./Modules/posixmodule.c:6018:11: warning: this function declaration is not a prototype [-Wstrict-prototypes]
2 warnings and 2 errors generated.
make: *** [Modules/posixmodule.o] Error 1
make: *** Waiting for unfinished jobs....

I read through the Troubleshooting FAQ and the "Common build problems" documentation. Xcode was up to date and I had all the related brew packages upgraded. Nothing seemed to work.

Until I saw this comment on an open pyenv issue: "Unable to install any Python version on MacOS"

All I had to do was replace 10.14 with 10.15 and now it finally worked here on Catalina 10.15. So, the magical line was this:

SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk \
MACOSX_DEPLOYMENT_TARGET=10.15 \
PYTHON_CONFIGURE_OPTS="--enable-framework" \
pyenv install -v 3.7.6

Hopefully, by blogging about it, you'll find this by Googling, and I'll remember it the next time I need it, because it did eat 2 hours of precious evening coding time.

February 19, 2020 01:11 PM UTC


Test and Code

101: Application Security - Anthony Shaw

Application security is best designed into a system from the start.
Anthony Shaw is doing something about it by creating an editor plugin that actually helps you write more secure application code while you are coding.

On today's Test & Code, Anthony and I discuss his security plugin, but also application security in general, as well as other security components you need to consider.

Security is something every team needs to think about, whether you are a single person team, a small startup, or a large corporation.

Anthony and I also discuss where to start if it's just a few of you, or even just one of you.

Topics include:

  • Finding security risks while writing code.
  • What are the risks for your applications.
  • Thinking about attack surfaces.
  • Static and dynamic code analysis.
  • Securing the environment an app is running in.
  • Tools for scanning live sites for vulnerabilities.
  • Secret management.
  • Hashing algorithms.
  • Authentication systems.
  • and Anthony's upcoming cPython Internals book.

Special Guest: Anthony Shaw.

Sponsored By:

  • Oxylabs: Visit oxylabs.io/testandcode to find out more about their services and to apply for a free trial of their Next-Generation Residential Proxies.

Support Test & Code: Python Software Testing & Engineering

Links:

  • Python Security - plugin for PyCharm (https://plugins.jetbrains.com/plugin/13609-python-security)
  • Bandit (https://bandit.readthedocs.io/en/latest/)
  • Hack The Box (https://www.hackthebox.eu/)

February 19, 2020 08:00 AM UTC


Python Bytes

#169 Jupyter Notebooks natively on your iPad

February 19, 2020 08:00 AM UTC


Catalin George Festila

Python 3.7.5 : The PyQtChart from python Qt5.

The PyQtChart is a set of Python bindings for The Qt Company’s Qt Charts library and is implemented as a single module. Let's install this python package with the pip3 tool:

[mythcat@desk ~]$ pip3 install PyQtChart --user
...
Installing collected packages: PyQtChart
Successfully installed PyQtChart-5.14.0

Let's test with a simple example:

from PyQt5.QtWidgets import QApplication, QMainWindow

February 19, 2020 07:56 AM UTC


Kushal Das

Which version of Python are you running?

The title of this post is misleading.

I actually want to ask you which version of Python 3 you are running. Yes, it is a question I have to ask myself based on the projects I am working on. I am sure there are many more people in the world who are in a similar situation.

Just to see what all versions of Python(3) I am running in different places:

What about you?

February 19, 2020 04:44 AM UTC

February 18, 2020


PyCoder’s Weekly

Issue #408 (Feb. 18, 2020)


Finding the Perfect Python Code Editor

Find your perfect Python development setup with this review of Python IDEs and code editors. Writing Python using IDLE or the Python REPL is great for simple things, but not ideal for larger programming projects. With this course you’ll get an overview of the most common Python coding environments to help you make an informed decision.
REAL PYTHON video

Overloading Functions in Python

Python does not natively support function overloading (having multiple functions with the same name). See how you can implement and add this functionality using common language constructs like decorators and dictionaries. Related discussion on Hacker News.
ARPIT BHAYANI

Python Developers Are in Demand on Vettery

Vettery is an online hiring marketplace that’s changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today →
VETTERY sponsor

PEP 614 (Draft): Relaxing Grammar Restrictions on Decorators

“Python currently requires that all decorators consist of a dotted name, optionally followed by a single call. This PEP proposes removing these limitations and allowing decorators to be any valid expression.” For example, this would become a valid decoration: @buttons[1].clicked.connect
PYTHON.ORG

Building Good Python Tests

A collection of testing maxims, tips, and gotchas, with a few pytest-specific notes. Things to do and not to do when it comes to writing automated tests.
CHRIS NEJAME • Shared by Chris NeJame

Types at the Edges in Python

Adding more strict typing around the edges of a Python system for better error messages and design, using Pydantic and mypy. Interesting read!
STEVE BRAZIER

Python 3.9 StatsProfile

The author of the profiling API improvements coming to Python 3.9 demonstrates the feature and explains how it was added to CPython.
DANIEL OLSHANSKY

Robots and Generative Art and Python, Oh My!

How to make cool looking plotter art with NumPy, SciPy, and Matplotlib.
GEOFFREY BRADWAY

PyCon US 2020 Tutorial Schedule

PYCON.ORG

Python 3.8.2rc2 Is Now Available for Testing

PYTHON INSIDER

Python Jobs

Senior Python/Django Engineer (London, UK)

Zego

Python Developer (Malta)

Gaming Innovation Group

Senior Python Software Engineer (London, UK)

Tessian

Senior Backend Engineer (Denver, CO)

CyberGRX

Senior Software Developer (Vancouver, BC, Canada)

AbCellera

Articles & Tutorials

Python Community Interview With Brett Slatkin

Brett Slatkin is a principal software engineer at Google and the author of the Python programming book Effective Python. Join us as we discuss Brett’s experience working with Python at Google, refactoring, and the challenges he faced when writing the second edition of his book.
REAL PYTHON

My Unpopular Opinion About Black Code Formatter

“In this post, I will try to gather all my thoughts on the topic of automatic code formatting and why I personally don’t like this approach.”
LUMINOUSMEN.COM

How to Build a Blockchain in Python

Blockchain, the system behind Bitcoin, is immutable, unhackable, persistent and distributed, and has many potential applications. Check out ActiveState’s tutorial on how to build a blockchain in Python and Flask using a pre-built runtime environment →
ACTIVESTATE sponsor

Uniquely Managing Test Execution Resources Using WebSockets

Learn about managing resources for test execution, while building an asynchronous WebSocket client-server application that tracks them using Python and Sanic.
CRISTIAN MEDINA

Refactoring and Asking for Forgiveness

“Recently, I had a great interaction with one of my coworkers that I think is worth sharing, with the hope you may learn a bit about refactoring and Python.”
CHRIS MAY

Guide to Python’s Newer String Format Techniques

In the last tutorial in this series, you learned how to format string data using the string modulo operator. In this tutorial, you’ll see two more items to add to your Python string formatting toolkit. You’ll learn about Python’s string format method and the formatted string literal, or f-string.
REAL PYTHON

Full Text Search With Postgres and Django [2017]

“In this post I will walk through the process of building decent search functionality for a small to medium sized website using Django and Postgres.”
SCOTT CZEPIEL

Python Tools for Record Linking and Fuzzy Matching

Useful Python tools for linking record sets and fuzzy matching on text fields. These concepts can also be used to deduplicate data.
CHRIS MOFFITT

Classify Valentine’s Day Texts With TensorFlow and Twilio

Use TensorFlow and Machine Learning to classify Twilio texts into two categories: “loves me” and “loves me not.”
LIZZIE SIEGLE • Shared by Lizzie Siegle

Tour of Python Itertools

Explore the itertools and more_itertools Python libraries and see how to leverage them for data processing.
MARTIN HEINZ • Shared by Martin Heinz

Python Static Analysis Tools

Find and fix the bugs and code smells in your Python code with the popular tools for analyzing code.
LUMINOUSMEN.COM

Getting the Most Out of Python Collections

A guide to comprehensions, generators and useful functions and classes.
NICK THAPEN

Blackfire Profiler Public Beta Open—Get Started in Minutes

Blackfire Profiler now supports Python, through a Public Beta. Profile Python code with Blackfire’s intuitive developer experience and appealing user interface. Spot bottlenecks in your code, and compare code iterations profiles.
BLACKFIRE sponsor

Autoencoders With Keras, TensorFlow, and Deep Learning

ADRIAN ROSEBROCK

Why Python Is the Best Programming Language for a Startup

GLEB PUSHKOV

Dissecting a Web Stack

LEONARDO GIORDANI

Experience Report on a Large Python-To-Go Translation

ERIC S. RAYMOND

Introduction to Kafka With Docker and Python

DORON VAINRUB

Guide to Reading Excel (xlsx) Files in Python

ERIK MARSJA

Modularity for Maintenance

GLYPH

Projects & Code

pycharm-security: PyCharm Plugin to Find Security Holes in Your Python Projects

GITHUB.COM/TONYBALONEY

Deadsnakes PPA Builds for Debian in Docker

The Deadsnakes PPA project builds older and newer Python versions not found on a specific Ubuntu release. Originally based on the Debian source packages, they can still be built on Debian and not just on Ubuntu.
GITHUB.COM/JHERMANN • Shared by Jürgen Hermann

stdlib-property-tests: Property-Based Tests for the Python Standard Library (And Builtins)

GITHUB.COM/ZAC-HD

django-guid: Inject a GUID (Correlation-ID) Into Every Log Message in a Django Request

GITHUB.COM/JONASKS

ursina: A Game Engine Powered by Python and Panda3d

GITHUB.COM/POKEPETTER

VaultSSH: CLI Tool for Signing SSH Public Keys Using the Vault SSH Endpoint

GITHUB.COM/JMGILMAN

icalendar_light: iCalendar Event Reader

GITHUB.COM/IDLESIGN • Shared by pythonz

IRedis: Terminal Client for Redis With Auto-Complete and Syntax Highlighting

IREDIS.IO

Events

PyCon Namibia 2020

February 18 to February 21, 2020
PYCON.ORG

PyData Bristol Meetup

February 20, 2020
MEETUP.COM

Python Northwest

February 20, 2020
PYNW.ORG.UK

PyLadies Dublin

February 20, 2020
PYLADIES.COM

Open Source Festival

February 20 to February 23, 2020
OSCAFRICA.ORG

PyCon Belarus 2020

February 21 to February 23, 2020
PYCON.ORG


Happy Pythoning!
This was PyCoder’s Weekly Issue #408.

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

February 18, 2020 07:30 PM UTC


Roberto Alsina

Airflow By Example (II)

Second twitter thread about following a simple example to learn Apache Airflow.

February 18, 2020 07:21 PM UTC


PyCon

The Hatchery Returns with Nine Events!

Since its start in 2018, the PyCon US Hatchery Program has become a fundamental part of how PyCon as a conference adapts to best serve the Python community as it grows and changes with time. In keeping with that focus on innovation, the Hatchery Program itself has continued to evolve.
Initially we wanted to gauge community interest for this type of program, and in 2018 we launched our first trial program to learn more about what kind of events the community might propose. At the end of that inaugural program, we accepted the PyCon Charlas as our first Hatchery event and it has grown into a permanent track offered at PyCon US.
PyCon US 2019 presented three new hatchery programs, Mentored Sprints, the Maintainers Summit, and the Art of Python. Those events were quite different from one another, but they all foreshadowed trends we are seeing now.
For PyCon US 2020 there were a dozen proposals, which set us to thinking about how we could accommodate as many events as possible. In addition to the return of the successful offerings from last year we received several other proposals which seem to show three general directions - summits, code events, and artistic presentations. In response to those trends, we decided to tweak the structure a bit, and starting in 2020 the Hatchery will reflect those three broad areas.
While we will always be open to new and innovative proposals, this framework will allow us to better plan for coming years. Given the limited space available at our venue, thinking in terms of these categories will make us better able to provide resources for as many events as possible. We're excited to see this year's hatchlings!
Here are the Hatchery areas and their events for 2020:

The Hatchery Codes

Events where members of the community come together to teach, learn, and practice the art of coding in Python.
This year there will be three events in this area:

Mentored Sprints for Diverse Beginners

A newcomer’s introduction to contributing to an open source project. These are mentored sprints for individuals from underrepresented groups willing to start contributing to Python projects. This event will provide a supportive, friendly, and safe environment for all the attendees and partner open source projects. To achieve this goal, we are seeking to work with a number of Python projects and their maintainers interested in providing mentorship to these individuals. In return, we will provide guidance and advice on how to prepare the projects for the day and to better serve a diverse range of contributors.
To learn more about how to take part in this event see https://us.pycon.org/2020/hatchery/mentoredsprints/.

Trans*Code Hackday

Trans*Code aims to help draw attention to trans/non-binary issues and community through a topic-focused hackday. Anyone is welcome to participate, whether or not they work in tech. This event is free of expectations, and free of schedule. You can come, present an idea that you have, or listen to other ideas, or just get together with other participants to explore a new technology, or brainstorm something.
For more info see https://us.pycon.org/2020/hatchery/transcode/.

Beginners Data Workshop for Minorities

Would you like to learn to code but don’t know where to start? Learning how to code can seem like an impossible task so we’ve decided to put on a workshop to show 60 beginners how it can be done and get them excited about the world of technology! Join us on 18th April 2020 for a workshop where you’ll learn the basics of programming in Python, as well as how to use tools such as Jupyter Notebook to analyze data.
Learn more at https://us.pycon.org/2020/hatchery/beginnersdata/.

The Hatchery Summits

There are many smaller sub-communities within the larger Python community that struggle to find a dedicated time and space to meet and discuss their issues. PyCon, as one of the largest gatherings of Python folk in the world, seems like a great place to offer this option.
This year there will be four summits as part of the Hatchery:

Regional PyCon Organizers' Summit

The Regional Conference Organizers’ Summit is a place for people who run or are interested in running Python conferences to gather to share knowledge, seek advice, and work together to help build better Python conferences throughout the world. The Summit is a half-day “unconference”-style event. That means we aren’t calling for prepared presentations. Instead we’ll have moderated discussions where anyone attending can contribute: from experienced organizers offering advice, to new and interested organizers asking questions about where to start. To guide the day, we will have a set agenda, so you can prepare questions, or come along with ideas.
To learn more see https://us.pycon.org/2020/hatchery/organizers/.

Python Trainers Summit

This summit seeks to forge connections among trainers from all practices, to formalize their community of practice, and to connect them with others working in education contexts. It will provide the space and platform for professional trainers to engage with more formal education practices, and for those educators to connect with the needs of corporate and professional development audiences.
To learn more or submit a talk proposal visit https://us.pycon.org/2020/hatchery/trainers/.

Maintainers Summit

Python is much more than a programming language. It is a vibrant community made up of individuals with diverse skills and backgrounds. Maintainers Summit at PyCon USA is where the community comes together to discuss and build a community of practice for Python project maintainers and key contributors. Come and learn from your peers how to maintain and develop sustainable projects and thriving communities.
We are inviting Python community members to get on stage and share their insight and experience with the PyCon 2020 audience. Talk proposals from first-time speakers and Pythonistas from the underrepresented groups within the tech community are strongly encouraged.
For more details visit: https://us.pycon.org/2020/hatchery/maintainers/.

Python Packaging Summit

The Python Packaging Summit is an event primarily for people contributing to the Python packaging ecosystem, for any interpreter or distribution (CPython, PyPy, Conda, and so on), to share information, discuss our shared problems, and, hopefully, come up with solutions to tackle them. These issues might relate to any of the Python packaging projects, regardless of whether they are hosted under the PyPA umbrella. The Summit focuses on discussion and consensus-seeking about the problems we face and how we should solve them.
We welcome developers who maintain any of the python packaging tools, or active contributors to these tools.
If you want to tackle some problem in the world of Python packaging please visit https://us.pycon.org/2020/hatchery/packaging/ to learn more about how to submit your topic, and sign up to attend.

The Hatchery Presents

Finally, it should come as no surprise to anyone that the Python world is full of creative people, and that PyCon is a natural place for them to express that creativity. Building on the success of The Art of Python last year, this year we will have two events with an artistic flair:

The Art of Python

The event this year will be roughly 2 hours, the evening of Friday, April 17th. The first half will be composed of five 5-15 minute performances. The second half will involve creative exercises to inspire and workshop new pieces. There are lots of performances and venues for code that creates art. However, Art of Python is a venue for technologists to produce creative works drawn from their experience of working in technology. Remember, the goal of this festival is to impart perspective about the emotional and challenging work of programming Python through the medium of entertainment.
To find out more visit https://us.pycon.org/2020/hatchery/artofpython/.

No Signal: Python for Computational Arts

The purpose of this project is to showcase the artistic possibilities that exist when creative coders leverage Python and computer science. Making art with code is an engaging way for beginner technologists to start learning new skills, and also allows folks with existing skills to express themselves in new ways. Python has become a prominent language for scripting in game engines and 3D creation suites, graphic design, sound design, and circuitry; along with its active user community and so many additional add-on libraries, Python is an approachable tool to use for artistic purposes. We hope to give pythonistas, career computer scientists, and all patrons of the conference a chance to see exciting work and projects, and in turn find inspiration for their own projects and learning.
To find out how to submit your project go to https://us.pycon.org/2020/hatchery/computationalarts/.

February 18, 2020 03:03 PM UTC


Codementor

Wanted: Microsoft Dynamics data in Python scripts

There is plenty of exciting stuff in Microsoft Dynamics that you can use for data analysis or data visualization projects in large organizations. So how do you get your hands on it using a Python script? Good news, it is not that complicated!

February 18, 2020 02:57 PM UTC


Podcast.__init__

APIs, Sustainable Open Source and The Async Web With Tom Christie

Tom Christie is probably best known as the creator of Django REST Framework, but his contributions to the state of the web in Python extend well beyond that. In this episode he shares his story of getting involved in web development, his work on various projects to power the asynchronous web in Python, and his efforts to make his open source contributions sustainable. This was an excellent conversation about the state of asynchronous frameworks for Python and the challenges of making a career out of open source.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, node balancers, a 40 Gbit/s public network, and a brand new managed Kubernetes platform, all controlled by a convenient API, you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they’ve got dedicated CPU and GPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host as usual is Tobias Macey, and today I’m interviewing Tom Christie about the Encode organization and the work he is doing to drive the state of the art in async for Python.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what the Encode organization is and how it came to be?
    • What are some of the other approaches to funding and sustainability that you have tried in the past?
    • What are the benefits to the developers provided by an organization which you were unable to achieve through those other means?
    • What benefits are realized by your sponsors as compared to other funding arrangements?
  • What projects are part of the Encode organization?
  • How do you determine fund allocation for projects and participants in the organization?
  • What is the process for becoming a member of the Encode organization and what benefits and responsibilities does that entail?
  • A large number of the projects that are part of the organization are focused on various aspects of asynchronous programming in Python. Is that intentional, or just an accident of your own focus and network?
  • For those who are familiar with Python web programming in the context of WSGI, what are some of the practices that they need to unlearn in an async world, and what are some new capabilities that they should be aware of?
  • Beyond Encode and your recent work on projects such as Starlette you are also well known as the creator of Django Rest Framework. How has your experience building and growing that project influenced your current focus on a technical, community, and professional level?
  • Now that Python 2 is officially unsupported and asynchronous capabilities are part of the core language, what future directions do you foresee for the community and ecosystem?
    • What are some areas of potential focus that you think are worth more attention and energy?
  • What do you have planned for the future of Encode, your own projects, and your overall engagement with the Python ecosystem?

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

February 18, 2020 02:35 PM UTC


EuroPython

EuroPython 2020: Presenting our conference logo for Dublin

We’re pleased to announce our official conference logo for EuroPython 2020, July 20-26, in Dublin, Ireland:

[Image: EuroPython 2020 conference logo]

The logo is inspired by the colors and symbols often associated with Ireland: the shamrock and the Celtic harp. It was again created by our designer Jessica Peña Moro from Simétriko, who had already helped us in previous years with the conference design.

Some more updates:

Enjoy,

EuroPython 2020 Team
https://ep2020.europython.eu/

February 18, 2020 02:10 PM UTC


Real Python

Finding the Perfect Python Code Editor

Find your perfect Python development setup with this review of Python IDEs and code editors. Writing Python using IDLE or the Python REPL is great for simple things, but not ideal for larger programming projects. With this course you’ll get an overview of the most common Python coding environments to help you make an informed decision.

By the end of this course, you’ll know how to:


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

February 18, 2020 02:00 PM UTC


Stack Abuse

Integrating MongoDB with Python Using PyMongo

Introduction

In this post, we will dive into MongoDB as a data store from a Python perspective. To that end, we'll write a simple script to showcase what we can achieve and any benefits we can reap from it.

Web applications, like many other software applications, are powered by data. The organization and storage of this data are important as they dictate how we interact with the various applications at our disposal. The kind of data handled can also have an influence on how we undertake this process.

Databases allow us to organize and store this data, while also controlling how we store, access, and secure the information.

NoSQL Databases

There are two main types of databases - relational and non-relational databases.

Relational databases allow us to store, access, and manipulate data in relation to another piece of data in the database. Data is stored in organized tables with rows and columns with relationships linking the information among tables. To work with these databases, we use the Structured Query Language (SQL) and examples include MySQL and PostgreSQL.

Non-relational databases do not store data in relations or tables the way relational databases do. They are also referred to as NoSQL databases since we do not use SQL to interact with them.

Furthermore, NoSQL databases can be divided into Key-Value stores, Graph stores, Column stores, and Document Stores, which MongoDB falls under.

MongoDB and When to Use it

MongoDB is a document store and non-relational database. It allows us to store data in collections that are made up of documents.

In MongoDB, a document is stored in a JSON-like binary serialization format called BSON, or Binary JSON, and has a maximum size of 16 megabytes. This size limit is in place to ensure efficient memory and bandwidth usage during transmission.

MongoDB also provides the GridFS specification in case there is a need to store files larger than the set limit.

Documents are made up of field-value pairs, just like in regular JSON data. However, this BSON format can also contain more data types, such as Date types and Binary Data types. BSON was designed to be lightweight, easily traversable, and efficient when encoding and decoding data to and from BSON.
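To make the difference from plain JSON concrete, here is a minimal sketch that uses only the Python standard library. It checks which values in a document the json module refuses to encode; those are the kinds of values (dates, binary data) for which BSON has dedicated types. The added_on and thumbnail fields are made up for illustration.

```python
import json
from datetime import datetime, timezone

# Illustrative document: "added_on" and "thumbnail" are made-up fields.
document = {
    "name": "Suits",
    "year": 2011,
    "added_on": datetime(2020, 2, 18, tzinfo=timezone.utc),
    "thumbnail": b"\x89PNG",
}


def json_unsupported_fields(doc):
    """Return the names of fields whose values plain JSON cannot encode."""
    unsupported = []
    for key, value in doc.items():
        try:
            json.dumps(value)
        except TypeError:
            # json.dumps raises TypeError for datetime and bytes values
            unsupported.append(key)
    return unsupported


print(json_unsupported_fields(document))  # ['added_on', 'thumbnail']
```

When a dictionary like this is handed to PyMongo, datetime and bytes values are expected to be stored as BSON Date and Binary Data values respectively, with no manual conversion.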

Being a NoSQL datastore, MongoDB allows us to enjoy the advantages that come with using a non-relational database over a relational one. One advantage is that it offers high scalability by efficiently scaling horizontally through sharding or partitioning of the data and placing it on multiple machines.

MongoDB also allows us to store large volumes of structured, semi-structured, and unstructured data without having to maintain relationships between it. Being open-source, the cost of implementing MongoDB is kept low to just maintenance and expertise.

Like any other solution, there are downsides to using MongoDB. The first one is that it does not maintain relationships between stored data. Due to this, it is harder to perform ACID transactions that ensure consistency across related documents.

Multi-document ACID transactions have been available since MongoDB 4.0, but relying on them adds complexity. MongoDB, like other NoSQL data stores, is also not as mature as relational databases, and this can make it hard to find experts.

The non-relational nature of MongoDB makes it ideal for the storage of data in specific situations over its relational counterparts. For instance, a scenario where MongoDB is more suitable than a relational database is when the data format is flexible and has no relations.

With flexible/non-relational data, we don't need to maintain ACID properties when storing data as opposed to relational databases. MongoDB also allows us to easily scale data into new nodes.

However, for all its advantages, MongoDB is not ideal when our data is relational in nature, for instance when we are storing customer records and their orders.

In this situation, we will need a relational database to maintain the relationships between our data, which are important. MongoDB is also a less natural fit when strict ACID guarantees across many related records are central to the application.

Interacting with MongoDB via Mongo Shell

To work with MongoDB, we will need to install the MongoDB Server, which we can download from the official homepage. For this demonstration, we will use the free Community Server.

The MongoDB server comes with a Mongo Shell that we can use to interact with the server via the terminal.

To activate the shell, just type mongo in your terminal. You'll be greeted with information about the MongoDB server set-up, including the MongoDB and Mongo Shell version, alongside the server URL.

For instance, our server is running on:

mongodb://127.0.0.1:27017

In MongoDB, a database is used to hold collections that contain documents. Through the Mongo shell, we can create a new database or switch to an existing one using the use command:

> use SeriesDB

Every operation we execute after this will be effected in our SeriesDB database. In the database, we will store collections, which are similar to tables in relational databases.

For example, for the purposes of this tutorial, let's add a few series to the database:

> db.series.insertMany([
... { name: "Game of Thrones", year: 2012},
... { name: "House of Cards", year: 2013 },
... { name: "Suits", year: 2011}
... ])

We're greeted with:

{
    "acknowledged" : true,
    "insertedIds" : [
        ObjectId("5e300724c013a3b1a742c3b9"),
        ObjectId("5e300724c013a3b1a742c3ba"),
        ObjectId("5e300724c013a3b1a742c3bb")
    ]
}

To fetch all the documents stored in our series collection, we use db.series.find({}), whose SQL equivalent is SELECT * FROM series. Passing an empty query (i.e. {}) will return all the documents:

> db.series.find({})

{ "_id" : ObjectId("5e3006258c33209a674d1d1e"), "name" : "The Blacklist", "year" : 2013 }
{ "_id" : ObjectId("5e300724c013a3b1a742c3b9"), "name" : "Game of Thrones", "year" : 2012 }
{ "_id" : ObjectId("5e300724c013a3b1a742c3ba"), "name" : "House of Cards", "year" : 2013 }
{ "_id" : ObjectId("5e300724c013a3b1a742c3bb"), "name" : "Suits", "year" : 2011 }

We can also query data using the equality condition, for instance, to return all the TV series that premiered in 2013:

> db.series.find({ year: 2013 })
{ "_id" : ObjectId("5e3006258c33209a674d1d1e"), "name" : "The Blacklist", "year" : 2013 }
{ "_id" : ObjectId("5e300724c013a3b1a742c3ba"), "name" : "House of Cards", "year" : 2013 }

The SQL equivalent would be SELECT * FROM series WHERE year=2013.
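The matching rule behind these two queries can be sketched in a few lines of plain Python. This is only an illustration of the semantics for simple scalar fields, not how MongoDB implements queries; the find function below is a hypothetical stand-in that works on a list of dictionaries.

```python
# The four documents shown in the shell output above, without their _id fields.
series = [
    {"name": "The Blacklist", "year": 2013},
    {"name": "Game of Thrones", "year": 2012},
    {"name": "House of Cards", "year": 2013},
    {"name": "Suits", "year": 2011},
]


def find(documents, query):
    """Return the documents whose fields equal every key/value pair in query."""
    return [
        doc
        for doc in documents
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


# Equality condition: like db.series.find({ year: 2013 })
print([doc["name"] for doc in find(series, {"year": 2013})])
# ['The Blacklist', 'House of Cards']

# Empty query: every document matches, like db.series.find({})
print(len(find(series, {})))  # 4
```

An empty query matches everything because there are no conditions left to fail, which is why {} behaves like SELECT * without a WHERE clause.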

MongoDB also allows us to update individual documents using db.collection.updateOne(), or perform batch updates using db.collection.updateMany(). For example, to update the release year for Suits:

> db.series.updateOne(
{ name: "Suits" },
{
    $set: { year: 2010 }
}
)
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }

Finally, to delete documents, the Mongo Shell offers the db.collection.deleteOne() and db.collection.deleteMany() functions.

For instance, to delete all the series that premiered in 2012, we'd run:

> db.series.deleteMany({ year: 2012 })
{ "acknowledged" : true, "deletedCount" : 2 }

More information on the CRUD operations on MongoDB can be found in the online reference including more examples, performing operations with conditions, atomicity, and mapping of SQL concepts to MongoDB concepts and terminology.

Integrating Python with MongoDB

MongoDB provides drivers and tools for interacting with a MongoDB datastore using various programming languages including Python, JavaScript, Java, Go, and C#, among others.

PyMongo is the official MongoDB driver for Python, and we will use it to create a simple script that we will use to manipulate data stored in our SeriesDB database.

With Python 3.6+ and Virtualenv installed on our machines, let us create a virtual environment for our application and install PyMongo via pip:

$ virtualenv --python=python3 env
$ source env/bin/activate
$ pip install pymongo

Using PyMongo, we are going to write a simple script that we can execute to perform different operations on our MongoDB database.

Connecting to MongoDB

First, we import MongoClient from pymongo in our mongo_db_script.py and create a client connected to our locally running instance of MongoDB:

from pymongo import MongoClient

# Create the client
client = MongoClient('localhost', 27017)

# Connect to our database
db = client['SeriesDB']

# Fetch our series collection
series_collection = db['series']

So far, we have created a client that connects to our MongoDB server and used it to fetch our 'SeriesDB' database. We then fetch our 'series' collection and store it in an object.
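The client can also be created from a connection string in the mongodb:// form the shell printed earlier. The helper below is a small sketch for building such a string; build_mongo_uri is a hypothetical name, and the credentials in the example are made up. User names and passwords are percent-escaped with urllib.parse.quote_plus so that characters such as @ or / do not break the URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="127.0.0.1", port=27017, username=None, password=None):
    """Build a mongodb:// connection string, escaping any credentials."""
    credentials = ""
    if username is not None and password is not None:
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"mongodb://{credentials}{host}:{port}"


print(build_mongo_uri())
# mongodb://127.0.0.1:27017
print(build_mongo_uri(username="app", password="p@ss/word"))
# mongodb://app:p%40ss%2Fword@127.0.0.1:27017
```

The resulting string can be passed as the first argument to MongoClient in place of the separate host and port.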

Creating Documents

To make our script more convenient, we will write functions that wrap around PyMongo to enable us to easily manipulate data. We will use Python dictionaries to represent documents and we will pass these dictionaries to our functions. First, let us create a function to insert data into our 'series' collection:

# Imports truncated for brevity

def insert_document(collection, data):
    """ Function to insert a document into a collection and
    return the document's id.
    """
    return collection.insert_one(data).inserted_id

This function receives a collection and a dictionary of data and inserts the data into the provided collection. The function then returns an identifier that we can use to accurately query the individual object from the database.

We should also note that when a document is created without an _id key, one is added to it automatically.

Now let's try adding a show using our function:

new_show = {
    "name": "FRIENDS",
    "year": 1994
}
print(insert_document(series_collection, new_show))

The output is:

5e4465cfdcbbdc68a6df233f

When we run our script, the _id of our new show is printed on the terminal and we can use this identifier to fetch the show later on.

We can provide an _id value instead of having it assigned automatically, which we'd provide in the dictionary:

new_show = {
    "_id": "1",
    "name": "FRIENDS",
    "year": 1994
}

And if we were to try and store a document with an existing _id, we'd be greeted with an error similar to the following:

DuplicateKeyError: E11000 duplicate key error index: SeriesDB.series.$id dup key: { : 1}
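One way to sidestep this error when we only want to create a document if it is missing is an upsert with the $setOnInsert operator. The function below is a sketch under a few assumptions: insert_if_absent is a hypothetical helper name, the document carries an explicit _id, and it has at least one other field. It relies only on update_one with upsert=True and the upserted_id attribute of the result.

```python
def insert_if_absent(collection, data):
    """Insert data unless a document with the same _id already exists.

    Returns True if a new document was written, False if one was already there.
    """
    # _id is taken from the filter on insert, so keep it out of $setOnInsert
    fields = {key: value for key, value in data.items() if key != "_id"}
    result = collection.update_one(
        {"_id": data["_id"]},
        {"$setOnInsert": fields},  # only applied when the upsert inserts
        upsert=True,
    )
    # upserted_id is None when an existing document matched the filter
    return result.upserted_id is not None
```

Calling insert_if_absent(series_collection, new_show) twice with the same _id should write the document once and leave it untouched the second time, instead of raising an error.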

Retrieving Documents

To retrieve documents from the database we'll use find_document(), which queries our collection for single or multiple documents. Our function will receive a dictionary that contains the elements we want to filter by, and an optional argument to specify whether we want one document or multiple documents:

# Imports and previous code truncated for brevity

def find_document(collection, elements, multiple=False):
    """ Function to retrieve single or multiple documents from a provided
    Collection using a dictionary containing a document's elements.
    """
    if multiple:
        results = collection.find(elements)
        return list(results)
    else:
        return collection.find_one(elements)

And now, let's use this function to find some documents:

result = find_document(series_collection, {'name': 'FRIENDS'})
print(result)

When executing our function, we did not provide the multiple parameter and the result is a single document:

{'_id': ObjectId('5e3031440597a8b07d2f4111'), 'name': 'FRIENDS', 'year': 1994}

When the multiple parameter is set to True, the result is a list of all the documents in our collection that have a name attribute set to FRIENDS.

Updating Documents

Our next function, update_document(), will be used to update a single specific document. We will use the _id of the document and the collection it belongs to when locating it:

# Imports and previous code truncated for brevity

def update_document(collection, query_elements, new_values):
    """ Function to update a single document in a collection.
    """
    collection.update_one(query_elements, {'$set': new_values})

Now, let's insert a document:

new_show = {
    "name": "FRIENDS",
    "year": 1995
}
id_ = insert_document(series_collection, new_show)

With that done, let's update the document, which we'll specify using the _id returned from adding it:

update_document(series_collection, {'_id': id_}, {'name': 'F.R.I.E.N.D.S'})

And finally, let's fetch it to verify that the new value has been put in place and print the result:

result = find_document(series_collection, {'_id': id_})
print(result)

When we execute our script, we can see that our document has been updated:

{'_id': ObjectId('5e30378e96729abc101e3997'), 'name': 'F.R.I.E.N.D.S', 'year': 1995}

Deleting Documents

And finally, let's write a function for deleting documents:

# Imports and previous code truncated for brevity

def delete_document(collection, query):
    """ Function to delete a single document from a collection.
    """
    collection.delete_one(query)

Since we're using the delete_one method, only one document can be deleted per call, even if the query matches multiple documents.

Now, let's use the function to delete an entry:

delete_document(series_collection, {'_id': id_})

If we try retrieving that same document:

result = find_document(series_collection, {'_id': id_})
print(result)

We're greeted with the expected result:

None
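If we also want to remove every matching document and learn how many were deleted, a variant along the following lines could be used. This is a sketch: delete_documents and its multiple flag are hypothetical additions to the script, built on PyMongo's delete_one and delete_many methods and the deleted_count attribute of their results.

```python
def delete_documents(collection, query, multiple=False):
    """Delete one document, or all documents, matching query.

    Returns the number of documents that were removed.
    """
    if multiple:
        result = collection.delete_many(query)
    else:
        result = collection.delete_one(query)
    return result.deleted_count
```

For example, delete_documents(series_collection, {'year': 2012}, multiple=True) mirrors the deleteMany call made earlier in the Mongo Shell. Returning the count lets the caller tell a successful delete apart from a query that matched nothing.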

Next Steps

We have highlighted and used a few of PyMongo's methods to interact with our MongoDB server from a Python script. However, we have not utilized all the methods available to us through the module.

All the available methods can be found in the official PyMongo documentation and are classified according to the submodules.

We've written a simple script that performs rudimentary CRUD functionality on a MongoDB database. While we could import the functions in a more complex codebase, or into a Flask/Django application for example, these frameworks already have libraries that achieve the same results. These libraries make the work easier and more convenient, and help us connect more securely to MongoDB.

For example, with Django we can use libraries such as Django MongoDB Engine and Djongo, while Flask has Flask-PyMongo that helps bridge the gap between Flask and PyMongo to facilitate seamless connectivity to a MongoDB database.

Conclusion

MongoDB is a document store and falls under the category of non-relational databases (NoSQL). It has certain advantages compared to relational databases, as well as some disadvantages.

While it is not suitable for all situations, we can still use MongoDB to store data and manipulate the data from our Python applications using PyMongo among other libraries - allowing us to harness the power of MongoDB in situations where it is best suited.

It is therefore up to us to carefully examine our requirements before making the decision to use MongoDB to store data.

The script we have written in this post can be found on GitHub.

February 18, 2020 01:15 PM UTC


Chris Moffitt

Python Tools for Record Linking and Fuzzy Matching

Introduction

Record linking and fuzzy matching are terms used to describe the process of joining two data sets together that do not have a common unique identifier. Examples include trying to join files based on people’s names or merging data that only have an organization’s name and address.

This problem is a common business challenge and difficult to solve in a systematic way - especially when the data sets are large. A naive approach using Excel and vlookup statements can work but requires a lot of human intervention. Fortunately, python provides two libraries that are useful for these types of problems and can support complex matching algorithms with a relatively simple API.

The first one is called fuzzymatcher and provides a simple interface to link two pandas DataFrames together using probabilistic record linkage. The second option is the appropriately named Python Record Linkage Toolkit which provides a robust set of tools to automate record linkage and perform data deduplication.

This article will discuss how to use these two tools to match two different data sets based on name and address information. In addition, the techniques used to do matching can be applied to data deduplication and will be briefly discussed.

The problem

Anyone that has tried to merge disparate data sets together has likely run across some variation of this challenge. In the simple example below, we have a customer record in our system and need to determine which data matches - without the use of a common identifier.

Simple manual lookup

With a small sample set and our intuition, it looks like account 18763 is the same as account number A1278. We know that Brothers and Bro as well as Lane and LN are equivalent so this process is relatively easy for a person. However, trying to program logic to handle this is a challenge.

In my experience, most people start using excel to vlookup the various components of the address and try to find the best match based on the state, street number or zip code. In some cases, this can work. However there are more sophisticated ways to perform string comparisons that we might want to use. For example, I wrote briefly about a package called fuzzy wuzzy several years ago.

The challenge is that these algorithms (e.g. Levenshtein, Damerau-Levenshtein, Jaro-Winkler, q-gram, cosine) are computationally intensive. Trying to do a lot of matching on large data sets is not scalable.
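
To get a feel for where the cost comes from, here is a minimal pure-Python sketch of the Levenshtein distance. It is only an illustration (the libraries discussed later rely on their own, faster implementations), but it shows that every single comparison fills a grid of roughly len(a) x len(b) cells:

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


lane_vs_ln = levenshtein("LANE", "LN")
print(lane_vs_ln)
```

Multiply that grid by millions of record pairs and the run time adds up quickly.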

If you are interested in more mathematical details on these concepts, wikipedia is a good place to start and this article contains much more additional detail. Finally, this blog post discusses some of the string matching approaches in more detail.

Fortunately there are python tools that can help us implement these methods and solve some of these challenging problems.

The data

For this article, we will be using US hospital data. I chose this data set because hospital data has some unique qualities that make it challenging to match:

  • Many hospitals have similar names across different cities (Saint Lukes, Saint Mary, Community Hospital)
  • In urban areas, hospitals can occupy several city blocks so addresses can be ambiguous
  • Hospitals tend to have many clinics and other associated and related facilities nearby
  • Hospitals also get acquired and name changes are common - making this process even more difficult
  • Finally, there are thousands of medical facilities in the US so the problem is challenging to scale

In these examples, I have two data sets. The first is an internal data set that contains basic hospital account number, name and ownership information.

Hospital account data

The second data set contains hospital information (called provider) as well as the number of discharges and Medicare payment for a specific Heart Failure procedure.

Hospital account data

The full data sets are available from Medicare.gov and CMS.gov and the simplified and cleaned version are available on github.

The business scenario is that we want to match up the hospital reimbursement information with our internal account data so we have more information to analyze our hospital customers. In this instance we have 5339 hospital accounts and 2697 hospitals with reimbursement information. Unfortunately we do not have a common ID to join on so we will see if we can use these python tools to merge the data together based on a combination of name and address information.

Approach 1 - fuzzymatcher

For the first approach, we will try using fuzzymatcher. This package leverages sqlite’s full text search capability to try to match records in two different DataFrames.

To install fuzzymatcher, I found it easier to conda install the dependencies (pandas, metaphone, fuzzywuzzy) then use pip to install fuzzymatcher. Given the computational burden of these algorithms you will want to use the compiled C components as much as possible and conda made that easiest for me.

If you wish to follow along, this notebook contains a summary of all the code.

After everything is set up, let’s import and get the data into our DataFrames:

import pandas as pd
from pathlib import Path
import fuzzymatcher
hospital_accounts = pd.read_csv('hospital_account_info.csv')
hospital_reimbursement = pd.read_csv('hospital_reimbursement.csv')

Here is the hospital account information:

Hospital account data

Here is the reimbursement information:

Hospital account data

Since the columns have different names, we need to define which columns to match for the left and right DataFrames. In this case, our hospital account information will be the left DataFrame and the reimbursement info will be the right.

left_on = ["Facility Name", "Address", "City", "State"]

right_on = [
    "Provider Name", "Provider Street Address", "Provider City",
    "Provider State"
]

Now we let fuzzymatcher try to figure out the matches using fuzzy_left_join:

matched_results = fuzzymatcher.fuzzy_left_join(hospital_accounts,
                                            hospital_reimbursement,
                                            left_on,
                                            right_on,
                                            left_id_col='Account_Num',
                                            right_id_col='Provider_Num')

Behind the scenes, fuzzymatcher determines the best match for each combination. For this data set we are analyzing over 14 million combinations. On my laptop, this takes about 2 min and 11 seconds to run.

The matched_results DataFrame contains all the data linked together as well as best_match_score which shows the quality of the link.

Here’s a subset of the columns rearranged in a more readable format for the top 5 best matches:

cols = [
    "best_match_score", "Facility Name", "Provider Name", "Address", "Provider Street Address",
    "Provider City", "City", "Provider State", "State"
]

matched_results[cols].sort_values(by=['best_match_score'], ascending=False).head(5)
Matched information

The first item has a match score of 3.09 and certainly looks like a clean match. You can see that the Facility Name and Provider Name for the Mayo Clinic in Red Wing have a slight difference but we were still able to get a good match.

We can check on the opposite end of the spectrum to see where the matches don’t look as good:

matched_results[cols].sort_values(by=['best_match_score'], ascending=True).head(5)

Which shows some poor scores as well as obvious mismatches:

Bad matches

This example highlights that part of the issue is that one set of data includes data from Puerto Rico and the other does not. This discrepancy highlights the need to make sure you really understand your data and what cleaning and filtering you may need to do before trying to match.

We’ve looked at the extreme cases, so let’s take a look at some of the matches that might be a little more challenging by looking at scores of 0.80 or below:

matched_results[cols].query("best_match_score <= .80").sort_values(
    by=['best_match_score'], ascending=False).head(5)
Partial Matches

This example shows how some of the matches get a little more ambiguous. For example, is ADVENTIST HEALTH UKIAH VALLEY the same as UKIAH VALLEY MEDICAL CENTER? Depending on your data set and your needs, you will need to find the right balance of automated and manual match review.
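
One way to think about that balance is to split the scores into three bands: accept automatically, send to a person, or discard. The sketch below is only an illustration. The triage function and both cut-off values are hypothetical, and apart from 3.09 the sample scores are made up, so you would need to tune the thresholds against your own reviewed matches:

```python
def triage(score, accept_at, review_at):
    """Sort a match score into one of three review bands."""
    if score >= accept_at:
        return "accept"
    if score >= review_at:
        return "review"
    return "reject"


# 3.09 is the top score mentioned above; 0.5 and -1.2 are invented samples.
sample_scores = [3.09, 0.5, -1.2]
labels = [triage(score, accept_at=0.80, review_at=0.0) for score in sample_scores]
print(labels)
```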

Overall, fuzzymatcher is a useful tool to have for medium sized data sets. As you start to get to 10,000’s of rows, it will take a lot of time to compute, so plan accordingly. However the ease of use - especially when working with pandas makes it a great first place to start.

Approach 2 - Python Record Linkage Toolkit

The Python Record Linkage Toolkit provides another robust set of tools for linking data records and identifying duplicate records in your data.

The Python Record Linkage Toolkit has several additional capabilities:

  • Ability to define the types of matches for each column based on the column data types
  • Use “blocks” to limit the pool of potential matches
  • Provides ranking of the matches using a scoring algorithm
  • Multiple algorithms for measuring string similarity
  • Supervised and unsupervised learning approaches
  • Multiple data cleaning methods

The trade-off is that it is a little more complicated to wrangle the results in order to do further validation. However, the steps are relatively standard pandas commands so do not let that intimidate you.

For this example, make sure you install the library using pip. We will use the same data set but we will read in the data with an explicit index column. This makes subsequent data joins a little easier to interpret.

import pandas as pd
import recordlinkage

hospital_accounts = pd.read_csv('hospital_account_info.csv', index_col='Account_Num')
hospital_reimbursement = pd.read_csv('hospital_reimbursement.csv', index_col='Provider_Num')

Because the Record Linkage Toolkit has more configuration options, we need to perform a couple of steps to define the linkage rules. The first step is to create an indexer object:

indexer = recordlinkage.Index()
indexer.full()
WARNING:recordlinkage:indexing - performance warning - A full index can result in large number of record pairs.

This WARNING points us to a difference between the record linkage library and fuzzymatcher. With record linkage, we have some flexibility to influence how many pairs are evaluated. By using a full indexer, all potential pairs are evaluated (which we know is over 14M pairs). I will come back to some of the other options in a moment. Let’s continue with the full index and see how it performs.

The next step is to build up all the potential candidates to check:

candidates = indexer.index(hospital_accounts, hospital_reimbursement)
print(len(candidates))
14399283

This quick check just confirmed the total number of comparisons.
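
The number is simply the product of the two row counts quoted earlier (5339 accounts and 2697 reimbursement records), which is why a full index grows so quickly:

```python
n_accounts = 5339
n_reimbursements = 2697

# A full index pairs every account with every reimbursement record.
full_index_pairs = n_accounts * n_reimbursements
print(full_index_pairs)  # 14399283
```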

Now that we have defined the left and right data sets and all the candidates, we can define how we want to perform the comparison logic using Compare()

compare = recordlinkage.Compare()
compare.exact('City', 'Provider City', label='City')
compare.string('Facility Name',
            'Provider Name',
            threshold=0.85,
            label='Hosp_Name')
compare.string('Address',
            'Provider Street Address',
            method='jarowinkler',
            threshold=0.85,
            label='Hosp_Address')
features = compare.compute(candidates, hospital_accounts,
                        hospital_reimbursement)

We can define several options for how we want to compare the columns of data. In this specific example, we look for an exact match on the city. I have also shown some examples of string comparison along with the threshold and algorithm to use for comparison. In addition to these options, you can define your own or use numeric, dates and geographic coordinates. Refer to the documentation for more examples.
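
To make the role of the threshold a little more concrete, here is a rough sketch of turning a similarity score into a 1 or 0. It uses difflib from the standard library as a stand-in similarity measure, which is not what the toolkit uses, and it assumes that a score at or above the threshold counts as a match. The binary_string_feature name is hypothetical:

```python
from difflib import SequenceMatcher


def binary_string_feature(left, right, threshold=0.85):
    """Return 1 when the two strings are similar enough, otherwise 0."""
    similarity = SequenceMatcher(None, left, right).ratio()
    return 1 if similarity >= threshold else 0


same = binary_string_feature("7400 EAST OSBORN ROAD", "7400 EAST OSBORN ROAD")
different = binary_string_feature("ABC", "XYZ")
print(same, different)
```

A lower threshold lets more near-misses through as matches, and a higher one is stricter.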

The final step is to perform all the feature comparisons using compute. In this example, using the full index, this takes 3 min and 41 s.

Let’s go back and look at alternatives to speed this up. One key concept is that we can use blocking to limit the number of comparisons. For instance, we know that it is very likely that we only want to compare hospitals that are in the same state. We can use this knowledge to set up a block on the state columns:

indexer = recordlinkage.Index()
indexer.block(left_on='State', right_on='Provider State')
candidates = indexer.index(hospital_accounts, hospital_reimbursement)
print(len(candidates))
475830

With the block on state, the candidates will be filtered to only include those where the state values are the same. We have filtered down the candidates to only 475,830. If we run the same comparison code, it only takes 7 seconds. A nice speedup!
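
Conceptually, blocking just groups one side by the blocking key and only pairs records that share it. Here is a minimal sketch of that idea in plain Python. The block_pairs helper and the record ids are made up for illustration, and this is not how the toolkit implements it internally:

```python
from collections import defaultdict


def block_pairs(left, right):
    """left and right map a record id to its blocking key (here, a state)."""
    right_by_key = defaultdict(list)
    for right_id, key in right.items():
        right_by_key[key].append(right_id)
    # Only pair records whose blocking keys are identical.
    return [(left_id, right_id)
            for left_id, key in left.items()
            for right_id in right_by_key.get(key, [])]


accounts = {"A1": "AZ", "A2": "WI", "A3": "AZ"}
providers = {"P1": "AZ", "P2": "MN", "P3": "WI"}
pairs = block_pairs(accounts, providers)
print(len(pairs), "of", len(accounts) * len(providers), "possible pairs")
```

In this toy case only 3 of the 9 possible pairs survive, which is the same kind of reduction we saw above at a much larger scale.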

In this data set, the state data is clean but if it were a little messier, we could use another blocking algorithm like SortedNeighborhood to add some flexibility for minor spelling mistakes.

For instance, what if the state names contained “Tenessee” and “Tennessee”? Using blocking would fail but sorted neighborhood would handle this situation more gracefully.

indexer = recordlinkage.Index()
indexer.sortedneighbourhood(left_on='State', right_on='Provider State')
candidates = indexer.index(hospital_accounts, hospital_reimbursement)
print(len(candidates))
998860

In this case, sorted neighbors takes 15.9 seconds on 998,860 candidates which seems like a reasonable trade-off.
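
The extra candidates come from the way sorted neighborhood works: keys are sorted and records are paired when their keys sit close together in that order, not only when they are identical. The sketch below is a simplified illustration of that idea. The function, the window of 3 and the record ids are assumptions, and the toolkit's own implementation may differ in the details:

```python
def sorted_neighbourhood_pairs(left, right, window=3):
    """Pair records whose keys are neighbours in the sorted list of unique keys.

    left and right map a record id to its key. With window=3 a key is paired
    with itself and with the keys directly before and after it.
    """
    keys = sorted(set(left.values()) | set(right.values()))
    position = {key: index for index, key in enumerate(keys)}
    reach = window // 2
    return [(left_id, right_id)
            for left_id, left_key in left.items()
            for right_id, right_key in right.items()
            if abs(position[left_key] - position[right_key]) <= reach]


accounts = {"A1": "Tennessee", "A2": "Texas"}
providers = {"P1": "Tenessee", "P2": "Utah"}
pairs = sorted_neighbourhood_pairs(accounts, providers)
print(pairs)
```

The misspelled Tenessee still gets paired with Tennessee, which exact blocking would miss. The price is a few unrelated neighbors such as Texas and Utah, which the comparison step then has to weed out.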

Regardless of which option you use, the result is a features DataFrame that looks like this:

Feature feature_matrix

This DataFrame shows the results of all of the comparisons. There is one row for each candidate pair of account and reimbursement records. The columns correspond to the comparisons we defined. A 1 is a match and 0 is not.

Given the large number of records with no matches, it is a little hard to see how many matches we might have. We can sum up the individual scores to see about the quality of the matches.

features.sum(axis=1).value_counts().sort_index(ascending=False)
3.0      2285
2.0       451
1.0      7937
0.0    988187
dtype: int64

Now we know that there are 988,187 rows with no matching values whatsoever. 7937 rows have exactly one match, 451 have 2 and 2285 have 3 matches.
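
If the sum-then-count step feels a little opaque, here is the same idea on a handful of made-up feature rows using only the standard library. Each row is one candidate pair and each value is the 1 or 0 from one comparison:

```python
from collections import Counter

# Invented feature rows, using the same labels as the comparisons above.
feature_rows = [
    {"City": 1, "Hosp_Name": 1, "Hosp_Address": 1},
    {"City": 1, "Hosp_Name": 0, "Hosp_Address": 1},
    {"City": 1, "Hosp_Name": 0, "Hosp_Address": 0},
    {"City": 0, "Hosp_Name": 0, "Hosp_Address": 0},
    {"City": 0, "Hosp_Name": 0, "Hosp_Address": 0},
]

# Add up the matches per pair, then count how often each total occurs.
score_counts = Counter(sum(row.values()) for row in feature_rows)
for score in sorted(score_counts, reverse=True):
    print(score, score_counts[score])
```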

To make the rest of the analysis easier, let’s get all the records with 2 or 3 matches and add a total score:

potential_matches = features[features.sum(axis=1) > 1].reset_index()
potential_matches['Score'] = potential_matches.loc[:, 'City':'Hosp_Address'].sum(axis=1)
Match scoring

Here is how to interpret the table. For the first row, Account_Num 26270 and Provider_Num 868740 match on city, hospital name and hospital address.

Let’s look at these two and see how close they are:

hospital_accounts.loc[26270,:]
Facility Name         SCOTTSDALE OSBORN MEDICAL CENTER
Address                          7400 EAST OSBORN ROAD
City                                        SCOTTSDALE
State                                               AZ
ZIP Code                                         85251
County Name                                   MARICOPA
Phone Number                            (480) 882-4004
Hospital Type                     Acute Care Hospitals
Hospital Ownership                         Proprietary
Name: 26270, dtype: object
hospital_reimbursement.loc[868740,:]
Provider Name                SCOTTSDALE OSBORN MEDICAL CENTER
Provider Street Address                 7400 EAST OSBORN ROAD
Provider City                                      SCOTTSDALE
Provider State                                             AZ
Provider Zip Code                                       85251
Total Discharges                                           62
Average Covered Charges                               39572.2
Average Total Payments                                6551.47
Average Medicare Payments                             5451.89
Name: 868740, dtype: object

Yep. Those look like good matches.

Now that we know the matches, we need to wrangle the data to make it easier to review all the data together. I am going to make a concatenated name and address lookup for each of these source DataFrames.

hospital_accounts['Acct_Name_Lookup'] = hospital_accounts[[
    'Facility Name', 'Address', 'City', 'State'
]].apply(lambda x: '_'.join(x), axis=1)

hospital_reimbursement['Reimbursement_Name_Lookup'] = hospital_reimbursement[[
    'Provider Name', 'Provider Street Address', 'Provider City',
    'Provider State'
]].apply(lambda x: '_'.join(x), axis=1)

account_lookup = hospital_accounts[['Acct_Name_Lookup']].reset_index()
reimbursement_lookup = hospital_reimbursement[['Reimbursement_Name_Lookup']].reset_index()

Now merge in with the account data:

account_merge = potential_matches.merge(account_lookup, how='left')
Account merge

Finally, merge in the reimbursement data:

final_merge = account_merge.merge(reimbursement_lookup, how='left')

Let’s see what the final data looks like:

cols = ['Account_Num', 'Provider_Num', 'Score',
        'Acct_Name_Lookup', 'Reimbursement_Name_Lookup']
final_merge[cols].sort_values(by=['Account_Num', 'Score'], ascending=False)
Final account lookup

One of the differences between the toolkit approach and fuzzymatcher is that we are including multiple matches. For instance, account number 32725 could match two providers:

final_merge[final_merge['Account_Num']==32725][cols]
Account num 32725 matches

In this case, someone will need to investigate and figure out which match is the best. Fortunately it is easy to save all the data to Excel and do more analysis:

final_merge.sort_values(by=['Account_Num', 'Score'],
                    ascending=False).to_excel('merge_list.xlsx',
                                              index=False)
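
Before handing the list over, it can help to flag which accounts actually need a human decision. Here is a small sketch of that idea in plain Python. The best_candidates helper is hypothetical and the two provider numbers for account 32725 are placeholders, not the real values from the data:

```python
from collections import defaultdict


def best_candidates(matches):
    """matches is an iterable of (account_num, provider_num, score) tuples.

    Returns a dict mapping each account to the providers tied for its top score.
    """
    by_account = defaultdict(list)
    for account_num, provider_num, score in matches:
        by_account[account_num].append((score, provider_num))
    best = {}
    for account_num, scored in by_account.items():
        top_score = max(score for score, _ in scored)
        best[account_num] = sorted(
            provider for score, provider in scored if score == top_score)
    return best


matches = [
    (32725, 111, 2.0),     # placeholder provider number
    (32725, 222, 2.0),     # placeholder provider number
    (26270, 868740, 3.0),  # the clean match shown earlier
]
best = best_candidates(matches)
needs_review = [account for account, providers in best.items() if len(providers) > 1]
print(needs_review)
```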

As you can see from this example, the Record Linkage Toolkit allows a lot more flexibility and customization than fuzzymatcher. The downside is that there is a little more manipulation to get the data stitched back together in order to hand the data over to a person to complete the comparison.

Deduplicating data with Record Linkage Toolkit

Yo Dawg

One of the additional uses of the Record Linkage Toolkit is for finding duplicate records in a data set. The process is very similar to matching except you match a single DataFrame against itself.

Let’s walk through an example using a similar data set:

hospital_dupes = pd.read_csv('hospital_account_dupes.csv', index_col='Account_Num')

Then create our indexer with a sorted neighbor block on State.

dupe_indexer = recordlinkage.Index()
dupe_indexer.sortedneighbourhood(left_on='State')
dupe_candidate_links = dupe_indexer.index(hospital_dupes)

We should check for duplicates based on city, phone number, name and address:

compare_dupes = recordlinkage.Compare()
compare_dupes.string('City', 'City', threshold=0.85, label='City')
compare_dupes.string('Phone Number',
                    'Phone Number',
                    threshold=0.85,
                    label='Phone_Num')
compare_dupes.string('Facility Name',
                    'Facility Name',
                    threshold=0.80,
                    label='Hosp_Name')
compare_dupes.string('Address',
                    'Address',
                    threshold=0.85,
                    label='Hosp_Address')
dupe_features = compare_dupes.compute(dupe_candidate_links, hospital_dupes)

Because we are only comparing with a single DataFrame, the resulting DataFrame has an Account_Num_1 and Account_Num_2:

Dupe Detect
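
The two account number columns exist because each candidate is a pair of rows from the same table. As a rough illustration of what matching a data set against itself means, here is how the unordered pairs could be generated without any blocking. The three account numbers are just sample ids, and the article's indexer uses sorted neighborhood to narrow this list down:

```python
from itertools import combinations

account_nums = [51567, 41166, 26270]  # sample ids for illustration

# Every record is paired with every other record exactly once,
# and never with itself.
self_pairs = list(combinations(account_nums, 2))
print(self_pairs)
print(len(self_pairs))  # n * (n - 1) / 2 pairs for n records
```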

Here is how we score:

dupe_features.sum(axis=1).value_counts().sort_index(ascending=False)
3.0         7
2.0       206
1.0      7859
0.0    973205
dtype: int64

Add the score column:

potential_dupes = dupe_features[dupe_features.sum(axis=1) > 1].reset_index()
potential_dupes['Score'] = potential_dupes.loc[:, 'City':'Hosp_Address'].sum(axis=1)

Here’s a sample:

High likelihood of dupes

These 9 records have a high likelihood of being duplicated. Let’s look at an example to see if they might be dupes:

hospital_dupes.loc[51567, :]
Facility Name                SAINT VINCENT HOSPITAL
Address                      835 SOUTH VAN BUREN ST
City                                      GREEN BAY
State                                            WI
ZIP Code                                      54301
County Name                                   BROWN
Phone Number                         (920) 433-0112
Hospital Type                  Acute Care Hospitals
Hospital Ownership    Voluntary non-profit - Church
Name: 51567, dtype: object
hospital_dupes.loc[41166, :]
Facility Name                   ST VINCENT HOSPITAL
Address                          835 S VAN BUREN ST
City                                      GREEN BAY
State                                            WI
ZIP Code                                      54301
County Name                                   BROWN
Phone Number                         (920) 433-0111
Hospital Type                  Acute Care Hospitals
Hospital Ownership    Voluntary non-profit - Church
Name: 41166, dtype: object

Yes. That looks like a potential duplicate. The name and address are similar and the phone number is off by one digit. How many hospitals do they really need to treat all those Packer fans? :)

As you can see, this method can be a powerful and relatively easy tool to inspect your data and check for duplicate records.

Advanced Usage

In addition to the matching approaches shown here, the Record Linkage Toolkit contains several machine learning approaches to matching records. I encourage interested readers to review the documentation for examples.

One of the pretty handy capabilities is that there is a browser based tool that you can use to generate record pairs for the machine learning algorithms.

Both tools include some capability for pre-processing the data to make the matching more reliable. Here is the preprocessing content in the Record Linkage Toolkit. This example data was pretty clean so you will likely need to explore some of these capabilities for your own data.

Summary

Linking different record sets on text fields like names and addresses is a common but challenging data problem. The python ecosystem contains two useful libraries that can take data sets and use multiple algorithms to try to match them together.

Fuzzymatcher uses sqlite’s full text search to simply match two pandas DataFrames together using probabilistic record linkage. If you have a larger data set or need to use more complex matching logic, then the Python Record Linkage Toolkit is a very powerful set of tools for joining data and removing duplicates.

Part of my motivation for writing this long article is that there are lots of commercial options out there for these problems and I wanted to raise awareness about these python options. Before you engage with an expensive consultant or try to pay for a solution, you should spend an afternoon with these two options and see if they help you out. All of the relevant code examples to get you started are in this notebook.

I always like to hear if you find these topics useful and applicable to your own needs. Feel free to comment below and let me know if you use these or any other similar tools.

credits: Title image - Un compositeur à sa casse

February 18, 2020 01:12 PM UTC


Python Insider

Python 3.8.2rc2 is now available for testing

Python 3.8.2rc2 is the second release candidate of the second maintenance release of Python 3.8. Go get it here:

https://www.python.org/downloads/release/python-382rc2/


Why a second release candidate?

The major reason for RC2 is that GH-16839 has been reverted.

The original change was supposed to fix some edge cases in urlparse (numeric paths, recognizing netlocs without //; details in BPO-27657). Unfortunately it broke third parties relying on the pre-existing undefined behavior.

Sadly, the reverted fix has already been released as part of 3.8.1 (and 3.7.6 where it’s also reverted now). As such, even though the revert is itself a bug fix, it is incompatible with the behavior of 3.8.1.

Please test.

Timeline

Assuming no critical problems are found prior to 2020-02-24, the currently scheduled release date for 3.8.2 (as well as 3.9.0 alpha 4!), no code changes are planned between this release candidate and the final release.

That being said, please keep in mind that this is a pre-release of 3.8.2 and as such its main purpose is testing.

Maintenance releases for the 3.8 series will continue at regular bi-monthly intervals, with 3.8.3 planned for April 2020 (during sprints at PyCon US).

What’s new?

The Python 3.8 series is the newest feature release of the Python language, and it contains many new features and optimizations. See the “What’s New in Python 3.8” document for more information about features included in the 3.8 series.

Detailed information about all changes made in version 3.8.2 specifically can be found in its change log.

We hope you enjoy Python 3.8!

Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organization contributions to the Python Software Foundation.

https://www.python.org/psf/

February 18, 2020 05:13 AM UTC


Anwesha Das

The scary digital world

The horror

Some years ago, my husband and I were looking for houses to rent. We were in different cities and were discussing it over the telephone. We had three or four phone calls about this. After that, I opened my laptop and turned on my then browser, Google. Advertisements started popping up, showing ads for houses for rent in the very same location and within the same budget I was looking for. A chill went down my spine. How did this particular website know that we were looking for a house?

The internet

The internet was designed to give a home to the mind. It is the place for genuine independence and liberty. To create a new global social space exclusive of any authority, Government, sovereignty, and the “weary giants of flesh and steel,” the industrial area. Anonymity was there in the very ethos of the internet. It offered users the opportunity not to be discriminated against on the basis of religious, economic, and/or social background. It gave people a platform to be themselves. And the Right to Privacy lies at the very core of people's self-being. It was our chance to leave the world's nastiness and selfishness behind, and be an open and equal world. It was our chance to be better.

The last decade has seen a surge in the usage of the internet. The growth can most prominently be observed in the area of social media. Smartphones, smartwatches, and every other smart device have added to that. There is a substantial number of people for whom using the internet is the same as using Facebook. There is a parallel universe built around Facebook and WhatsApp. And every day, each second, it is growing. According to a survey by brandwatch.com, Facebook adds 500,000 new users every day, six new profiles every second.

This mammoth growth of social media and our dependency on the internet has blurred the line of individual privacy. What was once considered private is now in the public domain. Be it our first date, our breakups, dinner plans, or childbirth, the list goes on. This does not end here. Our behavior is also under watch.

Different types of Online activities

We give reactions to different situations and people, and we promote, support, or reject issues on social media. We have conversations about what food we like to eat, where we want to go shopping, or when we are getting married. This is the information-sharing aspect of the internet. The other principal usage of the internet is gaining knowledge. We browse to ask random things, like: What is the total area of the earth? What is global warming? What is the best brand of lingerie suitable for thin women?

The first set of information (the activity on social media) we share with our full consent and knowledge. The second set we presume to be private, between us, the browser, and the website we are getting information from. But in reality, it is not.

Tracking and its kinds

First party tracking

There are certain rights we waive and information we give to avail ourselves of a service over the internet, like the name and contact details for Facebook or Instagram. Signing up for Instagram means voluntarily agreeing to their “Terms, Data Policy, and Cookies Policy.” It is precisely like an agreement in the real world; the primary distinction between the two lies in our approach. In the case of a real-world agreement, we make sure we read it. But we rarely care about understanding what we are signing up for when signing something in the digital world. Here all we care about is the service and nothing else. Signing up for the service then means letting the service provider (in this case, Instagram) insert cookies in our browser and get data from us. Also, a service like Facebook knows who our friends are, what we “like”, and how much.

Similarly, Amazon and Flipkart know what we want to buy and when we are buying. This is called First Party Tracking, which we are fully aware of and have agreed to.

Third-party tracking

There is another kind of tracking. It happens behind our backs, without our knowledge and consent: Third Party Tracking. These third-party trackers are there in almost all mobile apps and web pages, everywhere we go online. Any regular mobile app collects and shares our private data, as sensitive as call records and location data, with at least dozens of third-party companies. By third-party company, we mean a company other than the service provider (in this case, the company making the mobile application). An average web page does the same thing to the user. The physical world is also not spared by these trackers. Whenever we connect to the WiFi network in some coffee shop, hotel, or restaurant, the service provider (the coffee shop) can monitor our activity online. Also, they (coffee shop, hotel) use Bluetooth and WiFi beacons for passive monitoring of people in that locality.

Who performs this third-party tracking?

The data brokers, advertisers, and tech companies are the ones who are tracking us behind our backs. A research paper published by EFF describes the situation aptly: “Corporations have built a hall of one-way mirrors: from the inside, you can see only apps, web pages, ads, and yourself reflected by social media. But in the shadows behind the glass, trackers quietly take notes on nearly everything you do. These trackers are not omniscient, but they are widespread and indiscriminate.” To learn about the deep-down technical part of third-party tracking, go through the paper published by EFF. The data the trackers collect may be benign on its own, but together with public information, it tends to reveal a lot, like whether someone is political or not, ambitious or not, likes to play it safe or is prone to taking risks. We, our lives, are being sold, and we are nothing but an accumulation of data for them.

Therefore with us feeding information and the third party tracking together subjects us to constant surveillance. Our profiles are being created based on these data. This profile makes

There is an invisible but unavoidable panopticon around us — a nearly unbreakable chain.

Why would someone want to track me? I have nothing to hide.

This is the general response we get when we start a discussion about privacy. Glenn Greenwald has a great reply to it: ‘if you have nothing to hide, please write down all your email ids, not just the work ones, the respectable ones, but all of them, along with the passwords, and send them to me.’ Though people say they have nothing to hide, no one has ever got back to him :)

Everyone needs privacy. We flourish and can be true to ourselves when we are free of the fear and knowledge of being watched. Everyone cares about privacy. If they did not, there would be no passwords on their accounts, no lockers, no keys.

The evolution of physical self to digital self and protecting that

Now that we have entered the digital world, we are being measured in bits of data. Our data is an extension of our physical being. This information forms a core part of a person. We are familiar with the norms of the physical world. But the digital world is still a maze, where we are trying to find a way to safety, success, and survival. With this blog post, I am starting a new series on protecting ourselves in the digital world. This series is meant for beginners and newbies. In the coming posts, I will be dealing with

Till we meet the next time, stay safe.

February 18, 2020 01:51 AM UTC


Wing Tips

Using "python -m" in Wing 7.2

Wing version 7.2 has been released, and the next couple of Wing Tips look at some of its new features. We've already looked at reformatting with Black and YAPF and Wing 7.2's expanded support for virtualenv.

Now let's look at how to set up debugging for modules that need to be launched with python -m. This command line option tells Python to search the Python Path for the named module or package, and then load and execute it. This capability was introduced way back in Python 2.4, and then extended in Python 2.5 through PEP 338. However, it only came into widespread use relatively recently, for example to launch venv, black, or other command line tools that are shipped as Python packages.

Launching Modules

To configure Wing to launch a module by name with python -m, create a Named Entry Point from the Debug menu, select Named Module, and enter the module or package name and any run arguments:

/images/blog/python-m/named-entry-point-module.png

The above is equivalent to this command line:

python -m mymodule one two

The named entry point can be set as the main entry point for your project under the Debug/Execute tab of Project Properties, from the Project menu:

/images/blog/python-m/main-entry-point.png

Or it can be launched from the Debug > Debug Named Entry Point menu or by assigning a key binding to it in the named entry point manager dialog.

Launching Packages

Packages can also be launched in this way, if they include a file named __main__.py to define the package's main entry point:

/images/blog/python-m/named-entry-point.png
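
To illustrate what launching a package by name involves, here is a minimal sketch that runs outside of Wing. It builds a throwaway package (the name demo_pkg and its contents are made up for this example) and runs it through the standard library's runpy.run_module, which uses the same lookup mechanism as python -m. It is not how Wing itself launches the package, only a way to see the role of __main__.py:

```python
import os
import runpy
import sys
import tempfile

# Build a throwaway package on disk. "demo_pkg" is a hypothetical name
# used only for this illustration.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "__main__.py"), "w") as f:
    # __main__.py is what gets executed when the package is run by name.
    f.write("import sys\nARGS = sys.argv[1:]\nprint('demo_pkg got', ARGS)\n")

# The package's parent directory must be on the path for the lookup to work.
sys.path.insert(0, root)

# Roughly what "python -m demo_pkg one two" does: locate the package,
# then execute its __main__ submodule as the main script.
saved_argv = sys.argv
sys.argv = ["demo_pkg", "one", "two"]
try:
    result = runpy.run_module("demo_pkg", run_name="__main__", alter_sys=True)
finally:
    sys.argv = saved_argv
```

If __main__.py is removed from the package, the same call fails with an ImportError, which mirrors what happens when python -m is pointed at a package that has no main entry point.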

Setting Python Path

Whether launching a module or package, the name has to be found on the Python Path that you've configured for your project. If Wing fails to find the module, add its parent directory to Python Path under the Environment tab in Project Properties:

/images/blog/python-m/python-path.png
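
As a rough way to check outside of Wing whether a name can be found on a given path, the sketch below uses importlib.util.find_spec from the standard library. The helper can_find and the module name mymodule_demo are invented for this example; the point is only that the lookup starts to succeed once the module's parent directory is on the path:

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def can_find(name):
    """Return True if `name` can be located on the current sys.path."""
    importlib.invalidate_caches()
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package cannot be found.
        return False


# Create a throwaway module in a temporary directory (hypothetical name).
parent_dir = tempfile.mkdtemp()
with open(os.path.join(parent_dir, "mymodule_demo.py"), "w") as f:
    f.write("print('hello from mymodule_demo')\n")

found_before = can_find("mymodule_demo")  # parent directory not on the path yet
sys.path.insert(0, parent_dir)            # the effect of adding it to Python Path
found_after = can_find("mymodule_demo")
print(found_before, found_after)
```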

That's it for now! We'll be back soon with more Wing Tips for Wing Python IDE.

As always, please don't hesitate to email support@wingware.com if you run into problems or have any questions.

February 18, 2020 01:00 AM UTC


Michał Bultrowicz

Universal app reload with entr

A useful feature many web frameworks have is auto-reload. Your app is running in the background, you change the code, and the app is restarted with those changes, so you can try them out immediately. What if you wanted that behavior for everything that you’re writing? And without any coding to implement it over and over in every little project?

February 18, 2020 12:00 AM UTC

February 17, 2020


PyBites

Productivity Mondays - 5 tips that will boost your performance

The following things are relatively easy to do, but also easy not to do. Do them consistently and they can change your career and life.

1. Follow up

How many interactions die after the first meeting? Not if you follow up.

It shows you're interested (to be interesting, be interested - Carnegie), it keeps the momentum going, and it creates ongoing opportunities.

Keep nurturing your network, you never know where the next opportunity will come from.

2. Audit your time

How often do you feel exhausted at the end of the day, asking "where did my time go?" As with money and calories, what gets measured gets managed (Drucker).

Be in control of your time, or somebody else inevitably will!

3. Control your mood

Willpower and positive energy are finite. Start your day early using an empowering ritual.

For me that is steps and listening to an inspiring podcast or audiobook.

It sets the tone and confidence of the day. Improving our morning routine has been a game changer for us this year.

4. Use the right tool

Mobiles are great but highly interruptive. Avoid email (social) first thing in the morning.

Flight mode is not only for airplanes, it can be your new best friend in the early morning, especially when you have to eat that ugly frog :)

Talking about the right tool for the job: calls can be great. Sometimes email is just not the right tool. It starts clean but people get cc'd and some people go off on a tangent resulting in long, unfocused email chains.

Break the pattern: host a 15 min call with a clear agenda and come out with follow up actions. Win/win: not only will you save a lot of time, people will see you as a leader.

5. Just ask

There are no dumb questions! (Unless you did not do your homework of course.)

There is nothing more frustrating than being stuck on a problem for hours while somebody else can guide you in the right direction in minutes.

Don't let self imposed ceilings hold you back from asking for help.

You are not bothering the other person, you actually give him/her an opportunity to feel great by helping you!

Another reason to be assertive is to stay focused on your longer term goal. If you don't speak up, your manager/team/audience does not know where you can be(come) more valuable, and you risk getting stuck in a rut.


I hope this gives you some healthy inspiration to start off your week.

Now go crush it this week and comment below which of these tips boosted your productivity or motivation, or moved you closer to your goal. See you next week.

-- Bob

With so many avenues to pursue in Python it can be tough to know what to do. If you're looking for some direction or want to take your Python code and career to the next level, schedule a call with us now. We can help you!

February 17, 2020 10:00 PM UTC


Roberto Alsina

Learning Serverless in GCP

Usually, when I want to learn how to use a tool, the thing that works best for me is to try to build something using it. Watching someone build something instead is the second best thing.

So, join me while I build a little thing using "serverless" Google Cloud Platform, Python and some other bits and pieces.


Caveat: this was originally a twitter thread, so there will be typos and things. Sorry! Also it's possible that it will look better here in threaderapp



February 17, 2020 07:20 PM UTC


Mike Driscoll

Python 101 2nd Edition Kickstarter is Live!

I am excited to announce that my newest book, Python 101, 2nd Edition is launching on Kickstarter today!

Python 101 2nd Ed Kickstarter (click the photo to jump to Kickstarter)

Python 101 holds a special place in my heart as it was the very first book I ever wrote. Frankly, I don’t think I would have even written a book if it weren’t for the readers of this blog who encouraged me to do so.

The new edition of Python 101 will be an entirely new, rewritten from scratch, book. While I will be covering most of the same things as in the original, I have reorganized the book a lot and I am adding all new content. I have also removed old content that is no longer relevant.

I hope you will join me by backing the book and giving me feedback as I write it so that I can put together a really great learning resource for you!

The post Python 101 2nd Edition Kickstarter is Live! appeared first on The Mouse Vs. The Python.

February 17, 2020 02:01 PM UTC