
Planet Python

Last update: July 08, 2020 04:46 AM UTC

July 07, 2020


Quansight Labs Blog

Writing docs is not just writing docs

I joined the Spyder team almost two years ago, and I never thought I was going to end up working on docs. Six months ago I started a project with CAM Gerlach and Carlos Cordoba to improve Spyder’s documentation. At first, I didn’t actually understand how important docs are for software, especially for open source projects. However, during all this time I’ve learned how documentation has a huge impact on the open-source community and I’ve been thankful to have been able to do this. But, from the beginning, I asked myself “why am I the ‘right person’ for this?”

Read more… (3 min remaining to read)

July 07, 2020 10:00 PM UTC


PyCoder’s Weekly

Issue #428 (July 7, 2020)

#428 – JULY 7, 2020
View in Browser »

The PyCoder’s Weekly Logo


Announcing Pylance: Fast, Feature-Rich Language Support for Python in Visual Studio Code

Pylance is a new Python language server for VS Code based on Microsoft’s Pyright static type checking tool. With Pylance, you get type information in function signatures and when hovering on symbols, auto import suggestions, type checking diagnostics, and so much more!
SAVANNAH OSTROWSKI

Python Async Frameworks: Beyond Developer Tribalism

In light of some recent and, at times, heated discussions regarding asynchronous programming in Python, Django Rest Framework’s creator Tom Christie calls on the community to embrace a more collaborative spirit.
TOM CHRISTIE

“Learn Python Programming” Humble Bundle


Everything you need to learn Python programming and make it stick. Support Pythonic charities like the PSF and get Python books, software, and video courses collectively valued at $1,400 for a pay-what-you-want price →
HUMBLEBUNDLE.COM sponsor

A Contrarian View on Closing Files

You may have heard that the “right” way to open a file in Python is to use the open() function inside a with statement. But is that always the right choice?
ARIC COADY opinion

Flask Project Setup: TDD, Docker, Postgres and More

Learn one way to set up a Flask project, including how to handle project requirements, configuration and environment variables, write and run tests, and containerize the application with Docker. When you’re done reading part one at the link above, check out part two.
LEONARDO GIORDANI

Get Started With Django Part 2: Django User Management

In this step-by-step tutorial, you’ll learn how to extend your Django application with a user management system, complete with email sending and third-party authentication.
REAL PYTHON

Deploying and Hosting a Machine Learning Model With FastAPI and Heroku

Getting machine learning models into production is an often-overlooked topic. Learn how to serve a model in just a handful of lines of Python using FastAPI and Heroku.
MICHAEL HERMAN • Shared by Michael Herman

2020 Python Software Foundation Board of Directors Election Retrospective and Next Steps

ERNEST W. DURBIN III

Python 3.9.0b4 Is Now Ready for Testing

CPYTHON DEV BLOG

Discussions

What Is the Core of the Python Programming Language?

Last week we featured Brett Cannon’s article with the same title. Well, the post has generated quite a discussion on Hacker News.
HACKER NEWS

Why Do NaN Values Make min() and max() Sensitive to Order?

How NaNs compare to numerical values and the implications of that in min() and max() might be surprising.
STACK OVERFLOW
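
For a quick sense of why this is surprising: min() and max() keep whichever value “wins” each comparison, and every comparison involving a NaN is False, so the result depends on where the NaN sits. A small illustrative snippet (not from the linked thread):

>>> nan = float("nan")
>>> min([nan, 1, 2])
nan
>>> min([1, 2, nan])
1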

Why Do People Use .format() When f-Strings Exist?

f-Strings aren’t exactly a drop-in replacement for .format().
REDDIT

Python Jobs

Python Tutorial Authors Wanted (Remote)

Real Python

Senior Software Engineer - Python (Remote)

RampUp, Inc

Software Engineer (Java, Python) (San Diego, CA, USA)

Professional Search Group (PSG)

Splunk with Python (Philadelphia, PA, USA)

Vastika Inc.

More Python Jobs >>>

Articles & Tutorials

A Labyrinth of Lies

What should you do after watching 1986’s puppet-laden musical fantasy Labyrinth? Code up the guard scene in Python, of course! After you read Moshe’s solution to the infamous “Two Door Riddle” at the link above, check out Glyph Lefkowitz’s “professionalized” version of the code.
MOSHE ZADKA

Massive Memory Overhead: Numbers in Python and How NumPy Helps

In Python, everything is an object. Even numbers. While this has advantages, objects have a memory overhead that might be unexpected. While this overhead is often negligible, it might be the difference between 8GB and 35GB in extreme cases.
ITAMAR TURNER-TRAURING

[Career Track] Data Scientist With Python


Are you just learning Python, super experienced, or somewhere in between? Get hands-on experience with some of the most popular Python libraries and work with real-world datasets to learn statistics, machine learning techniques, and more →
DATACAMP INC sponsor

Ten Reasons to Use StaticFrame Instead of Pandas

For those coming from pandas, StaticFrame offers a more consistent interface and reduces opportunities for error. This article demonstrates ten reasons you might use StaticFrame instead of Pandas.
CHRISTOPHER ARIZA • Shared by Christopher Ariza

How to Use the Python Filter Function

Python’s built-in filter() function can be used to create a new iterator from an existing iterable with certain elements removed based on some criterion.
KATHRYN HANCOX

Profile, Understand & Optimize Python Code Performance

You can’t improve what you can’t measure. Profile and understand Python code’s behavior and performance (Wall-time, I/O, CPU, HTTP requests, SQL queries). Get up and running in minutes. Browse through appealing graphs.
BLACKFIRE sponsor

Thinking in Pandas: Python Data Analysis the Right Way

Are you using the Python library Pandas the right way? Do you wonder about getting better performance, or how to optimize your data for analysis? What does normalization mean? This week Hannah Stepanek joins the podcast to discuss her new book “Thinking in Pandas”.
REAL PYTHON podcast

Object-Oriented Programming (OOP) in Python 3

In this freshly updated OOP tutorial, you’ll learn all about object-oriented programming in Python. You’ll learn the basics of the OOP paradigm and cover concepts like classes and inheritance.
REAL PYTHON

Tutorial: Add a Column to a Pandas DataFrame Based on an If-Else Condition

If you’re new to pandas, you might be tempted to add a column to a DataFrame based on a condition using an if statement. But there’s a better way!
CHARLIE CUSTER • Shared by Charlie Custer

Darts: Time Series Made Easy in Python

Darts is a new library from Unit8 that offers a single package for end-to-end machine learning on time series.
JULIEN HERZEN

Projects & Code

EasyOCR: Ready-To-Use OCR With 40+ Languages Supported Including Chinese, Japanese, Korean and Thai

GITHUB.COM/JAIDEDAI

python-keyboard: A Hand-Wired USB & BLE Keyboard Powered by Python

GITHUB.COM/MAKERDIARY

static-frame: Immutable Data Structures for One- And Two-Dimensional Calculations With Self-Aligning, Labelled Axes

GITHUB.COM/INVESTMENTSYSTEMS

isort: A Python Utility for Sorting Imports

GITHUB.COM/TIMOTHYCROSLEY

darts: A Python Library for Easy Manipulation and Forecasting of Time Series

GITHUB.COM/UNIT8CO

pygooglenews: If Google News Had a Python Library

GITHUB.COM/KOTARTEMIY

guietta: A Tool for Making Simple Python GUIs

GITHUB.COM/ALFIOPUGLISI

texthero: Text Preprocessing, Representation and Visualization From Zero to Hero

GITHUB.COM/JBESOMI

ether-automaton: Pretty Pixel Animations via the Game of Life

GITHUB.COM/ETHER-AUTOMATON • Shared by anfederico

strongtyping: Runtime Type Checking Decorator for Your Python Functions

GITHUB.COM/FELIXTHEC

django-pgtrigger: Postgres Triggers Integrated With Django Models

GITHUB.COM/JYVEAPP • Shared by Wes Kendall

Events

SciPy 2020 (Virtual Conference)

July 6 to July 13, 2020
SCIPY.ORG

PyMNTos (Virtual Meetup)

July 9, 2020
PYTHON.MN

Python Atlanta (Virtual Meetup)

July 9, 2020
MEETUP.COM


Happy Pythoning!
This was PyCoder’s Weekly Issue #428.
View in Browser »


[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

July 07, 2020 07:30 PM UTC


PSF GSoC students blogs

Week 6 Blog

Hello everyone!
So it’s the beginning of the sixth week. The first evaluation results are out and, fortunately, I made it through. :D
This week I implemented the query functions present in the DetourNavMeshQuery class. There are primarily two query functions: findPath() and findStraightPath(). Both have their own uses. The implementation initially looked easy, but got really complex once I started coding. There were many other function calls I had to make in order to implement these two functions, while also taking care of the variables involved in the process.

Both functions have their use cases. findPath() returns the polygons present in the path corridor; it is then up to us how we connect those polygons to complete the path. findStraightPath() directly returns an array of vertices. Here I present an example for each pathfinding query:

 

The green line shows the path in both of the above images. The first image uses the findPath() query, while the second one uses the findStraightPath() query.
On the basis of these two images you might conclude that findStraightPath() is obviously the better of the two, but trust me, you won't always want to use findStraightPath(). You will know when you use it. :P

Apart from this, I spent time dividing the library into two separate libraries: navmeshgen and navigation. The navmeshgen library provides tools to generate the polygon mesh, the detail polygon mesh and the navmesh. The navigation library provides tools to further process the navmesh and make queries over it. Recast and NavMeshBuilder are included in navmeshgen, while Detour, NavMesh, NavMeshNode and NavMeshQuery are included in navigation.

This week I plan to make BAM serialization possible and work on the reviews left by my mentor @moguri on my PR on GitHub.

Will return next week with more exciting stuff. Till then, stay safe!

 

 

July 07, 2020 07:01 PM UTC

Weekly Check-in #6


What did I do this week?

Completed the chatbot example and added documentation for the same.

 


What's next?

I'll start working on the second phase of the project, implementing a distributed orchestrator.


Did I get stuck somewhere?

Yes, I had some doubts regarding the implementation of the distributed orchestrator. I had a meeting with my mentor today, and we have decided on an initial path.

July 07, 2020 04:40 PM UTC


Paolo Amoroso

Repl.it Redesigned the Mobile Experience

The cloud IDE Repl.it was redesigned to improve the user experience on mobile devices.

On smartphones, the focused REPL pane now takes up most of the screen. The redesign takes advantage of native mobile design patterns and lets you switch to a different pane from the bottom navigation bar. There are panes for the code editor, the console, and the output.

A Python REPL in Repl.it on my Pixel 2 XL phone.

Tapping the code in the editor brings up a contextual menu with some of the options of the desktop version. You can select, search, or paste text, or open the full command palette.

On my Pixel 2 XL phone in Chrome, lines with up to 42 characters fit in the editor’s width. The editor wraps longer lines. But most of the code usually keeps the original indentation and its structure is still clear at a glance. The console pane wraps text, too, so no horizontal scrolling is required.

You can get an idea of what Repl.it looks like on mobile by opening the browser on your device and visiting a Python REPL I set up for testing the mobile interface.  It’s an instance of Repl.it’s Multi-Page-Flask-Template, a Flask app that generates pages based on the slug entered as input.

Repl.it is a multi-language development environment in the cloud. It supports dozens of programming languages and frameworks. It’s my favorite IDE as it works fully in the cloud. This is a killer feature for a Chrome OS enthusiast like me.

July 07, 2020 03:36 PM UTC


PSF GSoC students blogs

Week 5 Check-in

What did you do this week?

I continued the PR started in the previous week by adding more multimethods for array manipulation. The following multimethods were added:

Tiling arrays

Adding and removing elements

Rearranging elements

Most of the work was adding default implementations for the above multimethods, which relied heavily on array slicing. Contrary to what I mentioned in my last blog post, not much time was dedicated to writing overriden_class for other backends, so I will try to compensate in the coming weeks.

What is coming up next?

As described in my proposal's timeline I'll be starting a new PR that adds multimethods for mathematical functions. This will be the focus of this week's work as some of these multimethods are also needed by sangyx, another GSoC student working on uarray. He is working on the udiff library that uses uarray and unumpy for automatic differentiation. To avoid getting in his way I will try to finish this PR as soon as possible.

Did you get stuck anywhere?

There were times when I didn't know exactly how to implement a specific default but this was easily overcome with the help of my mentors who would point me in the right direction. Asking for help sooner rather than later has proven to be invaluable. Looking back I think there were no major blocks this week.

July 07, 2020 03:33 PM UTC


Codementor

Python Flask Tutorial: How to Make a Basic Page (Source Code Included!) 📃👨‍💻

A boilerplate app for Python Flask, with example source code for a basic Flask template.
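
For context, a “basic page” in Flask really does fit in a handful of lines. Here is a minimal, illustrative sketch (not the tutorial’s actual source code):

# app.py - a minimal Flask app serving one basic page
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # A real project would usually return render_template("index.html") instead
    return "<h1>Hello from Flask!</h1>"

if __name__ == "__main__":
    # Development server only; use a WSGI server such as gunicorn in production
    app.run(debug=True)

Run it with python app.py and open http://127.0.0.1:5000/ in your browser.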

July 07, 2020 03:28 PM UTC


Doug Hellmann

beagle 0.3.0

beagle is a command line tool for querying a hound code search service such as http://codesearch.openstack.org. What’s new in 0.3.0?

- Add repo-pattern usage examples in the doc (contributed by Hervé Beraud)
- Add an option to filter repositories in search results
- Refresh Python’s versions and their usages (contributed by Hervé Beraud)

July 07, 2020 02:12 PM UTC


Mike Driscoll

Python 101 – Debugging Your Code with pdb

Mistakes in your code are known as “bugs”. You will make mistakes. You will make many mistakes, and that’s totally fine. Most of the time, they will be simple mistakes such as typos. But since computers are very literal, even typos prevent your code from working as intended. So they need to be fixed. The process of fixing your mistakes in programming is known as debugging.

The Python programming language comes with its own built-in debugger called pdb. You can use pdb on the command line or import it as a module. The name, pdb, is short for “Python debugger”.

Here is a link to the full documentation for pdb: https://docs.python.org/3/library/pdb.html

In this article, you will familiarize yourself with the basics of using pdb. Specifically, you will learn the following:

- How to start pdb in the REPL
- How to start pdb on the command line
- How to step through code
- How to add breakpoints with the break command and set_trace()
- How to use the built-in breakpoint() function
- How to get help inside pdb

While pdb is handy, most Python editors have debuggers with more features. You will find the debugger in PyCharm or WingIDE to have many more features, such as auto-complete, syntax highlighting, and a graphical call stack.

A call stack is what your debugger will use to keep track of function and method calls. When possible, you should use the debugger that is included with your Python IDE as it tends to be a little easier to understand.

However, there are times where you may not have your Python IDE, for example when you are debugging remotely on a server. It is those times when you will find pdb to be especially helpful.

Let’s get started!

Starting pdb in the REPL

The best way to start is to have some code that you want to run pdb on. Feel free to use your own code or a code example from another article on this blog.

Or you can create the following code in a file named debug_code.py:

# debug_code.py

def log(number):
    print(f'Processing {number}')
    print(f'Adding 2 to number: {number + 2}')
    

def looper(number):
    for i in range(number):
        log(i)
        
if __name__ == '__main__':
    looper(5)

There are several ways to start pdb and use it with your code. For this example, you will need to open up a terminal (or cmd.exe if you’re a Windows user). Then navigate to the folder that you saved your code to.

Now start Python in your terminal. This will give you the Python REPL where you can import your code and run the debugger, pdb. Here’s how:

>>> import debug_code
>>> import pdb
>>> pdb.run('debug_code.looper(5)')
> <string>(1)<module>()
(Pdb) continue
Processing 0
Adding 2 to number: 2
Processing 1
Adding 2 to number: 3
Processing 2
Adding 2 to number: 4
Processing 3
Adding 2 to number: 5
Processing 4
Adding 2 to number: 6

The first two lines of code import your code and pdb. To run pdb against your code, you need to use pdb.run() and tell it what to do. In this case, you pass in debug_code.looper(5) as a string. When you do this, the pdb module will transform the string into an actual function call of debug_code.looper(5).

The next line is prefixed with (Pdb). That means you are now in the debugger. Success!

To run your code in the debugger, type continue or c for short. This will run your code until one of the following happens:

- An exception is raised
- A breakpoint is reached
- The code finishes executing

In this case, there were no exceptions or breakpoints set, so the code worked perfectly and finished execution!

Starting pdb on the Command Line

An alternative way to start pdb is via the command line. The process for starting pdb in this manner is similar to the previous method. You still need to open up your terminal and navigate to the folder where you saved your code.

But instead of opening Python, you will run this command:

python -m pdb debug_code.py

When you run pdb this way, the output will be slightly different:

> /python101code/chapter26_debugging/debug_code.py(1)<module>()
-> def log(number):
(Pdb) continue
Processing 0
Adding 2 to number: 2
Processing 1
Adding 2 to number: 3
Processing 2
Adding 2 to number: 4
Processing 3
Adding 2 to number: 5
Processing 4
Adding 2 to number: 6
The program finished and will be restarted
> /python101code/chapter26_debugging/debug_code.py(1)<module>()
-> def log(number):
(Pdb) exit

The 3rd line of output above has the same (Pdb) prompt that you saw in the previous section. When you see that prompt, you know you are now running in the debugger. To start debugging, enter the continue command.

The code will run successfully as before, but then you will see a new message:

The program finished and will be restarted

The debugger finished running through all your code and then started again from the beginning! That is handy for running your code multiple times! If you do not wish to run through the code again, you can type exit to quit the debugger.

Stepping Through Code

Stepping through your code is when you use your debugger to run one line of code at a time. You can use pdb to step through your code by using the step command, or s for short.

Here are the first few lines of output that you will see if you step through your code with pdb:

$ python -m pdb debug_code.py 
> /python101code/chapter26_debugging/debug_code.py(3)<module>()
-> def log(number):
(Pdb) step
> /python101code/chapter26_debugging/debug_code.py(8)<module>()
-> def looper(number):
(Pdb) s
> /python101code/chapter26_debugging/debug_code.py(12)<module>()
-> if __name__ == '__main__':
(Pdb) s
> /python101code/chapter26_debugging/debug_code.py(13)<module>()
-> looper(5)
(Pdb)

The first command that you pass to pdb is step. Then you use s to step through the following two lines. You can see that both commands do exactly the same thing, since “s” is a shortcut or alias for “step”.

You can use the next (or n) command to continue execution until the next line within the function. If there is a function call within your function, next will step over it. What that means is that it will call the function, execute its contents, and then continue to the next line in the current function. This, in effect, steps over the function.

You can use step and next to navigate your code and run various pieces efficiently.

If you want to step into the looper() function, continue to use step. On the other hand, if you don’t want to run each line of code in the looper() function, then you can use next instead.

You should continue your session in pdb by calling step so that you step into looper():

(Pdb) s
--Call--
> /python101code/chapter26_debugging/debug_code.py(8)looper()
-> def looper(number):
(Pdb) args
number = 5

When you step into looper(), pdb will print out --Call-- to let you know that you called the function. Next you used the args command to print out all the current args in your namespace. In this case, looper() has one argument, number, which is displayed in the last line of output above. You can replace args with the shorter a.
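
If at this point you wanted to step over the call to log() rather than into it, you could use next instead. A continuation of the session might look like this (the paths and line numbers simply mirror the earlier listings):

(Pdb) next
> /python101code/chapter26_debugging/debug_code.py(9)looper()
-> for i in range(number):
(Pdb) next
> /python101code/chapter26_debugging/debug_code.py(10)looper()
-> log(i)
(Pdb) next
Processing 0
Adding 2 to number: 2
> /python101code/chapter26_debugging/debug_code.py(9)looper()
-> for i in range(number):

Note how the two print() calls inside log() ran in one go, without pdb stopping on either of them.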

The last command that you should know about is jump or j. You can use this command to jump to a specific line number in your code by typing jump followed by a space and then the line number that you wish to go to.

Now let’s learn how you can add a breakpoint!

Adding Breakpoints in pdb

A breakpoint is a location in your code where you want your debugger to stop so you can check on variable states. What this allows you to do is to inspect the call stack, which is a fancy term for the chain of function and method calls that got you to the current line, along with the variables and arguments that are currently in memory for each of those calls.

If you have PyCharm or WingIDE, then they will have a graphical way of letting you inspect the callstack. You will probably be able to mouse over the variables to see what they are set to currently. Or they may have a tool that lists out all the variables in a sidebar.

Let’s add a breakpoint to the last line in the looper() function which is line 10.

Here is your code again:

# debug_code.py

def log(number):
    print(f'Processing {number}')
    print(f'Adding 2 to number: {number + 2}')
    

def looper(number):
    for i in range(number):
        log(i)
        
if __name__ == '__main__':
    looper(5)

To set a breakpoint in the pdb debugger, you can use the break or b command followed by the line number you wish to break on:

$ python3.8 -m pdb debug_code.py 
> /python101code/chapter26_debugging/debug_code.py(3)<module>()
-> def log(number):
(Pdb) break 10
Breakpoint 1 at /python101code/chapter26_debugging/debug_code.py:10
(Pdb) continue
> /python101code/chapter26_debugging/debug_code.py(10)looper()
-> log(i)
(Pdb)

Now you can use the args command here to find out what the current arguments are set to. You can also print out the value of variables, such as the value of i, using the print (or p for short) command:

(Pdb) print(i)
0
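
Because the breakpoint sits inside the loop, continue will stop there again on every iteration. Continuing the session looks roughly like this (the output shown is illustrative):

(Pdb) continue
Processing 0
Adding 2 to number: 2
> /python101code/chapter26_debugging/debug_code.py(10)looper()
-> log(i)
(Pdb) p i
1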

Now let’s find out how to add a breakpoint to your code!

Creating a Breakpoint with set_trace()

The Python debugger allows you to import the pdb module and add a breakpoint to your code directly, like this:

# debug_code_with_settrace.py

def log(number):
    print(f'Processing {number}')
    print(f'Adding 2 to number: {number + 2}')


def looper(number):
    for i in range(number):
        import pdb; pdb.set_trace()
        log(i)

if __name__ == '__main__':
    looper(5)

Now when you run this code in your terminal, it will automatically launch into pdb when it reaches the set_trace() function call:

$ python3.8 debug_code_with_settrace.py 
> /python101code/chapter26_debugging/debug_code_with_settrace.py(12)looper()
-> log(i)
(Pdb)

This requires you to add a fair amount of extra code that you’ll need to remove later. You can also have issues if you forget to add the semi-colon between the import and the pdb.set_trace() call.

To make things easier, the Python core developers added breakpoint() which is the equivalent of writing import pdb; pdb.set_trace().

Let’s discover how to use that next!

Using the built-in breakpoint() Function

Starting in Python 3.7, the breakpoint() function has been added to the language to make debugging easier. You can read all about the change in PEP 553, which introduced the built-in breakpoint() function: https://www.python.org/dev/peps/pep-0553/

Go ahead and update your code from the previous section to use breakpoint() instead:

# debug_code_with_breakpoint.py

def log(number):
    print(f'Processing {number}')
    print(f'Adding 2 to number: {number + 2}')


def looper(number):
    for i in range(number):
        breakpoint()
        log(i)

if __name__ == '__main__':
    looper(5)

Now when you run this in the terminal, Pdb will be launched exactly as before.

Another benefit of using breakpoint() is that many Python IDEs will recognize that function and automatically pause execution. This means you can use the IDE’s built-in debugger at that point to do your debugging. This is not the case if you use the older set_trace() method.
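
One related convenience worth mentioning (it is not covered above, but it shipped together with breakpoint()): the PYTHONBREAKPOINT environment variable controls what breakpoint() does. Setting it to 0 disables every breakpoint() call, which is handy when you want to run the script normally without editing the code:

$ PYTHONBREAKPOINT=0 python3.8 debug_code_with_breakpoint.py
Processing 0
Adding 2 to number: 2
Processing 1
Adding 2 to number: 3
Processing 2
Adding 2 to number: 4
Processing 3
Adding 2 to number: 5
Processing 4
Adding 2 to number: 6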

Getting Help

This article doesn’t cover all the commands that are available to you in pdb. To learn more about how to use the debugger, you can use the help command within pdb. It will print out the following:

(Pdb) help

Documented commands (type help <topic>):
========================================
EOF    c          d        h         list      q        rv       undisplay
a      cl         debug    help      ll        quit     s        unt      
alias  clear      disable  ignore    longlist  r        source   until    
args   commands   display  interact  n         restart  step     up       
b      condition  down     j         next      return   tbreak   w        
break  cont       enable   jump      p         retval   u        whatis   
bt     continue   exit     l         pp        run      unalias  where    

Miscellaneous help topics:
==========================
exec  pdb

If you want to learn what a specific command does, you can type help followed by the command.

Here is an example:

(Pdb) help where
w(here)
        Print a stack trace, with the most recent frame at the bottom.
        An arrow indicates the "current frame", which determines the
        context of most commands.  'bt' is an alias for this command.

Go give it a try on your own!

Wrapping Up

Being able to debug your code successfully takes practice. It is great that Python provides you with a way to debug your code without installing anything else. You will find that using breakpoint() to enable breakpoints in your IDE is also quite handy.

In this article you learned about the following:

- Starting pdb in the REPL and on the command line
- Stepping through code with step and next
- Adding breakpoints with break, set_trace(), and breakpoint()
- Getting help inside pdb

You should go and try to use what you have learned here in your own code. Adding intentional errors to your code and then running it through your debugger is a great way to learn how things work!

The post Python 101 – Debugging Your Code with pdb appeared first on The Mouse Vs. The Python.

July 07, 2020 02:00 PM UTC


Real Python

Pointers and Objects in Python

If you’ve ever worked with lower-level languages like C or C++, then you may have heard of pointers. Pointers are essentially variables that hold the memory address of another variable. They can make parts of your code very efficient, but they can also lead to various memory management bugs.

You’ll learn about Python’s object model and see why pointers in Python don’t really exist. For the cases where you need to mimic pointer behavior, you’ll learn ways to simulate pointers in Python without managing memory.
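
To give a flavour of what simulating a pointer can look like, one common approach (a rough sketch, not the course’s own code) is to wrap a value in a mutable object so that several names can share and update the same underlying slot:

# A tiny "pointer-like" wrapper class (illustrative only)
class Ref:
    def __init__(self, value):
        self.value = value

def increment(ref):
    # Mutate the shared Ref object instead of rebinding a local name
    ref.value += 1

counter = Ref(0)
alias = counter        # both names refer to the same Ref instance
increment(alias)
print(counter.value)   # 1 - the change is visible through either name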

In this course, you’ll:

- Learn about Python’s object model and why pointers in Python don’t really exist
- See ways to simulate pointer behavior in Python without managing memory yourself


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

July 07, 2020 02:00 PM UTC


Erik Marsja

Adding New Columns to a Dataframe in Pandas (with Examples)

The post Adding New Columns to a Dataframe in Pandas (with Examples) appeared first on Erik Marsja.

In this Pandas tutorial, we are going to learn all there is to know about adding new columns to a dataframe. Here, we are going to use the same three methods that we used to add empty columns to a Pandas dataframe. Specifically, when adding columns to the dataframe, we are going to use the following 3 methods:

  1. Simply assigning new data to the dataframe
  2. The assign() method to add new columns
  3. The insert() method to add new columns

Outline

The outline of the tutorial is as follows: a brief introduction, and a quick overview of how to add new columns to a Pandas dataframe (all three methods). Following the overview of the three methods, we create some fake data, and then we use the three methods to add columns to the created dataframe.

Introduction

There are many things that we may want to do after we have created, or loaded, our dataframe in Pandas. For instance, we may go on and do some data manipulation tasks such as manipulating the columns of the dataframe. Now, if we are reading most of the data from one data source but some data from another we need to know how to add columns to a dataframe.

Adding a column to a Pandas dataframe is easy. Furthermore, as you surely have noticed, there are a few ways to carry out this task. Of course, this can create some confusion for beginners. As a beginner, you might see several different ways to add a column to a dataframe and ask yourself: which one should I use?

How to Add New Columns to a Dataframe in Pandas in 3 Ways

As previously mentioned, this tutorial is going to go through 3 different methods we can use when adding columns to the dataframe. First, we are going to use the method you may be familiar with if you know Python but have not worked with Pandas that much yet. Namely, we are going to use simple assignment:

1. Adding a New Column by Assigning New Data:

Here’s how to add a list, for example, to an existing dataframe in Pandas: df['NewCol'] = [1, 3, 4, 5, 6]. In the next example, we are going to use the assign() method:

2. Adding New Columns Using the assign() Method:

Here’s how to add new columns by using the assign() method: df = df.assign(NewCol1=[1, 2, 3, 4, 5], NewCol2=[.1, .2, .3, .5, -3]). After this, we will see an example of adding new columns using the insert() method:

3. Adding New Columns Using the insert() Method:

Here’s how new columns can be added with the insert() method: df.insert(4, 'NewCol', [1, 2, 3, 4, 5]). In the next section, before we go through the examples, we are going to create some example data to play around with.

Pandas dataframe from a dictionary

In most cases, we are going to read our data from an external data source. Here, however, we are going to create a Pandas dataframe from a dictionary:

import pandas as pd

gender = ['M', 'F', 'F', 'M']
cond = ['Silent', 'Silent', 
        'Noise', 'Noise']
age = [19, 21, 20, 22]
rt = [631.2, 601.3, 
     721.3, 722.4]

data = {'Gender':gender,
       'Condition':cond,
       'age':age,
       'RT':rt}

# Creating the Datafame from dict:
df = pd.DataFrame(data)

In the code chunk above, we imported Pandas and created 4 Python lists. Second, we created a dictionary with the column names we later want in our dataframe as keys and the 4 lists as values. Finally, we used the dataframe constructor to create a dataframe from our dictionary. If you need to learn more about importing data to a Pandas dataframe check the following tutorials:

Example 1: Adding New Columns to a dataframe by Assigning Data

In the first example, we are going to add new columns to the dataframe by assigning new data. For example, if we are having two lists, containing new data, that we need to add to an existing dataframe we can just assign each list as follows:

df['NewCol1'] = 'A'
df['NewCol2'] = [1, 2, 3, 4]

display(df)

In the code above, we first added a new column by assigning a single string ('A') to it; the new column was created using the brackets ([]). Note that assigning a single value, as we did here, will fill the entire newly added column with that value. Second, we added another column in the same way, this time assigning the list [1, 2, 3, 4]. Finally, when adding columns using this method, we set the new column names using Python strings.

The dataframe with the two new columns added.

Now, it’s important to know that each list we assign to a new column needs to be exactly the same length as the existing columns in the Pandas dataframe, that is, the number of rows. For example, the example dataframe we are working with has 4 rows:

If we try to add a list with only 3 values, it won’t work (see the image below for the error message).
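
For instance, with the four-row example dataframe, an assignment like the following (an illustrative snippet) fails with a ValueError complaining that the length of the values does not match the length of the index:

# Fails: the dataframe has 4 rows but the list only has 3 values
df['NewCol3'] = [1, 2, 3]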

Example 2: Adding New Columns to a dataframe with the assign() method

In the second example, we are adding new columns to the Pandas dataframe with the assign() method:

df.assign(NewCol1='A',
         NewCol2=[1, 2, 3, 4])

In the second example, we added two new columns to our dataframe by passing two keyword arguments to the assign() method. The argument names become the new column names. Furthermore, each new column receives the same values as in the previous example (the string and the list), so the result is exactly the same as in the first example. Importantly, if we use the same names as already existing columns in the dataframe, the old columns will be overwritten. Again, when adding new columns, the data you want to add needs to be exactly the same length as the number of rows of the Pandas dataframe. Note also that assign() returns a new dataframe, so we need to assign the result back to df if we want to keep the new columns.

Example 3: Adding New Columns to dataframe in Pandas with the insert() method

In the third example, we are going to add new columns to the dataframe using the insert() method:

df.insert(4, 'NewCol1', 'Bre')
df.insert(5, 'NewCol2', [1, 2, 3, 4])

display(df)

To explain the code above: we added two new columns using 3 arguments of the insert() method. First, we used the loc argument to “tell” Pandas where we want the new column to be located in the dataframe. In our case, we add them at the end of the dataframe. Second, we used the column argument, which takes a string for the new column name. Lastly, we used the value argument to actually add the same data as in the previous examples. Here is the resulting dataframe:

Two new columns added to the dataframe in Pandas using the insert method.

As you may have noticed, when working with the insert() method we need to know how many columns there are in the dataframe, because the loc argument expects a position. For example, if we run the code above a second time it will fail, since columns with those names already exist. Another option, if we don’t know the number of columns, is to use len(df.columns). Here is the same example as above using the length of the columns instead:

df.insert(len(df.columns), 'NewCol1', 'Bre')
df.insert(len(df.columns), 'NewCol2', [1, 2, 3, 4])

Note that if we really want to, we can insert columns wherever we want in the dataframe, even if columns with the same names already exist. To accomplish this, we need to set allow_duplicates to True. For example, the following adding-columns example will work:

df.insert(1, 'NewCol1', 'Bre', allow_duplicates=True)
df.insert(3, 'NewCol2', [1, 2, 3, 4], allow_duplicates=True)

Now, if we have a lot of columns, there are, of course, alternatives that may be more feasible than the ones we have covered here. For instance, if we want to add columns from another dataframe, we can use the join, concat, or merge methods, as sketched below.
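
Here is a quick, illustrative sketch of that last option using concat (it assumes the extra columns live in a second dataframe that shares the same row index as df):

import pandas as pd

# A second dataframe holding the columns we want to add (4 rows, like df)
extra = pd.DataFrame({'NewCol1': ['A', 'A', 'A', 'A'],
                      'NewCol2': [1, 2, 3, 4]})

# Concatenate along the column axis to add the new columns to df
df = pd.concat([df, extra], axis=1)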

Conclusion

In this post, we learned how to add new columns to a dataframe in Pandas. Specifically, we used 3 different methods. First, we added columns by simply assigning a string and a list. This method is very similar to how we assign values to Python variables. Second, we used the assign() method to add new columns to the Pandas dataframe. Finally, we had a look at the insert() method and used it to add new columns to the dataframe. In conclusion, the best method to add columns is the assign() method. Of course, if we read data from other sources and want to merge two dataframes, only getting the new columns from one dataframe, we should use other methods (e.g., concat or merge).

I hope you enjoyed this Pandas tutorial; please leave a comment below, especially if there is something you want covered on the blog or something that should be added to this blog post. Finally, please share the post if you learned something new!

The post Adding New Columns to a Dataframe in Pandas (with Examples) appeared first on Erik Marsja.

July 07, 2020 01:52 PM UTC


Everyday Superpowers

Stop working so hard on paths. Get started with pathlib!

Most people are working too hard to access files and folders with Python. Pathlib makes it so much easier, and I have released two resources to help you get started using it.
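
To give a taste of the difference, here is a small illustrative snippet (not taken from the linked resources themselves):

from pathlib import Path

# Build a path without manual string joining
config = Path.home() / "projects" / "demo" / "settings.toml"
print(config.suffix)   # '.toml'
print(config.parent)   # the 'demo' directory as a Path object

# Read the whole file in one call, if it exists
if config.exists():
    text = config.read_text()

# Recursively find every Python file under the current directory
for path in Path(".").rglob("*.py"):
    print(path.name)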


Read more...

July 07, 2020 01:39 PM UTC


The Digital Cat

Flask project setup: TDD, Docker, Postgres and more - Part 3

In this series of posts I explore the development of a Flask project with a setup that is built with efficiency and tidiness in mind, using TDD, Docker and Postgres.

Catch-up

In the first and second posts I created a Flask project with a tidy setup, using Docker to run the development environment and the tests, and mapping important commands in a management script, so that the configuration can be in a single file and drive the whole system.

In this post I will show you how to easily create scenarios, that is databases created on the fly with custom data, so that it is possible to test queries in isolation, either with the Flask application or with the command line. I will also show you how to define a configuration for production and give some hints for the deployment.

Step 1 - Creating scenarios

The idea of scenarios is simple. Sometimes you need to investigate specific use cases for bugs, or maybe increase the performances of some database queries, and you might need to do this on a customised database. This is a scenario, a Python file that populates the database with a specific set of data and that allows you to run the application or the database shell on it.

Often the development database is a copy of the production one, maybe with sensitive data stripped to avoid leaking private information, and while this gives us a realistic case in which to test queries (e.g. how does the query perform on 1 million lines?) it might not help during the initial investigations, where you need to have all the data in front of you to properly understand what happens. Whoever learned how joins work in relational databases understands what I mean here.

In principle, to create a scenario we just need to spin up an empty database and to run the scenario code against it. In practice, things are not much more complicated, but there are a couple of minor issues that we need to solve.

First, I am already running a database for the development and one for the testing. The second is ephemeral, but I decided to set up the project so that I can run the tests while the development database is up, and the way I did it was by using port 5432 (the standard Postgres one) for development and 5433 for testing. Spinning up scenarios adds more databases to the equation. Clearly I do not expect to run 5 scenarios at the same time while running the development and the test databases, but I made it a rule to make something generic as soon as I do it for the third time.

This means that I won't create a database for a scenario on port 5434 and will instead look for a more generic solution. This is offered to me by the Docker networking model, where I can map a container port to the host without assigning the destination port, which will then be chosen randomly by Docker itself among the unprivileged ones. This means that I can create a Postgres container mapping port 5432 (the port in the container) and have Docker connect it to, for example, port 32838 on the host. As long as the application knows which port to use, this is absolutely the same as using port 5432.

Unfortunately the Docker interface is not extremely script-friendly when it comes to providing information and I have to parse the output a bit. Practically speaking, after I spin up the containers, I will run the command docker-compose port db 5432 which will return a string like 0.0.0.0:32838, and I will extract the port from it. Nothing major, but these are the (sometimes many) issues you face when you orchestrate different systems together.

The new management script is

File: manage.py

#! /usr/bin/env python

import os
import json
import signal
import subprocess
import time
import shutil

import click
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT


# Ensure an environment variable exists and has a value
def setenv(variable, default):
    os.environ[variable] = os.getenv(variable, default)


setenv("APPLICATION_CONFIG", "development")

APPLICATION_CONFIG_PATH = "config"
DOCKER_PATH = "docker"


def app_config_file(config):
    return os.path.join(APPLICATION_CONFIG_PATH, f"{config}.json")


def docker_compose_file(config):
    return os.path.join(DOCKER_PATH, f"{config}.yml")


def configure_app(config):
    # Read configuration from the relative JSON file
    with open(app_config_file(config)) as f:
        config_data = json.load(f)

    # Convert the config into a usable Python dictionary
    config_data = dict((i["name"], i["value"]) for i in config_data)

    for key, value in config_data.items():
        setenv(key, value)


@click.group()
def cli():
    pass


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def flask(subcommand):
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = ["flask"] + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def docker_compose_cmdline(commands_string=None):
    config = os.getenv("APPLICATION_CONFIG")
    configure_app(config)

    compose_file = docker_compose_file(config)

    if not os.path.isfile(compose_file):
        raise ValueError(f"The file {compose_file} does not exist")

    command_line = [
        "docker-compose",
        "-p",
        config,
        "-f",
        compose_file,
    ]

    if commands_string:
        command_line.extend(commands_string.split(" "))

    return command_line


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def compose(subcommand):
    cmdline = docker_compose_cmdline() + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def run_sql(statements):
    conn = psycopg2.connect(
        dbname=os.getenv("POSTGRES_DB"),
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PASSWORD"),
        host=os.getenv("POSTGRES_HOSTNAME"),
        port=os.getenv("POSTGRES_PORT"),
    )

    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cursor = conn.cursor()
    for statement in statements:
        cursor.execute(statement)

    cursor.close()
    conn.close()


def wait_for_logs(cmdline, message):
    logs = subprocess.check_output(cmdline)
    while message not in logs.decode("utf-8"):
        time.sleep(0.1)
        logs = subprocess.check_output(cmdline)


@cli.command()
def create_initial_db():
    configure_app(os.getenv("APPLICATION_CONFIG"))

    try:
        run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])
    except psycopg2.errors.DuplicateDatabase:
        print(
            f"The database {os.getenv('APPLICATION_DB')} already exists and will not be recreated"
        )


@cli.command()
@click.argument("filenames", nargs=-1)
def test(filenames):
    os.environ["APPLICATION_CONFIG"] = "testing"
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = docker_compose_cmdline("up -d")
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline("logs db")
    wait_for_logs(cmdline, "ready to accept connections")

    run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])

    cmdline = ["pytest", "-svv", "--cov=application", "--cov-report=term-missing"]
    cmdline.extend(filenames)
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline("down")
    subprocess.call(cmdline)


@cli.group()
def scenario():
    pass


@scenario.command()
@click.argument("name")
def up(name):
    os.environ["APPLICATION_CONFIG"] = f"scenario_{name}"
    config = os.getenv("APPLICATION_CONFIG")

    scenario_config_source_file = app_config_file("scenario")
    scenario_config_file = app_config_file(config)

    if not os.path.isfile(scenario_config_source_file):
        raise ValueError(f"File {scenario_config_source_file} doesn't exist")
    shutil.copy(scenario_config_source_file, scenario_config_file)

    scenario_docker_source_file = docker_compose_file("scenario")
    scenario_docker_file = docker_compose_file(config)

    if not os.path.isfile(scenario_docker_source_file):
        raise ValueError(f"File {scenario_docker_source_file} doesn't exist")
    shutil.copy(docker_compose_file("scenario"), scenario_docker_file)

    configure_app(f"scenario_{name}")

    cmdline = docker_compose_cmdline("up -d")
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline("logs db")
    wait_for_logs(cmdline, "ready to accept connections")

    cmdline = docker_compose_cmdline("port db 5432")
    out = subprocess.check_output(cmdline)
    port = out.decode("utf-8").replace("\n", "").split(":")[1]
    os.environ["POSTGRES_PORT"] = port

    run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])

    scenario_module = f"scenarios.{name}"
    scenario_file = os.path.join("scenarios", f"{name}.py")
    if os.path.isfile(scenario_file):
        import importlib

        os.environ["APPLICATION_SCENARIO_NAME"] = name

        scenario = importlib.import_module(scenario_module)
        scenario.run()

    cmdline = " ".join(
        docker_compose_cmdline(
            "exec db psql -U {} -d {}".format(
                os.getenv("POSTGRES_USER"), os.getenv("APPLICATION_DB")
            )
        )
    )
    print("Your scenario is ready. If you want to open a SQL shell run")
    print(cmdline)


@scenario.command()
@click.argument("name")
def down(name):
    os.environ["APPLICATION_CONFIG"] = f"scenario_{name}"
    config = os.getenv("APPLICATION_CONFIG")

    cmdline = docker_compose_cmdline("down")
    subprocess.call(cmdline)

    scenario_config_file = app_config_file(config)
    os.remove(scenario_config_file)

    scenario_docker_file = docker_compose_file(config)
    os.remove(scenario_docker_file)


if __name__ == "__main__":
    cli()

where I added the scenario up and scenario down commands. As you can see, the up function first copies the config/scenario.json and docker/scenario.yml files (which I still have to create) into files named after the scenario.

Then I run the up -d command and wait for the database to be ready, as I already do for tests. After that, it's time to extract the port of the container with some very simple Python string processing and to initialise the correct environment variable.

Last, I import and execute the Python file containing the code of the scenario itself and print a friendly message with the command line to run psql to have a Postgres shell into the newly created database.

The down function simply tears down the containers and removes the scenario configuration files.

The two missing config files are pretty simple. The docker compose configuration is

File: docker/scenario.yml

version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: ${POSTGRES_HOSTNAME}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "5432"
  web:
    build:
      context: ${PWD}
      dockerfile: docker/web/Dockerfile
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
      APPLICATION_DB: ${APPLICATION_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: ${POSTGRES_HOSTNAME}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_PORT: ${POSTGRES_PORT}
    command: flask run --host 0.0.0.0
    volumes:
      - ${PWD}:/opt/code
    ports:
      - "5000"

Here you can see that the database is ephemeral, that the port on the host is automatically assigned, and that I also spin up the application (mapping it to a random port as well to avoid clashing with the development one).

The configuration file is

File: config/scenario.json

[
  {
    "name": "FLASK_ENV",
    "value": "development"
  },
  {
    "name": "FLASK_CONFIG",
    "value": "development"
  },
  {
    "name": "POSTGRES_DB",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_USER",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_HOSTNAME",
    "value": "localhost"
  },
  {
    "name": "POSTGRES_PASSWORD",
    "value": "postgres"
  },
  {
    "name": "APPLICATION_DB",
    "value": "application"
  }
]

which doesn't add anything new to what I already did for development and testing.

Resources

Scenario example 1

Let's have a look at a very simple scenario that doesn't do anything on the database, just to understand the system. The code for the scenario is

File: scenarios/foo.py

import os


def run():
    print("HEY! This is scenario", os.environ["APPLICATION_SCENARIO_NAME"])

When I run the scenario I get the following output

$ ./manage.py scenario up foo
Creating network "scenario_foo_default" with the default driver
Creating scenario_foo_db_1  ... done
Creating scenario_foo_web_1 ... done
HEY! This is scenario foo
Your scenario is ready. If you want to open a SQL shell run
docker-compose -p scenario_foo -f docker/scenario_foo.yml exec db psql -U postgres -d application

The command docker ps shows that my development environment is happily running alongside the scenario

$ docker ps
CONTAINER ID  IMAGE             COMMAND                 [...]  PORTS                    NAMES
85258892a2df  scenario_foo_web  "flask run --host 0.…"  [...]  0.0.0.0:32826->5000/tcp  scenario_foo_web_1
a031b6429e07  postgres          "docker-entrypoint.s…"  [...]  0.0.0.0:32827->5432/tcp  scenario_foo_db_1
1a449d23da01  development_web   "flask run --host 0.…"  [...]  0.0.0.0:5000->5000/tcp   development_web_1
28aa566321b5  postgres          "docker-entrypoint.s…"  [...]  0.0.0.0:5432->5432/tcp   development_db_1

And the output of the scenario up foo command contains the string HEY! This is scenario foo that was printed by the file foo.py. We can also successfully run the suggested command

$ docker-compose -p scenario_foo -f docker/scenario_foo.yml exec db psql -U postgres -d application
psql (12.3 (Debian 12.3-1.pgdg100+1))
Type "help" for help.

application=# \l
                                  List of databases
    Name     |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-------------+----------+----------+------------+------------+-----------------------
 application | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 postgres    | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
             |          |          |            |            | postgres=CTc/postgres
 template1   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
             |          |          |            |            | postgres=CTc/postgres
(4 rows)

application=#

And inside the database we find the application database created explicitly for the scenario (the name is specified in config/scenario.json). If you don't know psql you can exit with \q or Ctrl-d.

Before tearing down the scenario have a look at the two files config/scenario_foo.json and docker/scenario_foo.yml. They are just copies of config/scenario.json and docker/scenario.yml but I think seeing them there might help to understand how the whole thing works. When you are done run ./manage.py scenario down foo.

Scenario example 2

Let's do something a bit more interesting. The new scenario is contained in scenarios/users.py

File: scenarios/users.py

from application.app import create_app
from application.models import db, User


app = create_app("development")


def run():
    with app.app_context():
        db.drop_all()
        db.create_all()

        # Administrator
        admin = User(email="admin@server.com")
        db.session.add(admin)

        # First user
        user1 = User(email="user1@server.com")
        db.session.add(user1)

        # Second user
        user2 = User(email="user2@server.com")
        db.session.add(user2)

        db.session.commit()

I decided to be as agnostic as possible in the scenarios, to avoid creating something too specific that eventually would not give me enough flexibility to test what I need. This means that the scenario has to create the app and to use the database session explicitly, as I do in this example. The application is created with the "development" configuration. Remember that this is the Flask configuration that you find in application/config.py, not the one that is in config/development.json.

I can run the scenario with

$ ./manage.py scenario up users

and then connect to the database to find my users

$ docker-compose -p scenario_users -f docker/scenario_users.yml exec db psql -U postgres -d application
psql (12.3 (Debian 12.3-1.pgdg100+1))
Type "help" for help.

application=# \dt
         List of relations
 Schema | Name  | Type  |  Owner
--------+-------+-------+----------
 public | users | table | postgres
(1 row)

application=# select * from users;
 id |      email
----+------------------
  1 | admin@server.com
  2 | user1@server.com
  3 | user2@server.com
(3 rows)

application=# \q

Step 2 - Simulating the production environment

As I stated at the very beginning of this mini series of posts, one of my goals was to run in development the same database that I run in production, and for this reason I went through the configuration steps that allowed me to have a Postgres container running both in development and during tests. In a real production scenario Postgres would probably run in a separate instance, for example on the RDS service in AWS, but as long as you have the connection parameters nothing changes in the configuration.

Docker actually allows us to easily simulate the production environment as well. In fact, if our notebook were connected 24/7, we might as well host production there directly. Not that I recommend this nowadays, but this is how many important companies began many years ago, before cloud computing was around. Instead of installing a LAMP stack we configure containers, but the idea doesn't change.

I will then create a configuration that simulates a production environment and then give some hints on how to translate this into a proper production infrastructure. If you want to have a clear picture of the components of a web application in production read my post Dissecting a web stack that analyses them one by one.

The first component that we have to change here is the HTTP server. In development we use Flask's development server, and the first message that server prints is WARNING: This is a development server. Do not use it in a production deployment. Got it, Flask! A good choice to replace it is Gunicorn, so first of all I add it in the requirements

File: requirements/production.txt

Flask
flask-sqlalchemy
psycopg2
flask-migrate
gunicorn

Then I need to create a docker-compose configuration for production

File: docker/production.yml

version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: ${POSTGRES_HOSTNAME}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  web:
    build:
      context: ${PWD}
      dockerfile: docker/web/Dockerfile.production
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
      APPLICATION_DB: ${APPLICATION_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: ${POSTGRES_HOSTNAME}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_PORT: ${POSTGRES_PORT}
    command: gunicorn -w 4 -b 0.0.0.0 wsgi:app
    volumes:
      - ${PWD}:/opt/code
    ports:
      - "8000:8000"

volumes:
  pgdata:

As you can see, the command that runs the application here is slightly different: gunicorn -w 4 -b 0.0.0.0 wsgi:app. It runs 4 worker processes (-w 4) bound to the container's address 0.0.0.0, loading the app object from the wsgi.py file (wsgi:app). As Gunicorn listens on port 8000 by default, I mapped that to the same port on the host.
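
For reference, the wsgi:app entry point is just a tiny module exposing the application object. A minimal sketch of such a file (assuming the create_app factory from application/app.py used elsewhere in this series) could be:

File: wsgi.py

import os

from application.app import create_app

# Build the Flask application using the configuration selected via FLASK_CONFIG
app = create_app(os.environ["FLASK_CONFIG"])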

As the production image needs to install the production requirements, I also took the opportunity to create the docker/web subdirectory and move the web Dockerfile there. Then I created the Dockerfile.production file

File: docker/web/Dockerfile.production

FROM python:3

ENV PYTHONUNBUFFERED 1

RUN mkdir /opt/code
RUN mkdir /opt/requirements
WORKDIR /opt/code

ADD requirements /opt/requirements
RUN pip install -r /opt/requirements/production.txt

Having moved the development Dockerfile into the subdirectory I also fixed the other docker-compose files to match the new configuration. I can now build the image with

$ APPLICATION_CONFIG="production" ./manage.py compose build web

The last thing I need is a configuration file

File: config/production.json

[
  {
    "name": "FLASK_ENV",
    "value": "production"
  },
  {
    "name": "FLASK_CONFIG",
    "value": "production"
  },
  {
    "name": "POSTGRES_DB",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_USER",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_HOSTNAME",
    "value": "localhost"
  },
  {
    "name": "POSTGRES_PORT",
    "value": "5432"
  },
  {
    "name": "POSTGRES_PASSWORD",
    "value": "postgres"
  },
  {
    "name": "APPLICATION_DB",
    "value": "application"
  }
]

As you can see, this is not very different from the development one: I just changed the values of FLASK_ENV and FLASK_CONFIG. Clearly it contains a secret that shouldn't be written in plain text, POSTGRES_PASSWORD, but after all this is a simulation of production. In a real environment secrets should be kept in an encrypted store such as AWS Secrets Manager.
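
For example, in a real AWS environment the application could read the password at startup from Secrets Manager with boto3 instead of keeping it in config/production.json. This is only a hedged sketch; the secret name and the key inside it are purely illustrative.

import json

import boto3  # assumes AWS credentials are available to the process


def get_database_password(secret_id="application/postgres"):  # illustrative secret name
    # Fetch and decode the secret stored in AWS Secrets Manager
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])["POSTGRES_PASSWORD"]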

Remember that FLASK_ENV changes the internal settings of Flask, most notably disabling the debugger, while FLASK_CONFIG=production loads the ProductionConfig object from application/config.py. That object is empty for the moment, but it might contain public configuration for the production server, as sketched below.
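
As a purely illustrative sketch (not the actual code of this series), application/config.py could eventually grow something like the following, with only non-secret values committed to the repository.

class Config:
    # Settings shared by every environment (illustrative example)
    SQLALCHEMY_TRACK_MODIFICATIONS = False


class ProductionConfig(Config):
    # Public, non-secret production settings could live here,
    # e.g. more conservative database connection handling
    SQLALCHEMY_ENGINE_OPTIONS = {"pool_pre_ping": True}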

Step 3 - Scale up

Mapping the container port to the host is not a great idea, though, as it makes it impossible to scale the service up and down to serve more load, which is the main point of running containers in production. This can be solved in many ways in the cloud; for example, in AWS you might run the containers on AWS Fargate and register them with an Application Load Balancer. Another way to do it on a single host is to run a web server in front of your HTTP server, and this can easily be implemented with docker-compose.

I will add nginx and serve HTTP from there, reverse-proxying the application containers through docker-compose networking. First of all, here is the new docker-compose configuration

File: docker/production.yml

version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: ${POSTGRES_HOSTNAME}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  web:
    build:
      context: ${PWD}
      dockerfile: docker/web/Dockerfile.production
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
      APPLICATION_DB: ${APPLICATION_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: ${POSTGRES_HOSTNAME}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_PORT: ${POSTGRES_PORT}
    command: gunicorn -w 4 -b 0.0.0.0 wsgi:app
    volumes:
      - ${PWD}:/opt/code
  nginx:
    image: nginx
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - 8080:8080

volumes:
  pgdata:

As you can see I added a service nginx that runs the default Nginx image, mapping a custom configuration file that I will create in a minute. The application container doesn't need any port mapping, as I won't access it directly from the host anymore. The Nginx configuration file is

File: docker/nginx/nginx.conf

worker_processes 1;

events { worker_connections 1024; }

http {

    sendfile on;

    upstream app {
        server web:8000;
    }

    server {
        listen 8080;

        location / {
            proxy_pass         http://app;
            proxy_redirect     off;
            proxy_set_header   Host $host;
            proxy_set_header   X-Real-IP $remote_addr;
            proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header   X-Forwarded-Host $server_name;
        }
    }
}

This is a pretty standard configuration, and in a real production environment I would add many other configuration values (most notably I would serve HTTPS instead of HTTP). The upstream section leverages docker-compose networking by referring to web, which Docker's internal DNS maps directly to the IPs of the containers of the service with that name. Port 8000 is the default Gunicorn port that I already mentioned. Since I won't run the nginx container as root on my notebook, I expose port 8080 instead of the traditional 80 for HTTP; this is also something that would be different in a real production environment.

I can at this point run

$ APPLICATION_CONFIG="production" ./manage.py compose up -d
Starting production_db_1    ... done
Starting production_nginx_1 ... done
Starting production_web_1   ... done

It's interesting to have a look at the logs of the nginx container, as Nginx by default prints all the incoming requests

$ APPLICATION_CONFIG="production" ./manage.py compose logs -f nginx
Attaching to production_nginx_1
[...]
nginx_1  | 172.30.0.1 - - [05/Jul/2020:10:40:44 +0000] "GET / HTTP/1.1" 200 13 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"

The last line is what I get when I visit localhost:8080 while the production setup is up and running.

Scaling the service up and down is now a breeze

$ APPLICATION_CONFIG="production" ./manage.py compose up -d --scale web=3
production_db_1 is up-to-date
Starting production_web_1 ... 
Starting production_web_1 ... done
Creating production_web_2 ... done
Creating production_web_3 ... done

Bonus step - A closer look at Docker networking

I mentioned that docker-compose creates a connection between services, and used that in the configuration of the nginx container, but I understand that this might look like black magic to some people. While I believe that this is actually black magic, I also think that we can investigate it a bit, so let's open the grimoire and reveal (some of) the dark secrets of Docker networking.

While the production setup is running we can connect to the nginx container and see what is happening in real time, so first of all I run a bash shell on it

$ APPLICATION_CONFIG="production" ./manage.py compose exec nginx bash

Once inside I can see my configuration file at /etc/nginx/nginx.conf, but it has not changed: remember that Docker networking doesn't work like a templating engine, but through a local DNS. This means that if we try to resolve web from inside the container we should see multiple IPs. The command dig is a good tool to investigate DNS, but it doesn't come preinstalled in the nginx container, so I need to run

root@33cbaea369be:/# apt update && apt install dnsutils

and at this point I can run it

root@33cbaea369be:/# dig web

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> web
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30539
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;web.                           IN      A

;; ANSWER SECTION:
web.                    600     IN      A       172.30.0.4
web.                    600     IN      A       172.30.0.6
web.                    600     IN      A       172.30.0.5

;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Sun Jul 05 10:58:18 UTC 2020
;; MSG SIZE  rcvd: 78

root@33cbaea369be:/#

The command outputs 3 IPs, which correspond to the 3 containers of the web service that I am currently running. If I scale down (from outside the container)

$ APPLICATION_CONFIG="production" ./manage.py compose up -d --scale web=1

then the output of dig becomes

root@33cbaea369be:/# dig web

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> web
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13146
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;web.                           IN      A

;; ANSWER SECTION:
web.                    600     IN      A       172.30.0.4

;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Sun Jul 05 11:01:46 UTC 2020
;; MSG SIZE  rcvd: 40

root@33cbaea369be:/#
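
As a side note, since the web containers are built from the python:3 image, a similar check can be done from one of them without installing anything extra, using only the standard library; this is just a quick sketch of the idea.

import socket

# Ask Docker's embedded DNS for all the addresses behind the "web" service name;
# the third element of the returned tuple is the list of container IPs.
hostname, aliases, addresses = socket.gethostbyname_ex("web")
print(addresses)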

How to create the production infrastructure

This will be a very short section: creating infrastructure and deploying in production are complex topics, so I just want to give some hints to stimulate your research.

AWS ECS is basically Docker in the cloud, and its structure maps almost 1 to 1 to the docker-compose setup, so it is worth learning. ECS can run on explicit EC2 instances that you manage, or on Fargate, which means that the EC2 instances running the containers are transparently managed by AWS itself.

Terraform is a good tool to create infrastructure. It has many limitations, mostly coming from its custom HCL language, but it's slowly becoming better (version 0.13 will finally allow us to run for loops on modules, for example). Despite its shortcomings, it's a great tool to create static infrastructure, so I recommend working on it.

Terraform is not the right tool to deploy your code, though, as that requires dynamic interaction with the system, so you need to set up a good Continuous Integration system. Jenkins is a very well-known open-source CI, but I personally ended up dropping it because it doesn't seem to be designed for large-scale systems. For example, it is very complicated to automate the deployment of a Jenkins server, and dynamic large-scale systems should require zero manual intervention to be created. Anyway, Jenkins is a good tool to start with, but you might want to have a look at other products like CircleCI or Buildkite.

When you create your deployment pipeline you need to do much more than just build the image and run it, at least for real applications. You need to decide when to apply database migrations, and if you have a web front-end you will also need to compile and install the JavaScript assets. Since you don't want downtime when you deploy, you will need to look into blue/green deployments and, in general, into strategies that allow you to run different versions of the application at the same time, at least for short periods. Or for longer periods, if you want to perform A/B testing or zonal deployments.

Final words

This is the last post of this short series. I hope you learned something useful, and that it encouraged you to properly set up your projects and to investigate technologies like Docker. As always, feel free to send me feedback or questions, and if you find my posts useful please share them with whoever you think might be interested.

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

July 07, 2020 12:00 PM UTC


PSF GSoC students blogs

Weekly Blog Post | Gsoc'2020 | #6

Greetings, people of the world!

Our first evaluation took place last week and guess what? I passed it!!! Well, I am pretty excited and happy about it.

 

1. What did you do this week?

I built the modal for colour customization for a single icon and did little edits here and there. Now it works perfectly. 

 

2. What is coming up next?

This week, I will be implementing the logic for icon rotation in the icons picker API and also adding buttons for rotation to the icon editing modal.

 

3. Did you get stuck anywhere?

Yes, a little, when a few icons were not getting customized. It turned out to be because their SVG files contain a style tag. This issue was resolved with the help of my mentors.

July 07, 2020 07:38 AM UTC

Weekly Check-in #6

Hello,

What did you do this week?

I was looking for an alternative way to convert RST to MD without using Pandoc. Well, there is currently no solution without Pandoc, so I have to display the raw RST without conversion. I have added test coverage for the code and docs for the notebook download feature.

What is coming up next?

The tests need some improvement. My mentor gave me feedback on my tests and on how important they are. I'm working towards merging the current PR.

Did you get stuck anywhere?

Finding the solution for converting RST to MD was quite challenging.

July 07, 2020 03:03 AM UTC


Matt Layman

Django Testing Toolbox

What are the tools that I use to test a Django app? Let’s find out! You might say I’m test obsessed. I like having very high automated test coverage. This is especially true when I’m working on solo applications. I want the best test safety net that I can have to protect me from myself. We’re going to explore the testing packages that I commonly use on Django projects. We’ll also look at a few of the important techniques that I apply to make my testing experience great.

July 07, 2020 12:00 AM UTC

July 06, 2020


Codementor

Understanding Virtual Environments in Python

Introduction to the concept of virtual environments in Python. Useful for a developer working on multiple projects on a single server.

July 06, 2020 08:37 PM UTC


PSF GSoC students blogs

Blog post for week 5: Polishing

Last week was another week of code and documentation polishing. Originally I planned to implement duplicate filtering with external data sources; however, I already did that in week 2 when I evaluated the possibility of disk-less external queues (see pull request #2).

One of the easier changes was to replace the hostname/port/database settings triple with a single Redis URL. redis-py supports initializing a client instance from a URL, e.g. redis://[[username]:[password]]@localhost:6379/0. The biggest advantage of the URL is its flexibility: it allows the user to optionally specify username and password, hostname, port, database name and even certain settings. The Redis URL scheme is also specified at https://www.iana.org/assignments/uri-schemes/prov/redis.

While working on the URL refactoring, we also noticed a subtle bug: A spider can have its own settings and hence it's possible for different spiders to use different Redis instances. My implementation was reusing an existing Redis connection but didn't account for spiders having different settings. The fix was easy: The client object is now cached in a dict and indexed by the Redis URL. This way, the object is only reused if the URL matches.
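
A minimal sketch of the idea (not the actual patch): redis-py can build a client directly from the URL, and a dict keyed by that URL ensures a connection is only shared between spiders that use the same settings.

import redis

_clients = {}  # cache of Redis clients, indexed by their URL


def get_redis_client(url):
    # Reuse an existing client only if the URL matches exactly
    if url not in _clients:
        _clients[url] = redis.from_url(url)
    return _clients[url]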

Another thing that kept me busy last week was detecting and handling various errors. If a user configures an unreachable hostname or wrong credentials, Scrapy should fail early and not in the middle of the crawl. The difficulty was that, depending on whether a new crawl is started or a previous crawl is resumed, queues are created lazily (new crawl) or eagerly (resumed crawl). To unify the behavior, I introduced a queue self-check which not only works for Redis but for all queues (i.e. also plain old disk-based queues). The idea is that upon initialization of a priority queue, it pushes a fake Request object, pops it again from the queue and compares the fingerprints. If the fingerprints match and no exception was raised at this point, the self-check succeeded.

The code for this self-check looks as follows:

def selfcheck(self):
    # Find an empty/unused queue.
    while True:
        # Use random priority to not interfere with existing queues.
        priority = random.randrange(2**64)
        q = self.qfactory(priority)
        if not q:  # Queue is empty
            break
        q.close()

    self.queues[priority] = q
    self.curprio = priority
    req1 = Request('http://hostname.invalid', priority=priority)
    self.push(req1)
    req2 = self.pop()
    if request_fingerprint(req1) != request_fingerprint(req2):
        raise ValueError(
            "Pushed request %s with priority %d but popped different request %s."
            % (req1, priority, req2)
        )

    if q:
        raise ValueError(
            "Queue with priority %d should be empty after selfcheck!" % priority
        )


The code may seem a bit complicated because of the random call, which needs some additional explanation. The difficulty for the self-check is that a queue is basically identified by its priority, and if a queue with a given priority already exists it will be reused. This means that if we used a static priority we could pick up an existing queue and push to and pop from it. This is actually not a problem for LIFO queues, where the last element pushed is also the first one popped. But for FIFO queues that are not empty it means that an arbitrary (older) element is popped, and not the request that we pushed. The solution to this problem is to generate a random priority, get a queue for that priority and only use it if it is empty. Otherwise, generate a new priority randomly. Due to the large range for the priority (0..2**64-1) it is extremely unlikely that a queue with that priority already exists, but even if it does, the loop makes sure that another priority is generated.

For this week, I will do another feedback iteration with my mentors and prepare for the topic of next week: Implementing distributed crawling using common message queues.

July 06, 2020 07:56 PM UTC

I'm Not Drowning On My Own

Cold Water

Hello there! My schoolyear is coming to an end, with some final assignments and group projects left to be done. I for sure underestimated the workload of these and in the last (and probably next) few days I'm drowning in work trying to meet my deadlines.

One project that might be remotely relevant is cheese-shop, which tries to manage the metadata of packages from the real Cheese Shop. Other than that, schoolwork is draining a lot of my time and I can't remember the last time I came up with something new for my GSoC Project )-;

Warm Water

On the bright side, I received a lot of help and encouragement from contributors and stakeholders of pip. In the last week alone, I had five pull requests merged:

In addition to helping me get my PRs merged, my mentor Pradyun Gedam also gave me my first official feedback, including what I'm doing right (and wrong too!) and what I should keep doing to increase the chance of the project being successful.

GH-7819's roadmap (Danny McClanahan's discoveries and work on lazy wheels) is being closely tracked by hatch's maintainer Ofek Lev, which really makes me proud and warms my heart: what I'm helping build is actually needed by the community!

Learning How To Swim

With GH-8467 and GH-8530 merged, I'm now working on GH-8532 which aims to roll out the lazy wheel as the way to obtain dependency information via the CLI flag --use-feature=lazy-wheel.

GH-8532 was failing initially, despite being relatively trivial and based on a commit that was passing. Mysteriously, after rebasing it on top of GH-8530, it suddenly became green. After the first (early) review, I was able to iterate on my earlier code, which used the ambiguous exception RuntimeError.

What remains to be done is just adding some functional tests (I'm pretty sure this will be either overwhelming or underwhelming) to make sure that the command-line flag works correctly. Hopefully this can make it into the beta of the upcoming release this month.

In other news, I've also submitted a patch improving the tests for the parallelization utilities, which were really messy as I originally wrote them. Better late than never!

Metaphors aside, I actually can't swim d-:

Dive Plan

After GH-8532, I think I'll try to parallelize downloads of wheels that are lazily fetched only for metadata. By the current implementation of the new resolver, for pip install, this can be injected directly between the resolution and build/installation process.

July 06, 2020 07:09 PM UTC


RMOTR

Can Anybody Become a Data Scientist?

Photo by Free To Use Sounds on Unsplash

Hi. My name is Lorelei, I’m a writer, and I know absolutely nothing about coding.

Well, that’s not entirely true. I’ve been writing for tech companies for a few years, so naturally one picks up on a thing or two. And my husband is a talented engineer whom I’d love to refer to as the Shakespeare of Coding (but he would be embarrassed if I did). Regardless, if I was told to write a line of code or perish, or even to describe the differences between, say, Python and Java, I would willingly accept death.

That ends today.

Turns out, there’s a lot more money to be had in the Data Science field and this writerly, hobo existence is tiring. Instead of writing about RMOTR’s courses, I’m going to take them.

…and then write about taking them. (Writing is truly an affliction).

Expectations

To be honest, I genuinely don’t know.

I expect to be challenged, frustrated, confused, and often times afraid.

I also expect to learn something valuable, be it actual coding or a genuine understanding of what my peers do, allowing me to better support them.

The most exciting prospect of navigating this field is discovering a new way to create and identifying the real possibilities that come with programming.

I’m very glad there will be plenty of projects and interactive tasks to complete. Listening is easy. Applying is hard. But, that’s where the real learning takes place. The more I can get my hands dirty practicing these new skills, the better.

That should also lead to lots of opportunities to celebrate, both victories and lessons learned. I’ve got my sticker chart of accomplishments ready to go!

I do anticipate crying at least twice. (More than twice). Tears of joy and enlightenment? I sure hope so.

The Course

Introduction to Programming with Python is my first stop on this journey. RMOTR co-founder Santiago Basulto leads this course and, boy, does he cover a lot.

It’s fine. I’ll be fine.

I gotta say, ‘booleans’ is a very intriguing word and I am looking forward to being able to accurately work it into sentences, willy nilly.

I’m also glad I can feel confident that my first meeting with Python is guaranteed to be thorough. Moving beyond this course, I should already have a solid grasp on the most important concepts, making each subsequent topic easier to understand. It should also leave room for some serious creativity. That’s my favorite.

So It Begins

Now I sally forth into the unknown, aiming to return a stronger, savvier human with skills that are actually marketable.

I’m starting off with the first two videos (seems logical) entitled Python Overview and Python Versions. I’ll share my thoughts and insights next week.

If you have any words of wisdom and/or support to share, they would be deeply appreciated. Words of Affirmation is my love language.

Even better, if you’d like to take the course with me, hop in. We have strength in numbers. (That’s a Data Science pun…?)

This is going to be great!


Can Anybody Become a Data Scientist? was originally published in rmotr.com on Medium, where people are continuing the conversation by highlighting and responding to this story.

July 06, 2020 06:41 PM UTC


PSF GSoC students blogs

Phase 2 - Weekly Check-in 6

End of Week 5  - 06/07/2020

The first phase of GSoC ended and I'm very happy that I passed my first evaluation! At the beginning I was scared and unsure about a lot of things but I was able to make it through thanks to the very supportive mentors and fellow students who helped me! I always tried to give my best in any kind of task and I will continue giving my best in the upcoming weeks!


What did you do this week?

This week was a little slow, but I did a lot of research on how we can get better accuracy with traditional computer vision techniques and which processing operations are important to achieve this.

What is coming up next?

The next task is to discuss with my mentors how the project will move forward in this phase! I will be adding image processing operations if necessary and will discuss the possibility of adding deep learning or OpenCV-based models to get the best results on different computer vision tasks.

Did you get stuck anywhere?

As I was busy making my roadmap for the second phase, I didn't get stuck anywhere major, except for some errors related to OpenCV functions, but I eventually fixed them or went into the depths of the internet to find the solution. :P


Thank you for reading!

July 06, 2020 06:29 PM UTC

Weekly Blog Post #3

So this week I had my first evaluation. It went great; I also received my stipend today, so hurray. Extracting golang metadata with shell is frustrating: not every Go module has its license or copyright text in a set format, so writing a generic script is quite challenging. I researched how Go licensing works; it turns out there is a dedicated license parser as well as scripts that can request the license from GitHub, but we can't do that. This week was mostly debugging and restarting again. Now I am trying to work on the go.sum file. Extracting names and versions is easy. All the Go modules live in ~/go/pkg/mod, followed by the repo name, but we can't simply cd into a module directory using its name, because some modules with upper-case letters in their name have a different directory name (more on this below). My exams are also nearing, so I have to study for that too. :P
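
As far as I understand it, the module cache escapes upper-case letters in module paths by replacing them with ! followed by the lower-case letter, which is why the directory name doesn't always match the module name. A rough Python sketch of that rule, for illustration only:

def escape_module_path(path):
    # Mimic the Go module cache escaping: 'Azure' becomes '!azure'
    return "".join("!" + char.lower() if char.isupper() else char for char in path)


print(escape_module_path("github.com/Azure/go-autorest"))
# prints: github.com/!azure/go-autorest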

July 06, 2020 06:09 PM UTC


Podcast.__init__

Pure Python Configuration Management With PyInfra

Summary

Building and managing servers is a challenging task. Configuration management tools provide a framework for handling the various tasks involved, but many of them require learning a specific syntax and toolchain. PyInfra is a configuration management framework that embraces the familiarity of Pure Python, allowing you to build your own integrations easily and package it all up using the same tools that you rely on for your applications. In this episode Nick Barrett explains why he built it, how it is implemented, and the ways that you can start using it today. He also shares his vision for the future of the project and you can get involved. If you are tired of writing mountains of YAML to set up your servers then give PyInfra a try today.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • This portion of Podcast.__init__ is brought to you by Datadog. Do you have an app in production that is slower than you like? Is its performance all over the place (sometimes fast, sometimes slow)? Do you know why? With Datadog, you will. You can troubleshoot your app’s performance with Datadog’s end-to-end tracing and in one click correlate those Python traces with related logs and metrics. Use their detailed flame graphs to identify bottlenecks and latency in that app of yours. Start tracking the performance of your apps with a free trial at datadog.com/pythonpodcast. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
  • Your host as usual is Tobias Macey and today I’m interviewing Nick Barrett about PyInfra, a pure Python framework for agentless configuration management

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what PyInfra is and its origin story?
  • There are a number of options for configuration management of various levels of complexity and language options. What are the features of PyInfra that might lead someone to choose it over other systems?
  • What do you see as the major pain points in dealing with infrastructure today?
  • For someone who is using PyInfra to manage their servers, what is the workflow for building and testing deployments?
  • How do you handle enforcement of idempotency in the operations being performed?
  • Can you describe how PyInfra is implemented?
    • How has its design or focus evolved since you first began working on it?
    • What are some of the initial assumptions that you had at the outset which have been challenged or updated as it has grown?
  • The library of available operations seems to have a good baseline for deploying and managing services. What is involved in extending or adding operations to PyInfra?
  • With the focus of the project being on its use of pure Python and the easy integration of external libraries, how do you handle execution of python functions on remote hosts that requires external dependencies?
  • What are some of the other options for interfacing with or extending PyInfra?
  • What are some of the edge cases or points of confusion that users of PyInfra should be aware of?
  • What has been the community response from developers who first encounter and trial PyInfra?
  • What have you found to be the most interesting, unexpected, or challenging aspects of building and maintaining PyInfra?
  • When is PyInfra the wrong choice for managing infrastructure?
  • What do you have planned for the future of the project?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

July 06, 2020 05:56 PM UTC


PSF GSoC students blogs

GSoC Weekly Blog #3

This week, I worked on completing the delete and edit message functionality and writing tests for them. I was able to complete these tasks and later worked on refactoring some parts of the new code. I also made some small tweaks and fixes to the existing code base of mscolab. I was planning on redesigning the version window UI, but I realised the redesign would take a lot of work in the backend and changes to the database as well. I will need to talk to my mentors about how to proceed with this.

This week, I have a list of small tweaks that I need to make to the work I have finished so far. I will be working on those. I will also be talking with my mentors to confirm some requirements for the next component of my project, which is the offline editing feature.

This week writing tests took me the most time. With very limited documentation it's kind of tough to write tests for PyQt5 but I was able to write them in the end. 

Currently my PR is awaiting approval from the mentors and will be merged soon.

July 06, 2020 05:23 PM UTC


Real Python

Object-Oriented Programming (OOP) in Python 3

Object-oriented programming (OOP) is a method of structuring a program by bundling related properties and behaviors into individual objects. In this tutorial, you’ll learn the basics of object-oriented programming in Python.

Conceptually, objects are like the components of a system. Think of a program as a factory assembly line of sorts. At each step of the assembly line a system component processes some material, ultimately transforming raw material into a finished product.

An object contains data, like the raw or preprocessed materials at each step on an assembly line, and behavior, like the action each assembly line component performs.

In this tutorial, you’ll learn how to:

  • Create a class, which is like a blueprint for creating an object
  • Use classes to create new objects
  • Model systems with class inheritance

Note: This tutorial is adapted from the chapter “Object-Oriented Programming (OOP)” in Python Basics: A Practical Introduction to Python 3.

The book uses Python’s built-in IDLE editor to create and edit Python files and interact with the Python shell, so you will see occasional references to IDLE throughout this tutorial. However, you should have no problems running the example code from the editor and environment of your choice.

Free Bonus: Click here to get access to a free Python OOP Cheat Sheet that points you to the best tutorials, videos, and books to learn more about Object-Oriented Programming with Python.

What Is Object-Oriented Programming in Python?

Object-oriented programming is a programming paradigm that provides a means of structuring programs so that properties and behaviors are bundled into individual objects.

For instance, an object could represent a person with properties like a name, age, and address and behaviors such as walking, talking, breathing, and running. Or it could represent an email with properties like a recipient list, subject, and body and behaviors like adding attachments and sending.

Put another way, object-oriented programming is an approach for modeling concrete, real-world things, like cars, as well as relations between things, like companies and employees, students and teachers, and so on. OOP models real-world entities as software objects that have some data associated with them and can perform certain functions.

Another common programming paradigm is procedural programming, which structures a program like a recipe in that it provides a set of steps, in the form of functions and code blocks, that flow sequentially in order to complete a task.

The key takeaway is that objects are at the center of object-oriented programming in Python, not only representing the data, as in procedural programming, but in the overall structure of the program as well.

Define a Class in Python

Primitive data structures—like numbers, strings, and lists—are designed to represent simple pieces of information, such as the cost of an apple, the name of a poem, or your favorite colors, respectively. What if you want to represent something more complex?

For example, let’s say you want to track employees in an organization. You need to store some basic information about each employee, such as their name, age, position, and the year they started working.

One way to do this is to represent each employee as a list:

kirk = ["James Kirk", 34, "Captain", 2265]
spock = ["Spock", 35, "Science Officer", 2254]
mccoy = ["Leonard McCoy", "Chief Medical Officer", 2266]

There are a number of issues with this approach.

First, it can make larger code files more difficult to manage. If you reference kirk[0] several lines away from where the kirk list is declared, will you remember that the element with index 0 is the employee’s name?

Second, it can introduce errors if not every employee has the same number of elements in the list. In the mccoy list above, the age is missing, so mccoy[1] will return "Chief Medical Officer" instead of Dr. McCoy’s age.

A great way to make this type of code more manageable and more maintainable is to use classes.
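
To make the contrast concrete, here is a minimal sketch (not taken from the tutorial itself) of the same employee records expressed as a class:

class Employee:
    def __init__(self, name, age, position, start_year):
        self.name = name
        self.age = age
        self.position = position
        self.start_year = start_year


kirk = Employee("James Kirk", 34, "Captain", 2265)
print(kirk.name)  # clearer than remembering that kirk[0] is the name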

Classes vs Instances

Classes are used to create user-defined data structures. Classes define functions called methods, which identify the behaviors and actions that an object created from the class can perform with its data.

In this tutorial, you’ll create a Dog class that stores some information about the characteristics and behaviors that an individual dog can have.

Read the full article at https://realpython.com/python3-object-oriented-programming/ »



July 06, 2020 04:21 PM UTC