
Planet Python

Last update: October 20, 2021 09:40 PM UTC

October 20, 2021


Ben Cook

Poetry for Package Management in Machine Learning Projects

When you’re building a production machine learning system, reproducibility is a proxy for the effectiveness of your development process. But without locking all your Python dependencies, your builds are not actually repeatable. If you work in a Python project without locking long enough, you will eventually get a broken build because of a transitive dependency (that is, a dependency of a dependency).

But a broken build isn’t the most dangerous problem. A transitive dependency can change in a way that affects an algorithm’s results without you ever knowing it. Imagine being unable to reproduce an algorithm result because an ancestor dependency that never got pinned changes in an unexpected way. Not good.

Enter Poetry. Poetry is an environment management tool for Python projects. Admittedly, there are a lot of environment management tools for Python and there’s no guarantee that Poetry will win. But the Python development community does seem to be slowly standardizing around Poetry.

What is Poetry?

Virtual environment tools like virtualenv and conda have been an important part of modern Python development for a while. But there are a few problems:

Ideally, you want all your builds to be precisely repeatable and you want a straightforward solution that you can use across all your projects. This is where Poetry shines.

Think of Poetry like npm for Python. It's centered around pyproject.toml, a single config file that defines your entire Python environment, and it comes with a lock file, poetry.lock, which pins all dependencies (including transitive dependencies).

Poetry configuration

Poetry extends the pyproject.toml file (introduced in PEP 518) to specify build requirements, project dependencies and development dependencies. It also includes project-level configuration (things like the name, description and license), scripts, extra dependencies and more.

Importantly, the Poetry spec for pyproject.toml works for both Python packages and Python applications, so you don't need a different approach for each kind of project. This matters for machine learning development because reusable utilities like config and basic calculations like bounding box math often fit better in Python packages than in your actual machine learning application.

Lock file

The poetry.lock file is similar in concept to running pip freeze > requirements.txt every time you add or remove a package. It’s just that Poetry manages this for you when you add or remove a dependency.
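To make the pip freeze analogy concrete, here is a minimal sketch that lists every installed distribution pinned to its exact version, using the standard library's importlib.metadata. The function name freeze is hypothetical, and this only imitates the idea of a frozen requirements list. It is not how Poetry builds poetry.lock.

```python
from importlib.metadata import distributions


def freeze():
    """Return sorted 'name==version' pins for the current environment."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip distributions whose metadata is incomplete.
        if name and version:
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    for pin in freeze():
        print(pin)
```

The output depends entirely on what is installed in the interpreter that runs it.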

And Poetry comes with a good dependency resolver. Since some packages on PyPI don’t specify all dependencies in their metadata, it can make adding a new package a little slower in Poetry. But it does ensure that you won’t get broken builds down the road as dependencies evolve, which is important.

Quick Start

To install the latest Poetry into your Python environment, run:

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

After installation, the script will print instructions to your screen to add poetry to your path.

To initialize a new project, run:

poetry init

This will guide you interactively through the process of setting up a new Python project with Poetry. You will add information about the project, the version of Python to use and (optionally) dependencies and development dependencies.

Say you want to add a dependency, like NumPy, after your project has been initialized. You can do that with the poetry add command:

poetry add numpy@~1.20

This will install NumPy with version >=1.20.0,<1.21.0. Alternatively, you could run poetry add numpy@^1.20 to install version >=1.20.0,<2.0.0. When you run this command Poetry does a few things:

  1. Update pyproject.toml to specify the new dependency.
  2. Use the dependency resolver to find the set of package versions that best fit the configuration. In this case, just numpy 1.20.3, but this will also include other packages if they’re specified.
  3. Install all the packages found by the resolver. By default, these will be installed into a virtual environment managed by Poetry. But you can also tell Poetry to install dependencies directly to the system Python.
  4. Freeze all dependencies and save them to poetry.lock so the exact build can be repeated in the future.
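The tilde and caret constraints mentioned above can be sketched in a few lines of Python. This is an illustration only: the helper expand_constraint is hypothetical, it is not Poetry's parser, and it assumes a version with a non-zero major component (Poetry treats 0.x versions differently for the caret operator).

```python
def expand_constraint(spec):
    """Expand '~X.Y[.Z]' or '^X[.Y[.Z]]' into an explicit version range."""
    operator, version = spec[0], spec[1:]
    parts = [int(part) for part in version.split(".")]
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else 0
    patch = parts[2] if len(parts) > 2 else 0
    lower = f"{major}.{minor}.{patch}"

    if operator == "~" and len(parts) >= 2:
        # Tilde with a minor version: only patch-level updates are allowed.
        upper = f"{major}.{minor + 1}.0"
    elif operator == "^" and major > 0:
        # Caret with a non-zero major version: anything below the next major.
        upper = f"{major + 1}.0.0"
    else:
        raise ValueError(f"constraint not covered by this sketch: {spec!r}")
    return f">={lower},<{upper}"


print(expand_constraint("~1.20"))  # >=1.20.0,<1.21.0
print(expand_constraint("^1.20"))  # >=1.20.0,<2.0.0
```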

If you add the --dev flag to the poetry add command, Poetry treats the dependency as a development dependency. For example:

poetry add black --dev

This distinction is useful because for production builds, you can install all non-development dependencies by adding the --no-dev flag:

poetry install --no-dev

This prevents unnecessary bloat in your production environment where you don’t need tools like formatters and linters.

Finally, you can enter a shell with the virtual environment managed by Poetry activated:

poetry shell
python

Recommended Setup

As mentioned above, Poetry manages virtual environments for you and gives you a dedicated shell. But for production machine learning projects, I think it’s usually better to develop inside a Docker container.

In this case, I recommend turning off Poetry’s virtual environment management. You can do that with a config command:

poetry config virtualenvs.create false

Or you can use an environment variable, for example:

POETRY_VIRTUALENVS_CREATE=false poetry install
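With virtualenv creation turned off, it can be useful to confirm from inside the container that packages really are going to the system interpreter. A minimal sketch using only the sys module (the helper name in_virtualenv is hypothetical): in environments created by venv or a recent virtualenv, sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment" if in_virtualenv() else "system interpreter")
```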

And now, you should know everything you need to get started using Poetry to manage the Python environment in your machine learning projects! Of course, there’s a ton more to learn about Poetry. Fortunately, their documentation is pretty good.

Happy ML engineering!

The post Poetry for Package Management in Machine Learning Projects appeared first on Sparrow Computing.

October 20, 2021 09:33 PM UTC


Real Python

Using the len() Function in Python

In many situations, you’ll need to find the number of items stored in a data structure. Python’s built-in function len() is the tool that will help you with this task.

There are some cases in which the use of len() is straightforward. However, there are other times when you’ll need to understand how this function works in more detail and how to apply it to different data types.

In this tutorial, you’ll learn how to:

  • Find the length of built-in data types using len()
  • Use len() with third-party data types
  • Provide support for len() with user-defined classes

By the end of this article, you’ll know when to use the len() Python function and how to use it effectively. You’ll know which built-in data types are valid arguments for len() and which ones you can’t use. You’ll also understand how to use len() with third-party types, such as ndarray in NumPy and DataFrame in pandas, and with your own classes.


Getting Started With Python’s len()

The function len() is one of Python’s built-in functions. It returns the length of an object. For example, it can return the number of items in a list. You can use the function with many different data types. However, not all data types are valid arguments for len().

You can start by looking at the help for this function:

>>> help(len)
Help on built-in function len in module builtins:
len(obj, /)
    Return the number of items in a container.

The function takes an object as an argument and returns the length of that object. The documentation for len() goes a bit further:

Return the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set). (Source)

When you use built-in data types and many third-party types with len(), the function doesn’t need to iterate through the data structure. The length of a container object is stored as an attribute of the object. The value of this attribute is modified each time items are added to or removed from the data structure, and len() returns the value of the length attribute. This ensures that len() works efficiently.
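The same idea can be imitated in a user-defined class. The sketch below is only an analogy for the behavior described above (the built-in types do their bookkeeping in C), and the class name Inventory and its methods are hypothetical. It keeps a running count that is updated on every change, so __len__() can return it without walking the data.

```python
class Inventory:
    """A toy container that tracks its own size."""

    def __init__(self):
        self._counts = {}
        self._size = 0  # updated whenever an item is added or removed

    def add(self, name):
        self._counts[name] = self._counts.get(name, 0) + 1
        self._size += 1

    def remove(self, name):
        if self._counts.get(name, 0) == 0:
            raise KeyError(name)
        self._counts[name] -= 1
        self._size -= 1

    def __len__(self):
        # Constant time: return the stored count instead of iterating.
        return self._size


stock = Inventory()
stock.add("apple")
stock.add("apple")
stock.add("pear")
stock.remove("apple")
print(len(stock))  # 2
```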

In the following sections, you’ll learn about how to use len() with sequences and collections. You’ll also learn about some data types that you cannot use as arguments for the len() Python function.

Using len() With Built-in Sequences

A sequence is a container with ordered items. Lists, tuples, and strings are three of the basic built-in sequences in Python. You can find the length of a sequence by calling len():

>>> greeting = "Good Day!"
>>> len(greeting)
9

>>> office_days = ["Tuesday", "Thursday", "Friday"]
>>> len(office_days)
3

>>> london_coordinates = (51.50722, -0.1275)
>>> len(london_coordinates)
2

When finding the length of the string greeting, the list office_days, and the tuple london_coordinates, you use len() in the same manner. All three data types are valid arguments for len().

The function len() always returns an integer as it’s counting the number of items in the object that you pass to it. The function returns 0 if the argument is an empty sequence:

>>> len("")
0
>>> len([])
0
>>> len(())
0

In the examples above, you find the length of an empty string, an empty list, and an empty tuple. The function returns 0 in each case.

A range object is also a sequence that you can create using range(). A range object doesn’t store all the values but generates them when they’re needed. However, you can still find the length of a range object using len():

>>> len(range(1, 20, 2))
10

This range of numbers includes the integers from 1 to 19 with increments of 2. The length of a range object can be determined from the start, stop, and step values.
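As a rough illustration of how the length follows from start, stop, and step alone, here is a small sketch. The helper name range_length is hypothetical and the code is not CPython's implementation. It uses ceiling division on integers and compares the result with len() on real range objects.

```python
def range_length(start, stop, step):
    """Compute len(range(start, stop, step)) without creating the values."""
    if step == 0:
        raise ValueError("step must not be zero")
    # Ceiling division of (stop - start) by step, clamped at zero.
    return max(0, -(-(stop - start) // step))


print(range_length(1, 20, 2))   # 10
print(range_length(10, 0, -3))  # 4
print(range_length(5, 1, 1))    # 0
```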

In this section, you’ve used the len() Python function with strings, lists, tuples, and range objects. However, you can also use the function with any other built-in sequence.
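For instance, bytes and bytearray are built-in sequences too, so len() works on them in the same way. The short sketch below also shows what happens with an argument that has no length, such as an integer. The variable names are chosen for illustration, and the exact wording of the error message may differ between Python versions.

```python
raw = b"Good Day!"
buffer = bytearray(4)  # four zero bytes

print(len(raw))     # 9
print(len(buffer))  # 4

try:
    len(5)
except TypeError as error:
    # Integers are not containers, so len() raises TypeError.
    print("TypeError:", error)
```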

Read the full article at https://realpython.com/len-python-function/ »



October 20, 2021 02:00 PM UTC


Python for Beginners

Iterator in Python

You have probably used data structures like the Python dictionary, list, tuple, and set while programming. We often need to access the elements of these data structures sequentially, and we generally do so with for loops and element indices. In this article, we will see how to access the elements of a list or a tuple without using a for loop or the indices of the elements. So, let’s dive into the concept of iterators and iterables in Python.

What is an iterable in Python? 

A container object like a list or set can contain a lot of elements. If we can access the member elements of the container objects one at a time then the container object is called an iterable. In our programs we use different iterables like list, tuple, set, or dictionary. 
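One way to check whether an object is iterable is simply to ask Python for an iterator and see whether that fails. The helper below is a minimal sketch with a hypothetical name, is_iterable. It relies on the built-in iter() function, which raises TypeError for objects that cannot be iterated.

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds, False otherwise."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable([1, 2, 3]))   # True
print(is_iterable({"a": 1}))    # True
print(is_iterable("text"))      # True
print(is_iterable(42))          # False
```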

Elements of any iterable can be accessed one at a time using a for loop as follows.

myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print("Elements of the list are:")
for element in myList:
    print(element)

Output:

Elements of the list are:
1
2
3
4
5
6
7
8
9
10

In a for loop, we access all the elements from start to end. But consider a situation where we have to access the next element of the iterable only when some specific event occurs.

For example, let us take a scenario in which we have a list of 100 elements. We continually ask the user to input a number, and whenever they enter an even number, we print the next element of the list.

This is awkward to do with a for loop alone, because a for loop advances on every pass and we cannot predict when the user will enter an even number. Iterators are a handy tool for accessing the elements of iterables in such situations. So, let’s learn what an iterator is and how we can create one in Python to access the elements of an iterable.

What is an iterator in Python?

An iterator is an object that hands out the elements of an iterable one at a time and remembers its position between calls. In other words, we can access all the elements in an iterable object using an iterator.

In Python, we can create an iterator for any iterable object using the built-in iter() function. The iter() function takes an iterable object as input and returns an iterator for that object.

myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
myIter = iter(myList)
print("list is:", myList)
print("Iterator for the list is:", myIter)

Output:

list is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Iterator for the list is: <list_iterator object at 0x7f4c21734070>

In the output, you can see that a list_iterator object has been created by passing a list to the iter() function.
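Iterators are not limited to built-in containers. As a minimal sketch (the class name Countdown is hypothetical), any class that defines __iter__() and __next__() follows the iterator protocol, where __next__() returns the next value and raises StopIteration when nothing is left.

```python
class Countdown:
    """An iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more values.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for number in Countdown(3):
    print(number)
# prints 3, 2, 1 on separate lines
```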

How to traverse an iterator in Python?

The simplest way to traverse an iterator is by using a for loop. We can access each element in the iterator using a for loop as follows.

myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
myIter = iter(myList)
print("list is:", myList)
print("Elements in the iterator are:")
for element in myIter:
    print(element)

Output:

list is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Elements in the iterator are:
1
2
3
4
5
6
7
8
9
10

As discussed earlier, a for loop is a poor fit when we want to advance the iterator only on demand rather than on every pass. For this, there are two options.

The first way to access an element of the iterator is by using the __next__() method. The __next__() method, when invoked on the iterator, returns an element next to the previously traversed element. It always retains the information about the element which was returned last time and whenever it is invoked, it returns only the next element which has not been traversed yet. We can understand this using the following example.

myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
myIter = iter(myList)
print("list is:", myList)
print("Elements in the iterator are:")
try:
    print(myIter.__next__())
    print(myIter.__next__())
    print(myIter.__next__())
    print(myIter.__next__())
    print(myIter.__next__())
    print(myIter.__next__())
    print(myIter.__next__())
    print(myIter.__next__())
    print(myIter.__next__())
    print(myIter.__next__())
    print(myIter.__next__())
    print(myIter.__next__())
except StopIteration as e:
    print("All elements in the iterator already traversed. Raised exception", e)

Output:

list is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Elements in the iterator are:
1
2
3
4
5
6
7
8
9
10
All elements in the iterator already traversed. Raised exception 

Another way to access the elements of the iterator is by using the next() function. The next() function takes the iterator as input and returns the next element that has not been traversed yet, just like the __next__() method. 

myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
myIter = iter(myList)
print("list is:", myList)
print("Elements in the iterator are:")
try:
    print(next(myIter))
    print(next(myIter))
    print(next(myIter))
    print(next(myIter))
    print(next(myIter))
    print(next(myIter))
    print(next(myIter))
    print(next(myIter))
    print(next(myIter))
    print(next(myIter))
    print(next(myIter))
except StopIteration as e:
    print("All elements in the iterator already traversed. Raised exception", e)

Output:

list is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Elements in the iterator are:
1
2
3
4
5
6
7
8
9
10
All elements in the iterator already traversed. Raised exception 

You can observe from the above two examples that the __next__() method and the next() function behave almost identically. Both raise a StopIteration exception once all the elements of the iterator have been traversed, so it is advisable to wrap these calls in a Python try-except block.
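As an alternative to exception handling, the built-in next() function accepts an optional second argument that is returned instead of raising StopIteration when the iterator is exhausted. The sketch below uses a unique sentinel object (the name sentinel is just an illustration) so that the end of iteration cannot be confused with a real element such as None.

```python
myList = [1, 2, 3]
myIter = iter(myList)
sentinel = object()  # a unique marker that cannot appear in the list

collected = []
while True:
    element = next(myIter, sentinel)
    if element is sentinel:
        # The iterator is exhausted; no exception was raised.
        break
    collected.append(element)

print(collected)  # [1, 2, 3]
```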

Example use case of an iterator

Now let us take a closer look at the situation discussed above where we have to access the elements from a list only when the user gives an even number as input. 

Now, we have the next() function that can access the element from an iterator whenever we call the function. We can use it to access the elements from the list. For this, we will create an iterator for the list and then we will access the elements from the list using the next() function whenever the user gives an even number as input. We can implement this in Python as follows.

myList = range(0, 101)
myIter = iter(myList)
while True:
    try:
        user_input = int(input("Input an even number to get an output, 0 to exit:"))
        if user_input == 0:
            print("Good Bye")
            break
        elif user_input % 2 == 0:
            print("Very good. Here is an output for you.")
            print(next(myIter))
        else:
            print("Input an even number.")
    except StopIteration:
        print("All the output has been exhausted.")
        # Leave the loop; otherwise the prompt would repeat with nothing left to print.
        break

Output:

Input an even number to get an output, 0 to exit:1
Input an even number.
Input an even number to get an output, 0 to exit:2
Very good. Here is an output for you.
0
Input an even number to get an output, 0 to exit:4
Very good. Here is an output for you.
1
Input an even number to get an output, 0 to exit:3
Input an even number.
Input an even number to get an output, 0 to exit:6
Very good. Here is an output for you.
2
Input an even number to get an output, 0 to exit:0
Good Bye

Conclusion

In this article, we have discussed how we can access elements from different iterable objects using an iterator in Python. To learn more about Python programming, you can read this article on list comprehension. You may also like this article on the linked list in Python.

The post Iterator in Python appeared first on PythonForBeginners.com.

October 20, 2021 01:41 PM UTC


PyCharm

PyCharm 2021.2.3 Is Out!

The third minor release of PyCharm 2021.2 contains multiple bug fixes:

Download PyCharm 2021.2.3

For the full list of issues addressed in PyCharm 2021.2.3, please see the release notes.
Found a bug? Please report it using our bug tracker.

October 20, 2021 01:36 PM UTC


Python Software Foundation

Announcing Python Software Foundation Fellow Members for Q3 2021! 🎉

The PSF is pleased to announce its third batch of PSF Fellows for 2021! Let us welcome the new PSF Fellows for Q3! The following people continue to do amazing things for the Python community:

Anthony Sottile

Twitch, YouTube, GitHub Sponsors, Twitter

Bernát Gábor

Twitter, Website, GitHub

Cristián Danilo Maureira-Fredes

Website, Twitter, GitHub

Michael Iyanda

LinkedIn, GitHub

Nicolás Demarchi

Twitter, GitHub, LinkedIn


Thank you for your continued contributions. We have added you to our Fellow roster online.

The above members help support the Python ecosystem by being phenomenal leaders, sustaining the growth of the Python scientific community, maintaining virtual Python communities, maintaining Python libraries, creating educational material, organizing Python events and conferences, starting Python communities in local regions, and overall being great mentors in our community. Each of them continues to help make Python more accessible around the world. To learn more about the new Fellow members, check out their links above.

Let's continue recognizing Pythonistas all over the world for their impact on our community. The criteria for Fellow members are available online: https://www.python.org/psf/fellows/. If you would like to nominate someone to be a PSF Fellow, please send a description of their Python accomplishments and their email address to psf-fellow at python.org. We are accepting nominations for quarter 4 through November 20, 2021.

Are you a PSF Fellow and want to help the Work Group review nominations? Contact us at psf-fellow at python.org.

October 20, 2021 11:26 AM UTC


Python Anywhere

Our October system update

On 6 October we upgraded our EU-based systems to the latest version of our platform, and today, 20 October, we did the same upgrade on our US-based system. There are a bunch of changes to report!

October 20, 2021 12:00 AM UTC

October 19, 2021


Ben Cook

NumPy Where: Understanding np.where()

The NumPy where() function is like a vectorized switch that you can use to combine two arrays. For example, let’s say you have an array with some data called df.revenue and you want to create a new array with 1 whenever an element in df.revenue is more than one standard deviation from the mean and -1 for all other elements.

This is a perfect use case for np.where(). First, create a boolean array for your conditional, and then call np.where():

import numpy as np
import pandas as pd

df = pd.read_csv("https://jbencook.s3.amazonaws.com/data/dummy-sales.csv")
condition = np.abs(df.revenue - df.revenue.mean()) > df.revenue.std()

np.where(condition, 1, -1)

# Expected result
# array([ 1, -1, -1,  1, -1, -1, -1, -1,  1,  1, -1,  1, -1, -1,  1,  1, -1,
#        -1, -1,  1, -1, -1, -1, -1,  1,  1,  1,  1,  1,  1])

The arguments to np.where() are:

The elements of condition don’t actually need to have a boolean type as long as they can be coerced to a boolean (e.g. non-zero integers are interpreted as True). Also, both x and y are optional, but if you provide one, you need to provide both. Additionally, the input arrays can have any shape so you can use this as a multi-dimensional switch.
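Here is a small, self-contained sketch of the three-argument form with made-up arrays. It shows the element-wise selection between x and y, and a condition made of integers that are interpreted as booleans, with scalars for x and y.

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([10, 20, 30, 40])
y = np.array([-1, -2, -3, -4])

# Take the element from x where condition is True, otherwise from y.
result = np.where(condition, x, y)
print(result)  # [10 -2 30 -4]

# Non-zero integers count as True; scalars are broadcast to the condition's shape.
flags = np.where(np.array([3, 0, -2, 0]), 1, -1)
print(flags)  # [ 1 -1  1 -1]
```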

One thing to watch out for: the return value takes a different form if you don’t supply x and y. In that case, np.where() returns the indices of the true elements (for a 1-D vector) and the indices for all axes where the elements are true for higher dimensional cases. This is equivalent to np.argwhere() except that the index arrays are split by axis.

You can see how this works by calling np.stack() on the result of np.where():

x = np.eye(4)
np.stack(np.where(x), -1) == np.argwhere(x)

# Expected result
# array([[ True,  True],
#        [ True,  True],
#        [ True,  True],
#        [ True,  True]])

This makes np.where() without the x and y inputs equivalent to calling the .nonzero() method on the condition array:

np.stack(x.nonzero(), -1) == np.argwhere(x)

# Expected result
# array([[ True,  True],
#        [ True,  True],
#        [ True,  True],
#        [ True,  True]])

Multi-dimensional binary cross entropy

Now that we know how the API works, let’s look at another example: multi-dimensional binary cross entropy. Say we have a 3-D array of binary class probabilities yhat and a 3-D array of binary labels y. The one-liner formula for binary cross-entropy is the following:

-(y * np.log(yhat) + (1 - y) * np.log(1 - yhat)).mean()

This works in the multi-dimensional case because NumPy defaults to element-wise operations. Multiplying the log terms by y and 1 - y acts like a switch: when y == 1 the first term is included, and when y == 0 the second term is included:

np.random.seed(1)

yhat = np.random.uniform(size=(3, 3, 3))
y = np.random.randint(0, 2, size=(3, 3, 3))

-(y * np.log(yhat) + (1 - y) * np.log(1 - yhat)).mean()

# Expected result
# 1.221865004504288

But we can accomplish the same thing with np.where():

-np.where(y, np.log(yhat), np.log(1 - yhat)).mean()

# Expected result
# 1.221865004504288

Pretty cool! This is not necessarily a better implementation in any important way, but it does make the purpose of the y and 1 - y terms very clear.

The post NumPy Where: Understanding np.where() appeared first on Sparrow Computing.

October 19, 2021 09:49 PM UTC


PyCoder’s Weekly

Issue #495 (Oct. 19, 2021)



No-GIL Fork of CPython

This is a proof-of-concept implementation of CPython that supports multithreading without the global interpreter lock (GIL), from Facebook research. An overview of the design is described in the Python Multithreading without the GIL Google doc. Also see the related discussions on LWN and Hacker News.
GITHUB.COM/COLESBURY • Shared by Henry Schreiner

Why You Shouldn’t Invoke setup.py Directly

“The setuptools team no longer wants to be in the business of providing a command line interface and is actively working to become just a library for building packages. What you should do instead depends on your use case, but if you want some basic rules of thumb, there is a table in the summary section.”
PAUL GANSSLE

Data Elixir: Data Science Newsletter

Data Elixir is an email newsletter that keeps you on top of the tools and trends in Data Science. Covers machine learning, data visualization, analytics, and strategy. Curated weekly with top picks from around the web →
DATA ELIXIR sponsor

Where Does All the Effort Go? Looking at Python Core Developer Activity

“One of the tasks given me by the Python Software Foundation as part of the Developer in Residence job was to look at the state of CPython as an active software development project. What are people working on? Which standard libraries require most work? Who are the active experts behind which libraries?”
ŁUKASZ LANGA

Cool New Features in Python 3.10

In this course, you’ll explore some of the coolest and most useful features in Python 3.10. You’ll appreciate more user-friendly error messages, learn about how you can handle complicated data structures with structural pattern matching, and explore new enhancements to Python’s type system.
REAL PYTHON course

PyCascades 2022 CFP Closes on Sunday (Oct 24)

PYCASCADES CONFERENCE

Announcing PSF Fellow Members for Q3 2021

PYTHON SOFTWARE FOUNDATION

Join the Python Developers Survey 2021

PYTHON SOFTWARE FOUNDATION

Psycopg 3.0 Released

PSYCOPG.ORG

PyPy 7.3.6 Released

PYPY.ORG

Python Jobs

Senior Python Engineer @ Moody's AI & ML Center of Excellence (New York, NY, USA)

Moody's

Senior Software Engineer (Washington D.C., DC, USA)

Quorum

Senior Backend Software Engineer (Anywhere)

Clay

Full Stack Developer (Anywhere)

Level 12

Software Engineer (Anywhere)

1Point21 Interactive

More Python Jobs >>>

Articles & Tutorials

Tests Aren’t Enough: Case Study After Adding Type Hints to urllib3

“Since Python 3.5 was released in 2015 including PEP 484 and the typing module type hints have grown from a nice-to-have to an expectation for popular packages. To fulfill this expectation our team has committed to shipping type hints for the v2.0 milestone. What we didn’t realize is the amount of value we’d derive from this project in terms of code correctness.”
SETH MICHAEL LARSON

Welcoming the CPython Developer in Residence

Earlier this year, the Python Software Foundation announced the creation of the Developer in Residence role. The first Visionary Sponsors of the PSF have provided funding for this new role for one year. What development responsibilities does this job address? This week on the show, Łukasz Langa talks about becoming the first CPython Developer in Residence.
REAL PYTHON podcast

Accelerate Your Python Apps With Apache Cassandra™ NoSQL. Register for an Astra DB Demo

Scale data for your Django, Flask, FastAPI apps with our multi-cloud, serverless DBaaS–built on Apache Cassandra™. Painless APIs, free for developers. Get 80 Gigabytes of Storage Free Every Month. Explore Astra DB now →
DATASTAX sponsor

Python Assignment Expressions and Using the Walrus Operator

In this course, you’ll learn about assignment expressions and the walrus operator. The biggest change in Python 3.8 was the inclusion of the := operator, which you can use to assign variables in the middle of expressions. You’ll see several examples of how to take advantage of this new feature.
REAL PYTHON course

A Roadmap to XML Parsers in Python

In this tutorial, you’ll learn what XML parsers are available in Python and how to pick the right parsing model for your specific use case. You’ll explore Python’s built-in parsers as well as major third-party libraries.
REAL PYTHON

How APT Does Its Fancy Progress Bar

“Today while running an apt full-upgrade I asked myself how apt does this nice progress bar stuck at the bottom line while still writing scrolling text.” Python example code included.
JULIEN PALARD

Configuration Is an API, Not an SDK

Guidelines for config management in general and for Python apps in particular. Why “Configuration is just another API of your app” might be a good philosophy to adopt.
HERNAN LOZANO • Shared by Hernan Lozano

Python’s property(): Add Managed Attributes to Your Classes

In this step-by-step tutorial, you’ll learn how to create managed attributes, also known as properties, using Python’s property() in your custom classes.
REAL PYTHON

Receive a $5 Donation to the OSS of Your Choice When You Deploy Your Free Scout APM Trial Today

Scout is performance monitoring designed to provide the data insights necessary for any dev to become a performance pro. Find and fix observability issues before your customers notice by connecting your error reporting and APM data on one platform.
SCOUT APM sponsor

Pip vs Conda: A Comparison of Python’s Two Packaging Systems

“Python has two commonly used packaging systems, pip and Conda. Learn the differences between them so you can pick the right one for you.”
ITAMAR TURNER-TRAURING

Secure Password Handling in Python

Protect and secure your passwords and credentials in Python with help of these techniques and tips.
MARTIN HEINZ • Shared by Martin Heinz

Understanding np.where()

The NumPy where() function is like a vectorized switch that you can use to combine two arrays.
BEN COOK

Three Tools to Profile a Django App

KRACEKUMAR.COM

More Uses for functools.partial() in Django

ADAM JOHNSON

Projects & Code

kubernetes-client: Official Python Client Library for Kubernetes

GITHUB.COM/KUBERNETES-CLIENT

Lenia: Mathematical Life Forms Simulator

GITHUB.COM/CHAKAZUL

troposphere: Create AWS CloudFormation Descriptions

GITHUB.COM/CLOUDTOOLS

classyconf: Declarative and Extensible Library for Configuration & Code Separation

GITHUB.COM/HERNANTZ

Events

Weekly Real Python Office Hours Q&A (Virtual)

October 20, 2021
REALPYTHON.COM

Inland Empire Pyladies (CA, USA)

October 25, 2021
MEETUP.COM

Introduction to the Python Programming Language (In Persian)

October 26, 2021
INSTAGRAM.COM

Python Sheffield

October 26, 2021
GOOGLE.COM

Dominican Republic Python User Group

October 26 to October 27, 2021
PYTHON.DO

PyKla Monthly Meetup

October 27, 2021
MEETUP.COM

Python Meeting Düsseldorf

October 27, 2021
PYDDF.DE

Heidelberg Python Meetup

October 27, 2021
MEETUP.COM

PyData Global 2021

October 28 to October 31, 2021
PYDATA.ORG

deploy by DigitalOcean

November 16 to November 17, 2021
DIGITALOCEAN


Happy Pythoning!
This was PyCoder’s Weekly Issue #495.


October 19, 2021 07:30 PM UTC


Peter Bengtsson

How to string pad a string in Python with a variable

I just have to write this down because that's the rule; if I find myself googling something basic like this more than once, it's worth blogging about.

Suppose you have a string and you want to pad with empty spaces. You have 2 options:

>>> s = "peter"
>>> s.ljust(10)
'peter     '
>>> f"{s:<10}"
'peter     '

The f-string notation is often more convenient because it can be combined with other formatting directives.
But, suppose the number 10 isn't hardcoded like that. Suppose it's a variable:

>>> s = "peter"
>>> width = 11
>>> s.ljust(width)
'peter      '
>>> f"{s:<width}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format specifier

With f-string formatting, when the width is a variable like that, you need to nest it in its own pair of braces:

>>> f"{s:<{width}}"
'peter      '
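
As a small sketch of how far the nested-braces trick goes: the same syntax works for other alignments, for the fill character, and with str.format(). The fill variable and the extra alignments below are illustrative additions, not part of the example above.

```python
s = "peter"
width = 11
fill = "."  # illustrative fill character, not from the original example

# Left-, right-, and center-align using a nested {width} field.
left = f"{s:<{width}}"
right = f"{s:>{width}}"
center = f"{s:^{width}}"

# The fill character can come from a nested field as well.
dotted = f"{s:{fill}<{width}}"

# str.format() accepts the same nested replacement fields.
same = "{:<{w}}".format(s, w=width)

for value in (left, right, center, dotted, same):
    print(repr(value))
```

Each nested field is evaluated first, and the result is then used as the format specifier for the outer field.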

October 19, 2021 04:44 PM UTC


PyCon

PyCon US 2022 Website and Sponsorship Program Launch!

With PyCon US 2022 planning underway, we are excited to be launching the conference website along with our sponsorship program.

Our team is planning to host the event in person with an online component. Head over to the PyCon US 2022 website for details about the conference and more information about our sponsorship program. You will not want to miss the opportunity to be part of this event! If your organization depends on the Python ecosystem, check out our prospectus online, and sign up today!
 
The PSF's comprehensive sponsorship program allows organizations to support community programs financially and delivers a variety of benefits that provide visibility across PyCon US, pypi.org, python.org, and much more.

Sponsorships have a tremendous impact!

Most importantly, the Python Software Foundation operates off sponsorships it receives. The more funding the PSF has to work with, the more we can do to support Python and its community. In 2021 the PSF launched two new initiatives that will have a tremendous impact on all users:

  1. In July, the PSF hired the inaugural Developer-in-Residence (Łukasz Langa) to support CPython. A full-time, paid core developer supporting dozens of volunteers that make Python happen has been an enormous help. In addition to supporting volunteers, Łukasz is analyzing usage and workflows to better address sustainability and future funding.

  2. In August, the PSF hired Shamika Mohanan to support Python packaging. Shamika will handle outreach to Python users so the PSF can better understand the landscape, identify fundable initiatives, seek grants, oversee funded projects, and report on their progress and results to improve Python packaging for all users.

The above would not have been possible without the generous support of our sponsors over the years. Please help us ensure these initiatives have funding for years to come.

Sponsor the PSF today!

If you have any questions about sponsoring PyCon US and the PSF, please get in touch with us at sponsors@python.org

As we get closer to the event, the conference website is where you’ll find details for our call for proposals, registration process, venue information, and everything PyCon US related!

October 19, 2021 03:27 PM UTC


Quansight Labs Blog

An efficient method of calling C++ functions from numba using clang++/ctypes/rbc

The aim of this document is to explore a method of calling C++ library functions from Numba compiled functions --- Python functions that are decorated with numba.jit(nopython=True).

While there are ways to wrap C++ code for Python, calling these wrappers from Numba compiled functions is often not as straightforward and efficient as one would hope.

October 19, 2021 02:00 PM UTC


Real Python

Python Assignment Expressions and Using the Walrus Operator

Each new version of Python adds new features to the language. For Python 3.8, the biggest change is the addition of assignment expressions. Specifically, the := operator gives you a new syntax for assigning variables in the middle of expressions. This operator is colloquially known as the walrus operator.

This course is an in-depth introduction to the walrus operator. You’ll learn some of the motivations for the syntax update and explore some examples where assignment expressions can be useful.
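
For a rough idea of what assigning "in the middle of expressions" looks like, here is a minimal sketch. It assumes Python 3.8 or later, and the sample string and variable names are made up for illustration rather than taken from the course.

```python
import re

line = "October 19, 2021 02:00 PM UTC"  # illustrative sample text

# Without the walrus operator: assign first, then test.
match = re.search(r"\d{4}", line)
if match:
    year_old_style = match.group()

# With the walrus operator: assign and test in a single expression.
if (m := re.search(r"\d{4}", line)):
    year = m.group()

# It also helps in comprehensions, where a computed value is both
# filtered on and kept.
values = [1, 5, 12, 20]
big_squares = [sq for v in values if (sq := v * v) > 100]

print(year_old_style, year, big_squares)
```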

In this course, you’ll learn how to:


October 19, 2021 02:00 PM UTC


Python for Beginners

Create Generator from a List in Python

Generators in Python are a very useful tool for accessing elements from a container object. In this article, we will discuss how we can create a generator from a list and why we need to do this. We will use two approaches for creating the generator from a list: first a generator function, and then a generator comprehension.

Convert a list to generator using generator function 

Generator functions are functions that use yield statements instead of return statements. Calling a generator function returns a generator object, and the function body pauses its execution each time a yield statement is executed. To resume the execution, we just need to pass that generator object to the next() function.

For example, suppose that we want the squares of the elements of a given list. One way to obtain the squares of the elements may be the use of list comprehension or a function to create a new list having squares of elements of the existing list as follows.

def square(input_list):
    square_list = []
    for i in input_list:
        square_list.append(i ** 2)
    return square_list


myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print("The given list is:", myList)
squares = square(myList)
print("Elements obtained from the square function are:")
for ele in squares:
    print(ele)

Output:

The given list is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Elements obtained from the square function are:
1
4
9
16
25
36
49
64
81
100

We can also create a generator instead of a new list to obtain the squares of the elements of the existing list, using a generator function.

To create a generator from a list using a generator function, we will define a generator function that takes a list as input. Inside the function, we will use a for loop in which the yield statement gives the squares of the elements of the existing list as output. We can perform this operation as follows.

def square(input_list):
    # No intermediate list is needed: each square is yielded on demand.
    for element in input_list:
        yield element ** 2


myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print("The given list is:", myList)
squares = square(myList)
print("Elements obtained from the square generator are:")
for ele in squares:
    print(ele)

Output:

The given list is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Elements obtained from the square generator are:
1
4
9
16
25
36
49
64
81
100
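
To make the pausing behaviour described earlier more visible, the sketch below drives the same kind of generator by hand with next(). The print call inside the function and the variable names are illustrative additions.

```python
def square(input_list):
    for element in input_list:
        # This line only runs when the next value is requested.
        print("computing the square of", element)
        yield element ** 2


squares = square([1, 2, 3])  # nothing is computed yet
first = next(squares)        # runs the body up to the first yield
second = next(squares)       # resumes after the first yield
third = next(squares)

# The generator is now exhausted. Passing a default to next()
# avoids the StopIteration exception that would otherwise be raised.
exhausted = next(squares, None)
print(first, second, third, exhausted)
```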

Convert a list to generator using generator comprehension

Instead of using a generator function, we can use a generator comprehension to create a generator from a list. The syntax for a generator comprehension is almost identical to that of a list comprehension.

The syntax for a generator comprehension is: generator = (expression for element in iterable if condition)

You can create a generator from a list using a generator comprehension as follows. Here, we have implemented the same example given in the previous section.

myList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print("The given list is:", myList)
mygen = (element ** 2 for element in myList)
print("Elements obtained from the generator are:")
for ele in mygen:
    print(ele)

Output:

The given list is: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Elements obtained from the generator are:
1
4
9
16
25
36
49
64
81
100

Why create a generator from a list?

Generators can be used in place of a list for two reasons. Let us understand both of them one by one.

  1. When we create a new list from an existing list, the program uses memory to store every element of the new list. A generator, on the other hand, produces its elements one at a time and needs only a small, fixed amount of memory, comparable to what a function call requires. Hence, using a generator in place of a list can be more efficient if we only need to access the elements of the newly created list.
  2. With generators, we can fetch the next element at arbitrary moments without explicitly maintaining a counter. For this, we can use the next() function to extract the next element from the generator.

For example, consider a situation in which we have a list of 100 elements. We repeatedly ask the user to input a number, and whenever they enter an even number, we print the square of the next element of the list.

Here, the user input does not have any pattern. Due to this, we cannot access the elements of the list using a for loop. Instead, we will use a generator to print the next element from the list using the next() function as shown in the following example.

myList = list(range(100))  # a list of 100 elements: 0 to 99
myGen = (element ** 2 for element in myList)
while True:
    user_input = int(input("Input an even number to get an output, 0 to exit:"))
    if user_input == 0:
        print("Good Bye")
        break
    elif user_input % 2 == 0:
        print("Very good. Here is an output for you.")
        # next() raises StopIteration once all 100 squares have been consumed.
        print(next(myGen))
    else:
        print("Input an even number.")

Output:

Input an even number to get an output, 0 to exit:23
Input an even number.
Input an even number to get an output, 0 to exit:123
Input an even number.
Input an even number to get an output, 0 to exit:12
Very good. Here is an output for you.
0
Input an even number to get an output, 0 to exit:34
Very good. Here is an output for you.
1
Input an even number to get an output, 0 to exit:35
Input an even number.
Input an even number to get an output, 0 to exit:0
Good Bye
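
The memory argument from the first reason above can be checked roughly with sys.getsizeof(). This is only a sketch: the list length of 100,000 is an arbitrary choice, and getsizeof() reports the size of the container object itself, not of the integers it refers to, so the real difference is even larger.

```python
import sys

numbers = list(range(100_000))  # arbitrary size chosen for illustration

squares_list = [n ** 2 for n in numbers]  # all results stored at once
squares_gen = (n ** 2 for n in numbers)   # results produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print("list object:", list_size, "bytes")
print("generator object:", gen_size, "bytes")

# Both give the same values when consumed.
totals_match = sum(squares_gen) == sum(squares_list)
print(totals_match)
```

The trade-off is that the generator can be consumed only once, while the list can be indexed and iterated repeatedly.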

Conclusion

In this article, we have discussed two ways to create a generator from a list. To learn more about python programming, you can read this article on list comprehension. You may also like this article on the linked list in Python.

The post Create Generator from a List in Python appeared first on PythonForBeginners.com.

October 19, 2021 01:48 PM UTC


Wingware

Wing Python IDE Version 8.1 - October 19, 2021

Wing 8.1 adds Delete Symbol and Rename Current Module refactoring operations, improves some aspects of type analysis, fixes occasional failure to detect Python Path, fixes starting the remote agent in some cases, correctly reports exceptions from Django templates, supports pip 21.3+ in the Packages tool, and makes several other improvements.

See the change log for details.

Download Wing 8 Now: Wing Pro | Wing Personal | Wing 101 | Compare Products


What's New in Wing 8.1


Support for Containers and Clusters

Wing 8 adds support for developing, testing, and debugging Python code that runs inside containers, such as those provided by Docker and LXC/LXD, and clusters of containers managed by a container orchestration system like Docker Compose. A new Containers tool can be used to start, stop, and monitor container services, and new Docker container environments may be created during project creation.

For details, see Working with Containers and Clusters.

New Package Management Tool

Wing 8 adds a new Packages tool that provides the ability to install, remove, and update packages found in the Python environment used by your project. This supports pipenv, pip, and conda as the underlying package manager. Packages may be selected manually from PyPI or by package specifications found in a requirements.txt or Pipfile.

For details, see Package Manager.

Improved Project Creation

Wing 8 redesigns New Project support so that the host, project directory, Python environment, and project type may all be selected independently. New projects may use either an existing or newly created source directory, optionally cloning code from a revision control repository. An existing or newly created Python environment may be selected, using virtualenv, pipenv, conda, or Docker.

Improved Python Code Analysis and Warnings

Wing 8 expands the capabilities of Wing's static analysis engine, by improving its support for f-strings, named tuples, and other language constructs. Find Uses, Refactoring, and auto-completion now work within f-string expressions, Wing's built-in code warnings work with named tuples, the Source Assistant displays more detailed and complete value type information, and code warning indicators are updated more cleanly during edits.

And More

Wing 8 also adds support for Python 3.10, native executable for Apple Silicon (M1) hardware, a new Nord style display theme, reduced application startup time, Delete Symbol and Rename Current Module refactoring operations, and much more.

For a complete list of new features in Wing 8, see What's New in Wing 8.


Try Wing 8 Now!


Wing 8 is an exciting new step for Wingware's Python IDE product line. Find out how Wing 8 can turbocharge your Python development by trying it today.

Downloads: Wing Pro | Wing Personal | Wing 101 | Compare Products

See Upgrading for details on upgrading from Wing 7 and earlier, and Migrating from Older Versions for a list of compatibility notes.

October 19, 2021 01:00 AM UTC

October 18, 2021


Ben Cook

PyTorch DataLoader Quick Start

PyTorch comes with powerful data loading capabilities out of the box. But with great power comes great responsibility and that makes data loading in PyTorch a fairly advanced topic.

One of the best ways to learn advanced topics is to start with the happy path. Then add complexity when you find out you need it. Let’s run through a quick start example.

What is a PyTorch DataLoader?

The PyTorch DataLoader class gives you an iterable over a Dataset. It’s useful because it can parallelize data loading and automatically shuffle and batch individual samples, all out of the box. This sets you up for a very simple training loop.

PyTorch Dataset

But to create a DataLoader, you have to start with a Dataset, the class responsible for actually reading samples into memory. When you’re implementing a DataLoader, the Dataset is where almost all of the interesting logic will go.

There are two styles of Dataset class, map-style and iterable-style. Map-style Datasets are more common and more straightforward so we’ll focus on them but you can read more about iterable-style datasets in the docs.

To create a map-style Dataset class, you need to implement two methods: __getitem__() and __len__(). The __len__() method returns the total number of samples in the dataset and the __getitem__() method takes an index and returns the sample at that index.

PyTorch Dataset objects are very flexible — they can return any kind of tensor(s) you want. But supervised training datasets should usually return an input tensor and a label. For illustration purposes, let’s create a dataset where the input tensor is a 3×3 matrix with the index along the diagonal. The label will be the index.

It should look like this:

dataset[3]

# Expected result
# {'x': array([[3., 0., 0.],
#         [0., 3., 0.],
#         [0., 0., 3.]]),
#  'y': 3}

Remember, all we have to implement are __getitem__() and __len__():

from typing import Dict, Union

import numpy as np
import torch

class ToyDataset(torch.utils.data.Dataset):
    def __init__(self, size: int):
        self.size = size

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, index: int) -> Dict[str, Union[int, np.ndarray]]:
        return dict(
            x=np.eye(3) * index,
            y=index,
        )

Very simple. We can instantiate the class and start accessing individual samples:

dataset = ToyDataset(10)
dataset[3]

# Expected result
# {'x': array([[3., 0., 0.],
#         [0., 3., 0.],
#         [0., 0., 3.]]),
#  'y': 3}

If you happen to be working with image data, __getitem__() may be a good place to put your TorchVision transforms.

At this point, a sample is a dict with "x" as a matrix with shape (3, 3) and "y" as a Python integer. But what we want are batches of data. "x" should be a PyTorch tensor with shape (batch_size, 3, 3) and "y" should be a tensor with shape (batch_size,). This is where DataLoader comes back in.

PyTorch DataLoader

To iterate through batches of samples, pass your Dataset object to a DataLoader:

torch.manual_seed(1234)

loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=3,
    shuffle=True,
    num_workers=2,
)
for batch in loader:
    print(batch["x"].shape, batch["y"])

# Expected result
# torch.Size([3, 3, 3]) tensor([2, 1, 3])
# torch.Size([3, 3, 3]) tensor([6, 7, 9])
# torch.Size([3, 3, 3]) tensor([5, 4, 8])
# torch.Size([1, 3, 3]) tensor([0])

Notice a few things that are happening here:

There’s one other thing that I’m not doing in this sample but you should be aware of. If you need to use your tensors on a GPU (and you probably do for non-trivial PyTorch problems), then you should set pin_memory=True in the DataLoader. This will speed things up by letting the DataLoader allocate space in page-locked memory, which makes copies from host memory to the GPU faster. You can read more about it here.
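
To make the shuffling and batching steps more concrete, here is a rough pure-Python sketch of what happens conceptually when iterating over the loader. It is not PyTorch's implementation: there are no worker processes and no tensors, and toy_sample and iter_batches are hypothetical helper names invented for this illustration.

```python
import random


def toy_sample(index):
    # Same layout as the post's samples: a 3x3 matrix with the index on
    # the diagonal, plus the index itself as the label.
    x = [[float(index) if r == c else 0.0 for c in range(3)] for r in range(3)]
    return {"x": x, "y": index}


def iter_batches(size, batch_size, shuffle=True, seed=1234):
    indices = list(range(size))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, size, batch_size):
        chunk = [toy_sample(i) for i in indices[start:start + batch_size]]
        # "Collate" step: gather each field across the samples in the batch.
        yield {"x": [s["x"] for s in chunk], "y": [s["y"] for s in chunk]}


batches = list(iter_batches(size=10, batch_size=3))
for batch in batches:
    print(len(batch["x"]), batch["y"])
```

With 10 samples and a batch size of 3, the last batch holds a single sample, which mirrors the torch.Size([1, 3, 3]) batch in the expected result above.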

Summary

To review: the interesting part of custom PyTorch data loaders is the Dataset class you implement. From there, you get lots of nice features to simplify your data loop. If you need something more advanced, like custom batching logic, check out the API docs. Happy training!

The post PyTorch DataLoader Quick Start appeared first on Sparrow Computing.

October 18, 2021 07:10 PM UTC


Łukasz Langa

Weekly Report, October 11 - 17

Very few merged PRs this week as I focused on pushing the report out. And it’s out 😅

October 18, 2021 05:34 PM UTC

Where does all the effort go? Looking at Python core developer activity

One of the tasks given to me by the Python Software Foundation as part of the Developer in Residence job was to look at the state of CPython as an active software development project. What are people working on? Which standard libraries require most work? Who are the active experts behind which libraries? Those were just some of the questions asked by the Foundation. In this post I’m looking into our Git repository history and our GitHub PR data to find answers.

October 18, 2021 05:33 PM UTC


PyCharm

Webinar: “Smarter FastAPI Through Tooling” with Sebastián Ramírez

FastAPI has quickly become super-popular. One of the reasons: embracing standards which help tooling, which then boost productivity. This is something FastAPI’s author, Sebastián Ramírez, really cares about.

But what does this mean in practice? In this webinar we’ll put this in action, with live demos showing a big productivity boost when developing FastAPI applications using PyCharm Professional’s set of integrated tools.

Speaking to You

Sebastián Ramírez (@tiangolo) is the creator of FastAPI, Typer, SQLModel, and other open source tools.

He is currently a Staff Software Engineer at Forethought while also helping other companies as an external consultant.

October 18, 2021 05:06 PM UTC


Python Morsels

Creating and writing to a file in Python

Transcript

Let's write to a text file.

Files can be read from (but not written to) by default

Here we're using the open function on a text file called my_file.txt (using a with block to automatically close the file when we're done working with it) and we're calling the write method on the file object we get back to write text to that file:

>>> with open("my_file.txt") as f:
...     f.write("This is text!")
...     f.write("And some more text")
...

When we run this code, we'll see an error:

>>> with open("my_file.txt") as f:
...     f.write("This is text!")
...     f.write("And some more text")
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'my_file.txt'

We get an error because, by default, open tries to read an existing file, and my_file.txt doesn't exist yet. Python's open function accepts more than just a filename:

>>> help(open)

Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

The open function also accepts a mode, and by default that mode is r (for read mode). In order to write text to this file, we need to specify write mode.

To write to a file we need to open our file in write mode

Here we can see the modes accepted by the open function:

    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================

The default mode is r, in fact more explicitly it's rt, for read text mode. We need to specify the mode as w or (even more explicitly) wt for write text mode (we want text mode as opposed to binary mode).

We're going to specify a mode (wt) as we open up our file:

>>> with open("my_file.txt", mode="wt") as f:
...     f.write("This is text!")
...     f.write("And some more text")

Notice that we're not passing our mode in as a positional argument, even though we could:

>>> with open("my_file.txt", "wt") as f:

We're passing mode as a named argument to be explicit.

When we open this file now and call its write method, we'll get back the number of characters that were written to this file:

>>> with open("my_file.txt", mode="wt") as f:
...     f.write("This is text!")
...     f.write("And some more text")
...
13
18

Newline characters aren't automatically added to the end of each line

If we take a look at the file contents of my_file.txt now, we'll see that our text was written to the file:

This is text!And some more text

But it wasn't written exactly how we wanted! We wanted to write two separate lines to this file, but instead, Python wrote just one line.

Python wrote just one line because it wrote exactly the text we gave to it, and we didn't give it any newline characters (\n) to write.

To write two separate lines to this file, we should end each of our lines in a newline character:

>>> with open("my_file.txt", mode="wt") as f:
...     f.write("This is text!\n")
...     f.write("And some more text\n")

Now as expected, we have two separate lines in this file:

This is text!
And some more text
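
The mode table above also lists a (append) and x (exclusive creation). As a small sketch of how they differ from w, the code below writes to a file in a temporary directory; the temporary path and the variable names are assumptions made for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file.txt")  # illustrative location

    # "wt" creates the file (or truncates an existing one).
    with open(path, mode="wt") as f:
        f.write("This is text!\n")

    # "at" keeps the existing contents and adds to the end.
    with open(path, mode="at") as f:
        f.write("And some more text\n")
    with open(path) as f:
        after_append = f.read()

    # "xt" refuses to touch a file that already exists.
    try:
        with open(path, mode="xt") as f:
            f.write("never written\n")
        exclusive_create_failed = False
    except FileExistsError:
        exclusive_create_failed = True

    # Opening with "wt" again throws away the previous contents.
    with open(path, mode="wt") as f:
        f.write("Fresh start\n")
    with open(path) as f:
        after_truncate = f.read()

print(repr(after_append), exclusive_create_failed, repr(after_truncate))
```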

Python doesn't write until the file is closed (or flushed)

Let's open a file without using a with block:

>>> f = open("my_file.txt", mode="wt")

And then call the write method on our file object to write some text to this file:

>>> f.write("some text")
9

Has our file been written to at this point? What's your guess? 🤔

The answer is, probably not! Our file is empty right now:


Python doesn't write to a file until the file is flushed or closed:

>>> f.flush()
>>> f.close()

The best way to make sure that everything will be written to your file as soon as you're done working with it is to use a with block to use your file as a context manager. This will make sure your file is closed automatically as soon as you're done working with it.
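
One way to observe the buffering described above is to check the file's size on disk before and after flushing. This is a sketch using a temporary directory (an assumption for the demo). With default buffering, a short write like this usually still sits in memory before the flush, but the exact moment Python writes to disk is an implementation detail.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_file.txt")  # illustrative location

    f = open(path, mode="wt")
    f.write("some text")

    # Usually 0 here: the text is still in Python's write buffer.
    size_before_flush = os.path.getsize(path)

    f.flush()  # push the buffered text out to the file
    size_after_flush = os.path.getsize(path)

    f.close()
    with open(path) as check:
        content = check.read()

print(size_before_flush, size_after_flush, repr(content))
```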

Summary

To write to a text file in Python, you can use the built-in open function, specifying a mode of w or wt. You can then use the write method on the file object you get back to write to that file.

It's best to use a with block when you're opening a file to write to it.

October 18, 2021 03:00 PM UTC


Python Software Foundation

Join the Python Developers Survey 2021: Share and learn about the community

This year we are conducting the fifth iteration of the official Python Developers Survey. The goal is to capture the current state of the language and the ecosystem around it. By comparing the results with last year's, we can identify and share with everyone the hottest trends in the Python community and the key insights into them.

In 2020, more than 28,000 Python users from 150 countries participated and shared with us how they use the language.

We encourage you to contribute to our community's knowledge. The survey should only take you about 10-15 minutes to complete.

Contribute to the Python Developers Survey 2021.

This year we have added questions that will help the CPython Developer-in-Residence and the Python Packaging Project Manager prioritize their work based on community feedback.

The survey is organized in partnership between the Python Software Foundation and JetBrains. After the survey is over, we will publish the aggregated results and randomly choose 20 winners (among those who complete the survey in its entirety), who will each receive a $100 Amazon Gift Card or a local equivalent.

Click on this link to participate in the Python Developers Survey 2021!

October 18, 2021 02:16 PM UTC


Real Python

A Roadmap to XML Parsers in Python

If you’ve ever tried to parse an XML document in Python before, then you know how surprisingly difficult such a task can be. On the one hand, the Zen of Python promises only one obvious way to achieve your goal. At the same time, the standard library follows the batteries included motto by letting you choose from not one but several XML parsers. Luckily, the Python community solved this surplus problem by creating even more XML parsing libraries.

Jokes aside, all XML parsers have their place in a world full of smaller or bigger challenges. It’s worthwhile to familiarize yourself with the available tools.

In this tutorial, you’ll learn how to:

  • Choose the right XML parsing model
  • Use the XML parsers in the standard library
  • Use major XML parsing libraries
  • Parse XML documents declaratively using data binding
  • Use safe XML parsers to eliminate security vulnerabilities

You can use this tutorial as a roadmap to guide you through the confusing world of XML parsers in Python. By the end of it, you’ll be able to pick the right XML parser for a given problem. To get the most out of this tutorial, you should already be familiar with XML and its building blocks, as well as how to work with files in Python.

Choose the Right XML Parsing Model

It turns out that you can process XML documents using a few language-agnostic strategies. Each demonstrates different memory and speed trade-offs, which can partially justify the wide range of XML parsers available in Python. In the following section, you’ll find out their differences and strengths.

Document Object Model (DOM)

Historically, the first and the most widespread model for parsing XML has been the DOM, or the Document Object Model, originally defined by the World Wide Web Consortium (W3C). You might have already heard about the DOM because web browsers expose a DOM interface through JavaScript to let you manipulate the HTML code of your websites. Both XML and HTML belong to the same family of markup languages, which makes parsing XML with the DOM possible.

The DOM is arguably the most straightforward and versatile model to use. It defines a handful of standard operations for traversing and modifying document elements arranged in a hierarchy of objects. An abstract representation of the entire document tree is stored in memory, giving you random access to the individual elements.

While the DOM tree allows for fast and omnidirectional navigation, building its abstract representation in the first place can be time-consuming. Moreover, the XML gets parsed at once, as a whole, so it has to be reasonably small to fit the available memory. This renders the DOM suitable only for moderately large configuration files rather than multi-gigabyte XML databases.

Use a DOM parser when convenience is more important than processing time and when memory is not an issue. Some typical use cases are when you need to parse a relatively small document or when you only need to do the parsing infrequently.
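
As a minimal sketch of the DOM model, the example below uses xml.dom.minidom from the standard library. The tiny SVG document and its attribute values are made up for illustration, and minidom should only be fed trusted input.

```python
from xml.dom.minidom import parseString

# A small, made-up SVG document used only for this sketch.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
  <rect id="box" width="100" height="50" fill="white"/>
  <circle id="dot" cx="50" cy="25" r="10" fill="red"/>
</svg>"""

# The whole document is parsed into an in-memory tree up front.
document = parseString(svg)

# Random access: jump straight to the elements of interest.
circles = document.getElementsByTagName("circle")
circle = circles[0]
fill_before = circle.getAttribute("fill")

# The tree can be navigated in any direction and modified in place.
parent_tag = circle.parentNode.tagName
circle.setAttribute("fill", "blue")

serialized = document.toxml()
print(fill_before, parent_tag)
print(serialized)
```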

Simple API for XML (SAX)

To address the shortcomings of the DOM, the Java community came up with a library through a collaborative effort, which then became an alternative model for parsing XML in other languages. There was no formal specification, only organic discussions on a mailing list. The end result was an event-based streaming API that operates sequentially on individual elements rather than the whole tree.

Elements are processed from top to bottom in the same order they appear in the document. The parser triggers user-defined callbacks to handle specific XML nodes as it finds them in the document. This approach is known as “push” parsing because elements are pushed to your functions by the parser.

SAX also lets you discard elements if you’re not interested in them. This means it has a much lower memory footprint than DOM and can deal with arbitrarily large files, which is great for single-pass processing such as indexing, conversion to other formats, and so on.

However, finding or modifying random tree nodes is cumbersome because it usually requires multiple passes on the document and tracking the visited nodes. SAX is also inconvenient for handling deeply nested elements. Finally, the SAX model just allows for read-only parsing.

In short, SAX is cheap in terms of space and time but more difficult to use than DOM in most cases. It works well for parsing very large documents or parsing incoming XML data in real time.
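
The push model can be sketched with the standard library's xml.sax module. TagCounter is a hypothetical handler written for this illustration, and the SVG snippet is made up; the parser calls startElement() for every opening tag it meets, and nothing is kept in memory beyond the running counts.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts how often each tag appears."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is encountered.
        self.counts[name] = self.counts.get(name, 0) + 1


# A small, made-up SVG document used only for this sketch.
svg = b"""<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
  <rect width="100" height="50" fill="white"/>
  <circle cx="30" cy="25" r="10" fill="red"/>
  <circle cx="70" cy="25" r="20" fill="blue"/>
</svg>"""

handler = TagCounter()
xml.sax.parseString(svg, handler)
print(handler.counts)
```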

Streaming API for XML (StAX)

Although somewhat less popular in Python, this third approach to parsing XML builds on top of SAX. It extends the idea of streaming but uses a “pull” parsing model instead, which gives you more control. You can think of StAX as an iterator advancing a cursor object through an XML document, where custom handlers call the parser on demand and not the other way around.

Note: It’s possible to combine more than one XML parsing model. For example, you can use SAX or StAX to quickly find an interesting piece of data in the document and then build a DOM representation of only that particular branch in memory.

Using StAX gives you more control over the parsing process and allows for more convenient state management. The events in the stream are only consumed when requested, enabling lazy evaluation. Other than that, its performance should be on par with SAX, depending on the parser implementation.
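
Python's standard library has no module named StAX, but xml.dom.pulldom offers a comparable pull-style interface, so it serves as a stand-in in the sketch below. The SVG snippet is made up, and the namespace declaration is left out to keep the example short. The loop pulls events on demand and, as the note above suggests, builds a DOM only for the one branch it cares about.

```python
from xml.dom import pulldom

# A small, made-up SVG-like document used only for this sketch.
svg = """<svg width="100" height="50">
  <rect width="100" height="50" fill="white"/>
  <g id="dots">
    <circle cx="30" cy="25" r="10"/>
    <circle cx="70" cy="25" r="20"/>
  </g>
</svg>"""

events = pulldom.parseString(svg)
radii = []

# Events are only produced as the loop asks for them.
for event, node in events:
    if event == pulldom.START_ELEMENT and node.tagName == "g":
        # Build a DOM subtree for this branch only.
        events.expandNode(node)
        for circle in node.getElementsByTagName("circle"):
            radii.append(int(circle.getAttribute("r")))
        break  # stop pulling; the rest of the document is never processed

print(radii)
```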

Learn About XML Parsers in Python’s Standard Library

In this section, you’ll take a look at Python’s built-in XML parsers, which are available to you in nearly every Python distribution. You’re going to compare those parsers against a sample Scalable Vector Graphics (SVG) image, which is an XML-based format. By processing the same document with different parsers, you’ll be able to choose the one that suits you best.

Read the full article at https://realpython.com/python-xml-parser/ »


October 18, 2021 02:00 PM UTC


Codementor

How Python Became the #1 Programming Language and How You Can Make the Best of It

Python is now more popular than Java, C, and other programming languages. Here are my tips on how to make the best of Python.

October 18, 2021 11:16 AM UTC


Inspired Python

Python Pattern Matching Examples: ETL and Dataclasses

In a previous article, I walked you through the theory of Structural Pattern Matching, so now it’s time to apply that knowledge and build something practical.

Let’s say you need to process data from one system (a JSON-based REST API) into another (a CSV file for use in Excel). A common task. Extracting, Transforming, and Loading (ETL) data is one of the things Python does especially well, and with pattern matching you can simplify and organize your business logic in such a way that it remains maintainable and understandable.

Let’s get some test data. For this you’ll need the requests library.

>>> import requests
>>> resp = requests.get('https://demo.inspiredpython.com/invoices/')
>>> assert resp.ok
>>> data = resp.json()
>>> data[0]
{'recipient': {'company': 'Trommler',
               'address': 'Annette-Döring-Allee 5\n01231 Grafenau',
               'country_code': 'DE'},
 'invoice_id': 15134,
 'currency': 'JPY',
 'amount': 945.57,
 'sku': 'PROPANE-ACCESSORIES'}


Read More ->

October 18, 2021 10:15 AM UTC


Mike Driscoll

PyDev of the Week: Talley Lambert

This week we chatted with Talley Lambert (@TalleyJLambert) who is a microscopist and Python enthusiast at Harvard Medical School. You can learn more about what Talley is up to and see his publications on his website.

Let's spend some time getting to know Talley better!

Can you tell us a little about yourself (hobbies, education, etc):

I'm a neurobiologist by training – I studied learning and memory and synaptic plasticity at the University of Washington – but sometime during my postdoc I realized I enjoyed learning about and optimizing the tools and techniques I was using (specifically: microscopy) more than the biological questions I was addressing. So I pursued that line of work. I'm currently a Lecturer and microscopist at Harvard Medical School. I work in a "core facility" where we build, maintain, and optimize microscopes, provide training and experimental design advice to local researchers, and help with challenges in image processing and analysis.

Currently, if someone were to ask me what my hobbies were, I would probably say "coding"! 🙂 It's what I prefer to be doing if I'm not obligated to be doing something else. In the past, I'd say cooking, hiking, music... but always: generally learning something.

Why did you start using Python?

I started dabbling in python probably around 15 years ago, during grad school. I've always been interested in computer programming (though, I have no formal training), and wanted to start automating some of my data processing – just some light scripting. I didn't really start using python in earnest until maybe 6 years ago. The main application at that point was to create tools and user interfaces for the users of our facility to simplify some aspect of their imaging acquisition or data analysis pipelines.

What other programming languages do you know and which is your favorite?

I used MATLAB in grad school, can "read" C/C++ enough to create some python extensions/wrappers, and have some experience with JavaScript. The JavaScript was mostly learned by necessity to build a front-end for a django-based website I created (fpbase.org - which is a database for "fluorescent proteins": a commonly used molecular tool in microscopy).

Python is easily my favorite language.

What projects are you working on now?

The project around which all my other projects "orbit" is napari (napari.org), for which I am a core developer. Napari is an n-dimensional data viewer, built with large (possibly out-of-core) datasets in mind. It attempts to provide fast rendering for all of the various data types that one might encounter in imaging (n-dimensional images obviously, but also points, shapes, surfaces, vectors, etc...) with a friendly graphical user interface and a python API for accessing most of the internals. It's also important to us that napari integrate nicely with the existing scientific python stack.

Other projects that have emerged from this (excluding napari plugins and very domain-specific projects) are:

How did the psygnal package come about?

psygnal (which is a pure python implementation of Qt's signals & slots pattern) also arose from a desire to make it easier for developers to work "around" Qt-dependent packages like napari and magicgui, while also being able to create "pure python" objects that can also work in the absence of Qt. Psygnal implements a simple callback-connection system that can in theory be used with any event loop, but it does so using the same API that Qt uses: `psygnal.Signal` aims to be a swappable replacement for `PyQt5.QtCore.pyqtSignal` or `PySide2.QtCore.Signal` (though of course, for the Qt versions, your class needs to inherit from `QObject` to work).

It's a subtle distinction perhaps 🙂 but we're generally interested in making "pure python" objects that can easily (but optionally) integrate into a Qt application, without requiring the end user to learn an entirely new API.
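
The callback-connection idea behind the signals and slots pattern can be sketched in a few lines of plain Python. This is an illustrative toy, not psygnal's implementation; it only borrows the connect, disconnect, and emit method names that Qt signals use. The MiniSignal and Counter classes are hypothetical.

```python
class MiniSignal:
    """Toy signal: keeps a list of callbacks (slots) and calls them on emit."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)
        return slot

    def disconnect(self, slot):
        self._slots.remove(slot)

    def emit(self, *args):
        # Iterate over a copy so a slot may disconnect itself while being called.
        for slot in list(self._slots):
            slot(*args)


class Counter:
    """Plain Python object (no QObject base class) that announces changes."""

    def __init__(self):
        self.changed = MiniSignal()
        self._value = 0

    def increment(self):
        self._value += 1
        self.changed.emit(self._value)


received = []
slot = received.append  # any callable can act as a slot

counter = Counter()
counter.changed.connect(slot)
counter.increment()
counter.increment()
counter.changed.disconnect(slot)
counter.increment()  # no longer delivered to the disconnected slot
```

Because nothing here depends on an event loop or a GUI toolkit, the same object can be used in a script, a test, or a Qt application.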

Which Python libraries are your favorite (core or 3rd party)?

So many!

The standard library is obviously one of the best parts of Python; I particularly like functools, itertools, contextlib, pathlib, inspect, and typing.

For third party: numpy and scipy go without saying, and scikit-image is indispensable for a lot of the imaging work I do. I love dask, since it makes working with out-of-core data almost trivial if you already know the numpy or pandas APIs. pydantic is fantastic, and I find that objects I build using pydantic tend to be better-conceived and more stable in the long run.

On the dev side: pretty much every repo I have uses black, flake8, isort, mypy, and pre-commit.

If you couldn't use Python for your next project, which programming language would you choose and why?

Wait, why can't I use Python!?? 🙂

For data-heavy stuff, I'm curious to learn more about Julia. But if I were to invest more time into a language besides python, it would probably be JavaScript. It doesn't exactly "spark joy" for me the way that python does (though TypeScript is appealing!), but I do enjoy building visual tools and interfaces (ideally browser-based) and the ubiquity of JavaScript on the web is a strong draw.

Thanks for doing the interview, Talley!

The post PyDev of the Week: Talley Lambert appeared first on Mouse Vs Python.

October 18, 2021 05:05 AM UTC

October 17, 2021


Talk Python to Me

#338: Using cibuildwheel to manage the scikit-HEP packages

How do you build and maintain a complex suite of Python packages? Of course, you want to put them on PyPI. The best format there is as a wheel. This means that when developers use your code, it comes straight down and requires no local tooling to install and use.

But if you have compiled dependencies, such as C or FORTRAN, then you have a big challenge. How do you automatically compile and test against Linux, macOS (Intel and Apple Silicon), Windows, and so on? That's the problem cibuildwheel is solving.

On this episode, you'll meet Henry Schreiner. He is developing tools for the next era of the Large Hadron Collider (LHC) and is an admin of Scikit-HEP. Of course, cibuildwheel is central to this process.

Links from the show

Henry on Twitter: @HenrySchreiner3 (https://twitter.com/HenrySchreiner3)
Henry's website: https://iscinumpy.gitlab.io

Large Hadron Collider (LHC): https://home.cern/science/accelerators/large-hadron-collider
cibuildwheel: https://github.com/pypa/cibuildwheel
plumbum package: https://plumbum.readthedocs.io/en/latest/
boost-histogram: https://github.com/scikit-hep/boost-histogram
vector: https://github.com/scikit-hep/vector
hepunits: https://github.com/scikit-hep/hepunits
awkward arrays: https://github.com/scikit-hep/awkward-1.0
Numba: https://numba.pydata.org/
uproot4: https://github.com/scikit-hep/uproot4
scikit-hep developer: https://scikit-hep.org/developer
pypa: https://www.pypa.io/en/latest/
CLI11: https://github.com/CLIUtils/CLI11
pybind11: https://github.com/pybind/pybind11
cling: https://root.cern/cling/
Pint: https://pint.readthedocs.io/en/stable/
Python Wheels site: https://pythonwheels.com/
Build package: https://pypa-build.readthedocs.io/en/latest/
Mac Mini Colo: https://macminicolo.net/
scikit-build: https://github.com/scikit-build/scikit-build
plotext: https://pypi.org/project/plotext/
Code Combat: https://codecombat.com/
clang format wheel: https://github.com/ssciwr/clang-format-wheel/
cibuildwheel examples: https://cibuildwheel.readthedocs.io/en/latest/working-examples/
Cling in LLVM: https://root.cern/blog/cling-in-llvm/

New htmx course: https://talkpython.fm/htmx
Watch this episode on YouTube: https://www.youtube.com/watch?v=8MfmY0IaeT4
Episode transcripts: https://talkpython.fm/episodes/transcript/338/using-cibuildwheel-to-manage-the-scikit-hep-packages

Stay in touch with us

Subscribe on YouTube (for live streams): https://talkpython.fm/youtube
Follow Talk Python on Twitter: @talkpython (https://twitter.com/talkpython)
Follow Michael on Twitter: @mkennedy (https://twitter.com/mkennedy)

Sponsors

Talk Python Training: https://talkpython.fm/training
AssemblyAI: https://talkpython.fm/assemblyai

October 17, 2021 08:00 AM UTC