How Python can help you with your day job
3 ways non-data scientists can use Python to elevate their work
This is Autonomous Econ, a weekly newsletter with practical tips on applying Python and data science tools, aimed at absolute beginners working in economics, policy, or research. Upcoming posts will be split roughly between practical guides (80 percent) and data journalism pieces where I demonstrate the tools (20 percent).
If you find the content valuable, please consider becoming a free subscriber by clicking on the button below.
Python is widely recognized as the industry standard in AI and machine learning, notably with packages like scikit-learn. It's also a powerhouse for bread-and-butter statistical analysis, thanks to tools like Statsmodels. In this post, I will share three lesser-known ways Python can enhance the toolkit of people who work with data (e.g., analysts and researchers) but may not have a traditional programming background.
Extract and process data smartly
Firstly, Python excels in extracting, cleaning, and transforming data, especially text. It can scrape data from websites and pull numbers from those tricky PDFs. Python's access to APIs (Application Programming Interfaces) allows easy retrieval of data from various databases. This not only saves time compared to manual downloads and cumbersome processing, but also minimizes human error—a common issue in Excel-based tasks.
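Even before scraping or APIs, pandas alone handles much of the cleaning step. As a minimal sketch (the figures below are illustrative), turning messy report-style strings into analysis-ready numbers looks like this:

```python
import pandas as pd

# Messy values as they often arrive from reports or scraped pages:
# stray whitespace, thousands separators, and "n/a" placeholders
raw = pd.DataFrame({
    "country": [" United States", "Germany ", "Japan"],
    "gdp_usd_bn": ["25,462.7", "4,072.2", "n/a"],
})

clean = raw.assign(
    country=raw["country"].str.strip(),
    gdp_usd_bn=pd.to_numeric(
        raw["gdp_usd_bn"].str.replace(",", "", regex=False),
        errors="coerce",  # unparseable entries become missing values
    ),
)
print(clean)
```

Here `errors="coerce"` turns entries like "n/a" into missing values instead of raising an exception, so the pipeline keeps running on imperfect inputs rather than breaking the way a stray cell breaks an Excel formula.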
A simple demonstration of this is using Langchain’s WebBaseLoader, which can effortlessly compile a structured CSV from various websites. The instructions for what to extract can be written in plain English, since the package uses a large language model like ChatGPT in the background. I used it to create a custom dataset of recipes, including their ingredients and nutritional information (see the demo code for it on my GitHub).
Create web apps to enhance your data visualizations
Secondly, Python enables quick creation of web-based dashboards, eliminating the need for other front-end languages. These custom dashboards, ranging from simple Plotly visualizations (see my previous post on Plotly) to more sophisticated Streamlit or Dash apps, can be shared publicly. This feature, often a premium in other products, allows for dynamic exploration of complex datasets and effective presentation of research. I will cover how to create these dashboards for your data journalism needs in coming posts.
Here is a great example of a Streamlit dashboard by Marshall Krassenstein exploring inter-state migration in the US:
Automate your workflows
Lastly, one of the most powerful features of Python is automation. The workflows I've mentioned – from prediction models to dashboard updates – can be automated, triggered on a schedule or by new data arrivals, using tools like GitHub Actions. The Economist uses this approach for trackers like its COVID-19 excess-deaths model.
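For illustration, a GitHub Actions workflow that re-runs an update script every weekday morning might look like the sketch below. The script name and requirements file are placeholders for whatever your project uses:

```yaml
# .github/workflows/update.yml -- illustrative schedule; script name is hypothetical
name: Refresh data and charts
on:
  schedule:
    - cron: "0 6 * * 1-5"   # 06:00 UTC, Monday to Friday
  workflow_dispatch:          # also allow manual runs from the GitHub UI
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python update_dashboard.py   # hypothetical update script
```

Once this file is committed, GitHub runs the script on its own servers on the schedule you set, with no machine of yours left switched on.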
Back when I worked in economic forecasting, half of my time was spent monitoring data releases in my sector and then updating whichever short-term indicator, prediction model, or chart pack I maintained. In hindsight, a significant part of this could have run automatically in the background, which would have been invaluable given the time pressure of publishing analysis and commentary on new data points.
Yes, anyone can learn Python
I’m sure anyone who works with data can find at least one use case among the tools I’ve just described. If you're hesitant about diving into Python, or have tried and felt overwhelmed, don't worry. You don't need a computer science background. With experience in scripting languages like Stata or R, you'll find Python's learning curve manageable, perhaps even gentler. Next week, I'll outline a beginner-friendly learning path for Python: the path I wish I had followed if I were starting over. Stay tuned!