Portfolio

Sonntagsfrage

The “Sonntagsfrage” is a German survey in which people answer the crucial question:

If you had to vote for the German government this Sunday, whom would you choose?
(translated freely)

Source: https://www.infratest-dimap.de/umfragen-analysen/bundesweit/sonntagsfrage/

It is performed by the Infratest dimap institute, and generally around 1,000 to 1,500 people take part in each survey. The results are often used as a barometer for the political situation within the country and are regularly presented in one of Germany’s main news programs, the “Tagesschau”.

In this post the historical results of the Sonntagsfrage are used as input for a machine learning algorithm to produce a forecast for the upcoming Sunday. The result is an AI which predicts the answers to the next Sonntagsfrage and thus gives an indication of the current political climate in Germany.

The results for next Sunday can be seen in the following dashboard. Predictions can be viewed per model, or the average over all models can be displayed. Hint: change the view from “mobile” to “Desktop View” with the second icon from the right at the bottom of the dashboard.

Check out my Medium posts, where I explain how I used Google and Azure cloud services to set this whole thing up.

Solution architecture

For the realisation, automation and data consumption of this project, a combination of services from the Azure cloud, Google Sheets and Tableau Public was used. The architecture can be seen in the image below.

Historical data is pulled from
https://www.wahlrecht.de/umfragen/dimap.htm
via a web crawler written in Python. For automation purposes, Microsoft’s cloud service Azure was chosen: crawling and cleaning are each realised through Azure Functions, while orchestration is performed by Azure Durable Functions. An Azure SQL database is used as the single-point-of-truth storage.
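To give an idea of how small the crawling step can be, here is a minimal sketch. The use of pandas.read_html and the assumption that the first table on the page holds the survey history are mine, not necessarily the production setup.

```python
# Minimal sketch of the crawling step, assuming the survey results are
# published as a plain HTML table on wahlrecht.de.
import pandas as pd

def crawl_sonntagsfrage() -> pd.DataFrame:
    # pandas.read_html parses every <table> on the page into a DataFrame
    tables = pd.read_html("https://www.wahlrecht.de/umfragen/dimap.htm")
    return tables[0]  # assumption: the first table holds the survey history

if __name__ == "__main__":
    print(crawl_sonntagsfrage().head())
```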

For more information on this topic, check out my Medium Articles about Azure Functions and Durable Functions. The corresponding code can be found in my GitHub account.

Model training and prediction are performed with Python through the use of the Azure Machine Learning service. Pipelines and compute clusters create an automated and reproducible data science product.
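A hedged sketch of what such a pipeline setup could look like with the (v1) azureml-core SDK; the workspace config, compute cluster name and script names are placeholders, not the actual project values.

```python
# Sketch of an Azure ML pipeline with two steps (train, then predict),
# assuming the v1 azureml-core SDK; names below are placeholders.
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()  # reads a local config.json

train_step = PythonScriptStep(
    name="train_models",
    script_name="train.py",        # fits the regressors listed further below
    compute_target="cpu-cluster",  # an existing Azure ML compute cluster
    source_directory="./src",
)
predict_step = PythonScriptStep(
    name="predict_next_sunday",
    script_name="predict.py",
    compute_target="cpu-cluster",
    source_directory="./src",
)
predict_step.run_after(train_step)

pipeline = Pipeline(workspace=ws, steps=[train_step, predict_step])
Experiment(ws, "sonntagsfrage").submit(pipeline)
```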

Data consumption is performed via dashboards from the Tableau Public service using Google Sheets as a technical backbone.

Model evaluation

The models I use for machine learning are as follows (a minimal setup sketch follows the list):

  • DecisionTreeRegressor (sklearn)
  • SGDRegressor (sklearn)
  • GradientBoostingRegressor (sklearn)
  • XGBoost Regressor
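A sketch of how the four regressors could be instantiated; hyperparameters are left at their defaults here, the actual tuning is not shown.

```python
# Instantiating the four models; default hyperparameters, fixed random state.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import SGDRegressor
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

models = {
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "sgd": SGDRegressor(random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
    "xgboost": XGBRegressor(random_state=42),
}

# One model is fitted per party, e.g.:
# models["xgboost"].fit(X_train, y_train_party)
```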

As input parameters, only temporal features are used right now. The cyclical progression through the year is encoded into coordinates on a circle (sine/cosine), and additional features like “number of days since the last survey” are used.
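A sketch of this feature engineering: the position within the year is mapped onto a circle via sine/cosine, so that 31 December and 1 January end up close together. The column name “date” is an assumption on my part.

```python
# Temporal feature engineering: cyclical year encoding plus days since
# the previous survey. Assumes a datetime column named "date".
import numpy as np
import pandas as pd

def add_temporal_features(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    day_of_year = df[date_col].dt.dayofyear
    df["year_sin"] = np.sin(2 * np.pi * day_of_year / 365.25)
    df["year_cos"] = np.cos(2 * np.pi * day_of_year / 365.25)
    df["days_since_last_survey"] = df[date_col].diff().dt.days
    return df
```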

In order to compare the different models I defined the following common performance metrics, each computed over the most recent 12 weeks (a computation sketch follows the list):

  • MAE
  • MSE
  • RMSE
  • R²
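A sketch of how these four metrics can be computed with scikit-learn; y_true and y_pred stand for the actual and predicted survey values of one party over the last 12 weeks.

```python
# Computing MAE, MSE, RMSE and R² for one party's predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred) -> dict:
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": r2_score(y_true, y_pred),
    }
```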

The following dashboard shows these metrics computed per party and model in a heat map. Hint: change the view from “mobile” to “Desktop View” with the second icon from the right at the bottom of the dashboard.

As can be seen, all models perform similarly poorly. Reasons for this are probably the lack of seasonal time series modelling (like ARIMA, Prophet, etc.) and the lack of useful features.

Data consumption

Gathering data and fueling some algorithm are only two thirds of the way. In order to have impact, the results of all this magic in the backend have to be consumed by people. Hence the last step of this project: easy-to-read and easy-to-access visualisation. This is the part of such data science projects which is actually visible to other people and by which the quality of the project is often judged.

Visualisation of the historical answers to the Sonntagsfrage, as well as of the prediction for the next survey, is done with the Tableau Public service. Google Sheets is used as the data source for the dashboard. This way new calculations are automatically uploaded into a Google document, and the dashboard refreshes the displayed data each time the document is updated. Check out my article on Medium for more information on that topic.
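A hedged sketch of the upload step, assuming the gspread package with a service-account credential; the sheet name and the example row are purely illustrative.

```python
# Pushing fresh predictions into a Google Sheet that Tableau Public reads.
import gspread

gc = gspread.service_account(filename="service_account.json")
sheet = gc.open("sonntagsfrage-predictions").sheet1

# Illustrative rows; in the real pipeline these come from the models.
rows = [["date", "party", "prediction"],
        ["2021-08-15", "PARTY_A", 25.0]]

sheet.clear()
sheet.update(values=rows, range_name="A1")
```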

The dashboard with the current predictions can be found at the top of the article; a dashboard with historical values is at the bottom.

Conclusion

The Sonntagsfrage is now predicted weekly, and the prediction is visualised permanently without the need for any further input: we are provided with a glimpse into the possible future political climate of Germany. Look forward to my posts on Medium, where I go into detail about the essential steps. Also follow me on Twitter and take a look at the well-documented Git repository of this Sonntagsfrage project.

Now enjoy the dashboards!

Hint: change the view from “mobile” to “Desktop View” with the second icon from the right at the bottom of the dashboard.

Week 33: End of the week update

Current project

Right now I am working on a smaller-scope project to create some content and get back to generating value. With Swagger and Azure Functions I am currently building an API that makes my data from all portfolio cases accessible to everyone. Finer details like limiting the number of executions or authentication are consuming some time, but the API is already taking shape.
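A hedged sketch of what one HTTP-triggered Azure Function of this API could look like (Python v1 programming model, i.e. an __init__.py next to a function.json); the query parameter and the data lookup are placeholders.

```python
# One HTTP-triggered Azure Function returning portfolio data as JSON.
import json

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    case = req.params.get("case")  # e.g. "sonntagsfrage"
    if not case:
        return func.HttpResponse("Missing 'case' parameter", status_code=400)
    data = {"case": case, "rows": []}  # placeholder: fetch from Azure SQL here
    return func.HttpResponse(json.dumps(data), mimetype="application/json")
```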

Future plans

Credit: https://unsplash.com/@silverhousehd

After dabbling a little with cryptocurrencies on Coinbase, I noticed the lack of an essential base feature: the point in time of a transaction and the amount of coins purchased, including the price per coin, are NOT tracked on the platform itself. I receive a mail for each transaction, but to have an actual overview of my gains and losses I would have to track them in a spreadsheet. In the year 2021!

Hence I want to build a small platform which lets you input your past orders and then shows you what your current gains and losses are.

The next step would be notifications on severe drops or gains in value, and maybe even a “sell now” alarm sent via app directly to your smartphone.

And as a final step, the implementation of a prediction algorithm which sends mails like “Coin X is likely to rise by more than 10% within the next hour”.

Sounds exciting? Subscribe to my blog to keep up to date with all developments!

High level thoughts on writing

On a broader scale I want to invest more time in my writing. Of all the different paths I have tried so far, writing has generated the most attention and has also been the most gratifying. It does take a lot of energy, though, to generate content that I am – hehe – content with.

I want to start developing my skill by means of volume and adjust the quality along the way. So be excited for more regular updates: new portfolio cases, posts and Medium articles!

Week 26: End of the week update

This week there are only small updates due to a busy week at my job.

Volunteer Work

Got some communication done with the end users and IT. Now I have a solid understanding of the available toolset and its limitations. Scheduled a meeting for next week with me, the end user and IT to plan a solution for the existing pain point and create a roadmap on how to realise it.

Next Project

Looking for inspiration for the next project. Currently thinking about some stock analytics (“Which stocks to buy this week.”) or some reinforcement learning (checkers).

Gonna do some research to get some ideas. If you have one, post it in a comment! Would help me out tremendously =)

Have a sunny week.

Week 25: End of the Week Update

Nice week overall – got a lot done: Article, website and social stuff.

Twitter Showcase

I finished my Hate on Twitter project, and as it turns out, the – let us call it – “German Twitter space” is not as toxic as I feared. In general it is a neutral to positive place. As per usual, there seems to be a vocal minority on Twitter that regularly crosses the line of my moral compass.

In terms of technicalities, I finished the dashboards, despaired over my correlation analysis (maybe one more chapter for my Twitter showcase will follow) and wrote everything I got into a cohesive story. My new article can be found on this site here.

Next project: off to new horizons and uncharted topics. If you have some fresh ideas, feel free to let me know.

Website

Wrote an “Impressum” (legal notice), added a cookie policy and made my site conform to the European GDPR (DSGVO) standard. Additionally I added a subscribe button – you can now receive my newsletter with updates each time I post.

Added backups to my Google Drive storage, reorganised my content and added to the “about me” page.

Social Media

Created a LinkedIn account – you can add me here. Gonna read up on networking and start getting to it on LinkedIn.

Twitter Monitor

I started this project under the working title “hate on twitter”. Examples range from German politicians having to manually issue takedowns of false statements (example: Renate Künast), to students who still go to school receiving offensive messages on Instagram because they have the “wrong” religion when somewhere in the world a conflict escalates, to people of public interest becoming victims of cancel culture.

Examples of this unmoderated toxicity within social media sparked my disgust, and I started wondering whether I could make this negativity visible with numbers. Thus the idea of the Twitter monitor came about.

In this project I want to take Twitter’s most popular hashtags, apply sentiment analysis to them and visualise the results to get an idea of how toxic the conversation is.
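A sketch of the sentiment step with the textblob package (mentioned further below); that the German corpus comes from the separate textblob-de package is an assumption on my side.

```python
# Computing an English and a German polarity score for one tweet text.
from textblob import TextBlob
from textblob_de import TextBlobDE

def polarities(text: str) -> dict:
    # polarity ranges from -1 (negative) to +1 (positive)
    return {
        "polarity_en": TextBlob(text).sentiment.polarity,
        "polarity_de": TextBlobDE(text).sentiment.polarity,
    }

print(polarities("Ich liebe diesen Verein!"))
```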

The Dashboard

In the following dashboard the sentiments and the number of tweets per hashtag are displayed. All measures are aggregated per hour. Hint: change the view from “mobile” to “Desktop View” with the second icon from the right at the bottom of the dashboard.

The upper plot shows the overall hourly tweet activity over the past two weeks. The middle plot shows the average hourly polarity from the sentiment analysis of the received tweets. Two graphs are displayed, since a German and an English corpus were used.

The heart of the dashboard is the lower half. Again, the two plots represent the average polarity from the German and the English corpus. But this time each dot represents a hashtag, and its size the number of tweets that were obtained from the Twitter stream API. Mousing over the dots reveals some interesting additional information, like the standard deviation corresponding to the average or the name of the hashtag.

All time periods can be adjusted using the boxes at the top of the plots.

Interlude: Messi

Check out the data from 5 August 2021 at 22:00: a gigantic spike of tweets containing the word ‘Messi’. The football player changed clubs, and this obviously had quite an impact on Germans on Twitter =D

The Results

Surprisingly – for me at least – the overall polarity of the tweets seems rather positive, and especially with the English corpus a clear trend towards positivity can be seen as of the 22nd of June 2021. Negative hashtags are mostly outliers with only a few tweets, but from time to time truly negative topics do appear, like ‘#ITASWI’ on the 17th of June 2021 at 3 o’clock.

On the one hand it always feels bad to be proven wrong, but this time, it seems, I fell victim to the effect of vocal minorities. Negativity and toxicity in social media do exist and are as disgusting as ever, but they seem not to be the norm.

Averages do not tell the whole story

One critical piece of information is not displayed properly in the above dashboard: the standard deviation. I was not able to implement this variable in a meaningful way. Yes, one can mouse over a dot and get the stddev information for each data point, but this is a far cry from an understandable visual representation. There are still some insights to be gained from this data.

In the below dashboard the average polarity versus its standard deviation per hashtag is plotted. The size again corresponds to the number of tweets received. Hint: change the view from “mobile” to “Desktop View” with the second icon from the right at the bottom of the dashboard.

First, let us take a look at the plot with the German polarity. It clearly shows that most data points center around a polarity of zero. This was to be expected, since zero is the default value in case no polarity can be calculated. A slight shift into the positive can be seen in the DE polarity, while in the EN polarity this trend is even more prominent, with fewer data points scattered far away from this visible correlation. This leads to the following interpretations:

  1. The underlying sentiment of tweets for popular German hashtags has a bias towards the positive, which is bigger in English tweets than in German ones. Hence English tweets seem to be more positive in nature than German tweets.
  2. German tweets generally have a higher standard deviation on their polarity, which in turn means that German tweets are more volatile and tend to take more extreme positions than English tweets.

Whether these statements are the result of quality differences between the German and English textblob corpora, or whether they say something about German Twitter culture, is up to you to decide.

How this all works

Hashtags are crawled from here at the start of every hour and used to obtain tweets from the Twitter stream API. In this step the sentiment analysis using the textblob package is performed, and the tweets are sent to an Azure SQL DB. In this SQL DB a procedure runs hourly (triggered by Azure Data Factory) that aggregates the results into hourly time slices. Afterwards these aggregated results are sent to Google Sheets and then used as input for the above Tableau Public dashboard. This way the data in the dashboard is updated once the Google Sheet is updated.
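A hedged sketch of the ingestion step, assuming tweepy v4 for the stream and pyodbc for the database; credentials, the table name and the connection string are placeholders.

```python
# Streaming tweets, scoring them with textblob and writing them to Azure SQL.
import pyodbc
import tweepy
from textblob import TextBlob

class SentimentStream(tweepy.Stream):
    """Writes each incoming tweet plus its polarity to the SQL DB."""

    def __init__(self, consumer_key, consumer_secret, access_token,
                 access_token_secret, conn):
        super().__init__(consumer_key, consumer_secret,
                         access_token, access_token_secret)
        self.conn = conn

    def on_status(self, status):
        polarity = TextBlob(status.text).sentiment.polarity
        cursor = self.conn.cursor()
        cursor.execute(
            "INSERT INTO tweets (created_at, tweet_text, polarity) VALUES (?, ?, ?)",
            status.created_at, status.text, polarity,
        )
        self.conn.commit()

conn = pyodbc.connect("<azure-sql-connection-string>")
stream = SentimentStream("<key>", "<secret>", "<token>", "<token-secret>", conn)
stream.filter(track=["#example"])  # hashtags refreshed at the start of every hour
```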

All the code can be found here.

Afterwards I packaged everything into a Docker container, automated all remaining processes using cron jobs and uploaded this container as a private image on Docker Hub. With Kamatera I found a low-cost way of hosting this image, and now my solution is hosted and running.

CSV file to MailChimp

The NGO

This NGO is an international movement that is committed to safe escape routes, unhindered sea rescue and an end to dying at European borders.

I came into contact with them via the DSSG Berlin organisation.

The Task

Mailing lists can be obtained from a variety of sources. The central hub used by this NGO for newsletters and campaigning is the web service MailChimp.

Hence the task is set: use CSV file exports from other services (like FundraisingBox or twingle) to automatically import mailing lists into MailChimp.

The Solution

For this solution I created a Python project with a clickable shell script to execute the code. Non-technical users have to work with my solution, so I tried to make it user-friendly.

I first created individual preprocessing steps for each source in Python. The goal was to homogenise the different inputs into a consistent format.

The next step used these files and sent their contents via the MailChimp API to the MailChimp account.
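A hedged sketch of this upload step, assuming the official mailchimp-marketing package; the list ID, API key, file name and column names are placeholders, not the NGO’s actual setup.

```python
# Reading the homogenised CSV and adding each contact to a MailChimp list.
import csv

from mailchimp_marketing import Client

client = Client()
client.set_config({"api_key": "<api-key>", "server": "<server-prefix>"})  # e.g. "us1"

with open("contacts_homogenised.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        client.lists.add_list_member("<list-id>", {
            "email_address": row["email"],
            "status": "subscribed",
            "merge_fields": {"FNAME": row["first_name"], "LNAME": row["last_name"]},
        })
```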

After running tests, the code was successfully deployed and used in a real-world scenario. Now it helps alleviate some of the manual work and receives small updates from time to time.

The complete code can be found here on GitHub.

Week 24: End of the Week Update

What a week. Breakthroughs. Finally!

Portfolio Project: Twitter Analysis

I managed to find an affordable host for Docker containers: Kamatera. At $4 per month I get my own mini server with Docker installed. So I got my Twitter stream working from inside a Docker container with cron, pushed it to a private repo on Docker Hub and deployed the container on my Kamatera server. A minor annoyance is that I had to manually start cron in the container. Maybe I have to add a startup script?

Also created a first dashboard on Tableau Public. A few more will follow, but only a few steps are left until I will be ready to post a new project to my portfolio!

Insta stuff

Settled on a theme for my channel: Art & AI.

Read up on Skillshare, created some posts, posted twice. Things are starting, am kind of nervous. Took some pictures from Memo Akten and posted them. Sent Memo a mail asking for permission, but got no answer yet since he is on holiday. Let’s hope I don’t get in trouble.

Settled on Plannthat as the feed-planning app for the time being.

Style Transfer

Also got some pictures going using style transfer with TensorFlow Lite. Here are some examples:



Initially I planned to use them for some Insta posts, but maybe I can also create a portfolio page from them. Let’s see how much I can milk that cow =D

Finishing Words

Finally a nicely productive week. Sometimes one just has to push through the tough times to reap some rewards. Keep it up guys, results are just around the corner!

Week 19: End of the Week Update

This week’s learnings come from starting the second use case for the portfolio page, working for Seebrücke and getting down some bigger-picture plans.

Learnings

  • Requirements engineering: While assessing the requirements and performing a first evaluation of the solution space, I realised that BI solutions are quite expensive overall. The cheapest solutions with a complete feature set come in at about 70 euros per month per user.
    This also made way for the next learning: for small-scope problems there exist far easier solutions than enterprise-grade ones.
  • Connecting Twitter to Azure requires a running server somewhere. Even with a configured Twitter app from the developer account, there still needs to be a process somewhere that extracts (consumes) the tweets and sends them somewhere. This process needs to run permanently for aggregation. Maybe this site can be used? Further investigation is required.
  • Sentiment analysis in German is tough to find, but possible. I did find 2 – 3 sites and Python packages that offer this feature and aren’t too outdated (at most two years).
  • In order to extract information about hate in tweets, sentiment analysis is not enough. Azure’s content moderation API would be more useful, but is limited in the number of free monthly requests (ca. 167 per day). Paid requests would end up very expensive.
    Python packages performing similar tasks could not be found for German.
  • Need to get clear plans: What do I want, broad-strokes-wise? I want to build portfolio projects to learn, I want to convert them somehow into income streams, and I want to get away from working for money towards making money work for me.
    I need to find a management solution for keeping an overview. Using a small notebook is not a working solution for me.