Countif and Sort, 1 Million Times Faster

From time to time, I have to deal with CSV files that run to millions of rows each, whether the data comes from large datasets or from scraping.

I sometimes extract the followers of an Instagram account to find out which other accounts it competes with for its audience.

Take an account with 10,000 followers, each of whom follows about 800 accounts on average: that comes to 8,000,000 rows in total. And this data was compiled in a CSV, not in a database (oops).

The question is: which accounts are most followed by the followers of a given account?

In Excel, this is a basic COUNTIF question.

But Excel only handles a little over 1,000,000 rows. Worse, try running a =COUNTIF() over 1 million rows: you will get the result two days later.
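To see quickly whether a file would even fit in a spreadsheet, here is a minimal sketch that counts CSV records as a stream and compares the total with the 1,048,576-row limit of a modern Excel worksheet. The helper names `count_rows` and `fits_in_excel` and the sample rows are made up for illustration.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # row limit of a modern Excel worksheet

def count_rows(lines):
    """Count CSV records one at a time, without loading the file into memory."""
    return sum(1 for _ in csv.reader(lines))

def fits_in_excel(row_count):
    """Return True if that many rows fit in a single worksheet."""
    return row_count <= EXCEL_MAX_ROWS

# Tiny in-memory stand-in for a real file (hypothetical sample data).
sample = io.StringIO("123456,7890987,following1\n123456,0987890,following2\n")
n = count_rows(sample)
print(n, fits_in_excel(n))          # 2 True
print(fits_in_excel(10_000 * 800))  # False: 8,000,000 rows is far too many
```

With a real file, passing the object returned by `open(path, newline='')` to `count_rows` works the same way.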

So here is a small Python script that does the job in under 2 minutes. I have left comments in the script below to explain it.

import collections
import csv

# The Counter plays the role of the COUNTIF column: username -> number of occurrences
id_followed = collections.Counter()

# The output-followings.csv file structure was:
# id_follower, id_following, username_following
# 123456, 7890987, following1
# 123456, 0987890, following2
with open('output-followings.csv', 'r', newline='') as input_file:
    # skipinitialspace drops the space after each comma, in case the file has one as in the sample above
    for row in csv.reader(input_file, delimiter=',', skipinitialspace=True):
        id_followed[row[2]] += 1  # increment the count for this username

# sorted.csv: the 1,000 most-followed accounts, highest count first
with open('sorted.csv', 'w', newline='') as sortedfile:
    writer = csv.writer(sortedfile)
    for key, value in id_followed.most_common(1000):  # most_common() returns entries already sorted by count
        writer.writerow([key, value])  # one row per account: username, count

# occurences.csv: every account with its count, unsorted
with open('occurences.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for key, value in id_followed.items():
        writer.writerow([key, value])
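To check the logic before launching it on millions of rows, the counting and sorting steps can be tried on a handful of rows held in memory. This is only a sketch: the function names `count_followed` and `to_csv` and the sample usernames are invented for the example, and the third column is assumed to hold the username, as in the file structure described in the script.

```python
import collections
import csv
import io

def count_followed(lines, column=2):
    """Count how often each value of `column` appears (the COUNTIF step)."""
    counts = collections.Counter()
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) > column:  # skip blank or short rows
            counts[row[column]] += 1
    return counts

def to_csv(pairs):
    """Serialize (username, count) pairs as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(pairs)
    return buffer.getvalue()

# Hypothetical sample: id_follower, id_following, username_following
sample = io.StringIO(
    "111, 900, alice\n"
    "222, 900, alice\n"
    "222, 901, bob\n"
    "333, 900, alice\n"
    "333, 902, carol\n"
    "333, 901, bob\n"
)
counts = count_followed(sample)
top = counts.most_common(2)  # the SORT step: highest counts first
print(top)  # [('alice', 3), ('bob', 2)]
print(to_csv(top))
```

Because the Counter only stores one entry per distinct username, memory use grows with the number of distinct accounts, not with the number of rows read.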