Countif and Sort, 1 million times faster

From time to time, I have to deal with CSVs that are millions of lines long, whether the data comes from large datasets or from web scraping.

One time, I extracted the followings of the followers of an account, to find out which other accounts it was competing with for its audience.

That was a 10k-follower account and, give or take, each follower was following 800 accounts on average: that's 8,000,000 lines in total. And I had this data in a CSV, not in a database (oops).

The question is: which accounts are the most followed by the followers of an account?

That's a basic COUNTIF question.
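
To make that concrete, here is a toy version of the problem in Python. The usernames are made up for illustration; the point is only that a Counter tallies every distinct value in one go, where =COUNTIF() answers for one value at a time.

```python
from collections import Counter

# Hypothetical sample: one entry per (follower, followed account) pair,
# keeping only the username of the followed account.
followings = ["alice", "bob", "alice", "carol", "alice", "bob"]

# Counter tallies every distinct value at once, which is what
# =COUNTIF(range, value) does for a single value.
counts = Counter(followings)

# most_common(n) returns (value, count) pairs sorted by count, highest first.
top_accounts = counts.most_common(2)
print(top_accounts)  # [('alice', 3), ('bob', 2)]
```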

Excel only handles a little more than 1,000,000 rows per sheet (1,048,576, to be exact). Worse, try running =COUNTIF() on 1 million rows: you may still be waiting for the result two days later.
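
A rough way to see why the spreadsheet route is so slow, assuming each COUNTIF cell rescans the whole column (that is an assumption about how it is evaluated, not something I measured): with 1 million rows, that is about 1 million x 1 million comparisons, against a single pass over 1 million rows for a tally. The sketch below mimics both approaches on a tiny list; the function names are made up for illustration.

```python
from collections import Counter

def countif_per_row(values):
    # Spreadsheet-style: one full scan of the column for every row,
    # so roughly len(values) ** 2 comparisons in total.
    return [values.count(v) for v in values]

def count_single_pass(values):
    # One pass to build the tally, then one dictionary lookup per row.
    tally = Counter(values)
    return [tally[v] for v in values]

sample = ["a", "b", "a", "c", "a"]

# Both give the same per-row counts; only the amount of work differs.
print(countif_per_row(sample))
print(count_single_pass(sample))
```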

So here I'm just sharing a little Python script that does the job in less than 2 minutes. I've left comments in the script below to explain it.

Hope it'll help someone!

import csv
import collections

## the output-followings.csv file structure was:
## id_follower, id_following, username_following
## 123456, 7890987, following1
## 123456, 0987890, following2
## (if your file starts with that header line, it gets counted once as a username)
id_followed = collections.Counter() ## we initialize the COUNTIF column as a Counter

with open('output-followings.csv', 'r', newline='') as input_file:
    for row in csv.reader(input_file, delimiter=','):
        ## no print() in this loop: printing millions of lines to the console slows it down a lot
        id_followed[row[2]] += 1 ## increment in the Counter corresponding to the username

## second output file: the 1,000 most frequent usernames, sorted
with open('sorted.csv', 'w', newline='') as sortedfile:
    writer = csv.writer(sortedfile)
    for key, value in id_followed.most_common(1000): ## most_common() returns the entries sorted by count, highest first
        writer.writerow([key, value]) ## one row per username: username, count

## first output file: every username with its count, unsorted
with open('occurences.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for key, value in id_followed.items():
        writer.writerow([key, value]) ## we write all the rows in occurences.csv
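
The sample layout in the comments shows a header line and a space after each comma. If your file really looks like that, a variant along these lines might be handy. It is only a sketch: count_column, write_counts and their parameters are names I made up, and whether you need has_header or skipinitialspace depends on your file.

```python
import csv
from collections import Counter

def count_column(path, column=2, has_header=True):
    """Count how often each value appears in one column of a CSV file."""
    counts = Counter()
    with open(path, "r", newline="") as input_file:
        # skipinitialspace drops the blank that follows each comma.
        reader = csv.reader(input_file, skipinitialspace=True)
        if has_header:
            next(reader, None)  # skip the header row, if the file has one
        for row in reader:
            if len(row) > column:  # ignore blank or short lines
                counts[row[column]] += 1
    return counts

def write_counts(pairs, path):
    """Write (value, count) pairs to a CSV file, one pair per row."""
    with open(path, "w", newline="") as output_file:
        csv.writer(output_file).writerows(pairs)
```

For the sorted top 1,000, that would be write_counts(count_column('output-followings.csv').most_common(1000), 'sorted.csv').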