Case Study: Using Redis intersect on very large sets

A week or two ago, we needed to calculate the difference between two large MySQL tables.

The client had CSV dumps of both tables in text files, so we decided to import those dumps into two Redis sets and see what would happen. The import script for the first dump looked like this:

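<?php
// config.php is assumed to create $r, a connected Redis client (e.g. phpredis)
require_once "config.php";

$file = "dump_keys.001.txt.gz";

echo "Importing $file...\n";

$i = 0;

if ($F = gzopen($file, "r")){
        // one MySQL record key per line of the gzipped dump
        while( ( $line = gzgets($F, 1024)) !== false){
                $line = trim(strtolower($line));

                // SADD ignores members that are already in the set
                $r->sadd("DUMP1", $line);

                printf("%10d %s\n", $i, $line);

                $i++;
        }

        gzclose($F);
}
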
Each dump had around 120,000,000 items (keys of MySQL records), and each was imported in under 30 minutes.
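The loop above sends one SADD per line, so 120,000,000 items in under 30 minutes means at least about 67,000 commands per second, each with its own round trip. A minimal sketch of a batched variant follows; it is not what we ran. It assumes the phpredis extension (whose sAdd() accepts several members at once) and Redis 2.4 or later for variadic SADD. The function names readBatches() and importDump() and the batch size of 1000 are hypothetical choices.

```php
<?php
/**
 * Hypothetical helper: read a gzipped dump and yield normalised keys
 * in batches of $size lines.
 */
function readBatches(string $file, int $size = 1000): Generator
{
    $F = gzopen($file, "r");
    if ($F === false) {
        throw new RuntimeException("Cannot open $file");
    }

    $batch = [];
    while (($line = gzgets($F, 1024)) !== false) {
        $line = trim(strtolower($line));
        if ($line === "") {
            continue; // skip blank lines instead of storing an empty member
        }
        $batch[] = $line;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    gzclose($F);

    if ($batch) {
        yield $batch; // last, partially filled batch
    }
}

/**
 * Hypothetical importer: one variadic SADD per batch instead of one per key.
 * $r is assumed to be a connected phpredis Redis object.
 * Returns the number of keys sent (duplicates included).
 */
function importDump($r, string $file, string $key, int $size = 1000): int
{
    $count = 0;
    foreach (readBatches($file, $size) as $batch) {
        $r->sAdd($key, ...$batch);
        $count += count($batch);
    }
    return $count;
}
```

With a batch size of 1000, the number of round trips drops by roughly a factor of 1000; pipelining (phpredis `multi(Redis::PIPELINE)`) would be another way to cut them.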

Redis memory usage was 21.87 GB.
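As a rough sanity check, assuming the 21.87 GB covers both sets (2 × 120,000,000 members) and little else, and that it is in binary gigabytes as Redis INFO reports it, each member costs about 98 bytes (about 91 bytes if GB means 10^9 bytes):

```latex
\frac{21.87 \times 2^{30}\ \text{bytes}}{2 \times 120\,000\,000\ \text{members}}
  \approx \frac{2.348 \times 10^{10}}{2.4 \times 10^{8}}
  \approx 98\ \text{bytes per member}
```

That per-member figure is a handy starting point when sizing a machine for a similar job.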

Then we ran the intersection:

(integer) 2655487

The intersection took less than 300 seconds (under 5 minutes).
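Redis computes SINTER and SINTERSTORE by sorting the input sets by size, walking the smallest one and checking each member for membership in the others, so the cost is O(N × M) in the worst case, with N the size of the smallest set and M the number of sets. With two sets of about 120,000,000 members, that is about 120,000,000 hash lookups in under 300 seconds, i.e. more than 400,000 lookups per second. The sketch below mirrors that strategy using PHP arrays as hash sets; it illustrates the algorithm and is not Redis's own code. The names intersectSets() and makeSet() are hypothetical.

```php
<?php
/**
 * Intersect several sets the way Redis's SINTER does: iterate the smallest
 * set and keep only members present in every other set.
 * Each set is a PHP array whose keys are the members, which gives
 * O(1) average membership checks via isset().
 *
 * @return string[] members present in all sets
 */
function intersectSets(array ...$sets): array
{
    if (count($sets) === 0) {
        return [];
    }

    // Smallest set first: its size bounds the number of lookups.
    usort($sets, fn(array $a, array $b): int => count($a) <=> count($b));
    $smallest = array_shift($sets);

    $result = [];
    foreach ($smallest as $member => $_) {
        foreach ($sets as $other) {
            if (!isset($other[$member])) {
                continue 2; // missing from one set: not in the intersection
            }
        }
        // PHP turns numeric-string keys into ints, so cast back to string.
        $result[] = (string) $member;
    }
    return $result;
}

// Build a set (member => true) from a list of strings.
function makeSet(array $members): array
{
    return array_fill_keys($members, true);
}
```

For example, `intersectSets(makeSet(["a", "b"]), makeSet(["b", "c"]))` returns `["b"]`.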

Finally, we wrote a script that SPOPs members from the result set and does some MySQL processing on each. The whole "difference calculation" was done in less than 2 hours.
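A minimal sketch of such a drain loop, assuming phpredis, where sPop() returns false once the set is empty. The function name drainSet() and the $process callback, which stands in for the MySQL work the article does not show, are hypothetical.

```php
<?php
/**
 * Hypothetical drain loop: pop members from a Redis set one at a time
 * and hand each one to $process until the set is empty.
 * $r is assumed to be a connected phpredis client; phpredis returns
 * false from sPop() when the key is empty or missing.
 * Returns the number of members processed.
 */
function drainSet($r, string $key, callable $process): int
{
    $done = 0;
    while (($member = $r->sPop($key)) !== false) {
        $process($member); // e.g. look the key up in MySQL
        $done++;
    }
    return $done;
}
```

Because SPOP removes each member atomically, several copies of this loop could run in parallel without two workers receiving the same key. SPOP returns members in random order, so processing order is not insertion order.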

We did not use the cloud, because we have lots of cheap dedicated servers, but this example is a perfect demonstration of the power of cloud computing.

If you use Amazon AWS, a "Double Extra Large" instance is the way to go. Since it is a one-time job, suppose it takes 3 hours at 0.90 USD per hour.
That is 3 × 0.90 = 2.70 USD, so less than 3 USD to get the job done.


Articles library

Redis as session handler in PHP

How to use Redis as a session handler in PHP, in a way similar to msession or PostgreSQL sessions
Date: 2011-07

Tags: system administration php

Lazy starter script on Linux such as Red Hat, CentOS or Fedora

How to quickly make Redis start after Linux boot.
Date: 2011-07

Tags: system administration

Redis vs Memcached comparison

Here is a basic comparison of Redis, Memcached and Memcachedb.
Date: 2011-07

Tags: memcached

Why RDBMS and SQL are difficult...

...while sometimes a NoSQL solution may be easier and way faster? This article deals with incremental counters and how they need to be implemented properly in RDBMSes such as MySQL.
Date: 2011-07
Updated: 2011-09-30

Tags: anti-pattern programming mysql

Seamless migration from one Redis server to another

How to use replication to migrate Redis server without service interruption
Date: 2011-07

Tags: system administration

Redis swap issue while save

Explanation of why your server's load average may spike and the server may swap memory when running Redis
Date: 2011-08-21

Tags: system administration

Amazon EC2 and Amazon ElastiCache service

Memcached at Amazon first impressions
Date: 2011-08-23

Tags: system administration memcached cloud

Understanding hash-max-zipmap-entries and 'hash of hashes' optimization

Explanation of config parameters such as hash-max-zipmap-entries and how we can make huge memory savings in the new Redis 2.2
Date: 2011-08-28

Tags: optimization system administration programming php

NoSQL database design example

This article explains how we made our article section
Date: 2011-09-02

Tags: database design programming mysql

Redis save and backup script

Simple backup script for Redis
Date: 2011-09-28

Tags: system administration optimization

Postgres 9.1 foreign data wrapper interface

Postgres 9.1 gives you ability to access data from different data sources including Redis
Date: 2011-10-20

Tags: programming mysql pgsql memcached optimization

Redis high traffic connection issue

Explanation of the 'Cannot assign requested address' connection error
Date: 2011-11-12

Tags: optimization system administration scaling

Redis too many open files error on high traffic sites

Explanation of the 'Accepting client connection: accept: Too many open files' error
Date: 2011-12-21

Tags: system administration memcached security

PHP Redis bug with PHP 5.4

Description of a PHP Redis bug
Date: 2012-07-02

Tags: system administration php

Case Study: Using Redis intersect on very large sets

Intersection of 2 x 120M MySQL records
Date: 2012-08-29

Tags: programming mysql optimization cloud php

Getting started with Python and Redis on CentOS 6.X

How to install and test Redis client for Python on CentOS
Date: 2013-03-21

Tags: system administration python

GeoIP in Redis

Put Maxmind's GeoIP in Redis for best performance for client apps written in PHP or Python
Date: 2013-03-24

Tags: programming mysql optimization php scaling python

Redis database file examination

Examination of Redis rdb file in order to find huge memory consumption
Date: 2013-08-23

Tags: optimization system administration

© Jul.2011 - 2015,