1. 10:25 28th Jul 2013

    notes: 4

    reblogged from: buildingwanelo

    A Cost-effective Approach to Scaling Event-based Data Collection and Analysis

    Here is how we efficiently scaled event-based data collection and analysis at Wanelo.

    buildingwanelo:

    With millions of people now using Wanelo across various platforms, collecting and analyzing user actions and events becomes a pretty fun problem to solve. In most services, user actions generate aggregated records in database systems, and keeping those actions non-aggregated is not explicitly required by the product itself; it is, however, critical for other reasons such as user history, behavioral analytics, spam detection and ad hoc querying.

    If we were to split this problem into two sub-problems, they would probably be “data collection” and “data aggregation and analysis.”


     
  2. 07:22 17th Oct 2012

    notes: 12

    reblogged from: wanelo

    wanelo:

    The Wanelo you see today is a completely different website than the one that existed a few months ago. It’s been rewritten and rebuilt from the ground up, as part of a process that took about two months. We thought we’d share the details of what we did and what we learned, in case someone out…

     
  3. Simplicity is complexity resolved.
     
  4. Experiment 3: MemcacheDB Performance

    Hi folks.

    After a long time, I finally managed to write this post about my experiment with MemcacheDB.

    MemcacheDB actually uses BerkeleyDB as its backend, a high-performance database that stores key/value pairs as byte arrays and also supports multiple data items for a single key.

    MemcacheDB uses the Memcached API, so you can use any client that you already use to connect to Memcached.

    I implemented an adapter for MemcacheDB in the DB Testing Suite using one of the clients listed on the Memcached wiki. You can find several clients there for whatever language you are using.

    Here are the results:

    MemcacheDB connection initialized.
    KEY-VALUE CORRECTNESS TEST
    Correctness OK!
    MemcacheDB connection shut down.
    MemcacheDB connection initialized.
    MemcacheDB connection initialized.
    MemcacheDB connection initialized.
    MemcacheDB connection initialized.
    MemcacheDB connection initialized.
    MemcacheDB connection initialized.
    MemcacheDB connection initialized.
    MemcacheDB connection initialized.
    MemcacheDB connection initialized.
    MemcacheDB connection initialized.
    KEY-VALUE WRITE PERFORMANCE TEST (10 threads)
    Total write time: 2431 milliseconds (100000 requests)
    Write performance: 41135 writes per sec
    KEY-VALUE READ PERFORMANCE TEST (10 threads)
    Total read time: 1001 milliseconds (100000 requests)
    Read performance: 99900 reads per sec
    MemcacheDB connection shut down.
    MemcacheDB connection shut down.
    MemcacheDB connection shut down.
    MemcacheDB connection shut down.
    MemcacheDB connection shut down.
    MemcacheDB connection shut down.
    MemcacheDB connection shut down.
    MemcacheDB connection shut down.
    MemcacheDB connection shut down.
    MemcacheDB connection shut down.

    Even on my two-core, one-year-old MacBook, I was impressed with the performance.

    Of course, the client implementation affects performance. If I have time, I will try different clients and post comparative results here.

     
  5. Experiment 2: Redis Performance

    And here comes the first experiment with DBTestingSuite :).

    Redis

    Before going into the details, I want to describe the two tests I have written for the suite.

    1. KeyValueCorrectness: Actually this is a straightforward test to check the correctness of a Key/Value DB or its binding. It writes random values to random keys, reads them back, and asserts that the value read equals the value written. This may seem somewhat ridiculous (a DB should achieve this; why would I use a DB that cannot even return what I supplied before? :)), but the test is also useful for validating the bindings or clients of a DB brand, some of which are implemented by the community (I may need this :)). The test is really simple and extends the non-threaded version of the Key/Value DB test abstract.
    2. KeyValuePerformance: I have been reading about load testing on DBs. If you want to push load onto a DB and test its performance, you should avoid overhead and focus on the pure performance of the DB. Using the local loopback (127.0.0.1) for requests achieves minimum ping time. The size of the values in a Key/Value DB also matters, so I left it customizable. Most DBs serve requests with multiple threads (the number is fixed at start time or dynamic). KeyValuePerformance is a threaded test composed of write and read phases. When testing write performance, it fires x threads to write n data values in total, each of length l. The number of threads can be chosen based on the number of cores on the machine, or just a number that generates enough CPU load, where “enough” equals “maximum” :) (to be watched from Activity Monitor :)). n mostly determines the test duration, but the test calculates the average writes per second, which is the number we are actually looking for. The read phase works similarly. I measured the performance-test overhead with a large n (and an average l): the total write overhead was 86 milliseconds and the read overhead was 46 milliseconds, which is negligible when a proper test runs for 5-10 seconds. These overheads are also equal across all DB brands (or clients).
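    The threaded write phase described above can be sketched roughly like this. This is only an illustration: the ConcurrentHashMap stands in for a real Key/Value connection adapter, and all class and method names here (KeyValuePerfSketch, writePhase) are hypothetical, not the actual DBTestingSuite classes.

```java
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class KeyValuePerfSketch {
    // In-memory stand-in for a real Key/Value connection adapter.
    static final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    static String randomValue(Random rnd, int l) {
        StringBuilder sb = new StringBuilder(l);
        for (int i = 0; i < l; i++) sb.append((char) ('a' + rnd.nextInt(26)));
        return sb.toString();
    }

    // Fires x threads that together write n values of length l,
    // then returns the average writes per second.
    static double writePhase(int x, int n, int l) {
        final int perThread = n / x;
        CountDownLatch done = new CountDownLatch(x);
        long start = System.nanoTime();
        for (int t = 0; t < x; t++) {
            final int id = t;
            new Thread(() -> {
                Random rnd = new Random(id);
                for (int i = 0; i < perThread; i++)
                    store.put("key-" + id + "-" + i, randomValue(rnd, l));
                done.countDown();
            }).start();
        }
        try { done.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        double secs = (System.nanoTime() - start) / 1e9;
        return (perThread * x) / secs;  // average writes per second
    }

    public static void main(String[] args) {
        System.out.printf("Write performance: %.0f writes per sec%n", writePhase(2, 10000, 32));
    }
}
```

    The read phase is symmetric: the same threads issue gets against the keys they wrote and the average reads per second is computed the same way.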

    I have used JRedis as the java client for Redis.


    The testing environment is again my little white macbook :)

    2.4 GHz Intel Core 2 Duo (3MB L2 Cache, Bus at 800 Mhz)
    4 GB DDR2 SD Ram at 667 Mhz
    150 GB Hitachi Disk on SATA (1.5 Gigabit)

    And here are the results for Redis, where x = 2, n = 10000, and l = 32:

    Redis connection initialized.
    KEY-VALUE CORRECTNESS TEST
    Correctness OK!
    Redis connection shut down.
    Redis connection initialized.
    Redis connection initialized.
    KEY-VALUE WRITE PERFORMANCE TEST (2 threads)
    Total write time: 742 milliseconds (10000 requests)
    Write performance: 13477 writes per sec
    KEY-VALUE READ PERFORMANCE TEST (2 threads)
    Total read time: 757 milliseconds (10000 requests)
    Read performance: 13210 reads per sec
    Redis connection shut down.
    Redis connection shut down.

    Actually the result surprised me a little. As far as I know (and as stated on the Redis project site), Redis can achieve speeds like 110000 writes per sec and 80000 reads per sec. So I ran the Redis benchmark that comes with the package:

    atasay-gokkayas-macbook:redis-1.0 rincewind$ ./redis-benchmark -q -n 10000
    SET: 28194.37 requests per second
    GET: 26525.20 requests per second
    INCR: 28850.14 requests per second
    LPUSH: 30183.74 requests per second
    LPOP: 29618.34 requests per second
    PING: 31734.18 requests per second

    The benchmark results showed that Redis was performing about 28000 writes per sec and 26500 reads per sec in my environment, which largely explained my surprise. The difference between the numbers on the project site and the numbers from ./redis-benchmark is probably due to the environment, and I think the speed difference between my results and ./redis-benchmark stems from the language (the benchmark is written in C) and/or the client that I used.

    Another important point that my tests and ./redis-benchmark together show is that Redis performs better on writes. In its default configuration, Redis writes to memory and periodically syncs to disk. This can be very helpful in some use cases and is a distinguishing strength of Redis, though it also constitutes a trade-off.
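    For context, this memory-first behavior is governed by the snapshotting (save) directives in redis.conf; the values below are the commonly shipped defaults, which may differ between Redis versions:

```
# redis.conf: keep the dataset in memory and dump an RDB
# snapshot to disk when one of these thresholds is reached
save 900 1      # after 900 sec if at least 1 key changed
save 300 10     # after 300 sec if at least 10 keys changed
save 60 10000   # after 60 sec if at least 10000 keys changed
```

    Writes are acknowledged as soon as they hit memory, which is exactly why the write numbers can beat the read numbers; the trade-off is that a crash can lose everything written since the last snapshot.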

    If I have time, I’ll try the same tests on Redis with different clients. And of course with other Key/Value DBs :).

    Kthxbai.

     
  6. Implementation 1: DB Testing Suite

    I have implemented a DB Testing Suite in Java, and its name is DBTestingSuite :).

    I know that there are several DB benchmarking tools out there but I just wanted to implement my own tool that will help me on my further experiments. I’ve chosen Java as the language since it’s high-level :) and markedly fast with the latest major versions and JREs, and most DB brands have mature Java bindings.

    It is abstracted over DB types and brands.

    In my definition, types are DB types like Key/Value, Column-Oriented, etc., and brands are DB brands like Redis, HBase, Tokyo Cabinet, etc.

    In DBTestingSuite, there are connection interfaces for the DB types, such as a KeyValueConnection interface. Every DB brand has its own connection adapter that implements the connection interface of its type.

    And there are threaded and non-threaded test abstracts for different DB types. Every test will extend either the threaded or non-threaded abstract for its DB type according to the nature of the test. For instance, KeyValueCorrectness test uses non-threaded Key/Value test abstract, while KeyValuePerformance test uses threaded Key/Value test abstract.

    With this approach, experimenting with a new DB brand will be as easy as implementing a DB adapter for it that implements the connection interface of its type. Different adapters can also easily be written for different client libraries or bindings of the same DB brand, in order to compare the performance of the client libraries. And when a new test is written, it is immediately applicable to all brands of its type.

    In order to avoid coupling, since I didn’t use a framework that provides a way to bind class types to container classes (dependency injection), I manually inject the adapter classes into the generic testing class constructors, and the lower layers use the supplied connections without knowing which kind of connections they are.
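    As a rough sketch of that layering, with hypothetical names (the tool isn’t public yet, so none of this is the actual DBTestingSuite code):

```java
import java.util.HashMap;
import java.util.Map;

// Connection interface for the Key/Value DB type.
interface KeyValueConnection {
    void put(String key, String value);
    String get(String key);
}

// One adapter per DB brand; this in-memory map stands in for
// a real client such as JRedis behind the same interface.
class InMemoryAdapter implements KeyValueConnection {
    private final Map<String, String> data = new HashMap<>();
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
}

// Generic test class: the adapter is injected through the
// constructor, so the test never knows which brand it talks to.
class KeyValueCorrectnessSketch {
    private final KeyValueConnection conn;
    KeyValueCorrectnessSketch(KeyValueConnection conn) { this.conn = conn; }

    boolean run() {
        conn.put("k", "v");
        return "v".equals(conn.get("k"));
    }
}

public class SuiteSketch {
    public static void main(String[] args) {
        boolean ok = new KeyValueCorrectnessSketch(new InMemoryAdapter()).run();
        System.out.println(ok ? "Correctness OK!" : "Correctness FAILED");
    }
}
```

    A new brand then only needs another small adapter class; the test itself never changes.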

    The tool is in pre-alpha; I will share it when it becomes more mature :).

     
  7. Experiment 1: Counting words with Hadoop.

    Today I installed Hadoop Core on my white MacBook (currently I have only one, though :)).


    I remember how much I loved using map and reduce in languages following the functional paradigm (like Haskell or Lisp) during my undergraduate years. I know that Google’s Map/Reduce is quite different from that, but I felt the same “love” again after all these years.

    Deploying Hadoop Core was pretty easy though; there is a very straightforward guide on the Hadoop wiki:

    Running Hadoop On OS X 10.5 64-bit (Single-Node Cluster)

    Hadoop needs Java SE 1.6 to run, so I updated Java first and configured it from Java Preferences.

    A little problem was that I had initially set JAVA_HOME in hadoop-env.sh to the Java SE 1.5 directory, then updated the Java RE and forgot to change JAVA_HOME to point to the new one, which produced:

    Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file

    After starting the NameNode, DataNode, JobTracker and TaskTracker (one of each) on the same machine, Hadoop was good to go for experiments.

    The package comes with several examples: tiny instances of big real-life problems. Counting the occurrences of words across different files is one of them. In the WordCount example:

    Each mapper takes a line as input and breaks it into words. It then emits a key/value pair of the word and 1. Each reducer sums the counts for each word and emits a single key/value with the word and sum. As an optimization, the reducer is also used as a combiner on the map outputs.
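    Outside of Hadoop, the same map-then-reduce flow can be sketched in a few lines of plain Java; this only mirrors the logic of the example, it is not an actual Mapper/Reducer job:

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // "Map" each line into (word, 1) pairs and "reduce" by summing,
    // mirroring what the Hadoop WordCount example does in parallel.
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> sums = new HashMap<>();
        for (String line : lines)
            for (String word : line.trim().split("\\s+"))
                if (!word.isEmpty())
                    sums.merge(word, 1, Integer::sum);
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(count(new String[] { "to be or not to be" }));
    }
}
```

    Hadoop’s value, of course, is that the map and reduce steps are distributed across machines and fed from HDFS instead of a local array.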

    This was my first experiment and an initial touch on HDFS. The next experiment might be on HBase :)

     
  8. 03:50 5th Jun 2009

    notes: 4

    Scientists study the world as it is; engineers create the world that has never been.
     
  9. Dream Theater have announced their new album, “Black Clouds & Silver Linings”. It’s their tenth studio album.

    It’s coming out on the 23rd of June. As pre-release promotion, they have made an mp3 of the song “A Rite of Passage” available for free. It’s obvious that their sound is getting harder without losing any of the power of their melodies, and that is the sound I love to hear! Incredible harmonies, stop-and-gos… I’m curious to hear the other songs on the album.

    And here are the details.

     
  10. Hello world!

    #include <stdlib.h>

    /* Allocates one byte and never frees it: a deliberate memory leak. */
    void leak(void) {
        void *m = malloc(1);
        (void) m;  /* the pointer is dropped, so the byte can never be freed */
    }

    int main(void) {
        while (1)
            leak();  /* leak forever */
        return 0;
    }