Sunday, August 5, 2018

Concurrency bug in a real banking system

In this era of banking where online banking and mobile banking has became a daily affair for many of us (esp. in cities), any technical bug in the backend software used by banks can have an impact whose severity is difficult to assess without the occurrence of the incident. While the primary onus of correctness of the software is on the verification engineer, in the era of multi-processing and multi-threading, the design structure and the coding style matters a lot. Many of the artifacts of complexities in the current systems are non-deterministic and non-repeatable and cannot be verified with guarantees. Thus, the designer is also equally responsible for a bug that leaks into the production system.

Recently, I hit up on one such bug in a recent banking transaction, which we have studied as a fundamental problem in concurrent systems. Apparently, the example studied in books consider the scenario of banking transaction and hitting the bug in reality reminded me of the same. To explain the situation, I had INR 7316.88 in my account. I transferred INR 3000.00, which attracted a service charge and tax of INR (50 + 9). This triggered 2 transactions in parallel - deducting the 3000.00 and deducting 59.00 as service charge from the same account. It could have possibly happened in sequence - 3000.00 deducted first and then 59.00. But somehow, the banking system's code treated them as 2 parallel threads and invoked them in parallel. Now what happens - 3000.00 deducted from 7316.88 leading to a balance of 4316.88. Now 59.00 should be deducted from 4316.88, but due to this happening in parallel with 3000.00 deduction, it deducted 59.00 from the original amount of 7316.88, leading to a balance of 7257.88. The transaction of 59.00 got completed later than 3000.00, ending up with a total balance of 7257.88 in my account. The below snapshot proves this incident (listed in reverse order of date - the last entry is older transaction than previous).

This is a clear concurrency bug which could have been avoided by using proper locks in the software code (for those who understand the terminologies). What this incident brings to forefront is that there are bugs for such simple situations still existing in production banking systems which many of us just rely on and use regularly. This one ended up being beneficial for me and hence I am not worried. What if the reverse had happened - The final balance turned out to be lower than expected. I would be running from one bank counter to another, and they would blame it on the software without being able to help themselves. This might get resolved, but would take time. What if people do not notice this - then it is a matter of chance that it would be fixed.

I was able to withdraw 7000.00 later from the account, leading to no way that the software could fix the problem some days later. I am waiting to see what happens and how the system goes with this. I have no intent to keep this amount and I would eventually return it, once the bank identifies their issue.

I would expect that the application developers understand the severity of such incidents and do their due diligence in ensuring the quality of their software. I would also hope that the bank staff and the customers do not trust these systems blindly.

Happy online banking to all !!

Sunday, October 25, 2015

Clang (LLVM) versus gcc

Below are some of my experiments which list the difference between clang and gcc seen as a user.
(clang version 3.6, gcc version 4.8.4)

Expt. 1: Memory Allocation:
If I declare something like the below (small arrays):
    char a[4];
    char b[4];
    char c[4];

gcc allocates memory addresses to a, b, c which are ending in 0 (aligned to 16 bytes) even if the actual usage is only 4 bytes for each of them.
But clang is able to analyze the size of the array and allocates space only what is needed, thus preventing memory holes.
gcc output:
Addresses for char:
clang output:
Addresses for char:

Expt. 2: Register Allocation:
I have the below testcase, which I am compiling using -O3 optimization.

int main(){
    register int i;
    int a[1000];
    for(i=0;i<1000;i++)          //loop-1
    printf ("%p\n",&i);             //line-1

When I remove line-1, then variable i is allocated register as expected. But when I add line-1, then variable i is not allocated register, but referenced from memory for each iteration of the for loop (loop-1). This was unexpected and on further probing, it was understood that because printf is an extern function and may be spawning threads and modifying variable i using pointers, compiler tries to be pessimistic and references variable from memory.
But the for loop occurs before the printf is called. In no way, the external function could harm using i from register in loop-1. This behavior is properly analyzed by clang but not by gcc. See the disassembly for both of them below. It is clear that clang allocates register rbx to variable i but gcc references it from memory always and hence clang does a better job here.

gcc code:
  400497:    8b 04 24                 mov    (%rsp),%eax
  40049a:    83 c0 01                 add    $0x1,%eax
  40049d:    3d e7 03 00 00           cmp    $0x3e7,%eax
  4004a2:    89 04 24                 mov    %eax,(%rsp)
  4004a5:    7e d9                    jle    400480 <main+0x10>

llvm code:

 40055f:    48 ff c3                 inc    %rbx
 400562:    48 81 fb e8 03 00 00     cmp    $0x3e8,%rbx
 400569:    75 e5                    jne    400550 <main+0x20>

Expt. 3: Memory Allocation:
gcc allocates memory addresses to local variables such that the smaller sized variables are allocated first, followed by larger ones and so on. In this process, it may happen that some of the intermediate addresses are kept unused (for alignment). e.g. char are allocated first, then short and then int.
clang allocates memory addresses in reverse order as gcc. This prevents any unused address space in between the allocated addresses and packs the variables compactly.

while gcc relatively performs bad in terms of space, there is one benefit of such allocation. gcc tries to maximize the number of variables which are closer to the SP and hence would be easily accessible by the immediate offset relative to SP. For variables which are allocated large address offset w.r.t. SP (which can happen if too many variables in the function), far variables cannot be accessed using immediate offset and will require extra processing. Allocating small size variables first enables gcc to enable more variables to be accessed faster compared to llvm.

Sample output from gcc and llvm are printed below:

    char a;
    char c;
    short aa;
    short ab;
    char b;
    int bb;

    printf ("Addresses for char:\n%p\n%p\n%p\n",&a,&b,&c);
    printf ("\nAddresses for short:\n%p\n%p\n",&aa,&ab);
    printf ("\nAddresses for int:\n%p\n\n",&bb);

gcc output:
Addresses for char:

Addresses for short:

Addresses for int:

llvm output:
Addresses for char:

Addresses for short:

Addresses for int: