Saturday, April 5, 2014

BMDiff - Bentley and Mclroy's compression of long common strings

       A java implementation inspired from BigTable paper[1] description about Bentley and McIlroy’s long string compression scheme[2].

[1] - http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf

[2] - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.11.8470&rep=rep1&type=pdf

Source Code

Github project BMDiff.

Dry run performance numbers

  •  JITed version maximum speed Encoding - 35MB/s
    Decoding - 330MB/s
    The speeds reported above are due to tests with following parameters
    a) 3Gz dual core iMac (8GB RAM) with 1GB allotted to the VM(JDK 1.8 preview).
    b) Block length = 32
    c) UTF-8 text file sizes 150KB and 7MB.
    The speed changes when block length changes(smaller block lengths decrease the speed, larger block length decreases compression).
  • There is scope for speed improvement which is the target for next version as this version the target was to produce correct enc/dec.
  • There is a slight twist to the original algorithm in the implementation which is when repeated string occur together then still they are chained and represented using notation but for the decoder to decipher this properly for such cases the encoding is instead.

Related Bugs in the open:

HBASE - https://issues.apache.org/jira/browse/HBASE-2655
Hadoop - https://issues.apache.org/jira/browse/HADOOP-5793

Thursday, March 20, 2014

TapTime, a selective profiler

About TapTime :

          A very light weight mechanism based on byte code injection to get method wall clock times and basic metrics like call count, lowest and max CPU times. Why not dynamic proxy ?(Reflection calls overhead and requirement to rewrite the Monitored Class to implement and enforce interface which is painful as you may only want to trace 1 method but when more methods have to tapped the interface has to be changed. Not practical at all. How is this different from any other profiling tool ? It is suitable for selective profiling unlike most profilers which instrument the entire application, this  tool allows the developer to select only interesting parts of the application to be profiled.

 How to use the utility:

1.  Include javassist.jar and TapTime.jar in the build path /class path.
2.  To trace methods at runtime annotate them with @Tap

Usage:

TapTime.init("package");
1) This has to be the very first line(preferably in a static initializer)
2) Do not maintain any static roots in the main class of those classes whose methods you want to tap.

The utility methods users may invoke

1) printStats();
Method avg,min,max time clocked including independant counts.

2) printMethodsTracked();
All the tapped methods are listed out.

3) printCallStackSimple();
Ex: prints like following CALL STACK(SIMPLE)
0(1) 1(1) 1(0) 2(1) 3(1)  3(0) 3(1) 3(0) 2(0) 1(1) 1(0) 0(0)
n(m)
- where n = methodId and m = 0/1 where 1 = entered into method, 0 = left from method.

4) printCallStackHierarchy();
Same as printCallStackSimple() but prints in a tree like format for methods traced instead  of mundane numbers.

5) generateCommonCallSequences(); // This is the flagship method for statistics. Prints the 1 ,2 3 upto deepest level of all possible call sequence counts.

Ex:
abhi->1
snowy->1
arjun->1
tintin->snowy->1
arjun->abhi->1
tintin->2


Deep dive:

         We initially created a TapClassLoader with a custom findClass() implemented which injected instrumentation code on specific classes whose methods are annotated for Tap. Though this can do the inverse delegation of class loading it had to use reflection to instantiate objects of classes under Tap. This is cumbersome from user’s perspective and transfers the burden onto user. We can also make this the System class loader through -Djava.system.class.loader=taptime.TapClassLoader ,unfortunate reality is that all your classes still will be loaded by the parent class loader because of delegation model and not by your TapClassLoader. This applies even when you do 
      Thread.currentThread().setContextClassloader(tapClassLoader).
       Though you can say that this is my context class loader java will delegate the call to parent and if parent class loader is able to load the class then thats it. But if you have some weird location from which you are loading classes which are not visible to the parent class loader then you are lucky and can load the classes from the current class loader which is again because of delegation model(This is how class loaders in web containers like tomcat work). So we chose an alternate way - got rid of class loader, instead relied on temporal facts of class loading to modify and persist the bytecode. The only obvious disadvantage of this approach is we need to do a clean rebuild every time you want to run the program which is acceptable considering the smaller build time for java. This way we can be agnostic of existence of multiple threads in your program.
      Currently we support specifying root package whose sub packages will be searched for Tapped methods. Plan to support multiple root packages in the future. This has not gone through rigorous testing, please let me know if you find any bugs. I hope this tool will be helpful.

Source Code:

  • Hosted as TapTime at Github .Import into Netbeans and run build to generate artifacts. 


Feel free to leave your feedback. Hope this tool is of some use to some one.


Saturday, March 8, 2014

TrieMap - A Trie for Java

         Trie is a very simple data structure in concept, however the implementation is a little involved. Suppose you want to have the trie accept entries like
             [ {1,3,4},1 ] ,  [{1}, 3],  [{3,4}, 6] ....
where the values within {} form the keys and the lone value outside it corresponding values then this TrieMap comes in handy for such situations. We can think of behaviors like
1) Adding/updating existing value  
        Ex: {1,3,4}’s value set to supplied value
2) Incrementing prefix partial key’s corresponding values  
        Ex: {3,4},6 will update {3}’s value as well as {3,4}’s value
3) Incrementing prefixes of all suffixes of whole key.
        Ex:{1,3,4} will update {1} {3} {4} {1,3} {3,4} and {1,3,4}’s values respectively .
4) Iterate through all the possible <K,V> combinations.
     Find the open source implementation at github here.  This implementation accepts list of numbers(can be int, byte, double etc).


      A HashMap could have been used as well. The beauty with Trie is say you have a set of keys
{2,3,4,5}
{2,3}
{2,3,4}
  then in a Trie 2  3  and 4 are stored only once because they all belong to same level(if you imagine a Trie as a tree) but a HashMap stores them as many number of times as they appear. There is also a test case within to showcase the usage. Feel free to leave a feedback.

Sunday, April 28, 2013

FXTalk , a Java(FX) based gtalk / jabber client

 FXTalk is a  cross-platform rich UI chat client demonstrating the abilities of JavaFX platform.

1. Resources

1.1 Sources

The source code is in Github .

1.2 Executable bundles

1) Mac OSX(built on Lion, but works on Snowleopard, Lion and mountain lion). On Mountain lion however due to Gatekeeper intervention you need to run the app with Cntrl+Click for the 1st time and subsequently you can launch the app normally.
           DMG
           App

2) Windows (32 bit)
           Msi
           Exe


2. Addressing the short comings of Smack library


We rely on a 3rd party open source library called Smack V 3.2.2. This library though is wonderful has some shortcomings which we address by fixing the smack library and re building it to meet our needs.

1) Most proxies may not allow connecting to it via itself . So setting proxy and creating a socket to proxy will try to route through itself and fail.

  Fix:
    org.jivesoftware.smack.proxy.HTTPProxySocketFactory in the method httpProxifiedSocket() change the following line
             Socket socket = new Socket(proxyhost,proxyPort);
to
             Socket socket = new Socket(Proxy.NO_PROXY);
             socket.connect(new InetSocketAddress(proxyhost,proxyPort));
2) When creating an SSL socket behind firewall & proxy there must be provision for HTTPS socket creation with an understanding there is a proxy in between
Fix:
org.jivesoftware.smack.proxy.ProxyInfo
    1) Add new enum ProxyType.HTTPS and method forHttpsProxy()
    2) Within getSocketFactory() method create HTTPSProxySocketFactory() if the ProxyType is HTTPS.
    2) Implement your own HTTPSProxySocketFactory which creates a normal HTTP connection to the proxy and then tunnel SSL through it.
We have built our own version of Smack library. This can be contributed to the  Smack source code base.

3. Quick look at Features

  1. Supports user subscription,friend request, setting avatar(Avatar editor pops up when you click your image), changing presence, set status, signout, set default account,delete accounts,set proxy, etc.
  2. Supports File transfer (by default we enable only IBB) hence please do not try to transfer large files.
  3. Password is encrypted by key which is secure randomly generated when ran the first time. Any one who can read your home folder can read the key and hence can decrypt your password.We disable read/write/exec access for any user other than the one who runs the App hence this approach is "Safe enough" . If you still think it is not enough for you (OR) your machine is shared machine then do not save the account.
  4. Tested successfully on GTalk and Oracle(Jabber) systems.
  5. Tested on MacOSX(lion) and Win 7(and native bundles are available for download see Resources Section 6).

4. Features planned for future releases

  1. Group chat
  2. Themable chat bubbles
  3. Sock5 based file transfer.
  4. Jingle based Voice chat.

5. Snapshots

6. Native Bundles


Generating native bundles is not straight forward as the bundling skips ext from jre and hene fetures like https access will not be available. To circumvent this we use custom configuration scripts for deployment.
  • Set environment variable JAVAFX_ANT_DEBUG to true
  • Create windows script    package/windows/FXTalk-post-image.wsf
  • Create shell script for MacOSX script    package/macosx/FXTalk-post-image.sh
  • Copy your custom resources to
    package/macosx
    package/windows
    package/linux

  • Run    
    ant -lib . jfx-build-native

6.1 Windows (FXTalk-post-image.wsf)

<?xml version="1.0" ?>
<package>  
   <job id="postImage">  
    <script language="JScript">  
     <![CDATA[  
        var oFSO = new ActiveXObject("Scripting.FileSystemObject");  
        var oFolder = oFSO.getFolder(".");  
        var from1 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\sunjce_provider.jar";  
        var from2 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\access-bridge.jar";  
        var from3 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\jaccess.jar";  
        var from4 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\sunpkcs11.jar";  
        var from5 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\dnsns.jar";  
        var from6 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\localedata.jar";  
        var from7 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\sunec.jar";  
        var from8 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\sunmscapi.jar";  
        var from9 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\zipfs.jar";  
        var to1 = "C:\\Users\\srikalyc\\Documents\\NetBeansProjects\\FXTalk\\dist\\bundles\\FXTalk\\runtime\\jre\\lib\\ext";  
        var to2 = ".\\FXTalk\\runtime\\jre\\lib\\ext";  
        if (!oFSO.FolderExists(to1)) {  
          oFSO.CreateFolder(to1);  
        }  
        if (!oFSO.FolderExists(to2)) {  
          oFSO.CreateFolder(to2);  
        }  
        to1 += "\\";  
        to2 += "\\";  
        oFSO.CopyFile(from1, to1);  
        oFSO.CopyFile(from2, to1);  
        oFSO.CopyFile(from3, to1);  
        oFSO.CopyFile(from4, to1);  
        oFSO.CopyFile(from5, to1);  
        oFSO.CopyFile(from6, to1);  
        oFSO.CopyFile(from7, to1);  
        oFSO.CopyFile(from8, to1);  
        oFSO.CopyFile(from9, to1);  

        oFSO.CopyFile(from1, to2);  
        oFSO.CopyFile(from2, to2);  
        oFSO.CopyFile(from3, to2);  
        oFSO.CopyFile(from4, to2);  
        oFSO.CopyFile(from5, to2);  
        oFSO.CopyFile(from6, to2);  
        oFSO.CopyFile(from7, to2);  
        oFSO.CopyFile(from8, to2);  
        oFSO.CopyFile(from9, to2);  
     ]]>  
    </script>  
   </job>  
</package>              

6.2 Macosx (FXTalk-post-image.sh)

If you edit the following script in Windows there is high probability that it will cause weird failures and errors(due to \n \r characters), just recreate it on Mac OSX and redo it.
- Do not forget to change the info.plist for jdk.
#!/bin/bash

#The following are required for DMG image to have the ext folder
cd ../images/dmg.image/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib
mkdir ext
cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar ext
cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/dnsns.jar ext
cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/localedata.jar ext
cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunec.jar ext
cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar ext
cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/zipfs.jar ext

#The following are required to patch the .app package
mkdir /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext
cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext
cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/dnsns.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext
cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/localedata.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext
cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunec.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext
cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext
cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/zipfs.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext
 
 

Please leave your suggestions, comments and feedback here.

Monday, January 14, 2013

Is there no WriterOutputStream in java ?

I have been wondering why there is no WriterOutputStream available in java.io package.  I was presented with a weird situation where there was a 3rd party API we had to live with ,which accepted only OutputStream , but forced to use the available PrintWriter which deals with text unlike byte streams. If you want to do the other way round its many to one mapping . See fig 1

       


So you just put strings , integers etc into the writer and behind the scenes they are just converted to stream of bytes. But when doing bytes to text you need to know what was the Character encoding used to represent the text as bytes, most often than not it is UTF-8 , assuming this we will go ahead and write a simple WriterOutputStream which will satiate the above said need (a stream wrapper for our writer) to specific extent.



 1 package javatest;
 2 
 3 import java.io.IOException;
 4 import java.io.OutputStream;
 5 import java.io.PrintWriter;
 6 
 7 /**
 8  * This class may blow up if your stream was encoded with different character code(than UTF-8)
 9  * in which case just re implement (extend this class or write your own) with
10  * a Character Decoder for the appropriate character set.
11  * @author srikalyc
12  */
13 public class WriterOutputStream extends OutputStream {
14     private PrintWriter writer;
15 
16     public WriterOutputStream(PrintWriter writer) {
17         this.writer = writer;
18     }
19 
20     /**
21      * Because only the lower order byte is used.
22      * @param b
23      * @throws IOException 
24      */
25     @Override
26     public void write(int b) throws IOException {
27         writer.write((byte)(b & 0x000000ff));
28     }
29 
30     @Override
31     public void write(byte[] b) throws IOException {
32         writer.append(new String(b));
33         flush();
34     }
35 
36     @Override
37     public void flush() throws IOException {
38         writer.flush();
39     }
40 
41     @Override
42     public void close() throws IOException {
43         writer.close();
44     }
45 
46     
47 }
48 
 
There are other ways to go about this as well, for smaller streams you could do something like this
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        //Pass it to the 3rd party method which accepts only streams
        thirdPartyClass.method(baos) ;
        writer.println(baos.toString());


The baos acts like a nexus between the stream and text worlds. This could have been more efficiently
handled with ByteBuffers etc. Apache IO has a superb collection of Classes that does similar things in more extensive ways.