This post has been moved.
Posts on computing, technology related to java and distributed computing
Saturday, September 27, 2014
Friday, September 19, 2014
Saturday, April 5, 2014
BMDiff - Bentley and Mclroy's compression of long common strings
[1] - http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
[2] - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.11.8470&rep=rep1&type=pdf
Source Code
Github project BMDiff.
Dry run performance numbers
- JITed version maximum speed
Encoding - 35MB/s
Decoding - 330MB/s
The speeds reported above are due to tests with following parameters
a) 3Gz dual core iMac (8GB RAM) with 1GB allotted to the VM(JDK 1.8 preview).
b) Block length = 32
c) UTF-8 text file sizes 150KB and 7MB.
The speed changes when block length changes(smaller block lengths decrease the speed, larger block length decreases compression).
- There is scope for speed improvement which is the target for next version as this version the target was to produce correct enc/dec.
- There is a slight twist to the original algorithm in the implementation which is when repeated string occur together then still they are chained and represented using notation but for the decoder to decipher this properly for such cases the encoding is instead.
Related Bugs in the open:
HBASE - https://issues.apache.org/jira/browse/HBASE-2655Hadoop - https://issues.apache.org/jira/browse/HADOOP-5793
Thursday, March 20, 2014
TapTime, a selective profiler
About TapTime :
A very light weight mechanism based on byte code injection to get method wall clock times and basic metrics like call count, lowest and max CPU times. Why not dynamic proxy ?(Reflection calls overhead and requirement to rewrite the Monitored Class to implement and enforce interface which is painful as you may only want to trace 1 method but when more methods have to tapped the interface has to be changed. Not practical at all. How is this different from any other profiling tool ? It is suitable for selective profiling unlike most profilers which instrument the entire application, this tool allows the developer to select only interesting parts of the application to be profiled.
How to use the utility:
1. Include javassist.jar and TapTime.jar in the build path /class path.2. To trace methods at runtime annotate them with @Tap
Usage:
1) This has to be the very first line(preferably in a static initializer)
2) Do not maintain any static roots in the main class of those classes whose methods you want to tap.
The utility methods users may invoke
1) printStats();Method avg,min,max time clocked including independant counts.
2) printMethodsTracked();
All the tapped methods are listed out.
3) printCallStackSimple();
Ex: prints like following CALL STACK(SIMPLE)
0(1) 1(1) 1(0) 2(1) 3(1) 3(0) 3(1) 3(0) 2(0) 1(1) 1(0) 0(0)
n(m) - where n = methodId and m = 0/1 where 1 = entered into method, 0 = left from method.
4) printCallStackHierarchy();
Same as printCallStackSimple() but prints in a tree like format for methods traced instead of mundane numbers.
5) generateCommonCallSequences(); // This is the flagship method for statistics. Prints the 1 ,2 3 upto deepest level of all possible call sequence counts.
Ex:
abhi->1
snowy->1
arjun->1
tintin->snowy->1
arjun->abhi->1
tintin->2
Deep dive:
We initially created a TapClassLoader with a custom findClass() implemented which injected instrumentation code on specific classes whose methods are annotated for Tap. Though this can do the inverse delegation of class loading it had to use reflection to instantiate objects of classes under Tap. This is cumbersome from user’s perspective and transfers the burden onto user. We can also make this the System class loader through -Djava.system.class.loader=taptime.TapClassLoader ,unfortunate reality is that all your classes still will be loaded by the parent class loader because of delegation model and not by your TapClassLoader. This applies even when you do
Thread.currentThread().setContextClassloader(tapClassLoader).Though you can say that this is my context class loader java will delegate the call to parent and if parent class loader is able to load the class then thats it. But if you have some weird location from which you are loading classes which are not visible to the parent class loader then you are lucky and can load the classes from the current class loader which is again because of delegation model(This is how class loaders in web containers like tomcat work). So we chose an alternate way - got rid of class loader, instead relied on temporal facts of class loading to modify and persist the bytecode. The only obvious disadvantage of this approach is we need to do a clean rebuild every time you want to run the program which is acceptable considering the smaller build time for java. This way we can be agnostic of existence of multiple threads in your program.
Currently we support specifying root package whose sub packages will be searched for Tapped methods. Plan to support multiple root packages in the future. This has not gone through rigorous testing, please let me know if you find any bugs. I hope this tool will be helpful.
Source Code:
- Hosted as TapTime at Github .Import into Netbeans and run build to generate artifacts.
Feel free to leave your feedback. Hope this tool is of some use to some one.
Saturday, March 8, 2014
TrieMap - A Trie for Java
Trie is a very simple data structure in concept, however the
implementation is a little involved. Suppose you want to have the trie
accept entries like
[ {1,3,4},1 ] , [{1}, 3], [{3,4}, 6] ....
where the values within {} form the keys and the lone value outside it corresponding values then this TrieMap comes in handy for such situations. We can think of behaviors like
1) Adding/updating existing value
Ex: {1,3,4}’s value set to supplied value
2) Incrementing prefix partial key’s corresponding values
Ex: {3,4},6 will update {3}’s value as well as {3,4}’s value
3) Incrementing prefixes of all suffixes of whole key.
Ex:{1,3,4} will update {1} {3} {4} {1,3} {3,4} and {1,3,4}’s values respectively .
4) Iterate through all the possible <K,V> combinations.
Find the open source implementation at github here. This implementation accepts list of numbers(can be int, byte, double etc).
A HashMap could have been used as well. The beauty with Trie is say you have a set of keys
{2,3,4,5}
{2,3}
{2,3,4}
then in a Trie 2 3 and 4 are stored only once because they all belong to same level(if you imagine a Trie as a tree) but a HashMap stores them as many number of times as they appear. There is also a test case within to showcase the usage. Feel free to leave a feedback.
[ {1,3,4},1 ] , [{1}, 3], [{3,4}, 6] ....
where the values within {} form the keys and the lone value outside it corresponding values then this TrieMap comes in handy for such situations. We can think of behaviors like
1) Adding/updating existing value
Ex: {1,3,4}’s value set to supplied value
2) Incrementing prefix partial key’s corresponding values
Ex: {3,4},6 will update {3}’s value as well as {3,4}’s value
3) Incrementing prefixes of all suffixes of whole key.
Ex:{1,3,4} will update {1} {3} {4} {1,3} {3,4} and {1,3,4}’s values respectively .
4) Iterate through all the possible <K,V> combinations.
Find the open source implementation at github here. This implementation accepts list of numbers(can be int, byte, double etc).
A HashMap could have been used as well. The beauty with Trie is say you have a set of keys
{2,3,4,5}
{2,3}
{2,3,4}
then in a Trie 2 3 and 4 are stored only once because they all belong to same level(if you imagine a Trie as a tree) but a HashMap stores them as many number of times as they appear. There is also a test case within to showcase the usage. Feel free to leave a feedback.
Sunday, April 28, 2013
FXTalk , a Java(FX) based gtalk / jabber client
FXTalk is a cross-platform rich UI chat client demonstrating the abilities of
JavaFX platform.
DMG
App
2) Windows (32 bit)
Msi
Exe
We rely on a 3rd party open source library called Smack V 3.2.2. This library though is wonderful has some shortcomings which we address by fixing the smack library and re building it to meet our needs.
1. Resources
1.1 Sources
The source code is in Github .1.2 Executable bundles
1) Mac OSX(built on Lion, but works on Snowleopard, Lion and mountain lion). On Mountain lion however due to Gatekeeper intervention you need to run the app with Cntrl+Click for the 1st time and subsequently you can launch the app normally.DMG
App
2) Windows (32 bit)
Msi
Exe
2. Addressing the short comings of Smack library
We rely on a 3rd party open source library called Smack V 3.2.2. This library though is wonderful has some shortcomings which we address by fixing the smack library and re building it to meet our needs.
1) Most proxies may not allow connecting to it via itself . So setting proxy and creating a socket to proxy will try to route through itself and fail.
Fix: org.jivesoftware.smack.proxy.HTTPProxySocketFactory in the method httpProxifiedSocket() change the following line Socket socket = new Socket(proxyhost,proxyPort); to Socket socket = new Socket(Proxy.NO_PROXY); socket.connect( new InetSocketAddress(proxyhost,proxyPort)); |
Fix:
org.jivesoftware.smack.proxy.ProxyInfo 1 ) Add new enum ProxyType.HTTPS and method forHttpsProxy() 2 ) Within getSocketFactory() method create HTTPSProxySocketFactory() if the ProxyType is HTTPS. 2 ) Implement your own HTTPSProxySocketFactory which creates a normal HTTP connection to the proxy and then tunnel SSL through it. |
3. Quick look at Features
- Supports user subscription,friend request, setting avatar(Avatar editor pops up when you click your image), changing presence, set status, signout, set default account,delete accounts,set proxy, etc.
- Supports File transfer (by default we enable only IBB) hence please do not try to transfer large files.
- Password is encrypted by key which is secure randomly generated when ran the first time. Any one who can read your home folder can read the key and hence can decrypt your password.We disable read/write/exec access for any user other than the one who runs the App hence this approach is "Safe enough" . If you still think it is not enough for you (OR) your machine is shared machine then do not save the account.
- Tested successfully on GTalk and Oracle(Jabber) systems.
- Tested on MacOSX(lion) and Win 7(and native bundles are available for download see Resources Section 6).
4. Features planned for future releases
- Group chat
- Themable chat bubbles
- Sock5 based file transfer.
- Jingle based Voice chat.
5. Snapshots
6. Native Bundles
Generating native bundles is not straight forward as the bundling skips ext from jre and hene fetures like https access will not be available. To circumvent this we use custom configuration scripts for deployment.
- Set environment variable JAVAFX_ANT_DEBUG to true
- Create windows script package/windows/FXTalk-post-image.wsf
- Create shell script for MacOSX script package/macosx/FXTalk-post-image.sh
- Copy your custom resources to
package/macosx package/windows package/linux
- Run
ant -lib . jfx-build-native
6.1 Windows (FXTalk-post-image.wsf)
<?xml version="1.0" ?> <package> <job id="postImage"> <script language="JScript"> <![CDATA[ var oFSO = new ActiveXObject("Scripting.FileSystemObject"); var oFolder = oFSO.getFolder("."); var from1 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\sunjce_provider.jar"; var from2 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\access-bridge.jar"; var from3 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\jaccess.jar"; var from4 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\sunpkcs11.jar"; var from5 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\dnsns.jar"; var from6 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\localedata.jar"; var from7 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\sunec.jar"; var from8 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\sunmscapi.jar"; var from9 = "C:\\Program Files\\Java\\jdk1.7.0_17\\jre\\lib\\ext\\zipfs.jar"; var to1 = "C:\\Users\\srikalyc\\Documents\\NetBeansProjects\\FXTalk\\dist\\bundles\\FXTalk\\runtime\\jre\\lib\\ext"; var to2 = ".\\FXTalk\\runtime\\jre\\lib\\ext"; if (!oFSO.FolderExists(to1)) { oFSO.CreateFolder(to1); } if (!oFSO.FolderExists(to2)) { oFSO.CreateFolder(to2); } to1 += "\\"; to2 += "\\"; oFSO.CopyFile(from1, to1); oFSO.CopyFile(from2, to1); oFSO.CopyFile(from3, to1); oFSO.CopyFile(from4, to1); oFSO.CopyFile(from5, to1); oFSO.CopyFile(from6, to1); oFSO.CopyFile(from7, to1); oFSO.CopyFile(from8, to1); oFSO.CopyFile(from9, to1); oFSO.CopyFile(from1, to2); oFSO.CopyFile(from2, to2); oFSO.CopyFile(from3, to2); oFSO.CopyFile(from4, to2); oFSO.CopyFile(from5, to2); oFSO.CopyFile(from6, to2); oFSO.CopyFile(from7, to2); oFSO.CopyFile(from8, to2); oFSO.CopyFile(from9, to2); ]]> </script> </job> </package>
6.2 Macosx (FXTalk-post-image.sh)
If you edit the following script in Windows there is high probability that it will cause weird failures and errors(due to \n \r characters), just recreate it on Mac OSX and redo it.
- Do not forget to change the info.plist for jdk.
- Do not forget to change the info.plist for jdk.
#!/bin/bash #The following are required for DMG image to have the ext folder cd ../images/dmg.image/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib mkdir ext cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar ext cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/dnsns.jar ext cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/localedata.jar ext cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunec.jar ext cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar ext cp -p /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/zipfs.jar ext #The following are required to patch the .app package mkdir /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/dnsns.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/localedata.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunec.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext cp /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext/zipfs.jar /Users/srikalyc/NetbeansProjects/FXTalk/dist/bundles/FXTalk.app/Contents/PlugIns/jdk1.7.0_17.jdk/Contents/Home/jre/lib/ext
Please leave your suggestions, comments and feedback here.
Monday, January 14, 2013
Is there no WriterOutputStream in java ?
I have been wondering why there is no WriterOutputStream available in java.io package. I was presented with a weird situation where there was a 3rd party API we had to live with ,which accepted only OutputStream , but forced to use the available PrintWriter which deals with text unlike byte streams. If you want to do the other way round its many to one mapping . See fig 1
So you just put strings , integers etc into the writer and behind the scenes they are just converted to stream of bytes. But when doing bytes to text you need to know what was the Character encoding used to represent the text as bytes, most often than not it is UTF-8 , assuming this we will go ahead and write a simple WriterOutputStream which will satiate the above said need (a stream wrapper for our writer) to specific extent.
So you just put strings , integers etc into the writer and behind the scenes they are just converted to stream of bytes. But when doing bytes to text you need to know what was the Character encoding used to represent the text as bytes, most often than not it is UTF-8 , assuming this we will go ahead and write a simple WriterOutputStream which will satiate the above said need (a stream wrapper for our writer) to specific extent.
1 package javatest; 2 3 import java.io.IOException; 4 import java.io.OutputStream; 5 import java.io.PrintWriter; 6 7 /** 8 * This class may blow up if your stream was encoded with different character code(than UTF-8) 9 * in which case just re implement (extend this class or write your own) with 10 * a Character Decoder for the appropriate character set. 11 * @author srikalyc 12 */ 13 public class WriterOutputStream extends OutputStream { 14 private PrintWriter writer; 15 16 public WriterOutputStream(PrintWriter writer) { 17 this.writer = writer; 18 } 19 20 /** 21 * Because only the lower order byte is used. 22 * @param b 23 * @throws IOException 24 */ 25 @Override 26 public void write(int b) throws IOException { 27 writer.write((byte)(b & 0x000000ff)); 28 } 29 30 @Override 31 public void write(byte[] b) throws IOException { 32 writer.append(new String(b)); 33 flush(); 34 } 35 36 @Override 37 public void flush() throws IOException { 38 writer.flush(); 39 } 40 41 @Override 42 public void close() throws IOException { 43 writer.close(); 44 } 45 46 47 } 48
There are other ways to go about this as well, for smaller streams you could do something like this
ByteArrayOutputStream baos = new ByteArrayOutputStream();
//Pass it to the 3rd party method which accepts only streams
thirdPartyClass.method(baos) ;
writer.println(baos.toString());
The baos acts like a nexus between the stream and text worlds. This could have been more efficiently
handled with ByteBuffers etc. Apache IO has a superb collection of Classes that does similar things in more extensive ways.
Subscribe to:
Posts (Atom)