
Tuesday 28 June 2016

WebSocket communication and concurrency

The goal is to allow multiple notes to be downloaded from peers, hence the concurrency: the downloading should not block the main running thread. This is common to IpfsNotebookRepo and BittorrentRepo. So what should the design be?

Here is the current IpfsNotebookRepo class.
The get(hash : Multihash) and get(url : MagnetURL) methods are blocking calls: they wait until the download from a peer completes, and hence have to be run in a thread. The possible approaches are:
  1. IpfsNotebookRepo implements Runnable and is submitted to a scheduler. But then I will have to create a new IpfsNotebookRepo instance every time.
  2. Create a class IpfsDownloadTask that implements Runnable/Callable. Should this class be nested, inner, or a separate class? If it is a separate class it should contain an IpfsNotebookRepo instance as a member to call the .get method.
I have created a separate example project focusing on just the main part.


Here is the code.

Currently I have used callbacks from google-guava. After the download is complete, the send method is called with the appropriate operation to notify the user.

So here are my questions:
  1. The IpfsTask class's call method currently calls getNote, which just returns uppercase; actually it will be returning the note as a string from a peer. Where should this class be: inner or separate? If separate, should it contain an IpfsNotebookRepo instance?
  2. After the note is downloaded I need to call importNote from the NotebookServer class, which actually adds the note and broadcasts it. How do I achieve this?
  3. The IPFS servlet listens on a separate URL path for the WebSocket. Should it be part of the NotebookServer path?
I think this design will be common to Bittorrent as well, so I would be grateful for your help and advice on the design of the communication.
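The download-then-notify flow above can be sketched with the JDK's CompletableFuture in place of Guava's ListenableFuture callbacks. This is only a sketch: the class and method names (IpfsDownloadDemo, getNote) are placeholders mirroring the example project, not the actual Zeppelin code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IpfsDownloadDemo {
    // Placeholder for the blocking peer download; the real code would call
    // IpfsNotebookRepo.get(hash) here and block until the peer responds.
    static String getNote(String hash) {
        return hash.toUpperCase();   // stand-in for the downloaded note content
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Run the blocking get() off the main thread and attach a callback,
        // mirroring Guava's Futures.addCallback on a ListenableFuture.
        CompletableFuture<String> download =
            CompletableFuture.supplyAsync(() -> getNote("qmhash"), pool);
        download.thenAccept(note ->
            // this is where NotebookServer.importNote(note) + broadcast would go
            System.out.println("downloaded: " + note));
        download.join();             // wait only so the demo doesn't exit early
        pool.shutdown();
    }
}
```

With this shape the task does not need to be an inner class of IpfsNotebookRepo; it only needs a reference to the repo (or a lambda closing over it) for the blocking call.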

Monday 6 June 2016

DHT in java

Available libraries for torrents in Java:

  1. frostwire-jlibtorrent
  2. ttorrent
Comparison

Sr No. | Feature    | frostwire-jlibtorrent | ttorrent
1      | DHT        | Yes                   | No
2      | Magnet URI | Yes                   | No
3      | License    | MIT License           | Apache Software License 2.0

The rest of the main features are present in both.
ttorrent has more stars than frostwire-jlibtorrent on GitHub.


References
1] https://github.com/frostwire/frostwire-jlibtorrent
2] https://github.com/mpetazzoni/ttorrent 

Sunday 5 June 2016

Dat

Dat is similar to IPFS: p2p file sharing. Earlier, Dat's goal was to allow sharing and versioning of tabular data (CSV, JSON). The Dat alpha was more about syncing non-tabular files, with a single centralized repository like Dropbox. You can read more here about how Dat evolved.


The current API 1.0 has only two commands: dat link and dat <share-link>. The debug flag prints more output, like bittorrent-dht node queries.

Unlike IPFS, Dat's SHA-256 hash also considers file modes (permissions) among the other filesystem metadata. Dat uses a variety of methods to discover peers that have the data it's looking for, including DNS, multicast DNS, UDP, and TCP. Like IPFS, it also has bootstrap nodes.

Key differences to BitTorrent

Although file sharing using Hyperdrive may on the surface seem similar to tools such as BitTorrent, there are a few key differences.

  1. Not all metadata needs to be synced up front.
    BitTorrent requires you to fetch all metadata relating to a magnet link from a single peer before allowing you to fetch any file content from any other peer. This also makes it difficult to share archives of thousands of files using BitTorrent.
  2. Flexible and consistently small block sizes.
    BitTorrent requires a fixed block size that usually grows with the size of the content you are sharing. This is related to the above-mentioned fact that all metadata needs to be exchanged from a single peer before any content can be exchanged: by increasing the block size you decrease the number of hashes you need to exchange up front. Flexible block sizes also allow for more non-file-related use cases, such as the file metadata feed described above.
  3. Deduplication.
    BitTorrent inlines all files into a single feed. This, combined with the fixed block sizes, makes deduplication hard, and you often end up downloading the same files multiple times if an update to a torrent is published.
  4. Multiplexed swarms.
    Unlike BitTorrent, wires can be reused to share multiple swarms, which results in a smaller connection overhead when downloading or uploading multiple feeds shared between two or more peers.

Good read, documentation [1] 
and the dependencies [2]

Friday 3 June 2016

IPFS

IPFS is the InterPlanetary File System. It was presented by Juan Benet of Stanford. IPFS is a P2P exchange of Git-like objects using the BitTorrent protocol, in a single swarm, in a single repository. In his paper he talks about how it can be the permanent distributed web. IPFS provides a high-throughput content-addressed block storage model with content-addressed hyperlinks. This forms a generalized Merkle DAG, a data structure upon which one can build a versioned file system.

Key features :
  •  DHT
    • IPFS uses an S/Kademlia DHT to find peers in the network, query for providers, and get and put values. Each node has a public key, and its NodeId is the hash of that key.
    • Kademlia uses XOR distance to store values in the closest nodes. S/Kademlia adds resistance to Sybil attacks: it requires nodes to create a PKI key pair, derive their identity from it, and sign their messages to each other.
    •  It also uses some features of Coral.
  • Block Exchange
    • The BitTorrent protocol exchanges blocks of data; pieces are exchanged based on some strategy like tit-for-tat or rarest-piece-first. IPFS uses the BitSwap strategy.
    • Unlike BitTorrent, BitSwap is not limited to the blocks in one torrent. BitSwap operates as a persistent marketplace where nodes can acquire the blocks they need, regardless of what files those blocks are part of.
    • This strategy makes use of a BitSwap credit and debt ratio: a node's debt ratio increases if it receives more bytes than it has sent, and peers send blocks to debtor peers probabilistically.
  • Merkle DAG
    • IPFS objects are closely related to Git objects. IPFS builds a Merkle DAG, a directed acyclic graph where links between objects are cryptographic hashes of the targets embedded in the sources.
    • The hash is a multihash, defined as
      <1-byte hash function code><1-byte digest size in bytes><hash function output>
      Most hashes start with "Qm" because the hash function used is SHA-256 (code 0x12) and the digest length is 32 bytes (0x20).
  • Mutable Namespace
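The "Qm" observation above can be checked with a few lines of Java. This is a hedged sketch under the stated assumptions (function code 0x12, length 0x20, base58btc alphabet, demo BigInteger encoder); real IPFS implementations use proper multihash libraries.

```java
import java.math.BigInteger;
import java.security.MessageDigest;

public class MultihashDemo {
    static final String ALPHABET =
        "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    // Minimal base58btc encoder (BigInteger-based; fine for a demo).
    static String base58(byte[] in) {
        BigInteger n = new BigInteger(1, in);
        StringBuilder sb = new StringBuilder();
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(BigInteger.valueOf(58));
            sb.append(ALPHABET.charAt(qr[1].intValue()));
            n = qr[0];
        }
        // each leading zero byte becomes a leading '1' after the reverse
        for (byte b : in) { if (b == 0) sb.append('1'); else break; }
        return sb.reverse().toString();
    }

    static String multihash(byte[] digest) {
        byte[] mh = new byte[digest.length + 2];
        mh[0] = 0x12;                               // SHA-256 function code
        mh[1] = 0x20;                               // digest length = 32 bytes
        System.arraycopy(digest, 0, mh, 2, digest.length);
        return base58(mh);
    }

    public static void main(String[] args) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-256").digest("hello".getBytes());
        System.out.println(multihash(d));           // starts with "Qm"
    }
}
```

Because the first two bytes are always 0x12 0x20, every SHA-256 multihash falls in a base58 range whose encoding starts with "Qm".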

IPFS Objects and Merkle DAG

An IPFS Object has the following structure:
  • Links - an array of links it references
  • Data - a byte array; a blob of size < 256 KB
An IPFS Link has the following structure:
  • Name - a string name for the link
  • Hash - the hash of the linked IPFS object
  • Size - the total size of the target object
So here I have an example directory. file.txt and ss have the same content, so their hashes have to be the same.


ipfs object get QmawZYe7nVgbonstM9YLkbJPrwaSMAJ7nkWsPFxHJbCLRF
on the root object gives this output.

{
  "Links": [
    {
      "Name": "2A94M5J1Z",
      "Hash": "QmNhPUwuUQ1uD1n22h2CEBFLKPCExCiVc7rcgHmMftmzsv",
      "Size": 12562
    },
    {
      "Name": "bank-full.csv",
      "Hash": "QmXhyWEd21XEv4pJGHbxoFq6oud3HhADQjw6f5xR4NwDvo",
      "Size": 4611473
    },
    {
      "Name": "file.txt",
      "Hash": "QmXrP2yBFo1jvWw2WnY1mdCYJdiabW1WCmQwsYw1Ltfd2M",
      "Size": 32
    },
    {
      "Name": "shogun",
      "Hash": "QmdWtUhQzAX6e2xpDxZTJEwobHzUTuuVBWaYM8D5rzMTQs",
      "Size": 622130
    }
  ],
  "Data": "\u0008\u0001"
}
As you can see, the link names are the names of the files or directories, but for an individual file the links don't have names. Also, if a file is < 256 KB it does not reference any objects, i.e. its links array is empty. file.txt is small and bank-full.csv is large.
ipfs object get QmXhyWEd21XEv4pJGHbxoFq6oud3HhADQjw6f5xR4NwDvo
on bank-full.csv

{
  "Links": [
    {
      "Name": "",
      "Hash": "QmRA9jHW1DFa4brtGSSmWeEpXRX5apS7zxvAfgbJ3F599N",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmNN8xinNToC6sz7xHMcBe6YPyd8Ryx3wWqkEeRYUTEEhn",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmbSXZPGz7GiMz3iP6r7V6zMCxhT2EzTZGVkdJc3mcXPkj",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmUEEzoSFDVQwKSZmQMW8U79jUptjLkJAcjMbZjoWrsnKa",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmQwWkwAiHDuuuYTX8S1Hbks7USkfaD7A5Vf8Qmpyz1uaP",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmY4QEsrrCWdmqAKtSUZWtpTPd58niySdHsq4YXH59ZpiK",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "Qmbp6oskBFZGE3AQhjmm8ZRzZ1rCRaWzUg34zQK7SP8Mxm",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmQyn37YawL1mCGs3SNmyLNRi1AuXsaNNwWVxkzuomTvQX",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmbjD9fqBk9kGF9W5vFFLHcnfiiXH8zE2pVRwBTWjxGdV3",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmU42pLqrKNp3hDNfgw74omWaqLrjMBWw3Uvx98d2CNn2u",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmPNrToiZUfUEC2w75bw51GizQPP9xwm6wa56vKgGfHZW3",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmSu1UK8xqvDbWZSTvHzYxEPz2qLNTcii5NVd7NSnDcSAm",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmVxGKfp77DfPUjvzKfKx8bpYDbSHZtrmSXzz8wyD7t7nH",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmZaiEbhTiXt7rvwPSR9FS6WyEosj2KmZdLqxPeZ8WCYrt",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmWhkGkiw5REEqntnke2v6SbzqpF5SctuwKtwngu28sARv",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmQsv8Nbfvt1RjtxbU5gyQLVprJ6Uz81N5HdAiffJ6zRoX",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmTZwqWCVjs876DQBQxZhG5XngVPpXAz8h8fsRonMQGruW",
      "Size": 262158
    },
    {
      "Name": "",
      "Hash": "QmaMFU4hByFEpAX6ZEvcvia2Su8xwbKMfTTL74VMh3rYRM",
      "Size": 153914
    }
  ],
  "Data": "\b\u0002\u0018���\u0002 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\u0010 ��\t"
}
ipfs object get QmXrP2yBFo1jvWw2WnY1mdCYJdiabW1WCmQwsYw1Ltfd2M
on file.txt or ss

{
  "Links": [],
  "Data": "\b\u0002\u0012\u0018dfad\nf\nc\na\nadkfakdfmaaa\n\u0018\u0018"
}

Merkle DAG

A Merkle tree/DAG is used in Git objects, Bitcoin, and cryptography in general. Each node has a hash, which is the hash of its children's hashes combined. The root hash is the final hash of the object.

The leaf nodes contain the data; their links array is empty. Large files which have many IPFS objects/blocks do not have link names for each one. graphmd can also be used to visualize the graph.
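The idea can be sketched as follows. This is a generic Merkle-root computation over SHA-256, not IPFS's actual chunking or protobuf object format:

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MerkleDemo {
    static byte[] sha256(byte[] in) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(in);
    }

    // Root hash of a list of leaf blocks: hash each leaf, then repeatedly
    // hash the concatenation of each pair of child hashes until one remains.
    static byte[] root(List<byte[]> leaves) throws Exception {
        List<byte[]> level = new ArrayList<>();
        for (byte[] leaf : leaves) level.add(sha256(leaf));
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                // odd node out is paired with itself
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                byte[] both = new byte[left.length + right.length];
                System.arraycopy(left, 0, both, 0, left.length);
                System.arraycopy(right, 0, both, left.length, right.length);
                next.add(sha256(both));
            }
            level = next;
        }
        return level.get(0);
    }

    public static void main(String[] args) throws Exception {
        byte[] r = root(Arrays.asList("block-1".getBytes(), "block-2".getBytes()));
        StringBuilder hex = new StringBuilder();
        for (byte b : r) hex.append(String.format("%02x", b));
        System.out.println(hex);   // changing any leaf changes this root
    }
}
```

Any change to a leaf block changes the root hash, which is why unchanged blocks can be shared between versions.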

Versioning
IPFS uses Git-like commit trees. Unchanged files point to previous objects. In IPFS, files are divided into blocks, so if a part of a large file is changed only the new object is added to the tree; the rest is deduplicated.
  1. block : a variable-size block of data.
  2. list : a collection of blocks or other lists.
  3. tree : a collection of blocks, lists, or other trees.
  4. commit : a snapshot in the version history of a tree.

Sharing Files 

Coming to the main use case: sharing files with peers. I added some files, sent the hash to my friend, and asked him to run
ipfs get QmawZYe7nVgbonstM9YLkbJPrwaSMAJ7nkWsPFxHJbCLRF
But it did not download on his PC; it kept waiting for a long time, and I didn't understand why. Also, he was not in my list of peers (ipfs swarm peers), but he was able to download it via the browser:
https://ipfs.io/ipfs/QmawZYe7nVgbonstM9YLkbJPrwaSMAJ7nkWsPFxHJbCLRF

The problem was that he had a different version of ipfs than mine. You can check via ipfs id:
     "AgentVersion": "go-libp2p/0.1.0",
     "ProtocolVersion": "ipfs/0.1.0"

So after he installed the same version, I was able to download the file he sent instantly. His id is the highlighted one.

And I was also able to download the file via java-ipfs-api.

Things to note: java-ipfs-api requires target JDK 1.8; when I tried to run my code I got a major/minor version error. Also, before running the code the daemon should be running:
ipfs daemon

IPFS objects added via ipfs add are pinned. You can see the whole list with
ipfs pin ls, which shows all the pinned IPFS objects; you are serving them when you run the daemon.

IPNS

If the content changes, the hash changes. If you need to serve mutable content, you can do so via IPNS: all you have to do is publish the ipfs path under your public key.
ipfs name publish /ipfs/QmawZYe7nVgbonstM9YLkbJPrwaSMAJ7nkWsPFxHJbCLRF

/ipns/<your public key> will download the above linked contents. Hence, using this IPNS link, we can point our public key to a new ipfs path, and other users do not need to get the new ipfs link.

Monday 9 May 2016

Some silly questions & xml-pull request

Questions

  1. To debug, I add the following line in zeppelin-env.sh:
    export ZEPPELIN_JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9009"
    I use the IntelliJ IDEA IDE for remote debugging; I set breakpoints and am able to watch all the variables. When I run the paragraph I get a SocketException as the paragraph output, after it takes too long to run, but when I comment out the debug line in zeppelin-env.sh I don't get the error. Why?
  2. The dependency structure of the project is as follows:
    zeppelin-server <= zeppelin-zengine <= zeppelin-interpreter
    So do I always have to package the root project, or just the module in which I made the change, followed by the zeppelin root package?
    <dependency>
          <groupId>${project.groupId}</groupId>
          <artifactId>zeppelin-interpreter</artifactId>
          <version>${project.version}</version>
    </dependency>
    
    Where does it look for the dependency: in the target folder of zeppelin-interpreter, or in the .m2 local repository?

Issues regarding XML

The config member of the Paragraph class is a Map<String,Object>, or to be precise a HashMap<String,Object>. Following the flow, this is how it is set: in NotebookServer, the onMessageReceived() function has a switch case based on the operation. The updateParagraph and runParagraph methods take the config value from Message.data, along with other values like params. The runtime type of the config values is com.google.gson.internal.StringMap, and the values can themselves be Float, Integer, Boolean, or ArrayList<StringMap>. So the issue is that only the root-level entries are mapped in the XML, and the graph element of config is empty in the XML.

The second issue is mapping back from XML to Note. The error I am getting is an InvocationTargetException, not a JAXBException, just after unmarshalling, so I was not able to properly debug the line.
Hence, to work around this, I delete the notes in notebook-xml and re-run, so that NotebookRepoSync sync(0,1) converts back to XML for me.

Monday 25 April 2016

Week 1 - XmlNotebookRepo (Store Notebooks in XML format)

 Goal

The goal would be to have .xml representation of the notebook persisted in local filesystem along with existing .json one. Could be just note.xml in the same folder, or could be `./notebook-xml/<noteId>/note.xml`
It should save the same notebook, but in XML format, just in the local filesystem.

So here is how I approached it:
  • Created an XmlNotebookRepo Java class in the package org.apache.zeppelin.notebook.repo; copied code from VFSNotebookRepo and changed the storage directory at this line:
    this.filesystemRoot = new URI(new File(
            conf.getRelativeDir(filesystemRoot.getPath() + "-xml")).getAbsolutePath()); 
  •  Added the following in zeppelin-site.xml
    <property>
            <name>zeppelin.notebook.storage</name>
            <value>org.apache.zeppelin.notebook.repo.XmlNotebookRepo</value>
            <description>notebook persistence layer implementation</description>
    </property> 
    
    
  • So now zeppelin.notebook.storage would have two properties, but while remote debugging I found that it had only one value. I also tried uncommenting the GitNotebookRepo storage property, but there was still only one value.

    The allStorageClassNames variable did not contain comma-separated class names.
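If multiple storage implementations are meant to be active at once, the value is presumably a single comma-separated list of repo classes rather than two separate properties. A sketch of what that configuration would look like (the VFSNotebookRepo default alongside the new class is an assumption):

```xml
<property>
        <name>zeppelin.notebook.storage</name>
        <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo,org.apache.zeppelin.notebook.repo.XmlNotebookRepo</value>
        <description>comma-separated list of notebook persistence implementations</description>
</property>
```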
So I proceeded with  the XmlNotebookRepo itself.

JAXB Usage

I read up on JAXB and created some examples: Employee, Student, and Address classes with composition. I generated the XML output and tried the different types of annotations: @XmlRootElement, @XmlElementWrapper, @XmlElement, @XmlAccessorType(XmlAccessType.FIELD), @XmlTransient, etc. This blog[1] was quite useful.
  1. A no-arg constructor is required.
  2. Public getters/setters or @XmlElement are needed.
  3. Java collections like Map, List, and Set are also mapped to XML; @XmlElementWrapper creates a wrapping element.
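The three rules above can be seen in a minimal example. This is an illustration, not Zeppelin code, and it assumes JDK 8, where javax.xml.bind is bundled:

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlElementWrapper;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)   // map fields directly, no getters needed
class Employee {
    String name;                        // becomes <name> because of FIELD access
    @XmlElementWrapper(name = "skills") // wrapping element around the list
    @XmlElement(name = "skill")
    List<String> skills;

    Employee() {}                       // JAXB requires a no-arg constructor
    Employee(String name, List<String> skills) {
        this.name = name;
        this.skills = skills;
    }
}

public class JaxbDemo {
    public static void main(String[] args) throws Exception {
        JAXBContext ctx = JAXBContext.newInstance(Employee.class);
        Marshaller m = ctx.createMarshaller();
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
        StringWriter out = new StringWriter();
        m.marshal(new Employee("Ann", Arrays.asList("java", "xml")), out);
        System.out.println(out);        // <employee> with <skills><skill>...</skill></skills>
    }
}
```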

Mapping of interfaces

Caused by: com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of IllegalAnnotationExceptions java.util.List is an interface, and JAXB can't handle interfaces 

The solution is to use @XmlAnyElement together with @XmlRootElement, and to pass the .class of the implementing classes to JAXBContext.newInstance. Here is the code to demonstrate; these are the classes[2]:
  1. Address.java 
  2. Cow.java
  3. Employee.java
  4. Student.java
  5. XMLTest.java
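A condensed sketch of the @XmlAnyElement approach described above, with hypothetical Person/Student/Employee/Company classes standing in for the listed files (again on JDK 8):

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlAnyElement;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

interface Person {}                     // JAXB cannot map the interface itself

@XmlRootElement
class Student implements Person { @XmlElement String name = "s1"; }

@XmlRootElement
class Employee implements Person { @XmlElement String name = "e1"; }

@XmlRootElement
class Company {
    // lax=true tells JAXB to match list entries against known @XmlRootElement classes
    @XmlAnyElement(lax = true)
    List<Object> people = new ArrayList<>();
}

public class JaxbAnyElementDemo {
    public static void main(String[] args) throws Exception {
        Company c = new Company();
        c.people.add(new Student());
        c.people.add(new Employee());
        // the implementing classes are passed to newInstance, as described above
        JAXBContext ctx = JAXBContext.newInstance(Company.class, Student.class, Employee.class);
        Marshaller m = ctx.createMarshaller();
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
        StringWriter out = new StringWriter();
        m.marshal(c, out);
        System.out.println(out);        // <company> containing <student> and <employee>
    }
}
```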
Now moving to Zeppelin: I started modifying Note.java with the annotations. Paragraph.java also needs to be modified, but I thought I'd start with the simple fields first and later move to complex members like AngularObjects, paragraphs, config, and Info.
Fields like NoteInterpreterLoader, JobListenerFactory, and NotebookRepo have to be transient; we don't want them mapped in the XML file, so I used the annotation @XmlTransient, and I am still getting an error at this line in the XmlNotebookRepo save() method.
Please help. I have spent a day on this and I don't know how to proceed further.
Before running the server again, please delete the note in notebook-xml, as I have not handled loadAllNotes(), which loads all saved notes. The breakpoint is at the save method in XmlNotebookRepo, which is hit after creating a Note in the UI.

Errors are temporary, giving up is permanent

So I have figured out what was wrong. I read about the XmlAccessorType values (FIELD, PROPERTY, and PUBLIC_MEMBER) and also about XmlAdapters. Now I have my Note partially saved in XML; AngularObjects and the GUI config are remaining, as well as loading of notes, i.e. unmarshalling.

working ...
