Wednesday, December 03, 2008
Transparency and ending the Financial Crisis
Mike Masnick has an interesting post on the requirement for transparency in order to trade. The basic premise is that a lack of transparency, and therefore of information, is at the root of the crisis, as people simply don't know what anything is worth any more.
This is along the lines of my own thoughts. The subprime crisis was the trigger: people realised they had no idea what anything was worth, and it spiralled from there.
As information, or the lack thereof, is at the heart of this crisis, I believe the continual injections of cash and purchases of assets are only prolonging the issue. The system is relatively stable now, so the key is radical transparency. The banks, financial institutions, hedge funds and so on need to open their books to trusted third parties so that the information can be found.
Once that is done, investors will regain confidence in their ability to value companies and assets. At the moment they can't and so won't risk their money.
This will probably require government legislation to force the opening of the books, but it has to happen. The more public the information is made, the faster this whole crisis will be resolved, and then everyone can move on to fixing the damage being done to the real economy (you know, the part that creates real wealth and improved living standards).
Posted by Unknown at 10:51
Labels: Commentary, Data, Finance, Transparency
Friday, October 03, 2008
Data Half-life: Time Dependent Relevancy
Data half-life is not an indication of the importance of a particular piece of information; it is a measure of how long that piece of information stays relevant. Relevance is not a substitute for importance, and it depends on context as well as on the information itself. A low data half-life means the piece of information will quickly lose its relevancy; a high data half-life means its relevancy will drop slowly.
Consider the story Clay Shirky related in his keynote at the Web 2.0 Expo in New York, in which someone changed their relationship status from engaged to single. This information is highly relevant to some people and not very relevant to most others. Given that data half-life reflects the broader relevance of the information to a person's network, this status change has a low data half-life: it is simply not relevant to most of the people in the network.
Now, those people may want to know, or feel the need to know, but that does not mean the information is relevant to them. It is easy to mistake the desire to know, or the need to know, for relevance. Desire to know has no bearing on the information's data half-life.
With a low data half-life, the relationship status will only travel so far through the person's network, thereby avoiding the result in Clay Shirky's story. Data half-life represents how time dependent the information is: the more time dependent some data is, the lower its half-life; the less time dependent, the higher its half-life.
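To make the idea concrete, here is a minimal sketch of how relevance might decay against a data half-life, borrowing the standard half-life formula from physics. The function, the time units and the example numbers are my own illustrative assumptions, not a prescribed way of calculating it.

    def relevance(initial, half_life_hours, hours_elapsed):
        # Exponential decay: relevance halves every half_life_hours.
        return initial * 0.5 ** (hours_elapsed / half_life_hours)

    # A relationship-status change: low half-life, mostly faded within a day.
    print(relevance(1.0, half_life_hours=6, hours_elapsed=24))            # ~0.06

    # A reference article: high half-life, still relevant a week later.
    print(relevance(1.0, half_life_hours=24 * 30, hours_elapsed=24 * 7))  # ~0.85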
Tags: Filters
Posted by Unknown at 13:20
Labels: Data, Data Ecosystems
Thursday, October 02, 2008
Privacy Filters and Facebook
In my previous post I used privacy in Facebook as an example of how data filters could work. One point I glossed over was how Facebook, indeed all social sites, currently fail to account for social distance. Unfortunately, social distance is necessary for privacy filters to work satisfactorily.
Facebook has one major flaw: once a person is a friend in Facebook, they are treated the same as every other contact, whether the connection comes from bumping into someone at a pub or from having grown up with them. This collapses the privacy, or social distance, between two people. Social distance can be thought of as how strong the connection between two people is; it provides a measure of both the strong and weak ties articulated by Mark Granovetter.
Without some measure of social distance or strength of connections, any privacy filter is going to fail. The social graph fails to properly represent the real-world connections between people.
Facebook attempts to approximate social distance with groupings of friends, but this is cumbersome. The manual work of setting up groups and categorising everyone into them is a major barrier to use. People are lazy.
What is needed is an automated method for calculating social distance. Social distance is calculated (and this is how Mark Granovetter categorised connections) from the frequency of communication. Measuring that frequency is difficult for Facebook. While Facebook can measure wall posts, internal emails, poking and so on, so much of our communication occurs outside of Facebook, outside the wall (email, IMs, phone calls, SMS, Twitter, parties attended) that the frequency of communication within the wall is not a reasonable approximation of the wider frequency of communication.
The key measure of social distance, communication, is hard to quantify because it is dispersed across many different channels. Trying to capture the frequency of communication by porting the data in is one way of dealing with the issue. The other, probably more realistic, method is to start off with some rules and use what can easily be quantified to refine the measure of connection strength over time.
The rules would look at what is known generically about social connections. Some of the rules are (a rough sketch of how they might be combined is shown after the list):
- Married is a strong connection
- The same surname is a strong connection
- If someone has strong connections to friends you also have strong connections to, then you probably have a strong connection to them
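As a rough sketch of that approach (entirely my own illustration, not how Facebook or any real site calculates it), the rules could seed an initial connection strength, which is then nudged towards whatever communication frequency can actually be measured. The data structures, thresholds and weights below are all assumed for the example:

    def initial_strength(a, b, facts, strengths):
        # Seed connection strength for the pair (a, b) from generic rules.
        # facts maps a pair to known attributes; strengths maps pairs to values in [0, 1].
        info = facts.get(frozenset((a, b)), {})
        if info.get("married"):
            return 1.0
        if info.get("same_surname"):
            return 0.8
        # Strong ties to mutual friends you are also strongly tied to imply a strong tie.
        mutual = info.get("mutual_friends", [])
        if any(strengths.get(frozenset((a, m)), 0) > 0.7 and
               strengths.get(frozenset((b, m)), 0) > 0.7 for m in mutual):
            return 0.6
        return 0.1  # default: weak acquaintance

    def refine_strength(current, messages_per_week, weight=0.3):
        # Blend the rule-based seed with observed communication frequency over time.
        observed = min(1.0, messages_per_week / 10.0)  # ~10+ messages a week reads as a strong tie
        return (1 - weight) * current + weight * observed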
Privacy filters all start with knowing the distance between two end points, whether physical distance in centuries past or social distance today. Until Facebook and other social sites have a measure of social distance, privacy filters are going to be mediocre at best and, more often than not, prone to failure.
Tags: Privacy, Facebook, Filters
Posted by Unknown at 15:32
Labels: Data, Data Ecosystems, Facebook, Web Services
Friday, September 26, 2008
Failure of Filters
The title of this post is taken from the keynote Clay Shirky delivered at the NY Web 2.0 Expo in September 2008. The premise of the keynote is that the "information overload" we are facing is not a problem but a fact (one that has been around since Gutenberg and his movable type press), and that what we are seeing now is the collapse of the traditional filters that mediated that overload.
The existing filters for information were founded on the difficulty of moving information over distance. The various communications technologies of the 20th century steadily eroded that tyranny of distance, and the web completed the destruction of distance-based filters by removing any concept of spatial distance for information.
Our sense of privacy is likewise bound up in the difficulty of moving information over distance. Physical distance is the basis for the whole concept of privacy: the closer we are to other people, the less privacy we expect. That was a reasonable rule of thumb when those closest to us (community, family, friends) were also spatially close to us. We only need privacy safeguards now because the rule of thumb no longer applies; spatial distance is meaningless for information.
Information overload and privacy issues are rooted in our expectation that filters based on spatial distance will continue working in a world where information has no spatial component. Any filters built with this expectation don't work. Instead, we have to create a new framework for filters that does not rely on spatial distance.
By borrowing ideas from science we can create a framework that doesn't rely on spatial distance. The framework is based on data half-life, data permeability and data potential. Data half-life is a measure of how long a piece of data takes to lose half of its relevancy/importance. Data permeability is a measure of how hard it is for data to move over a period of time (think of fluid moving through a filter). Data potential is the initial potential for the data to move (think of potential energy in Newtonian dynamics).
The interaction of these three parameters determines how far and how quickly information can travel within an environment where spatial distance has no meaning. An analogy will help illustrate how the parameters behave together to filter information.
Let's say we have some information: the death of the chief of a village. The village has good roads and the news is to be sent by horse. This information will travel far because it is important (the chief of a village), the road makes it easy for the information to move, and the horse is quick. If, however, the person who died is not the chief, the news won't travel as far, because it is not as important. It is the interaction between the data half-life (how important the person is), the data permeability (how easy it is to move the information) and the data potential (how fast the information can move) that determines how far the information will travel.
Changing the parameters creates a varied set of filters that determine how far and how fast information will diffuse. Each connection has a level of data permeability, and incoming information is assigned a data half-life and a data potential. The information only passes the filter when the data half-life and data potential are enough to overcome the data permeability.
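To illustrate how the three parameters might interact, here is a minimal sketch in code. The decay formula, the threshold rule and all of the numbers are assumptions of mine for the sake of the example, not a definitive model:

    def passes_filter(half_life, potential, permeability, hours_elapsed):
        # Following the post's definition, permeability acts as a resistance the
        # information must overcome; potential decays according to the half-life.
        remaining = potential * 0.5 ** (hours_elapsed / half_life)
        return remaining > permeability

    # A relationship-status change: short half-life, modest potential.
    # A close friend (low resistance) sees it almost immediately...
    print(passes_filter(half_life=6, potential=1.0, permeability=0.2, hours_elapsed=1))   # True
    # ...while a distant colleague (high resistance) never does.
    print(passes_filter(half_life=6, potential=1.0, permeability=0.8, hours_elapsed=24))  # False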
To illustrate, consider changing your relationship status in Facebook. If someone changes their status from in a relationship to single, they don't necessarily want the information to spread quickly through their "Facebook friends", as those friends will include work colleagues and friends of friends they have only met once. Instead, each of their connections should have a different data permeability, and depending on the information (its data half-life and data potential) it will show up in some connections' news feeds right away, in others' in days or weeks, and in some never at all.
There is no single way to create and calculate data half-life, data potential and data permeability. Various developers will come up with their own methods; some will work and others won't. Hopefully, further down the track, we will see standardisation of how the parameters are calculated, based on accepted criteria for each type of information: personal, communications, knowledge and so on.
Tags: Filters, Information Overload, Privacy, Clay Shirky
Posted by Unknown at 10:52
Labels: Data, Data Ecosystems, Web Services
Friday, February 22, 2008
Meddling pollies should focus on what matters
I understand the need for regulation and laws, but it annoys me when pollies meddle in areas just for the sake of being seen to do something. The UK government is threatening legislation to penalise ISPs for not blocking piracy. The really idiotic thing is that the UK government has already said the idea falls afoul of UK and EU data privacy laws. So what's going to happen? The government will pass the law, the ISPs will challenge it in the UK and the EU, and the law will be ruled illegal. Back to the drawing board, with nothing accomplished except a lot of money spent on lawyers and the government being seen to do something.
Beyond the technical hurdles, there is also the political fallout from the law being overturned, plus the damage this will do to a government teetering on the brink. I wonder if this new push arose when someone pointed out that kicking downloaders off wholesale is likely to lose Labour votes.
To me, this is a massive over-reaction to a relatively minor problem. Stopping the downloading of content is not going to be a saviour of society, nor cure its ills. There are far more pressing problems in the economy that WILL do a massive amount of damage to people's lives (credit crisis, stagflation, deep recession) that the government should be worrying about. Protecting an outdated business model is NOT the government's job.
This legislation has the air of desperation, of a government staring down a yawning canyon of irrelevance and trying desperately to be seen to be doing something, anything.
Fiddling while Rome (London) burns.
Tags: UK Government, UK, ISPs, Policy
Posted by Unknown at 08:55
Sunday, January 06, 2008
Language and problem solving
In the most recent New Scientist (Vol 197 No 2637) there is an interesting article discussing language and how it frames problems. The perspective of the article is that English's Newtonian way of describing the world fails to frame questions properly for quantum and other non-Newtonian physics. The article even goes so far as to say that the lack of progress in non-Newtonian physics is because problems are framed, via the language, with a Newtonian world view.
Does the same problem exist in the world of the internet? While I realise the internet world is great at creating new words, these are still framed by the overall language, a language that is "Newtonian". As the internet shifts to flows and systems, as opposed to objects and links, do we need to look at how we frame the discussion so as to open up the problem-solving juices of the internet community? The next wave of innovation will be less about nouns and more about verbs, the doing rather than the being, and yet we still primarily use nouns in discussing the web and its evolution. Should verbs that describe process, systems and flow be the primary descriptors of the next web?
The article describes the Montagnais phrase "Hipiskapigoka iagusit". It very, very roughly translates to "singing health", a process within which a medicine man and a sick person exist. However, a dictionary written in 1729 translated it into something that emphasised the objects and not the process. The web is shifting to loosely coupled processes as opposed to objects. I wonder whether the discussion of Robert Scoble's recent tiff with Facebook would have evolved differently if the language had emphasised process (say, maintaining contacts) as opposed to data (the contacts themselves). The discussion was about who owned what objects (the contact data) rather than the ins and outs of maintaining contacts. Another example is the current discussion about whether data is a commodity or not. Again, the language is one of objects rather than flow. How would these discussions evolve if they were framed by a language of flow (verbs) as opposed to objects (nouns)?
The same questions can be asked of programming. Everyone expresses the need to ramp up parallel programming to take advantage of the distributed nature of the internet and of multi-core processors. However, can any real problem be solved properly while the language used to frame it is based on objects rather than flow? Does the conceptual framework that underpins object-oriented programming preclude successful problem solving in the parallel world? Yes, there are languages that focus specifically on parallel programming, but I am also talking about the language used to describe and communicate the problem. Both will need to respond to the requirements of a parallel world for people to solve problems and communicate solutions.
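As a toy illustration of the noun-versus-verb framing (entirely my own example, not from the article), compare describing a computation as an object holding state with describing it as a flow that data passes through:

    from concurrent.futures import ProcessPoolExecutor

    # Noun/object framing: a counter object whose state is mutated in place.
    # Shared mutable state like this resists being split across cores.
    class Counter:
        def __init__(self):
            self.total = 0
        def add(self, value):
            self.total += value

    # Verb/flow framing: data flows through a stage; each stage is a pure function.
    def square(x):
        return x * x

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            total = sum(pool.map(square, range(10)))  # stages run in parallel, no shared state
        print(total)  # 285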
A lot of questions, and I don't have the answers; I expect no one will for a while. It is interesting, though, to step away from objects and consider things from a flow perspective. I even think I need to revisit my recent post on Data Ecosystems and look at it from the perspective of flow rather than objects.
Tags: Data, Language, Programming, Internet, Physics, Data Ecosystems
Posted by Unknown at 16:36
Labels: Data, Data Ecosystems, Internet, Thoughts