
Tuesday, 1 February 2011

Sometimes, To Be Ethical Is To Be A Fool.

If free information distribution is considered to be an ethical practice today, at what point does information retention become unethical?


[Image: Google Labs]


A good example of this dilemma is the products of Google, more specifically the services offered under its Labs feature. The first step in increasing worldwide public access to information was the Google Books project. Inequalities of opportunity created by distance or time were nullified by access to a range of books in many languages and of many genres (especially the classics). On that primary level, any contest in which information was a principal player saw information transformed into a tradable commodity, and it began to suffer the same abuses money does: hoarding, thrift and scarcity. He who hoarded, he who was thrifty and he who caused scarcity stood accused of unethical practice.

Now, with too much information swimming around the cybersphere, data visualization has been resurrected with greater responsibility and, axiomatically, greater power. Between the two eras, those of data acquisition and data perception, came a period dominated quietly by a backstage hero: data mining. With more information on more things emerging every second, the proverbial gap between winner and loser began to narrow, because the two were separated only by the knowledge of which information was worthy and which was not. However, once we bit off more than we could chew, the question was soon not what but how. When we began to find out more than we ought to have known about the past, the future became less a certainty and more a possibility.

In line with that thought, Google brought in its Ngram Viewer (NV). A simple extension of the Google Books venture, NV brings together simple data mining, graphical data visualization and hundreds of thousands of books written over the last 200 years in 7 languages, leaving the user with a new kind of data, ripe for interpretation. Visit the viewer and see for yourself how the usage of the words “gay” and “homosexual” has varied in frequency over the years, and how that variation can be read as our perception of the words themselves: the more often they were used, the more they featured in discussion, the more they impacted us.
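(For the curious: what NV plots is, at heart, just the relative frequency of a word per year. Here is a toy sketch of that same statistic in Python – the tiny corpus below is invented purely for illustration, a stand-in for Google’s vastly larger scanned collection.)

```python
# A toy version of the Ngram Viewer's statistic: how often a word occurs
# in a given year, relative to all words printed that year. The corpus
# here is invented for illustration only.
from collections import Counter, defaultdict

corpus = [
    (1950, "the gay parade was a gay affair"),
    (1950, "the homosexual question was rarely discussed in print"),
    (1990, "gay rights and homosexual rights dominated the news"),
    (1990, "a gay pride march drew large and cheerful crowds"),
]

totals = defaultdict(int)        # total words seen per year
counts = defaultdict(Counter)    # per-year counts of each word

for year, text in corpus:
    words = text.lower().split()
    totals[year] += len(words)
    counts[year].update(words)

for word in ("gay", "homosexual"):
    for year in sorted(totals):
        freq = counts[year][word] / totals[year]
        print(f"{word!r} in {year}: {freq:.4f}")
```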

At this secondary level of information distribution – with the world tending towards greater access bounded by vaguer limits – could there be such a thing as information hoarding? Definitely. Picture a ladder: you stand on the bottom rung, raw data on the topmost. Before the raw data can reach you, it passes through an ever-growing number of filters along the way. Even though the greater challenge has been to engender new perspectives, there is also the challenge of leaving some information open to interpretation. On the primary level, access to the information is increased. On the secondary level, it is classified more logically. On the third level, when it reaches you, you still retain a responsibility to decide:

  1. How you use it

  2. Why you use it, and

  3. Whom you use it with


Therefore, the ethics of this day and age have not been blurred by repeated refinement but only rendered into a finer and finer line, bent this way and that by corporate greed, capitalist agendas and an overriding anarchism performed, in most cases, as an act of rebellion. The withholding of information does not spell misdemeanour but, more often than not, caution. This is the very nature of capitalism: to address greed by fostering in its players the need to compete. To be completely ethical in such a day and age is to be a fool.

Tuesday, 11 January 2011

On The Reawakening Of Dreams

As I was writing the entrance test that’s part of my application to Columbia University today, my flow was broken, nay individuated, by the third and last question in the paper: “If given one month to report on a topic, what would the topic be? How would you go about studying and reporting it, and what media would you use to garner the maximum width of audience? Ensure that you don’t exceed 500 words.”

Of course, the last line was a terrible jolt to me; since I wasn’t allowed to use the word-count companion, I began to type slowly, deliberately, counting each word as I put it down. Looking up at the clock, I saw that I had some 30 minutes remaining before time was up. I stopped typing and paused to think.

What would I report on? I had known the answer to that one for some four years – “The Impact Of Languages On Society” – but I could not go beyond the what of it all. You see, since the time I had fully structured the dream for myself, a lot of things had changed: the answers to most, if not all, of the hows had assumed different shapes and, with them, the whys, too. For example, had I presented any statistical data after sampling and surveying (the methods for which have not changed significantly in a long time), I would have done so as tables, each with a small write-up accompanying it. Now, I’ll still have the tables, yes, but they won’t be the centrepieces of my hypotheses. Now, I have the Google Trendalyzer – which more recently powered the Google Zeitgeist – together with Hans Rosling’s Gapminder. With the coming of opportunities in programming and data visualization, the gap between raw data and the intended conclusion has narrowed. Moreover, because the data can now be viewed from multiple perspectives with equal ease, the audience that understands the praxis has grown: the solution is now compatible with all the different ways in which different people perceive the problem.



 


[Image: Prof. Hans Rosling]


 

With that, involvement also increased: presenting problems and solutions as seemingly dissociated elements only alienates the target audience, because a) they feel excluded, b) they see no valid argument, or c) both. With the coming of Gapminder (and, subsequently, the Trendalyzer) – a sterling example of the upgrading of perspectives it heralded – initiating greater audience participation became a two-step process. In other words, affordable.
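(If you’ve never seen it, the Trendalyzer’s visual grammar is simple enough to sketch in a few lines of matplotlib: income on one axis, life expectancy on the other, population as the size of the bubble. Every number below is an invented placeholder; only the encoding is Rosling’s.)

```python
# A minimal, Gapminder-style bubble chart: one still frame of the
# animation Rosling made famous. All figures are invented placeholders.
import matplotlib.pyplot as plt

countries = ["A", "B", "C", "D"]
income = [1200, 5400, 18000, 41000]   # GDP per capita (hypothetical)
life_exp = [52, 64, 73, 81]           # life expectancy in years (hypothetical)
population = [30, 140, 60, 9]         # population in millions (hypothetical)

# Bubble area encodes population; alpha keeps overlapping bubbles legible.
plt.scatter(income, life_exp, s=[p * 10 for p in population], alpha=0.5)
for name, x, y in zip(countries, income, life_exp):
    plt.annotate(name, (x, y))

plt.xscale("log")                     # Rosling plotted income on a log axis
plt.xlabel("Income per person (hypothetical)")
plt.ylabel("Life expectancy (hypothetical)")
plt.title("One frame of a Trendalyzer-style animation")
plt.show()
```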



Soon, audience participation and audience inclusion were everywhere, quickly transcending crowd-sourcing into cloud-networking, where proactive attempts at improvement only made the whole more intuitive. It was no longer necessary for me to have all the resources to execute my projects; I could be as little as a single contributor – the plurality would be derived from a global network of research groups.

What did this mean for my hows? It meant that the long hours I had set aside for perfect data representation had become short minutes, and I now had time for so many other things – perhaps even spending it coming up with new ways to garner more meaningful data and chamfering the conclusions. With more participation easierly (yeah, that’s a made-up word, but you get the semantic drift) available, undertaking standalone projects, or even aspiring to, would be foolish. In other words, unaffordable.

I went on to complete my paper so quickly that the examiner was surprised. I am sure I exceeded the word limit by a few words, but I’m not worried. I’m sure they’ll get the point.

By widening the scope of the problem to include a malleable range of parameters to understand change at one end, and widening the compatibility of solutions to address a longer list of issues at the other, technology and the latitude of human thought have reawakened my dreams of a brighter world.