- Do some training (seriously). Hadoop and MapReduce aren't your grandfather's programming, and many of the ideas & principles you would otherwise use don't suit MapReduce/Hadoop. Those coming from a large-scale scientific data background will have a head start. Cloudera has some great training vids available on their site and also offers a training course, which I attended and found good.
- Use a local version to test. I found Karmasphere to be very useful for testing the scripts I'd written and working out the bugs. Cloudera has a virtual machine that is very, very useful for doing Pig and Hive testing.
- Streaming jobs are your friend. Using streaming jobs is a great way to get Hadoop-based processing off the ground.
- Follow Pete Warden's pointers: increase the allocated memory size (particularly if using PHP, via ini_set), use -jobconf stream.recordreader.compression=gzip, etc.
- Delete outputs. MapReduce jobs will fail if you don't delete the output directory from a previous run. This one will get you into trouble all the time.
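To give a feel for how small a streaming job can be, here is a minimal word-count sketch in Python. It assumes the standard Hadoop Streaming contract: the mapper reads raw lines on stdin and writes tab-separated key/value lines, the framework sorts those lines by key, and the reducer reads the sorted lines on stdin. The script name `wordcount.py` and the `map`/`reduce` command-line switch are hypothetical conveniences, not anything Hadoop requires.

```python
#!/usr/bin/env python
"""Minimal Hadoop Streaming word count (hypothetical script: wordcount.py)."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts for each word.

    Relies on the input already being sorted by key, which Streaming
    does between the map and reduce stages (or `sort` does locally).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical switch: "wordcount.py map" or "wordcount.py reduce".
    stage = reducer if sys.argv[1] == "reduce" else mapper
    for out in stage(sys.stdin):
        print(out)
```

You can check it locally with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, which mimics what Streaming does. On a cluster it would be submitted through the streaming jar's `-input`, `-output`, `-mapper` and `-reducer` options (the jar's location depends on your Hadoop version and distribution), and the `-output` directory must not already exist, so remove it first.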
Wednesday, July 07, 2010
Posted by Simon Cast at 14:17
Sunday, February 14, 2010
The failure of Copenhagen, along with the sheer complexity of the ETS (which is probably going to do more for financial institutions than for the real economy), requires us to step back and reconsider how to achieve the aim of reducing CO2 emissions. In the post "The Fallacy of the Carbon Market" I made the point that market-based reduction methods don't have to be carbon based.
Here I want to look at using a negawatt-based market to reduce energy demand. Markets consist of supply and demand. The supply of negawatts is easy: it is all the efficiency changes that can be made (insulation, improved appliances etc.). The sticking point is demand. How do we create demand for negawatts?
The renewable energy targets provide the mechanism for creating demand: by allowing negawatts to count towards those targets, power companies can choose to use negawatts rather than other forms of renewable energy to meet their obligations.
Negawatts would be created by auditing the end user (household, office, factory etc.) to benchmark their energy consumption. The purchaser then pays for improvements (adding insulation, triple glazing the windows, more efficient HVAC etc.). Once the improvements are installed, energy consumption is benchmarked again. The amount of negawatts is the difference between the before and after benchmarks. These negawatts would count towards the power company's renewable energy target for 1 to 5 years.
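To make the arithmetic concrete, here is a minimal Python sketch. It assumes the benchmarks are annual consumption figures in kWh and that the same saving is credited in each year of the crediting period; the function name, the choice to give no credit when consumption rises, and the example figures are illustrative assumptions, not part of any real scheme.

```python
def negawatts(before_kwh: float, after_kwh: float, years: int) -> float:
    """Total saving credited for one set of improvements (hypothetical helper).

    before_kwh, after_kwh: annual consumption benchmarks taken before and
    after the improvements are installed.
    years: crediting period; the proposal suggests 1 to 5 years.
    """
    if not 1 <= years <= 5:
        raise ValueError("crediting period should be 1 to 5 years")
    # Negawatts = before benchmark minus after benchmark.
    # Assumption: no credit (rather than a negative one) if consumption rose.
    annual_saving = max(before_kwh - after_kwh, 0)
    return annual_saving * years
```

For example, a household benchmarked at 12,000 kWh a year before and 9,000 kWh after, credited for 3 years, gives `negawatts(12000, 9000, 3)`, i.e. 9,000 kWh counted towards the power company's target.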
The advantages of this approach are:
- Much simpler measurement, audit and verification
- Doesn't impose large scale price increases on end-users
- Increases productivity of the economy generally
- Frees up end-user cash for other uses
Negawatts address the simple physics problem that we can't build enough renewable energy sources in the time required to effectively replace enough carbon-based energy production. A sustainable negawatt market will drive the development of new efficiency measures and devices, leading to a situation where energy demand falls as our ability to generate substantial energy from renewable sources increases. We are addressing the problem from both the supply and the demand side, achieving a better overall solution.
The discussion around the iPad has missed what I see as a vital use case, one that is becoming increasingly relevant in many countries around the world. I have previously written about a device that could provide unblocked access to the internet and the outside world. The iPad is that device.
Coupled with a portable satellite broadband base station and some mesh networking software, a million iPads would make for an internet access method that oppressive regimes would find very difficult to block.