FABRIC – LEARNING PART 2

Okay, this is day two and I want to use Fabric to develop something meaningful.
Let's start with Amazon EC2.

I want to develop an interactive script where I get to choose the EC2 region, EC2 flavour, etc.

Being blessed with Python, we have boto, a Swiss Army knife for Amazon Web Services.

I assume you have everything set up (AWS account, security group, key pair, Access Key ID, Secret Access Key).

So this is not going to be Fabric, but I want this on the way, as we need to create a server first and then do stuff with Fabric.

from fabric.api import *
from fabric.colors import green as _green, yellow as _yellow
import boto
import boto.ec2
import time

def create_ec2():
	ubuntu_images = {	"ap-northeast-1":"ami-bfc0afbe",
						"ap-southeast-1":"ami-92aef9c0",
						"ap-southeast-2":"ami-57f16f6d",
						"eu-west-1":"ami-dea653a9",
						"sa-east-1":"ami-4da10150",
						"us-east-1":"ami-951524fc",
						"us-west-1":"ami-b0784af5",
						"us-west-2":"ami-36d6b006",
					}
	ec2_region = ""
	ec2_key = "your Access Key ID"
	ec2_secret = "Secret Access Key"
	ec2_key_pair = "key pair name you created in the region you will select in this program"
	ec2_security = ("default",)
	ec2_instancetype = "m1.small"
	regions = boto.ec2.regions()

	# list the available regions by name (skip GovCloud, which needs separate credentials)
	choices = [r.name for r in regions if r.name != "us-gov-west-1"]
	for i, name in enumerate(choices):
		print str(i) + " : " + name

	i = raw_input("your choice : ")
	ec2_region = choices[int(i)]

	conn = boto.ec2.connect_to_region(ec2_region, aws_access_key_id=ec2_key, aws_secret_access_key=ec2_secret)
	print(_green("connected to : "  + ec2_region))
	print ubuntu_images[ec2_region]
	
	reservation = conn.run_instances(ubuntu_images[ec2_region],
						key_name = ec2_key_pair,
						security_groups =  ec2_security,
						instance_type = ec2_instancetype
					)
	instance = reservation.instances[0]
	while instance.state == u'pending':
		print(_yellow("Instance state: %s" % instance.state))
		time.sleep(2)
		instance.update()
	print(_green("Instance state: %s" % instance.state))

	print(_green("Instance created successfully!"))

	print(_green("Allocating new elastic ip..."))
	ip = conn.allocate_address()
	print(_green("Attaching elastic ip to instance..."))
	ip.associate(instance_id=instance.id)

	print(_green("Public dns: %s" % instance.public_dns_name))

	print "Instance created and attached to public ip : " + ip.public_ip
	return

Now the output is…

hotice@ashwin-ws:~$ fab create_ec2
0 : ap-southeast-1
1 : ap-southeast-2
2 : us-west-2
3 : us-east-1
4 : us-west-1
5 : sa-east-1
6 : ap-northeast-1
7 : eu-west-1
your choice : 3
connected to : us-east-1
ami-951524fc
Instance state: pending
Instance state: pending
Instance state: pending
Instance state: pending
Instance state: pending
Instance state: pending
Instance state: running
Instance created successfully!
Allocating new elastic ip...
Attaching elastic ip to instance...
Public dns: ec2-54-196-108-1.compute-1.amazonaws.com
Instance created and attached to public ip : 54.204.13.213

Okay… this is not Fabric, this is just a Python boto script. As I said earlier, I will do the Fabric stuff in the next part of this series…
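One practical gap between "instance is running" and "Fabric can reach it": sshd takes a little while to come up after boot. A small helper like the one below (a sketch of mine, not part of the script above) can poll port 22 before handing the host to Fabric:

```python
import socket
import time

def wait_for_ssh(host, port=22, timeout=120):
	# poll until the instance accepts TCP connections on the SSH port,
	# so Fabric doesn't fail while the machine is still booting
	deadline = time.time() + timeout
	while time.time() < deadline:
		try:
			sock = socket.create_connection((host, port), timeout=5)
			sock.close()
			return True
		except (socket.timeout, socket.error):
			time.sleep(3)
	return False
```

You would call it with the public DNS name printed by `create_ec2()` and only then start running Fabric tasks against the host.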

Fabric – Learning Part 1

Hmm, finally here I am, writing my first blog post in a long time to break the silence in my techie brain.

Okay, let's start with the story. I can configure a Linux production server with uWSGI, nginx, vsftpd, MySQL, MongoDB, Postfix, PHP, Python tools, and other relevant pieces of software single-handedly. I learnt it the hard way, and I still do it the hard way for granular performance and security. Then along came a pseudo-friend of mine, asking why I didn't do it with Fabric, which is awesome, since I am a Python expert.

So, let me walk through my experience as I try it for the first time.

First, install Fabric with pip:

sudo pip install fabric

Then use your favorite editor to create a file called

fabfile.py

and save it with the following content:

def hello():
    print "Hello dolly"

Now let's run it:

$ fab hello
Hello dolly

Done.

Wow, we got it. So we keep all the code in fabfile.py (I think this is the standard name, with my current knowledge) and run the functions in it by calling fab.
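For example, a single fabfile.py can hold several tasks, and fab will run whichever ones you name on the command line, in order. A sketch (the task names here are made up by me):

```python
# fabfile.py -- two independent tasks; run both with:
#   $ fab greet goodbye
def greet():
    print("Hello dolly")

def goodbye():
    print("Goodbye dolly")
```

Running `fab greet goodbye` executes `greet` first and `goodbye` second, printing both lines.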

OK, now let's pass some parameters to it:

def hello(name="world"):
    print("Hello %s!" % name)

$ fab hello:name=ashwin
Hello ashwin!

Done.

Hmm, it's good.

This is it for now; let's put it to real use in the next part.

I followed http://docs.fabfile.org/en/1.4.0/tutorial.html for this tutorial.

Part 2: go for it…

Cloudata: A New Open Source BigTable Inspired Database

Cloudata is a new open source implementation of Google’s BigTable paper. It can be found on Github here. It appears to be the project of a Korean developer named YKKwon.

As noted at MyNoSQL, there are only a couple of commits and it's not clear how serious this project is. But it will be of interest to big data, MapReduce, and BigTable buffs.

Cloudata differentiates itself from Hadoop by offering an indexed but still non-relational database, and is probably most comparable to HBase and Hypertable, which are also open source BigTable implementations. The project’s website claims Cloudata can retrieve data within a few milliseconds.

Here’s a list of the current features:

Basic data service

  • Single row operations (get, put)
  • Multi row operations (like, between, scanner)
  • Data uploader (DirectUploader)
  • MapReduce (TabletInputFormat)
  • Simple Cloudata query language with JDBC driver support

Table Management

  • split
  • distribution
  • compaction

Utility

  • Web based Monitor
  • CLI Shell

Failover

  • Master failover
  • TabletServer failover

Change log Server

  • Reliable fast appendable change log server

Support language

  • Java, RESTful API, Thrift

Getting Started with Hadoop and Map Reduce

Have you been wanting to learn Hadoop, but have no idea how to get started? Carlo Scarioni has a basic Hadoop tutorial that covers installing Hadoop, creating a Hadoop Distributed File System (HDFS), moving files into HDFS, and creating a simple Hadoop application. The tutorial also introduces the basic concepts of Map Reduce.

It doesn’t, however, get into distributing the application, which is the main point of using Hadoop in the first place. Scarioni leaves that to a future tutorial. But if you want to get your feet wet with Hadoop and/or Map Reduce, this seems like a pretty good place to start.

He also gives us a pretty concise explanation of what Hadoop is:

Hadoop is an open source project for processing large datasets in parallel with the use of low-level commodity machines. Hadoop is built on two main parts: a special file system called the Hadoop Distributed File System (HDFS) and the Map Reduce framework.

HDFS is a file system optimized for distributed processing of very large datasets on commodity hardware.

The Map Reduce framework works in two main phases to process the data: the Map phase and the Reduce phase.
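The two phases described above can be sketched in a few lines of plain Python. This is a toy, single-machine illustration of the word-count idea (my own example, not from the tutorial), not Hadoop itself:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in a line
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    # Reduce: combine all the counts emitted for one word
    return (word, sum(counts))

def word_count(lines):
    # shuffle/sort step: group intermediate pairs by key
    grouped = defaultdict(list)
    for line in lines:
        for word, n in map_phase(line):
            grouped[word].append(n)
    return dict(reduce_phase(w, c) for w, c in grouped.items())
```

Calling `word_count(["hello hadoop", "hello map reduce"])` counts "hello" twice and every other word once; Hadoop does the same thing, but with the map and reduce calls distributed across machines.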

Setup High Performance Server

A high performance server? Yeah, it's no wonder that we can set up a high performance server ourselves.

Server: a computer running a service. Yep, it's true: servers are nothing but computers running the appropriate services.

So what's a server? Hardware, software, or both? What about the big six-foot-high machines we think of as servers?

A server can be any computer running service(s).

You can make your laptop a server, or your desktop, or even a mobile phone. But performance matters.

A server's performance depends on both hardware and software. We can upgrade hardware to improve performance, but upgrading software may fix bugs or add new features while leaving performance unchanged, or may even reduce it.

Our major concern is choosing the right software for the right job. There is no all-in-one software, and even if there were, it would be slow or buggy. We need to choose each and every piece of software to build a better system.

I will illustrate this with an example.

Requirement : We are 10 people from 6 geographic locations intending to develop a social network which may serve 100 million users.

Analysis : Assume that we have the best hardware; now focus on software. Software includes the operating system, programming language, database server, web server, SCM server, and cache server. Our software needs to have a small memory footprint and be bug/virus free, secure, and fast.

If we look at existing high performance implementations like Hotmail, Yahoo, Facebook, Twitter, and Google, it's like this:

  • Google : Python, C++, Plan9 (I think), Bigtable, GWS, GFS, MySQL, …
  • Facebook : Linux, C++, C, PHP, Scribe, Thrift, Apache Cassandra, HipHop, Tornado, Apache Hive, Varnish, MySQL, Memcached
  • Hotmail : IIS, Windows Server, .NET, …
  • Youtube : Memcached, …
  • Yahoo : PHP, MySQL, FreeBSD, …
  • Twitter : Ruby, Memcached, …

OS : CentOS or Ubuntu. We have a lot of operating systems to choose from, like Windows/Solaris/Unix/BSD/Linux/Mac/Plan9/AIX, but the Linux kernel is in more active development than any other, is fast/scalable/secure, and is nearly as bug/virus free as UNIX. Linux has a lot of flavors, so we choose an advanced and well-known version among them.

Programming Language : C, PHP, Javascript, Python, Haskell. We have several kinds of languages: procedural, object oriented, dynamic, functional, event driven, structured, statically typed, dynamically typed, and a few more, each kind having its advantages. Normally we choose a programming language according to our need/purpose rather than what we fancy. In general we have C, C++, Perl, Python, Ruby, .NET (VB, C#, F#), Java, PHP, Haskell, Scheme, Javascript, Tcl, … C is the best for low level stuff, PHP for playing with HTTP/HTML, Javascript for enriching the presentation layer, Python is good at playing with text, and Haskell for any extended programming that takes advantage of a functional language.

DBMS : MySQL or Apache Cassandra. We choose fast and distributed database management systems from among MySQL, MS SQL Server, PostgreSQL, Oracle, DB2, …

Web Server : nginx. A web server's performance depends on how it processes requests and how it handles scripts (PHP, Ruby, Python, Java, …). We have lots of web servers, like Lighttpd, nginx, Apache, Cherokee, IIS, Tomcat, GlassFish, WebLogic, Mongrel, WEBrick, …

SCM Server : Git. Git is the best of its kind when compared with Mercurial, Subversion, Bazaar, CVS, BitKeeper, and several others.

Cache Server : Varnish.

In the future I will try to screencast the best setup I can.