Problem statement: - Process this text file and count the number email messages per organisation (i.e. Domain name of the email address consider as unique organisation name)
Database Schema:-
Table with two columns ORGANISATION and COUNT of Emails.
< CREATE TABLE EmailCounts (ORGANISATION TEXT, COUNT INTEGER) >
How to connect SQLite database ? :-
Python provides support for SQLite database in sqlite3. First, we need to import this in and followed by connect method is used to connect database. The connect operation makes a “connection” to the database stored in the file "emaildb.sqlite" in the current directory. If the file does not exist, it will be created.Using conn.cursor() we get a hadle of file (like a file handle we obtain using open() method). Once cursor(handle) is obtained, we can use to perform operations on the data stored in the database using execute() method. Below sample code demonstrate the same in sequence.
import sqlite3 conn = sqlite3.connect('emaildb.sqlite') cur = conn.cursor() cur.execute('DROP TABLE IF EXISTS EmailCounts ') cur.execute('CREATE TABLE EmailCounts (ORGANISATION TEXT, COUNT INTEGER)') conn.close()
Now we will process the this text file which contains raw data and insert data into database. Consider following code lines for processing the raw data in file.
Raw data:-
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.90])
by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
Sat, 05 Jan 2008 09:14:16 -0500
X-Sieve: CMU Sieve 2.3
... ...... . ..
...... . ..............
We need to open file mbox.txt and read each line, when we get Line starting from From ... process it and filter out domain name. If That domain name is not in Database Insert into database else update the count(increment count by 1).
Sample program to count emails corresponding to organisation
#http://www.pythonlearn.com/code/mbox.txt import sqlite3 import sys conn = sqlite3.connect('emaildb.sqlite') cur = conn.cursor() #drop table in database, if exists cur.execute('DROP TABLE IF EXISTS EmailCounts ') #create table EmailCounts cur.execute('CREATE TABLE EmailCounts (ORGANISATION TEXT, COUNT INTEGER)') if ( len(sys.argv) < 2 ) : print 'Invalid arguments, Input file missing, exiting !!' exit filename = sys.argv[1] filehandle = open(filename) print "Processing input data file......" for line in filehandle: if not line.startswith('From: ') : continue emailPart = line.split() #stephen.marquard@uct.ac.za email = emailPart[1] org1=email.split("@") #stephen.marquard orgVal = org1[1] #print org1[1] cur.execute('SELECT COUNT FROM EmailCounts WHERE ORGANISATION = ? ', (orgVal, )) row = cur.fetchone() if row is None: cur.execute('''INSERT INTO EmailCounts (ORGANISATION, COUNT) VALUES ( ?, 1 )''', ( orgVal, ) ) else : cur.execute('UPDATE EmailCounts SET COUNT=COUNT+1 WHERE ORGANISATION = ?', (orgVal, )) # This statement commits outstanding changes to disk each # time through the loop - the program can be made faster # by moving the commit so it runs only after the loop completes conn.commit() # https://www.sqlite.org/lang_select.html sqlstr = 'SELECT ORGANISATION, COUNT FROM EmailCounts ORDER BY COUNT DESC LIMIT 10' print "\n-------------------------" print "Organization---Counts" print "-------------------------" for row in cur.execute(sqlstr) : print str(row[0]), row[1] #close cur/handle cur.close()
Sample Output:-
[zytham@s158519-vm backup]$ /usr/bin/python emailCounts.py mbox.txt
Processing input data file......
-------------------------
Organization---Counts
-------------------------
iupui.edu 536
umich.edu 491
indiana.edu 178
caret.cam.ac.uk 157
vt.edu 110
uct.ac.za 96
media.berkeley.edu 56
ufp.pt 28
gmail.com 25
et.gatech.edu 17
Really informative Blog...Thanks for sharing...Waiting for next update...
ReplyDeleteWordpress Training in Chennai
Wordpress course in Chennai
Wordpress Training Chennai
Wordpress Training in Porur
Wordpress Training in Anna Nagar
Struts Training in Chennai
clinical sas training in chennai
Spring Training in Chennai
Photoshop Classes in Chennai
LoadRunner Training in Chennai
This comment has been removed by the author.
ReplyDeleteInformative content,thanks for sharing...waiting for next update...
ReplyDeletejavascript training in chennai
js class
javascript course
javascript training in OMR
javascript training in Porur
javascript training in chennai
core java training in chennai
Photoshop Classes in Chennai
Manual Testing Training in Chennai
Drupal Training in Chennai
The blog which you have shared is more creative... Waiting for your upcoming data...
ReplyDeletePython Training in Chennai
Python course in Chennai
Python Classes in Chennai
Best Python Training Institute in Chennai
Training in Tnagar
training in Thiruvanmiyur
Big data training in chennai
Software testing training in chennai
Selenium Training in Chennai
JAVA Training in Chennai
Nice Blog...Thanks for sharing the article waiting for next update...
ReplyDeleteArtificial Intelligence Course in Chennai
best artificial intelligence training in chennai
C C++ Training in Chennai
javascript training in chennai
Html5 Training in Chennai
QTP Training in Chennai
Spring Training in Chennai
DOT NET Training in Chennai
Really wonderful blog! Thanks for taking your valuable time to share this with us. Keep us updated with more such blogs.
ReplyDeleteAWS Training in Chennai
AWS Training
AWS Training institute in Chennai
DevOps Training in Chennai
Azure Training in Chennai
VMware Training in Chennai
AWS Training in Velachery
AWS Training in Tambaram
AWS Training in Tnagar
AWS Training in Anna nagar
Good Blog!!! Thanks for sharing this great information with us
ReplyDeleteSelenium Training in Chennai
Selenium Course in Chennai
selenium certification in chennai
Best selenium Training Institute in Chennai
Selenium Training in Velachery
Selenium training in Adyar
Python Training in Chennai
Software testing training in chennai
JAVA Training in Chennai
iso 27001 certification services
ReplyDeleteiso 27001 certification in delhi
ISO 9001 Certification in Noida
iso 22000 certification in Delhi
iso certification in noida
ReplyDeleteiso certification in delhi
ce certification in delhi
iso 14001 certification in delhi
iso 22000 certification cost
iso consultants in noida
we have provide the best fridge repair service.
ReplyDeleteWashing Machine Repair In Faridabad
LG Washing Machine Repair In Faridabad
Videocon Washing Machine Service Centre In Faridabad
IFB Washing Machine service centre in faridabad
Samsung Washing Machine Repair In Faridabad
Washing Machine Repair in Noida
godrej washing machine repair in noida
whirlpool Washing Machine Repair in Noida
IFB washing Machine Repair in Noida
LG Washing Machine Repair in Noida
we have provide the best ppc service.
ReplyDeleteppc company in gurgaon
website designing company in Gurgaon
PPC company in Noida
seo company in gurgaon
PPC company in Mumbai
PPC company in Chandigarh
Digital Marketing Company
Rice Bags Manufacturers
ReplyDeletePouch Manufacturers
wall putty bag manufacturers
fertilizer bag manufacturers
seed bag manufacturers
gusseted bag manufacturers
bopp laminated bags manufacturer
Lyrics with music