May 13, 2016

Database CRUD Operation in Python- SQLite and Python

In this post we will use Python to work with data in SQLite database files. We will create table, insert records into it, update table and finally perform SELECT based on some condition.

Problem statement
: - Process this text file and count the number email messages per organisation (i.e. Domain name of the email address consider as unique organisation name)

Database Schema:-
Table with two columns ORGANISATION and COUNT of Emails.
< CREATE TABLE EmailCounts (ORGANISATION TEXT, COUNT INTEGER) >

SQLite database:- SQLite is a light weight relational database management system. In contrast to many other database management systems, SQLite is not a client–server database engine. Rather, it is embedded into the end program.(Source: Wiki)

How to connect SQLite database ? :-

Python provides support for SQLite database in sqlite3. First, we need to import this in and followed by connect method is used to connect database. The connect operation makes a “connection” to the database stored in the file "emaildb.sqlite" in the current directory. If the file does not exist, it will be created.
Using conn.cursor() we get a hadle of file (like a file handle we obtain using open() method). Once cursor(handle) is obtained, we can use to perform operations on the data stored in the database using execute() method. Below sample code demonstrate the same in sequence.

import sqlite3

conn = sqlite3.connect('emaildb.sqlite')
cur = conn.cursor()

cur.execute('DROP TABLE IF EXISTS EmailCounts ')
cur.execute('CREATE TABLE EmailCounts (ORGANISATION TEXT, COUNT INTEGER)')

conn.close()

Now we will process the this text file which contains raw data and insert data into database. Consider following code lines for processing the raw data in file.
Raw data:-
From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.90])
by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
Sat, 05 Jan 2008 09:14:16 -0500
X-Sieve: CMU Sieve 2.3
... ......   . ..
......  . ..............

We need to open file mbox.txt and read each line, when we get Line starting from From ...  process it and filter out domain name. If That domain name is not in Database Insert into database else update the count(increment count by 1).

Sample program to count emails corresponding to organisation

#http://www.pythonlearn.com/code/mbox.txt

import sqlite3
import sys

conn = sqlite3.connect('emaildb.sqlite')
cur = conn.cursor()

#drop table in database, if exists
cur.execute('DROP TABLE IF EXISTS EmailCounts ')

#create table EmailCounts
cur.execute('CREATE TABLE EmailCounts (ORGANISATION TEXT, COUNT INTEGER)')

if ( len(sys.argv) < 2 ) :
    print 'Invalid arguments, Input file missing, exiting !!'
    exit
filename = sys.argv[1]
filehandle = open(filename)
print "Processing input data file......"
for line in filehandle:
    if not line.startswith('From: ') : 
  continue
    emailPart = line.split()
    #stephen.marquard@uct.ac.za
    email = emailPart[1]
    org1=email.split("@")
    
    #stephen.marquard
    orgVal = org1[1]
    #print org1[1]
    cur.execute('SELECT COUNT FROM EmailCounts WHERE ORGANISATION = ? ', (orgVal, ))
    row = cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO EmailCounts (ORGANISATION, COUNT) 
                VALUES ( ?, 1 )''', ( orgVal, ) )
    else : 
        cur.execute('UPDATE EmailCounts SET COUNT=COUNT+1 WHERE ORGANISATION = ?', 
            (orgVal, ))
    # This statement commits outstanding changes to disk each 
    # time through the loop - the program can be made faster 
    # by moving the commit so it runs only after the loop completes
    conn.commit()

# https://www.sqlite.org/lang_select.html
sqlstr = 'SELECT ORGANISATION, COUNT FROM EmailCounts ORDER BY COUNT DESC LIMIT 10'

print "\n-------------------------"
print "Organization---Counts"
print "-------------------------"
for row in cur.execute(sqlstr) :
    print str(row[0]), row[1]

#close cur/handle 
cur.close()

Sample Output:-
[zytham@s158519-vm backup]$ /usr/bin/python emailCounts.py mbox.txt
Processing input data file......

-------------------------
Organization---Counts
-------------------------
iupui.edu 536
umich.edu 491
indiana.edu 178
caret.cam.ac.uk 157
vt.edu 110
uct.ac.za 96
media.berkeley.edu 56
ufp.pt 28
gmail.com 25
et.gatech.edu 17

Location: Hyderabad, Telangana, India

17 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Great Article. Thank you for sharing! Really an awesome post for every one.

    IEEE Final Year projects Project Centers in Chennai are consistently sought after. Final Year Students Projects take a shot at them to improve their aptitudes, while specialists like the enjoyment in interfering with innovation. For experts, it's an alternate ball game through and through. Smaller than expected IEEE Final Year project centers ground for all fragments of CSE & IT engineers hoping to assemble. Final Year Project Domains for IT It gives you tips and rules that is progressively critical to consider while choosing any final year project point.

    Spring Framework has already made serious inroads as an integrated technology stack for building user-facing applications. Spring Framework Corporate TRaining the authors explore the idea of using Java in Big Data platforms.
    Specifically, Spring Framework provides various tasks are geared around preparing data for further analysis and visualization. Spring Training in Chennai

    ReplyDelete
  3. Great efforts put to publish these kinds of articles that are very useful to know. I’m thoroughly enjoying your blog. And Good comments create great relations. You’re doing an excellent job. Keep it up.

    Magento Development Training Course in Chennai Zuan Education

    Selenium Training Course in Chennai Zuan Education

    ReplyDelete