Any help on this problem will be greatly appreciated. I want to run a query against my SQL database and store the returned data as a Pandas data structure. I have attached the code for the query. I am reading the Pandas documentation, but I am having trouble identifying the return type of my query. I tried printing the query result, but it doesn't give any useful information.
from sqlalchemy import create_engine

engine2 = create_engine('mysql://THE DATABASE I AM ACCESSING')
connection2 = engine2.connect()
dataid = 1022
resoverall = connection2.execute(
    "SELECT sum(BLABLA) AS BLA, sum(BLABLABLA2) AS BLABLABLA2, "
    "sum(SOME_INT) AS SOME_INT, sum(SOME_INT2) AS SOME_INT2, "
    "100*sum(SOME_INT2)/sum(SOME_INT) AS ctr, "
    "sum(SOME_INT2)/sum(SOME_INT) AS cpc "
    "FROM daily_report_cooked WHERE campaign_id = '%s'" % dataid)
So I want to understand the format/datatype of my variable "resoverall" and how to put it into a Pandas data structure.
Here’s the shortest code that will do the job:
from pandas import DataFrame

df = DataFrame(resoverall.fetchall())
df.columns = resoverall.keys()
You can go fancier and parse the types as in Paul’s answer.
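The same fetch-then-label pattern works with any DB-API cursor, not only a SQLAlchemy result. Here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for the MySQL database; the table name, column names and values are invented for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the real MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_report (campaign_id INTEGER, clicks INTEGER, cost REAL)")
conn.executemany(
    "INSERT INTO daily_report VALUES (?, ?, ?)",
    [(1022, 10, 2.5), (1022, 30, 7.5), (7, 1, 0.1)],
)

cur = conn.execute(
    "SELECT sum(clicks) AS clicks, sum(cost) AS cost "
    "FROM daily_report WHERE campaign_id = ?",
    (1022,),
)

# cursor.description holds one 7-item tuple per column; item 0 is the column name.
columns = [desc[0] for desc in cur.description]
df = pd.DataFrame(cur.fetchall(), columns=columns)
conn.close()
```

Passing the column names directly to the constructor saves the separate df.columns assignment.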
Edit: Mar. 2015
import pandas as pd

df = pd.read_sql(sql, cnxn)
Via mikebmassey from a similar question
import pyodbc
import pandas.io.sql as psql

cnxn = pyodbc.connect(connection_info)
cursor = cnxn.cursor()
sql = "SELECT * FROM TABLE"

df = psql.frame_query(sql, cnxn)
cnxn.close()
If you are using SQLAlchemy's ORM rather than the expression language, you might find yourself wanting to convert an object of type sqlalchemy.orm.query.Query to a Pandas data frame.
Here is one way to do it, starting with a Query object called ‘query’:
import pandas

data_records = [rec.__dict__ for rec in query.all()]
df = pandas.DataFrame.from_records(data_records)
I’m curious to know if there’s a better approach, but this did the trick for me in two lines.
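One caveat with rec.__dict__: for mapped instances it also contains SQLAlchemy's internal _sa_instance_state entry, which then shows up as a column. A possible way around that is to filter the keys first. The sketch below is only an illustration: objects_to_frame is a hypothetical helper, and FakeRow is a made-up stand-in for a mapped class so the example runs without a database.

```python
import pandas as pd


def objects_to_frame(objects):
    """Build a DataFrame from each object's attributes, skipping
    SQLAlchemy's internal "_sa_instance_state" entry."""
    records = [
        {key: value for key, value in vars(obj).items() if not key.startswith("_sa_")}
        for obj in objects
    ]
    return pd.DataFrame.from_records(records)


class FakeRow:
    """Hypothetical stand-in for a mapped ORM instance (not a real SQLAlchemy class)."""

    def __init__(self, name, clicks):
        self._sa_instance_state = object()  # mimics the ORM bookkeeping attribute
        self.name = name
        self.clicks = clicks


orm_df = objects_to_frame([FakeRow("a", 1), FakeRow("b", 2)])
```

With a real session you would pass query.all() instead of the FakeRow list.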
pandas now has a read_sql function. You definitely want to use that instead.
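As a rough sketch of what that looks like, the example below runs read_sql against an in-memory SQLite database (a stand-in for the real server; the table and values are invented). It also passes the campaign id through the params argument instead of formatting it into the SQL string. The placeholder style depends on the driver: sqlite3 uses ?, while MySQLdb and pymysql use %s.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database used only for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_report (campaign_id INTEGER, clicks INTEGER)")
conn.executemany(
    "INSERT INTO daily_report VALUES (?, ?)",
    [(1022, 10), (1022, 30), (7, 1)],
)

# The driver substitutes the parameter, so no manual string formatting is needed.
sql = "SELECT campaign_id, clicks FROM daily_report WHERE campaign_id = ?"
report = pd.read_sql(sql, conn, params=(1022,))
conn.close()
```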
I can't help you with SQLAlchemy; I always use pyodbc, MySQLdb, or psycopg2 as needed. But when doing so, a function as simple as the one below tends to suit my needs:
import datetime
import decimal

import pyodbc
import numpy as np
import pandas


def __processCursor(cur, dataframe=False, index=None):
    '''
    Processes a database cursor with data on it into either
    a structured numpy array or a pandas dataframe.

    input:
    cur - a pyodbc cursor that has just received data
    dataframe - bool. if false, a numpy record array is returned
                if true, return a pandas dataframe
    index - list of column(s) to use as index in a pandas dataframe
    '''
    datatypes = []
    colinfo = cur.description
    for col in colinfo:
        # col is (name, type_code, display_size, internal_size, ...)
        if col[1] == unicode:  # Python 2 only
            datatypes.append((col[0], 'U%d' % col[3]))
        elif col[1] == str:
            datatypes.append((col[0], 'S%d' % col[3]))
        elif col[1] in [float, decimal.Decimal]:
            datatypes.append((col[0], 'f4'))
        elif col[1] == datetime.datetime:
            datatypes.append((col[0], 'O'))
        elif col[1] == int:
            datatypes.append((col[0], 'i4'))

    data = []
    for row in cur:
        data.append(tuple(row))

    array = np.array(data, dtype=datatypes)
    if dataframe:
        output = pandas.DataFrame.from_records(array)

        if index is not None:
            output = output.set_index(index)

    else:
        output = array

    return output


cnn, cur = myConnectToDBfunction()
cmd = "SELECT * FROM myTable"
cur.execute(cmd)
dataframe = __processCursor(cur, dataframe=True)
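If building the numpy dtype list by hand is more than you need, pandas can infer the column types itself. The following is a minimal sketch, not a drop-in replacement for the function above: cursor_to_frame is a hypothetical helper, and sqlite3 stands in for pyodbc so the example is self-contained.

```python
import sqlite3

import pandas as pd


def cursor_to_frame(cur, index=None):
    """Turn an executed DB-API cursor into a DataFrame, letting pandas infer dtypes."""
    columns = [col[0] for col in cur.description]  # item 0 is the column name
    # coerce_float asks pandas to try converting objects such as
    # decimal.Decimal to floating point.
    return pd.DataFrame.from_records(
        cur.fetchall(), columns=columns, index=index, coerce_float=True
    )


# In-memory SQLite table used only for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, label TEXT, price REAL)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, "a", 1.5), (2, "b", 2.25)],
)
frame = cursor_to_frame(conn.execute("SELECT * FROM myTable"), index="id")
conn.close()
```

The trade-off is less control over the exact dtypes in exchange for much less code.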
Like Nathan, I often want to dump the results of a sqlalchemy or sqlsoup Query into a Pandas data frame. My own solution for this is:
query = session.query(tbl.Field1, tbl.Field2)
DataFrame(query.all(), columns=[column['name'] for column in query.column_descriptions])
resoverall is a sqlalchemy ResultProxy object. You can read more about it in the sqlalchemy docs, which also explain the basic usage of Engines and Connections. The important point here is that resoverall is dict-like.
Pandas likes dict-like objects for creating its data structures; see the online docs.
Good luck with sqlalchemy and pandas.
This question is old, but I wanted to add my two cents. I read the question as "I want to run a query to my [my]SQL database and store the returned data as Pandas data structure [DataFrame]."
From the code, it looks like you mean a MySQL database, and I assume you mean a pandas DataFrame.
import MySQLdb as mdb
import pandas.io.sql as sql
from pandas import *

conn = mdb.connect('<server>', '<user>', '<pass>', '<db>')
df = sql.read_frame('<query>', conn)
conn = mdb.connect('localhost', 'myname', 'mypass', 'testdb')
df = sql.read_frame('select * from testTable', conn)
This will import all rows of testTable into a DataFrame.
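In newer pandas versions read_frame has been replaced by read_sql. If the table is too large to load comfortably in one go, read_sql can also hand the rows back in pieces. A small sketch, assuming an in-memory SQLite table as a stand-in for testTable (the column n and the chunk size are arbitrary choices for illustration):

```python
import sqlite3

import pandas as pd

# In-memory SQLite table standing in for testTable (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testTable (n INTEGER)")
conn.executemany("INSERT INTO testTable VALUES (?)", [(i,) for i in range(10)])

# With chunksize set, read_sql returns an iterator of DataFrames instead of one frame.
chunks = list(pd.read_sql("select * from testTable", conn, chunksize=4))
conn.close()

# Each chunk can be processed on its own, or stitched back together if needed.
all_rows = pd.concat(chunks, ignore_index=True)
```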
This approach uses pandas and pyodbc together. You'll have to modify your connection string (connstr) according to your database specifications.
import pyodbc
import pandas as pd

# MSSQL Connection String Example
connstr = "Server=myServerAddress;Database=myDB;User Id=myUsername;Password=myPass;"

# Query Database and Create DataFrame Using Results
df = pd.read_sql("select * from myTable", pyodbc.connect(connstr))
pyodbc can be used with several enterprise databases (e.g. SQL Server, MySQL, MariaDB, IBM).
It has been a long time since the last post, but maybe this helps someone…
A shorter way than Paul H's:
my_dic = session.query(query.all())
my_df = pandas.DataFrame.from_dict(my_dic)
The best way I have found to do this:
db = db_class()  # database class
db.execute(query)
mydata = [x for x in db.fetchall()]
df = pd.DataFrame(data=mydata)
Here is mine, in case you are using "pymysql":
import pymysql
from pandas import DataFrame

host = 'localhost'
port = 3306
user = 'yourUserName'
passwd = 'yourPassword'
db = 'yourDatabase'

cnx = pymysql.connect(host=host, port=port, user=user, passwd=passwd, db=db)
cur = cnx.cursor()

query = """ SELECT * FROM yourTable LIMIT 10"""
cur.execute(query)

# cur.description holds one tuple per column; the first item is the column name
field_names = [i[0] for i in cur.description]
get_data = [xx for xx in cur]

cur.close()
cnx.close()

df = DataFrame(get_data)
df.columns = field_names
If the result type is ResultSet, you should convert it to a dictionary first. Then the DataFrame columns will be collected automatically.
This works in my case:
df = pd.DataFrame([dict(r) for r in resoverall])
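The same idea can be tried without SQLAlchemy. The sketch below uses sqlite3.Row from the standard library as a stand-in for dict-like result rows; the table and column names are made up for illustration.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support keys() and access by column name
conn.execute("CREATE TABLE t (bla INTEGER, ctr REAL)")
conn.execute("INSERT INTO t VALUES (3, 0.5)")

rows = conn.execute("SELECT bla, ctr FROM t").fetchall()

# Each row converts to a plain dict, so pandas picks up the column names itself.
dict_df = pd.DataFrame([dict(r) for r in rows])
conn.close()
```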
For those who work with the mysql connector, you can use this code as a start. (Thanks to @Daniel Velkov)
import pandas as pd
import mysql.connector

# Setup MySQL connection
db = mysql.connector.connect(
    host="<IP>",            # your host, usually localhost
    user="<USER>",          # your username
    password="<PASS>",      # your password
    database="<DATABASE>"   # name of the data base
)

# You must create a Cursor object. It will let you execute all the queries you need
cur = db.cursor()

# Use all the SQL you like
cur.execute("SELECT * FROM <TABLE>")

# Put it all to a data frame
sql_data = pd.DataFrame(cur.fetchall())
sql_data.columns = cur.column_names

# Close the session
db.close()

# Show the data
print(sql_data.head())