Importing and analyzing a Quickbooks general ledger with Pandas¶
Pandas is amazing. What is pandas? Well, I'm glad you asked. Per http://pandas.pydata.org/, pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
For us accountants, Pandas is a complete game changer for the analysis, auditing, and completely beefing up your ability to do powerful data analytics. In this example we'll be pulling in a quickbooks general ledger from excel, but you can import multiple file formats. The more standardized the format, the better it will play with Pandas. The best formats are CSV or pipe delimited, but as we show here, excel can be used as well.
This post is not a complete tutorial on how to use Pandas, rather it's more of a show and tell of what's possible with Pandas. And it's only scratching the surface!
Ok, to get started, first I'll import the pandas library. For display purposes, I'll format the numbers so they will show commas and two decimal places.
import pandas as pd
pd.options.display.float_format = '{:,.2f}'.format
Now we'll import the sample Quickbooks general ledger excel file.¶
If you're trying this at home, download the source file here https://github.com/danshorstein/python4cpas/raw/master/01_Pandas_and_Jupyter_notebook_starter/general-ledger-sample.xlsx and save it to the same location as your jupyter notebook.
gl = pd.read_excel(r'general-ledger-sample.xlsx')
Let's see what this puppy looks like.¶
I'll use the .head() method to view the first 5 rows of data.
gl.head()
Yuck! That's ugly! Help?¶
There are several issues with this file we need to fix. As I mentioned above, excel files are not super friendly for data analysis or Pandas.
1) I see some blank columns we need to delete (Column head = Unnamed: x and details are all "NaN", which stands for Not a Number). I'll also fill in any NaNs in the Debit and Credit column with 0s.
2) The first 5 columns were imported as a big nested index. Don't worry about what that means, we'll fix that next. This is one of the issues that can come up when importing from an excel file.
Fortunately, I've written some code to fix the format.¶
I won't get into the details of what the code to fix the file is doing, but I attempted to write it in a way that would work with any standard formatted quickbooks excel general ledger. Feel free to try it out and let me know if it works for you. Here's the code, but don't worry about what it means for now. Just know it's awesome.
def acct_append(row, new_accts):
if pd.isnull(row[1]):
new_accts.append(row[0])
else:
new_accts.append('{} | {}'.format(*row))
def fix_qb_gl(gl):
gl = gl.dropna(axis=1, how='all')
main_acct = list(gl.index.get_level_values(1))
sub_acct = list(gl.index.get_level_values(2))
acct = list(zip(main_acct, sub_acct))
new_accts = []
acct_append(acct[0], new_accts)
for idx, (m, s) in enumerate(acct[1:]):
if str(m).startswith('Total'):
m = 'DELETE'
if str(s).startswith('Total'):
s = 'DELETE'
idx += 1
acct[idx] = m, s
if pd.isnull(m): # Fill NA if main is NA
acct[idx] = acct[idx - 1][0], acct[idx][1]
if pd.isnull(s): # If main is NA, then fill NA if sub is NA
acct[idx] = acct[idx][0], acct[idx-1][1]
acct_append(acct[idx], new_accts) # Create the new acct
gl = gl.reset_index(drop=True)
gl['Acct'] = pd.Series(new_accts)
gl[['Debit', 'Credit']] = gl[['Debit', 'Credit']].fillna(0)
gl['Net'] = gl.apply(lambda x: (x['Debit'] - x['Credit']
if 'DELETE' not in x['Acct']
else 0), axis=1)
gl = gl.fillna('NA')
gl = gl.where(gl['Net'] != 0).dropna()
columns = ['Acct', 'Type', 'Date', 'Num', 'Name', 'Memo',
'Split', 'Debit', 'Credit', 'Net']
gl = gl[columns]
gl['Date'] = gl['Date'].apply(pd.datetime.date)
return gl
gl = fix_qb_gl(gl)
Let's see how it looks now...¶
gl.head()
Ah, much better! Now it's looking like a GL! Let's see some more info about this file.¶
FYI, the file format in Pandas is called a DataFrame. You can get information about the size and type of data by calling the .info() method.
gl.info()
Now let's create a pivot table summing up the account activies.¶
pivot = gl.pivot_table(values=['Debit', 'Credit', 'Net'], index='Acct', aggfunc='sum', margins=True)
accts = list(gl.Acct.unique())
accts.append('All')
pivot.loc[accts]
gl[gl.Acct == 'Office Equipment']
Let's see all transactions that are over $30,000¶
gl.where(abs(gl.Net) > 30000).dropna()
Now let's see the transactions that occurred between 1/26/18 and 1/27/18¶
start_date = pd.datetime(2018,1,26).date()
end_date = pd.datetime(2018,1,27).date()
gl.where((start_date <= gl.Date) & (gl.Date <= end_date)).dropna()
AWESOME¶
Well there you go, I've shown that you can fairly easily import a quickbooks general ledger file from excel into pandas, fix the formatting issues, and run some simple queries. More to come next time! Oh ps, you can save any of these to excel or csv, which I'll get more into in a future post.
-Dan