Thursday, February 23, 2017

Importing and analyzing quickbooks gl detail with Pandas

QuickBooksImport!!!!

Importing and analyzing a Quickbooks general ledger with Pandas

Pandas is amazing. What is pandas? Well, I'm glad you asked. Per http://pandas.pydata.org/, pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

For us accountants, Pandas is a complete game changer for the analysis, auditing, and completely beefing up your ability to do powerful data analytics. In this example we'll be pulling in a quickbooks general ledger from excel, but you can import multiple file formats. The more standardized the format, the better it will play with Pandas. The best formats are CSV or pipe delimited, but as we show here, excel can be used as well.

This post is not a complete tutorial on how to use Pandas, rather it's more of a show and tell of what's possible with Pandas. And it's only scratching the surface!

Ok, to get started, first I'll import the pandas library. For display purposes, I'll format the numbers so they will show commas and two decimal places.

In [1]:
import pandas as pd
pd.options.display.float_format = '{:,.2f}'.format

Now we'll import the sample Quickbooks general ledger excel file.

If you're trying this at home, download the source file here https://github.com/danshorstein/python4cpas/raw/master/01_Pandas_and_Jupyter_notebook_starter/general-ledger-sample.xlsx and save it to the same location as your jupyter notebook.

In [2]:
gl = pd.read_excel(r'general-ledger-sample.xlsx')

Let's see what this puppy looks like.

I'll use the .head() method to view the first 5 rows of data.

In [3]:
gl.head()
Out[3]:
Type Unnamed: 1 Date Unnamed: 3 Num Unnamed: 5 Adj Unnamed: 7 Name Unnamed: 9 Memo Unnamed: 11 Split Unnamed: 13 Debit Unnamed: 15 Credit Unnamed: 17 Balance
NaN Company Checking Account NaN NaN NaN NaN nan NaT nan NaN nan nan nan NaN nan NaN nan NaN nan nan nan nan nan 12,349.00
NaN NaN NaN NaN Check nan 2018-01-01 nan 5001 nan nan nan Kuver Property nan Janaury Rent nan Rent nan nan nan 583.75 nan 11,765.25
NaN Check nan 2018-01-01 nan 5000 nan nan nan Online Accounting nan Set up QuickBooks file nan Accounting Fees nan nan nan 225.00 nan 11,540.25
NaN Deposit nan 2018-01-01 nan NaN nan nan nan NaN nan Deposit nan -SPLIT- nan 31,349.00 nan nan nan 42,889.25
NaN Check nan 2018-01-05 nan 5002 nan nan nan Deborah Wood (Owner) nan Petty Cash nan Petty Cash Account nan nan nan 500.00 nan 42,389.25

Yuck! That's ugly! Help?

There are several issues with this file we need to fix. As I mentioned above, excel files are not super friendly for data analysis or Pandas.

1) I see some blank columns we need to delete (Column head = Unnamed: x and details are all "NaN", which stands for Not a Number). I'll also fill in any NaNs in the Debit and Credit column with 0s.

2) The first 5 columns were imported as a big nested index. Don't worry about what that means, we'll fix that next. This is one of the issues that can come up when importing from an excel file.

Fortunately, I've written some code to fix the format.

I won't get into the details of what the code to fix the file is doing, but I attempted to write it in a way that would work with any standard formatted quickbooks excel general ledger. Feel free to try it out and let me know if it works for you. Here's the code, but don't worry about what it means for now. Just know it's awesome.

In [4]:
def acct_append(row, new_accts):
    if pd.isnull(row[1]):
        new_accts.append(row[0])
    else:
        new_accts.append('{} | {}'.format(*row))


def fix_qb_gl(gl):
    gl = gl.dropna(axis=1, how='all')
    main_acct = list(gl.index.get_level_values(1))
    sub_acct = list(gl.index.get_level_values(2))
    acct = list(zip(main_acct, sub_acct))
    new_accts = []
    acct_append(acct[0], new_accts)
    
    for idx, (m, s) in enumerate(acct[1:]):
        if str(m).startswith('Total'):
            m = 'DELETE'       
        if str(s).startswith('Total'):
            s = 'DELETE'
        idx += 1
        acct[idx] = m, s

        if pd.isnull(m):  # Fill NA if main is NA
            acct[idx] = acct[idx - 1][0], acct[idx][1]
            
            if pd.isnull(s):  # If main is NA, then fill NA if sub is NA
                acct[idx] = acct[idx][0], acct[idx-1][1]

        acct_append(acct[idx], new_accts)  # Create the new acct
    gl = gl.reset_index(drop=True)
    gl['Acct'] = pd.Series(new_accts)
    gl[['Debit', 'Credit']] = gl[['Debit', 'Credit']].fillna(0)
    gl['Net'] = gl.apply(lambda x: (x['Debit'] - x['Credit']
                                    if 'DELETE' not in x['Acct']
                                    else 0), axis=1)
    gl = gl.fillna('NA')
    gl = gl.where(gl['Net'] != 0).dropna()
    columns = ['Acct', 'Type', 'Date', 'Num', 'Name', 'Memo',
               'Split', 'Debit', 'Credit', 'Net']
    gl = gl[columns]
    gl['Date'] = gl['Date'].apply(pd.datetime.date)
    return gl
In [5]:
gl = fix_qb_gl(gl)

Let's see how it looks now...

In [6]:
gl.head()
Out[6]:
Acct Type Date Num Name Memo Split Debit Credit Net
1 Company Checking Account Check 2018-01-01 5001 Kuver Property Janaury Rent Rent 0.00 583.75 -583.75
2 Company Checking Account Check 2018-01-01 5000 Online Accounting Set up QuickBooks file Accounting Fees 0.00 225.00 -225.00
3 Company Checking Account Deposit 2018-01-01 NA NA Deposit -SPLIT- 31,349.00 0.00 31,349.00
4 Company Checking Account Check 2018-01-05 5002 Deborah Wood (Owner) Petty Cash Petty Cash Account 0.00 500.00 -500.00
5 Company Checking Account Check 2018-01-08 5003 Gas & Electic Company ACCT# 098703 Utilities 0.00 65.08 -65.08

Ah, much better! Now it's looking like a GL! Let's see some more info about this file.

FYI, the file format in Pandas is called a DataFrame. You can get information about the size and type of data by calling the .info() method.

In [7]:
gl.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5011 entries, 1 to 5446
Data columns (total 10 columns):
Acct      5011 non-null object
Type      5011 non-null object
Date      5011 non-null object
Num       5011 non-null object
Name      5011 non-null object
Memo      5011 non-null object
Split     5011 non-null object
Debit     5011 non-null float64
Credit    5011 non-null float64
Net       5011 non-null float64
dtypes: float64(3), object(7)
memory usage: 430.6+ KB

Now let's create a pivot table summing up the account activies.

In [8]:
pivot = gl.pivot_table(values=['Debit', 'Credit', 'Net'], index='Acct', aggfunc='sum', margins=True)
accts = list(gl.Acct.unique())
accts.append('All')
pivot.loc[accts]
Out[8]:
Credit Debit Net
Acct
Company Checking Account 403,171.13 480,976.45 77,805.32
Petty Cash Account 0.00 500.00 500.00
Accounts Receivable 375,976.45 408,309.53 32,333.08
Inventory Asset 173,360.75 131,301.50 -42,059.25
Prepaids | Prepaid Insurance 6,875.00 6,875.00 0.00
Undeposited Funds 375,976.45 375,976.45 0.00
Computer & Office Equipment 0.00 2,500.00 2,500.00
Accumulated Depreciation 923.04 0.00 -923.04
Accounts Payable 147,728.80 131,031.50 -16,697.30
QuickBooks Credit Card 3,453.60 2,204.48 -1,249.12
Customer Deposits 0.00 3,500.00 3,500.00
Line of Credit 106,208.85 25,000.00 -81,208.85
Payroll Liabilities | Payroll Taxes Payable 14,251.22 11,377.50 -2,873.72
Payroll Liabilities | SEC125 Payable 1,400.00 1,350.00 -50.00
Deborah Wood Equity | Deborah Wood's Time to Jobs 114,450.00 114,450.00 0.00
Deborah Wood Equity | Deborah Wood Draws 0.00 135,000.00 135,000.00
Opening Bal Equity 375.00 0.00 -375.00
Revenue | Revenue 411,809.53 0.00 -411,809.53
Direct Labor | Wages - Sales-Inside 0.00 2,500.00 2,500.00
Direct Labor | Wages - Warehouse 0.00 19,705.00 19,705.00
Freight Costs 0.00 1,810.00 1,810.00
Packaging Materials 0.00 1,752.50 1,752.50
Purchases (Cost of Goods) 69.00 180,030.75 179,961.75
Sales Commission (outside reps) 0.00 3,990.80 3,990.80
Advertising Expense 0.00 2,000.00 2,000.00
Business License & Fees 0.00 710.23 710.23
Car/Truck Expense | Car Lease 0.00 6,756.00 6,756.00
Car/Truck Expense | Gas 0.00 907.64 907.64
Car/Truck Expense | Insurance-Auto 0.00 1,440.00 1,440.00
Car/Truck Expense | Registration & License 0.00 546.00 546.00
Car/Truck Expense | Repairs & Maintenance 0.00 1,700.23 1,700.23
Conferences and Seminars 0.00 4,700.00 4,700.00
Contributions 0.00 2,500.00 2,500.00
Depreciation Expense 0.00 923.04 923.04
Dues and Subscriptions 0.00 1,900.00 1,900.00
Insurance | General Liability Insurance 0.00 2,760.00 2,760.00
Insurance | Owner's Health Insurance 0.00 4,272.00 4,272.00
Insurance | Professional Liability Insuranc 0.00 6,875.00 6,875.00
Insurance | Worker's Compensation 0.00 2,782.08 2,782.08
Maintenance/Janitorial 0.00 2,841.95 2,841.95
Marketing Expense 0.00 4,982.00 4,982.00
Meals and Entertainment 0.00 1,376.35 1,376.35
Office Equipment 0.00 1,100.00 1,100.00
Postage and Delivery 0.00 1,098.00 1,098.00
Professional Fees | Accounting Fees 0.00 2,544.00 2,544.00
Professional Fees | Legal Fees 0.00 600.00 600.00
Professional Fees | Payroll Service Fees 0.00 1,529.24 1,529.24
Promotional Expense 0.00 2,021.00 2,021.00
Rent 0.00 7,005.00 7,005.00
Repairs | Computer Repairs 0.00 390.00 390.00
Supplies 0.00 6,199.36 6,199.36
Telephone 0.00 4,003.44 4,003.44
Travel 0.00 3,452.23 3,452.23
Utilities 0.00 501.59 501.59
Wages | Employee Benefits 0.00 2,253.96 2,253.96
Wages | Payroll Tax Expenses 0.00 4,608.57 4,608.57
Wages | Wages - Office Staff 0.00 6,312.00 6,312.00
Other Expense | Interest Expense 0.00 2,296.45 2,296.45
All 2,136,028.82 2,136,028.82 0.00

Here are some examples of things you can do now that we have the GL in Pandas.

Let's look at the detail for just one account: Office Equipment

In [9]:
gl[gl.Acct == 'Office Equipment']
Out[9]:
Acct Type Date Num Name Memo Split Debit Credit Net
4893 Office Equipment Check 2018-01-25 5015 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4894 Office Equipment Check 2018-02-22 5042 Lincoln Office Equip. Fax machine Company Checking Account 450.00 0.00 450.00
4895 Office Equipment Check 2018-02-26 5049 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4896 Office Equipment Check 2018-03-26 5072 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4897 Office Equipment Check 2018-04-25 5098 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4898 Office Equipment Check 2018-05-25 5126 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4899 Office Equipment Check 2018-06-22 5145 Lincoln Office Equip. Printer Company Checking Account 350.00 0.00 350.00
4900 Office Equipment Check 2018-06-25 5153 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4901 Office Equipment Check 2018-07-25 5181 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4902 Office Equipment Check 2018-08-24 5211 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4903 Office Equipment Check 2018-09-25 5237 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4904 Office Equipment Check 2018-10-25 5263 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4905 Office Equipment Check 2018-11-26 5286 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4906 Office Equipment Check 2018-12-27 5311 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00

Let's see all transactions that are over $30,000

In [10]:
gl.where(abs(gl.Net) > 30000).dropna()
Out[10]:
Acct Type Date Num Name Memo Split Debit Credit Net
3 Company Checking Account Deposit 2018-01-01 NA NA Deposit -SPLIT- 31,349.00 0.00 31,349.00
78 Company Checking Account Transfer 2018-03-05 NA NA Funds Transfer Line of Credit 50,000.00 0.00 50,000.00
145 Company Checking Account Deposit 2018-05-01 NA NA Deposit -SPLIT- 34,194.85 0.00 34,194.85
381 Company Checking Account Deposit 2018-12-01 NA NA Deposit -SPLIT- 35,550.60 0.00 35,550.60
1923 Line of Credit Transfer 2018-03-05 NA NA Funds Transfer Company Checking Account 0.00 50,000.00 -50,000.00

Now let's see the transactions that occurred between 1/26/18 and 1/27/18

In [11]:
start_date = pd.datetime(2018,1,26).date()
end_date = pd.datetime(2018,1,27).date()
gl.where((start_date <= gl.Date) & (gl.Date <= end_date)).dropna()
Out[11]:
Acct Type Date Num Name Memo Split Debit Credit Net
21 Company Checking Account Check 2018-01-26 5019 Deborah Wood (Owner) Owners Draw Deborah Wood Draws 0.00 10,500.00 -10,500.00
664 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Halogen Lamp, 20 watts, Luminous Intensity Accounts Payable 180.00 0.00 180.00
665 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Specialty, Stage & Studio Bulbs 60 watt (set ... Accounts Payable 1,250.00 0.00 1,250.00
666 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Specialty Bulbs, 100 watt (6 pack) Accounts Payable 712.50 0.00 712.50
667 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Cand. Light, 20 watts (8 pack) Accounts Payable 3,600.00 0.00 3,600.00
668 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Halogen Lamp, Volts:3.5 Tubular Accounts Payable 100.00 0.00 100.00
669 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Fluorescent Lamp, T-12, Medium Bipin (30pack) Accounts Payable 3,300.00 0.00 3,300.00
1778 Accounts Payable Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company NA -SPLIT- 0.00 9,142.50 -9,142.50
2856 Deborah Wood Equity | Deborah Wood Draws Check 2018-01-26 5019 Deborah Wood (Owner) Owners Draw Company Checking Account 10,500.00 0.00 10,500.00

AWESOME

Well there you go, I've shown that you can fairly easily import a quickbooks general ledger file from excel into pandas, fix the formatting issues, and run some simple queries. More to come next time! Oh ps, you can save any of these to excel or csv, which I'll get more into in a future post.

-Dan