Thursday, February 23, 2017

Importing and analyzing quickbooks gl detail with Pandas

QuickBooksImport!!!!

Importing and analyzing a Quickbooks general ledger with Pandas

Pandas is amazing. What is pandas? Well, I'm glad you asked. Per http://pandas.pydata.org/, pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

For us accountants, Pandas is a complete game changer for the analysis, auditing, and completely beefing up your ability to do powerful data analytics. In this example we'll be pulling in a quickbooks general ledger from excel, but you can import multiple file formats. The more standardized the format, the better it will play with Pandas. The best formats are CSV or pipe delimited, but as we show here, excel can be used as well.

This post is not a complete tutorial on how to use Pandas, rather it's more of a show and tell of what's possible with Pandas. And it's only scratching the surface!

Ok, to get started, first I'll import the pandas library. For display purposes, I'll format the numbers so they will show commas and two decimal places.

In [1]:
import pandas as pd
pd.options.display.float_format = '{:,.2f}'.format

Now we'll import the sample Quickbooks general ledger excel file.

If you're trying this at home, download the source file here https://github.com/danshorstein/python4cpas/raw/master/01_Pandas_and_Jupyter_notebook_starter/general-ledger-sample.xlsx and save it to the same location as your jupyter notebook.

In [2]:
gl = pd.read_excel(r'general-ledger-sample.xlsx')

Let's see what this puppy looks like.

I'll use the .head() method to view the first 5 rows of data.

In [3]:
gl.head()
Out[3]:
Type Unnamed: 1 Date Unnamed: 3 Num Unnamed: 5 Adj Unnamed: 7 Name Unnamed: 9 Memo Unnamed: 11 Split Unnamed: 13 Debit Unnamed: 15 Credit Unnamed: 17 Balance
NaN Company Checking Account NaN NaN NaN NaN nan NaT nan NaN nan nan nan NaN nan NaN nan NaN nan nan nan nan nan 12,349.00
NaN NaN NaN NaN Check nan 2018-01-01 nan 5001 nan nan nan Kuver Property nan Janaury Rent nan Rent nan nan nan 583.75 nan 11,765.25
NaN Check nan 2018-01-01 nan 5000 nan nan nan Online Accounting nan Set up QuickBooks file nan Accounting Fees nan nan nan 225.00 nan 11,540.25
NaN Deposit nan 2018-01-01 nan NaN nan nan nan NaN nan Deposit nan -SPLIT- nan 31,349.00 nan nan nan 42,889.25
NaN Check nan 2018-01-05 nan 5002 nan nan nan Deborah Wood (Owner) nan Petty Cash nan Petty Cash Account nan nan nan 500.00 nan 42,389.25

Yuck! That's ugly! Help?

There are several issues with this file we need to fix. As I mentioned above, excel files are not super friendly for data analysis or Pandas.

1) I see some blank columns we need to delete (Column head = Unnamed: x and details are all "NaN", which stands for Not a Number). I'll also fill in any NaNs in the Debit and Credit column with 0s.

2) The first 5 columns were imported as a big nested index. Don't worry about what that means, we'll fix that next. This is one of the issues that can come up when importing from an excel file.

Fortunately, I've written some code to fix the format.

I won't get into the details of what the code to fix the file is doing, but I attempted to write it in a way that would work with any standard formatted quickbooks excel general ledger. Feel free to try it out and let me know if it works for you. Here's the code, but don't worry about what it means for now. Just know it's awesome.

In [4]:
def acct_append(row, new_accts):
    if pd.isnull(row[1]):
        new_accts.append(row[0])
    else:
        new_accts.append('{} | {}'.format(*row))


def fix_qb_gl(gl):
    gl = gl.dropna(axis=1, how='all')
    main_acct = list(gl.index.get_level_values(1))
    sub_acct = list(gl.index.get_level_values(2))
    acct = list(zip(main_acct, sub_acct))
    new_accts = []
    acct_append(acct[0], new_accts)
    
    for idx, (m, s) in enumerate(acct[1:]):
        if str(m).startswith('Total'):
            m = 'DELETE'       
        if str(s).startswith('Total'):
            s = 'DELETE'
        idx += 1
        acct[idx] = m, s

        if pd.isnull(m):  # Fill NA if main is NA
            acct[idx] = acct[idx - 1][0], acct[idx][1]
            
            if pd.isnull(s):  # If main is NA, then fill NA if sub is NA
                acct[idx] = acct[idx][0], acct[idx-1][1]

        acct_append(acct[idx], new_accts)  # Create the new acct
    gl = gl.reset_index(drop=True)
    gl['Acct'] = pd.Series(new_accts)
    gl[['Debit', 'Credit']] = gl[['Debit', 'Credit']].fillna(0)
    gl['Net'] = gl.apply(lambda x: (x['Debit'] - x['Credit']
                                    if 'DELETE' not in x['Acct']
                                    else 0), axis=1)
    gl = gl.fillna('NA')
    gl = gl.where(gl['Net'] != 0).dropna()
    columns = ['Acct', 'Type', 'Date', 'Num', 'Name', 'Memo',
               'Split', 'Debit', 'Credit', 'Net']
    gl = gl[columns]
    gl['Date'] = gl['Date'].apply(pd.datetime.date)
    return gl
In [5]:
gl = fix_qb_gl(gl)

Let's see how it looks now...

In [6]:
gl.head()
Out[6]:
Acct Type Date Num Name Memo Split Debit Credit Net
1 Company Checking Account Check 2018-01-01 5001 Kuver Property Janaury Rent Rent 0.00 583.75 -583.75
2 Company Checking Account Check 2018-01-01 5000 Online Accounting Set up QuickBooks file Accounting Fees 0.00 225.00 -225.00
3 Company Checking Account Deposit 2018-01-01 NA NA Deposit -SPLIT- 31,349.00 0.00 31,349.00
4 Company Checking Account Check 2018-01-05 5002 Deborah Wood (Owner) Petty Cash Petty Cash Account 0.00 500.00 -500.00
5 Company Checking Account Check 2018-01-08 5003 Gas & Electic Company ACCT# 098703 Utilities 0.00 65.08 -65.08

Ah, much better! Now it's looking like a GL! Let's see some more info about this file.

FYI, the file format in Pandas is called a DataFrame. You can get information about the size and type of data by calling the .info() method.

In [7]:
gl.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5011 entries, 1 to 5446
Data columns (total 10 columns):
Acct      5011 non-null object
Type      5011 non-null object
Date      5011 non-null object
Num       5011 non-null object
Name      5011 non-null object
Memo      5011 non-null object
Split     5011 non-null object
Debit     5011 non-null float64
Credit    5011 non-null float64
Net       5011 non-null float64
dtypes: float64(3), object(7)
memory usage: 430.6+ KB

Now let's create a pivot table summing up the account activies.

In [8]:
pivot = gl.pivot_table(values=['Debit', 'Credit', 'Net'], index='Acct', aggfunc='sum', margins=True)
accts = list(gl.Acct.unique())
accts.append('All')
pivot.loc[accts]
Out[8]:
Credit Debit Net
Acct
Company Checking Account 403,171.13 480,976.45 77,805.32
Petty Cash Account 0.00 500.00 500.00
Accounts Receivable 375,976.45 408,309.53 32,333.08
Inventory Asset 173,360.75 131,301.50 -42,059.25
Prepaids | Prepaid Insurance 6,875.00 6,875.00 0.00
Undeposited Funds 375,976.45 375,976.45 0.00
Computer & Office Equipment 0.00 2,500.00 2,500.00
Accumulated Depreciation 923.04 0.00 -923.04
Accounts Payable 147,728.80 131,031.50 -16,697.30
QuickBooks Credit Card 3,453.60 2,204.48 -1,249.12
Customer Deposits 0.00 3,500.00 3,500.00
Line of Credit 106,208.85 25,000.00 -81,208.85
Payroll Liabilities | Payroll Taxes Payable 14,251.22 11,377.50 -2,873.72
Payroll Liabilities | SEC125 Payable 1,400.00 1,350.00 -50.00
Deborah Wood Equity | Deborah Wood's Time to Jobs 114,450.00 114,450.00 0.00
Deborah Wood Equity | Deborah Wood Draws 0.00 135,000.00 135,000.00
Opening Bal Equity 375.00 0.00 -375.00
Revenue | Revenue 411,809.53 0.00 -411,809.53
Direct Labor | Wages - Sales-Inside 0.00 2,500.00 2,500.00
Direct Labor | Wages - Warehouse 0.00 19,705.00 19,705.00
Freight Costs 0.00 1,810.00 1,810.00
Packaging Materials 0.00 1,752.50 1,752.50
Purchases (Cost of Goods) 69.00 180,030.75 179,961.75
Sales Commission (outside reps) 0.00 3,990.80 3,990.80
Advertising Expense 0.00 2,000.00 2,000.00
Business License & Fees 0.00 710.23 710.23
Car/Truck Expense | Car Lease 0.00 6,756.00 6,756.00
Car/Truck Expense | Gas 0.00 907.64 907.64
Car/Truck Expense | Insurance-Auto 0.00 1,440.00 1,440.00
Car/Truck Expense | Registration & License 0.00 546.00 546.00
Car/Truck Expense | Repairs & Maintenance 0.00 1,700.23 1,700.23
Conferences and Seminars 0.00 4,700.00 4,700.00
Contributions 0.00 2,500.00 2,500.00
Depreciation Expense 0.00 923.04 923.04
Dues and Subscriptions 0.00 1,900.00 1,900.00
Insurance | General Liability Insurance 0.00 2,760.00 2,760.00
Insurance | Owner's Health Insurance 0.00 4,272.00 4,272.00
Insurance | Professional Liability Insuranc 0.00 6,875.00 6,875.00
Insurance | Worker's Compensation 0.00 2,782.08 2,782.08
Maintenance/Janitorial 0.00 2,841.95 2,841.95
Marketing Expense 0.00 4,982.00 4,982.00
Meals and Entertainment 0.00 1,376.35 1,376.35
Office Equipment 0.00 1,100.00 1,100.00
Postage and Delivery 0.00 1,098.00 1,098.00
Professional Fees | Accounting Fees 0.00 2,544.00 2,544.00
Professional Fees | Legal Fees 0.00 600.00 600.00
Professional Fees | Payroll Service Fees 0.00 1,529.24 1,529.24
Promotional Expense 0.00 2,021.00 2,021.00
Rent 0.00 7,005.00 7,005.00
Repairs | Computer Repairs 0.00 390.00 390.00
Supplies 0.00 6,199.36 6,199.36
Telephone 0.00 4,003.44 4,003.44
Travel 0.00 3,452.23 3,452.23
Utilities 0.00 501.59 501.59
Wages | Employee Benefits 0.00 2,253.96 2,253.96
Wages | Payroll Tax Expenses 0.00 4,608.57 4,608.57
Wages | Wages - Office Staff 0.00 6,312.00 6,312.00
Other Expense | Interest Expense 0.00 2,296.45 2,296.45
All 2,136,028.82 2,136,028.82 0.00

Here are some examples of things you can do now that we have the GL in Pandas.

Let's look at the detail for just one account: Office Equipment

In [9]:
gl[gl.Acct == 'Office Equipment']
Out[9]:
Acct Type Date Num Name Memo Split Debit Credit Net
4893 Office Equipment Check 2018-01-25 5015 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4894 Office Equipment Check 2018-02-22 5042 Lincoln Office Equip. Fax machine Company Checking Account 450.00 0.00 450.00
4895 Office Equipment Check 2018-02-26 5049 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4896 Office Equipment Check 2018-03-26 5072 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4897 Office Equipment Check 2018-04-25 5098 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4898 Office Equipment Check 2018-05-25 5126 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4899 Office Equipment Check 2018-06-22 5145 Lincoln Office Equip. Printer Company Checking Account 350.00 0.00 350.00
4900 Office Equipment Check 2018-06-25 5153 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4901 Office Equipment Check 2018-07-25 5181 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4902 Office Equipment Check 2018-08-24 5211 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4903 Office Equipment Check 2018-09-25 5237 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4904 Office Equipment Check 2018-10-25 5263 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4905 Office Equipment Check 2018-11-26 5286 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00
4906 Office Equipment Check 2018-12-27 5311 Pace Shipping Supplies postage meter rental Company Checking Account 25.00 0.00 25.00

Let's see all transactions that are over $30,000

In [10]:
gl.where(abs(gl.Net) > 30000).dropna()
Out[10]:
Acct Type Date Num Name Memo Split Debit Credit Net
3 Company Checking Account Deposit 2018-01-01 NA NA Deposit -SPLIT- 31,349.00 0.00 31,349.00
78 Company Checking Account Transfer 2018-03-05 NA NA Funds Transfer Line of Credit 50,000.00 0.00 50,000.00
145 Company Checking Account Deposit 2018-05-01 NA NA Deposit -SPLIT- 34,194.85 0.00 34,194.85
381 Company Checking Account Deposit 2018-12-01 NA NA Deposit -SPLIT- 35,550.60 0.00 35,550.60
1923 Line of Credit Transfer 2018-03-05 NA NA Funds Transfer Company Checking Account 0.00 50,000.00 -50,000.00

Now let's see the transactions that occurred between 1/26/18 and 1/27/18

In [11]:
start_date = pd.datetime(2018,1,26).date()
end_date = pd.datetime(2018,1,27).date()
gl.where((start_date <= gl.Date) & (gl.Date <= end_date)).dropna()
Out[11]:
Acct Type Date Num Name Memo Split Debit Credit Net
21 Company Checking Account Check 2018-01-26 5019 Deborah Wood (Owner) Owners Draw Deborah Wood Draws 0.00 10,500.00 -10,500.00
664 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Halogen Lamp, 20 watts, Luminous Intensity Accounts Payable 180.00 0.00 180.00
665 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Specialty, Stage & Studio Bulbs 60 watt (set ... Accounts Payable 1,250.00 0.00 1,250.00
666 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Specialty Bulbs, 100 watt (6 pack) Accounts Payable 712.50 0.00 712.50
667 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Cand. Light, 20 watts (8 pack) Accounts Payable 3,600.00 0.00 3,600.00
668 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Halogen Lamp, Volts:3.5 Tubular Accounts Payable 100.00 0.00 100.00
669 Inventory Asset Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company Fluorescent Lamp, T-12, Medium Bipin (30pack) Accounts Payable 3,300.00 0.00 3,300.00
1778 Accounts Payable Bill 2018-01-26 b8796 Peacock Bulb Manufacturing Company NA -SPLIT- 0.00 9,142.50 -9,142.50
2856 Deborah Wood Equity | Deborah Wood Draws Check 2018-01-26 5019 Deborah Wood (Owner) Owners Draw Company Checking Account 10,500.00 0.00 10,500.00

AWESOME

Well there you go, I've shown that you can fairly easily import a quickbooks general ledger file from excel into pandas, fix the formatting issues, and run some simple queries. More to come next time! Oh ps, you can save any of these to excel or csv, which I'll get more into in a future post.

-Dan

15 comments:

  1. Thanks a lot, your blog is so helpful!
    I am an internal auditor, and also a newbie to Python. Your post inspired me a lot on how to solve my daily problems that I cannot solved for many years. Keep up your good work!!!

    ReplyDelete
    Replies
    1. Glad you find it useful! More (hopefully) helpful Python stuff is coming soon!

      Delete
    2. Hi Daniel, would you mind sharing a little bit more on how to tidy up the format of the data. As I think not many of us using quickbook but different accounting systems, every time we have to tidy the data before we can enter to any analysis work, and for my experience it is the most time consuming and frustrating workload. It would be so helpful that if there is any handy guide for us to deal with those formatting issues. Thanks!

      Delete
    3. That's a great idea Roy. I don't know how simple it would be to create a generic guide to tidying up data, but perhaps I could come up with a few common issues that I find, and ways to handle them. If you have any examples you'd want me to walk through, I'd be happy to do so (just please remove any sensitive data) - if so, email to python4cpas@gmail.com

      Delete
  2. Hi,

    Just curious, but what are the primary advantages of using python for data analytics over say VB.NET? I'd imagine the latter plays better with Excel, which lends itself better to BI overall. Or is this just an incorrect assumption on my part?

    ReplyDelete
    Replies
    1. Hey Redeam, thanks for the question.

      I am not really a programmer, I've just started picking up on it over the past few years. I can do a little with visual basic, but when I discovered Python, everything changed. It's just so easy to use, has tons of support, and you can do so much with very little code. But I'd say it's probably just a personal preference. If you are good at using VB.NET, that's probably better for doing work with Excel. But otherwise, I think the advantages of Python's ease of use and fantastic library of packages and modules makes it a great language for doing data analytics (for example, web scraping/requests, datetime calculations, importing many different varieties of data types, etc is really easy with Python, and I have no idea about VB.NET)

      Delete
    2. Haha, thanks for responding! My experience is very similar to yours (i.e. I am an accountant, recently a CPA, doing accounting at a bank), except in terms of programming, I started with VB 6.0 and then moved to VB.NET and SQL. It is just recently that I started learning python, so I thought I'd pick your brain a bit. Any recommended resources to get started? I've mostly been reading PEPs, wikipedia, stack, and also the book 'Automate the Boring Stuff.' Also, are there any libraries to look into specifically that are targeted at our profession or are just generally useful (in addition to Pandas)? both BeautifulSoup and openpyxl seem interesting. Thanks again!

      Delete
  3. Nice, glad to find another data oriented CPA! One of my goals in starting this blog was to try and find other data oriented / pythonic accounting people, as there aren't a ton of us (yet...), and get a little community going. One day I hope to get a few open source accounting projects going. Congrats on getting your CPA! As far as good resources go, I think that's a great idea for my next blog post, so thanks for the idea. I'll get a good list together of some recommended resources for Python beginners and put it up soon.

    Regarding accounting oriented Python, there's really not much that I could find out there. I've seen a few github projects that mention accounting, but I didn't find them very user friendly or useful, but there may be others I haven't found yet. That's one area I would love to try and use this blog to help contribute, and get some open source accounting python projects out there.

    For now though, here are some recommendations to get you going:

    TRAINING:
    Coursera has some great beginner Python courses. But since you are not new to programming, just new to Python, I'd actually recommend taking one or both of these courses: https://training.talkpython.fm/courses/explore_python_jumpstart/python-language-jumpstart-building-10-apps and https://training.talkpython.fm/courses/explore_pythonic_code/write-pythonic-code-like-a-seasoned-developer. They are the only Python courses I've ever paid for, but I got so much more value out of them than the cost of the course.

    PODCASTS:
    Check out these three podcasts: talk python (https://talkpython.fm/episodes/all), python bytes (https://pythonbytes.fm/episodes/all), and Podcast.__init__ (https://www.podcastinit.com/) - they are packed full of interesting interviews, interesting libraries and Python news, and great for inspiration about new projects and just cool stuff you can do with Python. Also gives a fantastic insight into the Python community

    PYTHON VIDEOS:
    There are several excellent Python conferences held each year, and most of them post their talks on youtube. Check out ones that interest you. Also, David Beazley is a superstar coder and entertaining speaker, and has some great videos. This one's a fun one to start with: https://youtu.be/j6VSAsKAj98

    BOOKS:
    I have all three of these, and they all have different approaches but are great ways to get more understanding of how to use Python and the different libraries. The automate the boring stuff is good too, but he keeps it at a pretty basic level and only give a few examples, where as these books dive in a little deeper, and present the code in a more Pythonic way.
    https://www.amazon.com/gp/product/1449319793/
    https://www.amazon.com/gp/product/1449340377/
    https://www.amazon.com/gp/product/1491946008/

    ReplyDelete
    Replies
    1. Hi Dan,

      I am excited to find your blog. As I am a data oriented CPA and looking for similar minds too. :) I learnt VB a little bit back in high school and never touched coding again until I find VB very useful in cleaning accounting data during work. I learnt and used VB during the past couple years. I recently learn about Python and find it a more powerful tool than VB. So I started learning it and came across your blog. Love your posts and looking forward to more.

      Delete
    2. Thanks for the comment, glad to see you are following this. Stay tuned for more hopefully soon! In the meantime check out my latest post here https://www.python4cpas.com/2018/12/?m=1

      Delete
  4. Daniel. Awesome website idea and I look forward to future posts! I'm a corporate controller and always looking to add to my skillsets, so I've recently picked up learning Python.
    My questions is, from the above analysis, I don't see how this couldn't have been achieved quicker and easier by using Pivot Tables? Maybe I'm just not understanding the full potential of pandas.

    Thanks,
    RN!



    ReplyDelete
    Replies
    1. Hey Robert, thanks for reading and for the comment! I presume you are referring to Excel's Pivot Tables rather than the pandas.pivot_table() method. I agree in this example, it probably would have been just as easy to filter and view the data in Excel Pivot Tables once it's in pivot table-friendly format. I also agree that you (and honestly I) haven't discovered the full potential of pandas. One thing pandas is really good at is cleaning up data, so start there if you're looking to get into it. After that, check out its capabilities with dealing with dates. It's also nice to be able to automate a big part of the process if you have a multi-step process with different data sources; Python and pandas is great for situations like that, where you can collect data from different sources, munge it together and bring into a pandas DataFrame, and even run some specific views, and save them to file or on an intranet page, or even email them out. Pivot Tables can do parts of that, but not the full thing.

      Delete
  5. Hi Daniel,

    Have you tried applying Python codes on VAT analysis/reconciliation? I am in a statutory reporting team which mainly responsible for VAT reconciliation, tax and year end audit. Currently I have challenge dealing with VAT reconciliation.

    As most people would know, product sold could be classified as exempted, zero rated or standard rated. So my job is trying to identify the nature of VAT derived from sales. At the moment, my team is using excel and SAP to match the sales to its nature. However, the challenge is we have a list of items for a sales invoice which only some items are subject to VAT. Worse, the invoice is in PDF format. So I have to manually check each sales record on SAP (like few thousands or more per month) and cross check with invoice.

    I learn some basic Python skill, however, so far I haven't seen particular skill to extract numbers in PDF and cross check with Excel.

    ReplyDelete
    Replies
    1. Hey there, thanks for the comment! I have not done any analysis on VAT per se, but your work sounds like a great opportunity for Python. I actually recently discovered a library that does a great job of extracting data from PDF files, called pdfplumber, and I plan to do a post on that soon. I'll try and show an example of how to compare items from a PDF file with items from an excel file :)

      Delete
  6. Hi Daniel, this is really helpful. It will wonderful a little explanation of the code (#), specially the part where you change the format of the file to make it workable. I do not only want to copy but to be able to write it myself at some point.

    ReplyDelete