beancount.ingest

Code to help identify, extract, and file external downloads.

This package contains code to help you build importers and drive the process of identifying which importer to run on an externally downloaded file, extracting transactions from it, and filing the file away under a clean and rigidly named hierarchy for preservation.

beancount.ingest.cache

A file wrapper which acts as a cache for on-demand evaluation of conversions.

This object is used in lieu of a file in order to allow the various importers to reuse each others’ conversion results. Converting file contents, e.g. PDF to text, can be expensive.
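
The caching idea can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the library's implementation: it assumes a wrapper that holds a filename and remembers the result of each converter function the first time it is applied.

```python
class MemoSketch:
    """Hypothetical stand-in for a file wrapper that caches conversions."""

    def __init__(self, name):
        self.name = name    # Filename to process.
        self._cache = {}    # Converter function -> cached result.

    def convert(self, converter):
        """Run converter(self.name) once and reuse the result afterwards."""
        if converter not in self._cache:
            self._cache[converter] = converter(self.name)
        return self._cache[converter]


calls = []


def expensive_conversion(filename):
    # Stand-in for a costly step such as PDF-to-text.
    calls.append(filename)
    return "text of " + filename


memo = MemoSketch("statement.pdf")
first = memo.convert(expensive_conversion)
second = memo.convert(expensive_conversion)  # Served from the cache.
```

Two importers asking for the same conversion on the same wrapper would therefore trigger the expensive step only once.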

beancount.ingest.cache.contents(filename)

A converter that just reads the entire contents of a file.

Parameters

filename – A string, the name of the file to read.

Returns

A converter function.

beancount.ingest.cache.get_file(filename)

Create or reuse a globally registered instance of a FileMemo.

Note: FileMemo objects are registered globally and reused for the duration of the process. This is usually the intended behavior. Always create them by calling this function.

Parameters

filename – A path string, the absolute name of the file whose memo to create.

Returns

A FileMemo instance.

beancount.ingest.cache.head(num_bytes=8192)

A converter that just reads the first bytes of a file.

Parameters

num_bytes – The number of bytes to read.

Returns

A converter function.
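
As a rough illustration of the "converter factory" pattern described here, the sketch below builds a function that reads only the first bytes of a file. The name head_sketch is hypothetical, and unlike the real converter this version returns raw bytes and makes no attempt to detect or decode the text encoding.

```python
import os
import tempfile


def head_sketch(num_bytes=8192):
    """Return a converter that reads at most num_bytes from the start of a file."""
    def head_reader(filename):
        with open(filename, "rb") as infile:
            return infile.read(num_bytes)
    return head_reader


# Write a small sample file and read only its first 4 bytes.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"OFXHEADER:100\n")
sample_path = tmp.name

first_bytes = head_sketch(4)(sample_path)
os.remove(sample_path)
```

Reading only a short prefix is usually enough for an importer to recognize a file format without paying for a full read.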

beancount.ingest.cache.mimetype(filename)

A converter that computes the MIME type of the file.

Returns

A converter function.

beancount.ingest.extract

Extract script.

Read an import script and a list of downloaded filenames or directories of downloaded files, and for each of those files, extract transactions from it.

beancount.ingest.extract.add_arguments(parser)

Add arguments for the extract command.

beancount.ingest.extract.extract(importer_config, files_or_directories, output, entries=None, options_map=None, mindate=None, ascending=True, detect_duplicates_func=None)

Given an importer configuration, search for files that can be imported in the list of files or directories, run the signature checks on them, and if they succeed, run the importer on the file.

A list of entries for an existing ledger can be provided in order to perform de-duplication and a minimum date can be provided to filter out old entries.

Parameters
  • importer_config – A list of (regexps, importer) pairs, the configuration.

  • files_or_directories – A list of strings, filenames or directories to be processed.

  • output – A file object, to be written to.

  • entries – A list of directives loaded from the existing file for the newly extracted entries to be merged in.

  • options_map – The options parsed from existing file.

  • mindate – Optional minimum date to output transactions for.

  • ascending – A boolean, true to print entries in ascending order, false if descending is desired.

  • detect_duplicates_func – An optional function which accepts a list of lists of imported entries and a list of entries already existing in the user’s ledger. See function find_duplicate_entries(), which is the default implementation for this.

beancount.ingest.extract.extract_from_file(filename, importer, existing_entries=None, min_date=None, allow_none_for_tags_and_links=False)

Import entries from file ‘filename’ with the given importer.

Also cross-check against a list of provided ‘existing_entries’ entries, de-duplicating and possibly auto-categorizing.

Parameters
  • filename – The name of the file to import.

  • importer – An importer object that matched the file.

  • existing_entries – A list of existing entries parsed from a ledger, used to detect duplicates and automatically complete or categorize transactions.

  • min_date – A date before which entries should be ignored. This is useful when an account has a valid check/assert; we could just ignore whatever comes before, if desired.

  • allow_none_for_tags_and_links – A boolean, whether to allow plugins to generate Transaction objects with None as value for the ‘tags’ or ‘links’ attributes.

Returns

A list of new imported entries.

Raises

Exception – If there is an error in the importer’s extract() method.

beancount.ingest.extract.find_duplicate_entries(new_entries_list, existing_entries)

Flag potentially duplicate entries.

Parameters
  • new_entries_list – A list of pairs of (key, lists of imported entries), one for each importer. The key identifies the filename and/or importer that yielded those new entries.

  • existing_entries – A list of previously existing entries from the target ledger.

Returns

A list of lists of modified new entries (like new_entries_list), potentially with modified metadata to indicate those which are duplicated.
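
A custom duplicate-detection hook with this shape might look like the sketch below. It is an illustration under stated assumptions, not the default implementation: entries are plain dicts with 'date', 'narration' and 'meta' keys, the match rule is exact equality of (date, narration), and duplicates are marked with a "__duplicate__" metadata key.

```python
def flag_duplicates_sketch(new_entries_list, existing_entries):
    """Mark new entries that also appear among the existing ones.

    The dict-based entry shape and the (date, narration) match rule are
    illustrative assumptions only.
    """
    seen = {(e["date"], e["narration"]) for e in existing_entries}
    flagged_list = []
    for key, new_entries in new_entries_list:
        flagged = []
        for entry in new_entries:
            if (entry["date"], entry["narration"]) in seen:
                # Copy before modifying so the caller's entry is untouched.
                meta = dict(entry["meta"])
                meta["__duplicate__"] = True
                entry = dict(entry, meta=meta)
            flagged.append(entry)
        flagged_list.append((key, flagged))
    return flagged_list


existing = [{"date": "2013-07-02", "narration": "ORDINARY DIVIDEND", "meta": {}}]
imported = [("bank.csv", [
    {"date": "2013-07-02", "narration": "ORDINARY DIVIDEND", "meta": {}},
    {"date": "2013-09-30", "narration": "ORDINARY DIVIDEND", "meta": {}},
])]
result = flag_duplicates_sketch(imported, existing)
```

The output keeps the same (key, entries) grouping as the input, so downstream printing code can still report which file each entry came from.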

beancount.ingest.extract.main()

beancount.ingest.extract.print_extracted_entries(entries, file)

Print a list of entries.

Parameters
  • entries – A list of extracted entries.

  • file – A file object to write to.

beancount.ingest.extract.run(args, _, importers_list, files_or_directories, detect_duplicates_func=None)

Run the subcommand.

beancount.ingest.file

Filing script.

Read an import script and a list of downloaded filenames or directories of downloaded files, and for each of those files, move the file under an account corresponding to the filing directory.

beancount.ingest.file.add_arguments(parser)

Add arguments for the file command.

beancount.ingest.file.file(importer_config, files_or_directories, destination, dry_run=False, mkdirs=False, overwrite=False, idify=False, logfile=None)

File importable files under a destination directory.

Given an importer configuration object, search for files that can be imported under the given list of files or directories and move them under the given destination directory with the date computed by the module prepended to the filename. If the date cannot be extracted, use a reasonable default for the date (e.g. the last modified time of the file itself).

If ‘mkdirs’ is True, create the destination directories before moving the files.

Parameters
  • importer_config – A list of importer instances that define the config.

  • files_or_directories – A list of files or directories to walk recursively and hunt for files to import.

  • destination – A string, the root destination directory where the files are to be filed. The files are organized there under a hierarchy mirroring that of the chart of accounts.

  • dry_run – A flag, if true, don’t actually move the files.

  • mkdirs – A flag, if true, make all the intervening directories; otherwise, fail to move files to non-existing dirs.

  • overwrite – A flag, if true, overwrite an existing destination file.

  • idify – A flag, if true, remove whitespace and funky characters in the destination filename.

  • logfile – A file object to write log entries to, or None, in which case no log is written out.
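
To make the filing layout concrete, here is a small sketch of how a destination path could be assembled from the pieces described above. The function name, the use of a dot between date and filename, and the exact character set stripped by idify are assumptions made for illustration; only the general layout (account hierarchy under the destination, date prepended to the filename) comes from the description.

```python
import datetime
import posixpath
import re


def filing_path_sketch(destination, account, date, filename, idify=False):
    """Build <destination>/<account as directories>/<date>.<filename>."""
    basename = posixpath.basename(filename)
    if idify:
        # Collapse whitespace and other unusual characters (assumed rule).
        basename = re.sub(r"[^A-Za-z0-9._-]+", "_", basename)
    dated_name = "{}.{}".format(date.isoformat(), basename)
    return posixpath.join(destination, *account.split(":"), dated_name)


target = filing_path_sketch(
    "/documents", "Assets:Checking",
    datetime.date(2013, 12, 17), "Downloads/ofx download.ofx", idify=True)
```

With these inputs, target is "/documents/Assets/Checking/2013-12-17.ofx_download.ofx".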

beancount.ingest.file.file_one_file(filename, importers, destination, idify=False, logfile=None)

Move a single filename using its matched importers.

Parameters
  • filename – A string, the name of the downloaded file to be processed.

  • importers – A list of importer instances that handle this file.

  • destination – A string, the root destination directory where the files are to be filed. The files are organized there under a hierarchy mirroring that of the chart of accounts.

  • idify – A flag, if true, remove whitespace and funky characters in the destination filename.

  • logfile – A file object to write log entries to, or None, in which case no log is written out.

Returns

The full new destination filename on success, and None if there was an error.

beancount.ingest.file.main()

beancount.ingest.file.move_xdev_file(src_filename, dst_filename, mkdirs=False)

Move a file, potentially across devices.

Parameters
  • src_filename – A string, the name of the file to copy.

  • dst_filename – A string, where to copy the file.

  • mkdirs – A flag, true if we should create a non-existing destination directory.
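
One common way to move a file when source and destination may live on different devices is to try a rename and fall back to copy-then-delete. The sketch below shows that general technique; it is not taken from the library, and the function name is hypothetical.

```python
import os
import shutil
import tempfile


def move_across_devices_sketch(src_filename, dst_filename, mkdirs=False):
    """Move a file, copying then deleting if a plain rename is not possible."""
    dst_dir = os.path.dirname(dst_filename)
    if mkdirs and dst_dir:
        os.makedirs(dst_dir, exist_ok=True)
    try:
        os.rename(src_filename, dst_filename)
    except OSError:
        # A rename cannot cross filesystems; fall back to copy + remove.
        shutil.copy2(src_filename, dst_filename)
        os.remove(src_filename)


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "statement.csv")
dst = os.path.join(workdir, "Assets", "Checking", "statement.csv")
with open(src, "w") as outfile:
    outfile.write("DATE,AMOUNT\n")

move_across_devices_sketch(src, dst, mkdirs=True)
moved = os.path.exists(dst) and not os.path.exists(src)
shutil.rmtree(workdir)
```

If the destination directory is missing and mkdirs is false, both the rename and the fallback copy fail, so the error still reaches the caller.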

beancount.ingest.file.run(args, parser, importers_list, files_or_directories, detect_duplicates_func=None)

Run the subcommand.

beancount.ingest.identify

Identify script.

Read an import script and a list of downloaded filenames or directories of downloaded files, and for each of those files, identify which importer it should be associated with.

beancount.ingest.identify.add_arguments(parser)

Add arguments for the identify command.

beancount.ingest.identify.find_imports(importer_config, files_or_directories, logfile=None)

Given an importer configuration, search for files that can be imported in the list of files or directories, run the signature checks on them and return a list of (filename, importers), where ‘importers’ is a list of importers that matched the file.

Parameters
  • importer_config – A list of importer instances that define the config.

  • files_or_directories – A list of files or directories to walk recursively and hunt for files to import.

  • logfile – A file object to write log entries to, or None, in which case no log is written out.

Yields

Pairs of (filename, importers), where ‘importers’ is the list of importers matching this file.
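
The identification loop can be pictured with the following sketch. Everything in it is a stand-in: the importer class matches on a filename suffix only, and identify() receives a plain filename string rather than the cached file wrapper the real protocol uses.

```python
import os
import shutil
import tempfile


class SuffixImporterSketch:
    """Hypothetical importer stand-in that matches on a filename suffix."""

    def __init__(self, name, suffix):
        self._name, self._suffix = name, suffix

    def name(self):
        return self._name

    def identify(self, filename):
        return filename.endswith(self._suffix)


def find_imports_sketch(importers, directory):
    """Yield (filename, matching importers) for every file under directory."""
    for root, _, filenames in os.walk(directory):
        for basename in sorted(filenames):
            filename = os.path.join(root, basename)
            matches = [imp for imp in importers if imp.identify(filename)]
            yield filename, matches


workdir = tempfile.mkdtemp()
for basename in ("bank.csv", "readme.txt"):
    open(os.path.join(workdir, basename), "w").close()

importers = [SuffixImporterSketch("mybank-credit-csv", ".csv")]
found = {os.path.basename(fn): [imp.name() for imp in matches]
         for fn, matches in find_imports_sketch(importers, workdir)}
shutil.rmtree(workdir)
```

Files that no importer recognizes still appear in the output, paired with an empty list.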

beancount.ingest.identify.identify(importers_list, files_or_directories)

Run the identification loop.

Parameters
  • importers_list – A list of importer instances.

  • files_or_directories – A list of strings, files or directories.

beancount.ingest.identify.main()

beancount.ingest.identify.run(_, __, importers_list, files_or_directories, detect_duplicates_func=None)

Run the subcommand.

beancount.ingest.importer

Importer protocol.

All importers must comply with this interface and implement at least some of its methods. A configuration consists of a simple list of such importer instances. The importer processes run through the importers, calling some of their methods in order to identify, extract and file the downloaded files.

Each of the methods accepts a cache.FileMemo object which has a ‘name’ attribute with the filename to process, but which also provides a place to cache conversions. Use its convert() method whenever possible to avoid carrying out the same conversion multiple times. See beancount.ingest.cache for more details.

Synopsis:

  • name(): Return a unique identifier for the importer instance.

  • identify(): Return true if the importer is able to process the file.

  • extract(): Extract directives from a file’s contents and return a list of entries.

  • file_account(): Return an account name associated with the given file for this importer.

  • file_date(): Return a date associated with the downloaded file (e.g., the statement date).

  • file_name(): Return a cleaned up filename for storage (optional).

Just to be clear: Although this importer will not raise NotImplementedError exceptions (it returns default values for each method), you NEED to derive from it in order to do anything meaningful. Simply instantiating this importer will neither match any file nor provide any useful information. It just defines the protocol for all importers.

class beancount.ingest.importer.ImporterProtocol

Interface that all source importers need to comply with.

FLAG = '*'

extract(file, existing_entries=None)

Extract transactions from a file.

If the importer would like to flag a returned transaction as a known duplicate, it may opt to set the special flag “__duplicate__” to True, and the transaction should be treated as a duplicate by the extraction code. This is a way to let the importer use particular information about previously imported transactions in order to flag them as duplicates, for example when an importer has a way to get a persistent unique id for each of the imported transactions. (See this discussion for context: https://groups.google.com/d/msg/beancount/0iV-ipBJb8g/-uk4wsH2AgAJ)

Parameters
  • file – A cache.FileMemo instance.

  • existing_entries – An optional list of existing directives loaded from the ledger which is intended to contain the extracted entries. This is only provided if the user provides them via a flag in the extractor program.

Returns

A list of new, imported directives (usually mostly Transactions) extracted from the file.

file_account(file)

Return an account associated with the given file.

Note: If you don’t implement this method you won’t be able to move the files into their preservation hierarchy; the bean-file command won’t work.

Also, normally the returned account is not a function of the input file, just of the importer, but it is provided anyhow.

Parameters

file – A cache.FileMemo instance.

Returns

The name of the account that corresponds to this importer.

file_date(file)

Attempt to obtain a date that corresponds to the given file.

Parameters

file – A cache.FileMemo instance.

Returns

A date object, if successful, or None if a date could not be extracted. (If no date is returned, the file creation time is used. This is the default.)

file_name(file)

A filter that optionally renames a file before filing.

This is used to make tidy filenames for filed/stored document files. If you don’t implement this, or if it returns None, the same filename is used. Note that if you return a filename, a simple, RELATIVE filename must be returned, not an absolute filename.

Parameters

file – A cache.FileMemo instance.

Returns

The tidied up, new filename to store it as.

identify(file)

Return true if this importer matches the given file.

Parameters

file – A cache.FileMemo instance.

Returns

A boolean, true if this importer can handle this file.

name()

Return a unique id/name for this importer.

Returns

A string which uniquely identifies this importer.
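
The shape of an importer following this protocol is sketched below. To keep the example self-contained it is duck-typed rather than derived from ImporterProtocol, extract() returns an empty list instead of real directives, and FileStub is a hypothetical stand-in for the file wrapper (only its ‘name’ attribute is used). The CSV suffix check and the date-in-filename rule are illustrative assumptions.

```python
import datetime
import os
import re


class CsvImporterSketch:
    """Duck-typed importer exposing the protocol's method names."""

    def __init__(self, account):
        self.account = account

    def name(self):
        return "csv-sketch:" + self.account

    def identify(self, file):
        # 'file' only needs a 'name' attribute here.
        return file.name.lower().endswith(".csv")

    def file_account(self, file):
        return self.account

    def file_date(self, file):
        # Look for a YYYY-MM-DD date in the filename; None if absent.
        match = re.search(r"(\d{4})-(\d{2})-(\d{2})", file.name)
        return datetime.date(*map(int, match.groups())) if match else None

    def file_name(self, file):
        # Must be a simple, relative filename.
        return os.path.basename(file.name).lower()

    def extract(self, file, existing_entries=None):
        return []  # A real importer would parse the file here.


class FileStub:
    """Minimal stand-in for the file wrapper passed to importer methods."""

    def __init__(self, name):
        self.name = name


importer = CsvImporterSketch("Assets:Checking")
stub = FileStub("Downloads/Bank-2013-07-02.CSV")
```

In real use the class would derive from the protocol class so that any method left unimplemented falls back to its default value.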

beancount.ingest.regression

Support for implementing regression tests on sample files using nose.

NOTE: This itself is not a regression test. It’s a library used to create regression tests for your importers. Use it like this in your own importer code:

    def test():
        importer = Importer([], {
            'FILE': 'Assets:US:MyBank:Main',
        })
        yield from regression.compare_sample_files(importer, __file__)

WARNING: This is deprecated. Nose itself has been deprecated for a while and Beancount is now using only pytest. Ignore this and use beancount.ingest.regression_pytest instead.

class beancount.ingest.regression.ImportFileTestCase(importer)

Base class for importer tests that compare output to an expected output text.

maxDiff = None

test_expect_extract(filename, msg)

Extract entries from a test file and compare against expected output.

If an expected file (as <filename>.extract) is not present, we issue a warning. Expected files can be regenerated by removing them before running the tests.

Parameters

filename – A string, the name of the file to import using self.importer.

Raises

AssertionError – If the contents differ from the expected file.

test_expect_file_date(filename, msg)

Compute the imported file date and compare to an expected output.

If an expected file (as <filename>.file_date) is not present, we issue a warning. Expected files can be regenerated by removing them before running the tests.

Parameters

filename – A string, the name of the file to import using self.importer.

Raises

AssertionError – If the contents differ from the expected file.

test_expect_file_name(filename, msg)

Compute the imported file name and compare to an expected output.

If an expected file (as <filename>.file_name) is not present, we issue a warning. Expected files can be regenerated by removing them before running the tests.

Parameters

filename – A string, the name of the file to import using self.importer.

Raises

AssertionError – If the contents differ from the expected file.

test_expect_identify(filename, msg)

Attempt to identify a file and expect results to be true.

Parameters

filename – A string, the name of the file to import using self.importer.

Raises

AssertionError – If the importer does not identify the file.

exception beancount.ingest.regression.ToolNotInstalled

An error to be used by converters when necessary software isn’t there.

Raising this exception from your converter code when the tool is not installed causes the tests defined in this file to be skipped instead of failing. This can happen when you test your converters on different computers and/or platforms.

beancount.ingest.regression.compare_sample_files(importer, directory=None, ignore_cls=None)

Compare the sample files under a directory.

Parameters
  • importer – An instance of an Importer.

  • directory – A string, the directory to scour for sample files or a filename in that directory. If a directory is not provided, the directory of the file from which the importer class is defined is used.

  • ignore_cls – An optional base class of the importer whose methods should not trigger the addition of a test. For example, if you are deriving from a base class which is already well-tested, you may not want to have a regression test case generated for those methods. This was used to ignore methods provided from a common backwards compatibility support class.

Yields

Generated tests as per nose’s requirements (a callable and arguments for it).

beancount.ingest.regression.find_input_files(directory)

Find the input files in the module where the class is defined.

Parameters

directory – A string, the path to a root directory to check for.

Yields

Strings, the absolute filenames of sample input and expected files.

beancount.ingest.regression_pytest

Support for implementing regression tests on sample files using pytest.

This module provides definitions for testing a custom importer against a set of existing downloaded files, running the various importer interface methods on it, and comparing the output to an expected text file. (Expected test files can be auto-generated using the --generate option). You use it like this:

    from beancount.ingest import regression_pytest
    ...
    import mymodule
    ...

    # Create your importer instance used for testing.
    importer = mymodule.Importer(...)

    # Select a directory where your test files are to be located.
    directory = ...

    # Create a test case using the base in this class.

    @regression_pytest.with_importer(importer)
    @regression_pytest.with_testdir(directory)
    class TestImporter(regression_pytest.ImporterTestBase):
        pass

Also, to add the --generate option to ‘pytest’, you must create a conftest.py somewhere in one of the roots above your importers with this module as a plugin:

    pytest_plugins = "beancount.ingest.regression_pytest"

See beancount/example/ingest for a full working example.

How to invoke the tests:

Via pytest. First run your test with the --generate option to generate all the expected files. Then inspect them visually for correctness. Finally, check them in to preserve them. You should be able to regress against those correct outputs in the future. Use version control to your advantage to visualize the differences.

class beancount.ingest.regression_pytest.ImporterTestBase

test_extract(importer, file, pytestconfig)

Extract entries from a test file and compare against expected output.

test_file_account(importer, file, pytestconfig)

Compute the selected filing account and compare to an expected output.

test_file_date(importer, file, pytestconfig)

Compute the imported file date and compare to an expected output.

test_file_name(importer, file, pytestconfig)

Compute the imported file name and compare to an expected output.

test_identify(importer, file)

Attempt to identify a file and expect results to be true.

This method does not need to check against an existing expect file. It is just assumed it should return True if your test is setup well (the importer should always identify the test file).

beancount.ingest.regression_pytest.compare_contents_or_generate(actual_string, expect_fn, generate)

Compare a string to the contents of an expect file.

Assert if the contents differ; if generating, write out the expect file instead.

Parameters
  • actual_string – The actual string contents to check.

  • expect_fn – The filename whose contents to read and compare against.

  • generate – A boolean, true if we are to generate the tests.
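
The generate-or-compare workflow can be sketched as follows. This is an illustrative stand-in rather than the module's code: the function name, the returned status strings, and the whitespace-insensitive comparison are assumptions, and the expect filename merely follows the <filename>.file_name naming pattern.

```python
import os
import shutil
import tempfile


def compare_or_generate_sketch(actual_string, expect_fn, generate):
    """Write the expect file when generating; otherwise compare against it."""
    if generate:
        with open(expect_fn, "w", encoding="utf-8") as expect_file:
            expect_file.write(actual_string)
        return "generated"
    with open(expect_fn, encoding="utf-8") as expect_file:
        expected = expect_file.read()
    assert expected.strip() == actual_string.strip(), "output differs"
    return "matched"


workdir = tempfile.mkdtemp()
expect_fn = os.path.join(workdir, "bank.csv.file_name")

# First run records the output; second run regresses against it.
first_run = compare_or_generate_sketch("bank.csv\n", expect_fn, generate=True)
second_run = compare_or_generate_sketch("bank.csv\n", expect_fn, generate=False)
shutil.rmtree(workdir)
```

Keeping the generated expect files under version control lets later diffs show exactly how an importer's output changed.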

beancount.ingest.regression_pytest.find_input_files(directory)

Find the input files in the module where the class is defined.

Parameters

directory – A string, the path to a root directory to check for.

Yields

Strings, the absolute filenames of sample input and expected files.

beancount.ingest.regression_pytest.pytest_addoption(parser)

Add an option to generate the expected files for the tests.

beancount.ingest.regression_pytest.with_importer(importer)

Parametrizing fixture that provides the importer to test.

beancount.ingest.regression_pytest.with_testdir(directory)

Parametrizing fixture that provides files from a directory.

beancount.ingest.scripts_utils

Common front-end to all ingestion tools.

class beancount.ingest.scripts_utils.TestScriptsBase(methodName='runTest')
FILES = {'Downloads/Subdir/bank.csv': 'DATE,TRANSACTION ID,DESCRIPTION,QUANTITY,SYMBOL,PRICE,COMMISSION,AMOUNT,NET CASH BALANCE,REG FEE,SHORT-TERM RDM FEE,FUND REDEMPTION FEE, DEFERRED SALES CHARGE\n07/02/2013,10223506553,ORDINARY DIVIDEND (HDV),,HDV,,,31.04,31.04,,,,\n07/02/2013,10224851005,MONEY MARKET PURCHASE,,,,,-31.04,0.00,,,,\n07/02/2013,10224851017,MONEY MARKET PURCHASE (MMDA10),31.04,MMDA10,,,0.00,0.00,,,,\n09/30/2013,10561187188,ORDINARY DIVIDEND (HDV),,HDV,,,31.19,31.19,,,,\n09/30/2013,10563719172,MONEY MARKET PURCHASE,,,,,-31.19,0.00,,,,\n09/30/2013,10563719198,MONEY MARKET PURCHASE (MMDA10),31.19,MMDA10,,,0.00,0.00,,,,\n***END OF FILE***\n', 'Downloads/Subdir/readme.txt': 'Some random text file.\n', 'Downloads/ofxdownload.ofx': "OFXHEADER:100\nDATA:OFXSGML\nVERSION:102\nSECURITY:NONE\nENCODING:USASCII\nCHARSET:1252\nCOMPRESSION:NONE\nOLDFILEUID:NONE\nNEWFILEUID:NONE\n\n<OFX><SIGNONMSGSRSV1><SONRS><STATUS><CODE>0<SEVERITY>INFO<MESSAGE>Login successful</STATUS><DTSERVER>20131217204544.559[-7:MST]<LANGUAGE>ENG<FI><ORG>OFCT<FID>3011</FI><ORIGIN.ID>FMPWeb<START.TIME>20131217204544</SONRS></SIGNONMSGSRSV1><CREDITCARDMSGSRSV1><CCSTMTTRNRS><TRNUID>0<STATUS><CODE>0<SEVERITY>INFO</STATUS><CCSTMTRS><CURDEF>USD<CCACCTFROM><ACCTID>092243467384967<DOWNLOAD.FLAG>false<DOWNLOAD.TYPE>downloadSince<AMEX.BASICACCT>090341355486768<DAYS.SINCE>true<AMEX.ROLE>B<AMEX.UNIVID>iHJPMCPVMUZESUMTMIASKPSHBZOJZQMZ</CCACCTFROM><BANKTRANLIST><DTSTART>20131213050000.000[-7:MST]<DTEND>20131217050000.000[-7:MST]<STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131206000000.000[-7:MST]<DTUSER>20131206000000.000[-7:MST]<TRNAMT>-75<FITID>132124581254980455<REFNUM>140941621247980353<NAME>Cvzndybfhlgsy Kbptkt010-743-2492<MEMO>87278814438304-062-9392</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131208000000.000[-7:MST]<DTUSER>20131207000000.000[-7:MST]<TRNAMT>-29.5<FITID>139251640671720832<REFNUM>411944529384600439<NAME>YJTEJSYC JXJ 38137 
80223112202<MEMO>841814901332133213240</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131208000000.000[-7:MST]<DTUSER>20131208000000.000[-7:MST]<TRNAMT>-96.73<FITID>518223640481029842<REFNUM>349922421383839452<NAME>TEMSRB TQBHHWZO CZYKCGDX.LAR/CD<MEMO>BVR5D49Q7S3 IWOUXSFCCIZ</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131209000000.000[-7:MST]<DTUSER>20131208000000.000[-7:MST]<TRNAMT>-45.49<FITID>410313240598642566<REFNUM>201153532386740368<NAME>JWNNJ VPVHHV - HWKZIGH QXWR <MEMO>35905 UJGZDQD IUTFL</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131209000000.000[-7:MST]<DTUSER>20131208000000.000[-7:MST]<TRNAMT>-01.7<FITID>118954331459601590<REFNUM>112944250496740196<NAME>ZPIKRWGV EBQVUE 4521XJT AYDM <MEMO>227092130 2924489277</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131209000000.000[-7:MST]<DTUSER>20131208000000.000[-7:MST]<TRNAMT>-30.9<FITID>118335238578609388<REFNUM>542324610398801568<NAME>SBMUZYXG XRB 98038 03324302420<MEMO>853700608200014392232</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131209000000.000[-7:MST]<DTUSER>20131206000000.000[-7:MST]<TRNAMT>-39.72<FITID>141044448255701269<REFNUM>230245232285603469<NAME>SITEH NIHOX HTYZBAWP392-139-734<MEMO>84165624260 423-151-8227</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131209000000.000[-7:MST]<DTUSER>20131208000000.000[-7:MST]<TRNAMT>-22.09<FITID>111921351569432591<REFNUM>101153620388713392<NAME>HGJQJEOB PCQ 08418 1KVO VVDJ <MEMO>84408170244 1144012409</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131211000000.000[-7:MST]<DTUSER>20131210000000.000[-7:MST]<TRNAMT>-22.14<FITID>548935642111458816<REFNUM>328141439181292814<NAME>LILFN DVRIFI - LJBPFDT HYSF <MEMO>32793 IBTSNAF UDOMK</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131211000000.000[-7:MST]<DTUSER>20131210000000.000[-7:MST]<TRNAMT>-12.67<FITID>330054241010450007<REFNUM>342912468199362629<NAME>PEHKXNPZ PNW 91458 
21015119128<MEMO>823891919293222482430</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131214000000.000[-7:MST]<DTUSER>20131212000000.000[-7:MST]<TRNAMT>-31<FITID>402252668937162476<REFNUM>448222302958184678<NAME>YHVWV NNPYW HRQZDUOJ201-251-533<MEMO>65093823538 334-432-6338</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131214000000.000[-7:MST]<DTUSER>20131213000000.000[-7:MST]<TRNAMT>-64.25<FITID>542124402167304547<REFNUM>222333308839462735<NAME>MRKW'G #814 HYLDQF OVNN PSCS <MEMO>89318083925 WOQO'G</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131216000000.000[-7:MST]<DTUSER>20131215000000.000[-7:MST]<TRNAMT>-42.01<FITID>210943512087240724<REFNUM>501152389299358014<NAME>ICPBVFY #2321 929883SXU QRKN <MEMO>12054513980 3062749128</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131216000000.000[-7:MST]<DTUSER>20131215000000.000[-7:MST]<TRNAMT>-5<FITID>310215309277250199<REFNUM>448945618978219897<NAME>LON BXBZA 409 QEPNUAMCO WDOD <MEMO>31345520940 4169979285</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131216000000.000[-7:MST]<DTUSER>20131215000000.000[-7:MST]<TRNAMT>-0.69<FITID>430314310166238818<REFNUM>109924680968112897<NAME>CFXUF LGTXVL - VXCJCNI EBUH <MEMO>308496 NSIWFWL RHSFP</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131216000000.000[-7:MST]<DTUSER>20131214000000.000[-7:MST]<TRNAMT>-34.43<FITID>330252321978428019<REFNUM>511914698876210997<NAME>DGCV GXD MJKRYD JLC IYLM <MEMO>67935504527 134-064-6852</STMTTRN><STMTTRN><TRNTYPE>DEBIT<DTPOSTED>20131215000000.000[-7:MST]<DTUSER>20131214000000.000[-7:MST]<TRNAMT>-24<FITID>149941491182603315<REFNUM>429354298891572407<NAME>UPTMFSAD DSD 12354 14003323410<MEMO>963402919381906089541</STMTTRN></BANKTRANLIST><LEDGERBAL><BALAMT>-3609.07<DTASOF>20131217050000.000[-7:MST]</LEDGERBAL><CYCLECUT.INDICATOR>false<PURGE.INDICATOR>false<INTL.INDICATOR>false</CCSTMTRS></CCSTMTTRNRS></CREDITCARDMSGSRSV1></OFX>\n", 'test.import': "#!/usr/bin/env python3\nfrom beancount.ingest import scripts_utils\n\nCONFIG = [\n scripts_utils._TestFileImporter(\n 
'mybank-checking-ofx', 'Assets:Checking',\n 'application/x-ofx', '<FID>3011'),\n scripts_utils._TestFileImporter(\n 'mybank-credit-csv', 'Liabilities:CreditCard',\n 'text/csv', '.*DATE,TRANSACTION ID,DESCRIPTION,QUANTITY,SYMBOL'),\n]\n", 'testimport.py': "#!/usr/bin/env python3\nfrom beancount.ingest import scripts_utils\n\nCONFIG = [\n scripts_utils._TestFileImporter(\n 'mybank-checking-ofx', 'Assets:Checking',\n 'application/x-ofx', '<FID>3011'),\n scripts_utils._TestFileImporter(\n 'mybank-credit-csv', 'Liabilities:CreditCard',\n 'text/csv', '.*DATE,TRANSACTION ID,DESCRIPTION,QUANTITY,SYMBOL'),\n]\nscripts_utils.ingest(CONFIG)\n"}
setUp()

Hook method for setting up the test fixture before exercising it.

beancount.ingest.scripts_utils.create_legacy_arguments_parser(description, run_func)

Create an arguments parser for all the ingestion bean-tools.

Parameters
  • description (str) – The program description string.

  • run_func – A callable function to run the particular command.

Returns

An argparse.Namespace instance with the rest of arguments in ‘rest’.

beancount.ingest.scripts_utils.ingest(importers_list, detect_duplicates_func=None)

Driver function that calls all the ingestion tools.

Put a call to this function at the end of your importer configuration to make your import script; this should be its main function, like this:

    from beancount.ingest.scripts_utils import ingest
    my_importers = [ ... ]
    ingest(my_importers)

This more explicit way of invoking the ingestion is now the preferred way to invoke the various tools, and replaces calling the bean-identify, bean-extract, bean-file tools with a --config argument. When you call the import script itself (as a program) it will parse the arguments, expecting a subcommand (‘identify’, ‘extract’ or ‘file’) and corresponding subcommand-specific arguments.

Here you can override some importer values, such as installing a custom duplicate finding hook, and eventually more. Note that this newer invocation method is optional; if it is not present, a call to ingest() is generated implicitly, and it functions as it used to. Future configurable customization of the ingestion process will be implemented by adding new arguments to this function, which is the motivation for this approach.

Note that invocation by the three bean-* ingestion tools is still supported, and calling ingest() explicitly from your import configuration file will not break these tools either, if you invoke them on it; the values you provide to this function will be used by those tools.

Parameters
  • importers_list – A list of importer instances. This is used as a chain-of-responsibility, called on each file.

  • detect_duplicates_func – An optional function which accepts a list of lists of imported entries and a list of entries already existing in the user’s ledger. See function find_duplicate_entries(), which is the default implementation for this.
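
The subcommand-style invocation described for ingest() can be pictured with plain argparse. This sketch only mirrors the three subcommand names from the text; the positional "downloads" argument and the absence of any other flags are assumptions, and the real tools' options are not reproduced here.

```python
import argparse


def build_parser_sketch():
    """Build a parser with one subcommand per ingestion step."""
    parser = argparse.ArgumentParser(description="Import script sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for command in ("identify", "extract", "file"):
        sub = subparsers.add_parser(command)
        sub.add_argument("downloads", nargs="+",
                         help="Filenames or directories to process")
    return parser


# Equivalent to running: ./myimport.py extract Downloads
args = build_parser_sketch().parse_args(["extract", "Downloads"])
```

A driver function would then dispatch on args.command to the matching identify, extract or file routine.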

beancount.ingest.scripts_utils.run_import_script_and_ingest(parser, argv=None, importers_attr_name='CONFIG')

Run the import script and optionally call ingest().

This path is only called when trampolined by one of the bean-* ingestion tools.

Parameters
  • parser – The parser instance, used only to report errors.

  • importers_attr_name – The name of the special attribute in the module which defines the importers list.

Returns

An execution return code.

beancount.ingest.scripts_utils.trampoline_to_ingest(module)

Parse arguments for bean tool, import config script and ingest.

This function is called by the three bean-* tools to support the older import files, which only required a CONFIG object to be defined in them.

Parameters

module – One of the identify, extract or file module objects.

Returns

An execution return code.

beancount.ingest.similar

Identify similar entries.

This can be used during import in order to identify and flag duplicate entries.

class beancount.ingest.similar.SimilarityComparator(max_date_delta=None)

Similarity comparator of transactions.

This comparator needs to be able to handle Transaction instances which are incomplete on one side, which have slightly different dates, or potentially slightly different numbers.

EPSILON = Decimal('0.05')

beancount.ingest.similar.amounts_map(entry)

Compute a mapping of (account, currency) -> Decimal balances.

Parameters

entry – A Transaction instance.

Returns

A dict of (account, currency) -> Decimal balance.
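
A minimal sketch of such a mapping is shown below. PostingSketch is a hypothetical stand-in for a transaction's postings (account, number, currency); the real function operates on a Transaction instance.

```python
import collections
from decimal import Decimal

# Stand-in for a posting: (account, number, currency).
PostingSketch = collections.namedtuple("PostingSketch", "account number currency")


def amounts_map_sketch(postings):
    """Sum posting numbers per (account, currency) key."""
    amounts = collections.defaultdict(Decimal)
    for posting in postings:
        amounts[(posting.account, posting.currency)] += posting.number
    return dict(amounts)


balances = amounts_map_sketch([
    PostingSketch("Assets:Checking", Decimal("-31.04"), "USD"),
    PostingSketch("Assets:Checking", Decimal("-31.19"), "USD"),
    PostingSketch("Income:Dividends", Decimal("62.23"), "USD"),
])
```

Here the two checking postings collapse into a single key with a balance of -62.23 USD.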

beancount.ingest.similar.find_similar_entries(entries, source_entries, comparator=None, window_days=2)

Find which entries from a list are potential duplicates of a set.

Note: If there are multiple entries from ‘source_entries’ matching an entry in ‘entries’, only the first match is returned. Note that this function could in theory decide to merge some of the imported entries with each other.

Parameters
  • entries – The list of entries to classify as duplicate or not.

  • source_entries – The list of entries against which to match. This is the previous, or existing set of entries to compare against. This may be null or empty.

  • comparator – A functor used to establish the similarity of two entries.

  • window_days – The number of days (inclusive) before or after to scan the entries to classify against.

Returns

A list of pairs of entries (entry, source_entry) where entry is from ‘entries’ and is deemed to be a duplicate of source_entry, from ‘source_entries’.
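
The windowed matching described above can be sketched like this. It is an illustration only: entries are plain dicts with 'date' and 'amount' keys, the comparator is a hypothetical exact-amount check, and the scan is a simple nested loop rather than anything optimized.

```python
import datetime


def find_similar_sketch(entries, source_entries, comparator, window_days=2):
    """Pair each entry with the first source entry that the comparator accepts
    and whose date lies within window_days (inclusive) of the entry's date."""
    window = datetime.timedelta(days=window_days)
    duplicates = []
    for entry in entries:
        for source_entry in source_entries or []:
            in_window = abs(entry["date"] - source_entry["date"]) <= window
            if in_window and comparator(entry, source_entry):
                duplicates.append((entry, source_entry))
                break  # Only the first match is reported.
    return duplicates


def same_amount(entry1, entry2):
    # Hypothetical comparator: exact amount equality.
    return entry1["amount"] == entry2["amount"]


ledger = [{"date": datetime.date(2013, 12, 6), "amount": -75}]
imported = [
    {"date": datetime.date(2013, 12, 8), "amount": -75},   # Within 2 days.
    {"date": datetime.date(2013, 12, 20), "amount": -75},  # Outside the window.
]
pairs = find_similar_sketch(imported, ledger, same_amount)
```

Only the first imported entry is paired with the ledger entry; the second has the same amount but falls outside the date window.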