beancount.ingest

Code to help identify, extract, and file external downloads.

This package contains code to help you build importers and drive the process of identifying which importer to run on an externally downloaded file, extracting transactions from it, and filing it away under a clean, rigidly named hierarchy for preservation.

`beancount.ingest.cache` 

A file wrapper which acts as a cache for on-demand evaluation of conversions.

This object is used in lieu of a file so that the various importers can reuse each other's conversion results. Converting file contents, e.g. PDF to text, can be expensive.
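
The caching idea can be illustrated with a minimal sketch. The class `MemoizedFile` and the converter `fake_pdf_to_text` below are hypothetical stand-ins written for this example; they are not beancount's FileMemo and only assume that a converter is a function taking a filename.

```python
class MemoizedFile:
    """Hypothetical stand-in for a cached file wrapper (not beancount's FileMemo)."""

    def __init__(self, name):
        self.name = name
        self._results = {}  # Maps a converter function to its cached result.

    def convert(self, converter):
        """Run converter(self.name) once and reuse the result afterwards."""
        if converter not in self._results:
            self._results[converter] = converter(self.name)
        return self._results[converter]


calls = []

def fake_pdf_to_text(filename):
    """Pretend to be an expensive conversion; record each invocation."""
    calls.append(filename)
    return "text of " + filename


memo = MemoizedFile("/tmp/statement.pdf")
first = memo.convert(fake_pdf_to_text)   # Runs the conversion.
second = memo.convert(fake_pdf_to_text)  # Served from the cache.
```

Two importers asking for the same conversion of the same file would then pay for it only once.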

`beancount.ingest.cache.contents(filename)` 

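A converter that reads the entire contents of a file. The encoding is detected with chardet from the head of the file, and decoding errors are ignored.

Parameters:	filename – The name of the file to read.

Returns:	The decoded text contents of the file.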

Source code in beancount/ingest/cache.py

def contents(filename):
    """A converter that just reads the entire contents of a file.

    Args:
      num_bytes: The number of bytes to read.
    Returns:
      A converter function.
    """
    # Attempt to detect the input encoding automatically, using chardet and a
    # decent amount of input.
    rawdata = open(filename, 'rb').read(HEAD_DETECT_MAX_BYTES)
    detected = chardet.detect(rawdata)
    encoding = detected['encoding']

    # Ignore encoding errors for reading the contents because input files
    # routinely break this assumption.
    errors = 'ignore'

    with open(filename, encoding=encoding, errors=errors) as file:
        return file.read()

`beancount.ingest.cache.get_file(filename)` 

Create or reuse a globally registered instance of a FileMemo.

Note: FileMemo objects are cached and reused for the duration of the process. This is usually the intended behavior. Always obtain them by calling this function rather than constructing them directly.

Parameters:	filename – A path string, the absolute name of the file whose memo to create.

Returns:	A FileMemo instance.

Source code in beancount/ingest/cache.py

def get_file(filename):
    """Create or reuse a globally registered instance of a FileMemo.

    Note: the FileMemo objects' lifetimes are reused for the duration of the
    process. This is usually the intended behavior. Always create them by
    calling this constructor.

    Args:
      filename: A path string, the absolute name of the file whose memo to create.
    Returns:
      A FileMemo instance.

    """
    assert path.isabs(filename), (
        "Path should be absolute in order to guarantee a single call.")
    return _CACHE[filename]

`beancount.ingest.cache.head(num_bytes=8192)` 

A converter that just reads the first bytes of a file.

Parameters:	num_bytes – The number of bytes to read.

Returns:	A converter function.
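
The "function that returns a converter" pattern can be sketched as follows. This is a simplified illustration: `head_bytes` is a hypothetical name, and unlike the real `head()` it returns raw bytes and skips the chardet-based decoding so that it runs with the standard library alone.

```python
import os
import tempfile


def head_bytes(num_bytes=8192):
    """Return a reader that yields at most num_bytes raw bytes from a file."""
    def reader(filename):
        with open(filename, "rb") as infile:
            return infile.read(num_bytes)
    return reader


# Write a small sample file, then read only its first four bytes.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"Date,Amount\n2024-01-01,10.00\n")
try:
    read4 = head_bytes(4)
    prefix = read4(tmp.name)
finally:
    os.remove(tmp.name)
```

Binding `num_bytes` in a closure means each distinct head size is a distinct converter function.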

Source code in beancount/ingest/cache.py

def head(num_bytes=8192):
    """A converter that just reads the first bytes of a file.

    Args:
      num_bytes: The number of bytes to read.
    Returns:
      A converter function.
    """
    def head_reader(filename):
        with open(filename, 'rb') as file:
            rawdata = file.read(num_bytes)
            detected = chardet.detect(rawdata)
            encoding = detected['encoding']
            return rawdata.decode(encoding)
    return head_reader

`beancount.ingest.cache.mimetype(filename)` 

A converter that computes the MIME type of the file.

Returns:	The MIME type guessed for the file (the result of file_type.guess_file_type()).

Source code in beancount/ingest/cache.py

def mimetype(filename):
    """A converter that computes the MIME type of the file.

    Returns:
      A converter function.
    """
    return file_type.guess_file_type(filename)

`beancount.ingest.extract` 

Extract script.

Read an import script and a list of downloaded filenames or directories of downloaded files, and for each of those files, extract transactions from it.

`beancount.ingest.extract.add_arguments(parser)` 

Add arguments for the extract command.
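
The `--reverse` option uses argparse's `store_const` action to flip a flag that defaults to true. The snippet below repeats only that one argument on a standalone parser to show the resulting values; the parser itself is created just for this example.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-r', '--reverse', '--descending',
                    action='store_const', dest='ascending',
                    default=True, const=False,
                    help='Write out the entries in descending order')

default_args = parser.parse_args([])             # ascending stays True.
reversed_args = parser.parse_args(['--reverse'])  # ascending becomes False.
```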

Source code in beancount/ingest/extract.py

def add_arguments(parser):
    """Add arguments for the extract command."""

    parser.add_argument('-e', '-f', '--existing', '--previous', metavar='BEANCOUNT_FILE',
                        default=None,
                        help=('Beancount file or existing entries for de-duplication '
                              '(optional)'))

    parser.add_argument('-r', '--reverse', '--descending',
                        action='store_const', dest='ascending',
                        default=True, const=False,
                        help='Write out the entries in descending order')

`beancount.ingest.extract.extract(importer_config, files_or_directories, output, entries=None, options_map=None, mindate=None, ascending=True, hooks=None)` 

Given an importer configuration, search the list of files or directories for files that can be imported, run the signature checks on each of them and, where a check succeeds, run the matching importer on the file.

A list of entries for an existing ledger can be provided in order to perform de-duplication and a minimum date can be provided to filter out old entries.

Parameters:

importer_config – A list of (regexps, importer) pairs, the configuration.
files_or_directories – A list of strings, filenames or directories to be processed.
output – A file object, to be written to.
entries – A list of directives loaded from the existing file for the newly extracted entries to be merged in.
options_map – The options parsed from existing file.
mindate – Optional minimum date to output transactions for.
ascending – A boolean, true to print entries in ascending order, false if descending is desired.
hooks – An optional list of hook functions to apply, in order, to the list of extracted (filename, entries) pairs. If not specified, find_duplicate_entries() is used automatically.
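
A hook receives the list of (filename, entries) pairs together with the existing entries, and must return a list of the same shape. A minimal sketch, assuming a hypothetical hook named `drop_empty_results` and plain strings standing in for real entries:

```python
def drop_empty_results(new_entries_list, existing_entries):
    """Hypothetical hook: discard files for which no entries were extracted."""
    return [(filename, entries)
            for filename, entries in new_entries_list
            if entries]


# Strings stand in for real directives in this illustration.
extracted = [("/downloads/a.csv", ["txn-1", "txn-2"]),
             ("/downloads/b.csv", [])]
filtered = drop_empty_results(extracted, existing_entries=None)
```

Note that passing an explicit `hooks` list replaces the default, so find_duplicate_entries would have to be included in that list to keep duplicate detection.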

Source code in beancount/ingest/extract.py

def extract(importer_config,
            files_or_directories,
            output,
            entries=None,
            options_map=None,
            mindate=None,
            ascending=True,
            hooks=None):
    """Given an importer configuration, search for files that can be imported in the
    list of files or directories, run the signature checks on them, and if it
    succeeds, run the importer on the file.

    A list of entries for an existing ledger can be provided in order to perform
    de-duplication and a minimum date can be provided to filter out old entries.

    Args:
      importer_config: A list of (regexps, importer) pairs, the configuration.
      files_or_directories: A list of strings, filenames or directories to be processed.
      output: A file object, to be written to.
      entries: A list of directives loaded from the existing file for the newly
        extracted entries to be merged in.
      options_map: The options parsed from existing file.
      mindate: Optional minimum date to output transactions for.
      ascending: A boolean, true to print entries in ascending order, false if
        descending is desired.
      hooks: An optional list of hook functions to apply to the list of extract
        (filename, entries) pairs, in order. If not specified, find_duplicate_entries()
        is used, automatically.
    """
    allow_none_for_tags_and_links = (
        options_map and options_map["allow_deprecated_none_for_tags_and_links"])

    # Run all the importers and gather their result sets.
    new_entries_list = []
    for filename, importers in identify.find_imports(importer_config,
                                                     files_or_directories):
        for importer in importers:
            # Import and process the file.
            try:
                new_entries = extract_from_file(
                    filename,
                    importer,
                    existing_entries=entries,
                    min_date=mindate,
                    allow_none_for_tags_and_links=allow_none_for_tags_and_links)
                new_entries_list.append((filename, new_entries))
            except Exception as exc:
                logging.exception("Importer %s.extract() raised an unexpected error: %s",
                                  importer.name(), exc)
                continue

    # Find potential duplicate entries in the result sets, either against the
    # list of existing ones, or against each other. A single call to this
    # function is made on purpose, so that the function be able to merge
    # entries.
    if hooks is None:
        hooks = [find_duplicate_entries]
    for hook_fn in hooks:
        new_entries_list = hook_fn(new_entries_list, entries)
    assert isinstance(new_entries_list, list)
    assert all(isinstance(new_entries, tuple) for new_entries in new_entries_list)
    assert all(isinstance(new_entries[0], str) for new_entries in new_entries_list)
    assert all(isinstance(new_entries[1], list) for new_entries in new_entries_list)

    # Print out the results.
    output.write(HEADER)
    for key, new_entries in new_entries_list:
        output.write(identify.SECTION.format(key))
        output.write('\n')
        if not ascending:
            new_entries.reverse()
        print_extracted_entries(new_entries, output)

`beancount.ingest.extract.extract_from_file(filename, importer, existing_entries=None, min_date=None, allow_none_for_tags_and_links=False)` 

Import entries from the file 'filename' using the given importer.

Also cross-check against a list of provided 'existing_entries' entries, de-duplicating and possibly auto-categorizing.

Parameters:

filename – The name of the file to import.
importer – An importer object that matched the file.
existing_entries – A list of existing entries parsed from a ledger, used to detect duplicates and automatically complete or categorize transactions.
min_date – A date before which entries should be ignored. This is useful when an account has a valid check/assert; we could just ignore whatever comes before, if desired.
allow_none_for_tags_and_links – A boolean, whether to allow plugins to generate Transaction objects with None as value for the 'tags' or 'links' attributes.

Returns:	A list of new imported entries.

Exceptions:	`Exception` – If there is an error in the importer's extract() method.
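
The legacy support in this function relies on inspecting the signature of the importer's extract() method. The sketch below isolates that check; `LegacyImporter`, `ModernImporter` and `call_extract` are hypothetical names used only for illustration.

```python
import inspect


class LegacyImporter:
    """Hypothetical importer whose extract() takes only the file."""
    def extract(self, file):
        return ["legacy:" + file]


class ModernImporter:
    """Hypothetical importer whose extract() also accepts existing entries."""
    def extract(self, file, existing_entries=None):
        return ["modern:" + file, len(existing_entries or [])]


def call_extract(importer, file, existing_entries):
    """Pass existing_entries only if the importer's extract() accepts it."""
    kwargs = {}
    if 'existing_entries' in inspect.signature(importer.extract).parameters:
        kwargs['existing_entries'] = existing_entries
    return importer.extract(file, **kwargs)


legacy_result = call_extract(LegacyImporter(), "f", [1, 2])
modern_result = call_extract(ModernImporter(), "f", [1, 2])
```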

Source code in beancount/ingest/extract.py

def extract_from_file(filename, importer,
                      existing_entries=None,
                      min_date=None,
                      allow_none_for_tags_and_links=False):
    """Import entries from file 'filename' with the given matches,

    Also cross-check against a list of provided 'existing_entries' entries,
    de-duplicating and possibly auto-categorizing.

    Args:
      filename: The name of the file to import.
      importer: An importer object that matched the file.
      existing_entries: A list of existing entries parsed from a ledger, used to
        detect duplicates and automatically complete or categorize transactions.
      min_date: A date before which entries should be ignored. This is useful
        when an account has a valid check/assert; we could just ignore whatever
        comes before, if desired.
      allow_none_for_tags_and_links: A boolean, whether to allow plugins to
        generate Transaction objects with None as value for the 'tags' or 'links'
        attributes.
    Returns:
      A list of new imported entries.
    Raises:
      Exception: If there is an error in the importer's extract() method.
    """
    # Extract the entries.
    file = cache.get_file(filename)

    # Note: Let the exception through on purpose. This makes developing
    # importers much easier by rendering the details of the exceptions.
    #
    # Note: For legacy support, support calling without the existing entries.
    kwargs = {}
    if 'existing_entries' in inspect.signature(importer.extract).parameters:
        kwargs['existing_entries'] = existing_entries
    new_entries = importer.extract(file, **kwargs)
    if not new_entries:
        return []

    # Make sure the newly imported entries are sorted; don't trust the importer.
    new_entries.sort(key=data.entry_sortkey)

    # Ensure that the entries are typed correctly.
    for entry in new_entries:
        data.sanity_check_types(entry, allow_none_for_tags_and_links)

    # Filter out entries with dates before 'min_date'.
    if min_date:
        new_entries = list(itertools.dropwhile(lambda x: x.date < min_date,
                                               new_entries))

    return new_entries

`beancount.ingest.extract.find_duplicate_entries(new_entries_list, existing_entries)` 

Flag potentially duplicate entries.

Parameters:

new_entries_list – A list of pairs of (key, lists of imported entries), one for each importer. The key identifies the filename and/or importer that yielded those new entries.
existing_entries – A list of previously existing entries from the target ledger.

Returns:	A list of (key, entries) pairs like new_entries_list, in which entries flagged as potential duplicates carry an extra metadata marker.
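
The marking step can be sketched without beancount. The `Entry` namedtuple, the `DUPLICATE_KEY` value and `mark_duplicates` below are hypothetical stand-ins; the real function obtains its duplicates from similar.find_similar_entries() and uses the module's DUPLICATE_META constant.

```python
import collections

# Hypothetical stand-ins for a directive type and the metadata marker.
Entry = collections.namedtuple("Entry", "meta date narration")
DUPLICATE_KEY = "__duplicate__"


def mark_duplicates(new_entries, duplicates):
    """Return copies of new_entries with a marker set on the duplicates."""
    # Entries hold a dict and are not hashable, so match them by identity.
    duplicate_ids = {id(entry) for entry in duplicates}
    marked = []
    for entry in new_entries:
        if id(entry) in duplicate_ids:
            meta = dict(entry.meta)      # Copy so the original is untouched.
            meta[DUPLICATE_KEY] = True
            entry = entry._replace(meta=meta)
        marked.append(entry)
    return marked


coffee = Entry({}, "2024-01-02", "Coffee")
rent = Entry({}, "2024-01-03", "Rent")
result = mark_duplicates([coffee, rent], duplicates=[rent])
```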

Source code in beancount/ingest/extract.py

def find_duplicate_entries(new_entries_list, existing_entries):
    """Flag potentially duplicate entries.

    Args:
      new_entries_list: A list of pairs of (key, lists of imported entries), one
        for each importer. The key identifies the filename and/or importer that
        yielded those new entries.
      existing_entries: A list of previously existing entries from the target
        ledger.
    Returns:
      A list of lists of modified new entries (like new_entries_list),
      potentially with modified metadata to indicate those which are duplicated.
    """
    mod_entries_list = []
    for key, new_entries in new_entries_list:
        # Find similar entries against the existing ledger only.
        duplicate_pairs = similar.find_similar_entries(new_entries, existing_entries)

        # Add a metadata marker to the extracted entries for duplicates.
        duplicate_set = set(id(entry) for entry, _ in duplicate_pairs)
        mod_entries = []
        for entry in new_entries:
            if id(entry) in duplicate_set:
                marked_meta = entry.meta.copy()
                marked_meta[DUPLICATE_META] = True
                entry = entry._replace(meta=marked_meta)
            mod_entries.append(entry)
        mod_entries_list.append((key, mod_entries))
    return mod_entries_list

`beancount.ingest.extract.print_extracted_entries(entries, file)` 

Print a list of entries.

Parameters:

entries – A list of extracted entries.
file – A file object to write to.
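
Entries flagged as duplicates are printed commented out, which the function does by prefixing every line with `; ` via textwrap.indent(). A small illustration on a hand-written entry string (the entry text is made up for this example):

```python
import textwrap

entry_text = ('2024-01-02 * "Coffee"\n'
              '  Expenses:Food   3.50 USD\n'
              '  Assets:Cash\n')

# Prefix each non-blank line so the whole entry becomes a comment.
commented = textwrap.indent(entry_text, '; ')
```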

Source code in beancount/ingest/extract.py

def print_extracted_entries(entries, file):
    """Print a list of entries.

    Args:
      entries: A list of extracted entries.
      file: A file object to write to.
    """
    # Print the filename and which modules matched.
    # pylint: disable=invalid-name
    pr = lambda *args: print(*args, file=file)
    pr('')

    # Print out the entries.
    for entry in entries:
        # Check if this entry is a dup, and if so, comment it out.
        if DUPLICATE_META in entry.meta:
            meta = entry.meta.copy()
            meta.pop(DUPLICATE_META)
            entry = entry._replace(meta=meta)
            entry_string = textwrap.indent(printer.format_entry(entry), '; ')
        else:
            entry_string = printer.format_entry(entry)
        pr(entry_string)

    pr('')

`beancount.ingest.extract.run(args, _, importers_list, files_or_directories, hooks=None)` 

Run the subcommand.

Source code in beancount/ingest/extract.py

def run(args, _, importers_list, files_or_directories, hooks=None):
    """Run the subcommand."""

    # Load the ledger, if one is specified.
    if args.existing:
        entries, _, options_map = loader.load_file(args.existing)
    else:
        entries, options_map = None, None

    extract(importers_list, files_or_directories, sys.stdout,
            entries=entries,
            options_map=options_map,
            mindate=None,
            ascending=args.ascending,
            hooks=hooks)
    return 0

`beancount.ingest.file` 

Filing script.

Read an import script and a list of downloaded filenames or directories of downloaded files, and for each of those files, move the file under the filing directory corresponding to its account.

`beancount.ingest.file.add_arguments(parser)` 

Add arguments for the file command.

Source code in beancount/ingest/file.py

def add_arguments(parser):
    """Add arguments for the extract command."""

    parser.add_argument('-o', '--output', '--output-dir', '--destination',
                        dest='output_dir', action='store',
                        help="The root of the documents tree to move the files to.")

    parser.add_argument('-n', '--dry-run', action='store_true',
                        help=("Just print where the files would be moved; "
                              "don't actually move them."))

    parser.add_argument('--no-overwrite', dest='overwrite',
                        action='store_false', default=True,
                        help="Don't overwrite destination files with the same name.")

`beancount.ingest.file.file(importer_config, files_or_directories, destination, dry_run=False, mkdirs=False, overwrite=False, idify=False, logfile=None)` 

File importable files under a destination directory.

Given an importer configuration object, search for files that can be imported under the given list of files or directories and move them under the given destination directory with the date computed by the module prepended to the filename. If the date cannot be extracted, use a reasonable default for the date (e.g. the last modified time of the file itself).

If 'mkdirs' is True, create the destination directories before moving the files.

Parameters:

importer_config – A list of importer instances that define the config.
files_or_directories – A list of files or directories to walk recursively and hunt for files to import.
destination – A string, the root destination directory where the files are to be filed. The files are organized there under a hierarchy mirroring that of the chart of accounts.
dry_run – A flag, if true, don't actually move the files.
mkdirs – A flag, if true, make all the intervening directories; otherwise, fail to move files to non-existing dirs.
overwrite – A flag, if true, overwrite an existing destination file.
idify – A flag, if true, remove whitespace and funky characters in the destination filename.
logfile – A file object to write log entries to, or None, in which case no log is written out.
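
Before anything is moved, the function checks that no two source files map to the same destination. That check can be sketched in isolation; `find_collisions` and the sample paths are hypothetical.

```python
import collections


def find_collisions(jobs):
    """Return {destination: [sources]} for destinations claimed more than once."""
    destmap = collections.defaultdict(list)
    for src, dest in jobs:
        destmap[dest].append(src)
    return {dest: sources
            for dest, sources in destmap.items()
            if len(sources) > 1}


jobs = [("/dl/jan.csv", "/docs/Assets/Bank/2024-01-31.stmt.csv"),
        ("/dl/jan (1).csv", "/docs/Assets/Bank/2024-01-31.stmt.csv"),
        ("/dl/feb.csv", "/docs/Assets/Bank/2024-02-29.stmt.csv")]
collisions = find_collisions(jobs)
```

Validating every job first and moving files only when there are no errors avoids leaving a download directory half filed.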

Source code in beancount/ingest/file.py

def file(importer_config,
         files_or_directories,
         destination,
         dry_run=False,
         mkdirs=False,
         overwrite=False,
         idify=False,
         logfile=None):
    """File importable files under a destination directory.

    Given an importer configuration object, search for files that can be
    imported under the given list of files or directories and moved them under
    the given destination directory with the date computed by the module
    prepended to the filename. If the date cannot be extracted, use a reasonable
    default for the date (e.g. the last modified time of the file itself).

    If 'mkdirs' is True, create the destination directories before moving the
    files.

    Args:
      importer_config: A list of importer instances that define the config.
      files_or_directories: a list of files of directories to walk recursively and
        hunt for files to import.
      destination: A string, the root destination directory where the files are
        to be filed. The files are organized there under a hierarchy mirroring
        that of the chart of accounts.
      dry_run: A flag, if true, don't actually move the files.
      mkdirs: A flag, if true, make all the intervening directories; otherwise,
        fail to move files to non-existing dirs.
      overwrite: A flag, if true, overwrite an existing destination file.
      idify: A flag, if true, remove whitespace and funky characters in the destination
        filename.
      logfile: A file object to write log entries to, or None, in which case no log is
        written out.
    """
    jobs = []
    has_errors = False
    for filename, importers in identify.find_imports(importer_config,
                                                     files_or_directories,
                                                     logfile):
        # If we're debugging, print out the match text.
        # This option is useful when we're building our importer configuration,
        # to figure out which patterns to create as unique signatures.
        if not importers:
            continue

        # Process a single file.
        new_fullname = file_one_file(filename, importers, destination, idify, logfile)
        if new_fullname is None:
            continue

        # Check if the destination directory exists.
        new_dirname = path.dirname(new_fullname)
        if not path.exists(new_dirname) and not mkdirs:
            logging.error("Destination directory '{}' does not exist.".format(new_dirname))
            has_errors = True
            continue

        # Check if the destination file already exists; we don't want to clobber
        # it by accident.
        if not overwrite and path.exists(new_fullname):
            logging.error("Destination file '{}' already exists.".format(new_fullname))
            has_errors = True
            continue

        jobs.append((filename, new_fullname))

    # Check if any two imported files would be colliding in their destination
    # name, before we move anything.
    destmap = collections.defaultdict(list)
    for src, dest in jobs:
        destmap[dest].append(src)
    for dest, sources in destmap.items():
        if len(sources) != 1:
            logging.error("Collision in destination filenames '{}': from {}.".format(
                dest, ", ".join(["'{}'".format(source) for source in sources])))
            has_errors = True

    # If there are any errors, just don't do anything at all. This is a nicer
    # behaviour than moving just *some* files.
    if dry_run or has_errors:
        return

    # Actually carry out the moving job.
    for old_filename, new_filename in jobs:
        move_xdev_file(old_filename, new_filename, mkdirs)

    return jobs

`beancount.ingest.file.file_one_file(filename, importers, destination, idify=False, logfile=None)` 

Move a single filename using its matched importers.

Parameters:

filename – A string, the name of the downloaded file to be processed.
importers – A list of importer instances that handle this file.
destination – A string, the root destination directory where the files are to be filed. The files are organized there under a hierarchy mirroring that of the chart of accounts.
idify – A flag, if true, remove whitespace and funky characters in the destination filename.
logfile – A file object to write log entries to, or None, in which case no log is written out.

Returns:	The full new destination filename on success, and None if there was an error.
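
The destination name is built from the date, the cleaned basename and the account. A minimal sketch of that composition, assuming a hypothetical helper `build_destination` and assuming `:` as the account separator:

```python
import datetime
import os
from os import path


def build_destination(destination, account_name, date, basename, account_sep=":"):
    """Compose <destination>/<account as directories>/<YYYY-MM-DD>.<basename>."""
    new_filename = '{0:%Y-%m-%d}.{1}'.format(date, basename)
    return path.normpath(path.join(destination,
                                   account_name.replace(account_sep, os.sep),
                                   new_filename))


target = build_destination("/docs", "Assets:US:Bank:Checking",
                           datetime.date(2024, 1, 31), "statement.pdf")
```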

Source code in beancount/ingest/file.py

def file_one_file(filename, importers, destination, idify=False, logfile=None):
    """Move a single filename using its matched importers.

    Args:
      filename: A string, the name of the downloaded file to be processed.
      importers: A list of importer instances that handle this file.
      destination: A string, the root destination directory where the files are
        to be filed. The files are organized there under a hierarchy mirroring
        that of the chart of accounts.
      idify: A flag, if true, remove whitespace and funky characters in the destination
        filename.
      logfile: A file object to write log entries to, or None, in which case no log is
        written out.
    Returns:
      The full new destination filename on success, and None if there was an error.
    """
    # Create an object to cache all the conversions between the importers
    # and phases and what-not.
    file = cache.get_file(filename)

    # Get the account corresponding to the file.
    file_accounts = []
    for index, importer in enumerate(importers):
        try:
            account_ = importer.file_account(file)
        except Exception as exc:
            account_ = None
            logging.exception("Importer %s.file_account() raised an unexpected error: %s",
                              importer.name(), exc)
        if account_ is not None:
            file_accounts.append(account_)

    file_accounts_set = set(file_accounts)
    if not file_accounts_set:
        logging.error("No account provided by importers: {}".format(
            ", ".join(imp.name() for imp in importers)))
        return None

    if len(file_accounts_set) > 1:
        logging.warning("Ambiguous accounts from many importers: {}".format(
            ', '.join(file_accounts_set)))
        # Note: Don't exit; select the first matching importer's account.

    file_account = file_accounts.pop(0)

    # Given multiple importers, select the first one that was yielded to
    # obtain the date and process the filename.
    importer = importers[0]

    # Compute the date from the last modified time.
    mtime = path.getmtime(filename)
    mtime_date = datetime.datetime.fromtimestamp(mtime).date()

    # Try to get the file's date by calling a module support function. The
    # module may be able to extract the date from the filename, from the
    # contents of the file itself (e.g. scraping some text from the PDF
    # contents, or grabbing the last line of a CSV file).
    try:
        date = importer.file_date(file)
    except Exception as exc:
        logging.exception("Importer %s.file_date() raised an unexpected error: %s",
                          importer.name(), exc)
        date = None
    if date is None:
        # Fallback on the last modified time of the file.
        date = mtime_date
        date_source = 'mtime'
    else:
        date_source = 'contents'

    # Apply filename renaming, if implemented.
    # Otherwise clean up the filename.
    try:
        clean_filename = importer.file_name(file)

        # Warn the importer implementor if a name is returned and it's an
        # absolute filename.
        if clean_filename and (path.isabs(clean_filename) or os.sep in clean_filename):
            logging.error(("The importer '%s' file_name() method should return a relative "
                           "filename; the filename '%s' is absolute or contains path "
                           "separators"),
                          importer.name(), clean_filename)
    except Exception as exc:
        logging.exception("Importer %s.file_name() raised an unexpected error: %s",
                          importer.name(), exc)
        clean_filename = None
    if clean_filename is None:
        # If no filename has been provided, use the basename.
        clean_filename = path.basename(file.name)
    elif re.match(r'\d\d\d\d-\d\d-\d\d', clean_filename):
        logging.error("The importer '%s' file_name() method should not date the "
                      "returned filename. Implement file_date() instead.")

    # We need a simple filename; remove the directory part if there is one.
    clean_basename = path.basename(clean_filename)

    # Remove whitespace if requested.
    if idify:
        clean_basename = misc_utils.idify(clean_basename)

    # Prepend the date prefix.
    new_filename = '{0:%Y-%m-%d}.{1}'.format(date, clean_basename)

    # Prepend destination directory.
    new_fullname = path.normpath(path.join(destination,
                                           file_account.replace(account.sep, os.sep),
                                           new_filename))

    # Print the filename and which modules matched.
    if logfile is not None:
        logfile.write('Importer:    {}\n'.format(importer.name() if importer else '-'))
        logfile.write('Account:     {}\n'.format(file_account))
        logfile.write('Date:        {} (from {})\n'.format(date, date_source))
        logfile.write('Destination: {}\n'.format(new_fullname))
        logfile.write('\n')

    return new_fullname

`beancount.ingest.file.move_xdev_file(src_filename, dst_filename, mkdirs=False)` 

Move a file, potentially across devices.

Parameters:

src_filename – A string, the name of the file to copy.
dst_filename – A string, where to copy the file.
mkdirs – A flag, true if we should create a non-existing destination directory.
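
A copy followed by a removal works even when source and destination live on different devices. The sketch below exercises that approach in a temporary directory; `copy_then_remove` is a hypothetical helper that always creates missing directories, which the real function does only when `mkdirs` is true.

```python
import os
import shutil
import tempfile
from os import path


def copy_then_remove(src, dst):
    """Copy src to dst, creating directories as needed, then delete src."""
    os.makedirs(path.dirname(dst), exist_ok=True)
    shutil.copyfile(src, dst)
    os.remove(src)


with tempfile.TemporaryDirectory() as root:
    src = path.join(root, "downloads", "statement.csv")
    dst = path.join(root, "documents", "Assets", "Bank", "statement.csv")
    os.makedirs(path.dirname(src))
    with open(src, "w") as outfile:
        outfile.write("Date,Amount\n")

    copy_then_remove(src, dst)

    moved = path.exists(dst) and not path.exists(src)
    with open(dst) as infile:
        moved_text = infile.read()
```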

Source code in beancount/ingest/file.py

def move_xdev_file(src_filename, dst_filename, mkdirs=False):
    """Move a file, potentially across devices.

    Args:
      src_filename: A string, the name of the file to copy.
      dst_filename: A string, where to copy the file.
      mkdirs: A flag, true if we should create a non-existing destination directory.
    """
    # Create missing directory if required.
    dst_dirname = path.dirname(dst_filename)
    if mkdirs:
        if not path.exists(dst_dirname):
            os.makedirs(dst_dirname)
    else:
        if not path.exists(dst_dirname):
            raise OSError("Destination directory '{}' does not exist.".format(dst_dirname))

    # Copy the file to its new name.
    shutil.copyfile(src_filename, dst_filename)

    # Remove the old file. Note that we copy and remove to support
    # cross-device moves, because it's sensible that the destination might
    # be on an encrypted device.
    os.remove(src_filename)

`beancount.ingest.file.run(args, parser, importers_list, files_or_directories, hooks=None)` 

Run the subcommand.

Source code in beancount/ingest/file.py

def run(args, parser, importers_list, files_or_directories, hooks=None):
    """Run the subcommand."""

    # If the output directory is not specified, move the files at the root where
    # the import configuration file is located. (Providing this default seems
    # better than using a required option.)
    if args.output_dir is None:
        if hasattr(args, 'config'):
            args.output_dir = path.dirname(path.abspath(args.config))
        else:
            import __main__ # pylint: disable=import-outside-toplevel
            args.output_dir = path.dirname(path.abspath(__main__.__file__))

    # Make sure the output directory exists.
    if not path.exists(args.output_dir):
        parser.error('Output directory "{}" does not exist.'.format(args.output_dir))

    file(importers_list, files_or_directories, args.output_dir,
         dry_run=args.dry_run,
         mkdirs=True,
         overwrite=args.overwrite,
         idify=True,
         logfile=sys.stdout)
    return 0

`beancount.ingest.identify` 

Identify script.

Read an import script and a list of downloaded filenames or directories of downloaded files, and for each of those files, identify which importer it should be associated with.

`beancount.ingest.identify.add_arguments(parser)` 

Add arguments for the identify command.

Source code in beancount/ingest/identify.py

def add_arguments(parser):
    """Add arguments for the identify command."""

`beancount.ingest.identify.find_imports(importer_config, files_or_directories, logfile=None)` 

Given an importer configuration, search for files that can be imported in the list of files or directories, run the signature checks on them and yield (filename, importers) pairs, where 'importers' is the list of importers that matched the file.

Parameters:

importer_config – A list of importer instances that define the config.
files_or_directories – A list of files or directories to walk recursively and hunt for files to import.
logfile – A file object to write log entries to, or None, in which case no log is written out.

Yields:	Pairs of the filename found and the list of importers matching this file.
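
The matching loop can be sketched with stand-in importers. `SuffixImporter` and `match_importers` are hypothetical; real importers receive a cached file object rather than a plain filename, and the real generator also skips files that are too large.

```python
class SuffixImporter:
    """Hypothetical importer that matches on a filename suffix only."""

    def __init__(self, label, suffix):
        self.label = label
        self.suffix = suffix

    def name(self):
        return self.label

    def identify(self, filename):
        return filename.endswith(self.suffix)


def match_importers(importers, filenames):
    """Yield (filename, matching importers) for every filename given."""
    for filename in filenames:
        matching = [imp for imp in importers if imp.identify(filename)]
        yield filename, matching


importers = [SuffixImporter("csv", ".csv"), SuffixImporter("pdf", ".pdf")]
matches = [(filename, [imp.name() for imp in matched])
           for filename, matched in match_importers(importers, ["a.csv", "b.txt"])]
```

A file that no importer recognizes is still yielded, with an empty list, so callers decide how to treat unmatched files.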

Source code in beancount/ingest/identify.py

def find_imports(importer_config, files_or_directories, logfile=None):
    """Given an importer configuration, search for files that can be imported in the
    list of files or directories, run the signature checks on them and yield pairs
    of (filename, importers), where 'importers' is a list of importers that matched
    the file.

    Args:
      importer_config: a list of importer instances that define the config.
      files_or_directories: a list of files or directories to walk recursively and
                            hunt for files to import.
      logfile: A file object to write log entries to, or None, in which case no log is
        written out.
    Yields:
      Pairs of the filename found and the list of importers matching this file.
    """
    # Iterate over all files found; accumulate the entries by identification.
    for filename in file_utils.find_files(files_or_directories):
        if logfile is not None:
            logfile.write(SECTION.format(filename))
            logfile.write('\n')

        # Skip files that are simply too large.
        size = path.getsize(filename)
        if size > FILE_TOO_LARGE_THRESHOLD:
            logging.warning("File too large: '{}' ({} bytes); skipping.".format(
                filename, size))
            continue

        # For each of the sources the user has declared, identify which
        # match the text.
        file = cache.get_file(filename)
        matching_importers = []
        for importer in importer_config:
            try:
                matched = importer.identify(file)
                if matched:
                    matching_importers.append(importer)
            except Exception as exc:
                logging.exception("Importer %s.identify() raised an unexpected error: %s",
                                  importer.name(), exc)

        yield (filename, matching_importers)
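
The matching step of this loop can be sketched in isolation. The snippet below is a simplified illustration, not the module's code: `match_importers`, `ByExtension` and `Broken` are hypothetical names, and a plain filename string stands in for the `cache.FileMemo` object that real importers receive.

```python
import logging


def match_importers(importers, file):
    """Return the importers whose identify() accepts the file.

    An importer that raises is logged and skipped, so one faulty importer does
    not abort identification for the others.
    """
    matching = []
    for importer in importers:
        try:
            if importer.identify(file):
                matching.append(importer)
        except Exception as exc:  # Deliberately broad, as in the loop above.
            logging.exception("identify() raised an unexpected error: %s", exc)
    return matching


class ByExtension:
    """Hypothetical importer that matches on the filename extension only."""

    def __init__(self, extension):
        self.extension = extension

    def identify(self, file):
        return file.endswith(self.extension)


class Broken:
    """Hypothetical importer whose identify() always fails."""

    def identify(self, file):
        raise RuntimeError('boom')


csv_importer = ByExtension('.csv')
matches = match_importers([csv_importer, Broken(), ByExtension('.pdf')],
                          'statement.csv')
```

Here `matches` contains only `csv_importer`: the failing importer is logged and ignored, and the PDF importer does not match.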

`beancount.ingest.identify.identify(importers_list, files_or_directories)` 

Run the identification loop.

Parameters:

importers_list – A list of importer instances.
files_or_directories – A list of strings, files or directories.

Source code in beancount/ingest/identify.py

def identify(importers_list, files_or_directories):
    """Run the identification loop.

    Args:
      importers_list: A list of importer instances.
      files_or_directories: A list of strings, files or directories.
    """
    logfile = sys.stdout
    for filename, importers in find_imports(importers_list, files_or_directories,
                                            logfile=logfile):
        file = cache.get_file(filename)
        for importer in importers:
            logfile.write('Importer:    {}\n'.format(importer.name() if importer else '-'))
            logfile.write('Account:     {}\n'.format(importer.file_account(file)))
            logfile.write('\n')

`beancount.ingest.identify.run(_, __, importers_list, files_or_directories, hooks=None)` 

Run the subcommand.

Source code in beancount/ingest/identify.py

def run(_, __, importers_list, files_or_directories, hooks=None):
    """Run the subcommand."""
    return identify(importers_list, files_or_directories)

`beancount.ingest.importer` 

Importer protocol.

All importers must comply with this interface and implement at least some of its methods. A configuration consists of a simple list of such importer instances. The import processes run through the importers, calling some of their methods in order to identify, extract and file the downloaded files.

Each of the methods accepts a cache.FileMemo object which has a 'name' attribute with the filename to process, but which also provides a place to cache conversions. Use its convert() method whenever possible to avoid carrying out the same conversion multiple times. See beancount.ingest.cache for more details.

Synopsis:

name(): Return a unique identifier for the importer instance.
identify(): Return true if the importer is able to process the file.
extract(): Extract directives from a file's contents and return a list of entries.
file_account(): Return an account name associated with the given file for this importer.
file_date(): Return a date associated with the downloaded file (e.g., the statement date).
file_name(): Return a cleaned up filename for storage (optional).

Just to be clear: Although this importer will not raise NotImplementedError exceptions (it returns default values for each method), you NEED to derive from it in order to do anything meaningful. Simply instantiating this importer will neither match any file nor provide any useful information. It just defines the protocol for all importers.
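
As a rough illustration of the protocol, the sketch below derives a toy importer. To stay self-contained it defines a stand-in base class (`ProtocolStandIn`) and a stand-in file object (`FakeFile`); both are hypothetical and only mimic the method names listed in the synopsis. In real use you would derive from `beancount.ingest.importer.ImporterProtocol` and receive a `cache.FileMemo`. The account name and the filename pattern are made up for the example.

```python
import datetime
from os import path


class ProtocolStandIn:
    """Hypothetical stand-in: same method names, every method returns None."""

    def identify(self, file):
        return None

    def extract(self, file, existing_entries=None):
        return None

    def file_account(self, file):
        return None

    def file_date(self, file):
        return None

    def file_name(self, file):
        return None


class FakeFile:
    """Hypothetical stand-in for cache.FileMemo: only the 'name' attribute."""

    def __init__(self, name):
        self.name = name


class AcmeBankImporter(ProtocolStandIn):
    """Toy importer that matches files by name only (an assumption for brevity)."""

    def identify(self, file):
        return path.basename(file.name).startswith('acme-')

    def file_account(self, file):
        return 'Assets:US:Acme:Checking'

    def file_date(self, file):
        # Expect names such as 'acme-2020-01-31.csv'; return None otherwise.
        stem = path.splitext(path.basename(file.name))[0]
        try:
            return datetime.datetime.strptime(stem[5:], '%Y-%m-%d').date()
        except ValueError:
            return None


importer = AcmeBankImporter()
statement = FakeFile('/tmp/downloads/acme-2020-01-31.csv')
```

Methods that are not overridden (here `extract()` and `file_name()`) keep the do-nothing defaults, which is why a bare instance of the base class is of no practical use.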

`beancount.ingest.importer.ImporterProtocol` 

Interface that all source importers need to comply with.

`beancount.ingest.importer.ImporterProtocol.__str__(self)` `special` 

Return a unique id/name for this importer.

Returns:	A string which uniquely identifies this importer.

Source code in beancount/ingest/importer.py

def name(self):
    """Return a unique id/name for this importer.

    Returns:
      A string which uniquely identifies this importer.
    """
    cls = self.__class__
    return '{}.{}'.format(cls.__module__, cls.__name__)

`beancount.ingest.importer.ImporterProtocol.extract(self, file, existing_entries=None)` 

Extract transactions from a file.

If the importer would like to flag a returned transaction as a known duplicate, it may opt to set the special flag "__duplicate__" to True, and the transaction should be treated as a duplicate by the extraction code. This is a way to let the importer use particular information about previously imported transactions in order to flag them as duplicates, for example, if an importer has a way to get a persistent unique id for each of the imported transactions. (See this discussion for context: https://groups.google.com/d/msg/beancount/0iV-ipBJb8g/-uk4wsH2AgAJ)

Parameters:

file – A cache.FileMemo instance.
existing_entries – An optional list of existing directives loaded from the ledger which is intended to contain the extracted entries. This is only provided if the user provides them via a flag in the extractor program.

Returns:	A list of new, imported directives (usually mostly Transactions) extracted from the file.
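
The duplicate flag described above can be sketched as follows. This is an illustration under assumptions: `Txn` is a hypothetical stand-in for a directive (only its `meta` dict matters here), the flag is assumed to live in that metadata dict, and `bank_id` is a made-up metadata key holding the persistent unique id.

```python
import collections

# Hypothetical stand-in for a directive: only 'meta' and 'narration' are modeled.
Txn = collections.namedtuple('Txn', 'meta narration')

DUPLICATE_META = '__duplicate__'


def mark_known_duplicates(new_entries, existing_entries, key='bank_id'):
    """Set meta['__duplicate__'] = True on new entries whose id was seen before."""
    seen = {entry.meta[key]
            for entry in existing_entries or []
            if key in entry.meta}
    for entry in new_entries:
        if key in entry.meta and entry.meta[key] in seen:
            entry.meta[DUPLICATE_META] = True
    return new_entries


existing = [Txn({'bank_id': 'A1'}, 'Coffee')]
new = mark_known_duplicates(
    [Txn({'bank_id': 'A1'}, 'Coffee'), Txn({'bank_id': 'B2'}, 'Lunch')],
    existing)
```

An `extract()` implementation could apply such a helper to its result before returning it, using the `existing_entries` argument when it is provided.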

Source code in beancount/ingest/importer.py

def extract(self, file, existing_entries=None):
    """Extract transactions from a file.

    If the importer would like to flag a returned transaction as a known
    duplicate, it may opt to set the special flag "__duplicate__" to True,
    and the transaction should be treated as a duplicate by the extraction
    code. This is a way to let the importer use particular information about
    previously imported transactions in order to flag them as duplicates.
    For example, if an importer has a way to get a persistent unique id for
    each of the imported transactions. (See this discussion for context:
    https://groups.google.com/d/msg/beancount/0iV-ipBJb8g/-uk4wsH2AgAJ)

    Args:
      file: A cache.FileMemo instance.
      existing_entries: An optional list of existing directives loaded from
        the ledger which is intended to contain the extracted entries. This
        is only provided if the user provides them via a flag in the
        extractor program.
    Returns:
      A list of new, imported directives (usually mostly Transactions)
      extracted from the file.
    """

`beancount.ingest.importer.ImporterProtocol.file_account(self, file)` 

Return an account associated with the given file.

Note: If you don't implement this method you won't be able to move the files into their preservation hierarchy; the bean-file command won't work.

Also, normally the returned account is not a function of the input file--just of the importer--but it is provided anyhow.

Parameters:	file – A cache.FileMemo instance.

Returns:	The name of the account that corresponds to this importer.

Source code in beancount/ingest/importer.py

def file_account(self, file):
    """Return an account associated with the given file.

    Note: If you don't implement this method you won't be able to move the
    files into its preservation hierarchy; the bean-file command won't
    work.

    Also, normally the returned account is not a function of the input
    file--just of the importer--but it is provided anyhow.

    Args:
      file: A cache.FileMemo instance.
    Returns:
      The name of the account that corresponds to this importer.
    """

`beancount.ingest.importer.ImporterProtocol.file_date(self, file)` 

Attempt to obtain a date that corresponds to the given file.

Parameters:	file – A cache.FileMemo instance.

Returns:	A date object, if successful, or None if a date could not be extracted. (If no date is returned, the file creation time is used. This is the default.)

Source code in beancount/ingest/importer.py

def file_date(self, file):
    """Attempt to obtain a date that corresponds to the given file.

    Args:
      file: A cache.FileMemo instance.
    Returns:
      A date object, if successful, or None if a date could not be extracted.
      (If no date is returned, the file creation time is used. This is the
      default.)
    """

`beancount.ingest.importer.ImporterProtocol.file_name(self, file)` 

A filter that optionally renames a file before filing.

This is used to make tidy filenames for filed/stored document files. If you don't implement this and return None, the same filename is used. Note that if you return a filename, a simple, RELATIVE filename must be returned, not an absolute filename.

Parameters:	file – A cache.FileMemo instance.

Returns:	The tidied up, new filename to store it as.

Source code in beancount/ingest/importer.py

def file_name(self, file):
    """A filter that optionally renames a file before filing.

    This is used to make tidy filenames for filed/stored document files. If
    you don't implement this and return None, the same filename is used.
    Note that if you return a filename, a simple, RELATIVE filename must be
    returned, not an absolute filename.

    Args:
      file: A cache.FileMemo instance.
    Returns:
      The tidied up, new filename to store it as.
    """

`beancount.ingest.importer.ImporterProtocol.identify(self, file)` 

Return true if this importer matches the given file.

Parameters:	file – A cache.FileMemo instance.

Returns:	A boolean, true if this importer can handle this file.

Source code in beancount/ingest/importer.py

def identify(self, file):
    """Return true if this importer matches the given file.

    Args:
      file: A cache.FileMemo instance.
    Returns:
      A boolean, true if this importer can handle this file.
    """

`beancount.ingest.importer.ImporterProtocol.name(self)` 

Return a unique id/name for this importer.

Returns:	A string which uniquely identifies this importer.

Source code in beancount/ingest/importer.py

def name(self):
    """Return a unique id/name for this importer.

    Returns:
      A string which uniquely identifies this importer.
    """
    cls = self.__class__
    return '{}.{}'.format(cls.__module__, cls.__name__)

`beancount.ingest.importers` `special` 

`beancount.ingest.importers.config` 

Mixin to add support for configuring importers with multiple accounts.

This mixin implements some simple common functionality to create importers which accept a large number of account names or regular expressions on the set of account names. This is inspired by functionality in the importers in the previous iteration of the ingest code, which used to be its own project.

`beancount.ingest.importers.config.ConfigImporterMixin` 

A mixin class which supports configuration of account names.

Mix this into the implementation of an importer.ImporterProtocol.

`beancount.ingest.importers.config.ConfigImporterMixin.__init__(self, config)` `special` 

Provide a list of accounts and regexps as configuration to the importer.

Parameters:	config – A dict of configuration accounts, that must match the values declared in the class' REQUIRED_CONFIG.

Source code in beancount/ingest/importers/config.py

def __init__(self, config):
    """Provide a list of accounts and regexps as configuration to the importer.

    Args:
      config: A dict of configuration accounts, that must match the values
        declared in the class' REQUIRED_CONFIG.
    """
    super().__init__()

    # Check that the required configuration values are present.
    assert isinstance(config, dict), "Configuration must be a dict type"
    if not self._verify_config(config):
        raise ValueError("Invalid config {}, requires {}".format(
            config, self.REQUIRED_CONFIG))
    self.config = config

`beancount.ingest.importers.csv` 

CSV importer.

`beancount.ingest.importers.csv.Col (Enum)` 

The set of interpretable columns.

`beancount.ingest.importers.csv.Importer (IdentifyMixin, FilingMixin)` 

Importer for CSV files.

`beancount.ingest.importers.csv.Importer.__init__(self, config, account, currency, regexps=None, skip_lines=0, last4_map=None, categorizer=None, institution=None, debug=False, csv_dialect='excel', dateutil_kwds=None, narration_sep='; ', encoding=None, invert_sign=False, **kwds)` `special` 

Constructor.

Parameters:

config – A dict of Col enum types to the names or indexes of the columns.
account – An account string, the account to post this to.
currency – A currency string, the currency of this account.
regexps – A list of regular expression strings.
skip_lines (int) – Skip first x (garbage) lines of file.
last4_map (Optional[Dict]) – A dict that maps last 4 digits of the card to a friendly string.
categorizer (Optional[Callable]) – A callable that attaches the other posting (usually expenses) to a transaction with only single posting.
institution (Optional[str]) – An optional name of an institution to rename the files to.
debug (bool) – Whether or not to print debug information.
csv_dialect (Union[str, csv.Dialect]) – A csv dialect given either as string or as instance or subclass of csv.Dialect.
dateutil_kwds (Optional[Dict]) – An optional dict defining the dateutil parser kwargs.
narration_sep (str) – A string, a separator to use for splitting up the payee and narration fields of a source field.
encoding (Optional[str]) – An optional encoding for the file. Typically useful for files encoded in 'latin1' instead of 'utf-8' (the default).
invert_sign (Optional[bool]) – If true, invert the amount's sign unconditionally.
**kwds – Extra keyword arguments to provide to the base mixins.

Source code in beancount/ingest/importers/csv.py

def __init__(self, config, account, currency,
             regexps=None,
             skip_lines: int = 0,
             last4_map: Optional[Dict] = None,
             categorizer: Optional[Callable] = None,
             institution: Optional[str] = None,
             debug: bool = False,
             csv_dialect: Union[str, csv.Dialect] = 'excel',
             dateutil_kwds: Optional[Dict] = None,
             narration_sep: str = '; ',
             encoding: Optional[str] = None,
             invert_sign: Optional[bool] = False,
             **kwds):
    """Constructor.

    Args:
      config: A dict of Col enum types to the names or indexes of the columns.
      account: An account string, the account to post this to.
      currency: A currency string, the currency of this account.
      regexps: A list of regular expression strings.
      skip_lines: Skip first x (garbage) lines of file.
      last4_map: A dict that maps last 4 digits of the card to a friendly string.
      categorizer: A callable that attaches the other posting (usually expenses)
        to a transaction with only single posting.
      institution: An optional name of an institution to rename the files to.
      debug: Whether or not to print debug information
      csv_dialect: A `csv` dialect given either as string or as instance or
        subclass of `csv.Dialect`.
      dateutil_kwds: An optional dict defining the dateutil parser kwargs.
      narration_sep: A string, a separator to use for splitting up the payee and
        narration fields of a source field.
      encoding: An optional encoding for the file. Typically useful for files
        encoded in 'latin1' instead of 'utf-8' (the default).
      invert_sign: If true, invert the amount's sign unconditionally.
      **kwds: Extra keyword arguments to provide to the base mixins.
    """
    assert isinstance(config, dict), "Invalid type: {}".format(config)
    self.config = config

    self.currency = currency
    assert isinstance(skip_lines, int)
    self.skip_lines = skip_lines
    self.last4_map = last4_map or {}
    self.debug = debug
    self.dateutil_kwds = dateutil_kwds
    self.csv_dialect = csv_dialect
    self.narration_sep = narration_sep
    self.encoding = encoding
    self.invert_sign = invert_sign

    self.categorizer = categorizer

    # Prepare kwds for filing mixin.
    kwds['filing'] = account
    if institution:
        prefix = kwds.get('prefix', None)
        assert prefix is None
        kwds['prefix'] = institution

    # Prepare kwds for identifier mixin.
    if isinstance(regexps, str):
        regexps = [regexps]
    matchers = kwds.setdefault('matchers', [])
    matchers.append(('mime', 'text/csv'))
    if regexps:
        for regexp in regexps:
            matchers.append(('content', regexp))

    super().__init__(**kwds)

`beancount.ingest.importers.csv.Importer.extract(self, file, existing_entries=None)` 

Extract transactions from a file.

If the importer would like to flag a returned transaction as a known duplicate, it may opt to set the special flag "__duplicate__" to True, and the transaction should be treated as a duplicate by the extraction code. This is a way to let the importer use particular information about previously imported transactions in order to flag them as duplicates, for example, if an importer has a way to get a persistent unique id for each of the imported transactions. (See this discussion for context: https://groups.google.com/d/msg/beancount/0iV-ipBJb8g/-uk4wsH2AgAJ)

Parameters:

file – A cache.FileMemo instance.
existing_entries – An optional list of existing directives loaded from the ledger which is intended to contain the extracted entries. This is only provided if the user provides them via a flag in the extractor program.

Returns:	A list of new, imported directives (usually mostly Transactions) extracted from the file.

Source code in beancount/ingest/importers/csv.py

def extract(self, file, existing_entries=None):
    account = self.file_account(file)
    entries = []

    # Normalize the configuration to fetch by index.
    iconfig, has_header = normalize_config(
        self.config, file.head(), self.csv_dialect, self.skip_lines)

    reader = iter(csv.reader(open(file.name, encoding=self.encoding),
                             dialect=self.csv_dialect))

    # Skip garbage lines
    for _ in range(self.skip_lines):
        next(reader)

    # Skip header, if one was detected.
    if has_header:
        next(reader)

    def get(row, ftype):
        try:
            return row[iconfig[ftype]] if ftype in iconfig else None
        except IndexError:  # FIXME: this should not happen
            return None

    # Parse all the transactions.
    first_row = last_row = None
    for index, row in enumerate(reader, 1):
        if not row:
            continue
        if row[0].startswith('#'):
            continue

        # If debugging, print out the rows.
        if self.debug:
            print(row)

        if first_row is None:
            first_row = row
        last_row = row

        # Extract the data we need from the row, based on the configuration.
        date = get(row, Col.DATE)
        txn_date = get(row, Col.TXN_DATE)
        txn_time = get(row, Col.TXN_TIME)

        payee = get(row, Col.PAYEE)
        if payee:
            payee = payee.strip()

        fields = filter(None, [get(row, field)
                               for field in (Col.NARRATION1,
                                             Col.NARRATION2,
                                             Col.NARRATION3)])
        narration = self.narration_sep.join(
            field.strip() for field in fields).replace('\n', '; ')

        tag = get(row, Col.TAG)
        tags = {tag} if tag is not None else data.EMPTY_SET

        link = get(row, Col.REFERENCE_ID)
        links = {link} if link is not None else data.EMPTY_SET

        last4 = get(row, Col.LAST4)

        balance = get(row, Col.BALANCE)

        # Create a transaction
        meta = data.new_metadata(file.name, index)
        if txn_date is not None:
            meta['date'] = parse_date_liberally(txn_date,
                                                self.dateutil_kwds)
        if txn_time is not None:
            meta['time'] = str(dateutil.parser.parse(txn_time).time())
        if balance is not None:
            meta['balance'] = D(balance)
        if last4:
            last4_friendly = self.last4_map.get(last4.strip())
            meta['card'] = last4_friendly if last4_friendly else last4
        date = parse_date_liberally(date, self.dateutil_kwds)
        txn = data.Transaction(meta, date, self.FLAG, payee, narration,
                               tags, links, [])

        # Attach one posting to the transaction
        amount_debit, amount_credit = self.get_amounts(iconfig, row)

        # Skip empty transactions
        if amount_debit is None and amount_credit is None:
            continue

        for amount in [amount_debit, amount_credit]:
            if amount is None:
                continue
            if self.invert_sign:
                amount = -amount
            units = Amount(amount, self.currency)
            txn.postings.append(
                data.Posting(account, units, None, None, None, None))

        # Attach the other posting(s) to the transaction.
        if isinstance(self.categorizer, collections.abc.Callable):
            txn = self.categorizer(txn)

        # Add the transaction to the output list
        entries.append(txn)

    # Figure out if the file is in ascending or descending order.
    first_date = parse_date_liberally(get(first_row, Col.DATE),
                                      self.dateutil_kwds)
    last_date = parse_date_liberally(get(last_row, Col.DATE),
                                     self.dateutil_kwds)
    is_ascending = first_date < last_date

    # Reverse the list if the file is in descending order
    if not is_ascending:
        entries = list(reversed(entries))

    # Add a balance entry if possible
    if Col.BALANCE in iconfig and entries:
        entry = entries[-1]
        date = entry.date + datetime.timedelta(days=1)
        balance = entry.meta.get('balance', None)
        if balance is not None:
            meta = data.new_metadata(file.name, index)
            entries.append(
                data.Balance(meta, date,
                             account, Amount(balance, self.currency),
                             None, None))

    # Remove the 'balance' metadata.
    for entry in entries:
        entry.meta.pop('balance', None)

    return entries
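
The ordering and balance-date logic at the end of `extract()` can be isolated as below. This is a simplified sketch on plain `datetime.date` values; `order_and_balance_date` is a hypothetical helper name, not part of the module.

```python
import datetime


def order_and_balance_date(dates):
    """Put file-order dates in ascending order and compute the balance date.

    The list is reversed unless the first date is strictly before the last one,
    and the balance assertion is dated one day after the final entry.
    """
    if not dates:
        return [], None
    if not dates[0] < dates[-1]:
        dates = list(reversed(dates))
    return dates, dates[-1] + datetime.timedelta(days=1)


descending = [datetime.date(2020, 1, 3), datetime.date(2020, 1, 1)]
ordered, balance_date = order_and_balance_date(descending)
```

Dating the balance one day later reflects the fact that a Beancount balance assertion applies at the beginning of its date, so it has to follow the last imported transaction.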

`beancount.ingest.importers.csv.Importer.file_date(self, file)` 

Get the maximum date from the file.

Source code in beancount/ingest/importers/csv.py

def file_date(self, file):
    "Get the maximum date from the file."
    iconfig, has_header = normalize_config(
        self.config, file.head(), self.csv_dialect, self.skip_lines)
    if Col.DATE in iconfig:
        reader = iter(csv.reader(open(file.name), dialect=self.csv_dialect))
        for _ in range(self.skip_lines):
            next(reader)
        if has_header:
            next(reader)
        max_date = None
        for row in reader:
            if not row:
                continue
            if row[0].startswith('#'):
                continue
            date_str = row[iconfig[Col.DATE]]
            date = parse_date_liberally(date_str, self.dateutil_kwds)
            if max_date is None or date > max_date:
                max_date = date
        return max_date

`beancount.ingest.importers.csv.Importer.get_amounts(self, iconfig, row, allow_zero_amounts=False)` 

See function get_amounts() for details.

This method is present to allow clients to override it in order to deal with special cases, e.g., columns with currency symbols in them.

Source code in beancount/ingest/importers/csv.py

def get_amounts(self, iconfig, row, allow_zero_amounts=False):
    """See function get_amounts() for details.

    This method is present to allow clients to override it in order to deal
    with special cases, e.g., columns with currency symbols in them.
    """
    return get_amounts(iconfig, row, allow_zero_amounts)
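
As an illustration of the kind of special case mentioned above, the sketch below cleans a cell that carries a currency symbol. `clean_amount` is a hypothetical helper, not part of the module, and treating parentheses as a negative sign is an accounting convention assumed here for the example.

```python
from decimal import Decimal
import re


def clean_amount(text):
    """Strip currency symbols, thousands separators and whitespace from a cell.

    Returns a Decimal, or None for an empty cell. A value wrapped in
    parentheses is treated as negative (an assumption for illustration).
    """
    if text is None:
        return None
    text = text.strip()
    if not text:
        return None
    negative = text.startswith('(') and text.endswith(')')
    # Keep only digits, the decimal point and a minus sign.
    digits = re.sub(r'[^0-9.\-]', '', text)
    if not digits:
        return None
    value = Decimal(digits)
    return -value if negative else value
```

A subclass overriding `get_amounts()` could apply a helper of this kind to the amount cells of `row` before converting them, instead of passing the raw strings through.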

`beancount.ingest.importers.csv.get_amounts(iconfig, row, allow_zero_amounts=False)` 

Get the amount columns of a row.

Parameters:

iconfig – A dict of Col to row index.
row – A row array containing the values of the given row.
allow_zero_amounts – Is a transaction with amount D('0.00') okay? If not, return (None, None).

Returns:	A pair of (debit-amount, credit-amount), both of which are either an instance of Decimal or None, or not available.

Source code in beancount/ingest/importers/csv.py

def get_amounts(iconfig, row, allow_zero_amounts=False):
    """Get the amount columns of a row.

    Args:
      iconfig: A dict of Col to row index.
      row: A row array containing the values of the given row.
      allow_zero_amounts: Is a transaction with amount D('0.00') okay? If not,
        return (None, None).
    Returns:
      A pair of (debit-amount, credit-amount), both of which are either an
      instance of Decimal or None, or not available.
    """
    debit, credit = None, None
    if Col.AMOUNT in iconfig:
        credit = row[iconfig[Col.AMOUNT]]
    else:
        debit, credit = [row[iconfig[col]] if col in iconfig else None
                         for col in [Col.AMOUNT_DEBIT, Col.AMOUNT_CREDIT]]

    # If zero amounts aren't allowed, return null value.
    is_zero_amount = ((credit is not None and D(credit) == ZERO) and
                      (debit is not None and D(debit) == ZERO))
    if not allow_zero_amounts and is_zero_amount:
        return (None, None)

    return (-D(debit) if debit else None,
            D(credit) if credit else None)

`beancount.ingest.importers.csv.normalize_config(config, head, dialect='excel', skip_lines=0)` 

Using the header line, convert the configuration field name lookups to int indexes.

Parameters:

config – A dict of Col types to strings (field names) or integer indexes.
head – A string, some decent number of bytes of the head of the file.
dialect – A dialect definition to parse the header.
skip_lines (`int`) – Skip first x (garbage) lines of file.

Returns:	A pair of a dict of Col types to integer indexes of the fields, and a boolean, true if the file has a header.

Exceptions:	`ValueError` – If there is no header and the configuration does not consist entirely of integer indexes.

Source code in beancount/ingest/importers/csv.py

def normalize_config(config, head, dialect='excel', skip_lines: int = 0):
    """Using the header line, convert the configuration field name lookups to int indexes.

    Args:
      config: A dict of Col types to string or indexes.
      head: A string, some decent number of bytes of the head of the file.
      dialect: A dialect definition to parse the header
      skip_lines: Skip first x (garbage) lines of file.
    Returns:
      A pair of
        A dict of Col types to integer indexes of the fields, and
        a boolean, true if the file has a header.
    Raises:
      ValueError: If there is no header and the configuration does not consist
        entirely of integer indexes.
    """
    # Skip garbage lines before sniffing the header
    assert isinstance(skip_lines, int)
    assert skip_lines >= 0
    for _ in range(skip_lines):
        head = head[head.find('\n')+1:]

    has_header = csv.Sniffer().has_header(head)
    if has_header:
        header = next(csv.reader(io.StringIO(head), dialect=dialect))
        field_map = {field_name.strip(): index
                     for index, field_name in enumerate(header)}
        index_config = {}
        for field_type, field in config.items():
            if isinstance(field, str):
                field = field_map[field]
            index_config[field_type] = field
    else:
        if any(not isinstance(field, int)
               for field_type, field in config.items()):
            raise ValueError("CSV config without header has non-index fields: "
                             "{}".format(config))
        index_config = config
    return index_config, has_header
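
The name-to-index conversion can be tried out with the standard library alone. The sketch below is a reduced illustration: `header_indexes` is a hypothetical name, plain string keys stand in for `Col` members, and header detection is left out (the first line of `head` is assumed to be a header).

```python
import csv
import io


def header_indexes(head, config, dialect='excel'):
    """Map field names in 'config' to column indexes using the first line of 'head'.

    Integer values are passed through unchanged.
    """
    header = next(csv.reader(io.StringIO(head), dialect=dialect))
    field_map = {name.strip(): index for index, name in enumerate(header)}
    return {key: field_map[field] if isinstance(field, str) else field
            for key, field in config.items()}


head = "Date, Description ,Amount\n2020-01-31,Coffee,-3.50\n"
indexes = header_indexes(
    head, {'date': 'Date', 'narration': 'Description', 'amount': 2})
```

Because header names are stripped before the lookup, the padded ` Description ` column is still found. A field name missing from the header raises `KeyError` in this sketch.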

`beancount.ingest.importers.fileonly` 

A simplistic importer that can be used just to file away some download.

Sometimes you just want to save and accumulate data.

`beancount.ingest.importers.fileonly.Importer (FilingMixin, IdentifyMixin)` 

An importer that supports only matching (identification) and filing.

`beancount.ingest.importers.mixins` `special` 

`beancount.ingest.importers.mixins.config` 

Base class that implements configuration and a filing account.

`beancount.ingest.importers.mixins.config.ConfigMixin (ImporterProtocol)` 

`beancount.ingest.importers.mixins.config.ConfigMixin.__init__(self, **kwds)` `special` 

Pull 'config' from kwds.

Source code in beancount/ingest/importers/mixins/config.py

def __init__(self, **kwds):
    """Pull 'config' from kwds."""

    config = kwds.pop('config', None)
    schema = self.REQUIRED_CONFIG
    if config or schema:
        assert config is not None
        assert schema is not None
        self.config = validate_config(config, schema, self)
    else:
        self.config = None

    super().__init__(**kwds)

`beancount.ingest.importers.mixins.config.validate_config(config, schema, importer)` 

Check the configuration account provided by the user against the accounts required by the source importer.

Parameters:

config – A config dict of actual values on an importer.
schema – A dict of declarations of required values.
importer – The importer instance; its class name is used in error messages.

Exceptions:	`ValueError` – If the configuration is invalid.

Returns:	A validated configuration dict.

Source code in beancount/ingest/importers/mixins/config.py

def validate_config(config, schema, importer):
    """Check the configuration account provided by the user against the accounts
    required by the source importer.

    Args:
      config: A config dict of actual values on an importer.
      schema: A dict of declarations of required values.
    Raises:
      ValueError: If the configuration is invalid.
    Returns:
      A validated configuration dict.
    """
    provided_options = set(config)
    required_options = set(schema)

    for option in (required_options - provided_options):
        raise ValueError("Missing value from user configuration for importer {}: {}".format(
            importer.__class__.__name__, option))

    for option in (provided_options - required_options):
        raise ValueError("Unknown value in user configuration for importer {}: {}".format(
            importer.__class__.__name__, option))

    # FIXME: Validate types as well, including account type as a default.

    # FIXME: Here we could validate account names by looking them up from the
    # existing ledger.

    return config
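
The check above boils down to two set differences over the dictionary keys. The sketch below is a variant for illustration only: unlike `validate_config()`, which raises on the first problem, the hypothetical `diff_config_keys` reports all missing and unknown keys at once. The schema contents are made up.

```python
def diff_config_keys(config, schema):
    """Return (missing, unknown) key lists, each sorted for stable output."""
    provided = set(config)
    required = set(schema)
    return sorted(required - provided), sorted(provided - required)


schema = {'cash': 'Cash account', 'fees': 'Fees account'}
missing, unknown = diff_config_keys(
    {'cash': 'Assets:Cash', 'extra': 'Expenses:Misc'}, schema)
```

Only the keys are compared; as the FIXME comments note, neither the value types nor the account names are validated.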

`beancount.ingest.importers.mixins.filing` 

Base class that implements filing account.

It also sports an optional prefix to prepend to the renamed filename. Typically you can put the name of the institution there, so you get a renamed filename like this:

YYYY-MM-DD.institution.Original_File_Name.pdf

`beancount.ingest.importers.mixins.filing.FilingMixin (ImporterProtocol)` 

`beancount.ingest.importers.mixins.filing.FilingMixin.__init__(self, **kwds)` `special` 

Pull 'filing' and 'prefix' from kwds.

Parameters:

filing – The name of the account to file to.
prefix – The name of the institution prefix to insert.

Source code in beancount/ingest/importers/mixins/filing.py

def __init__(self, **kwds):
    """Pull 'filing' and 'prefix' from kwds.

    Args:
      filing: The name of the account to file to.
      prefix: The name of the institution prefix to insert.
    """

    self.filing_account = kwds.pop('filing', None)
    assert account.is_valid(self.filing_account)

    self.prefix = kwds.pop('prefix', None)

    super().__init__(**kwds)
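
Each mixin pops the keyword arguments it owns and forwards the rest with `super().__init__(**kwds)`, so several mixins can be combined on one importer class. A minimal sketch of that cooperative pattern follows. All class names here are hypothetical stand-ins, and the `Base` class that rejects leftover keywords is an assumption made for illustration, not the behavior of `ImporterProtocol`.

```python
class Base:
    """Hypothetical root class standing in for the importer protocol."""

    def __init__(self, **kwds):
        # Anything still present here was not claimed by any mixin.
        if kwds:
            raise TypeError(
                "Unexpected keyword arguments: {}".format(sorted(kwds)))


class FilingLike(Base):
    """Claims 'filing' and 'prefix', then passes the remainder along."""

    def __init__(self, **kwds):
        self.filing_account = kwds.pop('filing', None)
        self.prefix = kwds.pop('prefix', None)
        super().__init__(**kwds)


class MatchersLike(Base):
    """Claims 'matchers', then passes the remainder along."""

    def __init__(self, **kwds):
        self.matchers = kwds.pop('matchers', [])
        super().__init__(**kwds)


class MyImporter(FilingLike, MatchersLike):
    """Combines both mixins; the method resolution order chains the calls."""


imp = MyImporter(filing='Assets:Bank:Checking',
                 prefix='mybank',
                 matchers=[('filename', r'\.csv$')])
```

Because every `__init__` takes only `**kwds`, the arguments must be passed by keyword, and the order of the mixins in the class definition does not affect which arguments each one receives.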

`beancount.ingest.importers.mixins.filing.FilingMixin.file_account(self, file)` 

Return an account associated with the given file.

Note: If you don't implement this method you won't be able to move the files into their preservation hierarchy; the bean-file command won't work.

Also, the returned account normally depends only on the importer, not on the input file, but the file is provided anyhow.

Parameters:	file – A cache.FileMemo instance.

Returns:	The name of the account that corresponds to this importer.

Source code in beancount/ingest/importers/mixins/filing.py

def file_account(self, file):
    return self.filing_account

`beancount.ingest.importers.mixins.filing.FilingMixin.file_name(self, file)` 

Return the optional renamed account filename.

Source code in beancount/ingest/importers/mixins/filing.py

def file_name(self, file):
    """Return the optional renamed account filename."""
    supername = super().file_name(file)
    if not self.prefix:
        return supername
    else:
        return '.'.join(filter(None, [self.prefix,
                                      supername or path.basename(file.name)]))
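
The joining rule above can be tried in isolation. The sketch below copies the same expression into a hypothetical free function, `prefixed_name`, that takes the prefix, the name returned by the parent class, and the original path as plain arguments. The example paths and the prefix are invented. Note that this method only joins the prefix and the name; the date shown in the example filename earlier is not produced here.

```python
from os import path


def prefixed_name(prefix, supername, original_path):
    """Hypothetical free-function version of the joining logic above."""
    if not prefix:
        # Without a prefix, the parent's answer (possibly None) is returned as is.
        return supername
    # Prefer the parent's renamed file; fall back to the original basename.
    return '.'.join(filter(None, [prefix,
                                  supername or path.basename(original_path)]))


with_fallback = prefixed_name('mybank', None, '/downloads/Statement.pdf')
with_rename = prefixed_name('mybank', 'renamed.pdf', '/downloads/Statement.pdf')
no_prefix = prefixed_name(None, None, '/downloads/Statement.pdf')
```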

`beancount.ingest.importers.mixins.filing.FilingMixin.name(self)` 

Include the filing account in the name.

Source code in beancount/ingest/importers/mixins/filing.py

def name(self):
    """Include the filing account in the name."""
    return '{}: "{}"'.format(super().name(), self.filing_account)

`beancount.ingest.importers.mixins.identifier` 

Base class that implements identification using regular expressions.

`beancount.ingest.importers.mixins.identifier.IdentifyMixin (ImporterProtocol)` 

`beancount.ingest.importers.mixins.identifier.IdentifyMixin.__init__(self, **kwds)` `special` 

Pull 'matchers' and 'converter' from kwds.

Source code in beancount/ingest/importers/mixins/identifier.py

def __init__(self, **kwds):
    """Pull 'matchers' and 'converter' from kwds."""

    self.remap = collections.defaultdict(list)
    matchers = kwds.pop('matchers', [])
    cls_matchers = getattr(self, 'matchers', [])
    assert isinstance(matchers, list)
    assert isinstance(cls_matchers, list)
    for part, regexp in itertools.chain(matchers, cls_matchers):
        assert part in _PARTS, repr(part)
        assert isinstance(regexp, str), repr(regexp)
        self.remap[part].append(re.compile(regexp))

    # Converter is a fn(filename: Text) -> contents: Text.
    self.converter = kwds.pop('converter',
                              getattr(self, 'converter', None))

    super().__init__(**kwds)
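
The constructor accepts matchers as a list of `(part, regexp)` pairs, both from the keyword argument and from a class attribute, and groups the compiled patterns by part. A minimal standalone sketch of that grouping step follows. The function name `build_remap` is hypothetical, the contents of `PARTS` are assumed from the description of `identify` (`'mime'`, `'filename'`, `'content'`), and the patterns are invented.

```python
import collections
import itertools
import re

# Assumed to match the library's _PARTS, based on the documented part names.
PARTS = {'mime', 'filename', 'content'}


def build_remap(instance_matchers, class_matchers=()):
    """Group compiled regexps by part, instance matchers first."""
    remap = collections.defaultdict(list)
    for part, regexp in itertools.chain(instance_matchers, class_matchers):
        if part not in PARTS:
            raise ValueError("Unknown part: {!r}".format(part))
        if not isinstance(regexp, str):
            raise ValueError("Pattern must be a string: {!r}".format(regexp))
        remap[part].append(re.compile(regexp))
    return remap


remap = build_remap(
    [('filename', r'\.csv$')],                           # passed at construction
    [('mime', r'text/csv'), ('filename', r'statement')]  # declared on the class
)
```

Several patterns may be registered for the same part; they accumulate in a list rather than replacing each other.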

`beancount.ingest.importers.mixins.identifier.IdentifyMixin.identify(self, file)` 

Return true if this importer matches the given file.

Parameters:	file – A cache.FileMemo instance.

Returns:	A boolean, true if this importer can handle this file.

Source code in beancount/ingest/importers/mixins/identifier.py

def identify(self, file):
    return identify(self.remap, self.converter, file)

`beancount.ingest.importers.mixins.identifier.identify(remap, converter, file)` 

Identify the contents of a file.

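Parameters:	remap – A dict of 'part' to list-of-compiled-regexp objects, where each item is a specification to match against its part. The 'part' can be one of 'mime', 'filename' or 'content'. converter – A function that takes a filename and returns its contents as text, used for the 'content' match; if None, cache.contents is used. file – A cache.FileMemo instance.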

Returns:	A boolean, true if the file is not rejected by the constraints.

Source code in beancount/ingest/importers/mixins/identifier.py

def identify(remap, converter, file):
    """Identify the contents of a file.

    Args:
      remap: A dict of 'part' to list-of-compiled-regexp objects, where each item is
        a specification to match against its part. The 'part' can be one of 'mime',
        'filename' or 'content'.
      converter: A
    Returns:
      A boolean, true if the file is not rejected by the constraints.
    """
    if remap.get('mime', None):
        mimetype = file.convert(cache.mimetype)
        if not all(regexp.search(mimetype)
                   for regexp in remap['mime']):
            return False

    if remap.get('filename', None):
        if not all(regexp.search(file.name)
                   for regexp in remap['filename']):
            return False

    if remap.get('content', None):
        # If this is a text file, read the whole thing in memory.
        text = file.convert(converter or cache.contents)
        if not all(regexp.search(text)
                   for regexp in remap['content']):
            return False

    return True
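
The function above rejects a file as soon as any pattern for any part fails to match, and accepts it otherwise, including when no patterns are given at all. The sketch below reproduces that rule on plain strings. The function name `matches_all` and the sample values are hypothetical, and unlike the original it takes the already-converted text for each part instead of converting lazily through a `FileMemo`.

```python
import re


def matches_all(remap, parts):
    """Return True unless some regexp fails to match its part's text."""
    for part in ('mime', 'filename', 'content'):
        regexps = remap.get(part)
        # A part with no patterns imposes no constraint.
        if regexps and not all(regexp.search(parts[part])
                               for regexp in regexps):
            return False
    return True


remap = {
    'mime': [re.compile(r'text/csv')],
    'filename': [re.compile(r'\.csv$'), re.compile(r'statement')],
}

accepted = matches_all(remap, {
    'mime': 'text/csv',
    'filename': '/downloads/statement-2020.csv',
    'content': '',
})
rejected = matches_all(remap, {
    'mime': 'text/csv',
    'filename': '/downloads/other.csv',  # fails the 'statement' pattern
    'content': '',
})
unconstrained = matches_all({}, {'mime': '', 'filename': '', 'content': ''})
```

Because the patterns are applied with `search` rather than `match`, a pattern may match anywhere in the text unless it is anchored explicitly.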

`beancount.ingest.importers.ofx` 

OFX file format importer for bank and credit card statements.

https://en.wikipedia.org/wiki/Open_Financial_Exchange

This importer will parse a single account in the OFX file. If the file contains many accounts, instantiate the importer multiple times, once per account. It makes more sense to do it this way so that you can define your importer configuration account by account.

Note that this importer is provided as an example and with no guarantees. It's not really super great. On the other hand, I've been using it for more than five years over multiple accounts, so it has been useful to me (it works, by some measure of "works"). If you need a more powerful or compliant OFX importer please consider either writing one or contributing changes. Also, this importer does its own very basic parsing; a better one would probably use (and depend on) the ofxparse module (see https://sites.google.com/site/ofxparse/).

`beancount.ingest.importers.ofx.BalanceType (Enum)` 

Type of Balance directive to be inserted.

`beancount.ingest.importers.ofx.Importer (ImporterProtocol)` 

An importer for Open Financial Exchange files.