fuzzysearch

fuzzysearch finds approximate (fuzzy) matches of a subsequence within a longer sequence.

Installation

Install using pip:

$ pip install fuzzysearch

Features

  • Fuzzy subsequence search: find parts of a sequence that match a given subsequence, up to a given maximum Levenshtein distance.
  • Set individual limits for the number of substitutions, insertions and/or deletions allowed for a near-match.
  • Includes optimized implementations for specific use-cases, e.g. only allowing substitutions in near-matches.
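
For readers unfamiliar with the term, the Levenshtein distance between two sequences is the minimum number of single-item substitutions, insertions and deletions needed to turn one into the other. The following is a minimal, self-contained sketch of the textbook dynamic-programming computation. The function name levenshtein is hypothetical; it is not part of fuzzysearch and is not meant to reflect how the library implements its searches.

```python
def levenshtein(a, b):
    """Return the Levenshtein (edit) distance between sequences a and b.

    Illustrative helper only; not part of the fuzzysearch API.
    """
    # previous[j] is the distance between the prefix of a processed so far
    # and b[:j]; for the empty prefix that is simply j insertions.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        # Turning a[:i] into the empty sequence takes i deletions.
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete item_a
                current[j - 1] + 1,                    # insert item_b
                previous[j - 1] + (item_a != item_b),  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


print(levenshtein('PATTERN', 'PATERN'))  # 1: a single deletion
```

A near-match search with a maximum distance of 1 would therefore accept 'PATERN' as a match for 'PATTERN'.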

Simple Example

You can usually just use the find_near_matches() utility function, which chooses a suitable fuzzy search implementation according to the given parameters:

>>> from fuzzysearch import find_near_matches
>>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
[Match(start=3, end=9, dist=1)]
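
The start and end values in the result above behave like ordinary slice indices: end is exclusive, so slicing the searched sequence recovers the matched text. The sketch below shows this without needing the library installed. Its Match tuple is only a stand-in that mirrors the three fields shown in the output above; it is an assumption for illustration, not the library's own class.

```python
from collections import namedtuple

# Stand-in for the result type shown above (assumption: same three fields).
Match = namedtuple('Match', ['start', 'end', 'dist'])

text = 'aaaPATERNaaa'
matches = [Match(start=3, end=9, dist=1)]  # copied from the example output

# end is exclusive, so plain slicing yields the matched part of the text.
matched_texts = [text[m.start:m.end] for m in matches]
print(matched_texts)  # ['PATERN']
```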

Advanced Example

If needed, you can choose a specific search implementation, such as find_near_matches_with_ngrams():

>>> sequence = '''\
GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
GGGATAGG'''
>>> subsequence = 'TGCACTGTAGGGATAACAAT'  # distance 1
>>> max_distance = 2

>>> from fuzzysearch import find_near_matches_with_ngrams
>>> find_near_matches_with_ngrams(subsequence, sequence, max_distance)
[Match(start=3, end=24, dist=1)]