Swiss Army knife for fixing text - the NEW TextPipe®
Pro 8.6.7 - our multi-award winning, industrial strength
text conversion, transformation and extraction tool.
TextPipe will save you time and frustration in fixing
text data, regardless of the number of changes required,
the size or number of files, and the complexity of the
transformations. It makes text processing a snap! Whether it's a 30,000 page
website or the Clipboard, TextPipe automates common
tasks like search/replace (with pattern matching and
sounds-like matching), end-of-line conversion, extracting email
addresses, fixing HTML etc.
100 example filters are included to get you started,
and TextPipe's 60 internal filters make life easy.
Full HTML online help is provided. Multi-threading
handles streams in parallel.
TextPipe is a multi-award winning, industrial strength text
transformation, conversion, cleansing and extraction
tool that gives you one point of maintenance. With TextPipe you
specify all your text processing functions in one
place, rather than remembering and managing multiple
manual jobs across various text editors, command line
tools, custom scripts and Word and Excel macros.
What does TextPipe do?
TextPipe makes it fast and easy to convert, transform and
re-purpose data in text files, including:
XML and other structured documents from the WWW
Fixed length or delimited files (CSV, Tab, Pipe, etc)
Mainframe and PC/Windows end-of-line formats
Zip files, and the new Microsoft Office 2007
formats DOCX, XLSX, PPTX
ANSI, Unicode and EBCDIC files
Log files from firewalls, web servers etc
HL7, SWIFT and other structured formats
Structured and unstructured reports of any size or dimension
TextPipe also works with binary files; however, for Word
documents (.doc) see WordPipe,
for Excel spreadsheets (.XLS) see ExcelPipe,
for PowerPoint presentations (.PPT) see PowerPointPipe
and for databases see DataPipe.
For mining of web sites using TextPipe, see WebPipe.
Reasons Why TextPipe is Different
TextPipe is exceptionally fast
TextPipe handles files of unlimited size, even files larger
than 2 Gigabytes! Other applications attempt to
load the entire file into memory (grinding your
system to a halt).
TextPipe's unique restrictions control precisely where
changes are made. Restrict to a range of lines or
columns, to specific Tab or CSV fields, between
HTML/XML tags, and inside custom ranges.
Restrictions can be combined, for example, to
columns 1-10 of lines matching a pattern.
Restrictions are essential for extensive but
controlled search and replace
TextPipe performs multiple operations simultaneously. Other
applications offer only one, or up to five, or require a
slow multi-pass approach
If TextPipe's 100+ filters don't suit your needs, you
can use industry standard VBScript/JScript to
write your own. Other applications either don't
offer this facility, or force you to learn a
TextPipe is unique in offering the EasyPattern pattern
matching language for those not familiar with text
pattern matching (regular expressions).
EasyPatterns are English-like and very easy to
TextPipe can be scheduled for non-interactive use, and can
be controlled by an external program. Other
applications provide only a mouse interface.
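The idea of combined restrictions described above (for example, columns 1-10 of lines matching a pattern) can be sketched in a few lines of Python. This is only an illustration of the concept, not TextPipe's own implementation or filter syntax; the function name replace_in_columns and its parameters are hypothetical.

```python
import re

def replace_in_columns(lines, line_pattern, find, replace, first_col=1, last_col=10):
    """Apply find/replace only inside columns first_col..last_col (1-based,
    inclusive) of lines that match line_pattern. All other text is untouched."""
    line_re = re.compile(line_pattern)
    find_re = re.compile(find)
    result = []
    for line in lines:
        if line_re.search(line):
            head = line[:first_col - 1]            # text before the restricted range
            middle = line[first_col - 1:last_col]  # the restricted columns
            tail = line[last_col:]                 # text after the restricted range
            line = head + find_re.sub(replace, middle) + tail
        result.append(line)
    return result

sample = [
    "ERR 0001 code 0001 in module A",
    "OK  0001 code 0001 in module B",
]
# Replace "0" with "X", but only in columns 1-10 of lines starting with ERR.
fixed = replace_in_columns(sample, r"^ERR", r"0", "X")
print(fixed)
```

Only the zeros in the first ten columns of the ERR line change; the second "0001" on that line and the whole OK line are left alone, which is what makes an extensive search and replace controllable.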
Site Designers and Authors
Consulting/Systems Integrators/High Tech
houses/Print and Publishing
Department of Energy
TextPipe will save you time, frustration and money. It will fix
text data, regardless of the number of changes
required, the size or number of files, and the
complexity of the transformations.
TextPipe is trusted by over 1500 customers in 56
Huge Files Quickly and Easily
Mine Unstructured Mainframe Reports and Online
and Reformat Electronic Text
Data Warehouse ETL (Extract-Transform-Load)
from Databases to XML, CSV, Tab
and Join Huge Files
Between a Variety of Mainframe, PC and
Unicode Data Formats and Encodings
training data for SMT (Statistical Machine Translation)
1001 other uses.
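As a rough illustration of the format and encoding conversions listed above, the following Python sketch decodes EBCDIC bytes and normalises end-of-line markers. It is not how TextPipe does it; the function names are hypothetical, and the choice of code page cp037 (one common EBCDIC variant) is an assumption, since mainframe data may use a different one.

```python
def ebcdic_to_text(data: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes to a Python (Unicode) string.
    cp037 is an assumed code page; pass another codec name if needed."""
    return data.decode(codec)

def normalize_eol(text: str, eol: str = "\r\n") -> str:
    """Convert Windows (CRLF), old Mac (CR) and Unix (LF) line endings
    to a single chosen end-of-line marker."""
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", eol)

# Round-trip a short string through EBCDIC to show the decode step.
raw = "HELLO".encode("cp037")
print(ebcdic_to_text(raw))

# Mixed line endings become consistent Windows-style endings.
print(repr(normalize_eol("a\r\nb\rc\n")))
```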
TextPipe provides a single point of maintenance for all your
text processing tasks. You learn one tool, rather than
learning 4 or more - and their associated languages,
command line options, debugging schemes,
idiosyncrasies and operating system differences and
dependencies. TextPipe is far less costly to learn,
use, develop with and maintain than cobbling together
multiple generic tools and custom scripts to achieve
one end. It's a Swiss army knife combining the best of
perl, awk, grep, sed, and many other less common text
processing tools. You'll be productive with TextPipe
in minutes, not days.
TextPipe's unmatched power comes from its arsenal of
100+ manipulation filters, its unique
architecture and its tremendous flexibility in
combining these filters to suit each task. Intuitive
line, column, field, tag and attribute restrictions
make fixing data extracts simple. You can extract and
then modify data from databases, in delimited, XML
and SQL Insert Script formats. You can roll your own
custom filters using industry standard VBScript
and JScript. With TextPipe you can create your own
conversions, and deploy them for execution at remote
sites. A single click merges files (even those larger
than 10 GB), another click extracts email addresses,
and another click sorts and removes duplicates. Try
doing that with fewer than 100 lines of code, in less
than 10 seconds!
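For comparison, here is a minimal hand-written sketch of the "extract email addresses, then sort and remove duplicates" task in Python. It is an illustration only: the function name extract_emails is hypothetical, and the pattern is deliberately simple and will not match every valid address.

```python
import re

# Deliberately simple pattern; real-world address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(lines):
    """Return the sorted, de-duplicated addresses found in an iterable of lines."""
    found = set()
    for line in lines:  # one line at a time, so an open file object works too
        for match in EMAIL_RE.findall(line):
            found.add(match.lower())  # treat addresses as case-insensitive
    return sorted(found)

text = ["Contact Bob@Example.com or alice@example.org.", "cc: bob@example.com"]
print(extract_emails(text))
```

Because the function reads one line at a time, only the set of unique addresses is held in memory rather than the whole input.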
See also:
WebPipe downloads partial or entire web sites to your hard
disk for data mining with TextPipe Pro. WebPipe is a
custom version of Offline Explorer with specialized
extensions specifically for data mining work and for
working with TextPipe Pro.
With TextPipe Pro, WebPipe can be used to data mine content
from part or all of any web site on a scheduled basis:
cleanse, sort and de-duplicate email addresses from
downloaded web sites
and de-duplicate web site URLs from downloaded web
data from your competitor's web sites and then
republish it, or use it for sales analysis.
advertising images and HTML from downloaded web
competitor prices into your sales database
WebPipe allows you to download your favorite Web and FTP sites
for later offline viewing, editing or browsing. Then
use TextPipe Pro to data mine content or keywords from
your competitor's web sites!
TextPipe Pro is a data extraction and text manipulation
application that updates your web site, extracts data
from databases, reformats and standardizes your
electronic text and program source code, data mines
unstructured text reports and your competitor's web
sites, cleanses data in legacy databases, converts
between a variety of mainframe and PC data formats -
the possibilities are simply endless.
How to Data Mine Content using TextPipe Pro
In order to effectively data mine content from web pages,
you first have to remove all the extraneous
information such as color and formatting, extra
spaces, graphics, forms, comments, styles, advertising
and embedded frames. To perform this step, we link to
a predefined TextPipe filter in html\data mine.fll.
To use it, in the File Menu, choose Link to
Filter, and then select the filter file. This
includes the filter without modifying it.
Next, we need to simplify the HTML tags to change
"<table ... padding="3" etc>" into just
"<table>". This will make it much
easier to search and replace later on. To do this, we
use a filter called html\simplify tags.fll.
Again, to use it, in the File Menu, choose Link
to Filter, and then select the filter file.
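To make the tag simplification step concrete, here is a small Python sketch that strips attributes from opening tags. It is not the contents of the simplify tags filter, which are not shown here; the function name simplify_tags is hypothetical, and the regular expression assumes attribute values do not contain ">" characters.

```python
import re

# Matches an opening tag that carries attributes. Group 1 is the tag name,
# group 2 is an optional self-closing slash. Closing tags and tags without
# attributes do not match and are left unchanged.
TAG_WITH_ATTRS = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s[^<>]*?(/?)>")

def simplify_tags(html):
    """Reduce tags such as <table border="0"> to just <table>."""
    return TAG_WITH_ATTRS.sub(lambda m: "<" + m.group(1) + m.group(2) + ">", html)

page = '<table border="0" padding="3"><tr><td class="x">1</td></tr></table>'
print(simplify_tags(page))
```

With every table tag reduced to the same short form, a later search and replace only has to look for one spelling of each tag.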
Then, to convert data from HTML table format to a CSV
(comma-separated value) format that we can easily
import into Excel, we use the filter html\data mine
html tables.fll. Again, to use it, in the File
Menu, choose Link to Filter, and then
select the filter file.
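The table-to-CSV conversion can be pictured with the following Python sketch, which uses the standard library's html.parser and csv modules. It illustrates the idea only and is not the logic of the data mine html tables filter; the class and function names are hypothetical, and nested tables are not handled.

```python
import csv
import io
from html.parser import HTMLParser

class TableToRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the pieces and collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

def html_tables_to_csv(html):
    """Return the table rows found in html as CSV text."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()

page = ("<table><tr><th>Item</th><th>Price</th></tr>"
        "<tr><td>Widget, large</td><td>9.50</td></tr></table>")
print(html_tables_to_csv(page))
```

The csv module quotes any cell that contains a comma, so the result opens cleanly in Excel.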
Once this is done, just drag and drop the file onto
TextPipe's window, set the Output Filter to save the
result file somewhere like the Desktop, and then click
It is worth noting that you may need to remove other HTML
tables from headers and footers near your data. This
must be done manually, because there is no way the
software can determine what is junk data and what is
not. To remove a table, in the Special Menu,
choose Find and Replace (Find Pattern). A new
search and replace filter is added; ensure it has a
find type of Pattern (perl). Then add text like
'<table>.*</table>'. This will find a
start and end table tag with anything in-between.
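One thing to watch with a pattern like this is that '.*' is greedy in Perl-style regular expressions. The Python sketch below uses Python's re module (not TextPipe's pattern engine) to show the difference between the greedy form and the non-greedy '.*?' form; the DOTALL flag is used here so that the dot also matches line breaks, and how TextPipe treats line breaks is not assumed.

```python
import re

html = "<table>header junk</table>\n<p>keep me</p>\n<table>\nfooter junk\n</table>"

# Greedy: a single match runs from the first <table> to the LAST </table>,
# so the paragraph between the two tables is removed as well.
greedy = re.sub(r"<table>.*</table>", "", html, flags=re.DOTALL)

# Non-greedy: each table is matched and removed separately.
lazy = re.sub(r"<table>.*?</table>", "", html, flags=re.DOTALL)

print(repr(greedy))
print(repr(lazy))
```

If a page holds both junk tables and data tables, the non-greedy form, or a restriction to a specific part of the file, avoids removing everything in between.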
You can use WebPipe to download all or part of web sites
on a scheduled basis and then feed them into TextPipe