
TextPipe Pro
A Swiss Army knife for fixing text: the new TextPipe®
Pro 8.6.7, our multi-award-winning, industrial-strength
text conversion, transformation and extraction
workbench.
TextPipe will save you time and frustration in fixing
text data, regardless of the number of changes required,
the size or number of files, and the complexity of the
transformations.
Makes text processing a snap! Whether it's a 30,000 page
website or the Clipboard, TextPipe automates common
tasks such as search/replace (with pattern matching and
Sounds-like), end-of-line conversion, email address
extraction and HTML fixing.
100 example filters are included to get you started,
and TextPipe's 60 internal filters make life easy.
Full HTML online help is provided. Multi-threading
handles streams in parallel.
What is TextPipe?
TextPipe™ is a multi-award-winning, industrial-strength
text transformation, conversion, cleansing and extraction
workbench.
One tool - One point of maintenance. With TextPipe you
specify all your text processing functions in one
place, rather than remembering and managing multiple
manual jobs across various text editors, command line
tools, custom scripts and Word and Excel macros.
What does TextPipe do?
TextPipe makes it fast and easy to convert, transform and
re-purpose data in text files, including
- HTML, XML and other structured documents from the WWW
- Fixed length or delimited files (CSV, Tab, Pipe, etc)
- Unix, Mainframe and PC/Windows end-of-line formats
- Inside Zip files, and the new Microsoft Office 2007 formats DOCX, XLSX, PPTX
- ASCII, ANSI, Unicode and EBCDIC files
- Security log files from firewalls, web servers etc
- EDIFACT, HL7, SWIFT and other structured formats
- Spooled print files
- Structured and unstructured reports of any size or dimension
TextPipe also works with binary files. However, for Word
documents (.doc) see WordPipe,
for Excel spreadsheets (.XLS) see ExcelPipe,
for PowerPoint presentations (.PPT) see PowerPointPipe,
and for databases see DataPipe.
For mining of web sites using TextPipe, see WebPipe.
Seven Reasons Why TextPipe is Different
- TextPipe is exceptionally fast.
- TextPipe handles files of unlimited size, even files larger than 2 Gigabytes! Other applications attempt to load the entire file into memory (grinding your system to a halt).
- TextPipe's unique restrictions control precisely where changes are made. Restrict to a range of lines or columns, to specific Tab or CSV fields, between HTML/XML tags, and inside custom ranges. Restrictions can be combined, for example, to columns 1-10 of lines matching a pattern. Restrictions are essential for extensive but controlled search and replace.
- TextPipe performs multiple operations simultaneously. Other applications offer only 1, up to 5, or require a slow multi-pass approach.
- If TextPipe's 100+ filters don't suit your needs, you can use industry standard VBScript/JScript to write your own. Other applications either don't offer this facility, or force you to learn a proprietary language.
- TextPipe is unique in offering the EasyPattern pattern matching language for those not familiar with text pattern matching (regular expressions). EasyPatterns are English-like and very easy to learn.
- TextPipe can be scheduled for non-interactive use, and can be controlled by an external program. Other applications provide only a mouse interface.
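To make the idea of a combined restriction concrete, here is a minimal Python sketch of "columns 1-10 of lines matching a pattern". It is not TextPipe code and says nothing about how TextPipe implements restrictions; the function name `restricted_replace` and the sample data are hypothetical, chosen only for illustration. It reads one line at a time, so the whole input never has to be held in memory.

```python
import io
import re
from typing import Iterable, Iterator


def restricted_replace(
    lines: Iterable[str],
    line_pattern: str,
    find: str,
    replace: str,
    first_col: int = 1,
    last_col: int = 10,
) -> Iterator[str]:
    """Replace `find` with `replace` only inside columns first_col..last_col
    (1-based, inclusive) of lines that match `line_pattern`."""
    selector = re.compile(line_pattern)
    for line in lines:
        if selector.search(line):
            head = line[: first_col - 1]            # text before the column range
            window = line[first_col - 1 : last_col]  # the restricted column range
            tail = line[last_col:]                   # text after the column range
            line = head + window.replace(find, replace) + tail
        yield line


# Hypothetical sample: "ERR" appears both inside and outside columns 1-10.
sample = io.StringIO(
    "ERR 0001  code=ERR disk full\n"
    "OK  0002  code=OK\n"
)
result = list(restricted_replace(sample, r"^ERR", "ERR", "ERROR"))
```

In the first sample line only the leading "ERR" falls inside columns 1-10, so the later "code=ERR" is left alone; the second line does not match the line pattern and passes through unchanged.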
TextPipe Users Include:
- Business Analysts (BAs)
- Database Administrators (DBAs)
- Web Site Designers and Authors
- Developers
TextPipe Industries Include:
- Banking and Finance
- Database Consulting/Systems Integrators/High Tech
- Mail houses/Print and Publishing
TextPipe Customers Include:
- Bank of America
- NASA
- US Department of Energy
TextPipe will save you time, frustration and money. It
will fix text data, regardless of the number of changes
required, the size or number of files, and the
complexity of the transformations.
It is trusted by over 1500 customers in 56 countries to
- Convert Huge Files Quickly and Easily
- Data Mine Unstructured Mainframe Reports and Online Web Data
- Cleanse and Reformat Electronic Text
- Update Web Sites
- Perform Data Warehouse ETL (Extract-Transform-Load) Tasks
- Extract from Databases to XML, CSV, Tab
- Split and Join Huge Files
- Convert Between a Variety of Mainframe, PC and Unicode Data Formats and Encodings
- Pre-process Training Data for SMT (Statistical Machine Translation)
- and 1001 other uses.
TextPipe provides a single point of maintenance for all
your text processing tasks. You learn one tool, rather
than learning four or more, along with their associated
languages, command line options, debugging schemes,
idiosyncrasies and operating system differences and
dependencies. TextPipe is far less costly to learn,
use, develop with and maintain than cobbling together
multiple generic tools and custom scripts to achieve
one end. It's a Swiss Army knife combining the best of
Perl, awk, grep, sed, and many other less common text
processing tools. You'll be productive with TextPipe
in minutes, not days.
TextPipe's unmatched power comes from its arsenal of
100+ manipulation filters, its unique
architecture and its tremendous flexibility in
combining these filters to suit each task. Intuitive
line, column, field, tag and attribute restrictions
make fixing data extracts simple. You can extract and
then modify data from databases, in delimited, XML
and SQL Insert Script formats. You can roll your own
custom filters using industry standard VBScript
and JScript. With TextPipe you can create your own
conversions, and deploy them for execution at remote
sites. A single click merges files (even those larger
than 10 GB), another click extracts email addresses,
and another click sorts and removes duplicates. Try
doing that with fewer than 100 lines of code, in less
than 10 seconds!
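For readers who want to see what the "extract email addresses, then sort and remove duplicates" step involves, the following Python sketch shows one scripted equivalent. It is an illustration only, not TextPipe's filter: the pattern `EMAIL_RE` is deliberately simple and will miss some valid addresses, the function name is hypothetical, and the whole input is held in memory, so it would not suit the multi-gigabyte files mentioned above.

```python
import re

# Deliberately simple pattern (an assumption for this sketch);
# real-world address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_sorted_unique_emails(text):
    """Return the email addresses found in `text`, lowercased,
    de-duplicated and sorted alphabetically."""
    found = EMAIL_RE.findall(text)
    unique = {addr.lower() for addr in found}  # a set drops duplicates
    return sorted(unique)


# Hypothetical sample text with a repeated address and mixed case.
sample = "Contact bob@example.com or Alice@Example.com; bob@example.com is preferred."
emails = extract_sorted_unique_emails(sample)
```

Lowercasing before de-duplication is a design choice made here so that addresses differing only in case count as one; drop the `.lower()` call if case should be preserved.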
Try / Buy
See also:
WebPipe
WebPipe downloads partial or entire web sites to your
hard disk for data mining with TextPipe Pro. WebPipe is a
custom version of Offline Explorer with specialized
extensions specifically for data mining work and for
working with TextPipe Pro.
Together with TextPipe Pro, WebPipe can data mine content
from part or all of any web site on a scheduled basis:
- Extract, cleanse, sort and de-duplicate email addresses from downloaded web sites
- Extract and de-duplicate web site URLs from downloaded web sites
- Gather data from your competitors' web sites and then republish it, or use it for sales analysis
- Remove advertising images and HTML from downloaded web sites
- Upload competitor prices into your sales database
- and much more!
Overview
WebPipe allows you to download your favorite Web and FTP
sites for later offline viewing, editing or browsing. Then
use TextPipe Pro to data mine content or keywords from
your competitors' web sites!
TextPipe Pro is a data extraction and text manipulation
application that updates your web site, extracts data
from databases, reformats and standardizes your
electronic text and program source code, data mines
unstructured text reports and your competitors' web
sites, cleanses data in legacy databases, and converts
between a variety of mainframe and PC data formats.
The possibilities are simply endless.
How to Data Mine Content using TextPipe Pro
In order to effectively data mine content from web pages,
you first have to remove all the extraneous
information such as color and formatting, extra
spaces, graphics, forms, comments, styles, advertising
and embedded frames. To perform this step, we link to
a predefined TextPipe filter in html\data mine.fll.
To use it, in the File Menu, choose Link to
Filter, and then select the filter file. This
includes the filter without modifying it.
Next we need to simplify the HTML tags, changing
"<table border="3" padding="3" etc>" into just
"<table>". This makes later search and replace much
easier. To do this, we use a filter called
html\simplify tags.fll. Again, to use it, in the File
Menu, choose Link to Filter, and then select the
filter file.
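As a rough illustration of what tag simplification means, the Python sketch below strips the attributes from opening tags with a single regular expression. It is not the contents of the simplify tags filter, which the text above does not show; the pattern and the function name `simplify_tags` are assumptions made for this example. The pattern is naive: an attribute value that itself contains ">" would confuse it, and a self-closing tag written with a space, such as "<br />", would come out as "<br>".

```python
import re

# Matches an opening tag that carries attributes; group 1 is the tag name.
# Closing tags (</td>) and attribute-free tags (<tr>) do not match.
TAG_WITH_ATTRS = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s[^<>]*>")


def simplify_tags(html):
    """Reduce every opening tag with attributes to the bare tag name."""
    return TAG_WITH_ATTRS.sub(r"<\1>", html)


# Hypothetical input modelled on the example in the text.
simplified = simplify_tags(
    '<table border="3" padding="3"><tr><td class="x">1</td></tr></table>'
)
```

With the attributes gone, later patterns can match a literal "<table>" or "<td>" without having to allow for every possible attribute combination.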
Finally, to convert data from HTML table format to a CSV
(comma-separated value) format that we can easily
import into Excel, we use the filter html\data mine
html tables.fll. Again, to use it, in the File
Menu, choose Link to Filter, and then
select the filter file.
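The table-to-CSV conversion can be pictured with the following Python sketch, which uses only the standard library. It is an independent illustration, not the logic of the data mine html tables filter; the class name `TableToRows` and the function `html_tables_to_csv` are hypothetical. It treats every `<tr>` as one CSV row and every `<td>` or `<th>` as one field, so rows from nested or multiple tables are simply written one after another.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the open cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left unclosed


def html_tables_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page fragment; note the comma inside one cell.
page = (
    "<table><tr><th>Item</th><th>Price</th></tr>"
    "<tr><td>Widget, large</td><td>9.50</td></tr></table>"
)
csv_text = html_tables_to_csv(page)
```

The `csv` module quotes the field that contains a comma, which is what lets Excel import it as a single cell.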
Once this is done, just drag and drop the file onto
TextPipe's window, set the Output Filter to save the
result file somewhere like the Desktop, and then click
'Go'.
It's worth noting that you may need to remove other HTML
tables from headers and footers near your data. This
must be done manually, because the software cannot
determine what is junk data and what is not. To remove
a table, in the Special Menu, choose Find and Replace
(Find Pattern). A new search and replace filter is
added; ensure it has a find type of Pattern (perl).
Then add text like '<table>.*</table>'. This will find a
start and end table tag with anything in between.
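One thing to watch with a pattern of this shape is how much it matches when a page holds more than one table. The Python sketch below demonstrates the difference between a greedy ".*" and a non-greedy ".*?" using Python's own `re` module. How TextPipe's pattern engine treats greediness and line breaks is not stated here, so take this as a general regular-expression illustration on hypothetical sample data, not a description of TextPipe's behavior.

```python
import re

# Hypothetical page: two junk tables with wanted content in between.
html = (
    "<table><tr><td>header junk</td></tr></table>\n"
    "<p>keep me</p>\n"
    "<table><tr><td>footer junk</td></tr></table>"
)

# Greedy: with DOTALL, ".*" runs from the first <table> to the LAST </table>,
# so the wanted paragraph between the two tables is removed as well.
greedy = re.sub(r"<table>.*</table>", "", html, flags=re.DOTALL)

# Non-greedy: ".*?" stops at the nearest </table>, so each table is
# removed on its own and the paragraph survives.
lazy = re.sub(r"<table>.*?</table>", "", html, flags=re.DOTALL)

# count=1 removes only the first table and leaves the rest untouched.
first_only = re.sub(r"<table>.*?</table>", "", html, count=1, flags=re.DOTALL)
```

If a find-and-replace removes more than expected, checking whether the pattern is matching across several tables at once is a sensible first step.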
You can use WebPipe to download all or part of web sites
on a scheduled basis and then feed them into TextPipe
Pro automatically.