Latest Lists

Our business requires software to find duplicates in its database and marketing lists. What can you recommend?

Public Comments

  1. What sort of database is it? If it's any kind of SQL-based database, you should be able to write SQL that will flush out any duplicates, depending on how you choose to define a duplicate (which is not always as easy as it sounds). Feel free to e-mail me with further details if you want. I have done this many times before.
  2. The above poster is correct. You should be able to identify any duplicate database entries yourself with a SQL query. Of course, it depends on what you consider a duplicate. It sounds like you probably need to normalize your DB if you're worried about redundant or non-synchronized data. I'd suggest contacting a tech/developer consultant or firm.
  3. It is probably best to illustrate with an example. You talk about a marketing list, so let's suppose that you have a table MARKETING_LIST with the columns ID, NAME, EMAIL, and that you would like to find out whether there are records with the same email. In an Oracle database, an SQL statement like the following will give you a list of the emails that occur more than once in your marketing list table, along with the number of times each one appears: SELECT email, COUNT(*) FROM marketing_list GROUP BY email HAVING COUNT(*) > 1; That in turn will give you an idea of how badly your MARKETING_LIST table needs deduplication on the EMAIL column. You can apply the same method to other tables and columns. This job can be rather labour-intensive, depending on the size and structure of your database. You will also need to repeat it periodically to ensure that new duplicates do not accumulate. Both Oracle and IBM produce good profiling tools that automate this kind of data quality procedure, namely Oracle Warehouse Builder and IBM WebSphere ProfileStage.
  4. I have worked with databases for over 20 years. Without knowing the basic contents of the data or the software used, this is a very hard question to answer. Is it DB2, SQL, Access, Excel? Is it for Red Hat or an in-house custom system? Is it indexed? Without an index it will be very hard to find duplicates. Is it only in ASCII format? Do you need to construct a database from raw data? Is the database complete or incomplete? I need to know all of these before I can suggest solutions.
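
To make the GROUP BY / HAVING approach from the answers above concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for whatever database you actually run, so it is not Oracle-specific. The table layout (ID, NAME, EMAIL) follows the example in answer 3; the sample rows, the function names, and the rule that two emails count as "the same" after trimming spaces and lowercasing are all assumptions made for illustration. Your own definition of a duplicate may well differ.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with made-up sample rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE marketing_list (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO marketing_list (id, name, email) VALUES (?, ?, ?)",
        [
            (1, "Ann", "ann@example.com"),
            (2, "Ann B.", "Ann@Example.com "),  # same address, different case/spacing
            (3, "Bob", "bob@example.com"),
            (4, "Bob", "bob@example.com"),      # exact repeat
            (5, "Cy", "cy@example.com"),
        ],
    )
    return conn


def find_duplicate_emails(conn):
    """Return (normalized_email, count) for every email that appears more than once.

    Assumption: emails are compared after TRIM and LOWER.
    """
    query = """
        SELECT LOWER(TRIM(email)) AS norm_email, COUNT(*) AS n
        FROM marketing_list
        GROUP BY LOWER(TRIM(email))
        HAVING COUNT(*) > 1
        ORDER BY norm_email
    """
    return conn.execute(query).fetchall()


def redundant_ids(conn):
    """Return ids of rows that repeat an earlier row's email.

    Keeps the lowest id in each group; nothing is deleted here.
    """
    query = """
        SELECT id FROM marketing_list
        WHERE id NOT IN (
            SELECT MIN(id) FROM marketing_list
            GROUP BY LOWER(TRIM(email))
        )
        ORDER BY id
    """
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    conn = make_demo_db()
    print(find_duplicate_emails(conn))  # [('ann@example.com', 2), ('bob@example.com', 2)]
    print(redundant_ids(conn))          # [2, 4]
```

The second function only lists candidate rows rather than deleting them, since someone usually needs to review which record in each group should survive. Rerunning a report like this on a schedule is one simple way to catch new duplicates as they accumulate.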
Powered by Yahoo! Answers