Questions tagged [data-scrubbing]

The process of detecting and correcting (or removing) corrupt or inaccurate records from a data set

Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then resolving the issue by replacing, modifying, or deleting the errant data.

http://en.wikipedia.org/wiki/Data_cleansing
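As a rough illustration of the three resolutions named above (replace, modify, delete), here is a minimal Python sketch. The record layout (`name`, `age`) and the plausibility limits are assumptions made up for this example, not part of the tag description.

```python
def clean_records(records):
    """Return cleaned copies of the records; fix what can be fixed, drop the rest."""
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        if not name:
            # Delete: a record without a name is treated as unusable here.
            continue
        age = rec.get("age")
        if not isinstance(age, int) or not 0 <= age <= 130:
            # Replace: an implausible age becomes an explicit missing value.
            age = None
        # Modify: normalise whitespace and capitalisation of the name.
        cleaned.append({"name": name.title(), "age": age})
    return cleaned


raw = [
    {"name": "  alice ", "age": 30},
    {"name": "", "age": 41},
    {"name": "BOB", "age": 999},
]
result = clean_records(raw)
```

With this input, the empty-name record is dropped, `BOB` is normalised, and the age of 999 is replaced by `None`.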

65 questions
1
vote
3 answers

How to scrape data from a web page using SAS

Problem statement: I need to get data from the web and put it into a SAS dataset using a SAS program. Worked well: I am able to fetch the contents of the target web page with SAS. Not working (need help): I am not able to process the source…
1
vote
2 answers

Scrub email addresses from MySQL

I have a MySQL database full of user information. I'd like to give it to a contractor to do some analysis, but I don't want to expose all of my user information. My biggest concern right now is the email addresses. I would like to keep the email address…
Tom Hazel
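One common way to hide email addresses while keeping rows distinguishable is to replace each address with a keyed hash. The sketch below is in Python rather than MySQL, and it is only one possible approach: the excerpt is truncated, so the asker's exact requirement is unknown. `SECRET_KEY`, `scrub_email`, and the `example.invalid` domain are hypothetical names chosen for the illustration.

```python
import hashlib
import hmac

# Hypothetical key; it would need to stay private and never ship with the data.
SECRET_KEY = b"replace-with-a-private-key"


def scrub_email(email, key=SECRET_KEY):
    """Map an email address to a stable pseudonym that does not reveal it."""
    normalized = email.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # The same input always yields the same pseudonym, so joins and
    # duplicate counts still work on the scrubbed column.
    return f"user_{digest[:16]}@example.invalid"


scrubbed = scrub_email("Jane.Doe@Example.com")
```

Using a keyed hash (HMAC) instead of a bare hash makes it harder for someone holding a list of candidate addresses to recompute the pseudonyms, as long as the key is not shared.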
1
vote
3 answers

How to scrub a date of birth in Java randomly, so that the same original date of birth always produces the same scrubbed value

I am doing data scrubbing and need to scrub a date of birth field, but I want the result to be consistent: the same random number or date of birth should be generated for the same input date. Kindly help me with this. I have…
Shaun
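The consistency requirement described here can be met by seeding a pseudo-random generator with the input date itself, so the "random" shift is a pure function of the input. The sketch below shows the idea in Python, not Java; `scrub_dob`, the `salt` parameter, and the one-year shift window are assumptions made for the illustration.

```python
import random
from datetime import date, timedelta


def scrub_dob(dob, salt, max_shift_days=365):
    """Shift a date of birth by a pseudo-random offset derived from the date."""
    # Seeding with a string is deterministic across runs in Python 3,
    # so the same (salt, dob) pair always produces the same offset.
    rng = random.Random(f"{salt}:{dob.isoformat()}")
    offset = rng.randint(-max_shift_days, max_shift_days)
    return dob + timedelta(days=offset)


original = date(1980, 5, 17)
first = scrub_dob(original, salt="demo-salt")
second = scrub_dob(original, salt="demo-salt")
```

`first` and `second` are equal because the generator is re-seeded from the same input each time. Note that the offset can occasionally be zero, and that anyone who knows the salt can reproduce the mapping, so the salt should be kept private.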
1
vote
2 answers

Shiny App R - Scrubbing and Error

I'm building a Shiny App in R and I'm trying to scrub information from the web about the user's selected Pokemon, but I keep running into the problem of 'Error: SSL certificate problem' when trying to use read_html() ui: sidebarPanel( …
1
vote
1 answer

Group duplicate columns and sum the corresponding column values using pandas

I am preprocessing Apache server log data. I have 3 columns: ID, TIME, and BYTES. Example: ID     TIME     BYTES 1     13:00     10 2     13:02     30 3  …
user5843394
1
vote
0 answers

Changing values of numbers in CodeMirror by using scrubbing addon

I'm having trouble with CodeMirror. I'm trying to add live number scrubbing, similar to Bret Victor's example and Khan Academy's capability, but I am not having much luck. I can't post links, but I found this library which kind of gets the…
TorranceY
1
vote
1 answer

SQL Update on Azure ML Not Working?

I'm trying to clean some data in Azure ML. I have an Apply SQL Transform block with the following code in it: UPDATE t1 SET CreditScore = -1 WHERE CreditScore>900; It is a numeric column. When I visualize the output, there are 0 rows and 0…
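A possible explanation, offered as an assumption rather than a confirmed diagnosis: if the Apply SQL Transform block runs the statement against SQLite and passes on only the query's result set, then an `UPDATE` produces no rows at all, whereas a `SELECT` with a `CASE` expression returns the cleaned rows. The Python `sqlite3` sketch below reproduces that difference locally; the `Id` column and the sample values are hypothetical.

```python
import sqlite3


def make_table():
    """Build a small in-memory table resembling the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (Id INTEGER, CreditScore INTEGER)")
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?)", [(1, 720), (2, 950), (3, 1200)]
    )
    conn.commit()
    return conn


# An UPDATE changes the table but yields no result rows.
conn = make_table()
update_rows = conn.execute(
    "UPDATE t1 SET CreditScore = -1 WHERE CreditScore > 900"
).fetchall()
conn.close()

# A SELECT with CASE returns every row, with out-of-range scores replaced.
conn = make_table()
select_rows = conn.execute(
    "SELECT Id, "
    "CASE WHEN CreditScore > 900 THEN -1 ELSE CreditScore END AS CreditScore "
    "FROM t1 ORDER BY Id"
).fetchall()
conn.close()
```

Here `update_rows` is empty while `select_rows` holds all three rows, which would be consistent with the "0 rows" seen in the visualization.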
1
vote
2 answers

Get an article's title/author/date info with Javascript

I'm trying to build a bookmarklet that will get the current page/article's author and date information, for referencing purposes. I know that I can get the Page title and url with document.title and document.URL but I'm drawing a blank when it comes…
1
vote
1 answer

How to load dynamically generated webpage?

I am trying to load the webpage http://www.artstation.com/artist/nicotine so I can scrub the page. Unfortunately, the page seems to be generated via code, so the tags that I am looking for aren't available. Loading it with the following isn't…
Chris L
1
vote
1 answer

How to display preview thumbnail while scrubbing the video.

I am trying to display a preview thumbnail when the user moves a finger over the video scrubber. The only solution I'm finding is to extract thumbnails using some 3rd party tool and save them to a server or pass them to the app via some JSON. What I'm trying to…
Umair
1
vote
0 answers

Alternatives to Coding Downloader Programs

At my job we regularly need to grab data from external sources, whether that be via ftp, sftp, e-mail scraping, web services, or web scraping. The formats vary from screen scraping/parsing, to CSV, XML, JSON, or XLS. A new leader has now entered the…
1
vote
3 answers

HTML parsing for a certain part of div

I am trying to access an HTML page and get a certain number from a div that is generated dynamically. I want to retrieve the "XX" as a variable, which will be different for each page. Is this done with…
626
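If the number is present in the HTML that the server returns, one way to pull it out is to walk the markup with a parser and collect the text of the target div. The sketch below uses Python's standard `html.parser`; the div `id` used to locate the element (`score`) is hypothetical, since the excerpt does not show the actual markup. If the div is filled in by JavaScript after the page loads, the value will not be in the fetched HTML and this approach will not find it.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the first div that has the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # > 0 while inside the target div
        self.done = False     # True once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "div":
            return
        if self.depth:
            self.depth += 1   # nested div inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def div_text(html, target_id):
    """Return the stripped text content of the div with id == target_id."""
    parser = DivTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


page = '<div id="header">Title</div><div id="score"> 42 </div>'
value = div_text(page, "score")
```

The returned value is a string; converting it with `int(value)` is left to the caller because the real content may not be purely numeric.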
1
vote
1 answer

PHP: scrubbing a website for Icecast listeners

Can anyone help extract the current listener count from the link below using PHP? I have attached PHP code below as well, but it needs to be modified. http://209.105.250.69:8382/ and the source is below
Ossama
1
vote
2 answers

Unit Testing data?

Our software manages a lot of data feeds from various sources: real time replicated databases, files FTPed automatically, scheduled running of database stored procedures to cache snapshots of data from linked servers and numerous other methods of…
Unsliced
0
votes
2 answers

Facebook Graph API extensive data scrape. Client or server side?

I'm building an application using PHP, HTML & JavaScript that accesses a user's Facebook data and does some analysis on the information returned. It requires making approx 15 to 30 requests to the Graph API depending on how much data a user has in…
gfte