How To Compare Two Rows In A Data Frame

Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2

Solution 1: ^[1]

sqldf provides a nice solution

                a1 <- data.frame(a = 1:5, b = letters[1:5])
                a2 <- data.frame(a = 1:3, b = letters[1:3])

                library(sqldf)

                # Rows of a1 that do not appear in a2
                a1NotIna2 <- sqldf('SELECT * FROM a1 EXCEPT SELECT * FROM a2')

And the rows which are in both data frames:

                a1Ina2 <- sqldf('SELECT * FROM a1 INTERSECT SELECT * FROM a2')

The new version of dplyr has a function, anti_join, for exactly these kinds of comparisons:

                library(dplyr)

                anti_join(a1, a2)

And semi_join to filter rows in a1 that are also in a2:

                semi_join(a1,a2)
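
As a minimal sketch of the two joins above, assuming dplyr is installed: the `by` argument is spelled out here (an addition, not part of the original answer) so that dplyr does not have to guess the key columns. The names `only_in_a1` and `in_both` are illustrative only.

```r
library(dplyr)

# Same example data as in Solution 1
a1 <- data.frame(a = 1:5, b = letters[1:5], stringsAsFactors = FALSE)
a2 <- data.frame(a = 1:3, b = letters[1:3], stringsAsFactors = FALSE)

# Rows of a1 that have no match in a2
only_in_a1 <- anti_join(a1, a2, by = c("a", "b"))

# Rows of a1 that do have a match in a2
in_both <- semi_join(a1, a2, by = c("a", "b"))

only_in_a1$a  # 4 5
in_both$a     # 1 2 3
```

Both functions return columns from a1 only, so the result can be used directly as a filtered copy of a1.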

Solution 2: ^[2]

This doesn't answer your question directly, but it will give you the elements that are in common. This can be done with Paul Murrell's package compare:

                library(compare)
                a1 <- data.frame(a = 1:5, b = letters[1:5])
                a2 <- data.frame(a = 1:3, b = letters[1:3])
                comparison <- compare(a1, a2, allowAll = TRUE)
                comparison$tM
                #  a b
                #1 1 a
                #2 2 b
                #3 3 c

The function compare gives you a lot of flexibility in terms of what kind of comparisons are allowed (e.g. changing order of elements of each vector, changing order and names of variables, shortening variables, changing case of strings). From this, you should be able to figure out what was missing from one or the other. For example (this is not very elegant):

                difference <-
                   data.frame(lapply(1:ncol(a1), function(i) setdiff(a1[, i], comparison$tM[, i])))
                colnames(difference) <- colnames(a1)
                difference
                #  a b
                #1 4 d
                #2 5 e

Solution 3: ^[3]

Solution 4: ^[4]

It is certainly not efficient for this particular purpose, but what I often do in these situations is to insert indicator variables in each data.frame and then merge:

                a1$included_a1 <- TRUE
                a2$included_a2 <- TRUE
                res <- merge(a1, a2, all = TRUE)

Missing values in included_a1 will indicate which rows are missing in a1, and similarly for a2.
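
The indicator columns can then be read back out of the merged result. This is a rough sketch in base R, assuming the same a1 and a2 as in the earlier solutions; the names `only_in_a1` and `only_in_a2` are illustrative and not part of the original answer.

```r
# Same example data as in the earlier solutions
a1 <- data.frame(a = 1:5, b = letters[1:5], stringsAsFactors = FALSE)
a2 <- data.frame(a = 1:3, b = letters[1:3], stringsAsFactors = FALSE)

# Flag each row with the data frame it came from
a1$included_a1 <- TRUE
a2$included_a2 <- TRUE

# Full outer join on the shared columns (a and b)
res <- merge(a1, a2, all = TRUE)

# NA in included_a2 means the row exists only in a1
only_in_a1 <- res[is.na(res$included_a2), c("a", "b")]

# NA in included_a1 means the row exists only in a2 (none in this example)
only_in_a2 <- res[is.na(res$included_a1), c("a", "b")]

only_in_a1$a  # 4 5
```

Because merge matches on column names rather than positions, this works even when the two data frames list their columns in a different order.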

One problem with your solution is that the column orders must match. Another problem is that it is easy to imagine situations where the rows are coded as the same when in fact they are different. The advantage of using merge is that you get for free all the error checking that is necessary for a good solution.

Solution 5: ^[5]

I wrote a package (https://github.com/alexsanjoseph/compareDF) since I had the same issue.

                > df1 <- data.frame(a = 1:5, b = letters[1:5], row = 1:5)
                > df2 <- data.frame(a = 1:3, b = letters[1:3], row = 1:3)
                > df_compare = compare_df(df1, df2, "row")

                > df_compare$comparison_df
                  row chng_type a b
                1   4         + 4 d
                2   5         + 5 e

A more complicated example:

                library(compareDF)
                df1 = data.frame(id1 = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
                                         "Hornet 4 Drive", "Duster 360", "Merc 240D"),
                                 id2 = c("Maz", "Maz", "Dat", "Hor", "Dus", "Mer"),
                                 hp = c(110, 110, 181, 110, 245, 62),
                                 cyl = c(6, 6, 4, 6, 8, 4),
                                 qsec = c(16.46, 17.02, 33.00, 19.44, 15.84, 20.00))

                df2 = data.frame(id1 = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
                                         "Hornet 4 Drive", " Hornet Sportabout", "Valiant"),
                                 id2 = c("Maz", "Maz", "Dat", "Hor", "Dus", "Val"),
                                 hp = c(110, 110, 93, 110, 175, 105),
                                 cyl = c(6, 6, 4, 6, 8, 6),
                                 qsec = c(16.46, 17.02, 18.61, 19.44, 17.02, 20.22))

                # The call that produced df_compare is not shown in the original answer;
                # grouping on the two id columns is an assumption based on the output below.
                df_compare = compare_df(df1, df2, c("id1", "id2"))

                > df_compare$comparison_df
                  grp chng_type                id1 id2  hp cyl  qsec
                1   1         -  Hornet Sportabout Dus 175   8 17.02
                2   2         +         Datsun 710 Dat 181   4 33.00
                3   2         -         Datsun 710 Dat  93   4 18.61
                4   3         +         Duster 360 Dus 245   8 15.84
                5   7         +          Merc 240D Mer  62   4 20.00
                6   8         -            Valiant Val 105   6 20.22

The package also has an html_output command for quick checking:

                df_compare$html_output